Visual spatial attention

Visual spatial attention is a form of visual attention that involves directing attention to a location in space. Like its temporal counterpart, visual temporal attention, spatial attention has been widely implemented as attention modules in video analytics in computer vision to provide enhanced performance and human-interpretable explanations[1][2][3] of deep learning models.

Spatial attention allows humans to selectively process visual information by prioritizing an area within the visual field. A region of space within the visual field is selected for attention, and the information within this region then receives further processing. Research shows that when spatial attention is evoked, an observer is typically faster and more accurate at detecting a target that appears in an expected location than one that appears in an unexpected location.[4] Attention is guided even more quickly to unexpected locations when these locations are made salient by external visual inputs (such as a sudden flash). According to the V1 Saliency Hypothesis, the human primary visual cortex plays a critical role in such exogenous attentional guidance.[5]

Spatial attention is distinct from other forms of visual attention such as object-based attention and feature-based attention.[6] These other forms of visual attention select an entire object or a specific feature of an object regardless of its location, whereas spatial attention selects a specific region of space, and the objects and features within that region are processed.

Measures of visual spatial attention

Spatial cueing experiments

A key property of visual attention is that information can be selected based on spatial location, and spatial cueing experiments have been used to assess this type of selection. In Posner's cueing paradigm,[4] the task is to detect a target that can be presented in one of two locations and to respond as quickly as possible. At the start of each trial, a cue is presented that either indicates the location of the target (valid cue) or indicates the incorrect location, thus misdirecting the observer (invalid cue). In addition, on some trials no cue is presented, so no information is given about the location of the target (neutral trials). Two types of cue are used: a peripheral ‘flicker’ around the target's location (peripheral cue) or a centrally displayed symbol, such as an arrow pointing to the location of the target (central cue). Observers are faster and more accurate at detecting and recognising a target if the location of the target is known in advance.[4][7] Furthermore, misinforming subjects about the location of the target leads to slower reaction times and poorer accuracy relative to performance when no information about the location of the target is given.[4][7]
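The comparison between valid, neutral and invalid trials can be sketched in code. The following is a minimal illustration, assuming reaction times have already been grouped by cue condition; the function name `cueing_effects` and the reaction times are hypothetical and are not taken from any of the cited studies.

```python
from statistics import mean


def cueing_effects(rts_ms):
    """Return (benefit, cost, validity_effect) in milliseconds.

    rts_ms maps the condition names 'valid', 'neutral' and 'invalid'
    to lists of reaction times in milliseconds.
    """
    valid = mean(rts_ms["valid"])
    neutral = mean(rts_ms["neutral"])
    invalid = mean(rts_ms["invalid"])
    benefit = neutral - valid    # speed-up from a correct cue
    cost = invalid - neutral     # slow-down from a misleading cue
    return benefit, cost, invalid - valid


# Made-up reaction times (ms), for illustration only.
example = {
    "valid": [250, 260, 270],
    "neutral": [280, 290, 300],
    "invalid": [310, 320, 330],
}
benefit, cost, validity_effect = cueing_effects(example)
print(benefit, cost, validity_effect)
```

A positive benefit and a positive cost together correspond to the pattern described above: faster responses after valid cues and slower responses after invalid cues, each relative to neutral trials.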

Spatial cueing tasks typically assess covert spatial attention, which refers to attention that can change spatially without any accompanying eye movements. To investigate covert attention, it is necessary to ensure that the observer's eyes remain fixated at one location throughout the task. In spatial cueing tasks, subjects are instructed to fixate on a central fixation point. It typically takes 200 ms to make a saccadic eye movement to a location.[8] Therefore, the cue and target are typically presented within a combined duration of less than 200 ms. This ensures that covert spatial attention is being measured and that the effects are not due to overt eye movements. Some studies specifically monitor eye movements to ensure that the observer's eyes are continually fixated on the central fixation point.[9]
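The timing constraint can be expressed as a simple check on a trial design. This is a sketch only: the function name, the optional gap between cue and target, and the example durations are assumptions made for illustration, and the 200 ms default is the typical saccade latency quoted above rather than a fixed constant.

```python
SACCADE_LATENCY_MS = 200  # typical latency quoted in the text


def allows_covert_measurement(cue_ms, target_ms, gap_ms=0,
                              saccade_latency_ms=SACCADE_LATENCY_MS):
    """Return True if cue, gap and target together end before a saccade
    could typically be completed."""
    total_ms = cue_ms + gap_ms + target_ms
    return total_ms < saccade_latency_ms


print(allows_covert_measurement(cue_ms=50, target_ms=50, gap_ms=50))   # True
print(allows_covert_measurement(cue_ms=100, target_ms=100, gap_ms=50)) # False
```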

The central and peripheral cues in spatial cueing experiments can assess the orienting of covert spatial attention. These two cues appear to use different mechanisms for orienting spatial attention. The peripheral cues tend to attract attention automatically, recruiting bottom-up attentional control processes. Conversely, central cues are thought to be under voluntary control and therefore use top-down processes.[10] Studies have shown that peripheral cues are difficult to ignore, as attention is oriented towards the peripheral cue even when the observer knows the cue does not predict the location of the target.[7] Peripheral cues also cause an allocation of attention much faster than central cues, as central cues require greater processing time to interpret the cue.[10]

Spatial probe experiments

In spatial cueing tasks, the spatial probe (cue) causes an allocation of attention to a particular location. Spatial probes have also often been used in other types of tasks to assess how spatial attention is allocated.

Spatial probes have been used to assess spatial attention in visual searches. Visual search tasks involve the detection of a target among a set of distractors. Attention to the location of items in the search can be used to guide visual searches. This was demonstrated by valid cues improving the identification of targets relative to the invalid and neutral conditions.[11] A visual search display can also influence how fast an observer responds to a spatial probe. In a visual search task, a small dot appeared after a visual display and it was found that observers were faster at detecting the dot when it was located at the same location as the target.[12] This demonstrated that spatial attention had been allocated to the target location.

The use of multiple tasks simultaneously in an experiment can also demonstrate the generality of spatial attention, as allocation of attention to one task can influence performance in other tasks.[13][14] For example, it was found that when attention was allocated to detecting a flickering dot (spatial probe), this increased the likelihood of identifying nearby letters.[14]

Distribution of spatial attention

The distribution of spatial attention has been subject to considerable research. Consequently, this has led to the development of different metaphors and models that represent the proposed spatial distribution of attention.

Spotlight metaphor

According to the ‘spotlight’ metaphor, the focus of attention is analogous to the beam of a spotlight.[15] The moveable spotlight is directed at one location and everything within its beam is attended and processed preferentially, while information outside the beam is unattended. This suggests that the focus of visual attention is limited in spatial size and moves to process other areas in the visual field.

Zoom-lens metaphor

Research has suggested that the attentional focus is variable in size.[16] Eriksen and St James[17] proposed the ‘zoom-lens’ metaphor, which is an alternative to the spotlight metaphor and takes into account the variable nature of attention. This account likens the distribution of attention to a zoom lens that can narrow or widen the focus of attention. It is consistent with findings that show attention can be distributed over a large area of the visual field and can also function in a focused mode.[18] In support of this analogy, research has shown that there is an inverse relationship between the size of the attentional focus and the efficiency of processing within the boundaries of the zoom lens.[19]
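One way to picture the inverse relationship between focus size and processing efficiency is a toy model in which a fixed amount of attentional resource is spread evenly over a circular focus. This is a simplifying assumption made here for illustration, not a formula given in the cited work; the function name and units are hypothetical.

```python
import math


def zoom_lens_density(radius_deg, total_resources=1.0):
    """Resource per unit area inside a circular focus of the given radius.

    Assumes a fixed total resource spread uniformly over the focus.
    """
    if radius_deg <= 0:
        raise ValueError("radius_deg must be positive")
    return total_resources / (math.pi * radius_deg ** 2)


narrow = zoom_lens_density(1.0)
wide = zoom_lens_density(2.0)
print(narrow > wide)  # True: a wider focus leaves less resource per unit area
```

Under this assumption, doubling the radius of the focus divides the resource density by four.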

Gradient model

The gradient model is an alternative theory of the distribution of spatial attention. This model proposes that attentional resources are allocated in a gradient pattern, with resources concentrated at the centre of focus and decreasing continuously away from the centre.[20] Downing[9] conducted research using an adaptation of Posner's cueing paradigm that supported this model. The target could appear in 12 potential locations, marked by boxes. Results showed that attentional facilitation was strongest at the cued location and gradually decreased with distance from the cued location. However, not all research has supported the gradient model. For example, Hughes and Zimba[21] conducted a similar experiment using a highly distributed visual array without boxes marking the potential locations of the target. There was no evidence of a gradient effect: responses were faster when the cue and target were in the same hemifield and slower when they were in different hemifields. The boxes appear to have played an important role in attention, as a later experiment that used the boxes found a gradient pattern.[22] Therefore, it is considered that the size of the gradient can adjust according to the circumstances. A broader gradient may be adopted when the display is empty, as attention can spread and is restricted only by hemifield borders.
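A gradient of this kind can be sketched with any function that peaks at the cued location and falls off smoothly with distance. The example below assumes a Gaussian falloff and a hypothetical row of 12 locations; neither the Gaussian shape nor the layout is taken from the cited experiments, and the `width` parameter simply stands in for the adjustable size of the gradient.

```python
import math


def gradient_facilitation(distance, width, peak=1.0):
    """Facilitation at a given distance from the cued location.

    Assumes a Gaussian falloff; a larger width gives a broader gradient.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    return peak * math.exp(-(distance ** 2) / (2 * width ** 2))


# Hypothetical layout: 12 locations in a row, with the cue at index 5.
cued = 5
profile = [gradient_facilitation(abs(i - cued), width=2.0) for i in range(12)]
print([round(value, 2) for value in profile])
```

Facilitation is highest at the cued location and declines with distance on both sides. Increasing `width` models the broader gradient suggested for empty displays.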

Splitting spatial attention

It is debated in research on visual spatial attention whether it is possible to split attention across different areas in the visual field. The ‘spotlight’ and ‘zoom-lens’ accounts postulate that attention uses a single unitary focus. On these accounts, spatial attention can only be allocated to adjacent areas in the visual field and consequently cannot be split. This was supported by an experiment that altered the spatial cueing paradigm by using two cues, a primary and a secondary cue. It was found that the secondary cue was only effective in focusing attention when its location was adjacent to the primary cue.[15] In addition, it has been demonstrated that observers are unable to ignore stimuli presented in areas situated between two cued locations.[23] These findings suggest that attention cannot be split across two non-contiguous regions. However, other studies have demonstrated that spatial attention can be split across two locations. For example, observers were able to attend simultaneously to two different targets located in opposite hemifields.[19] Research has even suggested that humans are able to focus attention across two to four locations in the visual field.[24] Another perspective is that spatial attention can be split only under certain conditions. This perspective suggests that the splitting of spatial attention is flexible. Research demonstrated that whether spatial attention is unitary or divided depends on the goals of the task.[25] Therefore, if dividing attention is beneficial to the observer, then a divided focus of attention will be utilised.

One of the main difficulties in establishing whether spatial attention can be divided is that a unitary focus model of attention can also explain a number of the findings. For example, when two non-contiguous locations are attended to, it may not be that attention has been split between these two locations; instead, the unitary focus of attention may have expanded.[24] Alternatively, the two locations may not be attended to simultaneously, and the area of focus may instead be moving quickly from one location to another.[26] Consequently, it appears very difficult to prove conclusively that spatial attention can be split.

Deficits in visual spatial attention


Hemineglect, also known as unilateral visual neglect, attentional neglect, hemispatial neglect or spatial neglect, is a disorder involving a significant deficit in visuospatial attention. Hemineglect refers to the inability of patients with unilateral brain damage to detect objects in the side of space contralateral to the lesion (contralesional); for example, damage to the right cerebral hemisphere results in neglect of objects on the left side of space.[27] The disorder is characterized by hemispheric asymmetry. Performance is generally preserved in the side ipsilateral to the lesion (ipsilesional).[27] Hemineglect is more frequent and arguably more severe following damage to the right cerebral hemisphere of right-handed subjects.[27] It has been proposed that the right parietal lobes are comparatively more responsible for the allocation of spatial attention, so damage to this hemisphere often produces more severe effects.[28] Additionally, it is difficult to map with accuracy the visual sensory deficits in the neglected hemifield.

Neglect is diagnosed using a variety of paper-and-pencil tasks. A common method is the Complex Figure Test (CFT). The CFT requires patients to copy a complicated line drawing and then reproduce it from memory. Patients often neglect features on the contralesional side of space and of objects. Patients with neglect will perform similarly when reproducing mental images of familiar places and objects. A common error is the failure to include numbers on the left side of the picture when drawing an analogue clock from memory; for example, all of the numbers may be positioned on the right side of the clock face.[10]

Another paper-and-pencil task is the line bisection task. In this exercise, patients are required to mark the midpoint of a horizontal line. Patients with neglect will often bisect the line to the right of the true centre, leaving the left portion of the line unattended.[27]
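The size of a bisection error can be summarised as a signed percentage deviation from the true centre. The sketch below is one plausible scoring, assuming the mark is measured in millimetres from the left end of the line; the function name and the sign convention (positive for rightward) are illustrative choices, and no clinical cut-off is implied.

```python
def bisection_deviation_percent(mark_mm, line_length_mm):
    """Signed deviation of the mark from the true centre, as a percentage
    of half the line length. Positive values are to the right of centre.

    mark_mm is measured from the left end of the line.
    """
    if line_length_mm <= 0:
        raise ValueError("line_length_mm must be positive")
    if not 0 <= mark_mm <= line_length_mm:
        raise ValueError("mark_mm must lie on the line")
    half_mm = line_length_mm / 2
    return (mark_mm - half_mm) * 100 / half_mm


print(bisection_deviation_percent(100, 200))  # 0.0, an accurate bisection
print(bisection_deviation_percent(130, 200))  # 30.0, a rightward error
```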

Object cancellation tasks are also used to determine the extent of potential deficit. During this task, patients are required to cancel out (cross out) all of the objects in a cluttered display (e.g. lines, geometric shapes, letters, etc.).[10] Patients with damage primarily to the right parietal area fail in the detection of objects in the left visuospatial field, and these are often not crossed out by the patient. In addition, those patients who may be severely affected tend to fail in detecting their errors on visual inspection.
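Performance on a cancellation sheet can be summarised by counting omitted targets on each side of the page. The following sketch assumes each target has a known horizontal position and that the set of crossed-out targets has been recorded; the data layout, names and example values are hypothetical.

```python
def cancellation_omissions(targets, cancelled, midline_x):
    """Return (left_omitted, right_omitted) counts.

    targets maps a target id to its horizontal position on the page;
    cancelled is the set of ids the patient crossed out.
    """
    left_omitted = 0
    right_omitted = 0
    for target_id, x in targets.items():
        if target_id in cancelled:
            continue
        if x < midline_x:
            left_omitted += 1
        else:
            right_omitted += 1
    return left_omitted, right_omitted


# Made-up sheet with four targets; only the right-hand ones were crossed out.
targets = {"a": 10, "b": 30, "c": 60, "d": 90}
print(cancellation_omissions(targets, cancelled={"c", "d"}, midline_x=50))
```

A large excess of left-side omissions would match the pattern described above for patients with right parietal damage.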


Extinction is a phenomenon observable during double simultaneous stimulation of both left and right visual fields. Patients with extinction will fail to perceive the stimulus in the contralesional visual field when it is presented in conjunction with a stimulus in the ipsilesional field.[10] However, when it is presented on its own, patients can correctly perceive the contralesional stimulus. Thus, patients with neglect fail to report stimuli present in the aberrant field, whereas patients with extinction fail to report stimuli in the aberrant field only when double simultaneous presentations occur in both hemifields.[10] Analogous to neglect, extinction affects the contralesional visuospatial field in the majority of patients with unilateral damage.[27] Anatomical correlates of visuospatial neglect and extinction do not overlap absolutely, with extinction proposed to be associated with subcortical lesions.[27]

A common method for the quick detection of visuospatial extinction is the Finger Confrontation Model. Used as a standard bedside evaluation, the task requires the patient to indicate (either verbally or by pointing) in which visual field the doctor's hand or finger is moving, while the doctor makes a wiggling motion with their index finger.[10] This enables the doctor to distinguish between deficits resembling neglect and those which may indicate extinction, by presenting either a single stimulus in the contralesional field or two simultaneous stimuli in both the contralesional and ipsilesional visual fields. This quick test can be used immediately in a hospital setting for quick diagnosis, and can be particularly useful following strokes and seizures.
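The logic that separates the two response patterns can be written as a small decision rule. This sketch only restates the distinction drawn in the text and is not a diagnostic procedure; the function name, argument names and labels are hypothetical.

```python
def classify_confrontation(single_contra_detected, double_contra_detected):
    """Label the response pattern from a confrontation test.

    single_contra_detected: the contralesional stimulus was reported
        when presented on its own.
    double_contra_detected: the contralesional stimulus was reported
        when presented together with an ipsilesional stimulus.
    """
    if not single_contra_detected:
        return "neglect-like"      # missed even without a competing stimulus
    if not double_contra_detected:
        return "extinction-like"   # missed only under double stimulation
    return "no deficit observed"


print(classify_confrontation(True, False))   # extinction-like
print(classify_confrontation(False, False))  # neglect-like
```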

Regions associated with impairment of visuospatial attention

Parietal damage

The posterior parietal region is arguably the most extensively studied in relation to visuospatial attention. Patients with parietal lobe damage most often fail to attend to stimuli located on the contralesional side of space, as seen in patients with hemineglect/unilateral visual neglect.[10] As such, they may fail to acknowledge a person sitting to their left, neglect to eat food positioned on their left, or fail to make head or eye movements to the left.[10] Computed tomography (CT) studies have demonstrated that the inferior parietal lobule in the right hemisphere is the most frequently damaged region in patients with severe neglect.[29]

Parietal damage may decrease the ability to reduce decision noise.[10] Spatial cues appear to reduce the uncertainty of a visuospatial decision. Disruption to spatial orienting, as seen in hemineglect, suggests that patients with damage to the parietal region may experience an increased difficulty in decision-making regarding targets located in the contralesional field.[10]

Damage to the parietal region may also increase illusory conjunctions of features. Illusory conjunctions occur when people report combinations of features which did not occur.[28] For example, when presented with an orange square and a purple circle, the participant may report a purple square or an orange circle. Although it would typically require special circumstances for a non-impaired person to produce an illusory conjunction, it appears that some patients with damage to the parietal cortex may demonstrate a vulnerability to such visuospatial impairments.[27] Results from parietal patients suggest that the parietal cortex, and therefore spatial attention, may be implicated in solving this problem of binding features.[10]

Frontal lobe damage

Lesions to the frontal cortices have long been known to precede spatial neglect and other visuospatial deficits. Specifically, frontal lobe damage has been associated with a deficit in the control of overt attention (the production of eye movements). Lesions to the superior frontal lobe areas that include the frontal eye fields seem to disrupt some forms of overt eye movements.[10] Guitton, Buchtel, & Douglas[30] demonstrated that eye movements directed away from an abruptly appearing visual target (“antisaccades”) are markedly impaired in patients with damage to the frontal eye fields, who frequently made reflexive eye movements to the target. When frontal eye field patients did make antisaccades, their eye movements had increased latency compared to controls. This suggests that the frontal lobes, specifically the dorsolateral region containing the frontal eye fields, play an inhibitory role in preventing reflexive eye movements in overt attention control.[30] Further, the frontal eye fields or surrounding areas may be critically associated with neglect following dorsolateral frontal lesions.[29]

Frontal lobe lesions also appear to produce deficits in visuospatial attention related to covert attention (the orienting of attention without requiring eye movements). Using Posner's spatial cueing task, Alivisatos and Milner (1989; see [10]) found that participants with frontal lobe damage demonstrated a comparatively smaller attentional benefit from valid cues than control participants or participants with temporal lobe damage. Voluntary orienting in frontal lobe patients appears to be impaired.

The right lateral frontal lobe region was also found to be associated with left-sided visual neglect in an investigation carried out by Husain & Kennard.[29] A region of overlap was found in the location of lesions in four of five patients with left-sided visual neglect, specifically the dorsal aspect of the inferior frontal gyrus and the underlying white matter. Additionally, overlap of lesion areas was also detected in the dorsal region of Brodmann area 44 (anterior to the premotor cortex). These results further implicate the frontal lobe in directing attention in visual space.

Thalamic nuclei damage (pulvinar nucleus)

The thalamic nuclei have been speculated to be involved in directing attention to locations in visual space.[31] Specifically, the pulvinar nucleus appears to be implicated in the subcortical control of spatial attention, and lesions in this area can cause neglect.[10] Evidence[31] suggests that the pulvinar nucleus of the thalamus might be responsible for engaging spatial attention at a previously cued location. A study by Rafal and Posner[31] found that, during a spatial cueing task, patients who had acute pulvinar lesions were slower to detect a target that appeared in the contralesional visuospatial field than a target that appeared in the ipsilesional field. This suggests a deficit in the ability to use attention to improve performance in detection and processing of visual targets in the contralesional region.[31]

Use in camouflage

Camouflage relies on deceiving the cognition of the observer, such as a predator. Some camouflage mechanisms such as distractive markings likely function by competing for visual attention with stimuli that would give away the presence of the camouflaged object (such as a prey animal). Such markings have to be conspicuous, and positioned away from the outline so as to avoid drawing attention to it, in contrast to disruptive markings which work best when in contact with the outline.[32]

References


  1. ^ "NIPS 2017". Interpretable ML Symposium. 2017-10-20. Retrieved 2018-09-12.
  2. ^ Zang, Jinliang; Wang, Le; Liu, Ziyi; Zhang, Qilin; Hua, Gang; Zheng, Nanning (2018). "Attention-Based Temporal Weighted Convolutional Neural Network for Action Recognition". IFIP Advances in Information and Communication Technology. Cham: Springer International Publishing. pp. 97–108. arXiv:1803.07179. doi:10.1007/978-3-319-92007-8_9. ISBN 978-3-319-92006-1. ISSN 1868-4238. S2CID 4058889.
  3. ^ Wang, Le; Zang, Jinliang; Zhang, Qilin; Niu, Zhenxing; Hua, Gang; Zheng, Nanning (2018-06-21). "Action Recognition by an Attention-Aware Temporal Weighted Convolutional Neural Network" (PDF). Sensors. 18 (7): 1979. Bibcode:2018Senso..18.1979W. doi:10.3390/s18071979. ISSN 1424-8220. PMC 6069475. PMID 29933555.
  4. ^ a b c d Posner, M. I. (1980). "Orienting of attention" (PDF). Quarterly Journal of Experimental Psychology. 32 (1): 3–25. doi:10.1080/00335558008248231. PMID 7367577. S2CID 2842391.
  5. ^ Li, Z. (2002). "A saliency map in primary visual cortex". Trends in Cognitive Sciences. 6: 9–16; and Zhaoping, L. (2014). "The V1 hypothesis—creating a bottom-up saliency map for preattentive selection and segmentation". In Understanding Vision: Theory, Models, and Data.
  6. ^ Tootell, R. B., Hadjikhani, N., Hall, E. K., Marrett, S., Vanduffel, W., Vaughan, J. T., & Dale, A. M. (1998). "The retinotopy of visual spatial attention" (PDF). Neuron. 21 (6): 1409–1422. doi:10.1016/S0896-6273(00)80659-5. PMID 9883733. S2CID 6336492.
  7. ^ a b c Jonides, J. (1981). Voluntary versus automatic control over the mind's eye's movement (PDF). Hillsdale (NJ): Erlbaum. pp. 187–203.
  8. ^ Carpenter, R. H. S. (1988). Movements of the eyes (2nd rev. & enlarged ed.). London, England: Pion Limited.
  9. ^ a b Downing, C. J. (1988). "Expectancy and visual-spatial attention: Effects on perceptual quality". Journal of Experimental Psychology: Human Perception and Performance. 14 (2): 188–202. doi:10.1037/0096-1523.14.2.188. PMID 2967876.
  10. ^ a b c d e f g h i j k l m n o Vecera, S. P., & Rizzo, M. (2003). "Spatial attention: normal processes and their breakdown" (PDF). Neurologic Clinics of North America. 21 (3): 575–607. doi:10.1016/S0733-8619(02)00103-2. PMID 13677814.
  11. ^ Prinzmetal, M., Presti, D. E., Posner, M. I. (1986). "Does attention affect visual feature integration?". Journal of Experimental Psychology: Human Perception and Performance. 12 (3): 361–369. doi:10.1037/0096-1523.12.3.361. PMID 2943864.
  12. ^ Kim, M. S., Cave, K. R. (1995). "Spatial attention in visual search for features and feature conjunctions" (PDF). Psychological Science. 6 (6): 376–380. doi:10.1111/j.1467-9280.1995.tb00529.x. S2CID 35789409.
  13. ^ Hoffman, J. E., & Nelson, B. (1981). "Spatial selectivity in visual search". Perception & Psychophysics. 30 (3): 283–290. doi:10.3758/BF03214284. PMID 7322804.
  14. ^ a b Hoffman, James E.; Nelson, Billie; Houck, Michael R. (1983). "The role of attentional resources in automatic detection". Cognitive Psychology. 15 (3): 379–410. doi:10.1016/0010-0285(83)90013-0. PMID 6627907. S2CID 45492061.
  15. ^ a b Posner, Michael I.; Snyder, Charles R.; Davidson, Brian J. (1980). "Attention and the detection of signals" (PDF). Journal of Experimental Psychology: General. 109 (2): 160–174. doi:10.1037/0096-3445.109.2.160. PMID 7381367.
  16. ^ Cave, Kyle R.; Bichot, Narcisse P. (1999). "Visuospatial attention: Beyond a spotlight model" (PDF). Psychonomic Bulletin & Review. 6 (2): 204–223. doi:10.3758/BF03212327. PMID 12199208. S2CID 1089770.
  17. ^ Eriksen, Charles W.; St. James, James D. (October 1986). "Visual attention within and around the field of focal attention: A zoom lens model". Perception & Psychophysics. 40 (4): 225–240. doi:10.3758/BF03211502. PMID 3786090.
  18. ^ Barriopedro, Maria I.; Botella, Juan (1998). "New evidence for the zoom lens model using the RSVP technique". Perception & Psychophysics. 60 (8): 1406–1414. doi:10.3758/BF03208001. PMID 9865080.
  19. ^ a b Castiello, Umberto; Umiltà, Carlo (1992). "Splitting focal attention". Journal of Experimental Psychology: Human Perception and Performance. 18 (3): 837–848. doi:10.1037/0096-1523.18.3.837. PMID 1500879.
  20. ^ LaBerge, David; Brown, Vincent (1989). "Theory of attentional operations in shape identification" (PDF). Psychological Review. 96 (1): 101–124. doi:10.1037/0033-295X.96.1.101.
  21. ^ Hughes, Howard C.; Zimba, Lynn D. (1985). "Spatial maps of directed visual attention". Journal of Experimental Psychology: Human Perception and Performance. 11 (4): 409–430. doi:10.1037/0096-1523.11.4.409. PMID 3161984.
  22. ^ Hughes, H.C.; Zimba, L.D. (1987). "Natural boundaries for the spatial spread of directed visual attention". Neuropsychologia. 25 (1): 5–18. doi:10.1016/0028-3932(87)90039-X. PMID 3574650. S2CID 31298971.
  23. ^ Pan, K.; Eriksen, C. W. (1993). "Attentional distribution in the visual field during same-different judgments as assessed by response competition". Perception & Psychophysics. 53 (2): 134–144. doi:10.3758/bf03211723. PMID 8433911. S2CID 13506368.
  24. ^ a b Awh, Edward; Pashler, Harold (2000). "Evidence for split attentional foci". Journal of Experimental Psychology: Human Perception and Performance. 26 (2): 834–846. doi:10.1037/0096-1523.26.2.834. PMID 10811179.
  25. ^ Jefferies, Lisa N.; Enns, James T.; Di Lollo, Vincent (2014). "The flexible focus: Whether spatial attention is unitary or divided depends on observer goals". Journal of Experimental Psychology: Human Perception and Performance. 40 (2): 465–470. doi:10.1037/a0034734. hdl:10072/173492. PMID 24188402.
  26. ^ Jans, Bert; Peters, Judith C.; De Weerd, Peter (2010). "Visual spatial attention to multiple locations at once: The jury is still out". Psychological Review. 117 (2): 637–682. doi:10.1037/a0019082. PMID 20438241.
  27. ^ a b c d e f g Vallar, G (1998). "Spatial hemineglect in humans". Trends in Cognitive Sciences. 2 (3): 87–97. doi:10.1016/S1364-6613(98)01145-0. PMID 21227084. S2CID 15366153.
  28. ^ a b Anderson, J (2010). Cognitive Psychology and Its Implications. New York: Worth Publishers. p. 7.
  29. ^ a b c Husain, M; Kennard, C (1996). "Visual neglect associated with frontal lobe infarction". Journal of Neurology. 243 (9): 652–657. doi:10.1007/BF00878662. PMID 8892067. S2CID 11280313.
  30. ^ a b Guitton, D; Buchtel, H; Douglas, R (1985). "Frontal lobe lesions in man cause difficulties in suppressing reflexive glances and in generating goal-directed saccades". Experimental Brain Research. 58 (3): 455–472. doi:10.1007/BF00235863. hdl:2027.42/46554. PMID 4007089. S2CID 10551663.
  31. ^ a b c d Rafal, R; Posner, M (1987). "Deficits in human visual spatial attention following thalamic lesions". Proceedings of the National Academy of Sciences of the United States of America. 84 (20): 7349–7353. Bibcode:1987PNAS...84.7349R. doi:10.1073/pnas.84.20.7349. PMC 299290. PMID 3478697.
  32. ^ Dimitrova, M.; Stobbe, N.; Schaefer, H. M.; Merilaita, S. (2009). "Concealed by conspicuousness: distractive prey markings and backgrounds". Proceedings of the Royal Society B: Biological Sciences. 276 (1663): 1905–1910. doi:10.1098/rspb.2009.0052. PMC 2674505. PMID 19324754.