A visual routine is a means of extracting information from a visual scene.

Shimon Ullman, in his studies on human visual cognition, proposed that the human visual system's task of perceiving shape properties and spatial relations is split into two successive stages: an early "bottom-up" stage during which base representations are generated from the visual input, and a later "top-down" stage during which high-level primitives dubbed "visual routines" extract the desired information from the base representations.[1] In humans, the base representations generated during the bottom-up stage correspond to retinotopic maps (more than 15 of which exist in the cortex) for properties like color, edge orientation, speed of motion, and direction of motion. These base representations rely on fixed operations performed uniformly over the entire field of visual input, and do not make use of object-specific knowledge, task-specific knowledge, or other higher-level information.[2]
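As a rough illustration of what a fixed operation applied uniformly over the whole input might look like in software, the following Python sketch builds a toy edge map from a grid of intensity values. The function name `edge_map`, the `threshold` parameter, and the neighbor-difference rule are assumptions chosen for this example; they are not taken from Ullman's work and do not model the cortical maps themselves.

```python
def edge_map(image, threshold=1):
    """Toy base representation: mark cells whose right or lower neighbor
    differs in intensity by at least `threshold`.

    The same rule is applied at every location, with no knowledge of
    objects or of the task (an illustrative assumption, not Ullman's model).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    edges = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            here = image[r][c]
            # At the border, compare the cell with itself (difference 0).
            right = image[r][c + 1] if c + 1 < width else here
            down = image[r + 1][c] if r + 1 < height else here
            if abs(right - here) >= threshold or abs(down - here) >= threshold:
                edges[r][c] = 1
    return edges


image = [
    [0, 0, 5, 5],
    [0, 0, 5, 5],
]
print(edge_map(image))  # [[0, 1, 0, 0], [0, 1, 0, 0]]
```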

The visual routines proposed by Ullman are high-level primitives which parse the structure of a scene, extracting spatial information from the base representations. Each visual routine is composed of a sequence of elementary visual operators specific to the task at hand. Visual routines differ from the fixed operations of the base representations in that they are not applied uniformly over the entire visual field; rather, they are only applied to objects or areas specified by the routines.[1]

Ullman lists the following as examples of visual operators: shifting the processing focus, indexing a salient item for further processing, spreading activation over an area delimited by boundaries, tracing boundaries, and marking a location or object for future reference. When combined into visual routines, these elementary operators can be used to perform relatively sophisticated spatial tasks such as counting the number of objects satisfying a certain property, or recognizing a complex shape.[1]
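The way elementary operators combine into a routine can be sketched in Python. The example below counts the objects in a small binary grid, where 1 means the property of interest is present and 0 acts as a boundary. The grid encoding, the 4-connected notion of an object, and the names `index_salient`, `spread_activation`, and `count_objects` are all assumptions made for illustration; this is a minimal sketch, not an implementation drawn from the cited sources.

```python
from collections import deque


def index_salient(base, marked):
    """Indexing operator: return the first unmarked location where the
    property is present, or None if every such location is marked."""
    for r, row in enumerate(base):
        for c, value in enumerate(row):
            if value and (r, c) not in marked:
                return (r, c)
    return None


def spread_activation(base, start):
    """Spreading operator: activate the 4-connected region containing
    `start`, stopping at boundaries (cells with value 0) and grid edges."""
    region = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < len(base) and 0 <= nc < len(base[0])
            if inside and base[nr][nc] and (nr, nc) not in region:
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


def count_objects(base):
    """Routine: repeatedly shift focus to an unmarked item, spread
    activation over it, mark the activated area, and increment a counter."""
    marked = set()
    count = 0
    while (focus := index_salient(base, marked)) is not None:
        marked |= spread_activation(base, focus)  # marking operator
        count += 1
    return count


scene = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
]
print(count_objects(scene))  # 3
```

Unlike a base representation, the spreading step here touches only the region selected by the current focus, which is the distinction the text draws between routines and uniform operations.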

A number of researchers have implemented visual routines for processing camera images, to perform tasks such as determining which object a person in the image is pointing at.[3][4][5] Researchers have also applied the visual routines approach to artificial map representations, for playing real-time 2D video games. In those cases, however, the map of the video game was provided directly, removing the need to deal with real-world perceptual tasks like object recognition and occlusion compensation.

References

  1. "Ullman's Visual Routines, and Tekkotsu Sketches" (PDF).
  2. Huang, J.; Wechsler, H. (April 2000). "Visual routines for eye location using learning and evolution". IEEE Transactions on Evolutionary Computation. 4 (1): 73–82. doi:10.1109/4235.843496. ISSN 1089-778X.
  3. Johnson, M. P. (August 1996). "Automated creation of visual routines using genetic programming". Proceedings of 13th International Conference on Pattern Recognition. Vol. 1. pp. 951–956. doi:10.1109/ICPR.1996.546164. ISBN 978-0-8186-7282-8. S2CID 1701864.
  4. Aste, Marco; Rossi, Massimo; Cattoni, Roldano; Caprile, Bruno (1998-06-01). "Visual routines for real-time monitoring of vehicle behavior". Machine Vision and Applications. 11 (1): 16–23. CiteSeerX 10.1.1.48.5736. doi:10.1007/s001380050086. ISSN 0932-8092. S2CID 25480778.
  5. Rao, Satyajit. "Visual Routines and Attention" (PDF). MIT Computer Science and Artificial Intelligence Laboratory.