A child being sensed by a simple gesture recognition algorithm detecting hand location and movement
Gesture recognition is usually processed in middleware; the results are then transmitted to the user applications.

Gesture recognition is a topic in computer science and language technology with the goal of interpreting human gestures via mathematical algorithms. Gestures can originate from any bodily motion or state but commonly originate from the face or hand. Current focuses in the field include emotion recognition from the face and hand gesture recognition. Users can use simple gestures to control or interact with devices without physically touching them. Many approaches use cameras and computer vision algorithms to interpret sign language. However, the identification and recognition of posture, gait, proxemics, and human behaviors are also the subject of gesture recognition techniques.[1] Gesture recognition can be seen as a way for computers to begin to understand human body language, thus building a richer bridge between machines and humans than primitive text user interfaces or even GUIs (graphical user interfaces), which still limit the majority of input to keyboard and mouse.

Gesture recognition enables humans to communicate with machines (human-machine interaction, HMI) and interact naturally without any mechanical devices. Using gesture recognition, it is possible to point a finger at the computer screen so that the cursor moves accordingly. This could make conventional input devices such as mice, keyboards and even touch-screens redundant.

Features claimed for gesture recognition include:

  • Higher accuracy
  • High stability
  • Less time needed to unlock a device

The major application areas of gesture recognition currently are:

  • Automotive sector
  • Consumer electronics sector
  • Transit sector
  • Gaming sector
  • Smartphone unlocking

Gesture recognition technology has been considered highly successful because it saves time when unlocking a device.

Gesture recognition can be conducted with techniques from computer vision and image processing.

The literature includes ongoing work in the computer vision field on capturing gestures or more general human pose and movements by cameras connected to a computer.[2][3][4][5]

Gesture recognition and pen computing: Pen computing reduces the hardware impact of a system and also increases the range of physical-world objects usable for control beyond traditional digital objects like keyboards and mice. Such implementations could enable a new range of hardware that does not require monitors. This idea may lead to the creation of holographic displays. The term gesture recognition has also been used to refer more narrowly to non-text-input handwriting symbols, such as inking on a graphics tablet, multi-touch gestures, and mouse gesture recognition. This is computer interaction through the drawing of symbols with a pointing device cursor.[6][7][8] (See Pen computing.)


Gesture types

In computer interfaces, two types of gestures are distinguished:[9] online gestures, which can be regarded as direct manipulations such as scaling and rotating, and offline gestures, which are usually processed after the interaction is finished; e.g. a circle is drawn to activate a context menu.

  • Offline gestures: Gestures that are processed after the user's interaction with the object has finished. An example is the gesture to activate a menu.
  • Online gestures: Direct manipulation gestures, processed while the interaction is ongoing. They are used, for example, to scale or rotate a tangible object.
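
The distinction can be illustrated with a minimal Python sketch. It is not taken from any particular framework: the function names `online_scale` and `is_circle`, the 2D point format and the tolerance value are all assumptions made for illustration. The online gesture is re-evaluated on every pointer update, while the offline gesture is classified only once the stroke is complete.

```python
import math

def online_scale(start_a, start_b, cur_a, cur_b):
    """Online gesture: scale factor recomputed on every pointer update
    from the distance between two touch points (hypothetical helper)."""
    d0 = math.dist(start_a, start_b)
    d1 = math.dist(cur_a, cur_b)
    return d1 / d0 if d0 > 0 else 1.0

def is_circle(stroke, tolerance=0.2):
    """Offline gesture: classify a finished stroke as a circle when it is
    closed and every point lies near the mean radius around the centroid.
    The tolerance is an arbitrary illustrative value."""
    if len(stroke) < 8:
        return False
    cx = sum(x for x, _ in stroke) / len(stroke)
    cy = sum(y for _, y in stroke) / len(stroke)
    radii = [math.dist((x, y), (cx, cy)) for x, y in stroke]
    mean_r = sum(radii) / len(radii)
    if mean_r == 0:
        return False
    closed = math.dist(stroke[0], stroke[-1]) <= 2 * tolerance * mean_r
    round_enough = all(abs(r - mean_r) <= tolerance * mean_r for r in radii)
    return closed and round_enough
```

In this sketch an application would call `online_scale` on each touch-move event to resize an object continuously, and call `is_circle` once on pointer release, for example to decide whether to open a context menu.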

Touchless interface

Touchless user interfaces are an emerging type of technology related to gesture control. A touchless user interface (TUI) is the process of commanding the computer via body motion and gestures without touching a keyboard, mouse, or screen.[10] For example, Microsoft's Kinect is a touchless game interface; however, products such as the Wii are not considered entirely touchless because they are tethered to controllers. Touchless interfaces, together with gesture controls, are becoming widely popular because they let users interact with devices without physically touching them.

Types of touchless technology

A number of devices use this type of interface, such as smartphones, laptops, games, and televisions. Although touchless technology is mostly seen in gaming software, interest is now spreading to other fields, including the automotive and healthcare industries. Touchless technology and gesture control are expected to be implemented in cars at levels beyond voice recognition. See BMW Series 7.

Future of touchless technology

A vast number of companies all over the world already produce gesture recognition technology, such as:[11]

Intel Corp.

Intel's user experience research, published as a white paper, shows how touchless multifactor authentication (MFA) can help healthcare organizations mitigate security risks while improving clinician efficiency, convenience, and patient care. This touchless MFA solution combines facial recognition and device recognition capabilities for two-factor user authentication.[12]

Microsoft Corp. in the U.S.

Microsoft Research's Touchless Interaction in Medical Imaging project aims to explore the use of touchless interaction within surgical settings, allowing images to be viewed, controlled and manipulated without contact through the use of camera-based gesture recognition technology. In particular, the project seeks to understand the challenges of these environments for the design and deployment of such systems, as well as to articulate the ways in which these technologies may alter surgical practice. While the primary concern is maintaining conditions of asepsis, these touchless gesture-based technologies offer other potential uses.[13]

Elliptic Labs

Elliptic Labs' software suite delivers gesture and proximity functions by re-using the existing earpiece and microphone, previously used only for audio. Ultrasound signals sent through the air from speakers integrated in smartphones and tablets bounce off a hand, object, or head and are recorded by microphones, also integrated in these devices. In this way, Elliptic Labs’ technology recognizes hand gestures and uses them to move objects on a screen, similarly to the way bats use echolocation to navigate.[14]

While these companies stand at the forefront of touchless technology, many other companies and products are currently trending as well and may also add value to this new field. Some examples:

Tobii Rex: eye-tracking device from Sweden

Airwriting: technology that allows messages and texts to be written in the air

eyeSight: allows for navigation of a screen without physically touching the device

Leap Motion: motion sensor device

Myoelectric Armband: allows for communication with Bluetooth devices[15]

Input devices

The ability to track a person's movements and determine what gestures they may be performing can be achieved through various tools. Kinetic user interfaces (KUIs)[16] are an emerging type of user interface that allows users to interact with computing devices through the motion of objects and bodies. Examples of KUIs include tangible user interfaces and motion-aware games such as Wii and Microsoft's Kinect, and other interactive projects.[17]

Although there is a large amount of research done in image/video based gesture recognition, there is some variation within the tools and environments used between implementations.

  • Wired gloves. These can provide input to the computer about the position and rotation of the hands using magnetic or inertial tracking devices. Furthermore, some gloves can detect finger bending with a high degree of accuracy (5-10 degrees), or even provide haptic feedback to the user, which is a simulation of the sense of touch. The first commercially available hand-tracking glove-type device was the DataGlove,[18] which could detect hand position, movement and finger bending. It uses fiber optic cables running down the back of the hand. Light pulses are created and, when the fingers are bent, light leaks through small cracks and the loss is registered, giving an approximation of the hand pose.
  • Depth-aware cameras. Using specialized cameras such as structured light or time-of-flight cameras, one can generate a depth map of what is being seen through the camera at a short range, and use this data to approximate a 3D representation of what is being seen. These can be effective for detection of hand gestures due to their short-range capabilities.[19]
  • Stereo cameras. Using two cameras whose relations to one another are known, a 3D representation can be approximated from the output of the cameras. To get the cameras' relations, one can use a positioning reference such as a lexian-stripe or infrared emitters.[20] In combination with direct motion measurement (6D-Vision), gestures can be detected directly.
  • Gesture-based controllers. These controllers act as an extension of the body so that when gestures are performed, some of their motion can be conveniently captured by software. An example of emerging gesture-based motion capture is skeletal hand tracking, which is being developed for virtual reality and augmented reality applications. The tracking companies uSens and Gestigon demonstrate this technology, allowing users to interact with their surroundings without controllers.[21][22]
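
As a rough illustration of how depth-aware and stereo cameras yield a 3D representation, the following Python sketch uses the standard pinhole camera model and the rectified-stereo relation depth = focal length × baseline / disparity. It assumes an idealized, calibrated and rectified setup; the function names and the example intrinsics are hypothetical and not tied to any specific camera.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in metres of a point seen by a rectified stereo pair.
    disparity_px: horizontal pixel shift between the two views.
    focal_px: focal length in pixels; baseline_m: camera separation."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Turn a pixel (u, v) with known depth into a 3D point in the
    camera frame using pinhole intrinsics (fx, fy, cx, cy)."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)
```

Applying `backproject` to every pixel of a depth map gives a point cloud of the scene, from which the hand region can then be segmented, for example by keeping only points closer than some distance threshold.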

Another example of this is mouse gesture tracking, where the motion of the mouse is correlated to a symbol being drawn by a person's hand. Similarly, the Wii Remote, the Myo armband and the mForce Wizard wristband can study changes in acceleration over time to represent gestures.[23][24][25] Devices such as the LG Electronics Magic Wand, the Loop and the Scoop use Hillcrest Labs' Freespace technology, which uses MEMS accelerometers, gyroscopes and other sensors to translate gestures into cursor movement. The software also compensates for human tremor and inadvertent movement.[26][27][28] AudioCubes are another example. The sensors of these smart light-emitting cubes can be used to sense hands and fingers as well as other objects nearby, and can be used to process data. Most applications are in music and sound synthesis,[29] but they can be applied to other fields.
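
One simple way to suppress tremor in a stream of motion samples is a low-pass filter. The Python sketch below uses exponential smoothing; this is only an illustrative stand-in and is not a description of how Freespace or any other product mentioned above actually works. The function name `smooth` and the `alpha` value are assumptions.

```python
def smooth(samples, alpha=0.2):
    """Exponential moving average over a sequence of sensor readings.
    Smaller alpha means stronger smoothing (and more lag)."""
    out = []
    prev = None
    for s in samples:
        # The first sample initializes the filter state.
        prev = s if prev is None else alpha * s + (1 - alpha) * prev
        out.append(prev)
    return out
```

With this filter, a rapid back-and-forth jitter in the input is strongly attenuated, while a slow deliberate movement passes through with a short delay. The trade-off between responsiveness and stability is controlled by `alpha`.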


Different ways of tracking and analyzing gestures exist, and some basic layout is given in the diagram above. For example, volumetric models convey the necessary information required for an elaborate analysis; however, they prove to be very intensive in terms of computational power and require further technological developments in order to be implemented for real-time analysis. On the other hand, appearance-based models are easier to process but usually lack the generality required for human-computer interaction.

Depending on the type of the input data, the approach for interpreting a gesture could be done in different ways. However, most of the techniques rely on key pointers represented in a 3D coordinate system. Based on the relative motion of these, the gesture can be detected with a high accuracy, depending on the quality of the input and the algorithm’s approach.
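
A minimal Python sketch of this idea follows: a tracked key point (for example a fingertip) is given as a sequence of 3D positions, and the gesture is inferred from its net motion. The coordinate convention (x to the right, y up, z away from the sensor), the distance threshold and the gesture names are assumptions chosen for illustration.

```python
def classify_swipe(track, min_distance=0.15):
    """Classify the net motion of one key point.
    track: list of (x, y, z) positions in metres, oldest first."""
    (x0, y0, z0), (x1, y1, z1) = track[0], track[-1]
    deltas = {"x": x1 - x0, "y": y1 - y0, "z": z1 - z0}
    # Pick the axis along which the point moved the most.
    axis = max(deltas, key=lambda k: abs(deltas[k]))
    value = deltas[axis]
    if abs(value) < min_distance:
        return "none"
    names = {"x": ("left", "right"), "y": ("down", "up"), "z": ("pull", "push")}
    return names[axis][1 if value > 0 else 0]
```

A practical recognizer would also consider speed, the path between the endpoints and the positions of several key points relative to one another, but the principle of deriving the gesture from relative motion is the same.
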
In order to interpret movements of the body, one has to classify them according to common properties and the message the movements may express. For example, in sign language each gesture represents a word or phrase. A taxonomy that seems well suited to human-computer interaction was proposed by Quek in "Toward a Vision-Based Hand Gesture Interface".[30] He presents several interactive gesture systems in order to capture the whole space of the gestures:

  1. Manipulative
  2. Semaphoric
  3. Conversational

Some literature differentiates two approaches in gesture recognition: 3D model-based and appearance-based.[31] The former makes use of 3D information about key elements of the body parts in order to obtain several important parameters, like palm position or joint angles. Appearance-based systems, on the other hand, use images or videos for direct interpretation.

A real hand (left) is interpreted as a collection of vertices and lines in the 3D mesh version (right), and the software uses their relative position and interaction in order to infer the gesture.

3D model-based algorithms

The 3D model approach can use volumetric or skeletal models, or even a combination of the two. Volumetric approaches have been heavily used in the computer animation industry and for computer vision purposes. The models are generally created from complicated 3D surfaces, like NURBS or polygon meshes.

The drawback of this method is that it is very computationally intensive, and systems for real-time analysis are still to be developed. For the moment, a more interesting approach would be to map simple primitive objects to the person’s most important body parts (for example, cylinders for the arms and neck, a sphere for the head) and analyse the way these interact with each other. Furthermore, some abstract structures like super-quadrics and generalised cylinders may be even more suitable for approximating the body parts. The appeal of this approach is that the parameters for these objects are quite simple. In order to better model the relation between these, constraints and hierarchies between the objects are used.

The skeletal version (right) effectively models the hand (left). It has fewer parameters than the volumetric version and is easier to compute, making it suitable for real-time gesture analysis systems.

Skeletal-based algorithms

Instead of using intensive processing of the 3D models and dealing with a lot of parameters, one can just use a simplified version of joint angle parameters along with segment lengths. This is known as a skeletal representation of the body, where a virtual skeleton of the person is computed and parts of the body are mapped to certain segments. The analysis here is done using the position and orientation of these segments and the relation between each one of them (for example, the angle between the joints and the relative position or orientation).
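
As a small worked example of such a joint parameter, the angle at a joint can be computed from the 3D positions of three skeleton key points using the dot product. The Python sketch below is illustrative only; the function name and the point format are assumptions.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at joint b between segments b->a and b->c.
    a, b, c: 3D points, e.g. fingertip, knuckle and wrist positions."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("segments must have non-zero length")
    cos_angle = sum(p * q for p, q in zip(v1, v2)) / (n1 * n2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
```

A fully extended finger gives an angle near 180 degrees at each joint, while a bent finger gives a smaller angle, so a handful of such angles can already separate poses such as an open hand and a fist.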

Advantages of using skeletal models:

  • Algorithms are faster because only key parameters are analyzed.
  • Pattern matching against a template database is possible.
  • Using key points allows the detection program to focus on the significant parts of the body.

These binary silhouette (left) or contour (right) images represent typical input for appearance-based algorithms. They are compared with different hand templates and, if they match, the corresponding gesture is inferred.

Appearance-based models

These models no longer use a spatial representation of the body, because they derive the parameters directly from the images or videos using a template database. Some are based on deformable 2D templates of human body parts, particularly hands. Deformable templates are sets of points on the outline of an object, used as interpolation nodes for approximating the object’s outline. One of the simplest interpolation functions is linear, which produces an average shape from point sets, point variability parameters and external deformators. These template-based models are mostly used for hand tracking, but could also be of use for simple gesture classification.
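
The following Python sketch illustrates two ingredients of such a template under simplifying assumptions: an average shape computed from several aligned point sets, and a closed outline approximated by linear interpolation between the template nodes. Point variability parameters and external deformations are left out, and the function names are hypothetical.

```python
def mean_shape(shapes):
    """Average several outlines point by point.
    shapes: list of outlines, each a list of (x, y) nodes of equal
    length, assumed to be already aligned with one another."""
    n = len(shapes[0])
    if any(len(s) != n for s in shapes):
        raise ValueError("all shapes need the same number of nodes")
    count = len(shapes)
    return [
        (sum(s[i][0] for s in shapes) / count,
         sum(s[i][1] for s in shapes) / count)
        for i in range(n)
    ]

def interpolate_outline(nodes, samples_per_edge=4):
    """Approximate a closed outline by linear interpolation between
    consecutive nodes (the last node connects back to the first)."""
    points = []
    for i, (x0, y0) in enumerate(nodes):
        x1, y1 = nodes[(i + 1) % len(nodes)]
        for k in range(samples_per_edge):
            t = k / samples_per_edge
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return points
```

Fitting the template to an image would then mean moving the nodes so that the interpolated outline lines up with the edges detected in the image.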

A second approach to gesture detection using appearance-based models uses image sequences as gesture templates. Parameters for this method are either the images themselves or certain features derived from them. Most of the time, only one (monoscopic) or two (stereoscopic) views are used.
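
Template comparison for a single frame can be sketched in Python as follows. The example compares a binary silhouette against stored templates using intersection over union as the similarity score; the choice of score, the threshold and the names `iou` and `match_gesture` are assumptions for illustration, and real systems typically use more robust features.

```python
def iou(a, b):
    """Intersection over union of two equally sized binary images,
    given as lists of rows containing 0 or 1."""
    inter = 0
    union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += 1 if (pa and pb) else 0
            union += 1 if (pa or pb) else 0
    return inter / union if union else 0.0

def match_gesture(silhouette, templates, threshold=0.7):
    """Return the name of the best-matching template, or None when no
    template reaches the similarity threshold."""
    best_name, best_score = None, 0.0
    for name, template in templates.items():
        score = iou(silhouette, template)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None
```

For an image sequence, the same comparison could be applied frame by frame against a sequence template and the per-frame scores combined.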


There are many challenges associated with the accuracy and usefulness of gesture recognition software. For image-based gesture recognition there are limitations on the equipment used and image noise. Images or video may not be under consistent lighting, or in the same location. Items in the background or distinct features of the users may make recognition more difficult.

The variety of implementations for image-based gesture recognition may also limit the viability of the technology for general use. For example, an algorithm calibrated for one camera may not work for a different camera. The amount of background noise also causes tracking and recognition difficulties, especially when occlusions (partial and full) occur. Furthermore, the distance from the camera, and the camera's resolution and quality, also cause variations in recognition accuracy.

In order to capture human gestures with visual sensors, robust computer vision methods are also required, for example for hand tracking and hand posture recognition[32][33][34][35][36][37][38][39][40] or for capturing movements of the head, facial expressions or gaze direction.

"Gorilla arm"

"Gorilla arm" was a side-effect of vertically oriented touch-screen or light-pen use. In periods of prolonged use, users' arms began to feel fatigue and/or discomfort. This effect contributed to the decline of touch-screen input despite initial popularity in the 1980s.[41][42]

In order to measure arm fatigue and the gorilla arm side effect, researchers developed a technique called Consumed Endurance.[43][44]



  1. ^ Matthias Rehm, Nikolaus Bee, Elisabeth André, Wave Like an Egyptian – Accelerometer Based Gesture Recognition for Culture Specific Interactions, British Computer Society, 2007
  2. ^ Pavlovic, V., Sharma, R. & Huang, T. (1997), "Visual interpretation of hand gestures for human-computer interaction: A review", IEEE Trans. Pattern Analysis and Machine Intelligence., July, 1997. Vol. 19(7), pp. 677 -695.
  3. ^ R. Cipolla and A. Pentland, Computer Vision for Human-Machine Interaction, Cambridge University Press, 1998, ISBN 978-0-521-62253-0
  4. ^ Ying Wu and Thomas S. Huang, "Vision-Based Gesture Recognition: A Review", In: Gesture-Based Communication in Human-Computer Interaction, Volume 1739 of Springer Lecture Notes in Computer Science, pages 103-115, 1999, ISBN 978-3-540-66935-7, doi:10.1007/3-540-46616-9
  5. ^ Alejandro Jaimes and Nicu Sebe, Multimodal human–computer interaction: A survey, Computer Vision and Image Understanding Volume 108, Issues 1-2, October–November 2007, Pages 116-134 Special Issue on Vision for Human-Computer Interaction, doi:10.1016/j.cviu.2006.10.019
  6. ^ Dopertchouk, Oleg; "Recognition of Handwriting Gestures", January 9, 2004
  7. ^ Chen, Shijie; "Gesture Recognition Techniques in Handwriting Recognition Application", Frontiers in Handwriting Recognition p 142-147 November 2010
  8. ^ Balaji, R; Deepu, V; Madhvanath, Sriganesh; Prabhakaran, Jayasree "Handwritten Gesture Recognition for Gesture Keyboard", Hewlett-Packard Laboratories
  9. ^ Dietrich Kammer, Mandy Keck, Georg Freitag, Markus Wacker, Taxonomy and Overview of Multi-touch Frameworks: Architecture, Scope and Features
  10. ^ "touchless user interface Definition from PC Magazine Encyclopedia". Retrieved 2017-07-28. 
  11. ^ "Touchless Sensing Industry: Global Survey, Trends, Outlook, Overview and 2027 Forecast". Retrieved 2017-07-30. 
  12. ^ Intel, white paper on touchless multifactor authentication (PDF).
  13. ^ "Touchless Interaction in Medical Imaging - Microsoft Research". Microsoft Research. Retrieved 2017-07-30. 
  14. ^ "Technology | Elliptic Labs". Retrieved 2017-07-30. 
  15. ^ AK, Sofia. "9 Minority Report Inspired Touchless Technology". HKDC. Retrieved 2017-08-07. 
  16. ^ V. Pallotta; P. Bruegger; B. Hirsbrunner (February 2008). "Kinetic User Interfaces: Physical Embodied Interaction with Mobile Pervasive Computing Systems". Advances in Ubiquitous Computing: Future Paradigms and Directions. IGI Publishing. 
  17. ^ S. Benford; H. Schnadelbach; B. Koleva; B. Gaver; A. Schmidt; A. Boucher; A. Steed; R. Anastasi; C. Greenhalgh; T. Rodden; H. Gellersen. "Sensible, sensable and desirable: a framework for designing physical interfaces" (PDF). CiteSeerX . Archived from the original (PDF) on January 26, 2006. 
  18. ^ Thomas G. Zimmerman, Jaron Lanier, Chuck Blanchard, Steve Bryson and Young Harvill. "A HAND GESTURE INTERFACE DEVICE."
  19. ^ Yang Liu, Yunde Jia, A Robust Hand Tracking and Gesture Recognition Method for Wearable Visual Interfaces and Its Applications, Proceedings of the Third International Conference on Image and Graphics (ICIG’04), 2004
  20. ^ Kue-Bum Lee, Jung-Hyun Kim, Kwang-Seok Hong, An Implementation of Multi-Modal Game Interface Based on PDAs, Fifth International Conference on Software Engineering Research, Management and Applications, 2007
  21. ^ "Gestigon Gesture Tracking - TechCrunch Disrupt". TechCrunch. Retrieved 11 October 2016. 
  22. ^ Matney, Lucas. "uSens shows off new tracking sensors that aim to deliver richer experiences for mobile VR". TechCrunch. Retrieved 29 August 2016. 
  23. ^ Per Malmestig, Sofie Sundberg, SignWiiver – implementation of sign language technology
  24. ^ Thomas Schlomer, Benjamin Poppinga, Niels Henze, Susanne Boll, Gesture Recognition with a Wii Controller, Proceedings of the 2nd international Conference on Tangible and Embedded interaction, 2008
  25. ^ AiLive Inc., LiveMove White Paper, 2006
  26. ^ Electronic Design September 8, 2011. William Wong. Natural User Interface Employs Sensor Integration.
  27. ^ Cable & Satellite International September/October, 2011. Stephen Cousins. A view to a thrill.
  28. ^ TechJournal South January 7, 2008. Hillcrest Labs rings up $25M D round.
  29. ^ Percussa AudioCubes Blog October 4, 2012. Gestural Control in Sound Synthesis.
  30. ^ Quek, F., "Toward a vision-based hand gesture interface" Proceedings of the Virtual Reality System Technology Conference, pp. 17-29, August 23–26, 1994, Singapore
  31. ^ Vladimir I. Pavlovic, Rajeev Sharma, Thomas S. Huang, Visual Interpretation of Hand Gestures for Human-Computer Interaction; A Review, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997
  32. ^ Ivan Laptev and Tony Lindeberg "Tracking of Multi-state Hand Models Using Particle Filtering and a Hierarchy of Multi-scale Image Features", Proceedings Scale-Space and Morphology in Computer Vision, Volume 2106 of Springer Lecture Notes in Computer Science, pages 63-74, Vancouver, BC, 1999. ISBN 978-3-540-42317-1, doi:10.1007/3-540-47778-0
  33. ^ von Hardenberg, Christian; Bérard, François (2001). "Bare-hand human-computer interaction". Proceedings of the 2001 workshop on Perceptive user interfaces. ACM International Conference Proceeding Series. 15 archive. Orlando, Florida. pp. 1–8. CiteSeerX . 
  34. ^ Lars Bretzner, Ivan Laptev, Tony Lindeberg "Hand gesture recognition using multi-scale colour features, hierarchical models and particle filtering", Proceedings of the Fifth IEEE International Conference on Automatic Face and Gesture Recognition, Washington, DC, USA, 21–21 May 2002, pages 423-428. ISBN 0-7695-1602-5, doi:10.1109/AFGR.2002.1004190
  35. ^ Domitilla Del Vecchio, Richard M. Murray Pietro Perona, "Decomposition of human motion into dynamics-based primitives with application to drawing tasks", Automatica Volume 39, Issue 12, December 2003, Pages 2085–2098 , doi:10.1016/S0005-1098(03)00250-4.
  36. ^ Thomas B. Moeslund and Lau Nørgaard, "A Brief Overview of Hand Gestures used in Wearable Human Computer Interfaces", Technical report: CVMT 03-02, ISSN 1601-3646, Laboratory of Computer Vision and Media Technology, Aalborg University, Denmark.
  37. ^ M. Kolsch and M. Turk "Fast 2D Hand Tracking with Flocks of Features and Multi-Cue Integration", CVPRW '04. Proceedings Computer Vision and Pattern Recognition Workshop, May 27-June 2, 2004, doi:10.1109/CVPR.2004.71
  38. ^ Xia Liu Fujimura, K., "Hand gesture recognition using depth data", Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition, May 17–19, 2004 pages 529- 534, ISBN 0-7695-2122-3, doi:10.1109/AFGR.2004.1301587.
  39. ^ Stenger B, Thayananthan A, Torr PH, Cipolla R: "Model-based hand tracking using a hierarchical Bayesian filter", IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(9):1372-84, Sep 2006.
  40. ^ A Erol, G Bebis, M Nicolescu, RD Boyle, X Twombly, "Vision-based hand pose estimation: A review", Computer Vision and Image Understanding Volume 108, Issues 1-2, October–November 2007, Pages 52-73 Special Issue on Vision for Human-Computer Interaction, doi:10.1016/j.cviu.2006.10.012.
  41. ^ Rupert Goodwins. "Windows 7? No arm in it". ZDNet. 
  42. ^ "gorilla arm". 
  43. ^ Hincapié-Ramos, J.D., Guo, X., Moghadasian, P. and Irani. P. 2014. "Consumed Endurance: A Metric to Quantify Arm Fatigue of Mid-Air Interactions". In Proceedings of the 32nd annual ACM conference on Human factors in computing systems (CHI '14). ACM, New York, NY, USA, 1063–1072. DOI=10.1145/2556288.2557130
  44. ^ Hincapié-Ramos, J.D., Guo, X., and Irani, P. 2014. "The Consumed Endurance Workbench: A Tool to Assess Arm Fatigue During Mid-Air Interactions". In Proceedings of the 2014 companion publication on Designing interactive systems (DIS Companion '14). ACM, New York, NY, USA, 109-112. DOI=10.1145/2598784.2602795
