I have been looking into this sort of gesture-recognition for several days now. I can share the references that I think are the most useful, then I can sketch out the (incomplete) solution that I'm working on.
I found a wide-ranging survey (dated 2014) of existing gesture-recognition techniques on the ACM website. (I don't know if it's free in general; I'd have to check when I'm at home, since it may only be available to me through my company's ACM subscription.)
The $1 recognizer seems to strike a pretty good balance between accuracy and performance, and that page links to several improved versions of the algorithm, such as Protractor, $N, and $P. I haven't tried any of them yet, but my problem with all of them is that they represent gestures as point series. That means an input gesture (i.e. the series of touch points produced by the user) has to be resampled to match the sampling rate of the stored gesture, and transformed so that recognition still works in the presence of size/orientation differences. This style of recognizer also doesn't really support extracting features from the gesture, e.g. the length of certain strokes relative to other strokes, which may be handy when assigning semantic meaning to the gesture.
This work cites the $1 recognizer in describing its improvements, but it also stores gestures as a series of points.
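For reference, the resampling step that all of the $-family recognizers rely on is not complicated, but it does have to run on every input gesture. Here is a minimal sketch of that step (my own code, not taken from any of those papers; the 64-point count is the one the $1 paper happens to use, and the (x, y) tuple format is just an assumption):

```python
import math

N_POINTS = 64  # assumed template sampling rate; the $1 paper uses 64

def path_length(points):
    """Total arc length of a list of (x, y) touch points."""
    return sum(math.dist(points[i - 1], points[i]) for i in range(1, len(points)))

def resample(points, n=N_POINTS):
    """Resample a touch-point series to n points spaced evenly along the path."""
    interval = path_length(points) / (n - 1)
    if interval <= 0:                      # degenerate gesture: all points identical
        return [points[0]] * n
    resampled = [points[0]]
    accumulated = 0.0
    pts = list(points)
    i = 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if accumulated + d >= interval:
            # Interpolate a new point exactly `interval` along the path.
            t = (interval - accumulated) / d
            qx = pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0])
            qy = pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1])
            resampled.append((qx, qy))
            pts.insert(i, (qx, qy))        # treat the new point as the next segment start
            accumulated = 0.0
        else:
            accumulated += d
        i += 1
    while len(resampled) < n:              # guard against floating-point shortfall
        resampled.append(pts[-1])
    return resampled
```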
It seems to me that a better touch-gesture recognizer would describe gestures as a series of line/semicircle strokes, with some error-tolerance limits, and with references to the size/orientation of previous strokes in the multi-stroke gesture. Since the input points are X/Y coordinates sampled over time (neither coordinate is a function of the other, and a stroke can be nearly vertical), I think orthogonal regression would be preferable to ordinary linear regression, which only minimizes the vertical error in Y.
Both the orthogonal-regression line equation and the error required for the input points to match that line can be computed incrementally. Circular regression, i.e. determining the equation of the circle being traced out by the input points, is described here.
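To make the incremental idea concrete, here is the sort of thing I have in mind for the line case (my own sketch, not the formulas from the page linked above): keep running sums of x, y, x², y², and xy, and derive both the orthogonal-regression line and the sum of squared orthogonal residuals from the central second moments.

```python
import math

class IncrementalOrthogonalFit:
    """Orthogonal (total-least-squares) line fit that can be updated point by point."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = 0.0
        self.sxx = self.syy = self.sxy = 0.0

    def add(self, x, y):
        self.n += 1
        self.sx += x;  self.sy += y
        self.sxx += x * x;  self.syy += y * y;  self.sxy += x * y

    def _central_moments(self):
        mx, my = self.sx / self.n, self.sy / self.n
        cxx = self.sxx - self.n * mx * mx
        cyy = self.syy - self.n * my * my
        cxy = self.sxy - self.n * mx * my
        return mx, my, cxx, cyy, cxy

    def line(self):
        """Return (point_on_line, angle) of the best-fit line through the centroid."""
        mx, my, cxx, cyy, cxy = self._central_moments()
        angle = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
        return (mx, my), angle

    def error(self):
        """Sum of squared orthogonal distances to the best-fit line."""
        if self.n < 2:
            return 0.0
        _, _, cxx, cyy, cxy = self._central_moments()
        # Smallest eigenvalue of the 2x2 scatter matrix about the centroid.
        return 0.5 * ((cxx + cyy) - math.hypot(cxx - cyy, 2.0 * cxy))
```

The value returned by error() is the smaller eigenvalue of the scatter matrix, which is exactly the minimized sum of squared perpendicular distances, so it can feed directly into the error-tolerance test.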
I'm not yet sure when to declare that one linear/circular stroke of a gesture has ended and another component has begun. The gesture-template library will be able to generate a list of linear/semicircular strokes that could follow the currently-detected set of strokes, so it seems possible to use that to limit what sorts of components are searched for. In general, though, it seems that I should use the error tolerance to detect when the currently-detected stroke no longer matches the gesture template, and then use that discovery to search inside that stroke for a change to another type of stroke.
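So the rough loop I'm imagining, assuming the incremental fit sketched above and a per-point error tolerance (both of which are my own assumptions rather than anything from the papers), would look something like this. Note that it greedily cuts the stroke at the point where the tolerance is first exceeded, which is usually a little late; that is the problem the next paragraph is about.

```python
def split_into_strokes(points, tolerance=4.0):
    """Greedy segmentation: extend the current fit until its mean squared
    orthogonal error exceeds `tolerance` (a hypothetical value, in px^2),
    then start a new stroke at that point."""
    strokes, start = [], 0
    current = IncrementalOrthogonalFit()
    for i, (x, y) in enumerate(points):
        current.add(x, y)
        if current.n > 2 and current.error() / current.n > tolerance:
            strokes.append(points[start:i])    # close the stroke before this point
            current = IncrementalOrthogonalFit()
            current.add(x, y)
            start = i
    strokes.append(points[start:])
    return strokes
```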
For instance, if the user has drawn two line segments, the algorithm needs to search inside the currently-detected segment to find the most likely point representing the boundary between the two. Using the incremental regression described earlier, it seems like it should be possible to represent the linear/semicircular regression as a binary tree, so that changes between strokes can be found by descending toward regions with smaller fitting errors, but I've had trouble extending my ideas in this direction.
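I haven't worked out the binary-tree version, but a brute-force form of that boundary search is straightforward with the incremental fit above: fit prefixes from the left, fit suffixes from the right, and pick the split index whose combined error is smallest (again just my own sketch, assuming both halves are line segments).

```python
def best_split(points):
    """Find the index splitting `points` into two line segments with the
    smallest combined orthogonal-fit error. Assumes at least four points."""
    n = len(points)
    prefix_err = [0.0] * (n + 1)
    suffix_err = [0.0] * (n + 1)
    fit = IncrementalOrthogonalFit()
    for i in range(n):                      # prefix_err[i+1] = error of points[:i+1]
        fit.add(*points[i])
        prefix_err[i + 1] = fit.error()
    fit = IncrementalOrthogonalFit()
    for i in range(n - 1, -1, -1):          # suffix_err[i] = error of points[i:]
        fit.add(*points[i])
        suffix_err[i] = fit.error()
    # Require at least two points on each side of the split.
    k = min(range(2, n - 1), key=lambda i: prefix_err[i] + suffix_err[i])
    return k, prefix_err[k] + suffix_err[k]
```

That runs in linear time for a single boundary, which is probably good enough; the binary-tree idea would mainly matter if the same points had to be re-split repeatedly against different templates.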