I have been looking into this sort of gesture-recognition for several days now. I can share the references that I think are the most useful, then I can sketch out the (incomplete) solution that I'm working on.
I found a wide-ranging survey (dated 2014) of existing gesture-recognition techniques on the ACM website. (I don't know if it's free in general; I'd have to check when I'm at home, since it may only be available to me through my company's ACM subscription.)
The $1 recognizer seems to strike a pretty good balance between accuracy and performance, and that page links to several improved versions of the algorithm, such as Protractor, $N, and $P. I haven't tried any of them yet, but my problem with all of them is that they represent gestures as point series. That means an input gesture (i.e. the series of touch points produced by the user) has to be resampled to match the sampling rate of the stored gesture, and transformed so that recognition still works in the presence of size/orientation differences. This style of recognizer also doesn't really support extracting features from the gesture, e.g. the length of certain strokes relative to other strokes, which may be handy when assigning semantic meaning to the gesture.
This work cites the $1 recognizer in describing its improvements, but it also stores gestures as a series of points.
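For reference, the resampling step that all of the $-family recognizers rely on is not complicated, but it does have to run on every input gesture. Here is a minimal sketch of that step (my own code, not taken from any of those papers; the 64-point count is the one the $1 paper happens to use, and the (x, y) tuple format is just an assumption):

```python
import math

N_POINTS = 64  # assumed template sampling rate; the $1 paper uses 64

def path_length(points):
    """Total arc length of a list of (x, y) touch points."""
    return sum(math.dist(points[i - 1], points[i]) for i in range(1, len(points)))

def resample(points, n=N_POINTS):
    """Resample a touch-point series to n points spaced evenly along the path."""
    interval = path_length(points) / (n - 1)
    if interval <= 0:                      # degenerate gesture: all points identical
        return [points[0]] * n
    resampled = [points[0]]
    accumulated = 0.0
    pts = list(points)
    i = 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if accumulated + d >= interval:
            # Interpolate a new point exactly `interval` along the path.
            t = (interval - accumulated) / d
            qx = pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0])
            qy = pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1])
            resampled.append((qx, qy))
            pts.insert(i, (qx, qy))        # treat the new point as the next segment start
            accumulated = 0.0
        else:
            accumulated += d
        i += 1
    while len(resampled) < n:              # guard against floating-point shortfall
        resampled.append(pts[-1])
    return resampled
```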
It seems to me that a better touch-gesture recognizer would describe gestures as a series of line/semicircle strokes, with some error-tolerance limits, and with references to the size/orientation of previous strokes in the multi-stroke gesture. Since the input points are X/Y coordinates sampled over time (neither coordinate is a function of the other, and a stroke can be nearly vertical), I think orthogonal regression would be preferable to ordinary linear regression, which only minimizes the vertical error in Y.
Both the orthogonal-regression line equation and the error required for the input points to match that line can be computed incrementally. Circular regression, i.e. determining the equation of the circle being traced out by the input points, is described here.
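To make the incremental idea concrete, here is the sort of thing I have in mind for the line case (my own sketch, not the formulas from the page linked above): keep running sums of x, y, x², y², and xy, and derive both the orthogonal-regression line and the sum of squared orthogonal residuals from the central second moments.

```python
import math

class IncrementalOrthogonalFit:
    """Orthogonal (total-least-squares) line fit that can be updated point by point."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = 0.0
        self.sxx = self.syy = self.sxy = 0.0

    def add(self, x, y):
        self.n += 1
        self.sx += x;  self.sy += y
        self.sxx += x * x;  self.syy += y * y;  self.sxy += x * y

    def _central_moments(self):
        mx, my = self.sx / self.n, self.sy / self.n
        cxx = self.sxx - self.n * mx * mx
        cyy = self.syy - self.n * my * my
        cxy = self.sxy - self.n * mx * my
        return mx, my, cxx, cyy, cxy

    def line(self):
        """Return (point_on_line, angle) of the best-fit line through the centroid."""
        mx, my, cxx, cyy, cxy = self._central_moments()
        angle = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
        return (mx, my), angle

    def error(self):
        """Sum of squared orthogonal distances to the best-fit line."""
        if self.n < 2:
            return 0.0
        _, _, cxx, cyy, cxy = self._central_moments()
        # Smallest eigenvalue of the 2x2 scatter matrix about the centroid.
        return 0.5 * ((cxx + cyy) - math.hypot(cxx - cyy, 2.0 * cxy))
```

The value returned by error() is the smaller eigenvalue of the scatter matrix, which is exactly the minimized sum of squared perpendicular distances, so it can feed directly into the error-tolerance test.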
I'm not yet sure when to declare that one linear/circular stroke of a gesture has ended and another component has begun. The gesture-template library will be able to generate a list of linear/semicircular strokes that could follow the currently-detected set of strokes, so it seems possible to use that to limit what sorts of components are searched for. In general, though, it seems that I should use the error tolerance to detect when the currently-detected stroke no longer matches the gesture template, and then use that discovery to search inside that stroke for a change to another type of stroke.
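So the rough loop I'm imagining, assuming the incremental fit sketched above and a per-point error tolerance (both of which are my own assumptions rather than anything from the papers), would look something like this. Note that it greedily cuts the stroke at the point where the tolerance is first exceeded, which is usually a little late; that is the problem the next paragraph is about.

```python
def split_into_strokes(points, tolerance=4.0):
    """Greedy segmentation: extend the current fit until its mean squared
    orthogonal error exceeds `tolerance` (a hypothetical value, in px^2),
    then start a new stroke at that point."""
    strokes, start = [], 0
    current = IncrementalOrthogonalFit()
    for i, (x, y) in enumerate(points):
        current.add(x, y)
        if current.n > 2 and current.error() / current.n > tolerance:
            strokes.append(points[start:i])    # close the stroke before this point
            current = IncrementalOrthogonalFit()
            current.add(x, y)
            start = i
    strokes.append(points[start:])
    return strokes
```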
For instance, if the user has drawn two line segments, the algorithm needs to search inside the currently-detected segment to find the most likely point representing the boundary between the two. Using the incremental regression described earlier, it seems like it should be possible to represent the linear/semicircular regression as a binary tree, so that changes between strokes can be found by descending toward regions with smaller fitting errors, but I've had trouble extending my ideas in this direction.
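I haven't worked out the binary-tree version, but a brute-force form of that boundary search is straightforward with the incremental fit above: fit prefixes from the left, fit suffixes from the right, and pick the split index whose combined error is smallest (again just my own sketch, assuming both halves are line segments).

```python
def best_split(points):
    """Find the index splitting `points` into two line segments with the
    smallest combined orthogonal-fit error. Assumes at least four points."""
    n = len(points)
    prefix_err = [0.0] * (n + 1)
    suffix_err = [0.0] * (n + 1)
    fit = IncrementalOrthogonalFit()
    for i in range(n):                      # prefix_err[i+1] = error of points[:i+1]
        fit.add(*points[i])
        prefix_err[i + 1] = fit.error()
    fit = IncrementalOrthogonalFit()
    for i in range(n - 1, -1, -1):          # suffix_err[i] = error of points[i:]
        fit.add(*points[i])
        suffix_err[i] = fit.error()
    # Require at least two points on each side of the split.
    k = min(range(2, n - 1), key=lambda i: prefix_err[i] + suffix_err[i])
    return k, prefix_err[k] + suffix_err[k]
```

That runs in linear time for a single boundary, which is probably good enough; the binary-tree idea would mainly matter if the same points had to be re-split repeatedly against different templates.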