
My problem is to classify an input time series signal (128 points, equidistant in time, values in the range 0.0..1.0) into a limited number (say 16) of different classes. We have hand-classified 20000 samples; we use 2000 of them as the training set and the other 18000 to evaluate our classification algorithm (the test set).

In my analysis I found that classification using the correlation coefficient works quite well: https://en.wikipedia.org/wiki/Pearson_correlation_coefficient

My "standard" sample for each class is calculated as the mean of that class's training samples. I correlate each sample from the test set with every standard class sample, and the winning class is the one with the highest coefficient.
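Roughly, the approach looks like this in Python/NumPy (a sketch; the helper names and the tiny four-point toy signals are mine, not from the actual system):

```python
import numpy as np

def build_templates(X_train, y_train, n_classes):
    """Average the training samples of each class into one 'standard' template."""
    return np.array([X_train[y_train == c].mean(axis=0) for c in range(n_classes)])

def classify(sample, templates):
    """Pick the class whose template has the highest Pearson correlation
    with the sample (the off-diagonal entry of np.corrcoef)."""
    corrs = [np.corrcoef(sample, t)[0, 1] for t in templates]
    return int(np.argmax(corrs))
```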

The problem is that I found that different parts of the signal are not equally important. Some parts of the signal fluctuate widely between samples, and some are very similar.

So I came upon the idea of including weights for the individual points in the correlation coefficient. The weight of a point would be inversely proportional to the variance of that point across the class's samples in the training set.
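As a sketch of what I mean (using the weighted correlation coefficient from the Wikipedia page linked above; the function names and the `eps` regularizer are my own choices):

```python
import numpy as np

def weighted_corr(x, y, w):
    """Weighted Pearson correlation coefficient: weighted means,
    weighted covariances, then the usual normalized ratio."""
    w = w / w.sum()                          # normalize the weights
    mx, my = np.sum(w * x), np.sum(w * y)    # weighted means
    cov_xy = np.sum(w * (x - mx) * (y - my))
    var_x = np.sum(w * (x - mx) ** 2)
    var_y = np.sum(w * (y - my) ** 2)
    return cov_xy / np.sqrt(var_x * var_y)

def inverse_variance_weights(X_class, eps=1e-6):
    """Per-point weights, inversely proportional to the within-class
    variance at each point (eps avoids division by zero)."""
    return 1.0 / (X_class.var(axis=0) + eps)
```

With uniform weights this reduces to the ordinary Pearson coefficient, which is a handy sanity check.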

I can't believe this is a novel idea, but in my search so far I have not found anything like it. If this is a known method, please let me know. If somebody has had a similar problem and used similar methods, please report your results. If no one has heard of this approach, please let me know what you think of it, statistics-wise. Thank you!

SavinG
  • This is a problem ripe for machine learning - you may wish to compare it to classifying MNIST data - but if you really want to weight correlations, try [this](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient#Weighted_correlation_coefficient). – J.G. Feb 04 '20 at 22:01
  • 1
    You can find the answer to your question by searching on the [Statistics StackExchange site: stats.stackexchange.com](https://stats.stackexchange.com/) – rwong Oct 02 '20 at 06:33
  • Since you have already created a hand-classified training set, I think you can start with Linear Discriminant Analysis (LDA) as a first step. You can prototype your algorithm changes by using a statistical prototyping framework such as... Python (numpy and scipy), Julia, Octave, Mathematica, MATLAB, etc. – rwong Oct 02 '20 at 06:36
  • In signal processing (fixed length vector with unspecified physical meaning; no applicable frequency-domain techniques), there are several questions to be asked before choosing an approach. (1) Does linearly scaling a vector keep it in the same classification? (2) Does adding a constant uniform bias (constant value to each element of the vector) keep it in the same classification? (3) Take one vector from class A, and one vector from class B. Let C = 0.5 * A + 0.5 * B. Should C be classified as either A, or B, or both A and B, or half way between, or a new class C? – rwong Oct 02 '20 at 06:40
  • If adding, scaling, biasing, and/or blending two vectors will give a result that should not be classified into the original vector, one should consider using nearest-neighbor based approaches. It doesn't have to be k-NN; there are many approximating methods that will preserve or approximate the nearest neighbor property during classification (recall). – rwong Oct 02 '20 at 06:42

1 Answer


Because you are only comparing the candidates to the mean sample of each class, you lose information about the distribution within each class. You are trying to compensate for this by assigning different importance to different parts of the signal, but a good model should include this uncertainty directly.

As a very simple improvement, you could compare a candidate to multiple samples of the class and then combine the correlations. The more samples you take, the more precise the estimate of the correlation will be, but the longer the calculation will take. You could even use such a technique to determine the uncertainty of your correlation estimate through bootstrapping.
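A minimal sketch of that idea (the helper name and the choice to draw a fixed number of reference samples are my own assumptions):

```python
import numpy as np

def mean_class_corr(sample, X_class, n_ref=10, rng=None):
    """Correlate `sample` with up to n_ref randomly drawn training samples
    of one class and return the average correlation."""
    rng = rng if rng is not None else np.random.default_rng(0)
    idx = rng.choice(len(X_class), size=min(n_ref, len(X_class)), replace=False)
    return float(np.mean([np.corrcoef(sample, X_class[i])[0, 1] for i in idx]))
```

Repeating the draw with different random subsets (bootstrapping) would give you a spread, i.e. an uncertainty estimate, for the correlation.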

You could also use a different metric to indicate class membership, and that metric could make use of more information. Your classification problem can be interpreted as a model selection problem: each class represents a model, and you want to select the model that best explains the candidate sample. With a Bayesian approach, you would select the model with the best evidence.

Whereas you currently describe each class's model by the mean of its training samples, you could extend the model, e.g. to also include the variance. Unimportant features will have a high variance within a class, so the model automatically considers those features less important. This can be done fairly easily if you can assume the features to have a normal/Gaussian distribution. Manually assigning weights to features can still be helpful to speed the algorithm up, but it's neither necessary nor fundamentally helpful.
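One way to sketch this is a diagonal Gaussian per class, i.e. assuming the points of the signal are independent and normally distributed within each class (the function names are mine; `eps` keeps variances strictly positive):

```python
import numpy as np

def fit_gaussians(X_train, y_train, n_classes, eps=1e-6):
    """Per-class, per-point mean and variance estimated from the training set."""
    means = np.array([X_train[y_train == c].mean(axis=0) for c in range(n_classes)])
    variances = np.array([X_train[y_train == c].var(axis=0) + eps for c in range(n_classes)])
    return means, variances

def log_evidence(sample, mean, var):
    """Gaussian log-likelihood of the sample under one class model.
    High-variance (unimportant) points are automatically down-weighted
    by the 1/var factor -- no manual weights needed."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (sample - mean) ** 2 / var)

def classify_by_evidence(sample, means, variances):
    return int(np.argmax([log_evidence(sample, m, v)
                          for m, v in zip(means, variances)]))
```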

Before you start building complicated models, you should consider whether a learning approach that doesn't need to abstract a model from the data could work. For example, a KNN algorithm might perform well: you select the k nearest training samples (with some distance metric in the feature space, which could be your correlation metric) and then assign the candidate to the class to which most of its neighbors belong.

However, this is more of a statistics and data science problem than a software engineering problem. From the software engineering standpoint I have found it useful to create some interface that all algorithms must implement, so that algorithms can easily be swapped out for testing and comparison. That also makes it fairly straightforward to create a boosted classifier that combines the output of multiple simpler classifiers.
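Such an interface could look like this (a sketch in Python; the class names are my own, and the baseline shown is the questioner's mean-template correlation approach):

```python
from abc import ABC, abstractmethod
import numpy as np

class Classifier(ABC):
    """Common interface so classifiers can be swapped out for testing
    and comparison, or combined into a boosted classifier."""

    @abstractmethod
    def fit(self, X_train, y_train):
        """Learn the class models from the training set."""

    @abstractmethod
    def predict(self, sample):
        """Return the predicted class label for one sample."""

class MeanCorrelationClassifier(Classifier):
    """Baseline: correlate against per-class mean templates."""

    def fit(self, X_train, y_train):
        self.templates = {c: X_train[y_train == c].mean(axis=0)
                          for c in np.unique(y_train)}

    def predict(self, sample):
        return max(self.templates,
                   key=lambda c: np.corrcoef(sample, self.templates[c])[0, 1])
```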

amon
  • Thank you for your answer. I am aware of supervised machine learning algorithms (such as KNN) and I am looking forward to evaluating them more thoroughly – first tests did not give better results than the simple correlation. The proposed extension of the correlation coefficient was also chosen for its relative simplicity and speed of calculation – using the DFFT, the correlation coefficient is very fast to compute even on an 8-bit embedded microcontroller (which is my target system). Regarding the appropriateness of the question for this forum, do you recommend a site where my question would fit better? – SavinG Sep 09 '19 at 21:06
  • @SavinG The Stack Exchange network contains [many different sites](https://stackexchange.com/sites), including sites about data science, statistics, and signal processing. I'm not sure about their exact scope, so you'll have to look at their help centers to see which aspects of your question could be on topic there – amon Sep 10 '19 at 12:55
  • @SavinG It appears that you are developing algorithms for a compute-constrained system (an 8-bit system with sub-GHz CPU speed). I would suggest the following approach: first spend some time choosing an algorithm that achieves high accuracy on the classification task without the CPU constraint (i.e. one that works well on a fast desktop computer), and then separately choose an algorithm that maintains decent accuracy on the 8-bit system, guiding your search with insights learned from the desktop version. – rwong Oct 02 '20 at 06:47
  • 1
    For example, if the inherent nature of the task has something to do with frequency domain (due to something related to physics and time series), it would be foolish not to try frequency-based techniques. I'm glad that you have already tried DFFT. Likewise, if a task is inherently nearest-neighbor, it would be foolish to use a technique that doesn't perform well with nearest-neighbor tasks. – rwong Oct 02 '20 at 06:49