7

I would like to perform an FFT on a signal with equally spaced samples, some of which are missing.

(Actually, they are not even missing, but simply erroneous, so I can't use them; however, I can detect when I have a wrong sample point.)

Is there a way to compensate for those missing values and get a good transform? I suppose this is a frequent problem, but so far I have only found this and someone interpolating the missing values.

As far as I understand it, if my signal is sampled at, say, 100 Hz and the frequencies I care about are, say, 5 Hz, then following the Nyquist theorem a 10 Hz sampling rate should be enough, so I could downsample the signal in a way that removes the missing points.
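
For illustration only, here is roughly what I have in mind (a Python/NumPy toy sketch with made-up data, not my actual processing):

```python
import numpy as np

fs = 100.0                                    # original sampling rate
t = np.arange(0, 10, 1 / fs)
x = np.sin(2 * np.pi * 2 * t)                 # a low-frequency component I care about
rng = np.random.default_rng(0)
bad = rng.random(x.size) < 0.05               # mask of detected wrong samples

# 100 Hz -> 10 Hz: average each block of 10 samples using only the good ones.
# (The block average also acts as a crude anti-alias filter.)
block = 10
xb = x.reshape(-1, block)
good = ~bad.reshape(-1, block)
n_good = good.sum(axis=1)
x_ds = np.where(n_good > 0, (xb * good).sum(axis=1) / np.maximum(n_good, 1), 0.0)

spectrum = np.abs(np.fft.rfft(x_ds))
freqs = np.fft.rfftfreq(x_ds.size, d=block / fs)   # resolves content up to 5 Hz
```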

So my question is: is there a way to use a Fourier transform (preferably the FFT) on a signal with missing values and get an (at least almost) correct spectrum for the low frequencies while retaining the other information? How about other transformations (orthogonal ones like wavelets, etc.): is it the same problem, or a completely different one?


Edit:

Thanks for all the input, I will definitely review the literature; I was just hoping there was a good starting point or standard solution I had missed so far.

One commenter said I didn't provide enough information, so I'll add some. The signal processing should ideally run in real time, so extra processing time would be a big issue. The data cannot be recorded continuously and then resampled, because the recording method is inherently sample-based (like a camera).

I am looking for a recurring signal whose frequency is between 15 and 100 times smaller than the sampling rate, though most of the time it is more like 20 to 40 times smaller. The recurring signal itself, however, has higher-frequency components, so its content does not stay far below my sampling frequency.

My actual problem, for which I was trying to gather ideas with this question, is that at random points in time I get baseline shifts (inherent to the method, no physical solution possible). They can be detected because they are much larger than the actual signal, and others are working on ways to compensate for them algorithmically. However, during signal analysis, as of now, I have those baseline shifts, which may or may not happen quickly.

Those shifts can either produce spikes, effectively killing any interesting signal, or be more or less filterable, for example a ramp superimposed on a part of the original signal.

I'd like to retain as much data as possible - and as I'm trying to work in real time, just using one transform on the part before the shift and another on the part after the shift would be a bad solution.

I do know the base frequency range (at which the signal repeats) and, more or less, the shape of the signal I am looking for (however, not its exact amplitude and length; it may be a little shorter or longer, loosely depending on the repetition frequency).

DonQuiKong
  • 212
  • 2
  • 13
  • 1
    There is a lot of literature on the missing value problem in FFT work. This is an area where I've spent some small time, and there are a variety of tools you might apply, depending on what you know a priori. Lagrange interpolation is one method. There are many others. NFFT is designed to handle non-equally spaced points from the ground up, so you might examine its methods as well. Read its tutorial PDF. I would not recommend blindly setting missing values to anything. The surrounding data provides snapshots of reality that you should usually apply in some fashion. – jonk Mar 07 '17 at 22:39
  • 1
    Given you're sampling at 100 Hz and interested in signals below 5 Hz, I'd expect to get pretty good results by just interpolating in a value for the "bad" sample, based on the neighboring samples. – The Photon Mar 07 '17 at 22:48
  • https://www.mathworks.com/matlabcentral/newsreader/view_thread/41005.html? – Scott Seidman Mar 08 '17 at 02:17

3 Answers

3

Unless your signal has zero mean, I would avoid zeroing the missing samples, because that would introduce bias in the estimation of the mean, the power and the spectrum of the signal. Instead, a better single-value estimate for the missing samples is the average value of your time series.

However, using the mean isn't good either. The subset of replaced samples would have a constant value. That would solve the estimation bias problem for the average value (f=0), but not for the remaining spectrum. In other words: the replaced samples don't have the statistical and spectral characteristics required.

How can we improve this? Interpolation of the missing samples is an option.

The subset of interpolated samples will also have the same average value as the signal of interest, along with some variability around it. If the wrong samples are randomly spaced, then the subset of interpolated samples will have a variance (power) similar to that of the signal of interest.

Interpolation implies some degree of smoothing. We can think of that as some kind of low-pass filtering on the subset of interpolated samples. Thus, we could mitigate the interpolation-induced distortion with oversampling.
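
As a minimal sketch (Python/NumPy; a boolean mask `bad` marking the wrong samples is assumed to be available), the fill strategies discussed above might look like this:

```python
import numpy as np

def fill_and_fft(x, bad, how="interp"):
    """FFT of x after replacing the samples flagged in `bad`.

    how="zero"   : zero-fill (biases the mean unless the signal is zero-mean)
    how="mean"   : replace with the mean of the good samples (fixes f=0 bias only)
    how="interp" : linear interpolation from the neighbouring good samples
    """
    x = np.asarray(x, dtype=float).copy()
    idx = np.arange(x.size)
    if how == "zero":
        x[bad] = 0.0
    elif how == "mean":
        x[bad] = x[~bad].mean()
    else:
        x[bad] = np.interp(idx[bad], idx[~bad], x[~bad])
    return np.fft.rfft(x)
```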

So far, we've dealt with the problem at the sample level. But there are other options. If we are estimating the spectrum using non-parametric methods based on signal segments (Bartlett, Welch, ...), then we could discard the segments containing the wrong samples. Discarding segments will increase the variance of the estimate, but will not introduce estimation bias at any frequency.
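
For example, a bare-bones version of that segment-discarding estimate might look like this (Python/NumPy sketch; the segment length and window are arbitrary choices):

```python
import numpy as np

def welch_skip_bad(x, bad, fs, nperseg=256):
    """Bartlett/Welch-style PSD from non-overlapping segments,
    discarding every segment that contains a flagged sample."""
    x = np.asarray(x, dtype=float)
    bad = np.asarray(bad, dtype=bool)
    win = np.hanning(nperseg)
    scale = fs * (win ** 2).sum()
    psds = []
    for start in range(0, x.size - nperseg + 1, nperseg):
        if bad[start:start + nperseg].any():
            continue                                   # drop contaminated segments
        seg = x[start:start + nperseg] * win
        psds.append(np.abs(np.fft.rfft(seg)) ** 2 / scale)
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    return freqs, np.mean(psds, axis=0)                # fewer segments -> higher variance
```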

Enric Blanco
  • 5,741
  • 6
  • 22
  • 40
1

The FFT algorithm doesn't deal nicely with gaps in the data. One workaround is to null out the 'bad' data. That may cause artifacts, since the statistical weight of the 'bad' data is not zero (though, in a careful data analysis, it ought to be). Another is to interpolate the missing points.

The best way to interpolate, however, is to ... do a weighted slow Fourier transform. There's nothing wrong with, just once, computing the frequency components from the data you DID receive, and ignoring (giving no weight to) the missing regions. All it takes is a little extra compute time.

Each Fourier component is computed from the data points, the data point weights (zero for 'bad data', or reciprocal of expected mean-square-error for each 'good data' point), and normalized against a unit sine or cosine with the same weights applied.
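
A rough sketch of that weighted projection (Python/NumPy; it normalizes each cosine and sine separately and ignores the cos/sin cross-terms that a full least-squares fit such as Lomb-Scargle would include):

```python
import numpy as np

def weighted_slow_ft(x, w):
    """Weighted 'slow' Fourier components.

    w: per-sample weights -- zero for 'bad' points, reciprocal of the
    expected mean-square error for 'good' points.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n = x.size
    t = np.arange(n)
    comps = np.zeros(n // 2 + 1, dtype=complex)
    for k in range(n // 2 + 1):
        c = np.cos(2 * np.pi * k * t / n)
        s = np.sin(2 * np.pi * k * t / n)
        cc = (w * c * c).sum()                 # weighted norm of the cosine
        ss = (w * s * s).sum()                 # weighted norm of the sine
        a = (w * x * c).sum() / cc if cc > 1e-12 else 0.0
        b = (w * x * s).sum() / ss if ss > 1e-12 else 0.0
        comps[k] = a - 1j * b                  # x_hat(t) ~ sum_k a*cos + b*sin
    return comps
```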

An inverse FFT of the computed components will show you the 'missing' areas reconstructed. Other transformations to orthogonal sets of functions (like wavelets) can similarly be computed using a set of statistical weights for disparate data-point significances. That's also a general scheme for merging data from multiple experiments.

The absolute best way to do this kind of thing depends on the data model. A procedure called 'maximum-entropy filtering' was used for blurry Hubble images; it was designed to recreate, from the blur, an underlying set of bright points on a dark background. For that data, that method made good sense.

Whit3rd
  • 7,177
  • 22
  • 27
0

Any samples that are known to be "wrong" can simply be zeroed out. That way, they won't contribute anything to the output values, which, when it comes down to it, are simply weighted sums of the input values in various combinations.

You can verify by doing an IFFT on the results and comparing with the original block of samples.
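
For what it's worth, a minimal round-trip check on toy data (all names and values made up) could look like this:

```python
import numpy as np

rng = np.random.default_rng(1)
n, fs = 1024, 100.0
x = np.sin(2 * np.pi * 5 * np.arange(n) / fs)     # 5 Hz tone sampled at 100 Hz
bad = rng.random(n) < 0.03                        # ~3 % of samples flagged as "wrong"

x_zeroed = np.where(bad, 0.0, x)                  # zero out the bad samples
X = np.fft.fft(x_zeroed)                          # spectrum of the zero-filled block
x_back = np.fft.ifft(X).real

# The IFFT reproduces the zero-filled block exactly; the zeros themselves
# still contribute broadband error to X, as discussed in the comments below.
assert np.allclose(x_back, x_zeroed)
```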

Dave Tweed
  • 168,369
  • 17
  • 228
  • 393
  • Thanks! What would be the effect of this concerning accuracy of the spectrum compared to the same signal without errors and the downsampled signal? – DonQuiKong Mar 07 '17 at 21:45
  • Could you please develop a little more your answer? – Enric Blanco Mar 07 '17 at 22:25
  • 6
    This misses the point, I think. Although zeroing bad samples will produce zero samples after FFT/IFFT, it will also add apparent high-frequency energy to the FFT output and distort the FFT. It may well be better to do something like replacing the bad samples with a value interpolated from its neighbors. The optimal interpolation would vary with knowledge of underlying spectrum (if any). – WhatRoughBeast Mar 07 '17 at 22:34
  • @WhatRoughBeast Agreed. I was kind of shocked reading Dave's answer. – jonk Mar 07 '17 at 22:41
  • @WhatRoughBeast: You can't say anything at all about what the missing values were supposed to be. After all, they might actually have been zero! Any replacement data is going to damage the spectrum in some way; zeroing them out is arguably the *least harmful* thing to do. If the signal you're looking for has known spectral characteristics, then yes, an interpolating filter can be applied before the FFT. But this is the same as ignoring the FFT output bins that are "most damaged" by the replacement data. – Dave Tweed Mar 07 '17 at 22:42
  • @DaveTweed Examine research papers on the missing value problem for FFTs and find _one_ supporting paper. I'd be interested to read it. – jonk Mar 07 '17 at 22:49
  • @jonk: I'm a volunteer here, just like you. Feel free to write a better answer! I can handle it if mine ends up at the bottom of the heap. – Dave Tweed Mar 07 '17 at 22:51
  • @DaveTweed It's better the OP goes to the literature. It's out there and quite good. I've already recommended that the OP read the NFFT tutorial (rev 3.0 or better) and study up. The OP doesn't provide enough info here for a good answer, anyway. So there are too many possible good answers. Best to go to the literature. – jonk Mar 07 '17 at 22:53
  • @jonk I have addressed the issue of insufficient info ;-) Hope that helps. – DonQuiKong Mar 08 '17 at 08:46
  • You'd be MUCH better off scraping the points out and interpolating them back in than zeroing them. – Scott Seidman Mar 08 '17 at 11:48