17

I'd like to know how to remove environmental noise from a speech recording.

I've done some research and noticed that most of the proposed methods use the fast Fourier transform. But why can't you use a classical electronic filter to remove the noise frequencies? Why bother doing an FFT?

pipe
Jazis
  • Because FFT gives a better quality output? – Solar Mike Oct 23 '17 at 07:51
  • 4
    Compare the price of a 5GS/s DSP system with that of a bunch of inductors and caps... – PlasmaHH Oct 23 '17 at 07:55
  • 1
    Maybe you can do more complex filtering with an fft. "classical electronic filters" simply remove all frequencies in a certain range. Also, here's your student badge, a gift from me to you. – Andrew Pikul Oct 23 '17 at 07:56
  • 1
    I have 2 problems with your question: (1) What do you mean by "classical electronic filter"? (2) The fact that something is done one way (FFT) does NOT mean that it CAN'T be done another way (filter); there may just be some disadvantages to doing it the other way. You assume, however, that it is not possible with a classical electronic filter (whatever you mean by that), which is probably wrong. – Curd Oct 23 '17 at 09:10
  • What is the format of your speech recording (analog vs. digital)? Is latency an issue (live vs. after the fact)? How long is the recording (a song vs. days of recording)? – copper.hat Oct 23 '17 at 17:05
  • Noise removal may involve more than filtering. For example, blanking or clipping. Like the filtering, these can be done by either digital processing or by analog processing. – richard1941 Oct 27 '17 at 00:16

6 Answers

25

I'd like to know how to remove environmental noise from a speech recording.

Well, it's stored digitally now, right? So are you planning to play it through an analog filter and put your microphone next to the speaker to re-record it?

Enough messing around, I'll be serious.


To make a filter attenuate more over a narrower range of frequencies, i.e. to make the frequency response curve steeper, you just need to increase the order of the filter.
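To put rough numbers on the effect of filter order, here is a minimal sketch that uses the textbook magnitude response of an ideal Butterworth low-pass, |H(f)| = 1/sqrt(1 + (f/fc)^(2n)). The Butterworth shape, the 4 kHz cutoff and the function name are assumptions chosen for illustration; the answer does not name a particular filter family.

```python
import math


def butterworth_gain_db(f_hz, cutoff_hz, order):
    """Gain in dB of an ideal Butterworth low-pass of the given order."""
    ratio = f_hz / cutoff_hz
    gain = 1.0 / math.sqrt(1.0 + ratio ** (2 * order))
    return 20.0 * math.log10(gain)


# Attenuation one octave above a hypothetical 4 kHz cutoff.
# A higher order gives a steeper roll-off at the same frequency.
for order in (1, 2, 4, 8):
    print(order, round(butterworth_gain_db(8000.0, 4000.0, order), 1), "dB")
```

Every order passes through about -3 dB at the cutoff itself; what changes with the order is how quickly the response falls away beyond it.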

That is reasonably easy to do in Matlab, and it's feasible to do in post-processing. It's also about repeatability: if you apply the filter today when it's sunny, you expect it to work identically tomorrow when it's raining, right?

In analog circuits you have all these "5% resistors", "1% capacitors", and so on. So if you want something exact, you will definitely need to trim the circuit afterwards so it matches your desired filter. And if you want to increase the order of the filter, then sadly it will make the filter much larger physically. Instead of being the size of a credit card, it will be the size of, I don't know, it depends on the filter order and what you're okay with.

Regarding repeatability: today it's warm, tomorrow it's colder, so the resistances change ever so slightly and the frequency response shifts, a couple of Hz here, a few there. The more components you have in your circuit, the more likely it is that some of them will drift. And then you have humidity, oxidation...
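As a rough illustration of how much component tolerance alone can move a filter, here is a sketch for a single first-order RC stage with cutoff fc = 1/(2πRC). The 10 kΩ and 10 nF values are hypothetical; the 5% and 1% tolerances are the ones mentioned above. Temperature drift and ageing would come on top of this and are not modelled.

```python
import math


def rc_cutoff_hz(r_ohm, c_farad):
    """Cutoff frequency of a first-order RC filter."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


R_NOM = 10_000.0  # hypothetical 10 kOhm resistor
C_NOM = 10e-9     # hypothetical 10 nF capacitor
R_TOL = 0.05      # "5% resistor"
C_TOL = 0.01      # "1% capacitor"

nominal = rc_cutoff_hz(R_NOM, C_NOM)
# Worst cases: both parts at the high end, or both at the low end.
lowest = rc_cutoff_hz(R_NOM * (1 + R_TOL), C_NOM * (1 + C_TOL))
highest = rc_cutoff_hz(R_NOM * (1 - R_TOL), C_NOM * (1 - C_TOL))

print(f"nominal {nominal:.0f} Hz, worst case {lowest:.0f} to {highest:.0f} Hz")
```

With these assumed parts the worst-case spread is on the order of a hundred hertz either side of nominal, and a high-order filter has several such stages that all need to line up.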

And here's the punchline that I should've said first: you can't really post-process with an analog filter unless the recording is on something like cassette tape. I'm not 100% sure which analog media can be recorded and erased easily. LP discs would be a nightmare...

And let's not forget the price. One is software, which is essentially free if you write it yourself; the other requires components, physical parts.

But don't think analog filters are bad. They have their uses, such as removing nasty harmonics in large DC motors, or making ultra-silent stepper motors for 3D printers by smoothing out the current, and tons of others. Also, if you solved this with an analog filter, no one would think it was a bad solution.

I believe I'm indirectly answering why FFT is the better way to go about it, post-processing wise. The bottom line is that it's much cheaper to do. You could also just apply a notch filter if you know what frequency the noise is at, or a wider band-stop filter.
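A minimal sketch of the notch idea in post-processing, assuming NumPy is available: transform the whole recording, zero the bins around the known noise frequency, and transform back. The function name, the 1 kHz interfering tone and the 440 Hz stand-in for the wanted signal are made up for illustration. Zeroing bins outright is a crude brick-wall notch that only suits steady tonal noise.

```python
import numpy as np


def fft_notch(signal, sample_rate_hz, notch_hz, width_hz):
    """Zero every FFT bin within width_hz / 2 of notch_hz."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate_hz)
    spectrum[np.abs(freqs - notch_hz) <= width_hz / 2.0] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))


fs = 8000                                   # hypothetical sample rate
t = np.arange(fs) / fs                      # one second of samples
wanted = np.sin(2 * np.pi * 440 * t)        # stand-in for the wanted signal
hum = 0.5 * np.sin(2 * np.pi * 1000 * t)    # steady interfering tone
cleaned = fft_notch(wanted + hum, fs, notch_hz=1000.0, width_hz=20.0)

print("max residual:", np.max(np.abs(cleaned - wanted)))
```

Moving the notch is a matter of changing `notch_hz`; no parts need replacing.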

One last thing I want to add... wow, this answer is so long, I'm sorry. Suppose you use an analog filter, mess up your calculations, think it's all fine and dandy, and use it at some serious event, like interviewing the king of Sweden (Knugen). You got the size of a capacitor wrong, so instead of filtering out 16 kHz noise you're filtering out 4 kHz "noise". Now you have to desolder the component and solder in another, and the interview is ruined. If you deal with it digitally instead, it's just a matter of changing some variables.

Harry Svensson
7

But why can't you use a classical electronic filter to remove the noise frequencies?

Who says you can't? It is how this was done in the days before digital signal processing. The problem is that filtering noise is always a compromise between keeping your wanted signal (speech, music) untouched while lowering the noise.

For cassette tapes and other analog tape recordings, systems like DNL and Dolby were used. These filter only when the signal is weak, which is when the noise is most audible; when the signal is stronger, the filter fades out. See: Wikipedia article on Noise reduction

Speech can be limited to a narrow frequency band like 300 Hz to 3 kHz while still being perfectly understandable. You could make a simple analog filter for that band but that would limit how much the noise is suppressed. To more effectively filter out frequencies outside of this band would require a complex analog filter. Such filters are difficult to design, build and manufacture.

This is where digital signal processing comes in. In the digital domain it is much easier to implement complex filters with many poles and zeros. Also since the location (in the frequency domain) of these poles and zeros is linked to the clock of the DSP (Digital Signal Processor), which is an accurate (crystal) clock, the filter will be much more accurate compared to an analog implementation.
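To show how little effort a "complex" filter takes digitally, here is a sketch of a 401-tap FIR band-pass for the 300 Hz to 3 kHz speech band, built with the windowed-sinc method in NumPy. The tap count, the Hamming window, the 16 kHz sample rate and all function names are assumptions for illustration. An FIR filter like this has only zeros (plus poles at the origin), so it is one example of the many-zero case rather than a general pole/zero design.

```python
import numpy as np


def windowed_sinc_lowpass(cutoff_hz, sample_rate_hz, num_taps):
    """Low-pass FIR taps: an ideal sinc response shaped by a Hamming window."""
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    fc = cutoff_hz / sample_rate_hz          # cutoff as a fraction of the clock
    return 2.0 * fc * np.sinc(2.0 * fc * n) * np.hamming(num_taps)


def speech_bandpass(sample_rate_hz, num_taps=401, low_hz=300.0, high_hz=3000.0):
    """Band-pass taps formed as the difference of two low-pass filters."""
    return (windowed_sinc_lowpass(high_hz, sample_rate_hz, num_taps)
            - windowed_sinc_lowpass(low_hz, sample_rate_hz, num_taps))


def gain_at(taps, freq_hz, sample_rate_hz):
    """Magnitude of the filter's frequency response at one frequency."""
    n = np.arange(len(taps))
    return abs(np.sum(taps * np.exp(-2j * np.pi * freq_hz * n / sample_rate_hz)))


fs = 16000  # hypothetical sample rate
taps = speech_bandpass(fs)
for f in (50, 1000, 6000):
    print(f, "Hz ->", round(20 * np.log10(gain_at(taps, f, fs)), 1), "dB")
```

Note that the cutoffs enter only as fractions of the sample rate, which is the sense in which the filter's accuracy follows the accuracy of the clock.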

Bimpelrekkie
  • +1 for mentioning the compromise between keeping wanted signals and removing unwanted. The problem is that the speech and noise occupy the same frequencies, so an FFT filter can remove the "baseline" noise i.e. after analysing the noise amplitude at each frequency *without speech*, that can be removed where there is speech. This is how FFT noise filters in [Audacity](http://www.audacityteam.org/download) etc etc etc work. – Reversed Engineer Oct 24 '17 at 08:46
  • I'm not sure what makes analogue filters especially difficult to design and build.. All you basically need is one or two opamps and some resistors and capacitors. And since opamps usually come in dual packages, you need just one chip. I would usually use analogue filter to do low-pass filtering to make sure there's little aliased high frequency signal in the signal. You *cannot* get rid of that with FFT afterwards. On the other hand it's no problem doing FFT bandpass filter when you have a clean "recording" to process. – Barleyman Oct 24 '17 at 10:08
  • @Barleyman *I'm not sure what makes analogue filters especially difficult to design and build* I was referring to high-order filters, 4th order and higher. I agree that a couple of opamps, resistors and capacitors can do almost any order of filter, but have you tried to design one yet? I have, OK, in a simulator, but even then you run into standard off-the-shelf capacitors not being **accurate** enough. At higher orders the precise values of the components become more and more important. – Bimpelrekkie Oct 24 '17 at 11:13
  • Also, aliasing is not such an issue anymore these days, as we now have sigma-delta ADCs and DACs with very high sampling frequencies, so a simple RC is all that's needed. – Bimpelrekkie Oct 24 '17 at 11:16
  • @Bimpelrekkie I have designed several, nothing to it these days.. Back in the day you would use a book with some precalculated parameter choices that you'd spend some time messing with to get some reasonable component values. Accuracy isn't a huge issue if you aren't trying to be too selective. RC does not really get the job done if you're trying to make e.g. an audio recording. With -20dB/decade you'd have to put the filter at 2.2kHz to get some kind of filtration at the Nyquist frequency. 3rd order Chebyshev would do better @ 12kHz fc. 5th order would get you to -46dB which is "good enough" – Barleyman Oct 24 '17 at 16:54
  • Who says the "noise" is just on certain exact frequencies that can be notched out? Usually non-impulse noise is spread over a frequency spectrum. There is a classical theorem in signal processing that tells us the best possible filter, based on the signal spectrum and the noise spectrum. See "Extraction of signals from noise" [by] L. A. Wainstein [and] V. D. Zubakov. Translated from the Russian by Richard A. Silverman. Be careful! These guys may have been communists! – richard1941 Oct 27 '17 at 00:24
6

Well, the first step to understanding why we need FFT is to understand how digital filtering works.

So basically, you have a structure, like a shift register, with a number of memory elements, an input and an output. A sample value goes into the input, gets shifted through the register and moves to the output. At each stage in the register, it's multiplied by a number called filter coefficient.
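The shift-register structure can be sketched in a few lines of plain Python. The function name and the example coefficients are made up for illustration. One detail the description leaves implicit is made explicit here: the products from all stages are summed to form each output sample.

```python
from collections import deque


def fir_filter(samples, coefficients):
    """Direct-form FIR filter built around a shift register (delay line)."""
    # The register holds the most recent samples, newest first.
    register = deque([0.0] * len(coefficients), maxlen=len(coefficients))
    output = []
    for sample in samples:
        register.appendleft(sample)  # shift in the new sample, drop the oldest
        # Multiply each stage by its coefficient and sum the products.
        output.append(sum(c * x for c, x in zip(coefficients, register)))
    return output


# Feeding in a single impulse reads the coefficients back out one by one.
print(fir_filter([1.0, 0.0, 0.0, 0.0], [0.5, 0.3, 0.2]))
```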

This idea works OK when you have a fast register doing fast multiplications and you have samples coming in slowly one by one.

In real life, you'll instead most likely get a frame consisting of a number of samples. To filter that, you convolve the samples with the filter coefficients. That's the same as the previous approach; it just looks a bit different.

Now comes the FFT part. It turns out that the cost of direct convolution grows very quickly: roughly the number of samples times the number of filter coefficients. The FFT has a higher up-front cost, but the number of operations it needs grows much more slowly as the number of filter coefficients increases.
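A back-of-the-envelope comparison, under loudly stated assumptions: direct convolution is counted as one multiply per sample per coefficient, and the FFT route as three transforms of about L·log2(L) operations each plus L pointwise multiplies, where L is the next power of two that fits the result. The constants are a rough model, not measurements, and real implementations process long signals block by block.

```python
def direct_multiplies(num_samples, num_taps):
    """Rough multiply count for direct convolution."""
    return num_samples * num_taps


def fft_multiplies(num_samples, num_taps):
    """Rough operation count for FFT, pointwise multiply, inverse FFT."""
    needed = num_samples + num_taps - 1
    length = 1 << (needed - 1).bit_length()   # next power of two >= needed
    log2_length = length.bit_length() - 1
    # Assumed model: two forward FFTs, one inverse FFT, one pointwise product.
    return 3 * length * log2_length + length


# One hypothetical 4096-sample frame filtered with different tap counts.
for taps in (4, 32, 128, 512):
    print(taps, direct_multiplies(4096, taps), fft_multiplies(4096, taps))
```

In this model the FFT cost barely moves as the number of coefficients grows, while the direct cost scales with it, so the two curves cross somewhere around a hundred coefficients for this frame size.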

What the above means is that above a certain number of samples, it's going to be much faster to convert a signal into the frequency domain using an FFT, filter the signal in the frequency domain, and then convert it back using IFFT. The trick that we're using is one of the properties of convolution, namely that convolution in time domain can, in some circumstances, be modeled as multiplication in frequency domain.
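A short NumPy sketch of that property, with made-up random data. The "in some circumstances" part shows up as zero-padding: both the signal and the coefficients are padded to the full output length so that the circular convolution the FFT computes matches ordinary linear convolution.

```python
import numpy as np


def fft_convolve(signal, taps):
    """Linear convolution computed by multiplying spectra."""
    out_len = len(signal) + len(taps) - 1
    # rfft(..., n=out_len) zero-pads both inputs to the full output length,
    # which keeps the result from wrapping around on itself.
    spectrum = np.fft.rfft(signal, n=out_len) * np.fft.rfft(taps, n=out_len)
    return np.fft.irfft(spectrum, n=out_len)


rng = np.random.default_rng(0)
signal = rng.standard_normal(1000)   # hypothetical frame of samples
taps = rng.standard_normal(128)      # hypothetical filter coefficients

difference = np.max(np.abs(fft_convolve(signal, taps) - np.convolve(signal, taps)))
print("largest difference from direct convolution:", difference)
```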

So to sum it up, if the number of filter coefficients you have is sufficiently large, FFT is faster. The "large" could be as small as a hundred or so.

MSalters
AndrejaKo
  • 3
    I think by "classical electronic filter" he meant an analog filter, not convolution. – jalalipop Oct 23 '17 at 12:32
  • 1
    @jalalipop Could be, but I explicitly wanted to explain the FFT part. At the time, we've already had an answer explaining why we would like to have digital filters instead of analog. – AndrejaKo Oct 23 '17 at 16:26
2

FFT-based methods (you'll still have to deal with windowing and overlap-add or overlap-save processing) have the main advantage that the design sits solidly in the frequency domain. A Wiener filter, spectral subtraction, and a number of other systems that rely on signal statistics and a model fundamentally work in the frequency domain.
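A minimal sketch of spectral subtraction with windowing and overlap-add, assuming NumPy and assuming a stretch of noise-only audio is available to estimate the noise spectrum from. The frame length, the spectral floor, the synthetic test signal and all names are illustrative choices; practical noise reducers add smoothing and other refinements on top of this.

```python
import numpy as np


def spectral_subtract(noisy, noise_only, frame_len=512, floor=0.05):
    """Subtract an average noise magnitude spectrum, frame by frame."""
    hop = frame_len // 2
    # Periodic Hann window: copies shifted by half a frame sum to exactly 1.
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(frame_len) / frame_len)

    def windowed_frames(x):
        count = 1 + (len(x) - frame_len) // hop
        return np.stack([x[i * hop:i * hop + frame_len] * window
                         for i in range(count)])

    # Noise model: average magnitude per frequency bin over noise-only frames.
    noise_mag = np.abs(np.fft.rfft(windowed_frames(noise_only), axis=1)).mean(axis=0)

    spectra = np.fft.rfft(windowed_frames(noisy), axis=1)
    magnitude = np.abs(spectra)
    # Subtract the noise estimate but keep a small floor of the original.
    cleaned_mag = np.maximum(magnitude - noise_mag, floor * magnitude)
    cleaned = cleaned_mag * np.exp(1j * np.angle(spectra))  # reuse noisy phase
    cleaned_frames = np.fft.irfft(cleaned, n=frame_len, axis=1)

    # Overlap-add the processed frames back into one signal.
    out = np.zeros(len(noisy))
    for i, frame in enumerate(cleaned_frames):
        out[i * hop:i * hop + frame_len] += frame
    return out


fs = 8000                                        # hypothetical sample rate
t = np.arange(2 * fs) / fs
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * t)              # stand-in for speech
noisy = clean + 0.3 * rng.standard_normal(len(t))
noise_only = 0.3 * rng.standard_normal(fs)       # separate noise-only stretch
denoised = spectral_subtract(noisy, noise_only)
```

The first and last half-frames are only covered by one window, so the edges of the output are tapered; a real implementation would pad the signal to avoid that.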

In contrast, echo cancellation and its variants do not rely on a model of the noise but on an imperfect recording that is highly correlated with the noise. They use time-varying filters (usually FIR) to subtract a noise estimate from the signal, and they update the filters to keep the correlation between the remaining signal and the noise channel minimal. For those techniques the FFT is not all that useful. When considerable delays of the resulting signal and of the filter updates are permissible, FFTs can be employed for performance reasons as a component in a black-box FIR with delay, but they aren't really useful there for their frequency-domain representation.
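A sketch of that time-domain approach using a normalized LMS (NLMS) update, which is one common choice and an assumption here since the answer does not name an algorithm. The reference channel, the three-tap "room path", the step size and the names are all made up for illustration.

```python
import numpy as np


def nlms_cancel(primary, reference, num_taps=16, step=0.1, eps=1e-8):
    """Adaptive FIR that subtracts a noise estimate derived from a reference."""
    weights = np.zeros(num_taps)
    history = np.zeros(num_taps)       # most recent reference samples
    cleaned = np.zeros(len(primary))
    for n in range(len(primary)):
        history = np.roll(history, 1)
        history[0] = reference[n]
        noise_estimate = weights @ history
        error = primary[n] - noise_estimate   # this is also the output
        cleaned[n] = error
        # Nudge the filter so the output correlates less with the reference.
        weights += step * error * history / (history @ history + eps)
    return cleaned


fs = 8000                                        # hypothetical sample rate
t = np.arange(2 * fs) / fs
rng = np.random.default_rng(1)
speech = 0.5 * np.sin(2 * np.pi * 300 * t)       # stand-in for speech
reference = rng.standard_normal(len(t))          # separate noise recording
path = np.array([0.6, -0.3, 0.1])                # hypothetical noise path
primary = speech + np.convolve(reference, path)[:len(t)]
cleaned = nlms_cancel(primary, reference)
```

No FFT appears anywhere; the filter works sample by sample and never needs a model of the noise spectrum.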

  • *solidly in the frequency domain*. There would not be any particular advantage in that if it weren't for the fact that tones and notes have ground tone and overtone properties. – mathreadler Oct 24 '17 at 16:23
2

Analogue filters are easy enough to design but the limitation is that you need to keep on adding physical filter elements to achieve band-stop filtering of given frequencies. And you need to adjust the component values if you want to move the notches around. A single opamp can do one band-stop notch so you need to add another amplifier for each notch you want. For a more selective notch you'd need two amplifiers per notch.

In practical terms you'd likely be best served by a 3rd-order low-pass filter, which you can build with a single opamp, or perhaps a 5th-order low-pass filter, which requires two. Use the low-pass filter(s) to attenuate frequencies above the Nyquist frequency (1/2 sampling frequency) with some margin and you'll have a high-quality digital sample to post-process. With a clean recording like that, you can then apply FFT filters to create high-pass, band-pass and band-stop filters as needed.
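A small NumPy sketch of why the analogue low-pass has to come before sampling. The 8 kHz sample rate and 7 kHz tone are hypothetical. Once sampled, the 7 kHz tone is indistinguishable from a 1 kHz tone, so no amount of digital filtering afterwards can tell it apart from real 1 kHz content.

```python
import numpy as np

fs = 8000          # hypothetical sample rate, so Nyquist is 4 kHz
tone_hz = 7000     # tone above Nyquist that was not filtered out
n = np.arange(fs)  # one second of sample indices

sampled = np.sin(2 * np.pi * tone_hz * n / fs)

spectrum = np.abs(np.fft.rfft(sampled))
freqs = np.fft.rfftfreq(len(sampled), d=1.0 / fs)
apparent_hz = freqs[np.argmax(spectrum)]

print("tone shows up at", apparent_hz, "Hz")
```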

Barleyman
1

Linear time-invariant filtering, which is what a "classical electronic filter" does, is just a "dumb" multiplication in the Fourier domain. But the information in an FFT tells you more than the response of one filter, which is just a linear combination of those components. Using that information you can steer the data processing and adapt it to the data. Noise has some characteristics that clear vocals and musical tones do not; for example, the correlation between overtones is not nearly the same for noise as for voice or music.

So if we can identify correlations between frequency components, i.e. find a "ground tone" (fundamental) somehow, we can steer the filtering and make it better adapted to the data.
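One simple way to look for a ground tone, sketched with NumPy: score each candidate frequency by how much spectral magnitude sits at its first few multiples, and pick the best. This harmonic-sum scoring is an assumption chosen for illustration, not a method named in the answer, and the test signal, candidate grid and names are made up.

```python
import numpy as np


def estimate_fundamental(signal, sample_rate_hz, candidates_hz, num_harmonics=4):
    """Pick the candidate whose harmonics carry the most spectral magnitude."""
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    bin_hz = sample_rate_hz / len(signal)

    def score(f0):
        bins = np.round(np.arange(1, num_harmonics + 1) * f0 / bin_hz).astype(int)
        bins = bins[bins < len(spectrum)]
        return spectrum[bins].sum()

    return max(candidates_hz, key=score)


fs = 8000                                    # hypothetical sample rate
t = np.arange(fs) / fs
rng = np.random.default_rng(2)
# A 200 Hz tone with three overtones, buried in broadband noise.
voiced = sum(a * np.sin(2 * np.pi * 200 * k * t)
             for k, a in enumerate((1.0, 0.5, 0.33, 0.25), start=1))
noisy = voiced + 0.2 * rng.standard_normal(len(t))

print(estimate_fundamental(noisy, fs, candidates_hz=range(80, 401, 10)))
```

Broadband noise spreads its energy over all bins, while the tone stacks its energy on related bins, which is the kind of structure a data-adapted filter can exploit.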

mathreadler