Can I use the FFT to recognize musical notes on a piano?

Question

I want to create a tool which recognizes a few musical notes (I know this is re-inventing the wheel). So I would play middle C, D, and E on a piano and it should be able to classify those notes. Here's how I think I should approach it:

1. Record a sample of me playing a note
2. Convert the signal to the frequency domain using the fast Fourier transform
3. Find the frequency that is most present (basically argmax of the frequency domain data)
4. Assume that frequency comes from the note played and use that to classify the note

I haven't tried any of this yet because I don't want to start down the wrong path. So, theoretically, will this work?

It would be nice if you could be more specific in the title. I tried to include the bit about piano pitch recognition, but my (non-native) English is apparently failing me today. — pipe, Aug 07 '16 at 19:40
Your "sample" of playing a note should already be a waveform of amplitude and time. Essentially, point 2 is redundant. For a relatively simple implementation, your above steps should be just fine. — user2943160, Aug 07 '16 at 21:10
@user2943160 I added it to be explicit. Sound can be stored in a lot of formats, and it usually takes some mangling to get it into a nice amplitude over time. — michaelsnowden, Aug 07 '16 at 21:16
@michaelsnowden: You are using the term "amplitude" wrong: the amplitude of a sinusoidal function \$y(t) = A\sin(\omega t)\$ is \$A\$. It is the maximum of the signal (voltage, displacement, ...) and it is a constant (or slowly changing with respect to the frequency). What you mean is just the signal \$y(t)\$. Otherwise I'd think by "amplitude over time" you mean the envelope of the signal but as far as I understand you don't. — Curd, Aug 07 '16 at 23:07
@Curd whoops you're right that's not the right word. I'll change it — michaelsnowden, Aug 07 '16 at 23:27
@Curd What word would you use? Technically it's pressure right? — michaelsnowden, Aug 07 '16 at 23:28
@michaelsnowden: yes, but then it gets transformed into a length (displacement) of a membrane and that is transformed into a voltage by the microphone (and later it gets digitized and is represented by a bit combination). I'd just call it the (sound) **signal**. — Curd, Aug 08 '16 at 01:21
I'd like to also suggest that you look at this similar question from a few weeks back, a lot of good info in this thread http://physics.stackexchange.com/questions/268568/why-are-the-harmonics-of-a-piano-tone-not-multiples-of-the-base-frequency — Ian Riley, Aug 08 '16 at 01:34
This was almost exactly my final year computer science degree project, except with guitar chords so it is possible — Alex Logan, Aug 08 '16 at 12:06
It won't be as simple as you might think. Piano tuning is a subtle art: https://en.wikipedia.org/wiki/Inharmonicity — Solomon Slow, Aug 08 '16 at 13:55

score 25 · Accepted Answer · edited Aug 08 '16 at 02:13

The concept is good, but you will find it is not so simple in practice.

Pitch is not simply the predominant tone, so there's problem number 1.

The FFT frequency bins can't hit all (or even multiple) tones of the musical scale simultaneously.
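To make the bin problem concrete, here is a small sketch. The sample rate (44100 Hz) and FFT size (4096) are assumed values, and the note frequencies assume equal temperament with A4 = 440 Hz; none of these numbers come from the answer itself.

```python
import math

SAMPLE_RATE = 44100   # Hz, assumed for illustration
FFT_SIZE = 4096       # samples, assumed for illustration

def note_freq(semitones_from_a4):
    """Equal-tempered frequency, assuming A4 = 440 Hz."""
    return 440.0 * 2.0 ** (semitones_from_a4 / 12.0)

bin_width = SAMPLE_RATE / FFT_SIZE   # spacing between FFT bins, in Hz

# Middle C, D and E are 9, 7 and 5 semitones below A4.
notes = {"C4": note_freq(-9), "D4": note_freq(-7), "E4": note_freq(-5)}

def nearest_bin(freq):
    """Return (bin index, bin center frequency) closest to freq."""
    k = round(freq / bin_width)
    return k, k * bin_width

for name, f in notes.items():
    k, center = nearest_bin(f)
    print(f"{name}: {f:7.2f} Hz -> bin {k} at {center:7.2f} Hz "
          f"(off by {f - center:+.2f} Hz)")
```

FFT bins are spaced linearly while musical notes are spaced geometrically, so no single FFT size lines every note up with a bin center. A longer FFT narrows the bins, but only at the cost of a longer analysis window.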

I would suggest playing with an audio program (for example, Audacity) that includes an FFT analyser and tone generator to get a feel for what it can (and can't) do before you try to implement a particular task using the FFT.

If you need to detect just a few specific tones, you may find the Goertzel algorithm to be easier and faster.
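As a rough illustration of that suggestion, here is a minimal Goertzel sketch. The sample rate, the synthetic test tone, and the candidate note frequencies (equal temperament, A4 = 440 Hz) are assumptions made up for the example, and a real piano recording would behave less cleanly than a pure sine.

```python
import math

def goertzel_power(samples, sample_rate, target_freq):
    """Squared magnitude of the DFT bin nearest target_freq."""
    n = len(samples)
    k = round(n * target_freq / sample_rate)   # nearest bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = 0.0
    s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2 = s_prev
        s_prev = s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

# Hypothetical test signal: a pure 293.66 Hz tone (D4), 0.1 s long.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 293.66 * i / SAMPLE_RATE) for i in range(800)]

# Only the tones of interest are evaluated, not a whole spectrum.
candidates = {"C4": 261.63, "D4": 293.66, "E4": 329.63}
powers = {name: goertzel_power(tone, SAMPLE_RATE, f)
          for name, f in candidates.items()}
detected = max(powers, key=powers.get)
print(detected, powers)
```

Each call costs one multiply-add pass over the samples, which is why this tends to be cheaper than a full FFT when only a handful of frequencies matter.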

Pitch detection is complicated, and there is still research going on in that field. Tone detection is pretty straightforward, but may not get you what you want.

edited Aug 08 '16 at 02:13 by 2012rcampion · answered Aug 07 '16 at 18:52 by JRE

If we start with the assumption that the samples are of a specific instrument, the problem may be a bit easier to deal with, right? – user57037 Aug 07 '16 at 19:13
This looks really good. One follow up question is: can the Goertzel Algorithm be used to detect two notes which are being played simultaneously? – michaelsnowden Aug 07 '16 at 19:13
It can be used to detect simultaneous tones. Whether that is sufficient to detect simultaneous notes is a different question, and one I'm still working on. I have a Goertzel based guitar note detector that I've been monkeying with off and on for years. – JRE Aug 07 '16 at 19:21
@mkeith: Sort of. You can test the notes and see if detecting the predominant tone is adequate for a particular instrument (and maybe just the notes of interest.) So far as I know, though, there's no general solution for detecting all notes from all instruments. – JRE Aug 07 '16 at 19:24

score 3 · Answer 2 · answered Aug 08 '16 at 08:35

I would say a multimodal observation window on the signal would be better: something along the lines of a wavelet decomposition of your audio signal, which would let you identify the multiple overtones inside the note. Wavelets, I would say, are the way to go.

This is a very generalised breakdown of what wavelets are, but think of them as a multiresolution window that passes over your signal like an STFT. That way you can identify different sinusoids that occur at different temporal locations within your signal. This is also important because the note you play is not a stationary signal: it sounds and then decays over time. I am not a musician, but I believe that tone dominance changes throughout the decay of the note.

Of course, after the wavelet decomposition you will need to implement algorithms that identify notes and peripheral tones.

I think wavelets really address the problems people have been talking about in terms of pitch identification.

If you would like to learn how wavelets work, this is a wonderful whitepaper released by HP about it :) http://www.hpl.hp.com/hpjournal/94dec/dec94a6.pdf and Introduction to Wavelets

For implementation, MATLAB has a wavelet tool, and I am sure there is a plethora of other packages available for platforms like R, etc.

score 1 · Answer 3 · answered Aug 07 '16 at 22:05

I guess you are thinking of notes played in the middle of the piano's range (say between 200 and 500 Hz) but even in that range, a single note will have many overtones, which are not exact multiples of the fundamental frequency, and also a significant amount of broadband noise at the start of each note, and perhaps also at the end.

For loud notes at the lower end of the note range, you will find that very little of the sound energy (less than 1%) is actually in the fundamental pitch of the note.
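A small synthetic experiment shows why this matters for an argmax-based detector. The partial amplitudes below are invented for illustration (they are not measurements of a real piano), and the harmonics are placed at exact integer multiples for simplicity.

```python
import cmath
import math

SAMPLE_RATE = 1100      # Hz, chosen so that bins are exactly 1 Hz apart
N = 1100                # one second of samples
F0 = 110.0              # fundamental (A2)

# Hypothetical partial amplitudes: fundamental far weaker than the overtones
partials = {1: 0.05, 2: 1.0, 3: 0.6}

signal = [sum(a * math.sin(2 * math.pi * h * F0 * n / SAMPLE_RATE)
              for h, a in partials.items())
          for n in range(N)]

def dft_magnitude(x, k):
    """Magnitude of DFT bin k, computed directly (slow but dependency-free)."""
    n_samples = len(x)
    acc = sum(v * cmath.exp(-2j * math.pi * k * n / n_samples)
              for n, v in enumerate(x))
    return abs(acc)

# Scan bins from 50 Hz to 400 Hz; bin k corresponds to k Hz here.
magnitudes = {k: dft_magnitude(signal, k) for k in range(50, 401)}
peak_hz = max(magnitudes, key=magnitudes.get)

# Share of the signal energy carried by the fundamental
energy = {h: a * a for h, a in partials.items()}
fundamental_share = energy[1] / sum(energy.values())

print(f"strongest component: {peak_hz} Hz, fundamental: {F0:.0f} Hz")
print(f"fundamental share of energy: {fundamental_share:.2%}")
```

With these made-up amplitudes the strongest spectral peak sits on the second harmonic, so picking the largest bin would report a note an octave too high.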

Another problem is that a naïve interpretation of an FFT assumes the signal you are trying to detect has constant amplitude. That does not apply to piano notes where the amplitude actually follows several superimposed exponential decays - the initial part of the decay has a relatively short time constant, but the later part has a longer time constant.

You may be better off investigating short-time Fourier transform methods, for example the Gabor transform, or wavelet-based methods.

Note that since the fundamental pitch of successive notes increases by about 6% for each note, you don't necessarily need a very high accuracy in identifying the frequencies of the harmonics in the audio. Correctly identifying musical notes is not quite the same problem as determining if the notes are accurately in tune with a musical scale, where frequencies may need to be measured to better than 0.1% accuracy.
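The 6% figure is the equal-tempered semitone ratio, 2^(1/12). A sketch of classifying a measured frequency to the nearest note follows; the function name `classify` and the A4 = 440 Hz reference are assumptions for the example.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def classify(freq_hz, a4=440.0):
    """Return (note name, error in cents) for the nearest equal-tempered note."""
    semitones = 12.0 * math.log2(freq_hz / a4)   # distance from A4
    nearest = round(semitones)
    cents_off = 100.0 * (semitones - nearest)
    midi = 69 + nearest                          # MIDI note number; A4 = 69
    name = NOTE_NAMES[midi % 12] + str(midi // 12 - 1)
    return name, cents_off

semitone_ratio = 2.0 ** (1.0 / 12.0)   # about 1.0595, roughly a 6% step

print(classify(261.63))   # close to middle C
print(classify(265.0))    # more than 1% sharp, still nearest to the same note
```

A frequency estimate can be off by almost half a semitone (about 3%) and still land on the right note, whereas checking tuning needs the cents error to be resolved far more finely.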

score 0 · Answer 4 · answered Aug 07 '16 at 22:33

Yes, this is what the FFT is all about: it gives you the frequency spectrum of the data you feed it. The hard part is the implementation details, as you have mentioned.

The answer depends on exactly what you want to do.

If you just want to analyze your own music, there is already software out there to do that. You could look at EQs that show the response (basically the FFT), or get a "musical EQ" that also shows the pitches. You can get audio-to-MIDI VSTs that convert what you play into the correct MIDI notes. If your keyboard is MIDI, just skip the VSTs and record the MIDI directly.

If you want to teach yourself the FFT and how it relates to music, then it is better to get something like Matlab, where you can compute the FFT of any data. It can record and play back audio as well as read WAV files and such. These tools tend to be really easy to use. You can graph the audio and do all kinds of analysis rather quickly if you know the syntax.

If you want to build a device to do such a thing, then it's quite complex. You'll need a uC/DSP/FPGA/etc. to do the calculations. Most popular devices already come with FFT code, so you won't have to code it yourself (which is also complicated).

You'll need to build the circuitry and all that. It is not difficult but depending on your experience/knowledge it could take quite some time and has a steep learning curve. It also depends on the quality of the final product.

Mathematically, an ideal musical note consists of the "fundamental" plus a series of harmonics at multiples of it.

Suppose \$F_0\$ is the fundamental frequency. Then most musical notes will be approximated by \$F(t) = e^{2\pi i F_0 t} + \sum_{k \ge 1} a_k e^{2\pi i F_k t}\$, or schematically F0 + a_1*F1 + a_2*F2 + ....

The a_k's are just the strength of those higher frequencies F_k and F_k is just some multiple of F0. If a_k = 0 for all k, then we have a pure sinusoid. The pitch of this is easy to detect. Just find the maximum of the FFT and that frequency is the fundamental of the tone = the musical note.
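For the pure-sinusoid case, "find the maximum of the FFT" can be sketched as below. This is only an illustration: the small recursive FFT is a stand-in for whatever FFT routine you would actually use, and the sample rate and length are assumed values picked so that the tone falls exactly on a bin.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

SAMPLE_RATE = 4096   # Hz, assumed; with N = 4096 each bin is exactly 1 Hz wide
N = 4096
TONE_HZ = 440.0      # a pure sinusoid, i.e. all a_k = 0

samples = [math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE) for n in range(N)]
spectrum = fft(samples)

# For a real signal only the first half of the spectrum is independent.
half = [abs(c) for c in spectrum[: N // 2]]
peak_bin = max(range(len(half)), key=half.__getitem__)
peak_hz = peak_bin * SAMPLE_RATE / N
print(peak_hz)
```

Once the a_k are non-zero, the largest bin is no longer guaranteed to be the fundamental, which is where the difficulties listed further down come from.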

When you take the FFT, you end up with data that you can just do math on. It's basically calculus.

All that is relatively easy.

Some problems you'll have to deal with. Note that not all of these are "solved".

Latency - If you are going to do any type of real time stuff, this can become a problem.
Multiple notes - It is difficult to determine the group of notes because of all the extra harmonics. If you play A = 440 Hz and A' = 880 Hz, most of the harmonics will overlap. You can easily get the A = 440 Hz, but getting the A' = 880 Hz is more difficult. When you think of chords, fast runs, etc., it can be very difficult to get all the information (notes) precisely. While everything is generally mathematically possible, the data itself has errors and aberrations, and the equations are underdetermined in some cases.
Noise - Noise in the signal can give you spurious results. If a musical noise occurs it can screw up your results. Better algorithms would then be required = time + money + knowledge.
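The overlap mentioned under "Multiple notes" is easy to see with a few lines of arithmetic. This sketch assumes ideal harmonics at exact integer multiples, which real piano strings only approximate, and the helper name `harmonics` is made up for the example.

```python
def harmonics(f0, count):
    """First `count` ideal harmonic frequencies of f0 (integer multiples)."""
    return {f0 * k for k in range(1, count + 1)}

a_low = harmonics(440, 8)    # A at 440 Hz
a_high = harmonics(880, 4)   # A' at 880 Hz, up to the same 3520 Hz limit

shared = a_low & a_high          # frequencies present in both notes
unique_to_high = a_high - a_low  # frequencies only the upper note produces

print(sorted(shared))
print(sorted(unique_to_high))
```

Under this idealization every harmonic of the upper A coincides with a harmonic of the lower A, so the presence of the upper note can only be inferred from the relative strengths of the shared peaks, not from any frequency of its own.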
