I'm still trying to understand this as well.
The STM32 processors seem to have PDM hardware processing. I think the ESP32 also has it. So if you use the right processor, the hardware will solve this for you. There are also I2S microphones with the conversion from PDM to PCM built into the microphone itself. But if we don't have a processor with the needed hardware support, are there software options for slower microcontrollers?
Here's an app note from ST that has lots of good data about PDM microphones and how to use them with their processors:
Interfacing PDM digital microphones using STM32 MCUs and MPUs
They also have a software package that converts raw single-bit PDM data streams into PCM format so it seems they either have very fast processors or very clever software. Or maybe both. I have not found any reference to the limits of real-time processing by this software so it seems the software can keep up even at the high speeds PDM microphones send data (up to around 4 MHz).
My limited understanding of the conversion from the high-speed 1-bit format of PDM to a conventional multi-bit, lower-speed PCM format is this: the 1-bit data is processed by a digital low pass filter that operates at the required output bit resolution (say 16 bits), which produces a high-speed 16-bit data stream. That stream is then "decimated", meaning you simply take one out of every N samples from the filter output and ignore the rest, to end up with (for example) a 16 kHz sample rate at 16 bits per sample. The filtering is required to prevent any high-frequency content from aliasing down to the low frequencies.
Though the above logic can be fairly simple using a recursive low pass filter, I still don't get how these processors can run fast enough to keep up with a real-time 4 MHz sample rate.
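To make the filter-then-decimate description concrete, here is a minimal C sketch of the naive approach. It assumes a one-pole recursive low pass with made-up constants (DECIMATION and FILTER_DIV are hypothetical values picked for illustration), and it is not how ST's library or the Adafruit code does it. A single pole is also a weak anti-aliasing filter, so treat this as a picture of the data flow only.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical values, chosen only for illustration. */
#define DECIMATION 64   /* keep one output sample per 64 PDM bits */
#define FILTER_DIV 32   /* one-pole filter constant: alpha = 1/32 */

/*
 * Convert n_bits PDM bits (stored one bit per byte, 0 or 1) to PCM.
 * Writes one 16-bit sample per DECIMATION input bits into out and
 * returns the number of samples written.
 */
size_t pdm_to_pcm_naive(const uint8_t *bits, size_t n_bits, int16_t *out)
{
    int32_t state = 0;   /* filter state, stays within -32767..32767 */
    size_t n_out = 0;

    for (size_t i = 0; i < n_bits; i++) {
        /* Map the 1-bit input to full-scale -32767 / +32767. */
        int32_t x = bits[i] ? 32767 : -32767;

        /* Recursive low pass: move 1/32 of the way toward the input. */
        state += (x - state) / FILTER_DIV;

        /* Decimate: keep only every 64th filter output. */
        if ((i + 1) % DECIMATION == 0) {
            out[n_out++] = (int16_t)state;
        }
    }
    return n_out;
}
```

Note that the filter update runs once per input bit, even though 63 of every 64 results are thrown away. That per-bit cost is exactly what makes a 1 to 4 MHz bit rate look so expensive.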
OK, I just found this Adafruit code that gives me some real insight into how this can be done. (Thank you, Adafruit.)
Adafruit_ZeroPDM/examples/pdm_analogout_dma/pdm_analogout_dma.ino
She's using a 64-sample windowed sinc low pass digital filter, which requires a sum-product of 64 input samples to compute each output sample. That seems like way too much math to do for each bit of the PDM input. But I see there are obvious tricks here I didn't understand. Because it's a FIR filter and not a recursive filter, each output depends only on the input samples, not on previous outputs, so you don't need to compute the outputs you will be throwing away. The math is only done for each output sample you keep.
The code above is configured to produce 16-bit output values at a rate of 16,000 samples a second. It runs the PDM microphone at 64 times that frequency, which is 1.024 MHz. So she's using a 64-coefficient filter to convert each block of 64 PDM bits into one 16-bit output sample.
The code uses an I2S interface that reads the bits in 16-bit blocks and uses DMA to put them into memory for you (no I/O reads in a loop needed), so she reads four 16-bit words to get 64 bits of PDM data and then converts that to one 16-bit output value with a sum-product of those bits against the constants in her filter. Because the filter size matches her decimation factor, it lines up nicely: for every 64 bits in, it produces one 16-bit sample out.
Normally, a filter like this would be done in floating point to reduce accumulated rounding error, but she's using integers here. I don't know enough about this to understand the possible loss of dynamic range this creates, but since 16 bits is already 96 dB of resolution and the input samples are just 0 or 1, there's no loss of precision from the input samples. And the accumulated error is only over the 64 additions (max) for each output sample, not from sample to sample. So the loss of resolution is worst case 64 * 0.5, or +-32, for each 16-bit sample I guess? An error of +-32 covers the bottom 5 or 6 bits, so that reduces the resolution from 16 bits to about 10 bits worst case? So the dynamic range might be about 10 bits instead of 16 because of the use of 16-bit integers? Which means roughly 60 dB instead of 96 dB? And if the coefficients were rounded to integers in a more complex way, the worst-case error accumulation might be less, so maybe an 80 dB result? (I'm no expert on any of this.)
So since the sum-product is just multiplying the coefficients by 1 or 0, all she had to code was a test if a bit was turned on, and add the matching coefficient to a running sum for the bits that are on.
So the code reads 64 bits from the PDM mic, then for each bit, she adds the corresponding filter value or not. The sum of these values is the 16-bit output sample.
So the only processing required per PDM bit is a one-bit test (sample & 0x1), one 16-bit add through a pointer (runningsum += *sinc_ptr), which only happens for about half the bits, then a pointer increment (sinc_ptr++) and a bit shift (sample >>= 1). In other words, this code repeated 64 times for each block of 64 PDM bits read into memory:
if (sample & 0x1) {
  runningsum += *sinc_ptr;
}
sinc_ptr++;
sample >>= 1;
So if your processor can read PDM data and run this much code for each bit, you can process the PDM data in software in real-time to convert it to 16-bit samples. But what you do with the output data at that rate, is another issue.
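For anyone who wants to see the whole block conversion in one place, here is a hedged sketch that wraps the five lines above in a function. The names pdm_block_to_sample, TAPS and WORDS are mine, not Adafruit's, and the coefficient table is passed in rather than copied from her sketch. It also assumes the bits are consumed least significant bit first within each 16-bit word, which matches the shift direction in the snippet but depends on how the I2S peripheral is configured.

```c
#include <stdint.h>

#define TAPS  64            /* filter length = decimation factor */
#define WORDS (TAPS / 16)   /* four 16-bit DMA words per output sample */

/*
 * Convert one block of 64 PDM bits into one 16-bit sample.
 *   words:  WORDS 16-bit words of raw PDM data (assumed LSB first).
 *   coeffs: TAPS integer filter coefficients (hypothetical table; the
 *           caller must scale them so their total fits in 16 bits).
 */
uint16_t pdm_block_to_sample(const uint16_t words[WORDS],
                             const uint16_t coeffs[TAPS])
{
    uint32_t runningsum = 0;
    const uint16_t *sinc_ptr = coeffs;

    for (int w = 0; w < WORDS; w++) {
        uint16_t sample = words[w];
        for (int b = 0; b < 16; b++) {
            if (sample & 0x1) {
                runningsum += *sinc_ptr;  /* "multiply" by 1: add the tap */
            }
            sinc_ptr++;                   /* next coefficient */
            sample >>= 1;                 /* next PDM bit */
        }
    }
    return (uint16_t)runningsum;
}
```

Calling this once per four DMA words gives one output sample per 64 PDM bits, so at a 1.024 MHz bit clock it has to finish 16,000 times a second.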
I see there's also the possibility of trading memory for CPU by using lookup tables. For example, the partial sum for every possible 8-bit byte at each byte position in the filter could be pre-computed and stored in a lookup table with 256 entries, so the conversion of 64 input bits to a 16-bit output could be done with 8 table lookups summed together. It requires 8 different tables, each with 256 16-bit entries, so it's 4K of lookup tables, but I would expect it to use 1/4 the CPU of her code. So if you are short on CPU but have the memory to spare, that could be an option.
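Here is a rough sketch of that lookup-table idea, under the same assumptions as before: the function and table names (lut_build, lut_block_to_sample, lut) are hypothetical, the coefficient table is supplied by the caller, bits are taken least significant bit first within each byte, and the coefficients are scaled so that their total fits in 16 bits. I have not benchmarked it, so the CPU saving is still a guess.

```c
#include <stdint.h>

#define TAPS   64
#define NBYTES (TAPS / 8)   /* 8 bytes of PDM data per output sample */

/* 8 tables x 256 entries x 2 bytes = 4096 bytes of lookup tables. */
static uint16_t lut[NBYTES][256];

/*
 * Pre-compute, for every byte position and every possible byte value,
 * the sum of the coefficients whose bits are set in that byte.
 * Run this once at startup.
 */
void lut_build(const uint16_t coeffs[TAPS])
{
    for (int pos = 0; pos < NBYTES; pos++) {
        for (int value = 0; value < 256; value++) {
            uint16_t sum = 0;
            for (int bit = 0; bit < 8; bit++) {
                if (value & (1 << bit)) {
                    sum += coeffs[pos * 8 + bit];
                }
            }
            lut[pos][value] = sum;
        }
    }
}

/* Convert 64 PDM bits (8 bytes) to one sample with 8 lookups and adds. */
uint16_t lut_block_to_sample(const uint8_t bytes[NBYTES])
{
    uint16_t sum = 0;
    for (int pos = 0; pos < NBYTES; pos++) {
        sum += lut[pos][bytes[pos]];
    }
    return sum;
}
```

The per-sample work drops from 64 test/add/shift steps to 8 indexed loads and adds, at the cost of the 4K of tables and the one-time setup in lut_build.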
In the comments above, the idea of averaging the bits was mentioned. This "correct" low pass filter code is not any harder than a running average but should get accurate 16-bit results. It will still use up a lot of CPU, even on a fast processor, to do the conversion in real time at a 1 MHz bit rate. I would say that if you want to do anything with the data other than saving it to memory as a limited sample, you need to use hardware to do this conversion for you (or dedicate an entire small microcontroller to the job).