How to interpret this PCM audio data?

Question

I am attempting to write a program to get an ATmega328P to play audio. I first tried to use this library, but I could not get it to work, so I am now attempting to write my own. The problem is that I don't know how to interpret the audio data that I have:

#include <avr/pgmspace.h>  /* PROGMEM and pgm_read_byte() */

const unsigned char samples[] PROGMEM = {
  128, 128, 128, 130, 129, 127, 127, 128, 127, 127, 128, 130, 128, 126, 125, 125, /* ... additional 26 kilobytes of numbers ... */
};

This is the same data used by the library, and it came from an audio file that had a bitrate of 8 kilobits per second. I then used an executable from this tutorial to extract the data from an MP3.

I first tried treating these as frequencies, each to be played for one thousandth of a second (calculated from the bitrate), but it sounded like static and didn't seem to work.

The wiring is pretty simple, just a transistor with the base going to the output pin and the speaker being controlled by this transistor. I know that this part works because I can output a PWM signal on the pin and it sounds as you would expect.

What is this data and how would I write a program to play it?

we can't deduce that from the numbers. It's a sensible assumption that these are just plain 8-bit sample values with a "DC" offset of 128, but that's just a guess. — Marcus Müller, Mar 10 '23 at 22:42
@MarcusMüller The library I linked uses the data; unfortunately, I'm not experienced enough to understand what's going on. — luek baja, Mar 10 '23 at 22:43
the library is just using these values with a function called `pgm_read_byte` which doesn't seem to be part of that library. It's unclear what that function does. — Marcus Müller, Mar 10 '23 at 22:44
However, it literally says "The audio data needs to be unsigned, 8-bit, 8000 Hz", so just use SciPy's `wavfile` module (`scipy.io.wavfile`) to write a single-channel, 1-byte-per-frame, 8 kHz WAV containing that data? — Marcus Müller, Mar 10 '23 at 22:46
@MarcusMüller I used an online converter to get it to comply with those constraints but I would much rather understand what this data is and program my own to save space. — luek baja, Mar 10 '23 at 22:49
As you already know: 8-bit unsigned integer PCM sampled at 8 kHz. That's a pretty complete description. — Marcus Müller, Mar 10 '23 at 22:53
@MarcusMüller I'm sorry but I have no idea what that means. Are these the frequencies or what? — luek baja, Mar 10 '23 at 22:55
No. These are PCM values. Time-domain samples. (You might want to look up PCM) — Marcus Müller, Mar 10 '23 at 23:11
@luekbaja for the first 1/8000 of a second the speaker should be in the middle, then in the middle, then in the middle, then 2/128ths of the way outwards, then 1/128 of the way outwards, then 1/128 of the way inwards, etc.... — user253751, Mar 10 '23 at 23:53
you know how a sound looks like a wave? these are just the Y coordinates of the wave — user253751, Mar 10 '23 at 23:53
@user253751 errr, no that's not how reconstruction of a digital signal works, what you describe is a zero-order hold. What should indeed happen is that the analog signal is defined by the digital value every 1/8000 of a second, and that the reconstruction system interpolates with a sufficiently band-limited impulse response between these instants; ideally, with a sinc-shaped filter. — Marcus Müller, Mar 11 '23 at 09:13
@MarcusMüller you are trying to answer a question of somebody who has absolutely no idea how PCM works; I think the more intuitively apparent explanation is more useful even if it is not quite as accurate. This is how things are taught. First the general idea, then the fine details later. — user253751, Mar 11 '23 at 14:58
@user253751 OK, I think I understand now. I had seen diagrams of PCM, but the part that confused me was that the ATmega328 doesn't have a DAC and the library looked as if it was just using PWM. — luek baja, Mar 11 '23 at 15:23
@luekbaja You can create a low quality DAC with PWM and a filter (resistor+capacitor usually); sometimes just with PWM. By turning the pin on sometimes, off sometimes, then smoothing it out, you get the effect of having it partway on, as the capacitor's voltage gets stuck in the middle, as it doesn't fully charge or discharge. Changing the pulse width changes the amount of charging time vs discharging time so it changes which voltage it gets stuck at. Sometimes you don't even need the capacitor as e.g. the speaker's inductance could probably also work as a smoother. — user253751, Mar 11 '23 at 15:56

score 2 · Accepted Answer · edited Mar 11 '23 at 09:42


The data format is clear. The library starts with a comment that it

Plays 8-bit PCM audio

and later another comment

unsigned, 8-bit, 8000 Hz

This makes sense, as it is the simplest format to play.

The numbers are literally audio sample values: the sampled audio waveform. They are what you would load into a DAC at the sampling rate, 8000 samples per second, and the DAC would then output the analog audio waveform. (This is a bit simplified: a simple DAC just holds the voltage constant between samples, so unless its output is low-pass filtered you get a staircase waveform instead of the intended one, but that is beyond the scope of the question.)

As simple MCUs don't have a real DAC, PWM can be used to play samples, and that's exactly what the library does.

edited Mar 11 '23 at 09:42 by ocrdu

answered Mar 11 '23 at 09:03 by Justme

So to write a program to play it, I'm guessing I would have to measure the number of integers between the peaks of each wave to determine the frequency, and then scale the duty cycle with the integers to change overall volume, but I am still stuck on what I would output when the integer is 128 or below. Would it just be nothing? – luek baja Mar 11 '23 at 15:53
No. Those numbers are the voltages of the waveform. You load the 8-bit bytes into a DAC, or use them directly as the PWM duty cycle. – Justme Mar 11 '23 at 17:36

score 1 · Answer 2 · answered Mar 11 '23 at 06:41


It will be tricky to interpret. It looks like 8-bit unsigned data, but the sample you show is mostly numbers around 128, so the amplitude in that part is small.

With 8-bit audio there are three common coding schemes: linear, μ-law, and A-law.

Get some longer samples and play around with them in SoX or Audacity to find out which interpretation sounds natural.

answered Mar 11 '23 at 06:41 by Jasen Слава Україні

The library says 8-bit PCM, which means linear. It does not play companded/non-linear PCM. That could be handled with a simple lookup table, but the resulting linear PCM samples would be 13-bit (A-law) or 14-bit (μ-law). – Justme Mar 11 '23 at 09:13
