There are various ways to make a DAC. I'll do a historical review…
Conceptually the simplest is to use a set of binary-weighted elements like resistors, current sources or capacitors, each corresponding to one bit. These elements are connected to a summing node, and each injects a current or charge proportional to \$ 2^n \$ for bit \$ n \$. However, this requires high precision and matching: the error on each element must stay well below 1 LSB, which becomes impractical for the highest-weight elements as the number of bits increases. In addition, if there is too much error, the DAC may be non-monotonic. For example, consider a 4-bit DAC which outputs the binary word 0111 and then 1000. Since the two words use different elements, if those aren't precise enough, the actual analog value for 1000 could come out lower than for 0111, which would be incorrect. This approach is used in low-bit DACs, but it is impractical for 12 or 16 bits.
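Here's a quick toy example (hypothetical element values, not any real part) showing how a single mismatched MSB element makes the 0111 to 1000 step go backwards:

```python
# Hypothetical 4-bit binary-weighted DAC whose MSB element is 13% low,
# which is just over 1 LSB of error, so code 1000 lands below 0111.
import numpy as np

NBITS = 4
ideal = 2.0 ** np.arange(NBITS)          # element weights 1, 2, 4, 8 (LSB first)
actual = ideal.copy()
actual[-1] *= 0.87                       # MSB element 13% low (made-up mismatch)

def dac_out(code, weights):
    """Sum of the elements switched on by the binary code."""
    bits = (code >> np.arange(NBITS)) & 1
    return float(bits @ weights)

print(dac_out(0b0111, actual))           # 7.00
print(dac_out(0b1000, actual))           # 6.96 -> lower than 0111: non-monotonic

# For a 16-bit DAC the same failure needs only ~0.003% error on the MSB element
# (1 part in 2^15), which is why binary weighting gets impractical at high resolution.
```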
A solution is to use an R-2R network, which is much easier to match since all resistors are identical (2R is just two resistors of value R in series). This does not remove the precision requirement; it simply makes it easier to achieve.
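If you want to convince yourself that a ladder of identical resistors really does produce binary weighting, here's a small idealized nodal-analysis sketch of a generic voltage-mode R-2R ladder (not modelling any particular chip):

```python
# Idealized voltage-mode R-2R ladder solved by nodal analysis: only one resistor
# value R is needed (every "2R" is two Rs in series), yet the output comes out
# as Vref * code / 2^N.
import numpy as np

def r2r_output(code, nbits, vref=1.0, r=1.0):
    g1, g2 = 1.0 / r, 1.0 / (2.0 * r)          # ladder R and 2R leg conductances
    G = np.zeros((nbits, nbits))               # node 0 = LSB end, node nbits-1 = output
    I = np.zeros(nbits)
    for i in range(nbits):
        bit = (code >> i) & 1
        G[i, i] += g2                          # 2R leg to the bit switch (Vref or ground)
        I[i] += bit * vref * g2
        if i == 0:
            G[i, i] += g2                      # 2R termination to ground at the LSB end
        if i < nbits - 1:                      # series R between adjacent ladder nodes
            G[i, i] += g1; G[i + 1, i + 1] += g1
            G[i, i + 1] -= g1; G[i + 1, i] -= g1
    return np.linalg.solve(G, I)[-1]

for code in (1, 37, 128, 255):
    print(code, round(r2r_output(code, 8), 6), code / 256)   # the two columns match
```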
This type of DAC has an issue specific to bipolar signals like audio: the zero output level sits in the middle of the range, right on the largest transition, from 0111... to 1000..., which is the transition with the highest error, exactly where you do not want it. Various tricks like sign-magnitude coding are used to fix this, but it gets complicated.
I believe the last audio DAC to use this method was PCM1704. It was extremely expensive due to the tight tolerance matching required on the elements.
Another method is thermometer coding. This one is very simple: for an N-bit DAC, you use \$ 2^N \$ identical elements, all paralleled on a summing node. To output a value \$ k \$, it simply turns on \$ k \$ elements. This gives good matching, the DAC is guaranteed to be monotonic, and every transition between two adjacent codes only flips one element. All transitions behave the same; there is no special high-error one like in the previous case. The drawback is that it requires \$ 2^N \$ elements, so this is a popular scheme for many small DACs up to 12 bits, usually controlled by I2C. But 24 bits would require 16 million elements, so no-one has done it yet.
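A toy model of thermometer coding, with made-up mismatch numbers, showing why monotonicity comes for free:

```python
# Toy thermometer DAC: 2^N nominally identical unit elements with made-up 5%
# random mismatch. Going from code k to k+1 only ever adds one (positive)
# element, so the transfer curve is monotonic no matter how bad the matching is.
import numpy as np

rng = np.random.default_rng(0)
NBITS = 6
elements = 1.0 + rng.normal(scale=0.05, size=2 ** NBITS)

def thermometer_dac(k):
    """Output for code k: sum of the first k unit elements."""
    return elements[:k].sum()

outputs = np.array([thermometer_dac(k) for k in range(2 ** NBITS + 1)])
print("monotonic:", bool(np.all(np.diff(outputs) > 0)))   # always True
```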
All the above are known as "multibit" DACs, because they have several output elements, as opposed to one-bit DACs.
Now, to make an audio DAC, or any DAC that's supposed to output a smooth continuous signal, you'll need an output filter to remove the spectral images above the audio band. At the original sample rate this filter has to be very steep, which means expensive and impractical in pure analog. It is much cheaper to implement it as a combination of a digital lowpass running at a higher sample rate (i.e. oversampling) followed by a simple, cheap 1st or 2nd order analog filter. This requires a DAC capable of operating at a high sample rate though.
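As a rough illustration of the idea (generic numbers, plain numpy/scipy, not any particular chip's filter): interpolate 8x digitally, and the first image jumps from just above the audio band to near 8 times the sample rate, where a gentle analog filter can handle it.

```python
# 8x digital interpolation: zero-stuff, then lowpass at the high rate.
# The nearest image then sits near 8*48 kHz - 10 kHz = 374 kHz, so a simple
# 1st/2nd order analog filter with a ~100 kHz corner is enough to remove it.
import numpy as np
from scipy import signal

fs, osr = 48_000, 8
x = np.sin(2 * np.pi * 10_000 * np.arange(1024) / fs)     # 10 kHz test tone

upsampled = np.zeros(len(x) * osr)
upsampled[::osr] = x                                      # zero-stuffing creates images at k*fs +/- 10 kHz
lp = signal.firwin(255, cutoff=20_000, fs=fs * osr)       # digital lowpass running at 8*fs
y = signal.lfilter(lp, 1.0, upsampled) * osr              # images removed, gain restored
print(len(y), "samples at", fs * osr, "Hz")
```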
The evolution of this led to multibit DACs (16-18 bits) with 4-8x oversampling. Increasing the sample rate means noise shaping can be applied, which means the number of actual bits and elements in the DAC can be decreased while keeping the same in-band performance. Because analog is expensive and digital is cheap, the next logical step was to push oversampling to the max and cut the bits to the minimum, which led to delta-sigma one-bit DACs. The one-bit DAC is the simplest possible DAC: it consists of a single element that outputs pulses at two levels, 1 or 0, and the average density of 1 pulses sets the analog output value. It's a bit like PWM. This has the extremely important feature of being very cheap, since the DAC is basically a DSP plus a logic buffer. In addition, it does not require highly accurate matched elements, because there is only one element, and two output levels are inherently linear. So it was absolutely perfect, in theory.
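Here's a minimal sketch of a first-order 1-bit modulator (the simple accumulator form) just to show the pulse-density idea; real modulators are higher order, but the principle is the same:

```python
# First-order 1-bit delta-sigma modulator, accumulator (error-feedback) form:
# the 0/1 pulse density tracks the input, and the quantization error is pushed
# to high frequencies where the analog post-filter removes it.
import numpy as np

def delta_sigma_1bit(x):
    """x is a sequence of values in [0, 1]; returns a 0/1 bitstream."""
    acc, bits = 0.0, []
    for sample in x:
        acc += sample                    # integrate the input
        bit = 1 if acc >= 1.0 else 0     # 1-bit quantizer
        acc -= bit                       # feed the quantized output back
        bits.append(bit)
    return np.array(bits)

x = 0.5 + 0.3 * np.sin(2 * np.pi * np.arange(4096) / 256)   # slow sine, heavily oversampled
bits = delta_sigma_1bit(x)
print(bits[:24])
print(round(x.mean(), 4), round(bits.mean(), 4))            # the averages agree closely
```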
In practice, it was initially a complete failure on sound quality, which I think was due to designers switching from multibit to one-bit chips without adapting their designs to the different requirements of the new parts. That didn't prevent one-bits from being a great commercial success. Delta-sigmas really took a while to mature, maybe 20-30 years.
For example, naively running a pulse density bitstream through an analog filter will usually give pretty bad distortion... Consider the bit sequences "110" and "101": they contain the same number of ones and have the same average value. But the buffer that outputs them has finite rise and fall times, so the effective analog weight of each bit depends slightly on which bits come before and after it. The fix was to replace each bit with a pulse, a short pulse for a 0 and a long one for a 1, which keeps the number of transitions constant. But that required a faster output, since 0 becomes 001 and 1 becomes 011. So it evolved, and designers simply added more bits back to the one-bit DAC by making the output use PWM instead. For example, a 3-bit delta-sigma modulator outputting values between 0 and 7 can be encoded into PWM with a 9-slot period, which outputs 100000000 for 0 and 111111110 for 7. That keeps the number of transitions per PWM period constant and looks better on paper. However, now the transition moves around within the PWM period, which means the output is phase modulated, which opens another can of worms.
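A toy version of that constant-transition PWM framing (the exact framing here is my own illustration, not a specific chip's):

```python
# Value v in 0..7 -> a 9-slot frame of (v+1) ones followed by zeros, e.g.
# 0 -> 100000000 and 7 -> 111111110, so the edge count per frame is constant.
def pwm_frame(v, slots=9):
    assert 0 <= v <= slots - 2
    return "1" * (v + 1) + "0" * (slots - 1 - v)

def edges(bits):
    return sum(a != b for a, b in zip(bits, bits[1:]))

prev_bit = "0"                           # every frame ends in 0
for v in (0, 7, 3, 5):
    frame = pwm_frame(v)
    print(v, frame, "edges:", edges(prev_bit + frame))   # always 2: one rising, one falling
    prev_bit = frame[-1]
```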
In addition, sensitivity to clock jitter is proportional to the difference between successive output samples. A multibit DAC outputs something resembling the original signal, with rather small steps between samples, so it has low sensitivity to jitter. A one-bit DAC can only step by full scale (between 0 and 1), so it has the maximum possible sensitivity, and that spawned the jitter craze.
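A rough numeric illustration (idealized, reusing the same accumulator modulator as the sketch above): compare the RMS step size of a 6-bit output tracking the signal against a 1-bit stream.

```python
# Idealized comparison: the error injected by a timing error on each clock edge
# is roughly (step between samples) * (timing error), so the RMS step size is a
# proxy for jitter sensitivity.
import numpy as np

n = 1 << 14
x = 0.5 + 0.45 * np.sin(2 * np.pi * np.arange(n) / 512)   # oversampled sine in [0, 1]

multibit = np.round(x * 63) / 63                          # 6-bit output tracking the signal

acc, bits = 0.0, np.empty(n)                              # 1-bit stream, same accumulator
for i, s in enumerate(x):                                 # modulator as the earlier sketch
    acc += s
    bits[i] = 1.0 if acc >= 1.0 else 0.0
    acc -= bits[i]

print("RMS step, 6-bit:", np.sqrt(np.mean(np.diff(multibit) ** 2)))   # small fraction of full scale
print("RMS step, 1-bit:", np.sqrt(np.mean(np.diff(bits) ** 2)))       # close to full scale
```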
I'll skip the details, but basically this led to multibit delta-sigma DACs which combine the advantages of the two: the output DAC usually has 4-8 bits, which means it can be cheap and effective, thermometer-coded with dynamic element matching, with low sensitivity to jitter, matched transitions, etc.
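For a flavour of dynamic element matching, here's a toy version of one common scheme, data-weighted averaging, with made-up mismatch numbers: the starting element rotates on every sample, so mismatch error is spread over all elements instead of being a fixed function of the code.

```python
# Data-weighted averaging on a 16-element thermometer DAC with made-up 2%
# mismatch: the starting element rotates on every sample, so the same code no
# longer hits the same fixed set of elements (and its fixed error).
import numpy as np

rng = np.random.default_rng(1)
N_ELEM = 16
elements = 1.0 + rng.normal(scale=0.02, size=N_ELEM)
pointer = 0

def dem_output(code):
    """Turn on `code` elements starting from a rotating pointer."""
    global pointer
    idx = (pointer + np.arange(code)) % N_ELEM
    pointer = (pointer + code) % N_ELEM
    return elements[idx].sum()

print([round(dem_output(5), 4) for _ in range(6)])   # varies around 5.0 instead of one fixed error
```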
This is the current state of affairs, and it is excellent: a cheap DAC chip from recent years, properly applied, will deliver pretty damn good results.
What are the differences between cheap and expensive devices?
In the audiophile world, the difference between cheap and expensive devices is cost.
There are two different design philosophies:
Standard: apply the DAC chip of the day according to manufacturer guidelines, get good performance if you don't screw it up. Usually not that expensive.
Esoteric: some retro, impractical and/or expensive technologies are assumed to be the only way to reach nirvana, which leads to very expensive boutique products. Some are engineered like garbage, some are absolutely solid. Some of them are excellent, but the price is no guarantee of this.
Of course, both will open a firehose of marketing bullshit to convince you to buy their products.
This is complicated by the fact that no-one knows how to measure a DAC. You can run the usual tests like harmonic distortion, noise, jitter, etc, but that's not well correlated to how it sounds unless the test results are unreasonably bad. And some distortions sound better. It's a mess.
If you're asking "how to properly apply a modern DAC chip" it's not that complicated:
Vref
Since the output is the product of the sample value and the voltage reference, you need a stable voltage reference. Most DACs draw a roughly constant current on their reference pin, so feeding it a stable voltage is not complicated: a reference chip or LDO, a filter, a buffer, a bunch of caps, some attention to PSRR, a budget of about $2, and the reference ground taken where it should be (the DAC usually has a pin for that).
If some 3V3 from logic chips is used as Vref, you get lots of noise, but it won't show up on a noise measurement, because that's done with zero digital input, and the digital chips make much less noise when processing all zeros.
Most DAC noise measurements are done wrong, with the output set to zero. This makes Vref noise common-mode, and the balanced output structure cancels it. You have to measure noise while playing digital full-scale DC instead, or measure the FFT noise floor while playing a low-level sine with a DC offset, with the offset swept over the whole range and the maximum noise taken as the official spec.
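For what it's worth, here's a sketch of that stimulus in code (arbitrary tone level and sweep; capture and FFT left out):

```python
# Test stimulus: a -60 dBFS sine riding on a DC offset that is stepped across
# the range, so code-dependent Vref noise shows up in the FFT noise floor.
# (Tone level, frequency and sweep points are arbitrary choices.)
import numpy as np

fs, f_tone, level = 48_000, 1_000, 10 ** (-60 / 20)
n = 1 << 16
tone = level * np.sin(2 * np.pi * f_tone * np.arange(n) / fs)

for dc in np.linspace(-0.95, 0.95, 20):
    stimulus = np.clip(dc + tone, -1.0, 1.0)   # digital samples to send to the DAC
    # ...play `stimulus`, capture the analog output, FFT it, note the noise floor,
    # and report the worst case over all offsets as the spec.
```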
Clock
You need a low-jitter oscillator. It doesn't need to be expensive: a $1 canned clock will work fine if it's the right one, and its power supply has to be clean, not the 3V3 from the big FPGA.
If the clock is extracted from an incoming SPDIF or HDMI stream, a bottomless can of worms of epic proportions is opened. If clock recovery is deficient and introduces signal-dependent jitter, audiophiles will comment that the DAC is "transparent" and "lets you hear the difference between digital cables". That really just means it does not reject jitter caused by cable-related intersymbol interference on the SPDIF link, so yes, all cables and sources sound different in this case... and if it's bad enough (or "transparent" enough), it could even make an audible and measurable difference if you put a brick on the CD player.
If the DAC does its job, which is to play the data and ignore the incoming jitter, it will sound the same no matter what the source/cable is. This is surprisingly difficult to achieve.
Output filter
Most modern DACs have balanced outputs and count on them to cancel some forms of noise and distortion, so the output circuitry must be balanced. The opamps have to be fast enough not to choke on the HF noise coming out of the DAC; an analog pre-filter can be used to mitigate this.
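A back-of-the-envelope sketch with made-up component values, just to show the kind of numbers involved for a 1st-order RC pre-filter on each output leg:

```python
# First-order RC pre-filter per leg: flat in the audio band but already
# knocking down the delta-sigma noise in the MHz range before it hits the opamps.
import math

R, C = 470.0, 2.2e-9                       # hypothetical values, one RC per output leg
fc = 1 / (2 * math.pi * R * C)             # corner frequency, about 154 kHz

def attenuation_db(f):
    return 20 * math.log10(1 / math.sqrt(1 + (f / fc) ** 2))

print(f"corner: {fc / 1e3:.0f} kHz")
for f in (20e3, 1e6, 5e6):
    print(f"{f / 1e3:>6.0f} kHz: {attenuation_db(f):6.1f} dB")   # ~0 dB, -16 dB, -30 dB
```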
Layout
If the board doesn't have 4 layers with a continuous ground plane, expect the worst. Layout is of the utmost importance and will make or break the design. A continuous ground plane is the safest option. For example, one break in the ground plane in the wrong place and fast digital signals (I2S, SPDIF, etc.) can leak into the output, into Vref, or into the clock, causing all sorts of problems. If the board is 2-layer without a ground plane, the scope screen will be full of HF noise no matter what you probe, even ground. Likewise, simple is good: one board, one ground plane, no MHz digital signals in ribbon cables, no mezzanines, no modules. That results in fewer problems. Star grounding does not work at HF.
Good decoupling, splitting power supplies into little islands with LC filters, etc. This is one of the cases where you absolutely need several LDOs, because the analog parts are sensitive to noise and the digital parts make a lot of it. Multiple transformers are only worth it if the design uses +/-15V and +3V3 rails and you want to reduce power waste; otherwise, not really.