There are various ways to make a DAC. I'll do a historical review…
Conceptually the simplest is to use a set of binary-weighted elements like resistors, current sources or capacitors, each corresponding to one bit. These elements are connected to a summing node, and each injects a current or charge proportional to \$ 2^n \$ for bit \$ n \$. However, this requires high precision and matching: the error on each element must stay well below 1 LSB, which becomes impractical for the highest-weight elements as the number of bits increases. In addition, if there is too much error, the DAC may be non-monotonic. For example, consider a 4-bit DAC which outputs the binary word 0111 and then 1000. Since the two words use different elements, if those aren't precise enough, the actual analog value for 1000 could come out lower than for 0111, which would be incorrect. This approach is used in low-bit DACs, but it is impractical for 12 or 16 bits.
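Here's a quick toy example (hypothetical element values, not any real part) showing how a single mismatched MSB element makes the 0111 to 1000 step go backwards:

```python
# Hypothetical 4-bit binary-weighted DAC whose MSB element is 13% low,
# which is just over 1 LSB of error, so code 1000 lands below 0111.
import numpy as np

NBITS = 4
ideal = 2.0 ** np.arange(NBITS)          # element weights 1, 2, 4, 8 (LSB first)
actual = ideal.copy()
actual[-1] *= 0.87                       # MSB element 13% low (made-up mismatch)

def dac_out(code, weights):
    """Sum of the elements switched on by the binary code."""
    bits = (code >> np.arange(NBITS)) & 1
    return float(bits @ weights)

print(dac_out(0b0111, actual))           # 7.00
print(dac_out(0b1000, actual))           # 6.96 -> lower than 0111: non-monotonic

# For a 16-bit DAC the same failure needs only ~0.003% error on the MSB element
# (1 part in 2^15), which is why binary weighting gets impractical at high resolution.
```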
A solution is to use an R-2R network, which is much easier to match since all resistors are identical (2R is just two resistors of value R in series). This does not remove the precision requirement; it simply makes it easier to achieve.
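If you want to convince yourself that a ladder of identical resistors really does produce binary weighting, here's a small idealized nodal-analysis sketch of a generic voltage-mode R-2R ladder (not modelling any particular chip):

```python
# Idealized voltage-mode R-2R ladder solved by nodal analysis: only one resistor
# value R is needed (every "2R" is two Rs in series), yet the output comes out
# as Vref * code / 2^N.
import numpy as np

def r2r_output(code, nbits, vref=1.0, r=1.0):
    g1, g2 = 1.0 / r, 1.0 / (2.0 * r)          # ladder R and 2R leg conductances
    G = np.zeros((nbits, nbits))               # node 0 = LSB end, node nbits-1 = output
    I = np.zeros(nbits)
    for i in range(nbits):
        bit = (code >> i) & 1
        G[i, i] += g2                          # 2R leg to the bit switch (Vref or ground)
        I[i] += bit * vref * g2
        if i == 0:
            G[i, i] += g2                      # 2R termination to ground at the LSB end
        if i < nbits - 1:                      # series R between adjacent ladder nodes
            G[i, i] += g1; G[i + 1, i + 1] += g1
            G[i, i + 1] -= g1; G[i + 1, i] -= g1
    return np.linalg.solve(G, I)[-1]

for code in (1, 37, 128, 255):
    print(code, round(r2r_output(code, 8), 6), code / 256)   # the two columns match
```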
This type of DAC has an issue specific to bipolar signals like audio: the zero output level sits in the middle of the range, right on the largest transition, from 0111... to 1000..., which is the transition with the highest error, exactly where you do not want it. Various tricks like sign-magnitude coding are used to fix this, but it gets complicated.
I believe the last audio DAC to use this method was PCM1704. It was extremely expensive due to the tight tolerance matching required on the elements.
Another method is thermometer coding. This one is very simple: for an N-bit DAC, you use \$ 2^N \$ identical elements, all paralleled on a summing node. To output a value \$ k \$, it simply turns on \$ k \$ elements. This gives good matching, the DAC is guaranteed to be monotonic, and every transition between two adjacent codes only flips one element. All transitions behave the same; there is no special high-error one like in the previous case. The drawback is that it requires \$ 2^N \$ elements, so this is a popular scheme for many small DACs up to 12 bits, usually controlled by I2C. But 24 bits would require 16 million elements, so no-one has done it yet.
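A toy model of thermometer coding, with made-up mismatch numbers, showing why monotonicity comes for free:

```python
# Toy thermometer DAC: 2^N nominally identical unit elements with made-up 5%
# random mismatch. Going from code k to k+1 only ever adds one (positive)
# element, so the transfer curve is monotonic no matter how bad the matching is.
import numpy as np

rng = np.random.default_rng(0)
NBITS = 6
elements = 1.0 + rng.normal(scale=0.05, size=2 ** NBITS)

def thermometer_dac(k):
    """Output for code k: sum of the first k unit elements."""
    return elements[:k].sum()

outputs = np.array([thermometer_dac(k) for k in range(2 ** NBITS + 1)])
print("monotonic:", bool(np.all(np.diff(outputs) > 0)))   # always True
```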
All the above are known as "multibit" DACs, because they have several output elements, as opposed to one-bit DACs.
Now, to make an audio DAC, or any DAC that's supposed to output a smooth continuous signal, you'll need an output filter to remove the spectral images above the audio band. At the original sample rate this filter has to be very steep, which means expensive and impractical in pure analog. It is much cheaper to implement it as a combination of a digital lowpass running at a higher sample rate (i.e. oversampling) followed by a simple, cheap 1st or 2nd order analog filter. This requires a DAC capable of operating at a high sample rate though.
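As a rough illustration of the idea (generic numbers, plain numpy/scipy, not any particular chip's filter): interpolate 8x digitally, and the first image jumps from just above the audio band to near 8 times the sample rate, where a gentle analog filter can handle it.

```python
# 8x digital interpolation: zero-stuff, then lowpass at the high rate.
# The nearest image then sits near 8*48 kHz - 10 kHz = 374 kHz, so a simple
# 1st/2nd order analog filter with a ~100 kHz corner is enough to remove it.
import numpy as np
from scipy import signal

fs, osr = 48_000, 8
x = np.sin(2 * np.pi * 10_000 * np.arange(1024) / fs)     # 10 kHz test tone

upsampled = np.zeros(len(x) * osr)
upsampled[::osr] = x                                      # zero-stuffing creates images at k*fs +/- 10 kHz
lp = signal.firwin(255, cutoff=20_000, fs=fs * osr)       # digital lowpass running at 8*fs
y = signal.lfilter(lp, 1.0, upsampled) * osr              # images removed, gain restored
print(len(y), "samples at", fs * osr, "Hz")
```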
The evolution of this led to multibit DACs (16-18 bits) with 4-8x oversampling. Increasing the sample rate means noise shaping can be applied, which means the number of actual bits and elements in the DAC can be decreased while keeping the same in-band performance. Because analog is expensive and digital is cheap, the next logical step was to push oversampling to the max and cut the bits to the minimum, which led to delta-sigma one-bit DACs. The one-bit DAC is the simplest possible DAC: it consists of a single element that outputs pulses at two levels, 1 or 0, and the average density of 1 pulses sets the analog output value. It's a bit like PWM. This has the extremely important feature of being very cheap, since the DAC is basically a DSP plus a logic buffer. In addition, it does not require highly accurate matched elements, because there is only one element, and two output levels are inherently linear. So it was absolutely perfect, in theory.
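Here's a minimal sketch of a first-order 1-bit modulator (the simple accumulator form) just to show the pulse-density idea; real modulators are higher order, but the principle is the same:

```python
# First-order 1-bit delta-sigma modulator, accumulator (error-feedback) form:
# the 0/1 pulse density tracks the input, and the quantization error is pushed
# to high frequencies where the analog post-filter removes it.
import numpy as np

def delta_sigma_1bit(x):
    """x is a sequence of values in [0, 1]; returns a 0/1 bitstream."""
    acc, bits = 0.0, []
    for sample in x:
        acc += sample                    # integrate the input
        bit = 1 if acc >= 1.0 else 0     # 1-bit quantizer
        acc -= bit                       # feed the quantized output back
        bits.append(bit)
    return np.array(bits)

x = 0.5 + 0.3 * np.sin(2 * np.pi * np.arange(4096) / 256)   # slow sine, heavily oversampled
bits = delta_sigma_1bit(x)
print(bits[:24])
print(round(x.mean(), 4), round(bits.mean(), 4))            # the averages agree closely
```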
In practice, it was initially a complete failure on sound quality, which I think was due to designers switching from multibit to one-bit chips without adapting their designs to the different requirements of the new parts. That didn't prevent one-bits from being a great commercial success. Delta-sigmas really took a while to mature, maybe 20-30 years.
For example, naively running a pulse density bitstream through an analog filter will usually give pretty bad distortion... Consider the bit sequences "110" and "101": they contain the same number of ones and have the same average value. But the buffer that outputs them has finite rise and fall times, so the effective analog weight of each bit depends slightly on which bits come before and after it. The fix was to replace each bit with a pulse, a short pulse for a 0 and a long one for a 1, which keeps the number of transitions constant. But that required a faster output, since 0 becomes 001 and 1 becomes 011. So it evolved, and designers simply added more bits back to the one-bit DAC by making the output use PWM instead. For example, a 3-bit delta-sigma modulator outputting values between 0 and 7 can be encoded into PWM with a 9-slot period, which outputs 100000000 for 0 and 111111110 for 7. That keeps the number of transitions per PWM period constant and looks better on paper. However, now the transition moves around within the PWM period, which means the output is phase modulated, which opens another can of worms.
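A toy version of that constant-transition PWM framing (the exact framing here is my own illustration, not a specific chip's):

```python
# Value v in 0..7 -> a 9-slot frame of (v+1) ones followed by zeros, e.g.
# 0 -> 100000000 and 7 -> 111111110, so the edge count per frame is constant.
def pwm_frame(v, slots=9):
    assert 0 <= v <= slots - 2
    return "1" * (v + 1) + "0" * (slots - 1 - v)

def edges(bits):
    return sum(a != b for a, b in zip(bits, bits[1:]))

prev_bit = "0"                           # every frame ends in 0
for v in (0, 7, 3, 5):
    frame = pwm_frame(v)
    print(v, frame, "edges:", edges(prev_bit + frame))   # always 2: one rising, one falling
    prev_bit = frame[-1]
```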
In addition, sensitivity to clock jitter is proportional to the difference between successive output samples. A multibit DAC outputs something resembling the original signal, with rather small steps between samples, so it has low sensitivity to jitter. A one-bit DAC can only step by full scale (between 0 and 1), so it has the maximum possible sensitivity, and that spawned the jitter craze.
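A rough numeric illustration (idealized, reusing the same accumulator modulator as the sketch above): compare the RMS step size of a 6-bit output tracking the signal against a 1-bit stream.

```python
# Idealized comparison: the error injected by a timing error on each clock edge
# is roughly (step between samples) * (timing error), so the RMS step size is a
# proxy for jitter sensitivity.
import numpy as np

n = 1 << 14
x = 0.5 + 0.45 * np.sin(2 * np.pi * np.arange(n) / 512)   # oversampled sine in [0, 1]

multibit = np.round(x * 63) / 63                          # 6-bit output tracking the signal

acc, bits = 0.0, np.empty(n)                              # 1-bit stream, same accumulator
for i, s in enumerate(x):                                 # modulator as the earlier sketch
    acc += s
    bits[i] = 1.0 if acc >= 1.0 else 0.0
    acc -= bits[i]

print("RMS step, 6-bit:", np.sqrt(np.mean(np.diff(multibit) ** 2)))   # small fraction of full scale
print("RMS step, 1-bit:", np.sqrt(np.mean(np.diff(bits) ** 2)))       # close to full scale
```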
I'll skip the details, but basically this led to multibit delta-sigma DACs which combine the advantages of the two: the output DAC usually has 4-8 bits, which means it can be cheap and effective, thermometer-coded with dynamic element matching, with low sensitivity to jitter, matched transitions, etc.
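For a flavour of dynamic element matching, here's a toy version of one common scheme, data-weighted averaging, with made-up mismatch numbers: the starting element rotates on every sample, so mismatch error is spread over all elements instead of being a fixed function of the code.

```python
# Data-weighted averaging on a 16-element thermometer DAC with made-up 2%
# mismatch: the starting element rotates on every sample, so the same code no
# longer hits the same fixed set of elements (and its fixed error).
import numpy as np

rng = np.random.default_rng(1)
N_ELEM = 16
elements = 1.0 + rng.normal(scale=0.02, size=N_ELEM)
pointer = 0

def dem_output(code):
    """Turn on `code` elements starting from a rotating pointer."""
    global pointer
    idx = (pointer + np.arange(code)) % N_ELEM
    pointer = (pointer + code) % N_ELEM
    return elements[idx].sum()

print([round(dem_output(5), 4) for _ in range(6)])   # varies around 5.0 instead of one fixed error
```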
This is the current state of affairs, and it is excellent: a cheap DAC chip from recent years, properly applied, will deliver pretty damn good results.
What are the differences between cheap and expensive devices?
In the audiophile world, the difference between cheap and expensive devices is cost.
There are two different design philosophies:
Standard: apply the DAC chip of the day according to manufacturer guidelines, get good performance if you don't screw it up. Usually not that expensive.
Esoteric: some retro, impractical and/or expensive technologies are assumed to be the only way to reach nirvana, which leads to very expensive boutique products. Some are engineered like garbage, some are absolutely solid. Some of them are excellent, but the price is no guarantee of this.
Of course, both will open a firehose of marketing bullshit to convince you to buy their products.
This is complicated by the fact that no-one knows how to measure a DAC. You can run the usual tests like harmonic distortion, noise, jitter, etc, but that's not well correlated to how it sounds unless the test results are unreasonably bad. And some distortions sound better. It's a mess.
If you're asking "how to properly apply a modern DAC chip" it's not that complicated:
Vref
Since the output is the product of the sample value and the voltage reference, you need a stable voltage reference. Most DACs draw a roughly constant current on their reference pin, so feeding it a stable voltage is not complicated: a reference chip or LDO, a filter, a buffer, a bunch of caps, some attention to PSRR, a budget of about $2, and the reference ground taken where it should be (the DAC usually has a pin for that).
If some 3V3 from logic chips is used as Vref, you get lots of noise, but it won't show up on a noise measurement, because that's done with zero digital input, and the digital chips make much less noise when processing all zeros.
Most DAC noise measurements are done wrong, with the output set to zero. This makes Vref noise common-mode, and the balanced output structure cancels it. You have to measure noise while playing digital full-scale DC instead, or measure the FFT noise floor while playing a low-level sine with a DC offset, with the offset swept over the whole range and the maximum noise taken as the official spec.
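For what it's worth, here's a sketch of that stimulus in code (arbitrary tone level and sweep; capture and FFT left out):

```python
# Test stimulus: a -60 dBFS sine riding on a DC offset that is stepped across
# the range, so code-dependent Vref noise shows up in the FFT noise floor.
# (Tone level, frequency and sweep points are arbitrary choices.)
import numpy as np

fs, f_tone, level = 48_000, 1_000, 10 ** (-60 / 20)
n = 1 << 16
tone = level * np.sin(2 * np.pi * f_tone * np.arange(n) / fs)

for dc in np.linspace(-0.95, 0.95, 20):
    stimulus = np.clip(dc + tone, -1.0, 1.0)   # digital samples to send to the DAC
    # ...play `stimulus`, capture the analog output, FFT it, note the noise floor,
    # and report the worst case over all offsets as the spec.
```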
Clock
You need a low-jitter oscillator. It doesn't need to be expensive: a $1 canned clock will work fine if it's the right one, and its power supply has to be clean, not the 3V3 from the big FPGA.
If the clock is extracted from an incoming SPDIF or HDMI stream, a bottomless can of worms of epic proportions is opened. If clock recovery is deficient and introduces signal-dependent jitter, audiophiles will comment that the DAC is "transparent" and "lets you hear the difference between digital cables". That really just means it does not reject jitter caused by cable-related intersymbol interference on the SPDIF link, so yes, all cables and sources sound different in this case... and if it's bad enough (or "transparent" enough), it could even make an audible and measurable difference if you put a brick on the CD player.
If the DAC does its job, which is to play the data and ignore the incoming jitter, it will sound the same no matter what the source/cable is. This is surprisingly difficult to achieve.
Output filter
Most modern DACs have balanced outputs and count on them to cancel some forms of noise and distortion, so the output circuitry must be balanced. The opamps have to be fast enough not to choke on the HF noise coming out of the DAC; an analog pre-filter can be used to mitigate this.
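A back-of-the-envelope sketch with made-up component values, just to show the kind of numbers involved for a 1st-order RC pre-filter on each output leg:

```python
# First-order RC pre-filter per leg: flat in the audio band but already
# knocking down the delta-sigma noise in the MHz range before it hits the opamps.
import math

R, C = 470.0, 2.2e-9                       # hypothetical values, one RC per output leg
fc = 1 / (2 * math.pi * R * C)             # corner frequency, about 154 kHz

def attenuation_db(f):
    return 20 * math.log10(1 / math.sqrt(1 + (f / fc) ** 2))

print(f"corner: {fc / 1e3:.0f} kHz")
for f in (20e3, 1e6, 5e6):
    print(f"{f / 1e3:>6.0f} kHz: {attenuation_db(f):6.1f} dB")   # ~0 dB, -16 dB, -30 dB
```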
Layout
If the board doesn't have 4 layers with a continuous ground plane, expect the worst. Layout is of the utmost importance and will make or break the design. A continuous ground plane is the safest option. For example, one break in the ground plane in the wrong place and fast digital signals (I2S, SPDIF, etc.) can leak into the output, into Vref, or into the clock, causing all sorts of problems. If the board is 2-layer without a ground plane, the scope screen will be full of HF noise no matter what you probe, even ground. Likewise, simple is good: one board, one ground plane, no MHz digital signals in ribbon cables, no mezzanines, no modules. That results in fewer problems. Star grounding does not work at HF.
Good decoupling, splitting power supplies into little islands with LC filters, etc. This is one of the cases where you absolutely need several LDOs, because the analog parts are sensitive to noise and the digital parts make a lot of it. Multiple transformers are only worth it if the design uses +/-15V and +3V3 rails and you want to reduce power waste; otherwise, not really.