I'm no expert on the topic. I'm a hobbyist, at most, today. I watch the news, buy and test out development boards from time to time, and broadly speaking enjoy the subject area of DSP. I've also developed professional products on both the TI C30 and C40 lines (years ago) and on Analog Devices' ADSP-21xx integer DSP line. (I prefer the ADSP-21xx.) I still play with DSP algorithms from time to time. But the last time I did anything like this professionally was in 2013. Given the pace of things, this means I don't know very much.
It's also hard to know what you are really asking about. But if you feel there may be a bright-line answer, then I think you may be out of luck. Which hardware features might boost performance has everything to do with the algorithms you are using.
DSPs traditionally have focused on a few core concepts. These include:
- multiply-accumulate (MAC) instructions
- simultaneous reads from two separate memory systems
- several micro-operations combined into a single-cycle instruction
For example, the ADSP-21xx processor can read from two different memory systems (data memory, plus instruction memory treated as if it were data), perform an ALU operation, and write to memory, all within a single cycle. In practice that means reading new data and its associated constants (or other data), performing an ALU operation, and writing out a prior ALU result, every clock cycle.
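To make the MAC idea concrete, here is a plain-C sketch of the inner loop of a FIR filter. Nothing about the C itself is DSP-specific; the point is that on a DSP like the ADSP-21xx each iteration (the two memory reads, the multiply-accumulate, and the pointer updates) issues as roughly one instruction cycle per tap, while a generic MCU may spend several cycles on the same work:

```c
#include <assert.h>

/* FIR dot-product: on a classic DSP, the read of the sample (from
 * data memory), the read of the coefficient (from program memory),
 * the multiply-accumulate, and the address updates all fit in a
 * single cycle per tap, with zero-overhead looping. */
static long fir_mac(const int *samples, const int *coeffs, int taps)
{
    long acc = 0;                              /* MAC accumulator */
    for (int i = 0; i < taps; i++)
        acc += (long)samples[i] * coeffs[i];   /* one MAC per tap */
    return acc;
}
```

The function and names here are mine, for illustration only; a real implementation would also use the circular-buffer addressing these chips provide for the delay line.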
The ADSP-21xx was relatively low-power and didn't support floating point in hardware. Instead, it targeted cheap, low-power applications and aided software floating point with a fully combinatorial barrel shifter that could normalize and denormalize in a single ALU operation.
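As an illustration of what that barrel shifter buys you: normalizing a fixed-point value means finding the leading bit and shifting it up, which the DSP does as one exponent-detect-plus-shift operation. The sketch below (my own, for illustration) shows the loop an ordinary ALU would need instead:

```c
#include <stdint.h>

/* Normalize a nonzero positive 16-bit fixed-point value: shift left
 * until the MSB (bit 15) is set, returning the shift count (the
 * "exponent").  A combinatorial barrel shifter does the exponent
 * detect and the full shift in a single ALU operation; without one,
 * you pay a cycle (or more) per bit of shift. */
static int normalize16(uint16_t *x)
{
    int exp = 0;
    while (*x != 0 && !(*x & 0x8000u)) {   /* until bit 15 is set */
        *x <<= 1;
        exp++;
    }
    return exp;
}
```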
All competitive products have to balance dozens, if not hundreds, of competing issues: power consumption, manufacturing and calibration cost, size, weight, legal risks, response to environmental variations, variations between users, availability, tool complexity and cost, and the outlook for all of the above and more.
There's no single answer to an application area.
More modern processors, like the Micro Magic RISC-V announced in October, deliver 11k CoreMarks at \$200\:\text{mW}\$, which is damn good. It's not a DSP, but it is fast and provides that speed at low power.
But if I had to hang my hat on a single thing that, from experience, makes a DSP nice to have and other processors, despite their overall performance, not so nice, it is rigorous, deterministic timing from input sample to output sample.
In all the work I've done on signal processing, the one thing that has made my products excel where much more highly funded competing projects failed is my focus on keeping the time between sampling the input and driving the output fixed and short.
I shoot for zero-cycle variance. With some DSPs (the Analog Devices ADSP-21xx family, for example), I can achieve this. With others (the TI C30, for example), I cannot, under any circumstances. So even among DSPs, some are better than others in this particular area.
So I look for a system where I can sample the ADC at an absolutely fixed rate. In many of these cases, the DSP or MCU must toggle pins and otherwise operate an external ADC manually (not uncommon). Doing this on a common MCU with zero-cycle variance is very difficult; doing it on a good DSP is not. With the ADSP-21xx, I've been able to operate very fast ADCs with zero-cycle sampling variance, which very few MCUs can be expected to achieve. Given the rigor with which the instructions execute, I can also ensure zero-cycle variance in delivering output changes to the DAC. (There will always be some sub-cycle analog variance beyond my control, though.) And I can ensure that the work in between is performed quite quickly, given the dual memory read plus memory write plus ALU op that the ADSP-21xx allows me in each cycle.
Using most MCUs (CISC or RISC) usually means I don't have such tight control. They may be fast (which is good), but if there is variation in input sampling and output drive, that processing speed doesn't help me. FFT processing assumes regularly spaced samples, so it will immediately smear the results if I cannot deliver consistent input/output timing.
So, perhaps, if I had to pick one thing out of many that has helped me deliver solutions to customers who were failing miserably with the products of large companies (Omega, etc.), it would be that I can very strictly control the data flow in a DSP, where on an MCU I lose some of that control because the instructions have variable timing, the interrupt system isn't predictable in its response, and other details besides.
A lot of people just borrow algorithms from others and stuff them in place, without caring about the timing or what is going on under the hood. Not all compilers translate those routines the same way, and a great deal of variation arises from the use of "library code."
But I write every line of code myself, and I test and validate each and every routine, start to finish. When I specify that the processing takes 1783 cycles, that is exactly what it takes. Not 1782 cycles, not 1784. Exactly 1783 cycles, every single time, no exceptions. So you will know the delay down to a fraction of a cycle.
Please note, though, that not all DSPs can provide rigorous timing. The TI C30 and C40 lines couldn't come close. In one case, working alongside TI experts on an application where timing was vital, I found that the documentation said a routine should take 7 cycles while the measured behavior was 11 cycles. We spent months on it, and neither TI's field engineers nor their internal design staff were ever able to explain the discrepancy. I could no longer trust their devices for my application, so we closed the door on their DSP.
So it's only some DSPs, not all. And I'd expect some RISC processors to be very competitive today.
Perhaps your experiences are due to this effect. I can't tell, though.