Why not always use DMA in favor of interrupts with UART on STM32?

Question

Last month I spent a lot of time trying to get UART (for MIDI) working on an STM32 (STM32F103C8T6) using interrupts, without much success.

This evening, however, I got it working fairly quickly using DMA.

As far as I have read, DMA is faster and relieves the CPU, so why not always use DMA instead of interrupts? Especially since there seem to be quite a few problems with interrupts on the STM32.

I'm using STM32CubeMx/HAL.

Not all µCs have DMA. If you *can* use it, great; use it if you *need* the speed. – Harry Svensson, Nov 17 '17 at 01:10
@HarrySvensson I don't know if I really need the speed, but I got DMA working in a few hours, while I tried interrupts for several weeks (in free hobby time). I thought it would be best to first try it directly (that worked), then with interrupts (didn't work well), then with DMA (using interrupts). – Michel Keijzers, Nov 17 '17 at 01:36
Why not? That's either a matter of opinion, a request to guess at one of several possible technical reasons, or simply too broad, and hence not a question that belongs here. To name a random example, DMA will mean more latency in claiming the data, especially since you don't get any real benefit unless you allow it to gather multiple characters. Often that might be fine, sometimes it might not. – Chris Stratton, Nov 17 '17 at 01:40
If getting interrupts working took weeks, it's because you approached the task in the wrong way; getting DMA working could well take longer - it's actually a more complex task, so the apparent ease of the more complex task over the simpler one presumably comes down to the resources you used for guidance with each, not the mechanism itself. — Chris Stratton, Nov 17 '17 at 01:45
@Michel Keijzers: you haven't mentioned the bit rate of your application, or I have missed it. I am quite surprised that interrupts did not work for you. At 72 MHz and 115200 baud you have a whopping 6250 clocks per character. – A.K., Nov 17 '17 at 03:58
Never assume that DMA frees up the CPU. Sometimes it does and the CPU keeps going; sometimes it doesn't and the processor is frozen while the DMA engine holds the bus. Either behavior is trivial to build into an ARM-based design, so you can't just say that all ARMs work one way and all x86s another; it is not that simple. You always have to examine the system design and maybe do a bit of hacking. The chip you have may very well free up the ARM core; this is just a general comment on DMA. As for your question, it doesn't make sense that you couldn't keep up, and DMA plus interrupts is likely the full solution if you can't just poll. – old_timer, Nov 17 '17 at 09:15
Since you are using a HAL and/or a vendor-provided library, the problem might not be in the chip but in the HAL or library. Did you carefully examine every inch of the library to ensure the problem wasn't in its code, or in a combination of your code, theirs and the compiler? – old_timer, Nov 17 '17 at 09:17
Note that MIDI runs at 31250 baud, so the ARM is likely not breaking a sweat dealing with this. – old_timer, Nov 17 '17 at 09:18
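The arithmetic behind these comments is easy to check. A minimal sketch, assuming 8N1 framing (10 bit times per character: start bit, 8 data bits, stop bit) and the F103's maximum 72 MHz system clock; `clocks_per_char` is a hypothetical helper name:

```c
#include <stdint.h>

/* CPU cycles available per received character on a saturated line.
 * bits_per_char is the full frame: 10 for 8N1 (start + 8 data + stop). */
uint32_t clocks_per_char(uint32_t sysclk_hz, uint32_t baud, uint32_t bits_per_char)
{
    /* 64-bit intermediate so sysclk_hz * bits_per_char cannot overflow */
    return (uint32_t)((uint64_t)sysclk_hz * bits_per_char / baud);
}
```

At 115200 baud this gives 6250 cycles per character, and at MIDI's 31250 baud about 23,000, so a per-character receive interrupt has a lot of headroom unless something else (such as a long-running higher-priority interrupt) keeps it from running.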
@old_timer ... not even if I receive 1 byte at a time (i.e., doesn't setting up a new DMA transfer for every byte take too much time)? – Michel Keijzers, Nov 17 '17 at 10:05
Interrupts are pretty trivial on the STM32F serial port. Why don't you post a question with your code so some of us can try to spot where you are going wrong? It's never a good idea to hack code until it works without understanding what the underlying problem was. — Jon, Nov 17 '17 at 12:12
@Jon Fully true ... I will this evening (I'm not home now). Well, considering the many posts about UART with interrupts (non-DMA) on the STM32, it is not really trivial. – Michel Keijzers, Nov 17 '17 at 12:19
In my (not so) humble opinion, this is one of the down sides to using the awful, bloaty Cube. Write the software from scratch, you will learn exactly how the UART works (because you have to), you'll understand the peripheral much better and in the long run it will save you so much time. — DiBosco, Nov 17 '17 at 13:04
@Jon ... maybe I'll wait a bit. First I want to have a working version with DMA; after that I will go back to interrupts to see whether they work or not, and then I will ask the question (I have very limited time, sorry). – Michel Keijzers, Nov 17 '17 at 13:10
@DiBosco True ... however, I will first continue some checks with DMA (since it seems to work), then go back to interrupts, and then probably use the 'low level' way. – Michel Keijzers, Nov 17 '17 at 13:11

Jonas Schäfer · Accepted Answer · 2017-11-17T08:35:40.687

While DMA relieves the CPU and thus may reduce latency of other interrupt-driven applications running on the same core, there are costs associated with it:

- There is only a limited number of DMA channels, and there are limitations on how those channels can interact with the different peripherals. Another peripheral on the same channel may be better suited for DMA use.
  - For example, if you have a bulk I2C transfer every 5 ms, this seems like a better candidate for DMA than an occasional debug command arriving on UART2.
- Setting up and maintaining DMA is a cost in itself. (Normally, setting up DMA is considered more complex than setting up a normal per-character interrupt-driven transfer, due to memory management, more peripherals being involved, DMA using interrupts itself, and the possibility that you need to parse the first few characters outside of DMA anyway; see below.)
- DMA may use additional power, since it is yet another domain of the core which needs to be clocked. On the other hand, you can suspend the CPU while the DMA transfer is in progress, if the core supports that.
- DMA requires memory buffers to work with (unless you are doing peripheral-to-peripheral DMA), so there is some memory cost associated with it.
  - (The memory cost may also be there when using per-character interrupts, but it may be much smaller, or vanish entirely, if the messages are interpreted right away inside the interrupt.)
- DMA introduces latency, because the CPU only gets notified when the transfer is complete or half complete (see the other answers).
- Except when streaming data into/from a ring buffer, you need to know in advance how much data you will be receiving/sending.
  - This may mean that you need to process the first characters of a message using per-character interrupts: for example, when interfacing with an XBee, you'd first read the packet type and size and then trigger a DMA transfer into an allocated buffer.
  - For other protocols, this may not be possible at all if they only use end-of-message delimiters: for example, text-based protocols which use '\n' as a delimiter. (Unless the DMA peripheral supports matching on a character.)
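The two-phase approach in the last point can be sketched as a small header parser that tells you how many bytes to arm the DMA transfer for. The frame layout is an assumption for illustration: an XBee-style API frame in unescaped mode (0x7E start delimiter, 16-bit big-endian length, then the frame data and a one-byte checksum). The names are hypothetical, and arming the DMA itself is left to the caller:

```c
#include <stddef.h>
#include <stdint.h>

/* Phases of a hypothetical two-phase receiver: header bytes arrive one at
 * a time (e.g. per-character interrupts), the body in one DMA transfer. */
typedef enum { WAIT_START, WAIT_LEN_HI, WAIT_LEN_LO, BODY } rx_phase_t;

typedef struct {
    rx_phase_t phase;
    uint16_t len;              /* frame-data length taken from the header */
} frame_rx_t;

/* Feed one header byte. Returns 0 while more header bytes are needed,
 * otherwise the number of bytes (frame data + checksum) to arm the
 * body transfer for. */
size_t frame_rx_header_byte(frame_rx_t *rx, uint8_t b)
{
    switch (rx->phase) {
    case WAIT_START:
        if (b == 0x7E)         /* skip noise until the start delimiter */
            rx->phase = WAIT_LEN_HI;
        return 0;
    case WAIT_LEN_HI:
        rx->len = (uint16_t)(b << 8);
        rx->phase = WAIT_LEN_LO;
        return 0;
    case WAIT_LEN_LO:
        rx->len |= b;
        rx->phase = BODY;
        return (size_t)rx->len + 1u;   /* + 1 checksum byte */
    case BODY:
    default:
        return 0;              /* body belongs to the bulk transfer */
    }
}

/* Call after the body transfer completes to look for the next frame. */
void frame_rx_reset(frame_rx_t *rx)
{
    rx->phase = WAIT_START;
    rx->len = 0;
}
```

Once the body transfer completes, the caller verifies the checksum, calls `frame_rx_reset()` and goes back to per-character reception.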

As you can see, there are a lot of trade-offs to consider here. Some are related to hardware limitations (number of channels, conflicts with other peripherals, matching on characters), some are based on the protocol used (delimiters, known length, memory buffers).

To add some anecdotal evidence, I have faced all of these trade-offs in a hobby project which used many different peripherals with very different protocols. There were some trade-offs to make, mostly based on the question "how much data am I transferring and how often am I going to do that?". This essentially gives you a rough estimate on the impact of simple interrupt-driven transfer on the CPU. I thus gave priority to the aforementioned I2C transfer every 5ms over the UART transfer every few seconds which used the same DMA channel. Another UART transfer happening more often and with more data on the other hand got priority over another I2C transfer which happens more rarely. It’s all trade-offs.
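The question "how much data am I transferring and how often?" can be turned into a back-of-the-envelope number. A sketch; the 100-cycle ISR cost used in the example is a hypothetical figure, not a measurement, so substitute one measured on your own handler. `isr_cpu_load` is a hypothetical name:

```c
#include <stdint.h>

/* Fraction of the CPU spent in a per-character receive ISR, assuming the
 * line is saturated (characters back to back) and a fixed cost per ISR. */
double isr_cpu_load(uint32_t baud, uint32_t bits_per_char,
                    uint32_t isr_cycles, uint32_t sysclk_hz)
{
    double chars_per_sec = (double)baud / bits_per_char;  /* 8N1: 10 bits */
    return chars_per_sec * isr_cycles / sysclk_hz;
}
```

With those assumptions, a MIDI input running flat out (31250 baud, 10 bits per character) costs about 0.43% of a 72 MHz core, which suggests that, for a single MIDI port, CPU load alone is not a strong reason to prefer DMA.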

Of course, using DMA also has advantages, but that’s not what you asked for.

Thanks for your detailed answer. MIDI will be the most critical part so I guess DMA is suitable for it (although speed is low: 31250 baud). I have enough DMA channels, later I'm going to use another STM32 when using 4 USARTs. I don't need to suspend the CPU, since it will have 5V USB power, and I need to do processing between the messages (to process the messages in the main loop). I have a 256 byte read and 256 byte transmit buffer. I can increase it later if needed. The STM32f103c8t6 has 20 KB RAM, the eventual STM I will use has 192 KB. — Michel Keijzers, Nov 17 '17 at 09:57
And you gave me a very good idea for an improvement. So far I always read 1 byte and continuously check whether a complete (MIDI) message has been received. But I can read the first byte, which in most cases determines the size, and then ask for the rest. This costs me another small buffer, but that's OK. – Michel Keijzers, Nov 17 '17 at 09:59
Reading single bytes with DMA is very inefficient. For lower latency and higher efficiency, using per-character interrupts until you know the size and then switching to DMA would be favourable. — Jonas Schäfer, Nov 17 '17 at 11:27
Well, I had lots of problems using interrupts (without DMA), so I think I will use a 1-byte DMA receive; after that I know how many bytes to expect and will issue a DMA request for the rest. – Michel Keijzers, Nov 17 '17 at 11:30
That's probably a mistake - you should fix your simple interrupt code, *without* DMA. — Chris Stratton, Nov 17 '17 at 14:58
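The plan of reading the first byte and then arming a DMA transfer for the rest hinges on mapping a MIDI status byte to the number of data bytes that follow. A sketch of that table from the MIDI 1.0 message set; the function and constant names are hypothetical. Two caveats such a plan has to handle: with running status a sender may omit the status byte, so the "first byte" can be a data byte whose length depends on the previous status; and System Real-Time bytes (0xF8 to 0xFF) may be inserted anywhere, even in the middle of another message, so a fixed-length transfer for "the rest" can end up containing one.

```c
#include <stdint.h>

#define MIDI_LEN_VARIABLE (-1) /* SysEx: runs until 0xF7 */
#define MIDI_NOT_STATUS   (-2) /* data byte: running status applies */

/* Number of data bytes that follow a MIDI status byte. */
int midi_data_bytes(uint8_t status)
{
    if (status < 0x80)
        return MIDI_NOT_STATUS;
    switch (status & 0xF0) {
    case 0x80: /* note off */
    case 0x90: /* note on */
    case 0xA0: /* polyphonic key pressure */
    case 0xB0: /* control change */
    case 0xE0: /* pitch bend */
        return 2;
    case 0xC0: /* program change */
    case 0xD0: /* channel pressure */
        return 1;
    default:   /* 0xF0..0xFF: system messages, handled below */
        break;
    }
    switch (status) {
    case 0xF0: return MIDI_LEN_VARIABLE;  /* System Exclusive */
    case 0xF1: /* MTC quarter frame */
    case 0xF3: /* song select */
        return 1;
    case 0xF2: /* song position pointer */
        return 2;
    default:   /* tune request, EOX, undefined and real-time bytes */
        return 0;
    }
}
```

A return value of 1 or 2 is the length to arm the transfer for, 0 means the message is already complete, and a negative value means falling back to byte-by-byte handling.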

score 11 · Answer 2 · answered Nov 17 '17 at 02:57

Using DMA usually means that you're no longer taking an interrupt on every character, but rather only after a "buffer full" of characters has been received (or transmitted). This increases the latency of processing those characters — the first character is not processed until after the last character in the buffer has been received.

This latency can be a bad thing, especially in a latency-sensitive application such as MIDI, where a few ms here and there can add up to serious playability issues for live performances.
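To put numbers on that latency: a "buffer full" interrupt cannot fire before the last character of the buffer has arrived. A minimal sketch of that lower bound, assuming 8N1 framing and a sender transmitting without gaps (with gaps it is longer, possibly unbounded); `fill_latency_us` is a hypothetical name:

```c
#include <stdint.h>

/* Minimum time, in microseconds, for n_chars back-to-back characters to
 * arrive, i.e. the earliest a DMA "buffer full" interrupt can fire. */
uint32_t fill_latency_us(uint32_t n_chars, uint32_t baud, uint32_t bits_per_char)
{
    return (uint32_t)((uint64_t)n_chars * bits_per_char * 1000000u / baud);
}
```

At 31250 baud each byte takes 320 µs, so a 3-byte note-on needs 960 µs, while a DMA transfer armed for 256 bytes would take at least 81.92 ms to complete, far too long for live playing.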

Dave Tweed

What I do is receive 1 byte at a time (so a 'DMA' buffer of 1 byte) and, after every DMA callback for that byte, store it in a ring buffer which I handle manually. In my main loop I intend to check for complete MIDI messages and process them. – Michel Keijzers Nov 17 '17 at 09:44
DMA is typically used to get multiple bytes, and only interrupt when they've all been received. Interrupting after just one byte is normal when **not** using DMA, so it makes me wonder: what's the point in the extra complication of using DMA for that? – Steve Melnikoff Nov 17 '17 at 10:11
@MichelKeijzers Then what you do is pretty much exactly the same you would do in pure interrupt-driven implementations. Hence, there's no benefit in using DMA in this case and your original problem is probably not solved by the DMA but by your rewriting of your (ISR, setup) code. – JimmyB Nov 17 '17 at 11:07
@JimmyB ... thanks ... however, due to Jonas's answer, I will make an improvement: read as many bytes as the message is long. I know this after receiving the first byte (in most cases). Then using DMA over interrupts will be more beneficial. – Michel Keijzers Nov 17 '17 at 11:23

score 8 · Answer 3 · answered Nov 17 '17 at 01:33

DMA isn't a substitute for interrupts -- they're typically used together! If you're using DMA to send data over a UART, for instance, you still need an interrupt to tell you when the send is complete.

True indeed; maybe on the STM32 the pure (non-DMA) interrupt mechanism is just a bit clumsy compared to using DMA directly. – Michel Keijzers Nov 17 '17 at 01:34
@duskwuff Not really; you can poll to see when the DMA is done, and you might well *want to* because one of the key reasons *for using* DMA is to not have to bother with the serial port until your program is in a state where it can act on the received data. Or for outgoing DMA, you can merely poll to see if it's possible to add more to the send buffer. – Chris Stratton Nov 17 '17 at 01:43
@MichelKeijzers: IDK the specific chip, but usually the alternative to DMA isn't literally interrupts, it's programmed-IO (where you use CPU instructions to read/write data from/to an I/O register). In an interrupt handler, you would typically do one read, and then maybe another in case a character came in while you were reading the first, especially if that won't trigger another interrupt. Or read until an internal buffer was empty if there is such a buffer. Obviously you need more interrupts for PIO, and set them up differently. – Peter Cordes Nov 17 '17 at 08:27
@ChrisStratton Good point ... so far I haven't checked whether it is possible to transmit; I just transmit something without checking whether it succeeded. If it doesn't, I'll probably try again later. – Michel Keijzers Nov 17 '17 at 09:46
@PeterCordes It seems the STM32 has enough interrupts for DMA, and I read just 1 byte every time. Even the simplest STM32 (F103C8T6) has enough DMA channels/interrupts available. – Michel Keijzers Nov 17 '17 at 09:47
Reading one byte with DMA has limited merit; probably reserved for the case where the peripheral has no holding register (unlike those on most modern MCUs) and so the received byte must be extracted immediately to make room for the rest, but the processor for some reason can't service an interrupt. Note that DMA can still distort main program timing, since it forces bus arbitration if the CPU also needs to get at RAM or the same peripheral bus - though not by as much as servicing an interrupt would. – Chris Stratton Nov 17 '17 at 14:53

score 5 · Answer 4 · answered Nov 17 '17 at 01:56

Using DMA introduces some interesting questions and challenges beyond all the other considerations of UART peripheral use. I'll give you a few examples. Assume that your uC is sitting on an RS485 (or whatever) bus with other devices. There are many messages on the bus; some are intended for your uC, some aren't. Additionally, assume that these bus neighbors all speak different data protocols, which implies that the message lengths differ.

Some questions that only come up when using DMA are:

- When do I interrupt?
  - DMAs only really like to interrupt when they've transferred a preset amount of data.
  - What do you do if you never receive enough data to trigger a DMA interrupt?
- What if you only receive a partial message when the DMA interrupts?
- What do your RX buffers look like? Are they linear or circular?
  - DMA can be an unruly circular-buffer participant in the sense that it only obeys the address boundary, but has no problem blowing past the other pointers in the circular buffer system.
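One common way to live with that circular-buffer issue is to derive the DMA's write index from the channel's remaining-count register instead of tracking a pointer for it. On the STM32 DMA that counter (CNDTR on the F1) counts down after each transfer and reloads automatically in circular mode; in the HAL it can, as far as I know, be read with `__HAL_DMA_GET_COUNTER()`. The helpers below are a hypothetical sketch of the index arithmetic only, so it can be tested off-target:

```c
#include <stddef.h>

/* Index the DMA will write next in a circular receive buffer of `size`
 * bytes, given the channel's remaining-count register value. */
size_t dma_write_pos(size_t size, size_t remaining)
{
    return (size - remaining) % size;  /* remaining == size maps to 0 */
}

/* Bytes waiting between the software read index and the DMA write index.
 * If the DMA has lapped the reader (written more than `size` bytes since
 * the last read), the indices cannot show it: the result silently wraps. */
size_t dma_ring_available(size_t size, size_t read_pos, size_t write_pos)
{
    return (write_pos + size - read_pos) % size;
}
```

The lapping problem raised above remains: the indices alone cannot detect it. Servicing the buffer on the half-transfer, transfer-complete and USART idle-line interrupts, and sizing the buffer for the longest stretch the main loop may stall, is the usual mitigation; the idle-line interrupt also covers the case where not enough data ever arrives to trigger a DMA interrupt.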

Anyway, just food for thought.

Thanks for those considerations. Currently I always receive 1 byte and store it in a ring buffer, since my messages (MIDI) can indeed have different lengths and I don't know which one I'll get next. In my main loop I check for complete messages to process them (and if complete, I remove them from the ring buffer). So I will always receive enough data (unless I miss bytes; I have to check that). My RX buffer is only 1 byte, but I copy it to a ring/circular buffer. I don't check yet whether it is full (I need to add that). – Michel Keijzers, Nov 17 '17 at 09:51
Hey, no worries. I'm sure your application will be well programmed. As others mentioned, DMA is great, but it isn't free, that's all. It introduces extra considerations into the system which don't exist if you can get away without using it. – pgvoorhees, Nov 17 '17 at 12:27

score 3 · Answer 5 · answered Nov 17 '17 at 01:46

On the receive side (as I recall), DMA terminates either on a character match or at the terminal count. Some protocols and many interactive applications don't fit easily into this model, and you really need to handle things character by character. DMA techniques can also be brittle if the communications link is unreliable: losing a single character in the stream can easily mess up your DMA state machine.
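For MIDI specifically, a per-character parser sidesteps much of that brittleness: status bytes have bit 7 set and data bytes don't, so the parser can resynchronise on the next status byte after a lost byte. A minimal sketch, handling only channel messages with running status; System Common and SysEx bytes are discarded (and, as the MIDI spec requires, cancel running status), and all names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t status;   /* current running status, 0 if none */
    uint8_t data[2];
    uint8_t count;    /* data bytes collected so far */
} midi_parser_t;

/* Feed one received byte. Returns true when a complete channel message
 * is available in p->status / p->data. A lost or corrupted byte only
 * affects messages up to the next status byte the sender transmits. */
bool midi_parse_byte(midi_parser_t *p, uint8_t b)
{
    if (b >= 0xF8)                   /* real-time: may appear anywhere */
        return false;
    if (b & 0x80) {                  /* status byte: always restarts */
        p->status = (b < 0xF0) ? b : 0;  /* system common clears status */
        p->count = 0;
        return false;
    }
    if (p->status == 0)              /* data byte without valid status */
        return false;
    p->data[p->count++] = b;
    uint8_t needed = ((p->status & 0xE0) == 0xC0) ? 1 : 2;  /* Cx, Dx: 1 */
    if (p->count < needed)
        return false;
    p->count = 0;                    /* keep status for running status */
    return true;
}
```

Fed from a receive ring buffer in the main loop, this needs no advance knowledge of message lengths at all.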

Dean Franks

I am indeed receiving byte by byte and copying it manually to a ring buffer to process later. – Michel Keijzers Nov 17 '17 at 10:00

uɐɪ · Answer 6 · 2017-11-17T09:08:35.547

I have used the STM32CubeMx/HAL on a couple of projects now and have found that the UART handling software that it generates has definite shortcomings on the receive side.

On transmit you will normally want to send a block of data or line of text. In this case you know up front how long the data transfer is and so using the DMA is an obvious solution. You get an interrupt once the transfer is complete and can use the UART TX complete callback function to indicate to your main code that the transmission is complete and you can send another block of data.
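That transmit scheme, combined with the earlier comment that for outgoing DMA you can simply poll whether more data can be queued, can be sketched as a double buffer: the CPU fills one buffer while the DMA sends the other. The struct and function names are hypothetical; only `HAL_UART_Transmit_DMA()` and `HAL_UART_TxCpltCallback()` are real HAL names, and the HAL calls are left to the caller so the logic can be tested off-target:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TX_BUF_SIZE 64u

typedef struct {
    uint8_t buf[2][TX_BUF_SIZE];
    size_t fill;             /* bytes queued in the buffer being filled */
    int active;              /* index of the buffer the CPU is filling */
    volatile bool dma_busy;  /* true while a DMA transfer is in flight */
} tx_pingpong_t;

/* Main loop: queue bytes; returns how many fit (retry the rest later). */
size_t tx_write(tx_pingpong_t *tx, const uint8_t *src, size_t n)
{
    size_t room = TX_BUF_SIZE - tx->fill;
    if (n > room)
        n = room;
    memcpy(&tx->buf[tx->active][tx->fill], src, n);
    tx->fill += n;
    return n;
}

/* Main loop: if the DMA is idle and data is queued, hand back the filled
 * buffer via *out and return its length, for the caller to pass to e.g.
 * HAL_UART_Transmit_DMA(). Returns 0 if nothing should be started. */
size_t tx_kick(tx_pingpong_t *tx, uint8_t **out)
{
    if (tx->dma_busy || tx->fill == 0)
        return 0;
    size_t len = tx->fill;
    *out = tx->buf[tx->active];
    tx->dma_busy = true;
    tx->active ^= 1;         /* CPU continues in the other buffer */
    tx->fill = 0;
    return len;
}

/* Call from the transfer-complete interrupt, e.g. HAL_UART_TxCpltCallback(). */
void tx_on_complete(tx_pingpong_t *tx)
{
    tx->dma_busy = false;
}
```

The interrupt only ever clears `dma_busy` and the main loop only sets it while it is clear, so the two sides never contend for the same buffer.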

When it comes to data reception, the functions provided by ST all assume that you know how many characters the sending device will send before it starts. Normally this is not known. The interrupt functionality places the received data into a buffer and only indicates that data is available when the predefined number of characters has been received. If you try to use the DMA or interrupt functionality to receive data by setting up sequential single-character transfers, the setup time for each of these will mean that you lose characters at anything other than the slowest data rates (the baud rate at which you start to lose data depends on your processor clock speed), and it will load the processor excessively, leaving no instruction cycles for any other processing.

To get round this, I have written my own interrupt handler function that stores the data in a small local circular buffer and sets a count that is read by the main code (an RTOS counting semaphore) to indicate that there is received data ready. The main code can then collect the data from this buffer at its leisure; it does not matter if there is some delay in collecting the data, provided that the local buffer does not overflow before the data is collected.
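A minimal sketch of that kind of buffer, assuming a single-core Cortex-M3 such as the F103 (no data cache, so `volatile` storage is enough here) and replacing the RTOS counting semaphore with a plain head/tail comparison polled from the main loop. In a real handler, for example `USART1_IRQHandler`, the byte would be read from the USART data register after checking the RXNE flag; here it is passed in so the logic can be tested off-target. All names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Must be a power of two so the free-running indices wrap correctly. */
#define RX_RING_SIZE 256u

typedef struct {
    volatile uint8_t data[RX_RING_SIZE];
    volatile uint32_t head;      /* written only by the ISR */
    volatile uint32_t tail;      /* written only by the main loop */
    volatile uint32_t overflows; /* bytes dropped because the ring was full */
} rx_ring_t;

/* Called from the UART receive interrupt with the byte just read.
 * Drops (and counts) the byte if the ring is full. */
void rx_ring_push_isr(rx_ring_t *r, uint8_t b)
{
    if (r->head - r->tail >= RX_RING_SIZE) {
        r->overflows++;
        return;
    }
    r->data[r->head % RX_RING_SIZE] = b;
    r->head++;
}

/* Called from the main loop; returns false if no byte is waiting. */
bool rx_ring_pop(rx_ring_t *r, uint8_t *out)
{
    if (r->head == r->tail)
        return false;
    *out = r->data[r->tail % RX_RING_SIZE];
    r->tail++;
    return true;
}
```

Counting overflows instead of silently overwriting makes it easy to check whether the main loop really keeps up at 31250 baud.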

I do exactly the same (I think). I read 1 byte at a time and store it in a cyclic buffer, and I'm intending to check in the main loop for complete messages. Can be enhanced a bit though. — Michel Keijzers, Nov 17 '17 at 10:01
Do you think I might run into the problem that setting up the DMA each time will overload my processor/missing characters at 31,250 baud? — Michel Keijzers, Nov 17 '17 at 10:03
As long as you set up the DMA to transfer a number of characters at a time, this will not be a problem. I have 4 UARTs running at 115200 baud and higher, plus I2C, using DMA without problems. The UART transmissions are all ~20 bytes or longer. The problem was using DMA for receive on the UART (L4 processor at 80 MHz, 9600 baud). – uɐɪ, Nov 17 '17 at 10:14
Currently I set it to 1 byte at a time, but I can improve that (by reading the first byte, then checking how many more bytes are needed). – Michel Keijzers, Nov 17 '17 at 10:19
