
I'm generating a complex series of pulses on an STM32F103, essentially as described in section 5.3 of ST's app note AN4776, the general-purpose timer cookbook. As a quick summary, that means I'm using the timer's DMA burst mode to transfer new values to the ARR, RCR and CCR1 registers after every update event of the timer. A more detailed description is provided at the end of the question, in case you're not familiar with the app note.

My problem is related to the DMA bandwidth: the pulses I generate can in principle be arbitrarily closely spaced, down to just 1 clock cycle apart (the timer prescaler is 1). Of course, in that case the DMA cannot possibly transfer the data for the next pulse in time, and there will be a glitch in the output. Since my application can tolerate small, infrequent timing errors, I preprocess my pulses so that the smallest pulse interval is some fixed minimum (96 clock ticks at the moment) to give the DMA a chance. However, I'm still getting glitches that I think are due to the DMA, and increasing the minimum even to very large values doesn't seem to help.

The key part of the previous sentence is "...I think...". So I'd like to find a way to know for sure whether the DMA has missed its transfer or not, preferably something that I can leave running in the code permanently, so that I'll catch even very infrequent glitches.

What I've thought of/tried so far is:

  1. Looking at the DMA error registers. However, there doesn't seem to be a "transfer missed" flag or anything similar, which makes sense: most of the time it's not an error if a transfer stays pending for a while, and even in my case a pending transfer isn't necessarily an error. So I don't think this is going to help me.
  2. I'm running this on two different timers, TIM8 and TIM1, outputting pulses on two different pins. The pulse sequences I prepare are always 0.5 ms long in total (so that I can react to external events within 1.5 ms), i.e. an idle pulse is inserted if necessary so that one of the timer updates falls exactly on the 0.5 ms boundary. TIM8 is the master, and I actually time my main loop by waiting for its DMA queue to reach the 0.5 ms boundary, i.e. by polling for a specific value of the DMA CNDTR register (I've verified with an oscilloscope, by toggling a pin in the main loop, that this works and is accurate). Once I've reached the right time according to TIM8, I know exactly where the TIM1 queue should be, so I can check its CNDTR. The reason I think there are glitches is that this check sometimes fails (once every few minutes, so often on human time scales, but very rarely in terms of the number of pulses). A sketch of this check is shown right after this list.
  3. What I'm working on now: I set up another timer as a reference and use its capture channel to latch the reference timer's count whenever TIM8's update signal is asserted. The captured value should always correspond exactly to one of the pulse instants, and in particular I can check the latest captured value in the main loop at the 0.5 ms boundaries. Using yet another timer, I can do the same check on TIM1. The downside is that this still doesn't quite guarantee that the transfers have completed, since the registers in the main timer are updated in the order ARR, RCR, CCR1. So if ARR was updated but CCR1 was missed, the pulse instants themselves would still be correct, even though the output pulse lengths wouldn't be.
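
To make check 2 above concrete, here is a minimal sketch of the boundary check, written against bare CMSIS registers. The DMA channel mapping (TIM8_UP served by DMA2_Channel1 and TIM1_UP by DMA1_Channel5; check RM0008's request mapping tables) and the expected CNDTR values are assumptions about my setup, not something taken from the app note:

#include "stm32f1xx.h" // CMSIS device header, built with the matching STM32F103x* define

// Incremented whenever TIM1's DMA queue is not where it should be at the boundary.
volatile uint32_t dmaMismatchCount = 0;

// Spin until TIM8's burst DMA reaches the 0.5 ms boundary position, then check
// that TIM1's burst DMA is at its expected position as well.
void waitForBoundaryAndCheck(uint16_t tim8BoundaryCndtr, uint16_t tim1ExpectedCndtr)
{
    while (DMA2_Channel1->CNDTR != tim8BoundaryCndtr) {
        // busy-wait; the main loop is timed off this
    }
    if (DMA1_Channel5->CNDTR != tim1ExpectedCndtr) {
        ++dmaMismatchCount; // candidate DMA miss (or a bug in the buffer generation)
    }
}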

Of course, the glitches I'm seeing could be due to a bug in the code that's generating the buffers the DMA is sending to the timers. But that's exactly why I'd like to know for sure whether or not the DMA is missing some transfers, so I'd know if I'm on the hunt for a bug in my code or not. The code itself does pass a reasonably comprehensive set of unit tests, so the bug would be a subtle one if it's there.

So, any ideas on checking if my problem is due to DMA misses or not?

Details of what exactly I'm doing:

I'm generating a series of pulses, each determined by the length of its "on" phase (pulsewidth) and the time (in clock ticks) until the next pulse. In terms of data structures:

#include <cstdint>
#include <type_traits>

struct pulse {
    /*
     * These fields are an image of the timer registers.
     * We don't use repeats, but it has to be there
     * since the DMA transfers it anyway.
     *
     * TODO: support other capture/compare channels than 1
     */
    uint32_t length;     // ARR
    uint32_t repeats;    // RCR, we don't use this
    uint32_t pulsewidth; // CCR1
};

// Check that pulse is of the correct type for the DMA stream
static_assert(std::is_pod<pulse>::value, "pulse must be POD");
static_assert(sizeof(pulse) == 12, "pulse must not be padded");

pulse pulseArray[MAX_PULSES];

Then, the burst mode of TIM8 (and correspondingly, TIM1) allows us to set up the following scheme:

  1. The timer's DMA channel is programmed to move data from pulseArray to the TIM8_DMAR "DMA address for full transfer" register, in circular mode (so of course I have to keep refilling the buffer with new data in a similarly circular fashion)
  2. The TIM8_DCR "DMA control register" is programmed with burst length 3 and base address TIM8_ARR (see reference manual RM0008, pages 360-361, for a detailed description of these registers), and the DMA request on update is enabled via TIM8_DIER bit 8
  3. Now, on each update of the timer, thanks to the burst mode programmed above, the timer raises the DMA request 3 times, transferring, in this order, the fields length, repeats and pulsewidth; the burst mode directs these to the TIM8_ARR, TIM8_RCR and TIM8_CCR1 registers. Since preload is enabled both on the timer itself (TIM8_CR1 bit 7) and on the compare channel (TIM8_CCMR1 bit 3), the data first land in the preload registers and become active at the next update (i.e. when the current pulse completes)

Figures 30 and 33 in the app note are very helpful for understanding the above.
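
For concreteness, here is a minimal register-level sketch of the three steps above, using bare CMSIS (bit names as in the current STM32CubeF1 headers), with pulseArray and MAX_PULSES from the snippet above. The TIM8_UP-to-DMA2_Channel1 mapping, the 32-bit transfer size (matching the struct) and the hard-coded DCR values are my assumptions; verify them against RM0008 for your exact part. Output-compare mode, BDTR MOE, starting the counter etc. are omitted.

#include "stm32f1xx.h" // CMSIS device header

void setupTim8BurstDma()
{
    RCC->APB2ENR |= RCC_APB2ENR_TIM8EN;
    RCC->AHBENR  |= RCC_AHBENR_DMA2EN;

    // 1. DMA channel: memory-to-peripheral, circular, 32-bit on both sides,
    //    memory increment, destination fixed at TIM8_DMAR.
    DMA2_Channel1->CPAR  = reinterpret_cast<uint32_t>(&TIM8->DMAR);
    DMA2_Channel1->CMAR  = reinterpret_cast<uint32_t>(pulseArray);
    DMA2_Channel1->CNDTR = MAX_PULSES * 3;                   // 3 words per pulse
    DMA2_Channel1->CCR   = DMA_CCR_DIR | DMA_CCR_CIRC | DMA_CCR_MINC
                         | DMA_CCR_MSIZE_1 | DMA_CCR_PSIZE_1 // 32-bit / 32-bit
                         | DMA_CCR_PL_1                      // high priority
                         | DMA_CCR_EN;

    // 2. Timer burst: base address = ARR (offset 0x2C, so DBA = 0x2C/4 = 11),
    //    burst length 3 (DBL = 3 - 1 = 2), DMA request on update.
    TIM8->DCR   = (11u << 0) | (2u << 8);                    // DBA | DBL
    TIM8->DIER |= TIM_DIER_UDE;                              // bit 8: update DMA request

    // 3. Preload, so new values take effect only at the next update.
    TIM8->CR1   |= TIM_CR1_ARPE;                             // bit 7: ARR preload
    TIM8->CCMR1 |= TIM_CCMR1_OC1PE;                          // bit 3: CCR1 preload
}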

And now I can state the problem in a bit more detail: assume length is, for example, 1. Then the number of clock cycles available for the DMA to transfer the next pulse (which is actually 2 positions ahead in the buffer due to the preload registers, but that's not important here) is one (assuming the timer prescaler is 1, which it is in this case). Obviously it's impossible for the DMA to transfer 3 16-bit words in one clock cycle, so the values from the previous cycle get repeated. Let's call that a "DMA miss", for want of a better term.

On the other hand, there must be some minimum length such that during any pulse longer than that, the DMA will always have time to transfer all the data. Unfortunately this minimum depends on the exact bus timings, the other DMA traffic and its priorities, so it's really difficult to determine with pen and paper.

So I'd like to find a way to detect, with as much certainty as I can, whether or not a "DMA miss" has happened, so that I can fine-tune my minimum length and at the same time be sure that the other glitches I'm seeing are not due to a "DMA miss".

Timo
  • How are you even using DMA in burst mode on a STM32F1? From the timer cookbook: In STM32 microcontroller families, there are two DMA peripheral variants: • The DMA burst transfer feature supported only by the STM32F2 products DMA peripheral variant where the DMA peripheral can transfer a configurable number of data elements next to a single data transfer trigger. • In another variant, like the one on the STM32F1 products, the DMA peripheral variant supports only single transfers; this means that only one data element is transferred next to a data transfer trigger. – jms May 22 '18 at 08:52
  • @jms the *timer* has a burst mode, see RM0008 page 360, or the equivalent page for the general-purpose timers. The cookbook is indeed a bit confusing on this point; I remember thinking at some point, too, that this can't be done on the F1 series. – Timo May 22 '18 at 08:54
  • Show the waveforms, what exactly you want to achieve, and what the glitches are. Your description is impossible to understand. Don't confuse the method with your goal. – 0___________ May 24 '18 at 15:20
  • @PeterJ_01 I added a more detailed description in the end, hope that helps? – Timo May 25 '18 at 10:12
  • @timo IMHO, the answer to this is going to turn on the details of your actual implementation. What config structures look like, whether or not you are using ST's pre-baked libraries, and what other things are running. Is there any way you can post the entire code base? – pgvoorhees May 25 '18 at 12:56
  • The DMA parallel transfers must be completed before the serial buffer empties, like a PISO FIFO, except I guess it cycles through randomly interleaved raster-scanning memory writes and reads. Seems like a complicated random data test set? – Tony Stewart EE75 May 26 '18 at 08:02

2 Answers


You use DMA in circular mode; how do you determine when to update the registers? Since you use only one buffer, there is no trivial timing that won't cause glitches. You can only get closer to glitch-free operation by increasing the precision of your timing, but in probabilistic terms you will never be exactly glitch-free.

The STM32F4 devices have an easy-to-use double-buffering feature. On the STM32F1 you have to build it manually, but the pieces are there: the DMA peripheral provides half-transfer and transfer-complete interrupt triggers. Design the circular buffer as two halves, and with every interrupt swap which half is assumed to be read by the DMA and which is to be updated by the CPU. A rough sketch of this follows.
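
A sketch of that idea, assuming the same TIM8_UP-to-DMA2_Channel1 mapping as in the question, with pulse, pulseArray and MAX_PULSES from the question and a hypothetical, application-specific fillHalf() refill function:

#include "stm32f1xx.h" // CMSIS device header

// pulse, pulseArray and MAX_PULSES as declared in the question.
void fillHalf(pulse* dst, unsigned count); // application-specific refill (hypothetical)

// Enable the half-transfer and transfer-complete interrupts on the channel.
void enableBufferRefillInterrupts()
{
    DMA2_Channel1->CCR |= DMA_CCR_HTIE | DMA_CCR_TCIE;
    NVIC_EnableIRQ(DMA2_Channel1_IRQn);
}

extern "C" void DMA2_Channel1_IRQHandler()
{
    if (DMA2->ISR & DMA_ISR_HTIF1) {      // DMA has consumed the first half
        DMA2->IFCR = DMA_IFCR_CHTIF1;
        fillHalf(&pulseArray[0], MAX_PULSES / 2);              // refill it
    }
    if (DMA2->ISR & DMA_ISR_TCIF1) {      // DMA has consumed the second half
        DMA2->IFCR = DMA_IFCR_CTCIF1;
        fillHalf(&pulseArray[MAX_PULSES / 2], MAX_PULSES / 2); // refill it
    }
}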

Ayhan

When you have a complex set of overlapping timing issues, your best bet is to set up a hypervisor, a 'boss of bosses'. You are creating a 4-phase master semaphore that repeats the count 0-3 endlessly.

I call the phases A, B, C and D. They act as enablers for specific threads to run. Linux and Android have this built in. I have used it in LabVIEW and MPLAB. Any complex project has a 4-phase clock to time events, a de facto RTC.

At worst, some piece of code has to wait its turn, but it has no conflicts when it does run.

Phases:

A. Read the log of the previous run, then set up conditional checks and preset/reset crucial values. Open a window for ISRs to run during phase A only.

B. Run crucial deterministic code next. Read any results from the ISRs. This determines some of what runs next, depending on clock cycles consumed and priority. Error handlers run first.

C. Run main code based on results of phase A and B. This includes resuming or starting DMA burst modes, now that you have time slots based on phase C run time. Write code so it checks DMA flags before phase C times out.

D. Halt DMA transfers. Check for errors. Check 'shadow' timers. Check for overruns, or whether there is time for a short DMA burst to complete a packet. Do CRC and handle any error flags, now that you have a time slot for them. This is where you would find a mismatch between the DMA and a shadow counter, etc.

Leave a log or numeric code recording the pass/fail status and whether phase B should run its normal path or an error handler.
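
A bare skeleton of such a loop might look like this (names and phase bodies are purely illustrative, not taken from any particular framework):

enum class Phase { A, B, C, D };

void supervisorLoop()
{
    Phase phase = Phase::A;
    for (;;) {
        switch (phase) {
        case Phase::A: // read log of previous run, preset values, allow ISRs
            break;
        case Phase::B: // run crucial deterministic code, consume ISR results
            break;
        case Phase::C: // main work: (re)start DMA bursts, poll DMA flags
            break;
        case Phase::D: // halt DMA, compare against shadow timers, log pass/fail
            break;
        }
        phase = static_cast<Phase>((static_cast<int>(phase) + 1) % 4); // next phase
    }
}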

  • Thanks for the answer! This is indeed very much what I'm doing right now (point 3 in the first list in my question). However, the problem is that, as far as I understand, there don't seem to be any flags or the like that would tell me whether the DMA has failed or not, which is what I'd check in your point D. Also, the process is realtime and any misses are rare anyway, so stopping to check is a bit difficult. – Timo May 29 '18 at 11:51