Synchronization mechanism suitable for bare metal applications

Question

I have been developing a bare metal control application on a dsPIC33EP256MC506. The application consists of an infinite loop in the background and three foreground "application" interrupt service routines (ISRs). Besides those application ISRs there is also a, let's say, system ISR which services the SPI end-of-transaction interrupt requests. In each SPI interrupt the new status of the remote digital inputs (the state of the contactors) is read. This information is then used for the calculations (logic expressions determining when to close or open individual contactors) done in the background loop.

My problem is that I am not sure how to ensure that the state of the remote digital inputs stays consistent during one pass through the background loop. Put differently, I am looking for a mechanism to avoid the following situation

Can anybody recommend a simple and robust solution for this kind of problem?

EDIT:

The answers given below inspired the following possible solution. I will define an SPI driver:

The update function will be executed in the background loop and will do the following:

if(new_data_ready){
    new_data_ready = false;
    transferDataToMirror();
}
startTransaction();

The SPI end-of-transaction interrupt will be serviced in the following simple manner:

new_data_ready = true;

From the SPI driver's point of view, client code will access digital_inputs_mirror via the getInputsState function call. The digital_inputs_mirror will be updated synchronously in the background loop via transferDataToMirror(), which will retrieve the data from the SPI peripheral registers.
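
The driver described in the edit could be sketched roughly as follows. This is only a host-runnable illustration, not a tested dsPIC33 driver: spi_rx_register, simulateSpiTransfer, spiDriverInit, spiDriverUpdate and spiTransactionDoneIsr are hypothetical names, and the SPI peripheral is replaced by a plain variable. One detail differs from the snippet above: the next transaction is started only after the previous one has been copied, on the assumption that restarting the peripheral while a transfer is still in flight is undesirable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulated SPI receive register (assumption: stands in for the real
   peripheral register so the sketch can run on a host). */
static volatile uint16_t spi_rx_register;

/* Set by the ISR, cleared by the background loop. */
static volatile bool new_data_ready;

/* Copy of the inputs that the background logic is allowed to read. */
static uint16_t digital_inputs_mirror;

/* Counts started transactions; a real driver would kick the peripheral. */
static unsigned transactions_started;

static void startTransaction(void)
{
    transactions_started++;
}

static void transferDataToMirror(void)
{
    digital_inputs_mirror = spi_rx_register;
}

/* Body of the SPI end-of-transaction ISR. */
void spiTransactionDoneIsr(void)
{
    new_data_ready = true;
}

/* Hypothetical host-side helper: pretend a transfer delivered `value`. */
void simulateSpiTransfer(uint16_t value)
{
    spi_rx_register = value;
    spiTransactionDoneIsr();
}

/* Call once before entering the background loop. */
void spiDriverInit(void)
{
    new_data_ready = false;
    startTransaction();
}

/* Call once per pass, before any logic reads the inputs. */
void spiDriverUpdate(void)
{
    if (new_data_ready) {
        new_data_ready = false;
        transferDataToMirror();
        startTransaction();   /* next transfer starts only after the copy */
    }
}

uint16_t getInputsState(void)
{
    return digital_inputs_mirror;
}
```

Because no new transfer is running while transferDataToMirror() executes, the ISR cannot fire in the middle of the copy, and the mirror changes only at the single point in the loop where spiDriverUpdate() is called.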

set a flag at the end of the SPI transaction to denote there is new data available. The background test can test this and do what is required. — Kartman, Feb 05 '21 at 11:58
If your compiler doesn't come with instruction re-ordering (somewhat unlikely for bare metal), then you can use a simple bool flag. See this: https://electronics.stackexchange.com/a/409570/6102 — Lundin, Feb 05 '21 at 15:26
L3sek, this sounds an awful lot like most of what I've done for decades. I developed code for scientific and commercial instrumentation most of my life. Much of it "bare metal" and almost all of it where I wrote all of the O/S code, state machines, etc., as required. But there's insufficient information for me to get a clear bead on what you are doing, despite having written so much text above. Can you be more specific and list each and every one of your inputs, their purposes and estimated frequencies, processing and est. time needed, and outputs and purposes? Disclose all that you can? — jonk, Feb 05 '21 at 20:13
If you find yourself wanting to do threading, that's generally a sign that your project has outgrown the "bare metal" stage. — Mark, Feb 05 '21 at 21:28
@Mark do you think that the solution which I have attempted to sketch in the "edit" part of my question is usable? — L3sek, Feb 06 '21 at 15:21

score 4 · Accepted Answer · answered Feb 05 '21 at 11:59

Pretty simply: don't directly work on the data the ISR modifies. Instead, in an atomic operation, copy over the potentially volatile variables from the ISR-modified location to your loop state.

Alternatively, if there's more data than you can copy atomically, you'll need to teach your ISR how to write into a ring buffer, and your main loop how to read from one, so that you're never modifying a piece of data that's still being used.
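
A minimal sketch of the ring buffer idea, assuming a single-core MCU with exactly one producer (the ISR) and one consumer (the main loop). The sample_t layout, RING_SIZE and the function names are made up for illustration. The indices are single bytes so that each index update is one store, and each index is written by one side only.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u   /* must be a power of two; usable capacity is RING_SIZE - 1 */

/* Hypothetical payload: more than one word, so it cannot be copied atomically. */
typedef struct {
    uint16_t inputs;
    uint16_t sequence;
} sample_t;

static volatile sample_t ring[RING_SIZE];
static volatile uint8_t head;   /* next slot to write; written only by the ISR */
static volatile uint8_t tail;   /* next slot to read; written only by the main loop */

/* Producer, called from the ISR. Returns false (sample dropped) when full. */
bool ringPutFromIsr(sample_t s)
{
    uint8_t h = head;
    uint8_t next = (uint8_t)((h + 1u) & (RING_SIZE - 1u));

    if (next == tail) {
        return false;           /* the consumer has not caught up */
    }
    ring[h] = s;                /* fill the slot first ... */
    head = next;                /* ... then publish it */
    return true;
}

/* Consumer, called from the main loop. Returns false when empty. */
bool ringGet(sample_t *out)
{
    uint8_t t = tail;

    if (t == head) {
        return false;           /* nothing new */
    }
    *out = ring[t];             /* copy out before releasing the slot */
    tail = (uint8_t)((t + 1u) & (RING_SIZE - 1u));
    return true;
}
```

The main loop would drain the buffer at the start of a pass and then work only on its own copy. The ISR never writes a slot that the main loop has not yet released.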

Remark: What you do sounds like 100% an application of a RTOS. These very slim pieces of operating systems are available for your CPU, too (promise! If you can run compiled C on it, someone has ported a small RTOS to it), and you should be using them, exactly because task synchronization is hard and it's a good idea to not do task juggling yourself. It's not any less "bare metal" (you can still write nearly exactly the same code if you want), you just get primitives for executing tasks, exchanging data etc. I don't know your MCU, but look into ChibiOS (if you want something really small) or FreeRTOS (if you want something small with a large user base).

answered Feb 05 '21 at 11:59

Marcus Müller

Well, he didn't say his target. However from about the Cortex-M3 up freertos is viable if you have the memory. Good luck with a PIC12 though :D – Lorenzo Marcantonio Feb 05 '21 at 13:00
Huh, nothing wrong with using FreeRTOS on a cortex-m0. Yeah, smaller PICs will be challenging :) But, then again, all questions by this asker go through lengths to forget to mention the actual microcontroller family used, but just mention that it runs C, so one can't be sure. – Marcus Müller Feb 05 '21 at 13:26
@LorenzoMarcantonio by the way, you might really find ChibiOS/NIL refreshing if memory footprint is important to you :) – Marcus Müller Feb 05 '21 at 13:29
@LorenzoMarcantonio https://github.com/ChibiOS/ChibiOS/tree/master/demos/AVR/NIL-DIGISPARK-ATTINY-167 (admittedly uses an AVR with 512 B of RAM instead of a PIC12 with 256 B) – Marcus Müller Feb 05 '21 at 13:36
@MarcusMüller thank you very much for your response. I have attempted to summarize how I have understood your idea in my original post. – L3sek Feb 05 '21 at 15:05
This appears to be C so keep in mind that you _can't_ get atomic access unless you use C11 `_Atomic` or inline assembler. Without those, you are left to the whims of the compiler. It may or may not generate atomic access today, and it may generate something else next time you change the code. Also be wary of the very common misconception "my MCU is 32 bit so all 32 bit access is atomic". That's only true on the asm instruction level, not in C code. – Lundin Feb 05 '21 at 15:22
@Lundin it's not quite as bad; alignment specifiers in C do guarantee memory alignment, and CPU specs might guarantee atomic access for aligned words. But generally, you're right, I'm haphazardly mixing C and CPU semantics here. – Marcus Müller Feb 05 '21 at 15:48
@MarcusMüller: Unfortunately, because the Standard said the semantics of `volatile` were implementation-defined, compiler writers who didn't have to meet the needs of paying customers have decided to use semantics that are insufficient to establish a mutex even on single-core systems. – supercat Feb 05 '21 at 21:41
@MarcusMüller The point is that most C variable access often goes as: 1. load stack value into register, 2. do atomic stuff with register. 3. write register to stack. You can get interrupted between any of those instructions and then it does the program little good that instruction 2) is atomic in itself. – Lundin Feb 08 '21 at 09:49
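
The C11 `_Atomic` route mentioned in these comments might look like the sketch below. It assumes the toolchain ships `<stdatomic.h>` and that the types used are lock-free on the target, neither of which the thread establishes for any particular compiler. The names isrPublish, backgroundTake and inputs_from_isr are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static atomic_bool new_data_ready;         /* static storage: starts as false */
static _Atomic uint16_t inputs_from_isr;   /* one word handed over by the ISR */

/* Called from the ISR: store the data first, then raise the flag. */
void isrPublish(uint16_t inputs)
{
    atomic_store(&inputs_from_isr, inputs);
    atomic_store(&new_data_ready, true);
}

/* Called from the background loop. Returns true and fills *out when the
   ISR has published something since the previous successful call. */
bool backgroundTake(uint16_t *out)
{
    /* Test and clear the flag in one indivisible step, so an interrupt
       cannot slip in between the test and the clear. */
    if (!atomic_exchange(&new_data_ready, false)) {
        return false;
    }
    /* If the ISR runs again right here, this load simply returns the
       newer word and the flag stays set for the next pass. */
    *out = atomic_load(&inputs_from_isr);
    return true;
}
```

This covers a single word only. For a multi-word state, the atomic flag would have to be combined with a copy or buffering scheme.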

score 3 · Answer 2 · answered Feb 05 '21 at 12:00

There is a standard solution for this: the ISR always has priority over user code so the conflict can happen only in one direction. I use shadow variables so that the user copy is updated only when it's safe to do so. In pseudo-C:

volatile int io_status_isr;
int io_status;

void interrupt isr()
{
    /* acquire stuff from whatever */
    io_status_isr = stuff;
}

main()
{
    while (1) {
        disable_IRQ();
        io_status = io_status_isr;
        enable_IRQ();

        /* NEVER use io_status_isr here! only io_status */
    }
}

This is the basic pattern; you can avoid disabling the IRQ if your µC has atomic moves (few have), but usually copying a couple of variables doesn't hurt the realtime performance. Remember to use the volatile keyword for variables that are shared with an ISR, otherwise the compiler could optimize the accesses away!
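
A variation on this pattern masks only the interrupt source that writes the shared data instead of all interrupts, so unrelated ISRs keep their latency. The sketch below runs on a host: spi_irq_enabled is a hypothetical stand-in for the peripheral's interrupt-enable bit (on a real part that would be a bit in an interrupt-enable register, whose exact name should be taken from the device header), and io_status_t, simulateSpiInterrupt and snapshotInputs are illustrative names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical multi-word state that must stay consistent within one pass. */
typedef struct {
    uint16_t contactor_inputs;
    uint16_t fault_inputs;
} io_status_t;

/* Stand-in for the SPI interrupt-enable bit (assumption for host simulation). */
static volatile bool spi_irq_enabled = true;

static volatile io_status_t io_status_isr;  /* written only by the ISR */
static io_status_t io_status;               /* snapshot used by the main loop */

/* ISR body: store the freshly received inputs. */
void spiIsr(uint16_t contactors, uint16_t faults)
{
    io_status_isr.contactor_inputs = contactors;
    io_status_isr.fault_inputs = faults;
}

/* Host-side helper: deliver the interrupt only while it is enabled.
   On real hardware a masked request would stay pending instead. */
bool simulateSpiInterrupt(uint16_t contactors, uint16_t faults)
{
    if (!spi_irq_enabled) {
        return false;
    }
    spiIsr(contactors, faults);
    return true;
}

/* Main loop: take a consistent snapshot at the start of each pass. */
void snapshotInputs(void)
{
    spi_irq_enabled = false;   /* mask only the SPI interrupt */
    io_status.contactor_inputs = io_status_isr.contactor_inputs;
    io_status.fault_inputs = io_status_isr.fault_inputs;
    spi_irq_enabled = true;    /* unmask */
}

const io_status_t *getIoStatus(void)
{
    return &io_status;
}
```

The rest of the pass reads only through getIoStatus(), so an interrupt arriving in the middle of the logic evaluation changes io_status_isr but not the values the logic sees.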

answered Feb 05 '21 at 12:00

Lorenzo Marcantonio

I think these days, most MCUs have atomic moves (at least on a word level, and that's 32 bit for a lot of modern ones; Cortex-M specifically has "exclusive load/store" instructions!) I'd say disabling ISRs is a relatively problematic solution (why use an ISR if it's not always allowed to interrupt? This can lead to dangerous conditions in control systems, but might be OK in soft-realtime systems like PCs), and would be a last resort, imho. Especially since, as said, there are atomic moves on most platforms these days, so you'd only need that for larger pieces of data, – Marcus Müller Feb 05 '21 at 12:04
where copying actually takes some time, and thus you're disabling your IRQ for a relatively long duration. A lockless double buffered/ring buffered approach seems wiser here. (especially since double buffering is just "using the shadow state in a different manner", not requiring any additional resources) – Marcus Müller Feb 05 '21 at 12:07
He was asking for a 'simple' solution and no specific target environment. In hard realtime you should measure the actual irqless time to be sure. Also he's using a SPI which is a relatively slow peripheral but we don't know if it has a FIFO or DMA attached. For a word or two in my experience the shadow approach is fine 99% of the times. If he was, for example, generating µs pulses with irq driven timers I agree that disabling irqs wouldn't be a viable choice – Lorenzo Marcantonio Feb 05 '21 at 12:19
:) it's not that I disagree with the solution, it's still a last resort to me: it's easy, yes, but it can very quickly lead to more pitfalls, and having two places where the ISR alternately writes (assuming we can guarantee the main loop to be fast enough to ensure two IRQs won't happen within one iteration) isn't really hard, in my opinion! – Marcus Müller Feb 05 '21 at 12:22
Well, by definition, if the main loop is fast enough to ensure two IRQs won't happen, the problem probably doesn't exist in the first place :D Even without atomic increment or something like that, the easiest way would be double buffering with the buffer index toggled by the main loop: the ISR always writes to A & 1 and the main loop reads from (A+1) & 1. When it's safe, increment A, and *at worst* you get previous data (but still consistent). However, you have to index every access to the structure (i.e. longer amortized time). Can't say which would be better without hardware profiling – Lorenzo Marcantonio Feb 05 '21 at 12:58
I honestly think we're saying the same thing here! the atomic move only makes sense when you need to move a single word; if you need to do more than that, double buffering works fine without (as long as memory ordering is intact, but I wouldn't know a single MCU core that wouldn't guarantee that) – Marcus Müller Feb 05 '21 at 13:39
@MarcusMüller I believe that the LDREX and STREX for the Cortex-M only work for multiprocessor systems with shared memory. I don't think they protect a given processor from its own ISRs. But I would love to be proven wrong... – Elliot Alderson Feb 05 '21 at 14:45
@ElliotAlderson huh, you might be right there! I never actually tested it: https://developer.arm.com/documentation/dht0008/a/arm-synchronization-primitives/exclusive-accesses/ldrex-and-strex – Marcus Müller Feb 05 '21 at 15:03
Notably you should never disable the global interrupt mask for this, but rather the specific hardware peripheral interrupt, in this case an SPI one. – Lundin Feb 05 '21 at 15:25
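
The double-buffering scheme discussed in these comments (ISR writes slot A & 1, main loop reads slot (A+1) & 1, and only the main loop changes A) could be sketched as below for a single-core target. The fresh flag is an addition that is not in the comments: without it, swapping when no interrupt has arrived would hand the reader a slot holding an older sample. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical multi-word state delivered by the SPI interrupt. */
typedef struct {
    uint16_t contactor_inputs;
    uint16_t fault_inputs;
} io_status_t;

static volatile io_status_t buffers[2];
static volatile uint8_t swap_count;   /* "A" in the comments; written only by the main loop */
static volatile bool fresh;           /* set by the ISR when the write slot holds new data */

/* ISR: always writes into slot (swap_count & 1). */
void spiIsr(uint16_t contactors, uint16_t faults)
{
    volatile io_status_t *w = &buffers[swap_count & 1u];

    w->contactor_inputs = contactors;
    w->fault_inputs = faults;
    fresh = true;
}

/* Main loop, once per pass: swap slots only if the ISR delivered something.
   The swap comes before the flag is cleared, so an interrupt that lands in
   between is at worst picked up one sample late and never makes the reader
   step back to older data. */
bool swapIfFresh(void)
{
    if (!fresh) {
        return false;
    }
    swap_count = (uint8_t)(swap_count + 1u);
    fresh = false;
    return true;
}

/* Slot the ISR will not touch until the next swap. */
const volatile io_status_t *readBuffer(void)
{
    return &buffers[(swap_count + 1u) & 1u];
}
```

No interrupt is ever masked here. The cost is the extra indexing on every access plus the possibility of seeing a sample one interrupt late.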
