I'm trying to write atomic code. In the example below, I need to perform a simple operation, `a ^= 1;`:

    static volatile int a = 0;

   //-- a ^= 1;
   __asm__ __volatile__( "xori    %0,     %0,     1"
         : "=r"(a)
         : "r"(a)
         );

The generated code is not atomic:

9D0014E8  8F828018   LW V0, -32744(GP)
9D0014EC  38420001   XORI V0, V0, 1
9D0014F0  AF828018   SW V0, -32744(GP)

According to the docs, the LL and SC instructions provide an atomic read-modify-write sequence. How can I make the compiler generate code with LL and SC instead of LW and SW? I tried writing such code myself:

  static volatile int a = 0;

  __asm__ __volatile__( "ll      $t1,     0(%0)": : "r"(a) );
  __asm__ __volatile__( "xori    $t1,     $t1,     1" );
  __asm__ __volatile__( "sc      $t1,     0(%0)": : "r"(a) );

But this is wrong; the result is not what I need:

140:                       __asm__ __volatile__( "ll      $t1,     0(%0)": : "r"(a) );
9D001454  8F828018   LW V0, -32744(GP)    # WRONG! | I need LL T1, -32744(GP) instead of
9D001458  C0490000   LL T1, 0(V0)         # WRONG! | these two instructions (LW, LL)
141:                       __asm__ __volatile__( "xori    $t1,     $t1,     1" );
9D00145C  39290001   XORI T1, T1, 1
142:                       __asm__ __volatile__( "sc      $t1,     0(%0)": : "r"(a) );
9D001460  8F828018   LW V0, -32744(GP)    # WRONG! | I need SC T1, -32744(GP) instead of
9D001464  E0490000   SC T1, 0(V0)         # WRONG! | these two instructions (LW, SC)

How can I do that?

Dmitry Frank

1 Answer

Well, this is one of those happy moments when I just need to ask someone, and the solution comes to mind immediately:

   __asm__ __volatile__( "ll      $t1,     0(%0)": : "r"(&a) );
   __asm__ __volatile__( "xori    $t1,     $t1,     1" );
   __asm__ __volatile__( "sc      $t1,     0(%0)": : "r"(&a) );

That is, I need to pass `&a` instead of `a`, so that the register holds the address of the variable rather than its value. Now the generated code is:

104:                    __asm__ __volatile__( "ll      $t1,     0(%0)": : "r"(&a) );
9D001434  27828018   ADDIU V0, GP, -32744
9D001438  C0490000   LL T1, 0(V0)
105:                    __asm__ __volatile__( "xori    $t1,     $t1,     1" );
9D00143C  39290001   XORI T1, T1, 1
106:                    __asm__ __volatile__( "sc      $t1,     0(%0)": : "r"(&a) );
9D001440  E0490000   SC T1, 0(V0)

This seems to be what I need. Note: to do this properly, the sequence should use a `beqz` instruction to loop back and retry if SC fails (there's an example in the MIPS32 Instruction Quick Reference). But that is another story.
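The retry loop mentioned above could look roughly like the sketch below. It is not taken from the quick reference: the function name `atomic_toggle_bit0` is hypothetical, the MIPS branch assumes a MIPS32 core with LL/SC in the standard (non-MIPS16) encoding, and it has not been verified on PIC32 hardware. Keeping the whole sequence in one `asm` statement with declared outputs lets the compiler pick the scratch registers instead of silently clobbering `$t1`. The `#else` branch is only a portable stand-in built on GCC's `__atomic_compare_exchange_n`, so the same function can be compiled and tried on a host machine.

```c
/* Atomically toggles bit 0 of *p and returns the value *p held before. */
static inline int atomic_toggle_bit0(volatile int *p)
{
#if defined(__mips__)
    int old, tmp;
    __asm__ __volatile__(
        ".set push          \n\t"
        ".set noreorder     \n\t"
        "1:                 \n\t"
        "ll    %0, 0(%2)    \n\t"  /* old = *p, begin linked sequence     */
        "xori  %1, %0, 1    \n\t"  /* tmp = old ^ 1                       */
        "sc    %1, 0(%2)    \n\t"  /* try to store; tmp = 1 ok, 0 failed  */
        "beqz  %1, 1b       \n\t"  /* SC failed: start over from the LL   */
        "nop                \n\t"  /* branch delay slot                   */
        ".set pop           \n\t"
        : "=&r"(old), "=&r"(tmp)   /* early-clobber: written before %2 is last read */
        : "r"(p)
        : "memory");
    return old;
#else
    /* Host fallback (assumption: GCC or Clang with __atomic builtins).
       On failure the builtin reloads the current value into old. */
    int old = __atomic_load_n(p, __ATOMIC_RELAXED);
    while (!__atomic_compare_exchange_n(p, &old, old ^ 1, 1,
                                        __ATOMIC_SEQ_CST, __ATOMIC_RELAXED)) {
        /* retry with the refreshed value of old */
    }
    return old;
#endif
}
```

With `static volatile int a = 0;` as in the question, the call would be `atomic_toggle_bit0(&a);`.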

Also, on the Microchip forum, user andersm suggested using GCC's atomic builtins instead of reinventing the wheel. (However, these builtins add two `sync` instructions that are useless on PIC32, so it might make sense to write my own macro.)
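For comparison, a minimal sketch of the builtin-based approach is shown here. `__sync_fetch_and_xor` is the legacy GCC builtin; `__atomic_fetch_xor` is its newer counterpart (GCC 4.7 and later), and whether a particular PIC32 toolchain supports it is an assumption to check. The wrapper names `toggle_a` and `toggle_a_relaxed` are made up for illustration. Requesting `__ATOMIC_RELAXED` ordering may let the compiler leave out the `sync` barriers, but that should be confirmed in the disassembly for the toolchain in use.

```c
static volatile int a = 0;

/* Legacy builtin: atomic a ^= 1 with full-barrier semantics.
   Returns the value a held before the toggle. */
static inline int toggle_a(void)
{
    return __sync_fetch_and_xor(&a, 1);
}

/* Newer builtin with an explicit memory order (assumes GCC >= 4.7).
   Relaxed ordering keeps the update atomic but requests no barriers. */
static inline int toggle_a_relaxed(void)
{
    return __atomic_fetch_xor(&a, 1, __ATOMIC_RELAXED);
}
```

Both wrappers return the previous value, so a caller can tell which state the flag was in before the toggle.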

Dmitry Frank
  • Without researching it too deeply, it looks like this instruction pairing is only potentially atomic, i.e., it either works atomically or else it fails to write back and sets a flag. You don't seem to be checking for or handling the possibility of failure in the way the example code at your instruction reference link does. It is permissible to accept your own answer if you are fully satisfied with it. – Chris Stratton Jul 05 '13 at 15:18
  • @ChrisStratton, I can't understand which instruction pairing you are talking about, or how instruction pairing might make code atomic in general. Atomicity in the code above is achieved by the instructions `LL` and `SC`. As for accepting my answer, I'll surely accept it once the system permits me to (I can accept it only after two days). – Dmitry Frank Jul 06 '13 at 09:27
  • This answer admittedly solves the issue of incorrect assembly generation, but what Chris meant was that three separate assembly instructions *cannot* ensure a single atomic operation. This code can be interrupted at any point in between these instructions, so if you don't check if SC failed, you don't gain any benefits compared to your initial code *in terms of atomicity*. The other forum user was right that it's better not to reinvent the wheel and simply use `__sync_fetch_and_xor` instead (if your compiler/mcu combo permits it). – vgru May 10 '17 at 08:03