ARM How to invoke branching?

Question

I was looking through some code involving a loop.

loopinner ....
          SUBS R2,R2,#1 ; j--
          BGT loopinner ;in this case, loop should continue when j>1

In this case, I am not sure how BGT branches back to loopinner. Don't I need to specify what it is greater than? Since SUBS sets the flags, let's say j-- leaves the value 1. How does the branch know what value it is greater than?

This is your third simple question about ARM assembly in the last 12 hours. Are you in the middle of some kind of exam? — Elliot Alderson, Aug 10 '21 at 11:13
@ElliotAlderson not exactly, but I just started to learn this module in advance ahead of the planned schedule, and am interested to learn more about ARM. — Meep, Aug 10 '21 at 11:24
Branch instructions look at the state of various flags set by the previous instruction. — Peter Bennett, Aug 10 '21 at 16:34
Note that it will actually loop if j>0 (using the value of j after the decrement), not j>1. — jcaron, Aug 12 '21 at 06:58
what part of the arm documentation do you not understand? this is clearly documented. — old_timer, Aug 13 '21 at 08:57
@Meep You okay? Hope everything is going okay for you. Best wishes! — jonk, Aug 24 '21 at 04:08
@jonk Yep all good! Have been learning on my own steadily, thanks for all the help here! — Meep, Sep 05 '21 at 13:05

jonk · Accepted Answer · 2021-08-13T01:10:18.960

From ARM conditionals you can readily find that the instruction examines the Z, N, and V status flags and branches when Z=0 & N=V. Since it examines the V status flag and not the C status flag, this is clearly intended as a signed test. (This means to me that this isn't useful for unsigned loop control -- FYI.)

I wrote this not so long ago, with enough information to understand what's going on. But I can summarize it here.

Let's use simpler 4-bit words where there are only 16 symbols:

Word     Signed     Subtrahend
0000         0         1111
0001         1         1110
0010         2         1101
0011         3         1100
0100         4         1011
0101         5         1010
0110         6         1001
0111         7         1000
1000        -8         0111
1001        -7         0110
1010        -6         0101
1011        -5         0100
1100        -4         0011
1101        -3         0010
1110        -2         0001
1111        -1         0000

Above, the third column is what the ALU actually uses when subtracting by that value. It simply inverts each bit before adding. (The ALU never subtracts anything. It doesn't even know how.) So, the SUB instruction actually performs addition, using the subtrahend form of the value when adding. (If you want to understand status bit semantics, it's pretty important that you master this concept as it will help you when you'd otherwise be confused.)
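
The invert-then-add idea can be checked exhaustively for 4-bit words. What follows is only a minimal Python sketch of the arithmetic, not a model of any particular ALU; the names `WIDTH`, `MASK`, and `sub_via_add` are made up for illustration.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1  # 0b1111, keeps values inside the 4-bit universe

def sub_via_add(x, y):
    """Compute x - y (mod 16) using only bit inversion and addition."""
    subtrahend = ~y & MASK               # third column of the table: every bit of y inverted
    return (x + subtrahend + 1) & MASK   # the carry-in of 1 completes the two's complement

# Every 4-bit pair agrees with ordinary modular subtraction.
for x in range(16):
    for y in range(16):
        assert sub_via_add(x, y) == (x - y) & MASK

print(format(sub_via_add(0b0010, 0b0001), "04b"))  # 2 - 1, prints 0001
```

The loop over all 256 pairs is the point: nothing in `sub_via_add` ever subtracts, yet it matches subtraction everywhere.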

Stamp it onto your forehead --

A CPU ONLY ADDS. IT CANNOT SUBTRACT.

If you ever feel the temptation to go down the primrose path of believing that any kind of subtract instruction actually subtracts, and this includes all comparison instructions that set status bits but don't change register values, just kick yourself really hard, really fast. It doesn't happen.

A CPU ONLY ADDS. IT CANNOT SUBTRACT.

Everything has to be cast into addition semantics. Everything.

A SUBS R2, R2, #1, in this 4-bit universe I just created, would add 1110 plus a carry-in of 1, as well. There are only 16 possibilities:

Actual Operation    Operation Result    Operation      Comparison
 R2     SUBS OP        Z N V C ALU       Semantics      Semantics   Z=0 & N=V?
0000 + 1110 + 1        0 1 0 0 1111     0 - 1 = -1       0 > 1 ?    False
0001 + 1110 + 1        1 0 0 1 0000     1 - 1 =  0       1 > 1 ?    False
0010 + 1110 + 1        0 0 0 1 0001     2 - 1 =  1       2 > 1 ?    True
0011 + 1110 + 1        0 0 0 1 0010     3 - 1 =  2       3 > 1 ?    True
0100 + 1110 + 1        0 0 0 1 0011     4 - 1 =  3       4 > 1 ?    True
0101 + 1110 + 1        0 0 0 1 0100     5 - 1 =  4       5 > 1 ?    True
0110 + 1110 + 1        0 0 0 1 0101     6 - 1 =  5       6 > 1 ?    True
0111 + 1110 + 1        0 0 0 1 0110     7 - 1 =  6       7 > 1 ?    True
1000 + 1110 + 1        0 0 1 1 0111    -8 - 1 = -9 E    -8 > 1 ?    False
1001 + 1110 + 1        0 1 0 1 1000    -7 - 1 = -8      -7 > 1 ?    False
1010 + 1110 + 1        0 1 0 1 1001    -6 - 1 = -7      -6 > 1 ?    False
1011 + 1110 + 1        0 1 0 1 1010    -5 - 1 = -6      -5 > 1 ?    False
1100 + 1110 + 1        0 1 0 1 1011    -4 - 1 = -5      -4 > 1 ?    False
1101 + 1110 + 1        0 1 0 1 1100    -3 - 1 = -4      -3 > 1 ?    False
1110 + 1110 + 1        0 1 0 1 1101    -2 - 1 = -3      -2 > 1 ?    False
1111 + 1110 + 1        0 1 0 1 1110    -1 - 1 = -2      -1 > 1 ?    False

Under Operation Result I have a column for ALU. The ALU field is what goes back into R2 after the SUBS instruction completes. (The V status flag is generated by an XOR of the carry-out of the next-to-most significant bit during the operation and the carry bit itself.) Note also that there is a single case marked with E where a signed overflow occurred.
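
If you would rather generate the rows than take them on trust, here is a small Python sketch of the same 4-bit universe. It assumes the flag definitions described above (V as the XOR of the carry into and the carry out of the most significant bit); the function names `add_with_carry`, `subs_1`, and `gt` are invented for this example.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1

def add_with_carry(x, y, carry_in):
    """Add two 4-bit words plus a carry-in; return (result, Z, N, V, C)."""
    total = x + y + carry_in
    result = total & MASK
    c = total >> WIDTH                      # carry out of the most significant bit
    low = MASK >> 1                         # mask for the bits below the MSB
    carry_into_msb = ((x & low) + (y & low) + carry_in) >> (WIDTH - 1)
    v = carry_into_msb ^ c                  # signed overflow
    n = result >> (WIDTH - 1)               # sign bit of the result
    z = int(result == 0)
    return result, z, n, v, c

def subs_1(r2):
    """Model SUBS R2, R2, #1 as R2 + NOT(1) + 1."""
    return add_with_carry(r2, ~1 & MASK, 1)

def gt(z, n, v):
    """The GT condition: Z == 0 and N == V."""
    return z == 0 and n == v

for r2 in range(16):
    result, z, n, v, c = subs_1(r2)
    print(f"{r2:04b} -> {result:04b}  Z={z} N={n} V={v} C={c}  GT={gt(z, n, v)}")
```

Changing `WIDTH` to 32 gives the same pattern for full-size registers, only with far too many rows to print.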

You can now easily see why the BGT instruction applies those particular status bits in exactly the way it does. Admittedly, this uses 4-bit words. But the exact same idea applies to much wider word sizes, without any change to it.

Looking back at the table, you can see that the condition is True if and only if R2 was 2 or greater before the subtraction, and not 1 or 0 or smaller.

Your question:

Don't I need to specify what it is greater than? Since SUBS sets the flags, let's say j-- leaves the value 1. How does the branch know what value it is greater than?

Let's start with the following table from the ARMv6-M Architecture Reference Manual, page A6-99:

The GT condition is described as "Signed greater than". The reason the documentation doesn't specify a constant is that this test occurs after some prior instruction. That prior instruction defines the context. But without having that context, all that can be said is a general signed >.

So, if the prior instruction were CMP:

Then the context would be the comparison of two signed values and the BGT instruction would then mean "branch when signed operand 1 is greater than signed operand 2."

But in your case, with "SUBS R2, R2, #1" the context changes and the BGT instruction would then mean "branch while signed R2 still remains greater than 0."

The conditional branch instruction itself doesn't actually know what the prior instruction was. It also doesn't even know what register(s) are involved. That knowledge is left to the individual (or compiler) that is generating the instruction stream. So the branch instruction doesn't actually have a fixed constant value, nor does it have a register with which to compare against. It depends entirely upon what earlier instructions did with the status bits. It just examines the resulting status and then does what it does. It's up to you to know the context and to use it, correctly.
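
To make the "context" point concrete, here is a hedged Python sketch for 32-bit words. The helper names `flags_of_subtract` and `bgt_taken` are hypothetical; the only thing taken from the manual is the GT rule (Z == 0 and N == V). Notice that `bgt_taken` receives nothing but the flags.

```python
MASK = 0xFFFFFFFF  # 32-bit words

def flags_of_subtract(a, b):
    """Result and flags of a - b, computed as a + NOT(b) + 1 (the work shared by CMP and SUBS)."""
    total = (a & MASK) + (~b & MASK) + 1
    result = total & MASK
    n = result >> 31
    z = int(result == 0)
    c = total >> 32
    sa, sb = (a & MASK) >> 31, (b & MASK) >> 31
    v = int(sa != sb and n != sa)   # operand signs differ and the result's sign is not a's
    return result, (n, z, c, v)

def bgt_taken(flags):
    """BGT sees only the flags: no registers, no constant."""
    n, z, c, v = flags
    return z == 0 and n == v

# Context 1: CMP R0, R1 throws the result away; GT then means R0 > R1 (signed).
_, flags = flags_of_subtract(5, 3)
print(bgt_taken(flags))

# Context 2: SUBS R2, R2, #1 keeps the result; GT then means the new R2 > 0.
r2, flags = flags_of_subtract(2, 1)
print(r2, bgt_taken(flags))
```

The same `bgt_taken` serves both cases. What "greater than" refers to is decided entirely by whichever instruction last wrote the flags.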

(Speaking of which, the source code comment may be misleading or wrong.)

Note

Elliot takes issue (see discussion below) without evidence. He writes, "I could equivalently argue that a CPU can only subtract." He can make that argument, but it is only academic. The actual fact of the matter is that CPUs don't subtract. They add.

So while this is partly my response, providing clear, unequivocal evidence in support so that even Elliot can understand the situation on the ground, today, it's also an excellent segue, too. So I'm very glad for the opportunity Elliot affords me in expanding the discussion.

My first CPU was made from 7400 parts that I built and successfully completed in 1974. Newspaper reporters, to my surprise, showed up and wrote an article about it. That's my first experience. Since then, I professionally worked at Intel doing chipset testing for the BX chipset and, as a matter of relevance to teaching this subject, I've taught Computer Architecture classes as an adjunct professor at Portland State University in the 1990's, with class sizes of approximately 65-75 students. This is the largest 4-year university in the State of Oregon.

I feel equivocation (expressing ambivalence about how computations might be done) about how processors generate their status bits and how they compute only leads students into unnecessary uncertainty, confusion and difficulty that can take hours, weeks, months and sometimes even years to correct. Just as teaching group-theoretic abstract algebra before getting the basics across would confuse most first-year algebra students, so also would teaching academic abstractions about how computers could do things. More students would be damaged, than helped.

The simple truth is that instruction decoding emits an ADD, even when the instruction text (it's just text, after all -- it's not what is actually going on) says SUB. The decoding still issues an ADD. It just modifies some operand details along the way.

Similarly, as it must also be in the case of the ARM processor, the above theory is all you need to understand how things are actually done.

Please don't confuse yourself! Computers add. They don't subtract. They just fiddle around a bit to make it look like they subtract.

For good or bad, it's important to understand what a computer actually does in order to understand certain status bits; what they do and why they do it. There's no other way around it. The above theoretical model is the way things work in modern processors and it is how to work out and understand the status bits, correctly. There is a good reason why things are the way they are.

It's my hope that these details, above, and those I'll write below will be useful. Any failure to communicate here is mine and I'll gladly work to repair, amend, and improve this document where I may.

To continue, I'll be using the ARMv6-M Architecture Reference Manual as a reference.

Let's start on page A6-187 (register case):

Here, you can see that they clearly document this behavior:

AddWithCarry(R[n], NOT(shifted), '1')

This is an addition, with operand 2 (the subtrahend) inverted and the carry-in set to '1'. Just as I wrote happens, above. (It's just how it is done.)

In the case of multi-word extensions, go to page A6-173, and find SBCS:

Here note that they again use addition:

AddWithCarry(R[n], NOT(shifted), APSR.C)

Instead of the carry-in being a hard-coded '1', as it is for the SUBS instruction, it's now using the last-saved carry-out value. In this case, it's usually expected that this will be the carry-out from a prior SUBS (or SBCS) instruction.

For multi-word operations, one starts with SUBS (or ADDS) and then continues the process with subsequent SBCS (or ADCS), which use the carry-out of earlier instructions to support a multi-word operation.

In multi-word addition, this carry-out can be thought of just as a carry-out, which it is. A '1' indicates that a carry occurred and needs to be dealt with. A '0' indicates no carry occurred.

In the case of multi-word subtraction, this carry-out is better seen as an inverted borrow-from. A '1' indicates that there was no need to borrow from a higher-order word. A '0' indicates that there is a need to borrow. Since a SUBS instruction always uses a carry-in of '1', this means there's no incoming borrow (the subtraction result requires an 'increment' in order to compensate for the inverted operand 2). But for the SBCS instruction, if APSR.C is a '0', then no 'increment' takes place and this is the same as borrowing (since an increment is required if there is no borrow).
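
The carry chain can be tried out on a toy scale. This Python sketch builds an 8-bit subtraction from two 4-bit steps: a SUBS-like step with the carry-in fixed at 1, then an SBCS-like step fed by the saved carry-out. The names `add_with_carry` and `sub_two_words` are illustrative only, and the 4-bit word size is an assumption chosen to match the tables above.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1

def add_with_carry(x, y, carry_in):
    """Return (4-bit result, carry-out) of x + y + carry_in."""
    total = x + y + carry_in
    return total & MASK, total >> WIDTH

def sub_two_words(a, b):
    """8-bit a - b built from a SUBS-like step followed by an SBCS-like step."""
    a_lo, a_hi = a & MASK, (a >> WIDTH) & MASK
    b_lo, b_hi = b & MASK, (b >> WIDTH) & MASK
    lo, c = add_with_carry(a_lo, ~b_lo & MASK, 1)   # low word: carry-in hard-coded to 1
    hi, c = add_with_carry(a_hi, ~b_hi & MASK, c)   # high word: carry-in is the saved carry-out
    return (hi << WIDTH) | lo, c

# 0x20 - 0x01: the low word has to borrow, so its carry-out is 0
# and the high word gets no 'increment'.
print(hex(sub_two_words(0x20, 0x01)[0]))  # prints 0x1f
```

The final carry-out is 1 exactly when no borrow is left over, that is, when a >= b as unsigned values.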

The ADCS instruction, found on page A6-106 but not displayed here, also uses the carry-out of prior instruction executions. It doesn't invert the carry-out value or otherwise do something weird or different, just because it is an ADCS instruction. It does exactly the same thing as the SBCS instruction except and only for one minor detail -- the SBCS instruction will invert operand 2 and ADCS won't. That's it.

This is one of the really cool aspects about the way these details work. Very little added logic is required to turn an addition into a subtraction and/or a multi-word addition into a multi-word subtraction.

And finally, to complete the story, see page A2-35:

Consistent with my descriptions of how things actually do work, above.

It's really a pleasure to see how all this works. It's worth some time playing with different signed and unsigned values and, by hand, setting and using status flags. It really deepens these ideas. And they are very good ones!

All of the above is about understanding the status bits and how they are generated and why they are generated in the way that they are. If you focus on what actually happens in a CPU, the rest just falls out as the necessary consequences and it's very easy to understand, then.

A CPU only adds. It cannot subtract.

It doesn't make sense to insist dogmatically that a CPU can only add. I could equivalently argue that a CPU can only subtract, and that to "add" it negates the second operand and subtracts. The ALU is just a blob of combinational logic, optimized for timing, area, and power. If I implement an adder/subtractor in an FPGA I get one multiplexer that performs both addition and subtraction, and neither operation is more fundamental than the other. — Elliot Alderson, Aug 10 '21 at 18:58
@ElliotAlderson It used to be the case that they could do things various ways and, in those days, understanding the status was more complicated than it has today become. You had to read the manual and see what some designer did, that time around. But nowadays, the only thing they do is add. Which is a ***good*** thing, not a bad thing. The reason I'm making this emphasis is because, once that idea is drilled deep into the brain, then everything ***makes sense***, ***always*** and you can parse anything accurately. Less memorization, more understanding. It's better this way. — jonk, Aug 10 '21 at 19:07
@ElliotAlderson Everything I wrote, by the way, is dead-accurate. — jonk, Aug 10 '21 at 19:10
Well, we will agree to disagree then. In my personal experience, assuming that I have absolute knowledge will usually backfire. — Elliot Alderson, Aug 10 '21 at 19:25
@ElliotAlderson Accepted. Thanks. And you would be very much correct "back in the day." I cannot count the number of oddball choices made by random cpu designers. I was continually sussing out details from manuals because no two people seemed to make the same choice. No disagreement there. But I worked at Intel in the late 1990's on P II chipset design and testing and I got my earful at the time about how everyone is now on the same page in this particular CPU design choice. It's true. If you ***ever*** find a modern ALU exception to what I wrote, provide it and I will delete my answer here. — jonk, Aug 10 '21 at 19:28
@ElliotAlderson Also, and more pointedly, with ARM it is exactly true. — jonk, Aug 10 '21 at 19:33
Well then I apologize, I did not know that you had designed the ALUs in all of the ARM silicon implementations as well as the soft cores provided for implementation in FPGAs. Nevertheless, I think my example of implementing an ALU on an FPGA still stands and I've learned that these discussions are really not fruitful. Peace. — Elliot Alderson, Aug 10 '21 at 20:00
@ElliotAlderson You know I'm right. Being unequivocal is better on this topic. And if by "being fruitful" you mean arguing me to change a correct view for which there is no counter-evidence as a waste of time, that's true enough. But again, if you provide me a single modern example as evidence, I will admit my failure and delete the post, too. You will get everything you want. Just find the evidence. I'm very open to being wrong. I just know I'm not. And you don't like my certainty. Oh, well. — jonk, Aug 10 '21 at 20:27
In the context of Arm (as per question), @jonk is architecturally correct. The various ArmARMs define `SUBS` as `AddWithCarry(R[n], NOT(imm32), '1')` (I can expand this if it would be helpful), which covers this answer. — awjlogan, Aug 11 '21 at 10:23
@ElliotAlderson Please see the update. awjlogan clued me into how to find the right docs. (Appreciated.) It's all there. — jonk, Aug 11 '21 at 11:07
"I could equivalently argue that a CPU can only subtract" -- Maybe, but you'd be on your own. Subtraction in ALUs has been implemented as "negate the second operand, then add" since at least the 6502, probably back to the 8008, maybe even the 4004 or earlier. The fundamental circuit involved, is called an "adder". Any comparable "subtracter" circuit is more complex than an adder, and is fundamentally an adder with a negater (which is also an adder) in front of it (since two's complement negation is "invert bits" *and add one*). Sorry, man, fundamentally it's all addition. — Mike DeSimone, Aug 11 '21 at 15:10
@ElliotAlderson I wrote, "If you provide me a single modern example as evidence, I will admit my failure ..." because accepting good evidence and admitting error is important to me and defines what I want to be for others. Your response, "Well then I apologize, I did not know that you had designed the ALUs in all of the ARM silicon implementations, ..." I wouldn't have been capable of writing. It's not in my bones. Were the tables turned, I'd be making a loud and clear apology, admitting error, and promising to do better. Do you let this become the behavior that defines you? I hope not. — jonk, Aug 11 '21 at 19:03
Is someone seriously questioning whether add logic is used to subtract? I thought at this point everyone who actually works on these, or has taken the few minutes to look, knows this. Not really relevant whether the tech writer for some document chose to indicate that or not; it's what the language compiler, cell library, or author of the code did directly that matters. — old_timer, Aug 13 '21 at 09:05
@old_timer Yeah. Elliot marked me down and then defended himself by making silly arguments that have no basis in fact. He was just bullying me. But Elliot and I have fundamentally conflicting world-views about teaching (from prior discussions.) Both of us have been teachers -- he says he still is one. And he similarly wants me to conform to his own personal rules about teaching here, too. Long discussion where we had to disagree and leave it there. I suspect this was more about ***me*** and his continuing efforts to bully me. And nothing much else, to be honest. He's now left himself exposed. — jonk, Aug 13 '21 at 09:11
@jonk ahh, yes, there are other users I have similar problems with. Yes, quite exposed. — old_timer, Aug 13 '21 at 09:29

score 5 · Answer 2 · answered Aug 10 '21 at 10:12

In your code example the SUB instruction has an S suffix. This means that the instruction will set the condition flags, which the BGT will then evaluate. For the branch to be taken, the Z flag must be 0 and the N flag must equal the V flag.

Colin

I do know that the sub instruction will set the condition flags, where N= not negative, Z= not zero, C/V=unsigned/signed overflow. But how does the branch, in this case, know when to go to the inner loop? For example, when j=2, after the decrement it is 1, hence N=0, Z=0, C/V=0. How does BGT (greater than ?) go back to the loop again? – Meep Aug 10 '21 at 11:27
Why will it be zero if R2 still holds a value of 1? As for BGT, what is it comparing to? Greater than? Sorry for these questions, new to ARM. – Meep Aug 10 '21 at 11:37
Commands change flags. Checking flags allows conditional testing. Use the C to explain the assembler: if j > 0, branch. – StainlessSteelRat Aug 10 '21 at 12:04
@Meep If it was equal Z would be 1, if it was less then I'm not entirely sure but probably C would be 1. Since those are both zero, it must be greater. – user253751 Aug 10 '21 at 12:17
This is right. It's checking if greater than zero; in that scenario the N and V flags will have the same value and Z will be 0. – Mitu Raj Aug 10 '21 at 19:59

old_timer · Answer 3 · 2021-08-13T09:33:43.940

The arm documentation clearly states that GT is a signed greater than, it will branch when Z==0,N==V.

When r2 = 2: remember from grade school that x - y = x + (-y), and from day one (or shortly thereafter) in computer engineering/science/whatever that two's complement negation is invert and add one, so x - y = x + (~y) + 1. This saves on logic and is how we do subtraction.

      1  add one
   0010 
 + 1110  invert
==========

Four bits are more than enough to see what is going on; the result is the same as with 32 bits.

   11101
    0010 
  + 1110
 ==========
    0001

So N = 0 and Z = 0 from the result. The carry in and carry out of the msbit are the same so V = 0 (xor of the carry in and carry out of the msbit, can also do it by inspection of the msbits of the operands and result).

We need Z == 0 and N == V to do the branch, and they are, so the branch happens.

You will find this is the case for positive numbers since this is a signed greater than. If you wanted unsigned greater than, use bhi (C == 1 and Z == 0); bcs/bhs is unsigned greater than or equal. The logic works the same, it just optimizes to using the carry out (plus Z for bhi) instead of N and V. You can see this as well if you look at the table jonk generated, or generate one yourself.

When r2 = 1

   11111
    0001
 +  1110
 ==========
    0000

Z = 1, N = 0, V = 0

N == V but Z != 0 so the branch does not happen.
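
The two hand-worked cases can be strung together into the whole loop. Below is a rough Python sketch that simulates `SUBS R2,R2,#1` followed by `BGT loopinner` on 32-bit words and counts how many times the body runs. `run_loop` and `to_signed` are names made up here, and the loop body itself is assumed to leave R2 and the flags alone.

```python
MASK = 0xFFFFFFFF
LOW = MASK >> 1            # every bit below the msbit
NOT_ONE = ~1 & MASK        # the inverted operand, 0xFFFFFFFE

def to_signed(word):
    """Interpret a 32-bit word as a two's complement signed value."""
    return word - (1 << 32) if word >> 31 else word

def run_loop(j):
    """Return (times the body ran, final signed R2) for the SUBS/BGT loop."""
    r2 = j & MASK
    iterations = 0
    while True:
        iterations += 1                                   # the body runs before the test
        total = r2 + NOT_ONE + 1                          # SUBS as invert and add one
        carry_in_msb = ((r2 & LOW) + (NOT_ONE & LOW) + 1) >> 31
        carry_out = total >> 32
        r2 = total & MASK
        n, z, v = r2 >> 31, int(r2 == 0), carry_in_msb ^ carry_out
        if not (z == 0 and n == v):                       # BGT not taken, fall through
            return iterations, to_signed(r2)

print(run_loop(2))   # prints (2, 0)
print(run_loop(1))   # prints (1, 0)
```

Starting from 2 the body runs twice and the loop exits with r2 = 0, which matches the two cases above: the branch is taken while the decremented value is still greater than zero.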

short version of jonks answer, showing how we get N,V,Z. upvote jonks if/when/instead you upvote this one. — old_timer, Aug 13 '21 at 09:31
