Why doesn't the compiler use LSR directly?

Question

Hi, I've been working on a project using an Arduino Uno (so an ATmega328P) where timing is quite important, so I wanted to see which instructions the compiler was converting my code into. In there I have a uint8_t which I shift one bit to the right on each iteration using data >>= 1, and it seems the compiler translated this into 5 instructions (data is in r24):

mov     r18, r24
ldi     r19, 0x00
asr     r19
ror     r18
mov     r24, r18

But if I look into the instruction set documentation I see an instruction which does exactly this: lsr r24

Am I overlooking something, or why isn't the compiler using this as well? The registers r18 and r19 are not used anywhere else.

I'm using an Arduino, but if I'm correct it just uses the normal avr-gcc compiler. This is the (trimmed) code which generates the sequence:

ISR(PCINT0_vect) {
    uint8_t data = 0;
    for (uint8_t i = 8; i > 0; --i) {
//        asm volatile ("lsr %0": "+w" (data));
        data >>= 1;
        if (PINB & (1 << PB0))
            data |= 0x80;
    }
    host_data = data;
}

As far as I can see, the Arduino IDE is using the AVR gcc compiler provided by the system, which is version 6.2.0-1.fc24. Both are installed via the package manager, so they should be up to date.

Well, I compiled it using the Arduino IDE and then used `avr-objdump` on the elf file… What is it that doesn't seem to correspond? – xZise, Feb 07 '17 at 17:09
@Eugene Sh.: It **does** correspond to the C code. It corresponds just to the line `data >>= 1;` – Curd, Feb 07 '17 at 17:16
This is one of the cases where "use shifts instead of division" is the wrong advice. If you do /= 2 instead the compiler will generate lsr r24; (tip: try the gcc explorer to play around with asm code generation) — PlasmaHH, Feb 07 '17 at 20:43
*What* compiler? What processor? It really should be obvious this is necessary information for the question to make sense. — Olin Lathrop, Feb 07 '17 at 22:13
The AVR GCC backend has some unfortunate gaps. I remember one time it compiled an ISR consisting of `bytevar--` into something like 8 instructions. — chrylis -cautiouslyoptimistic-, Feb 08 '17 at 01:49
Voting to close because the issue cannot be reproduced. `avr-gcc` version 6.2.0 mentioned in the question hasn't been released yet. — Dmitry Grigoryev, Feb 10 '17 at 11:57
Not sure where you got your information but there is the `avr-gcc` package version 6.2.0 for Fedora 24. https://apps.fedoraproject.org/packages/avr-gcc — xZise, Feb 11 '17 at 19:08

Curd · Accepted Answer · 2017-02-07T21:59:25.330

18

According to the C language specification, any value whose type is smaller than int (the size of int depends on the particular compiler; in your case int is 16 bits wide) and that is involved in an operation (in your case >>) is converted to int before the operation is carried out.
This behaviour is called integer promotion.
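
A minimal sketch in plain C of how the promotion can be made visible. The function names are made up for this illustration and are not from the question. It only assumes that int is wider than 8 bits, which the C standard guarantees.

```c
#include <stddef.h>
#include <stdint.h>

/* Size of the type in which `x >> 1` is evaluated when x is a uint8_t.
   Because of integer promotion this is sizeof(int), not 1. */
size_t shift_result_size(void) {
    uint8_t x = 0x80;
    return sizeof(x >> 1);
}

/* The shift is carried out in int, so the bit that leaves the byte survives. */
int shift_left_promoted(uint8_t x) {
    return x << 1;
}

/* Storing the result back into a uint8_t discards the upper bits again. */
uint8_t shift_left_truncated(uint8_t x) {
    return (uint8_t)(x << 1);
}
```

With avr-gcc's default 16-bit int, `shift_result_size()` would be expected to return 2. On a typical desktop compiler it returns 4. In both cases the value is `sizeof(int)` rather than 1.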

And that's exactly what the compiler did:

r19=0 is the MSByte of the integer promoted value of data.
(r19, r18) represents the full integer-promoted value of data, which is then shifted right by one bit by asr r19 and ror r18.
The result is then implicitly converted back to your uint8_t variable data:
mov r24, r18, i.e. the MSByte in r19 is thrown away.
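
The register walk-through above can be modelled in plain C. This is only a sketch. The helper names are hypothetical, and the carry handling follows the AVR instruction set descriptions of asr (bit 7 is kept, bit 0 goes to carry) and ror (carry enters bit 7). It shows why a single lsr would give the same result: the promoted high byte is always zero, so nothing is ever carried into the low byte.

```c
#include <stdint.h>

/* Model of the five-instruction sequence from the question. */
uint8_t shift_via_asr_ror(uint8_t data) {
    uint8_t lo = data;                                  /* mov r18, r24  */
    uint8_t hi = 0x00;                                  /* ldi r19, 0x00 */
    uint8_t carry = hi & 1u;                            /* asr r19: bit 0 goes to carry */
    hi = (uint8_t)((hi & 0x80u) | (hi >> 1));           /* asr r19: sign bit is kept    */
    lo = (uint8_t)((uint8_t)(carry << 7) | (lo >> 1));  /* ror r18: carry enters bit 7  */
    (void)hi;                                           /* high byte is thrown away     */
    return lo;                                          /* mov r24, r18  */
}

/* Model of a single `lsr r24`: a zero enters bit 7. */
uint8_t shift_via_lsr(uint8_t data) {
    return (uint8_t)(data >> 1);
}

/* Returns 1 if both models agree for every possible byte value. */
int sequences_agree(void) {
    for (int v = 0; v <= 0xFF; ++v) {
        if (shift_via_asr_ror((uint8_t)v) != shift_via_lsr((uint8_t)v))
            return 0;
    }
    return 1;
}
```

Since the two models agree for all 256 inputs, a compiler is free to replace the long sequence by lsr without changing the observable result.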

Edit:
Of course the compiler could optimize the code.
Trying to reproduce the problem, I found that at least with avr-gcc version 4.9.2 the problem doesn't occur. It creates very efficient code, i.e. the C line data >>= 1; gets compiled to just one single lsr r24 instruction. So maybe you are using a very old compiler version.

edited Feb 07 '17 at 21:59

answered Feb 07 '17 at 16:57

Curd


Okay but this seems a waste of resources (time and space). – xZise Feb 07 '17 at 17:13
That's the reason why I still code in assembler on such tiny machines. For interrupt functions, it still seems to be a wise choice. – Janka Feb 07 '17 at 17:18
I'm not sure but maybe you can get the compiler to optimize the code by setting the optimization switch (-O...). – Curd Feb 07 '17 at 17:26
It is already using `-Os` which is (according to the documentation) almost as good as `-O2`. Some are disabled but none of them seem to be related. And previously `-O3` was producing enormous programs. – xZise Feb 07 '17 at 17:28
It's not a total waste because sometimes you need the unoptimized code for debugging on assembler level. Then you are very glad if you have unoptimized code. – Curd Feb 07 '17 at 17:28
If I recall correctly -mint8 is the flag to make integers 8-bit. However this has a lot of unwanted side effects. Sorry, can't quite remember what they were now, but I never used the flag because of them. I spent a lot of time comparing avr-gcc with a commercial compiler many years ago. – Jon Feb 07 '17 at 17:35
Oh that's right, the C standard requires integers to be at least 16-bit, so using -mint8 breaks all the libraries. – Jon Feb 07 '17 at 17:40
Nigel Jones said in "Efficient C Code for 8-bit Microcontrollers" something like: "...The integer promotion rules of C are probably the most heinous crime committed against those of us who labor in the 8-bit world"... – Dirceu Rodrigues Jr Feb 07 '17 at 17:43
So is there any workaround? Like disabling *only* the promotion rules, without breaking `int` by making it 8 bit? Because this in fact looks devastating for AVR code. – Jonas Schäfer Feb 07 '17 at 20:38
@Jonas Wielicki: the best solution for the problem is to use a better compiler. E.g. with avr-gcc version 4.9.2 I cannot reproduce the problem: For C code line `d >>= 1;` I get just one single `lsr r24` instruction. Maybe xZise is using a very old compiler version. – Curd Feb 07 '17 at 21:28
Well I'm using the packages provided for Fedora 24 (arduino and avr gcc). If I'm reading this correctly arduino is using the installed avr-gcc package which is 6.2.0 for me. – xZise Feb 07 '17 at 23:29
@DirceuRodriguesJr: A decent compiler for an 8-bit platform should have no problem recognizing cases where the upper byte of a 16-bit value isn't used. A bigger problem is the lack of explicitly-non-promoting types which behave as a particular bit size independent of `int` length. Some compilers will behave "creatively" if the product of two uint16_t exceeds 2147483647 even if the result is masked with 65535. – supercat Feb 08 '17 at 00:08
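
For the uint16_t product mentioned in the last comment, one common way to sidestep the promotion to signed int is to force the arithmetic to be unsigned. This is only a sketch. The function name is made up, and the `1u *` trick is a general C idiom, not something proposed in this thread. On platforms with 32-bit int, both operands would otherwise be promoted to signed int and the multiplication could overflow.

```c
#include <stdint.h>

/* Low 16 bits of a * b, computed without signed overflow.
   Multiplying by 1u first converts the promoted operands to unsigned int,
   so the product wraps around instead of overflowing a signed int. */
uint16_t mul16_low(uint16_t a, uint16_t b) {
    return (uint16_t)(1u * a * b);
}
```

For example, `mul16_low(65535u, 65535u)` yields 1, since 65535 * 65535 is 0xFFFE0001 and only the low 16 bits are kept.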
