I recently compiled some C++ code for the ATmega1284P in Atmel Studio and was analyzing the timings of some routines using my scope. To my surprise, a loop I thought I had optimized was taking longer than expected.
After taking a peek at the assembly code, I noticed that the compiler had compiled the following multiplication:
word *= 10;
into repeated addition:
add r28, r28
adc r29, r29
add r18, r18
adc r19, r19
add r18, r18
adc r19, r19
add r18, r18
adc r19, r19
add r18, r28
adc r19, r29
Now, although I admire the optimizations in this repeated addition, I see no reason for the compiler to do this when the chip has hardware multiplication instructions available, like mul.
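For reference, here is my reading of the listing as a small C++ sketch. It assumes r18:r19 starts out holding a copy of the original value (the listing does not show that copy being made), and the function name times10_shift_add is made up for illustration. Under that assumption, the sequence computes 8*x + 2*x using only 16-bit additions:

```cpp
#include <cstdint>

// Hypothetical helper mirroring the add/adc pairs in the listing.
// Assumption: r18:r19 holds a copy of the original value on entry.
inline std::uint16_t times10_shift_add(std::uint16_t x) {
    // add r28, r28 / adc r29, r29  ->  2*x
    std::uint16_t twice = static_cast<std::uint16_t>(x + x);

    // Three add r18, r18 / adc r19, r19 pairs  ->  2*x, 4*x, 8*x
    std::uint16_t acc = x;
    acc = static_cast<std::uint16_t>(acc + acc);
    acc = static_cast<std::uint16_t>(acc + acc);
    acc = static_cast<std::uint16_t>(acc + acc);

    // add r18, r28 / adc r19, r29  ->  8*x + 2*x = 10*x (mod 2^16)
    return static_cast<std::uint16_t>(acc + twice);
}
```

So the result matches word *= 10 for a uint16_t, just without any mul instruction.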
Why is gcc behaving this way and how can I tell it to use hardware multiplication instructions? Note that I have set up the multiplication so overflow is not an issue. word is a uint16_t.