4

What is the purpose of the FPU that is advertised with STM32F4 microcontrollers?

To quote this website:

"The Cortex-M4 core features a Floating point unit (FPU) single precision which supports all ARM single-precision data-processing instructions and data types."

What would be the difference if this unit were not present in the architecture? Does this mean I need to use special libraries or functions when doing arithmetic with floating-point variables? Thank you.

James C

3 Answers

7

Yes, if you don't have a hardware floating-point unit then floating-point operations must be performed using library functions. That's what is done with typical Cortex-M3 processors that do not have hardware floating-point support, and the execution time for these operations goes up significantly.
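As a minimal sketch of what this looks like in practice, the function below is ordinary C with no FPU-specific code in it. The name `scale_reading` and the build lines in the comment are illustrative assumptions (GCC-style flags; other toolchains use different options). With hardware floating point enabled, a compiler would typically emit FPU instructions for the multiply and add; with software floating point it would typically emit calls to runtime helpers such as `__aeabi_fmul` and `__aeabi_fadd` instead.

```c
/* Build sketch (GCC-style flags, shown as an assumption; adjust for your toolchain):
 *   hardware FP (Cortex-M4F): -mcpu=cortex-m4 -mfpu=fpv4-sp-d16 -mfloat-abi=hard
 *   software FP (Cortex-M3):  -mcpu=cortex-m3 -mfloat-abi=soft
 * The C source is identical in both cases; only the generated code differs.
 */

/* Hypothetical helper: applies a gain and an offset to a raw sensor value. */
float scale_reading(float raw, float gain, float offset)
{
    return raw * gain + offset;
}
```

Comparing the disassembly of the two builds is a quick way to check whether the compiler is actually using the FPU.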

Joe Hass
  • That is the key: performance goes up when a hardware unit processes floating-point operations instead of a software implementation. – kR105 Feb 28 '14 at 02:29
  • Thank you. So to use the FPU on the STM32F4, I don't need to write any special code? I just write arithmetic operations in the usual C style, using floating-point variables? – James C Feb 28 '14 at 02:44
  • Yes, assuming that your compiler knows you are programming a Cortex-M4 with an FPU. – Joe Hass Feb 28 '14 at 03:13
  • @JoeHass Joe, is that 1000:1 or 100:1 from an official benchmark or a guesstimate? The numbers I've seen are nowhere near that high (more like ~10:1). – Spehro Pefhany Feb 28 '14 at 03:35
  • It's from my own observation, on a very small sample, so I am willing to concede that I may have seen pathological cases. I'll edit the answer. – Joe Hass Feb 28 '14 at 11:32
  • Not quite. Without an FPU, floating-point operations must be done in software. This software does not have to come from a library. – Olin Lathrop Feb 28 '14 at 13:38
3

If you need to do simple single-precision float operations, the FPU (assuming your compiler supports it, and you properly configure it) can speed up those operations by at least an order of magnitude.
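One part of "properly configure it" on a Cortex-M4 is granting access to the FPU coprocessors (CP10 and CP11) in the Coprocessor Access Control Register before the first floating-point instruction runs. Vendor startup code often does this already, so treat the following as a sketch rather than a required step. The function name `fpu_grant_access` is hypothetical, and the register is passed as a pointer so the logic can be exercised off-target; on the device the register lives at 0xE000ED88 (`SCB->CPACR` in CMSIS).

```c
#include <stdint.h>

/* CP10 and CP11 access fields occupy bits [23:20] of CPACR;
 * 0b11 in each two-bit field means full access. */
#define CPACR_CP10_CP11_FULL (0xFu << 20)

/* Hypothetical helper: sets the CP10/CP11 access bits in the register
 * that cpacr points to, leaving all other bits unchanged.
 * On target, pass (volatile uint32_t *)0xE000ED88 and follow the write
 * with DSB and ISB barriers before executing any FP instruction. */
void fpu_grant_access(volatile uint32_t *cpacr)
{
    *cpacr |= CPACR_CP10_CP11_FULL;
}
```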

Keep in mind that if you need double precision, the single-precision FPU is of no help: double operations still run in software. In practice, the 24-bit mantissa of a 32-bit float is not quite enough for many real applications (precision data acquisition and filtering, navigation, high-end audio), whereas a double, or often even 32-bit fixed point, is enough.
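Both points can be illustrated with a short sketch; all function names here are hypothetical. The first function shows the 24-bit mantissa limit: above 2^24, adding 1.0f to a float no longer changes it. The other two differ only in the literal suffix: an unsuffixed constant such as `3.14159265` has type double, so the expression is evaluated in double precision, which a single-precision FPU cannot accelerate. GCC's `-Wdouble-promotion` warning can help catch such cases.

```c
#include <stdbool.h>

/* Returns true when adding 1.0f to x leaves x unchanged,
 * i.e. when x is too large for a float to resolve a step of 1. */
bool float_absorbs_one(float x)
{
    volatile float sum = x + 1.0f;  /* volatile forces rounding to float */
    return sum == x;
}

/* Unsuffixed literal is a double: r is promoted and the multiplies are
 * done in double precision (software on a single-precision-only FPU). */
float area_double_literal(float r)
{
    return 3.14159265 * r * r;
}

/* 'f' suffix keeps the whole expression in single precision. */
float area_float_literal(float r)
{
    return 3.14159265f * r * r;
}
```

For example, `float_absorbs_one(16777216.0f)` is true (16777216 is 2^24), while `float_absorbs_one(8388608.0f)` is false.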

I'm not sure whether it speeds up single-precision transcendentals; I would like to see some benchmarks.

Spehro Pefhany
2

The 32-bit ARM is pretty efficient at floating point in software. The instruction set allows any instruction to include an arbitrary-length right or left barrel shift in 1 cycle. The speed gain from the FP hardware is more like 5 to 50 times, depending on the operation and how things like trig functions are handled. The fixed-point DSP hardware in the F4 can improve DSP speed about 2 to 4 times. That doesn't sound like much, but it is the difference between updating motor speed 16 times a second versus 4. Among other things, it has a MAC (multiplier-accumulator) that does 32x32 + 64 --> 64-bit accumulation, and some instructions that perform a pair of 16-bit to 32-bit MACs. The MAC is the mainstay of DSP.
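The 32x32 + 64 --> 64 multiply-accumulate pattern can be sketched in portable C as a fixed-point dot product, the inner loop of a FIR filter. The function name `dot_q31` is hypothetical. A compiler targeting the Cortex-M4 can typically map the loop body to a single long multiply-accumulate instruction (SMLAL), though that depends on the compiler and optimization level. The paired 16-bit MACs are usually reached through intrinsics such as CMSIS `__SMLAD` rather than plain C, so they are not shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dot product of two 32-bit fixed-point vectors.
 * Each iteration is one 32x32 -> 64-bit multiply added to a 64-bit
 * accumulator, i.e. one MAC. The caller is responsible for any final
 * shift needed by its fixed-point format and for keeping n small enough
 * that the 64-bit accumulator cannot overflow. */
int64_t dot_q31(const int32_t *a, const int32_t *b, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc += (int64_t)a[i] * b[i];  /* widen first so the product is 64-bit */
    }
    return acc;
}
```

The cast to `int64_t` before the multiply matters: without it the product would be computed in 32 bits and could overflow before it reaches the accumulator.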

There is also an analog random number generator and three 12-bit ADCs that can handle 7.2 MHz (I'm assuming a Discovery board). My guess is that we will be seeing a lot of these in "-uino" variations.

Re: speed advantage, I did a big analysis way back when. It was a 1 MHz 65C02 versus the same with an AMD Am9511 FPU added. A few things in the transcendentals were 1000 times faster, but a lot was only in the 10 to 50 range. Since the 6502 was the inspiration for ARM, it isn't surprising that it was pretty efficient. Wozniak wrote the entire Apple II FP system in 256 bytes. Numbers like 100 to 1000X were good for some other 8-bit processors such as the 8080/Z80. The AVR in the Arduino has some nice tricks in its instruction set that put it far ahead of the 8080 crowd of the old days.

C. Towne Springer