
Suppose you are designing a floating-point unit that must support both single-precision and double-precision operation. In single-precision mode, the goal is not simply to expand the single-precision operands into double-precision registers, but to perform twice as many operations per clock cycle.

To be specific, suppose A and B are 64-bit registers and you want two options: either perform the operation C=A*B, or alternatively perform C0=A0*B0 and C1=A1*B1, where A0 and A1 are the low and high 32-bit words of the A register (and likewise for B and C).
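The two modes can be modelled in software to pin down what is being asked. This is only a behavioural sketch of the register semantics, not a hardware design; the function names and the little-endian lane order (A0 in the low word) are assumptions made for illustration.

```python
import struct

def pack_singles(lo: float, hi: float) -> int:
    """Pack two single-precision values into one 64-bit register image (lo in bits 0..31)."""
    return struct.unpack("<Q", struct.pack("<2f", lo, hi))[0]

def unpack_singles(bits: int) -> tuple:
    """Split a 64-bit register image into its (low, high) single-precision lanes."""
    return struct.unpack("<2f", struct.pack("<Q", bits))

def mul_double(a_bits: int, b_bits: int) -> int:
    """Mode 1: each 64-bit register holds one double, C = A * B."""
    a, = struct.unpack("<d", struct.pack("<Q", a_bits))
    b, = struct.unpack("<d", struct.pack("<Q", b_bits))
    return struct.unpack("<Q", struct.pack("<d", a * b))[0]

def mul_packed_single(a_bits: int, b_bits: int) -> int:
    """Mode 2: each register holds two singles, C0 = A0 * B0 and C1 = A1 * B1."""
    a0, a1 = unpack_singles(a_bits)
    b0, b1 = unpack_singles(b_bits)
    # The product of two singles is exact in a Python float (a double);
    # packing with "f" then rounds it once to single precision.
    # struct.pack raises OverflowError if a lane result exceeds single range.
    return pack_singles(a0 * b0, a1 * b1)
```

For example, `unpack_singles(mul_packed_single(pack_singles(1.5, -2.0), pack_singles(4.0, 0.5)))` yields the two lane products independently of each other.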

How practical or efficient is it to split a double-precision unit into a pair of single-precision units like this? I am tempted to ask what percentage of the transistors can be dual-purposed, but I'm sure it's not as simple as that.

In particular, how does the practicality or efficiency of this compare with the same dual-purposing of a fixed-point multiplier unit?

rwallace
    Back in the olden days, when floating-point hardware was new (IBM 704, etc.), software extensions could treat two floating-point hardware registers as a single, higher precision floating-point value. Such values were called DOUBLEs, for the obvious reason. – Pete Becker May 05 '17 at 10:40
    Depending on the trade-off between gate count and clock cycles, a 64-bit or even a 32-bit floating-point multiply might be implemented over multiple clocks with a smaller multiply unit. The real-estate cost of a one-clock multiplier grows faster than linearly as you add bits, and with a pipeline you can hide the extra clocks, since the pipeline already spreads work over several cycles. You should probably design it at the interface level as a full-sized solution (64x64 or whatever), and then choose whether the back end feeds 64-, 32-, 16- or 8-bit multiply blocks. – old_timer May 05 '17 at 14:21

2 Answers


The only things that really need to be duplicated are the adders in the exponent handling. The mantissa logic can be split easily for the basic operations, at minimal extra cost.

For fixed-point, you don't have adders to compare and adjust exponents, and you omit the barrel shifters for the mantissa (which are also easily split, so savings there are minimal).
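As a rough illustration of why the fixed-point (or mantissa) multiplier splits easily, here is a minimal arithmetic model, assuming an unsigned 64-bit multiplier built from four 32x32 partial products. The function name `split_multiply` and the `dual` flag are hypothetical, and a real partial-product array would be partitioned at the gate level rather than like this.

```python
MASK32 = (1 << 32) - 1

def split_multiply(a: int, b: int, dual: bool):
    """Unsigned 64-bit multiplier modelled as four 32x32 partial products.

    dual=False: returns the full 128-bit product a * b.
    dual=True:  returns (a0 * b0, a1 * b1), two independent 64-bit products.
    """
    a0, a1 = a & MASK32, a >> 32
    b0, b1 = b & MASK32, b >> 32

    p00 = a0 * b0  # needed in both modes
    p11 = a1 * b1  # needed in both modes
    if dual:
        # Cross terms are gated off, so the two lanes cannot interact.
        return p00, p11

    p01 = a0 * b1  # cross terms, only needed for the wide product
    p10 = a1 * b0
    return (p11 << 64) + ((p01 + p10) << 32) + p00
```

In this model the two diagonal products serve both modes, while the cross-term hardware is idle in dual mode; the mode switch itself amounts to suppressing the cross terms and the carries between lanes.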

Life gets complicated for trigonometric functions and denormals though. If you don't implement these, the extra cost for dual-mode FPUs should be minimal.

Simon Richter

A double-precision multiply would likely take more than 4x as many clocks as a single-precision one, not 2x.

You have 4 partial products and you have to add them.
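The count of four can be seen in a small sketch that reuses one narrow multiplier over several steps. This assumes unsigned operands split into two equal halves; `multiply_with_narrow_unit` and `half_bits` are names made up for the example.

```python
def multiply_with_narrow_unit(a: int, b: int, half_bits: int = 32):
    """Multiply two unsigned (2 * half_bits)-wide values using only a
    half_bits x half_bits multiplier, one partial product per step.

    Returns (product, number_of_narrow_multiplies).
    """
    mask = (1 << half_bits) - 1
    a_parts = (a & mask, a >> half_bits)  # (low half, high half)
    b_parts = (b & mask, b >> half_bits)

    acc = 0
    steps = 0
    for i, ai in enumerate(a_parts):
        for j, bj in enumerate(b_parts):
            # One narrow multiply, then a shifted wide addition into the accumulator.
            acc += (ai * bj) << ((i + j) * half_bits)
            steps += 1
    return acc, steps
```

Each of the four steps also needs a wide addition, which is where the "more than 4x" comes from if the additions are not overlapped with the multiplies.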

For the same reason, a high-performance double-precision FPU will probably cost more than twice as much as a single-precision one.

Spehro Pefhany