Suppose you are designing a floating-point unit that should support both single-precision and double-precision operation. In single-precision mode, the goal is not simply to expand the single-precision operands into double-precision registers, but to perform twice as many operations per clock cycle.
To be specific, say A and B are 64-bit registers and you want two options: either perform the operation C=A*B, or alternatively perform C0=A0*B0 and C1=A1*B1, where A0 and A1 are the low and high 32-bit words of the A register (and likewise for B and C).
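To pin down the behaviour I have in mind, here is a minimal software sketch of the two modes. It only models the results, not any hardware. The function names (`mul_double`, `mul_paired_single`, `pack_singles`, `unpack_singles`) are made up for illustration, and I am assuming IEEE 754 binary32/binary64 formats with A0 in the low 32 bits of the register.

```python
import struct

def pack_singles(lo: float, hi: float) -> int:
    """Pack two binary32 values into one 64-bit register image (lo = low word)."""
    return struct.unpack("<Q", struct.pack("<2f", lo, hi))[0]

def unpack_singles(bits: int) -> tuple:
    """Split a 64-bit register image into its (low, high) binary32 values."""
    return struct.unpack("<2f", struct.pack("<Q", bits))

def double_to_bits(x: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def bits_to_double(bits: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def mul_double(a_bits: int, b_bits: int) -> int:
    """Mode 1: C = A * B, each register holding one binary64 value."""
    return double_to_bits(bits_to_double(a_bits) * bits_to_double(b_bits))

def mul_paired_single(a_bits: int, b_bits: int) -> int:
    """Mode 2: C0 = A0 * B0 and C1 = A1 * B1, two independent binary32 lanes."""
    a0, a1 = unpack_singles(a_bits)
    b0, b1 = unpack_singles(b_bits)
    # The product of two binary32 values is exact in binary64, so packing it
    # back as 'f' rounds once to binary32. Note that struct raises
    # OverflowError if a product is too large for binary32; real hardware
    # would produce infinity instead.
    return pack_singles(a0 * b0, a1 * b1)
```

For example, `unpack_singles(mul_paired_single(pack_singles(1.5, 0.5), pack_singles(2.0, 4.0)))` gives the two lane products, while `mul_double` treats the same 64 bits as a single operand.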
How practical or efficient is it to split a double-precision unit into a pair of single-precision units like this? I want to ask what percentage of the transistors can be dual-purposed, but I'm sure it's not as simple as that.
In particular, how does the practicality and efficiency of this compare with the same dual-purposing of a fixed-point multiplier unit?
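For the fixed-point case, the arithmetic reason it looks attractive to me is that a 64x64 multiply decomposes into four 32x32 partial products, and the two independent 32-bit products are exactly the two "diagonal" terms. The sketch below only demonstrates that identity for unsigned integers. It says nothing about transistor counts or timing, and the function names are hypothetical.

```python
MASK32 = (1 << 32) - 1

def partial_products(a: int, b: int) -> tuple:
    """Return the four 32x32 partial products of two unsigned 64-bit values."""
    a0, a1 = a & MASK32, (a >> 32) & MASK32
    b0, b1 = b & MASK32, (b >> 32) & MASK32
    return a0 * b0, a0 * b1, a1 * b0, a1 * b1

def mul_full(a: int, b: int) -> int:
    """64-bit mode: sum all four partial products with their weights."""
    ll, lh, hl, hh = partial_products(a, b)
    return ll + ((lh + hl) << 32) + (hh << 64)

def mul_split(a: int, b: int) -> tuple:
    """Split mode: keep only A0*B0 and A1*B1, dropping the cross terms."""
    ll, _, _, hh = partial_products(a, b)
    return ll, hh  # full 64-bit products; truncating to 32 bits is a separate choice
```

Here `mul_full(a, b)` equals `a * b` for unsigned 64-bit inputs, and `mul_split` reuses the same partial products with the cross terms `A0*B1` and `A1*B0` suppressed.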