Given any system which supports standard math operations -- add, subtract, multiply, divide -- on n bits, you first need to keep in mind some rules. If you are using sign-magnitude representation you lose a bit of potential value, but the math is simpler. If you are only handling unsigned, it simplifies even more. The rules below presume you are using twos-complement number representation.
- The max absolute value you can store is 2^(n-1) in any n-bit location. The total range is -2^(n-1)...0...+(2^(n-1)-1), e.g. -32768...0...+32767 for 16-bit twos-complement numbers.
- Addition of two numbers with the same sign can generate an answer with n+1 bits required to represent the result:
2^(n-1) + 2^(n-1) == 2 * 2^(n-1) == 2^(n-1+1) == 2^(n)
- Subtraction of two numbers with opposite sign can also generate an answer with n+1 bits required to represent the result.
- Multiplication of two numbers can generate an answer with 2n-2 bits required to represent the result:
2^(n-1) * 2^(n-1) == 2^(n-1+n-1) == 2^(2n-2)
- Division is always hardest! You must somehow handle zero-divisor, and you have to consider whether to round or not for integer division, and floating-point? Well, implementing floating-point standards -- IEEE 754 -- is not trivial, but I have done it. Intel botched accuracy of the first Pentium Floating-point implementation with a small bug.
All modern processors support addition/subtraction. Floating-point units are still the exception rather than the norm, but are becoming more widely used.
For the system you describe, you have two 32-bit values, which you can further subdivide into four 16-bit values or eight 8-bit values. The algorithms are the same, you just have to iterate more. This page describes the basic rules for two's-complement arithmetic.
I will presume you have access to register-level processor access so you can get at status bits like carry, overflow, etc.
Given two 64-bit numbers, X = ab and Y = cd and Result = KL, where a, b, c, d, K, and L are "32-bit digits" in your numberland the following approaches work:
Addition:
- L = b + d. Set Cl to the carry value after the addition.
- K = a + c + Cl. (If more than two 'digits', track the carry and move up the chain to higher-order digits.)
- Use the addition rule to check for overflow in KL.
Subtraction:
- L = b - d. Set Cl to the underflow (same as carry on a lot of processors) value after the addition.
- K = a - c - Cl. (If more than two 'digits', track the underflow/carry and move up the chain.)
- Use the subtraction rule to check for overflow in KL.
As described in the link above, multiplication and division can be performed by addition and subtraction, but you must account for a possible 128-bit result.
subterm bits: 64 96 96 128
=========================================================
X * Y = ab * cd = b*d + b*c << 32 + a*d << 32 + a*c << 64
i.e. 4 multiplies and 3 adds.
...however, you must account for the intermediate terms which can be larger than your 64-bit numberland.
Represent the result as 2 of your 64-bit numbers, either as an array of two, or as a struct with 4 32-bit values. The following shows use of an array, because it is extensible to arbitrary precision by making the array larger.
Let result = R[2]
, where each entry in R is one of your 64-bit numbers:
Initialize result (simplifies the algorithm a bit.)
R[0].lo = 0; // call this L, lowest-order result "digit"
R[0].hi = 0; // call this K
R[1].lo = 0; // call this J
R[1].hi = 0; // call this I, highest-order result "digit"
If X.hi is a, X.lo is b, Y.hi is c, and Y.lo is d, this becomes
(KL) = X.lo * Y.lo; // 62-bits max, rule #4.
// Note no carry possible from K.
// Carry from L handled by Addition, above.
(JK) += X.lo * Y.hi; // 63-bits max, rule #4, #2.
// Note that K contains the value from the
// previous step, hence the "+=" operator.
// Also note no carry possible from J.
// Carry from K handled by Addition, above.
(JK) += X.hi * Y.lo; // 64-bits max, rule #4, #2.
// Note that JK contains the value from the
// previous step, hence the "+=" operator.
// Also note no carry possible from J.
// Carry from K handled by Addition, above.
(IJ) += X.hi * Y.hi; // 62-bits max, rule #4, #2.
// Note that J contains the value from the
// previous step, hence the "+=" operator.
// Also note no carry possible from I
// because we kept a 128-bit result.
// If we only wanted a 64-bit result,
// the overflow tests would be by testing K.
Now apply the addition rules from the link above to determine if you overflowed 64 bits.
For subtraction you do the same, subtracting instead of adding, and tracking underflow for Rc. Then you apply the subtraction rules from the link to determine if you overflowed 64 bits.
Once you have multi-precision working for addition/subtraction for two "digits" of arbitrary bit width, extending it to three, four, etc., is very straightforward.
Then, for simplicity, you use the multiply/divide approach used in the above link to multiply and divide in your two-digit system or any higher-precision system.