Large exponentials and 8-bit PIC processors

Question

Does anyone know if 8-bit (PIC) processors could typically handle large exponentials, e.g., 5^15 mod 23, with a reasonable speed? Does doing so result in more cycles than on 16-bit processors?

I'm hoping to implement Diffie-Hellman Key Exchange on one, but the thought of computing large exponentials seems daunting.

Are large exponentials commonly done on 8-bit processors? Is DH key exchange commonly implemented on such low-end devices?

Well I can't comment on the exact issue you describe, but I can say that I used the JAL language once for 8-bit PIC micros. I setup a library which used a floating-point math library (Vasile Surducan) to display digits on an LCD. It worked, but the math was fairly slow. This of course is due to the much larger number of operations required for a RISC processor. In the end, if the clock rate can be bumped up really high, it may not matter. — rdtsc, Sep 24 '15 at 11:00
I've never used it, but I believe MPLAB X IDE has a stopwatch which will measure cycle count between breakpoints. I believe it is under "Window>Debugging>Stopwatch". I think you can do this with the simulator (which I have also not used). — Tut, Sep 24 '15 at 13:01
The old MPLAB had a stopwatch like @Tut mentions, so I expect that MPLABX does, too. The trick is that it __only__ worked in the MPLAB simulator, not when you're debugging on actual hardware. — bitsmack, Sep 24 '15 at 17:09
Actually the coefficients are way larger than the given example implies. see https://tools.ietf.org/html/rfc5114#page-4 — Roland Mieslinger, Sep 25 '15 at 08:18

Olin Lathrop · Answer 1 · 2015-09-24T11:39:58.010

2

Any math can be done on any processor. The only question is how many operations, and therefore how much time it will take.

5¹⁵ = 2^34.8, so requires at least 35 bits to express. Bits come in bytes of 8 each on a PIC 18, so you'd use 40 bits or 5 bytes to represent this value.

So all you have to write is a routine that multiplies a 1 byte number by a 5 byte number into the 5 byte number. This is actually pretty easy since the PIC 18 has a hardware 8x8 into 16 bit multiplier. Basically, you multiply the one input byte by each of the wide number bytes separately and add up the results. I'd probably write this as a unrolled loop, so the guts would be 5 sets of a multiply and two adds, or 15 instructions.

Added

As Dan D correctly pointed out in a comment, I forgot to address the modulo part. That is significantly more complicated than computing 5¹⁵. Basically, you do a divide by 23 and keep only the remainder. You may be able to save a few instructions because you don't need the quotient at the end, but mostly you still have to write a 5-byte divided by 1-byte with 1-byte remainder algorithm.

Again, this can all be done on a PIC 18 or other small processors, but will take many more instructions than on a processor that can manipulate 40 bit numbers natively.

edited Sep 24 '15 at 11:39

answered Sep 24 '15 at 11:17

Olin Lathrop

310,974
36
428
915

You seem to have missed that it was _modular exponentiation_. The given example can be done quickly in with single bytes with modular exponentiation by squaring. `5^15 mod 23` is only `19`. – Dan D. Sep 24 '15 at 11:33
@Dan: See addition to answer. – Olin Lathrop Sep 24 '15 at 11:41
It might be possible to find inspiration in the GMP library, which has code to do such calculations on arbitrary length integers stored in nibbles (4bit entities) – PlasmaHH Sep 24 '15 at 11:48
@OlinLathrop Thanks. But if it was a PIC 16, there won't be a hardware 8x8 hardware multiplier in the ALU unfortunately, so I believe the multiplication will be more difficult. Is it a good idea to write the expression in C as it is and let the MPLAC XC8 do the work? – Kar Sep 24 '15 at 13:35
Modular exponents are easily calculated using results old number theory. Doesn't require ever actually computing the large exponential value – crasic Sep 24 '15 at 16:53
1

The old school way is using modular arithmetic to rewrite the expression into a product of 5 raised to prime powers mod 23. Fermats little theorem them applies and you only need to take a mod of a small number. There are also multiple memory efficient algorithms for this operation on digital computers and have been extensively researched – crasic Sep 24 '15 at 17:08

helloworld922 · Accepted Answer · 2015-09-25T03:38:31.360

As crasic pointed out in the comments, you don't actually need to compute the actual exponential. You can use modular arithmetic to reduce the max value you need to store:

\begin{gather} (a b) ~\textrm{mod}~ m = ((a ~\textrm{mod}~ m) (b ~\textrm{mod}~ m)) ~\textrm{mod}~ m \end{gather}

For \$(a b)~\textrm{mod}~m\$, the maximum value you will ever need to store is \$(m-1)^2\$, which for m=23 means you must be able to store 484. This is outside the range of 8 bit numbers but fits into 16-bits. However, as Olin pointed out some 8-bit micros can manipulate 16-bit values efficient so this isn't a huge issue.

A general implementation in C might look like:

uint16_t exp_mod(uint16_t base, uint16_t power, uint16_t m)
{
    uint16_t res = 1;
    while(power)
    {
        if(power & 1)
        {
            res = (res * base) % m;
        }
        res = (res*res) % m;
        power >>= 1;
    }
    return  res;
}

If your uC has an efficient modulo instruction, this is usually good enough. However, if you don't and know that n will always be a certain value, there is an efficient division (and thus modulo) implementation by T. Granlun and P.L. Montomery. Agner Fog also has a description of how this works.

Here's a hacked implementation for m=23.

uint16_t mod23(uint16_t m)
{
    // note: I didn't actually figure out what n and the error correction term is suppose
    // to be, I just guessed a value for n which produces a simple function
    // only checked to be correct for m <= 484
    // 
    // n = 11
    uint16_t res = a - 23*((m*89)>>11);
    return (res == 23) ? 0 : res;
}

The only required 16-bit operations then are:

addition/subtraction
multiplication
bitwise shifting
bitwise and
equality comparison

I'm not familiar with the PIC instruction set, but here's roughly a worst-case instructions count:

4*log2(power) comparisons
4*log2(power) jumps
2*log2(power) subtractions
3*log2(power) bitwise shift
6*log2(power) multiplications

For power=15, this is roughly 76 instructions. Note that this can go down if you have access to multiply-add instructions or other special instructions. I would consider this quite reasonable.

note: This implementation has variable timings for how long exp_mod takes depending on the input parameter, so is vulnerable to timing-based cracks (though for a 16-bit key, can just be brute-force cracked, too). It's relatively easy to modify this code to be resilient to this by injecting dummy instructions, which in theory should only hurt average performance and worst-case performance is unchanged.

score 1 · Answer 3 · answered Sep 24 '15 at 16:51

1

Disclaimer: If you want to do it for educational purpose, go ahead. If you want to use it to actually protect communication, go an get a copy of "Handbook of Applied Cryptography".

As already stated, the problem is not the exponentiation but the modulo operation (if done naively). Montgomery multiplication is an efficient way to solve this. In short, the problem is mapped onto a different group that has the property that the modulo operation can be replaced by bit shifting (and/or memory copy)

I don't know about DH implementations on 8-Bit chips, but I know for sure that there are RSA implementations for 8-Bit microprocessors.

answered Sep 24 '15 at 16:51

Roland Mieslinger

618
3
7

Adjusting things so that the two most significant bytes of the modulus are 1 and 0 will help things immensely, but unfortunately the operand usage of the multiply instructions is not particularly amenable to efficient extended multiplies. If the PIC had a "factor" register (which could hold its value between multiplies) and an instruction which would compute PRODH+FACTOR*op -> PRODH:W, that would have allowed the main loop of a long multiply to be reduced to two single-cycle instructions: "XMULA POSTINC0,c / ADDWFC POSTINC1,f,c" – supercat Sep 24 '15 at 19:05

Large exponentials and 8-bit PIC processors

3 Answers3

Added