As crasic pointed out in the comments, you don't actually need to compute the actual exponential. You can use modular arithmetic to reduce the max value you need to store:
\begin{gather}
(a b) ~\textrm{mod}~ m = ((a ~\textrm{mod}~ m) (b ~\textrm{mod}~ m)) ~\textrm{mod}~ m
\end{gather}
For \$(a b)~\textrm{mod}~m\$, the maximum value you will ever need to store is \$(m-1)^2\$, which for m=23
means you must be able to store 484. This is outside the range of 8 bit numbers but fits into 16-bits. However, as Olin pointed out some 8-bit micros can manipulate 16-bit values efficient so this isn't a huge issue.
A general implementation in C might look like:
uint16_t exp_mod(uint16_t base, uint16_t power, uint16_t m)
{
uint16_t res = 1;
while(power)
{
if(power & 1)
{
res = (res * base) % m;
}
res = (res*res) % m;
power >>= 1;
}
return res;
}
If your uC has an efficient modulo instruction, this is usually good enough. However, if you don't and know that n will always be a certain value, there is an efficient division (and thus modulo) implementation by T. Granlun and P.L. Montomery. Agner Fog also has a description of how this works.
Here's a hacked implementation for m=23
.
uint16_t mod23(uint16_t m)
{
// note: I didn't actually figure out what n and the error correction term is suppose
// to be, I just guessed a value for n which produces a simple function
// only checked to be correct for m <= 484
//
// n = 11
uint16_t res = a - 23*((m*89)>>11);
return (res == 23) ? 0 : res;
}
The only required 16-bit operations then are:
- addition/subtraction
- multiplication
- bitwise shifting
- bitwise and
- equality comparison
I'm not familiar with the PIC instruction set, but here's roughly a worst-case instructions count:
4*log2(power)
comparisons
4*log2(power)
jumps
2*log2(power)
subtractions
3*log2(power)
bitwise shift
6*log2(power)
multiplications
For power=15
, this is roughly 76 instructions. Note that this can go down if you have access to multiply-add instructions or other special instructions. I would consider this quite reasonable.
note: This implementation has variable timings for how long exp_mod takes depending on the input parameter, so is vulnerable to timing-based cracks (though for a 16-bit key, can just be brute-force cracked, too). It's relatively easy to modify this code to be resilient to this by injecting dummy instructions, which in theory should only hurt average performance and worst-case performance is unchanged.