Efficient inverse (1/x) for AVR

Question

I'm trying to find an efficient way of calculating an inverse on an AVR (or approximating it).

I'm trying to calculate the pulse period for a stepper motor so that I can vary the speed linearly. The period is proportional to the inverse of the speed (p = K/v), but I can't think of a good way of calculating this on the fly.

My formula is

p = 202/v + 298; // p in us; v varies from 1->100

Testing on the Arduino, the division seems to be ignored completely leaving p fixed at 298 (though perhaps this would be different in avr-gcc). I've also tried summing v in a loop until it exceeds 202, and counting the loops, but this is quite slow.

I could generate a lookup table and store it in flash, but I was wondering if there was another way.

Edit: Maybe the title should be "efficient divide"...

Update: As pingswept points out, my formula for mapping period to velocity is incorrect. But the main problem is the divide operation.

Edit 2: On further investigation, divide is working on the arduino, the problem was due to both the incorrect formula above and an int overflow elsewhere.

An integer, but as it gives a period in us, integer division is accurate enough here. — Peter Gibson, Sep 13 '10 at 23:36
You could precompute the values of the 100 integers and make a lookup table of pre scalers for multiplication if you're really concerned with speed. Of course there is a memory trade off. — RYS, Sep 30 '15 at 11:21

score 7 · Accepted Answer · answered Sep 14 '10 at 19:43

One nice thing about division is that more or less everyone is doing it. It is a pretty core feature of the C language, and compilers like AVR-GCC (called by the Arduino IDE) will choose the best division algorithm available, even when the microcontroller does not have a hardware division instruction.

In other words, you don't need to worry about how division is implemented unless you have a very strange special case.

If you do worry, then you might enjoy reading Atmel's official suggested division algorithms (one optimized for code size, and one optimized for execution speed; neither take any data memory). They are in:

http://www.atmel.com/dyn/resources/prod_documents/doc0936.pdf

which is the Application Note "AVR200: Multiply and Divide Routines" listed on the Atmel page for its (reasonably big) Atmega processors like the Atmega 168 and Atmega 328 used in the standard Arduinos. The list of data-sheets and application notes is at:

http://www.atmel.com/dyn/products/product_card.asp?part_id=4720

vicatcu · Answer 2 · 2010-09-15T14:07:35.967

4

looks to me like all you need is a 100-entry lookup table. Doesn't get much faster than that.

#define VALUE_FOR_V_EQUALS_ZERO 0
uint16_t formula_lookup[100] = {VALUE_FOR_V_EQUALS_ZERO, 500, 399, 365, 348, ..., 300};

...

//"calculate" formula
p = formula_lookup[v > 67 ? 67 : v];

EDIT you actually only a 68 value lookup table since values of v greater than 67 always evaluate to 300.

edited Sep 15 '10 at 14:07

answered Sep 15 '10 at 02:51

vicatcu

22,499
13
79
155

As I said in the question, I was wondering if there was another way – Peter Gibson Sep 15 '10 at 12:37

score 3 · Answer 3 · answered Sep 13 '10 at 01:31

3

There are some very good techniques mentioned in the book "Hackers Delight by Henry Warren and on his website hackersdelight.org. For a technique that works well with smaller microcontrollers when dividing by constants have a look at this file.

answered Sep 13 '10 at 01:31

timrorr

1,713
10
10

These look good for dividing by constants as you say, but don't really apply to my problem. He uses techniques such as precalculate the inverse - multiply by it, then shift. – Peter Gibson Sep 13 '10 at 02:09
That's an excellent book! – Windell Oskay Sep 15 '10 at 06:53

score 3 · Answer 4 · answered Sep 13 '10 at 05:30

3

Your function doesn't seem like it would give the result you want. For example, the value 50 returns roughly 302, while 100 returns roughly 300. Those two results will cause almost no change in the speed of the motor.

If I understand you correctly, you're really looking for a fast way to map the numbers 1-100 to the range 300-500 (approximately), such that 1 maps to 500 and 100 maps to 300.

Perhaps try: p = 500 - (2 * v)

But I might be misunderstanding-- are you trying to calculate the on-time of a constant frequency square wave? What's the 298?

answered Sep 13 '10 at 05:30

pingswept

12,581
4
46
64

Yes thanks, the formula is wrong. The point is to get linear acceleration from the output of the stepper, by varying the target speed by a constant each time interval (speed++ say). This must be mapped to the period (frequency) that a +ve edge is sent to the stepper motor controller - hence the inverse relationship (p=1/v). – Peter Gibson Sep 13 '10 at 23:25
Do you mean constant acceleration, i.e. a linearly increasing velocity? – pingswept Sep 13 '10 at 23:43
Ah yes, constant acceleration, I mucked that up when originally writing the question and remember fixing it there too – Peter Gibson Sep 14 '10 at 04:22

score 3 · Answer 5 · answered Sep 13 '10 at 12:42

An efficient way to approximate divides is by shifts. e.g. if x = y / 103; dividing by 103 is the same as multiplying by 0.0097087, so to approximate this first select a 'good' shift number (i.e. a base-2 number, 2,4,8,16,32 and so on)

For this example 1024 is a good fit as we can say that 10 / 1024 = 0.009765 Its then possible to code:

x = (y * 10) >> 10;

Remembering of course to ensure the variable y does not overflow its type when multiplied. Its not exact, but its quick.

This is similar to the techniques in the links that timrorr supplied and works well for dividing by constants, but not when dividing by a value that is unknown at compile time. — Peter Gibson, Sep 13 '10 at 23:28

score 3 · Answer 6 · answered Sep 14 '10 at 17:53

On another note if you are trying to do a divide on a CPU that doesn't support divide there's a really cool way to do it in this Wiki article.

http://en.wikipedia.org/wiki/Multiplicative_inverse

To approximate the reciprocal of x, using only multiplication and subtraction, one can guess a number y, and then repeatedly replace y with 2y − xy2. Once the change in y becomes (and stays) sufficiently small, y is an approximation of the reciprocal of x.

Interesting, I wonder how this compares to the other methods mentioned — Peter Gibson, Sep 15 '10 at 12:45

score 1 · Answer 7 · edited Jul 12 '12 at 05:14

You want fast so here goes..... Since the AVR cant do normalisation efficiently (shifting left until you cant shift anymore), ignore any pseudo floating point algorithms. The simplest way for very accurate and fastest integer division in an AVR is via a reciprocal look-up table. The table will store reciprocals scaled by a large number (say 2^32). You then implement a unsigned32 x unsigned32 = unsigned 64 multiplication in assembler, so answer = (numerator * inverseQ32[denominator]) >> 32.
I implemented the multiplication function using inline assembler, (wrapped in a c function). GCC does support 64-bit "long longs", however, to get the result you have to multiply 64bits by 64bits, not 32x32=64 due to C language limitations on 8-bit architecture......

Downside of this method is you will use 4K x 4 = 16K of flash if you want to divide by integers from 1 to 4096......

Very accurate unsigned division is now achieved in about 300 cycles in C.

You could consider using 24 bit or 16 bit scaled integers for more speed, less accuracy.

score 1 · Answer 8 · answered Sep 13 '10 at 18:06

1

This process here looks mcu-friendly, though it might need a bit of porting.

Though it seems as if the LUT would be easier. You would only need 100 bytes, less if you used some interpolation, and since the LUT is filled with constants then the compiler might even locate it in the code area instead of the data area.

answered Sep 13 '10 at 18:06

ajs410

8,381
5
35
42

I tried something similar in summing the divisor until it equals or exceeds the dividend, but found it to be fairly slow. It looks like the LUT will be the way to go - using avr-gcc you need special macros in to store it in flash. – Peter Gibson Sep 13 '10 at 23:34

score 1 · Answer 9 · answered Sep 13 '10 at 18:51

1

Check to make sure that the division is being performed as floating point. I use Microchip not AVR, but when using C18 you need to force your literals to be treated as floating point. Eg. Try changing your formula to:

p = 202.0/v + 298.0;

answered Sep 13 '10 at 18:51

mjh2007

3,899
24
49

score 1 · Answer 10 · edited Sep 30 '15 at 02:08

p = 202/v + 298; // p in us; v varies from 1->100

The return value of your equation already is p=298 since the compiler divides first then add, use integer muldiv resolution that is:

p = ((202*100)/v + (298*100))/100

Using this is same multiply a*f, with a=integer f=fraction.

That yield r=a*f but f=b/c then r=a*b/c but it doesn't work yet because position of operators, yield the final r=(a*b)/c or muldiv function, a manner to compute fraction numbers using only integer.

Efficient inverse (1/x) for AVR

10 Answers10