
Obviously, the number of operations affects a CPU's power consumption, but does that consumption depend solely on the operations themselves, or also on the operand values? For example, adding 0 and 1 involves setting a single bit, whereas adding 0xFF and 1 requires clearing 8 bits and setting one bit.

This might be evident in a simple adder circuit built around e.g. a 74283 IC, but does the same logic apply to a more complicated CPU? For example, if I were given the task of roughly estimating the power consumption of an embedded microcontroller for a number of integer add/subtract operations, would I need to take into account which numbers are involved?
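
To make the question concrete, this is the kind of first-order model I have in mind; the constants `E_BASE_PJ` and `E_TOGGLE_PJ` are made-up placeholders, not measured values:

```c
#include <stdint.h>

/* Toy energy model: a fixed cost per add plus a cost per bit that toggles
 * in the destination register.  Both constants are placeholders. */
#define E_BASE_PJ   20.0   /* energy per add instruction, picojoules (made up) */
#define E_TOGGLE_PJ  0.5   /* extra energy per toggled result bit (made up) */

static int popcount32(uint32_t x) {
    int n = 0;
    while (x) { n += (int)(x & 1u); x >>= 1; }
    return n;
}

/* Estimated energy of computing a + b when the destination previously held 'old'. */
double add_energy_pj(uint32_t old, uint32_t a, uint32_t b) {
    int toggles = popcount32(old ^ (a + b));   /* result bits that change state */
    return E_BASE_PJ + E_TOGGLE_PJ * (double)toggles;
}
```

Is an operand-dependent term like this worth keeping, or is a fixed per-instruction figure good enough in practice?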

JRE
polfosol
  • This probably can't be answered in general. There surely can be architectures where it would depend and those where it wouldn't. For your case I'd neglect the former possibility though. – Eugene Sh. Apr 18 '22 at 16:22
  • Even if you are only adding 1 + 1, full 8-, 16- or 32-bit registers and data paths will be involved, so I wouldn't expect much difference in power consumption between a 1-bit value and something that uses the full register width. – Peter Bennett Apr 18 '22 at 16:48
  • Yep! In fact these small variations in power [have been used to attack cryptographic systems](https://en.m.wikipedia.org/wiki/Power_analysis). – TypeIA Apr 18 '22 at 17:02
  • @TypeIA that's very interesting... – polfosol Apr 18 '22 at 17:04
  • @J... That's often the case, but not always: as answered below, the operands do also matter, and the difference can sometimes be measured. It depends on the processor and the ability of the attacker to isolate and manipulate the inputs. – TypeIA Apr 19 '22 at 13:20
  • @J... The data you want (secret key, plaintext message) may not be available on the external bus. Think small hardened systems like TPMs. And these small, low performance MCUs are exactly those on which these differences are greatest and most easily measurable. – TypeIA Apr 19 '22 at 14:26
  • @TypeIA Most TPMs use masking to protect from power analysis attacks. – forest Apr 19 '22 at 20:14
  • @forest Yes, exactly because of my comment: power variations "have been used to attack cryptographic systems." ;) When a group of us looked at this at the University of Illinois in 2000-2001, crypto chips were much less robust against a variety of attacks. And we definitely weren't the first, or the most successful. – TypeIA Apr 20 '22 at 07:18

3 Answers


In general, power consumption in a modern CMOS CPU is dependent on the number of signals changing state (that is, dynamic power) plus leakage (static power).
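
The usual first-order expression, with $\alpha$ the switching-activity factor (the fraction of the switched capacitance $C_L$ that toggles each cycle), is:

$$P \approx \alpha \, C_L \, V_{DD}^2 \, f \;+\; V_{DD} \, I_{leak}$$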

Dynamic power thus varies based on the operands being processed and the operation being applied to them, as these influence the number of signal toggles.

A machine with a parallel multiplier is going to activate a lot of signals to perform that operation, more than an add/subtract does, and both in turn activate more than a logic op or a move instruction.

Your intuition about the operands themselves is correct: certain combinations with certain operations will cause a lot of toggles, and those consume more power than others.
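
As a toy illustration (a hand-rolled sketch of a ripple-carry adder's carry chain, not a model of any real datapath), compare how much of the carry logic is exercised by the two additions from the question:

```c
#include <stdio.h>
#include <stdint.h>

/* Toy model of an 8-bit ripple-carry adder: count how many of the internal
 * carry signals are driven high for a given pair of operands.  This is only
 * a proxy for switching activity, not a gate-level simulation. */
static int carries_asserted(uint8_t a, uint8_t b) {
    int count = 0;
    unsigned carry = 0;
    for (int i = 0; i < 8; i++) {
        unsigned ai = (a >> i) & 1u, bi = (b >> i) & 1u;
        carry = (ai & bi) | (ai & carry) | (bi & carry);   /* carry out of bit i */
        count += (int)carry;
    }
    return count;
}

int main(void) {
    printf("0x00 + 0x01 asserts %d carries\n", carries_asserted(0x00, 0x01));  /* 0 */
    printf("0xFF + 0x01 asserts %d carries\n", carries_asserted(0xFF, 0x01));  /* 8 */
    return 0;
}
```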

hacktastical
  • 25 years ago I created a synthetic "max power" program to characterize (as part of initial bring-up) power consumption of a single-core x86 CPU. The main power drain in this tiny test app was the *floating-point multiplier* in the x87 FPU. My memory is vague, but as I recall clever operand selection for the multiply resulted in an increase in the CPU's power draw of about 5%-10%. I have *not* been able to repeat this on recent CPUs, where instead I find that feeding a stream of (pseudo-)random data to the FMA unit seems to maximize power draw. – njuffa Apr 19 '22 at 05:20
  • @njuffa recent CPUs, you say? Don't forget you can dispatch 4-5 instructions per cycle - you could have 4-5 units busy at a time! Hopefully this doesn't invalidate your power draw testing... Also AVX512 is a thing. – user253751 Apr 19 '22 at 10:32
  • 2
    @njuffa at least on modern AMD CPU's, I've found any of prime95's "torture test" variations will get the CPU to increase its draw to the max allowed level and stay there. Things that would in theory draw more power (like AVX) instead cause the chip to downclock to stay within the built-in power restrictions. – mbrig Apr 19 '22 at 16:35
  • @njuffa denormal numbers? – leftaroundabout Apr 19 '22 at 16:36
  • @leftaroundabout On x86 CPUs subnormal operands typically trigger some sort of internal microcode exception handler. I am not aware that this leads to maximizing power consumption. – njuffa Apr 19 '22 at 19:05
  • @njuffa: Some CPUs handle some cases of subnormals without a microcode assist. Especially AMD, but also Intel Sandybridge for cases like normal + subnormal regardless of result, according to [Agner Fog's microarch PDF](https://www.agner.org/optimize/microarchitecture.pdf). I'd guess that would have fewer bits flipping. I also expect you're right that a microcode assist would reduce overall power by limiting throughput of uops on execution units. (Probably treated like an exceptions that drains the ROB.) – Peter Cordes Apr 19 '22 at 21:23
  • @PeterCordes I wrote "typically" (most but not all) for a reason. – njuffa Apr 19 '22 at 21:27

Yes, power consumption can (and often does) depend on the operands, at least to some extent.

Older ARMv4 CPUs made this explicit for some operations such as multiplication: if an operand had more leading zeros, the multiply finished sooner, saving not only power but also CPU cycles. This was a very common power-saving optimization in those days.
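
As an illustration, here is a cycle estimate in the style of the ARM7TDMI, whose multiplier array terminates after 1 to 4 internal cycles depending on the high-order bytes of the second operand (treat the exact thresholds as an assumption; other cores differ):

```c
#include <stdint.h>

/* Rough internal-cycle count for an early-terminating 32x32 multiply,
 * following the ARM7TDMI pattern: the array stops as soon as the remaining
 * multiplier bits are all zeros or all ones (sign extension). */
int mul_internal_cycles(uint32_t multiplier) {
    if ((multiplier >> 8)  == 0u || (multiplier >> 8)  == 0xFFFFFFu) return 1; /* bits [31:8] uniform */
    if ((multiplier >> 16) == 0u || (multiplier >> 16) == 0xFFFFu)   return 2; /* bits [31:16] uniform */
    if ((multiplier >> 24) == 0u || (multiplier >> 24) == 0xFFu)     return 3; /* bits [31:24] uniform */
    return 4;
}
```

Fewer internal cycles means both fewer clocks and less switching in the multiplier array, which is where the operand-dependent saving comes from.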

Modern CPUs have even more elaborate optimizations like this, with the instruction decoders checking for certain operand values that can be optimized away. For example, Intel CPUs recognize and eliminate certain zeroing idioms, such as a register being set to zero by subtracting it from itself; when the CPU sees this idiom, it skips the subtraction entirely and just marks the register as zero. This article discusses some of these optimizations and links to additional sources:

https://easyperf.net/blog/2018/04/22/What-optimizations-you-can-expect-from-CPU

If you mean randomly selected numbers drawn from all possible register values, rather than specific idioms or numbers with many leading zeros, you can probably assume that all numbers use the same amount of power, since random values are unlikely to hit cases the system can optimize.

user1850479
  • Recognizing a [zeroing *idiom* (like `xor ecx, ecx`)](//stackoverflow.com/questions/33666617/what-is-the-best-way-to-set-a-register-to-zero-in-x86-assembly-xor-mov-or-and) is a totally different thing. As you say, no SUB or XOR runs at all, so it isn't the operand *values* that matter, it's the operand *names* (recognizing it as the same register). (AMD CPUs also recognize zeroing idioms, but only Sandybridge-family (not all Intel) avoids running a uop on an execution-unit at all, just handled in the register-renaming stage to point that architectural register at a physical zero register) – Peter Cordes Apr 19 '22 at 21:12

One easy example where input values make a difference is when doing division. There are many different ways to implement division, and a lot of them are iterative. Your input values will determine how many cycles are needed for the algorithm to converge, and more cycles means more power consumed. Any other instruction that's implemented iteratively (barrel shifts, etc.) would have similar behavior.
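
As a sketch of why the operand values matter here, consider a simple restoring (shift-and-subtract) divider that skips the dividend's leading zero bits before iterating; this is one possible implementation for illustration, not any particular CPU's divider:

```c
#include <stdint.h>

/* Restoring division that only iterates over the significant bits of the
 * dividend: a small dividend finishes in fewer iterations, so it costs
 * fewer cycles and less switching energy.  Divisor must be non-zero. */
uint32_t divide(uint32_t dividend, uint32_t divisor, int *iterations) {
    uint32_t quotient = 0, remainder = 0;
    int bits = 32;
    while (bits > 0 && !(dividend & 0x80000000u)) {   /* skip leading zeros */
        dividend <<= 1;
        bits--;
    }
    *iterations = bits;                               /* operand-dependent cycle count */
    for (int i = 0; i < bits; i++) {
        remainder = (remainder << 1) | (dividend >> 31);
        dividend <<= 1;
        quotient <<= 1;
        if (remainder >= divisor) {
            remainder -= divisor;
            quotient |= 1u;
        }
    }
    return quotient;
}
```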

Also, consider CPUs that support operations on multi-word integers (such as a 32-bit CPU that can handle operations on 64-bit data). The size of your operands will determine how many registers and clock cycles are required to perform that operation, and thus how much power is required.
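
For example, a 64-bit addition on a 32-bit core decomposes into a low-word add plus a high-word add that also absorbs the carry, roughly like this (a compiler typically lowers it to an add followed by an add-with-carry):

```c
#include <stdint.h>

/* 64-bit addition expressed as two 32-bit operations, the way a 32-bit
 * core has to perform it: low words first, then high words plus the carry. */
uint64_t add64(uint32_t a_lo, uint32_t a_hi, uint32_t b_lo, uint32_t b_hi) {
    uint32_t lo = a_lo + b_lo;
    uint32_t carry = (lo < a_lo) ? 1u : 0u;   /* carry out of the low-word add */
    uint32_t hi = a_hi + b_hi + carry;
    return ((uint64_t)hi << 32) | lo;
}
```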

bta
  • nitpick: taking more cycles means more *energy* consumed by the division, but likely similar *power*. Just over more time. Also, a barrel shifter is *not* iterative; being constant-time instead of 1 cycle per shift-count or whatever is [the whole point of a barrel shifter](https://en.wikipedia.org/wiki/Barrel_shifter). But yes, variable-performance division is a good example. Variable-size extended precision is arguably less good, since it's obvious that running more instructions is going to be slower and take more energy. – Peter Cordes Apr 19 '22 at 21:16