11

Is it possible to monitor a block of code and determine the number of processor clock cycles that code took on an Arduino and/or AVR atmel processor? or, should I rather monitor microseconds passed before and after code run? Note: I'm not concerned with real time (as in, how many real seconds passed) as much as I am in "how many clock cycles does this code require from the CPU"

The current solution I can come up with is from time.c:

#define clockCyclesPerMicrosecond() ( F_CPU / 1000000L )
#define clockCyclesToMicroseconds(a) ( (a) / clockCyclesPerMicrosecond() )

wiring.c adds:

#define microsecondsToClockCycles(a) ( (a) * clockCyclesPerMicrosecond() )

By this account i could calculate clock cyles passed by monitoring microseconds passed and then pass that to microsecondsToClockCycles(). My question is, is there a better way?

sidenote: are there good resources for performance monitoring of the AVR. lmgtfy.com and various forums searches do not provide any obvious results, other than exploration of timers

thanks

cyphunk
  • 602
  • 1
  • 6
  • 12

5 Answers5

6

The simplest method is to make your code pull some pin up before it executes the code you want to time, and pull it low after it has finished doing whatever. Then make the code loop (or use digital oscilloscope with memory in single-shot mode) and just scope then pin. The length of the pulse tells you how long it took to execute the piece of code plus one clock cycle from changing the pin state (I think it takes one cycle, not 100% sure).

Dago
  • 910
  • 7
  • 12
  • Thanks. Yes, I can see this is probably the most accurate solution. I'm still chucking away at code that would give me at least even general cycle use analysis inside the code. I'm going to be using this to build some testing tools and it would be nice to set my upper bounds for parameters such as max allowed run time based on how efficiently the code + everything related to it is running on the current Atmel CPU in use – cyphunk Nov 26 '09 at 22:11
4

What do you mean by "monitor"?

It shouldn't be hard to count clock cycles for AVR for small pieces of assembly code.

You can also set a port before the code is executed and reset it afterwards, and monitor that with a logic analyzer or an oszilloscope to get the timing.

And you could also read the time from a fast-running timer, as you say.

starblue
  • 6,462
  • 2
  • 21
  • 35
  • By monitor I mean determine the number of cycles used by code. something like (note, formatting of code will probably be flattened by comment engine): clocks = startCountingAtmegaClocks(); for ... { for ... { digitalRead ... }} Serial.print("num of used cycles :"); Serial.print(currentCountingAtmegaClocks() - clocks, DEC); – cyphunk Nov 26 '09 at 11:18
  • But yeah, your response is what I've assumed my options are. I guess, I assume if I can calculate the clock cycles the assembler would take by hand that someone has perhaps already written some nice code to do this programatically – cyphunk Nov 26 '09 at 11:23
3

This is an example for Arduino using the clockCyclesPerMicrosecond() function to calculate the clocks that have passed. This code will wait 4 seconds, then print the time passed since start of program. Left 3 values are total time (microseconds, milliseconds, total clock cycles) and right most 3 are elapsed times:

Output:

clocks for 1us:16
runtime us, ms, ck :: elapsed tme us, ms ck
4003236 4002 64051776 :: 4003236 4002 64051760
8006668 8006 128106688 :: 4003432 4004 64054912
12010508    12010 192168128 :: 4003840 4004 64061440
16014348    16014 256229568 :: 4003840 4004 64061440
20018188    20018 320291008 :: 4003840 4004 64061440
24022028    24022 384352448 :: 4003840 4004 64061440
28026892    28026 448430272 :: 4004864 4004 64077824
32030732    32030 512491712 :: 4003840 4004 64061440
36034572    36034 576553152 :: 4003840 4004 64061440
40038412    40038 640614592 :: 4003840 4004 64061440
44042252    44042 704676032 :: 4003840 4004 64061440
48046092    48046 768737472 :: 4003840 4004 64061440
52050956    52050 832815296 :: 4004864 4004 64077824

I'm sure there is a reasonable explanation why the first too loops had shorter elapsed clock cycles than most and why all other loops toggle between two lengths of clock cycles.

Code:

unsigned long us, ms, ck;
unsigned long _us, _ms, _ck;
unsigned long __us, __ms, __ck;
void setup() {
        Serial.begin(9600);
}
boolean firstloop=1;
void loop() { 
        delay(4000);

        if (firstloop) {
                Serial.print("clocks for 1us:");
                ck=microsecondsToClockCycles(1);
                Serial.println(ck,DEC);
                firstloop--;
                Serial.println("runtime us, ms, ck :: elapsed tme us, ms ck");
        }

        _us=us;
        _ms=ms;
        _ck=ck;

        us=micros(); // us since program start
        ms=millis();
        //ms=us/1000;
        ck=microsecondsToClockCycles(us);
        Serial.print(us,DEC);
        Serial.print("\t");
        Serial.print(ms,DEC);
        Serial.print("\t");
        Serial.print(ck,DEC);     
        Serial.print("\t::\t");

        __us = us - _us;
        __ms = ms - _ms;
        __ck = ck - _ck;
        Serial.print(__us,DEC);
        Serial.print("\t");
        Serial.print(__ms,DEC);
        Serial.print("\t");
        Serial.println(__ck,DEC);     

}

Sidenote: if you remove the 4 second delay you will start to see the effects of the Serial.print() much more clearly. Note, here 2 runs are compared. I've only included 4 samples near each other from their respective logs.

Run 1:

5000604 5000 80009664 :: 2516 2 40256
6001424 6001 96022784 :: 2520 3 40320
7002184 7002 112034944 :: 2600 3 41600
8001292 8001 128020672 :: 2600 3 41600

Run 2:

5002460 5002 80039360 :: 2524 3 40384
6000728 6000 96011648 :: 2520 2 40320
7001452 7001 112023232 :: 2600 3 41600
8000552 8000 128008832 :: 2604 3 41664

The elapsed time increases over total run time. After a second has elapsed the clocks increase on average from 40k to 44k. This happens consistently a few milliseconds after 1 second and the elapsed clocks remains around 44k for at least the next 10 seconds (I havent tested it further). This is why monitoring is useful or needed. Perhaps the decreased efficiency has to do with configuration or bugs in serial? Or perhaps the code is not using memory properly and has a leak that effects performance, etc.

cyphunk
  • 602
  • 1
  • 6
  • 12
  • many years later, i would still like something that shows the clocks more accurately with code (as apposed to an oscilloscope). At I'm trying to determine the number of clock cycles required for a digitalWrite() in both 16MHZ and 8MHZ. In 16MHZ I get 8us/64clk. But in 8MHZ I get 0us/0clk. – cyphunk Sep 24 '12 at 17:20
1

Since every line of code added to your source will have an impact on performance and could change optimizations applied. The changes should be the minimum required to perform the task.

I just found an Atmel Studio plugin called "Annotated Assembly File Debugger". http://www.atmel.com/webdoc/aafdebugger/pr01.html It seems like stepping through the actual generated assembly language while probably tedious is going to show you exactly what is happening. You might still have to decode how many cycles it takes for each instruction but it would get a lot closer than some of the other posted options.

For those that don't know in the Output folder of your project is a file with an LSS extension. This file contains all of your original source code as comments and below each line is the assembly language that was generated based on that line of code. Generating the LSS file can be turned off so check the following setting.

Project Properties | Toolchain | AVR/GNU Common | OutputFiles

Checkbox ".lss (Generate lss file)

James
  • 322
  • 3
  • 11
1

You could use one of the built in timers. Get everything set up for prescaller=1 and TCNT=0 before the block. Then enable the timer on the line before the block and disable it on the line after the block. The TCNT will now hold the number of cycles the block took, less the fixed cycles for the enable and disable code.

Note that the TNCT will overflow after 65535 clock cycles on a 16 bit timer. You can use the overflow flag to double run time. If you still need longer, you can use a prescaler, but will get less resolution.

bigjosh
  • 9,888
  • 29
  • 48