@vicatcu's answer is pretty comprehensive. One additional thing to note is that the CPU may run into wait states (stalled CPU cycles) when accessing I/O, including program and data memory.
For example, we're using a TI F28335 DSP; some areas of RAM are 0-wait-state for program and data memory, so when you execute code in RAM, it runs at 1 cycle per instruction (except for those instructions that take more than 1 cycle). When you execute code from FLASH memory (built-in EEPROM, more or less), however, the wait states keep it from running at the full 150 MHz, and it is several times slower.
With respect to high-speed interrupt code, you must learn a number of things.
First, become very familiar with your compiler. If the compiler does a good job, it shouldn't be that much slower than hand-coded assembly for most things ("that much slower" meaning: a factor of 2 would be OK by me; a factor of 10 would be unacceptable). You need to learn how (and when) to use compiler optimization flags, and every once in a while you should look at the compiler's output to see how it does.
Some other things that you can have the compiler do to speed up code:
use inline functions (standard in C since C99 via the `inline` keyword, as well as in C++; older C compilers usually offer a vendor extension), both for small functions and for functions that are going to be executed only once or twice. The downside is that inline functions are hard to debug, especially if compiler optimization is turned on. But they save you unnecessary call/return sequences, especially if the "function" abstraction is for conceptual design purposes rather than code implementation.
Look at your compiler's manual to see if it has intrinsic functions: compiler-specific built-in functions that map directly to the processor's assembly instructions. Some processors have instructions that do useful things like min / max / bit-reverse, and an intrinsic lets you use them without dropping into assembly.
If you're doing numerical computation, make sure you're not calling math-library functions unnecessarily. We had one case where the code was something like `y = (y+1) % 4` for a counter that had a period of 4, expecting the compiler to implement the modulo 4 as a bitwise AND. Instead it called the math library. So we replaced it with `y = (y+1) & 3` to do what we wanted. (Note the two forms are only equivalent when `y` is unsigned or known to be non-negative.)
Get familiar with the bit-twiddling hacks page. I guarantee you'll use at least one of these often.
You also should be using your CPU's timer peripheral(s) to measure code execution time -- most of them have a timer/counter that can be set to run at the CPU clock frequency. Capture a copy of the counter at the beginning and end of your critical code, and you can see how long it takes. If you can't do that, another alternative is to lower an output pin at the beginning of your code, and raise it at the end, and look at this output on an oscilloscope to time the execution. There are tradeoffs to each approach: the internal timer/counter is more flexible (you can time several things) but harder to get the information out, whereas setting/clearing an output pin is immediately visible on a scope and you can capture statistics, but it's hard to distinguish multiple events.
Finally, there is a very important skill that comes with experience -- both general and with specific processor/compiler combinations: knowing when and when not to optimize. In general the answer is don't optimize. The Donald Knuth quote gets posted frequently on StackOverflow (usually just the last part):
> We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
But you're in a situation where you know you have to do some kind of optimization, so it's time to bite the bullet and optimize (or get a faster processor, or both). Do NOT write your whole ISR in assembly. That is almost a guaranteed disaster -- if you do it, within months or even weeks you'll forget parts of what you did and why, and the code is likely to be very brittle and difficult to change. There are likely to be portions of your code, however, that are good candidates for assembly.
Signs that parts of your code are well-suited for assembly-coding:
- functions that are well-contained, well-defined small routines unlikely to change
- functions that can utilize specific assembly instructions (min/max/right shift/etc)
- functions that get called many times (a multiplier: if you save 0.5 usec on each call, and it gets called 10 times, that saves you 5 usec, which is significant in your case)
Learn your compiler's function calling conventions (e.g. where it puts the arguments in registers, and which registers it saves/restores) so that you can write C-callable assembly routines.
In my current project, we have a pretty large codebase with critical code that has to run in a 10kHz interrupt (100usec -- sound familiar?), and there aren't that many functions written in assembly. The ones that are tend to be things like CRC calculation, software queues, and ADC gain/offset compensation.
Good luck!