24

I see the mantra of "profiling before optimization" repeated again and again here, on SO, and elsewhere. Although I certainly use profiling tools, I'm only occasionally surprised by the results. It seems like, as often as not, the profiler is just giving the same information that you could reasonably deduce by knowing the likely execution path of your program, understanding how your architecture works, and having a good idea of what optimization techniques the compiler can employ for you.

Because of this, I generally find that when I'm developing, I see areas of the code that I can sense are going to be bottlenecks (oftentimes when writing the code I will think to myself "hey, this is going to be a critical part of the code, and needs to be fast, but I will use a slower implementation first to prove the concept, then optimize it later") and I just go ahead and optimize these areas before I bother doing much profiling.

Is this really such a bad practice, or is it just the natural result of gaining experience doing optimization?

Robert Harvey
  • 198,589
  • 55
  • 464
  • 673
Cercerilla
  • 1,969
  • 11
  • 14
  • 2
    I don't think that it's a bad practice, but it could certainly be a waste of time if the performance is sufficient without the optimizations. Bear in mind that profiling is useful anyway, since it can discover things that you didn't expect/think to be a bottleneck. – sakisk Mar 31 '11 at 14:46
  • 12
    About the only optimization you should do is algorithmic optimization. This function is O(n^3); I bet I can reduce that to O(n log n). But before you do that you must also weigh the cost/benefit ratio. If the cost is a week's worth of work on a routine that gets run once a week on a data set of 10 items (thus taking 10 minutes), then is it really worth a week of development to reduce that time to 14 seconds? Note: your cost is $3000 per week. Computer cost (a non-critical thing) is 0.0000000000000000000000000000000000000000000000000000000001 cent per second – Martin York Mar 31 '11 at 15:59
  • @Martin: Please post that as an answer. "Optimization" is often irrelevant, where proper algorithm design is the real issue. – S.Lott Mar 31 '11 at 18:09
  • 1
    @Martin: I don't know anybody who intentionally writes a O(too-big) algorithm, but stackoverflow is full of questions by people who want to know why their program seems too slow. I try to share what experience I have in performance tuning. Occasionally (very) the problem is big-O, but much more common is over-designed, over-abstracted data structures, generating call trees far more bushy, and doing far more things, than are really necessary. – Mike Dunlavey Mar 31 '11 at 23:50
  • 3
    Note that most posts on optimization on SO are about micro-optimizations. The infamous `++i` vs `i++` comes up a lot, for example. Shifting instead of dividing... People feel SO clever for thinking about these, not realizing that they are so simplistic they are already built into the compilers :/ There's nothing wrong with optimizing an algorithm, but micro-optimizations are generally better left to compilers; peephole optimization is that good. Even worse, code obfuscation *may* fool the compiler and prevent optimization :/ – Matthieu M. Apr 01 '11 at 06:31

11 Answers

35

I think the golden rule here is "Everything in moderation".

If you are fairly certain a piece of code is going to prove to be a bottleneck, it's not a horrible practice to do some initial optimization. At the very least, it's a good idea to take steps to make sure it will be easy to refactor later.

What you want to avoid is going overboard by sacrificing time and readability in the name of micro-optimizations before you've seen actual data to justify such an effort.

dbyrne
  • 1,378
  • 1
  • 11
  • 15
  • 8
    +1: There has been so much talk about i++ vs ++i it makes me want to punch babies. – unholysampler Mar 31 '11 at 14:49
  • @unholysampler, doesn't that really depend on the processor and what the compiler produces? Those types of optimizations will work on some machines and not others. Kind of silly if you ask me. – Berin Loritsch Mar 31 '11 at 15:00
  • @unholysampler: There is the fact that ++i will always be at least as good as i++, but the reverse isn't true. – David Thornley Mar 31 '11 at 15:05
  • 3
    @Berin most (all?) fine-grained optimizations depend on the processor and what the compiler produces. That's why it's not bad to make some obvious, high-level optimizations early on, and wait until after you've done some profiling to do any sort of micro-tuning. – dbyrne Mar 31 '11 at 15:06
  • @BerinLoritsch: There are never perfect generalizations, but many compilers can optimize, especially when the result is not being stored anywhere. I am more focused on the fact that there are likely other things you will be able to change that have a much bigger impact than changing how you increment a number. – unholysampler Mar 31 '11 at 15:11
  • @dbyrne: This. You said it better than I did. – unholysampler Mar 31 '11 at 15:13
  • This is pretty much my thought as well. There are things that I would classify as optimizations, but sometimes it seems quite obvious that it's going to be a bottleneck. When I know that I'm trying to squeeze all the performance I can out of the hardware, sometimes it makes sense to do a little performance tweaking while I'm working in a function if it won't hurt maintainability and is going to be called a lot. – Cercerilla Mar 31 '11 at 20:57
12

The point of profiling before optimizing is that you need a baseline to determine how much improvement the optimization gave you. The "surprising" information I get from my profiler will be along these lines:

  • I knew that method was called a lot, but I didn't realize it was called that much.
  • Thread monitor contention almost never slows things down where I expect it.
  • I'm generating that many instances of XYZ? It should only be about n...

That said, many times the profiler merely confirms my suspicions. Good scientific method involves healthy doses of monitoring and experimentation. It's a bit difficult, but I do try to figure out more systemic problems with the profiler. I could do the obvious change and get a 5% improvement, or if I approached the problem differently I might be able to get a 25% improvement. (Most optimizations don't yield such a large improvement, but on occasion they do.) Of course, I wouldn't know how much my optimization improved the performance without a baseline measurement.
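A minimal sketch of what capturing such a baseline might look like in C++, using `std::chrono`; the `processBatch` workload here is a hypothetical stand-in for whatever routine is actually being tuned:

```
#include <chrono>
#include <iostream>

// Hypothetical workload standing in for the routine being tuned.
long long processBatch(int n) {
    long long sum = 0;
    for (int i = 0; i < n; ++i)
        sum += i % 7;   // placeholder work
    return sum;
}

int main() {
    using clock = std::chrono::steady_clock;

    // Baseline run: record this number before touching the code, then run
    // the exact same measurement after the change and compare the two.
    auto start = clock::now();
    volatile long long result = processBatch(10000000);
    auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
        clock::now() - start);

    std::cout << "baseline: " << elapsed.count() << " ms"
              << " (result " << result << ")\n";
    return 0;
}
```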

SusanW
  • 1,035
  • 10
  • 14
Berin Loritsch
  • 45,784
  • 7
  • 87
  • 160
  • 1
    Measuring the performance improvement is a good point. There are certainly cases where I know a given optimization will speed things up, but I make the change before gathering before-and-after data to see how much of an improvement it is. – Cercerilla Mar 31 '11 at 15:46
  • @CodeninjaTim: Not to mention that, if you're a contractor, presenting the speedup results is a nice way of showing those who have hired you that you haven't been wasting time with that code fiddling. – David Thornley Apr 01 '11 at 15:56
12

The line between "optimizing" and just "sensible design" is sometimes fairly fine, but other times pretty obvious. Just for example, you don't need a profiler to be pretty sure that if you're sorting a few million items, it's worth using an O(N log N) algorithm rather than an O(N²) algorithm. IMO, that just falls under being reasonably sensible, not optimization, though.

There are also some things you might as well do, simply because they might provide a benefit, and the cost is minimal to nonexistent. To use @unholysampler's example, writing ++i instead of i++ may have some minuscule cost if you're accustomed to typing it as a post-increment, but (at most) it's temporary and trivial. I wouldn't spend any time rewriting working code for the sake of possibly saving a nanosecond, unless the profiler had shown that I really needed that time and stood a reasonable chance of saving it there. At the same time, when I'm just typing in new code, I'd work at habitually using the form that's likely to be better, because once you do so habitually it's free. It frequently won't gain anything, and even when it makes a difference it often won't be large enough to notice or care about -- but it's still free, so there's no reason not to do it.
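A minimal C++ illustration of that habit: for iterators and other class types, post-increment conceptually copies the old value before incrementing, while pre-increment does not, so the pre-increment form can never be worse:

```
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3, 4};
    int total = 0;

    // it++ must conceptually copy the old iterator before incrementing;
    // ++it does not. When the result is discarded the compiler will usually
    // generate identical code for both, but ++it costs nothing to type once
    // it is habitual, and it can never be the slower of the two.
    for (auto it = v.begin(); it != v.end(); ++it)
        total += *it;

    return total == 10 ? 0 : 1;
}
```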

Cases like those are fairly obvious, and in my mind would really fall under sensible design rather than what I'd think of as truly optimization. Most other cases of "optimization without representation" would be considerably harder to justify though. If you're going to spend any significant time or effort on the optimization, you should have something much more solid than a "gut feel" to justify it.

I should add that part of the reason I say that is that I think profiling code is extremely useful, even when your goal isn't optimization. A profiler gives a high-level overview of a piece of code that can be extremely useful even when you don't particularly care about optimization at all. Just for example, if you see 10x as many calls to allocate a resource as to free that type of resource, it's a pretty good clue that there's a real problem, even if the code currently seems to run just fine. When you get down to it, a lot of code has a lot of things that should match up (not necessarily 1:1, but somehow or other) and a profiler can show mismatches like that much more quickly than most other tools.

Jerry Coffin
  • 44,385
  • 5
  • 89
  • 162
  • I guess the question is where the line is between "sensible practices" and truly optimization. What about changing floating point operations to fixed point when the target machine doesn't have an FPU, or modifying the way you access elements of an array to exploit locality, or using SSE when you're doing millions of additions inside a tight loop? – Cercerilla Mar 31 '11 at 16:11
  • @CodeninjaTim: Like I said, sometimes it's a fine line -- I don't have a solid answer for all those. Some will depend on your level of experience with the target environment, and the level of performance you expect -- unless you know enough to see an obvious discrepancy, you're probably better off profiling first. – Jerry Coffin Mar 31 '11 at 16:27
  • @CodeninjaTim: "where the line" is pretty simple. If you're designing the right algorithm, that's a good thing to do. If you're just "tweaking and tuning", stop. Design the right algorithm first. Optimization is very, very rarely needed if you just picked the right algorithm to start with. – S.Lott Mar 31 '11 at 18:10
  • @CodeninjaTim: "changing floating point operations to fixed point when the target machine doesn't have an FPU." This isn't "optimization". That's architecture. "Modifying the way you access elements of an array to exploit locality." Rarely helpful and then only **after** you provably have the optimal algorithm and you can **prove** that it's the central performance problem. "Using SSE when you're doing millions of additions inside a tight loop." Again. Only after you can prove the loop is already the optimal algorithm. – S.Lott Mar 31 '11 at 18:13
  • @S.Lott: Don't get me wrong, I'm all for using the right algorithms as the first step, but it does seem a bit overboard to me to say you need to prove that every algorithm is optimal before you do any optimizations. – Cercerilla Mar 31 '11 at 21:09
  • @CodeninjaTim: "a bit overboard"? Far from it. It's the minimum required to justify optimization. Taking a wrong algorithm and "optimizing" is a waste of time. Replacing a wrong algorithm with a right algorithm eliminates the need to "optimize". Pick a dumb sort, O(n^2) or worse, and tweak all you want. Compare with a Quicksort and you'll see that tweaking and optimizing the wrong algorithm is useless. – S.Lott Mar 31 '11 at 21:27
  • @S.Lott: While I agree that you need to assure you have a *reasonable* algorithm before optimizing, assuring it's truly optimal is a whole different story. Just for an obvious example, in many cases there are two or three reasonable choices, and the only way to know which is truly optimal is to test all of them on your specific data. Yes, Quicksort will beat bubble sort pretty consistently -- but what about Quicksort vs. Introsort vs. Mergesort? Quicksort probably wins the most often -- but Introsort is often favored because its worst-case performance is better. – Jerry Coffin Mar 31 '11 at 22:58
  • @Jerry Coffin: That's a kind of proof that the algorithm and data structure are appropriate. Only after that proof is passed **and** there is a measurable bottleneck can "optimization" be considered. – S.Lott Mar 31 '11 at 23:40
  • @S.Lott: I assumed by "theoretically optimal" you meant "prove the algorithm has a complexity no higher than the theoretical minimum complexity of the problem". – Cercerilla Apr 01 '11 at 14:04
  • @CodeninjaTim: I don't know why you're mentioning "theoretically optimal" at all. I never used the words. Optimal means optimal. Proof of optimal is easy (or you should do some research before coding.) A suboptimal algorithm that meets the need is still the result of proof. Not speculation. And not premature optimization. Design == proof. – S.Lott Apr 01 '11 at 14:49
  • @CodeninjaTim: I'd say that the line between "sensible practices" and "optimization" comes when either you're sacrificing readability or you're spending significant time and effort to produce local improvements. – David Thornley Apr 01 '11 at 15:58
  • @David Thornley: A fair point, although "readability" is subjective, and things that might seem sensible and readable to one person may seem like an obfuscated micro-optimization to someone else. – Cercerilla Apr 01 '11 at 18:27
8

"I can sense are going to be bottlenecks" (sic)

The problem with this statement is observer error. Just because you think it may be bad doesn't mean it IS bad. The profiler will give empirical evidence and keep you from spending time in an area that may give no improvement. That's why you should start with the profiler before optimizing. Let the program and profiler PROVE that it is a slow section first.

jmq
  • 6,048
  • 5
  • 28
  • 39
5

There are known performance killers and known best practices which avoid them. I don't think you need to profile to determine a cursor is slower in SQL Server than a set-based operation. If you start out knowing this, you will write code that performs better from the start without the need to profile the two options every time you write code.

If you are adjusting existing code, it is better to profile, not only so you can be sure that the code you know is generally inefficient really is performing badly, but also because profiling can show up problems you didn't know you had. I remember profiling one thing where I suspected the stored proc could be optimized (it could) and found out that it was being called hundreds of times by the application when it only needed to be called once.

Additionally, you have the benefit of being able to prove that you did in fact improve performance when you profile. I personally save these before and after figures and use them in my performance appraisal write-ups and in discussing my achievements when job hunting. It's nice to have actual figures to show how good you are at tuning.

HLGEM
  • 28,709
  • 4
  • 67
  • 116
  • 1
    I completely agree that you need to go with the profiler-first approach for code you didn't write (or wrote a while ago and no longer have fresh in your mind). I'm thinking of cases like the cursor example, e.g. write code you know will be slow to prove the concept, then go back and write a more efficient version without profiling. – Cercerilla Mar 31 '11 at 15:51
2

The reason people say that is because you won't know what needs to be optimized before your profiling is finished, so you could end up wasting a lot of time for no benefits.

Still, if something leaps out at you as a bad idea (e.g. your coworker chooses to use deep recursion instead of a simple iterative loop) you can fix it. But you should be focused on moving forward, not navel-gazing over old code. There is a stage in the process where that is appropriate.
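A small, hypothetical C++ sketch of that sort of leaps-out-at-you fix, replacing a deeply recursive sum with a plain loop:

```
#include <iostream>

// Summing a range recursively costs one stack frame per element and risks
// overflowing the stack for large n.
long long sumRecursive(long long n) {
    return n == 0 ? 0 : n + sumRecursive(n - 1);
}

// The simple iterative replacement is the kind of fix you make in passing,
// without waiting for a profiler to tell you to.
long long sumIterative(long long n) {
    long long total = 0;
    for (long long i = 1; i <= n; ++i)
        total += i;
    return total;
}

int main() {
    std::cout << sumIterative(1000000) << "\n";
    // sumRecursive(1000000) may well blow the stack, depending on the
    // platform and optimization level.
    return 0;
}
```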

Satanicpuppy
  • 6,210
  • 24
  • 28
1

I think everyone tries to write good code from the beginning. That sounds like what you're doing.

What I think should be done in the beginning is to just keep the design simple, especially the data structure. Often people start off assuming they need more sophisticated data structures, redundant data, and detailed notification techniques because they are worried about performance. In my experience, those things cause the very problems they are supposed to avoid.

In spite of good coding practice and good design, performance problems creep in, and you need to remove them periodically. These are almost never things you could have guessed, and most profilers are not very good at finding them either. What's more, the optimization level of the compiler seldom has any effect on them, because mostly they are not tight compute-bound loops. Mostly they present as innocent-looking (or even invisible) function calls that, if you randomly snapshot the stack, are in the middle of it, and are consuming way more wall-clock time than you ever would have imagined, as shown by how often they appear there.

Here's my favorite example.

Mike Dunlavey
  • 12,815
  • 2
  • 35
  • 58
1

There are two kinds of optimisation that can (relatively) safely be done before profiling.

  1. Algorithmic optimisation: choosing an algorithm with better average (or worst-case) complexity. This can even (should?) be done before beginning coding. You'll still have to check that the selected algorithm is the correct one given your real data set, but it is a good idea to start with an algorithm that is expected to fare better, isn't it?

  2. Data structure optimisation: laying out your data correctly, or using a data structure with better locality, can increase your performance, but it will have an impact on the algorithms that can be used, so it is easier to do such an optimisation before coding (and thus you cannot use a profiler if there is no code). For example, when programming a video game, it is generally better to use a struct-of-arrays (SoA) instead of an array-of-structs (AoS) to store data, as it will benefit from data locality, cache coherency, and so on (see the sketch after this list).
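A minimal C++ sketch of the SoA-versus-AoS idea, using made-up particle types; a pass that only touches positions streams through contiguous memory instead of dragging unused fields through the cache:

```
#include <cstddef>
#include <vector>

// Array-of-structs (AoS): each particle's fields are interleaved in memory.
struct ParticleAoS {
    float x, y, z;
    float mass;
};

// Struct-of-arrays (SoA): each field lives in its own contiguous array.
struct ParticlesSoA {
    std::vector<float> x, y, z;
    std::vector<float> mass;
};

// Hypothetical update pass that reads and writes positions only; the mass
// array is never pulled into the cache.
void advance(ParticlesSoA& p, float dx) {
    for (std::size_t i = 0; i < p.x.size(); ++i)
        p.x[i] += dx;   // contiguous access over x alone
}

int main() {
    ParticlesSoA p;
    p.x.assign(1000, 0.0f);
    p.y.assign(1000, 0.0f);
    p.z.assign(1000, 0.0f);
    p.mass.assign(1000, 1.0f);
    advance(p, 0.5f);
    return 0;
}
```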

SusanW
  • 1,035
  • 10
  • 14
0

The critical distinction is between:

  1. The optimized code is as simple as, or simpler than, the un-optimized code.

  2. The optimized code is more complex (and therefore more error prone and harder to modify in the future) than the un-optimized code.

In the first case, sure, go ahead. In the second case you have to weigh the investment in development time (including the opportunity cost of not using the same time to fix bugs or deliver features) and the future higher cost of maintenance for a more complex solution. You have to weigh this cost against the observable improvements to performance. How will you perform this judgement if you have no idea what the performance cost is? A function might be obviously inefficient, but if it only takes a few milliseconds anyway, a 1000x performance optimization will not provide any value. Not to mention the opportunity cost of not working on an optimization where it actually matters.

Second, your intuition about performance might very well be wrong - and you will never know if you "optimize" before measuring. For example, many developers tend to think that, say, an O(log n) algorithm is faster than an O(n) one. But you don't know that. The O(n) algorithm might be faster as long as n is below some threshold. What is that threshold? You probably don't know. And what is n actually in your particular program? Is it usually above or below this threshold? How will you find out?
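One way to find out is simply to measure both on representative sizes. A rough C++ sketch (the sizes and repetition counts are arbitrary) comparing a linear scan against binary search to locate the crossover on a given machine:

```
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    using clock = std::chrono::steady_clock;
    using std::chrono::duration_cast;
    using std::chrono::microseconds;

    for (std::size_t n : {8u, 64u, 512u, 4096u}) {
        std::vector<int> data(n);
        for (std::size_t i = 0; i < n; ++i)
            data[i] = static_cast<int>(i);

        const int needle = static_cast<int>(n) - 1;
        volatile bool found = false;

        // O(n) linear scan, repeated to get a measurable duration.
        auto t0 = clock::now();
        for (int rep = 0; rep < 100000; ++rep)
            found = std::find(data.begin(), data.end(), needle) != data.end();
        auto linear = duration_cast<microseconds>(clock::now() - t0);

        // O(log n) binary search over the same sorted data.
        auto t1 = clock::now();
        for (int rep = 0; rep < 100000; ++rep)
            found = std::binary_search(data.begin(), data.end(), needle);
        auto binary = duration_cast<microseconds>(clock::now() - t1);

        std::cout << "n=" << n
                  << "  linear=" << linear.count() << "us"
                  << "  binary=" << binary.count() << "us"
                  << "  (found=" << found << ")\n";
    }
    return 0;
}
```

Which one wins at which n is exactly the kind of thing you can't know until you run it on your own hardware with your own data.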

What if your intuition is wrong and your optimization actually makes the program run slower? If you profiled before and after, you realize your mistake and roll the changes back. At worst you wasted some time. If you don't profile you are none the wiser and have caused long term damage to the program.

The really difficult decision is when you can go down different roads in the architecture. Should the network communication be JSON over HTTP (simple) or protocol buffers over TCP (performant)? The problem here is that you have to decide up front, before you can even measure performance, if you don't want to waste work by having to change protocols later. In this case you cannot just start out with the simple version and then optimize later when it turns out to be a problem. But you shouldn't just choose the most performant version by default either. You will have to make some educated guesses and projections.


I note you state that profiling "as often as not" gives the same result as your intuition based on your understanding of the program. I take that to mean that you have a 50% success rate in predicting the best way to allocate resources for optimizing. Given the short and long term cost of misapplied optimization, that is not really a very good rate to rely on.

JacquesB
  • 57,310
  • 21
  • 127
  • 176
0

Yes, it is wrong to optimize before profiling, BUT

Good programming, programming that makes code simpler and more straightforward, doesn't require profiling. Good programming like moving unneeded initializations out of loops does not need any more justification than the fact that you are improving the quality of the code.
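A small C++ sketch of that kind of change, with made-up function names; hoisting the loop-invariant initialization leaves the code simpler and certainly no slower, which is justification enough:

```
#include <iostream>
#include <string>
#include <vector>

// Before: the prefix is re-created on every pass even though it never
// changes inside the loop.
std::string joinBefore(const std::vector<std::string>& words) {
    std::string out;
    for (const auto& w : words) {
        std::string prefix = "item: ";   // loop-invariant initialization
        out += prefix + w + "\n";
    }
    return out;
}

// After: the invariant initialization is hoisted out of the loop.
std::string joinAfter(const std::vector<std::string>& words) {
    std::string out;
    const std::string prefix = "item: ";
    for (const auto& w : words)
        out += prefix + w + "\n";
    return out;
}

int main() {
    std::vector<std::string> words{"one", "two"};
    std::cout << joinAfter(words);
    return 0;
}
```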

IMHO the only times you need to profile is when you are specifically looking to improve performance. In order to show the improvement you must baseline first, and then show the delta.

Any time you add complexity under the guise of optimization, without showing proof that it was a bottleneck and that the change actually improves performance, you are just doing bad programming.

dietbuddha
  • 8,677
  • 24
  • 36
-1

There is the 90/10 law, which states that 90% of the time is spent in 10% of the code and only 10% of the time is spent in 90% of the code. A programmer cannot always anticipate which 10% of the code will have the most impact if optimised, so until you use a profiler to figure this out, any optimisation you do is premature. Better to figure out where the problem is before attempting to solve it, no?

xczhang
  • 29
  • 2
  • 1
    Just because you can't always figure out where the time is being spent doesn't mean you never can. – Cercerilla Apr 01 '11 at 13:59
  • That 90/10 stuff comes from a model of programming that has no subroutines. Suppose `main` calls `foo` 10 times, but only 1 time is necessary, and nearly all the time is spent in `foo`. So the way to speed up the program is to get rid of the 9 unnecessary calls, not to try to speed up `foo`, even though that's the 10% of code taking 90% of the time. – Mike Dunlavey Apr 03 '11 at 21:48