9

There are two possible areas to optimize for speed:

  • Where the most time is spent
  • The code that is called the most

Which is the best place to start optimizing?

Often, the code that is called most often already has low per-call execution times. Do you optimize the slower, less frequently called areas, or spend time optimizing the faster, heavily used ones?

Michael K
  • Optimize the area of the application that stresses your clients or your architecture the most, depending on whether your customers or your servers are complaining loudest. – Andy Feb 22 '11 at 18:56
  • It's a value equation - the answer could be either. When you don't have real analysis, you go with your gut, based on likely payoff of your best ideas. – Nicole Feb 22 '11 at 22:22
  • Neither. Look for code that is on the stack a large fraction of the time. – Mike Dunlavey Mar 01 '11 at 14:58

11 Answers

4

You should ignore the small efficiencies 95% of the time. First, make it work correctly, then analyze...

Your design.

Your choice of high-level algorithms can have a huge impact on the overall performance of your software, to the point where one seemingly trivial choice can mean the difference between waiting 20 minutes for the program to start and having a quick, responsive UI.

For example, in a 3D game: if you start with a simple flat list of objects for your scene graph, you'll see extremely poor performance for a relatively small number of objects; but if you instead implement a spatial hierarchy (like an octree or BVH) and cull parts of the tree while drawing, you'll see a massive performance boost.
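A toy sketch of the culling idea, using a 1-D "scene" with interval bounds (all names here are hypothetical): the hierarchy lets a single bounds test reject an entire subtree, where a flat list would test every object.

```python
class Node:
    """A subtree with a bounding interval, some objects, and child nodes."""
    def __init__(self, lo, hi, objects=(), children=()):
        self.lo, self.hi = lo, hi        # bounds enclosing the whole subtree
        self.objects = list(objects)
        self.children = list(children)

def collect_visible(node, view_lo, view_hi, out):
    # One bounds test rejects the entire subtree, objects and all.
    if node.hi < view_lo or node.lo > view_hi:
        return
    out.extend(o for o in node.objects if view_lo <= o <= view_hi)
    for child in node.children:
        collect_visible(child, view_lo, view_hi, out)

# Two leaves; only the left one overlaps the view [0, 5], so the right
# subtree is never descended into.
root = Node(0, 20, children=[Node(0, 10, objects=[1, 4, 9]),
                             Node(10, 20, objects=[12, 18])])
found = []
collect_visible(root, 0, 5, found)
```

With a flat list, every object would be tested against the view; with the hierarchy, whole branches are skipped, which is where the boost comes from as object counts grow.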

When your design seems correct, then you can move on to...

Low-level logic.

Lower-level algorithms can also have a significant impact. When doing image processing, for example, if you read the image in the wrong order, you'll experience massive slow-downs as you run into constant L2 cache misses; reordering your operations could mean a ten-fold increase in performance.
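As a sketch of the ordering point (pure Python won't show the cache effect itself, since interpreter overhead dominates, but these are the access patterns that matter in C, C++, or numpy): both loops below compute the same result and differ only in memory order.

```python
WIDTH, HEIGHT = 256, 256
# A grayscale "image" stored row by row, as most image formats are.
image = [[(y * WIDTH + x) % 251 for x in range(WIDTH)] for y in range(HEIGHT)]

def sum_row_major(img):
    # Visits pixels in storage order: consecutive addresses, cache-friendly.
    total = 0
    for row in img:
        for pixel in row:
            total += pixel
    return total

def sum_column_major(img):
    # Visits one pixel per row, striding a full row per access: in a flat
    # contiguous array this touches a new cache line almost every time.
    total = 0
    for x in range(WIDTH):
        for y in range(HEIGHT):
            total += img[y][x]
    return total
```

In a compiled language with a flat contiguous pixel buffer, the first loop streams through memory while the second thrashes the cache, which is exactly the reordering win described above.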

At this point, profile and find the place where the majority of the program's time is spent, and find a way to eliminate it.

greyfade
  • The program I'm working on is correct. We want to make it faster if we can, since it is a webservice that can take upwards of 30s to a minute to run. – Michael K Feb 22 '11 at 19:06
  • @Michael: In that case, it's time to get a profiling tool, analyze some typical program executions, and pinpoint the sections of code that are running slowest. I'd really recommend using a tool for this. You can do a certain amount by intuition, but sometimes you'll find a surprise where it's an API call that takes a significant amount of time, rather than your own code. In that case, it's time to look into the API and see what other functions are available, or whether you should write your own replacement function... The *actual* performance hog is not always the *suspected* performance hog... – FrustratedWithFormsDesigner Feb 22 '11 at 19:13
  • We do have a profiling tool. I'm still learning it and what to do with the information. It's a relatively new subject for me, and very interesting. – Michael K Feb 22 '11 at 19:24
  • @Michael: Good! You are (hopefully) on your way to successful performance tuning! :) – FrustratedWithFormsDesigner Feb 22 '11 at 19:28
  • +1: This what I was going to say, but much more eloquently. – Dominique McDonnell Feb 22 '11 at 21:06
3

First, run a profiler to find out where your code is spending its time.

Then, look at those places to see which ones look easy to optimize.

Look for the easiest fixes that will get the biggest gains first (go for the low-hanging fruit). Don't worry too much about how important it is, precisely. If it's easy, fix it. It will add up. 25 easy fixes might be faster than 1 big fix, and their cumulative effects might be larger. If it's hard, make a note or file a bug report so you can prioritize it later. Don't worry so much about "big" or "little" at this point - just do it, until you get to functions that are using very little time. Once you do this, you should have a better idea of which of the other issues you've uncovered might get the biggest wins for the least time investment.
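The first two steps can be sketched with Python's built-in cProfile (the function names here are hypothetical stand-ins for real application code):

```python
import cProfile
import io
import pstats

def slow_part():
    return sum(i * i for i in range(200_000))

def fast_part():
    return sum(range(100))

def program():
    for _ in range(20):
        slow_part()
    for _ in range(1_000):
        fast_part()

profiler = cProfile.Profile()
profiler.enable()
program()
profiler.disable()

# Rank functions by cumulative time; the entries at the top are the
# candidates to scan for easy wins first.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(10)
print(report.getvalue())
```

The cumulative-time ranking is what tells you where to look first; re-running the same script after each fix doubles as the follow-up measurement described below.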

Don't forget to follow up with profiling after your fixes as a sort of regression test, to verify that your performance changes had the effects you hoped for. Also, don't forget to run your regression suite, to ensure no functionality was broken. Sometimes bad performance indicates work-arounds, and trying to fix the performance will break functionality.

Small functions that can't be optimized but are using a lot of time might still be hints about where to optimize. Why is that function being called so much? Is there a function calling that small function that doesn't need to use it so much? Is work being duplicated, or unnecessary work being done? Look up the stack for the times it gets called until you are confident it should be called that often, and see if you find a larger function with an inefficient algorithm.
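With Python's cProfile, for example, `pstats.Stats.print_callers()` does exactly this walk up the stack (names below are hypothetical): the leaf comparison is cheap per call, and the profile points you at its quadratic caller instead.

```python
import cProfile
import io
import pstats

def tiny_compare(a, b):
    # Cheap leaf function; optimizing it would gain almost nothing.
    return a < b

def wasteful_caller(items):
    # The real problem: a quadratic scan that calls the leaf n*n times.
    count = 0
    for a in items:
        for b in items:
            if tiny_compare(a, b):
                count += 1
    return count

profiler = cProfile.Profile()
profiler.enable()
wasteful_caller(list(range(300)))
profiler.disable()

report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.print_callers("tiny_compare")   # who calls the hot leaf, and how often
print(report.getvalue())
```

Here the caller report shows 90,000 calls coming from one function, which is the hint that a better algorithm in the caller (not a faster comparison) is the real fix.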

Edited to add: Since you have specific functionality that is taking a long time, try doing the steps above with just that specific function being run 10 or so times.

Ethel Evans
2

It's hard to say. This really depends on what the code is doing. Run a performance test, get a performance profile, and see how much actual time is spent in various areas. Your generalizations are... generalizations, and it varies from project to project.

For example, the code that gets called the most might simply log to a file or console. There's not much point in optimizing that if it's already one or two lines of code that can't be made simpler, and any effort to optimize something like this might not be worth the cost of actually coding it. The least-called code could be some monster-sized query used in some horribly complex function. The function might only get called 100 times over an entire execution run (vs. 10,000 for the simple logging statement), but if it takes 20 seconds every time it runs, maybe that's where optimization should begin? Or it could be the other way around, with the big query being the most-called code, and the logging statement only called once for every 100 queries...

I don't usually worry about this sort of thing (until I need to do performance tuning) unless I have some idea ahead of time what's going to be happening.

FrustratedWithFormsDesigner
1

Well, "we" usually don't optimize until there is an obvious need for optimization, i.e. when something is unacceptably slow.

And when this need manifests itself it usually carries with it good hints as to what exactly calls for optimization.

So the answer is the usual one: "It depends."

1

You should use a profiler on a handful of typical runs and look at the total time spent in each part of the code, no matter how or how often you got there. Optimizing those parts should always give a speed increase.

Depending on how low-level your implementation language is you should also find out what parts cause most cache misses. Consolidating the calling code will help here.

Benjamin Bannier
1

The problem is the phrase "where the most time is spent" is ambiguous.

If it means "where the program counter is found most often", then I've seen programs where the most time was spent in string comparison, memory allocation, and math library functions. In other words, functions that the everyday programmer should never touch.

If it means "where in the programmer's code are statements executed that consume a large fraction of time" that is a more useful concept.

The problem with the concept of "code that is called the most" is, the amount of time it takes is the product of how often it is called and how much time it takes per call (including callees and I/O). Since the amount of time it takes can vary over several orders of magnitude, the number of times it is called doesn't tell you how much of a problem it is. Function A may be called 10 times and take 0.1 second, while function B may be called 1000 times and take a microsecond.
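The product in that example, written out:

```python
# total cost = (number of calls) x (time per call, including callees/IO)
calls_a, per_call_a = 10, 0.1       # function A: few calls, slow each
calls_b, per_call_b = 1000, 1e-6    # function B: many calls, fast each

total_a = calls_a * per_call_a      # ~1 second
total_b = calls_b * per_call_b      # ~0.001 second

# B is called 100x more often, yet A costs ~1000x more total time,
# so call counts alone say nothing about where the time goes.
```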

One thing that will tell you where to look is this: whenever a line of code is causing time to be spent, it is on the stack. So, for example, whether a line of code is a hot spot itself, a call to a library function, or the 20th call in a 30-level call tree: if it is responsible for 20% of the time, then it is on the stack 20% of the time. Random-time samples of the stack will each have a 20% chance of displaying it. What's more, if samples can be taken during I/O, they will show you what accounts for the I/O, which can be just as wasteful as, or more wasteful than, wasted CPU cycles.

And this is totally independent of how many times it is invoked.
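A minimal, CPython-specific sketch of the sampling idea (`sys._current_frames` is an implementation detail, and real profilers like Zoom do this far more carefully): a background thread snapshots the main thread's stack at intervals, and any function responsible for a large fraction of the time appears on a matching fraction of samples, whether as the hot spot itself or as a caller of it.

```python
import collections
import sys
import threading
import time

def sample_stacks(target_thread_id, samples, interval=0.001, n=200):
    # Periodically snapshot the target thread's call stack.
    for _ in range(n):
        frame = sys._current_frames().get(target_thread_id)
        stack = []
        while frame is not None:
            stack.append(frame.f_code.co_name)
            frame = frame.f_back
        samples.append(tuple(stack))
        time.sleep(interval)

def hot_spot():
    return sum(i * i for i in range(200_000))

def workload():
    for _ in range(50):
        hot_spot()

samples = []
sampler = threading.Thread(
    target=sample_stacks, args=(threading.get_ident(), samples))
sampler.start()
workload()
sampler.join()

# Count how often each function appears ANYWHERE on a sampled stack:
# callers of the hot spot are credited too, which is the point above.
counts = collections.Counter(name for stack in samples for name in stack)
```

Note that `workload` appears on at least as many samples as `hot_spot`, even though `workload` itself does almost nothing: responsibility is joint for every frame on the stack.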

Mike Dunlavey
  • By 'everyday programmer should never touch' do you mean not likely to touch? Also, is sampling the stack a practical profiling method? – Michael K Feb 22 '11 at 20:49
  • @Michael: Yes, sampling the stack is a method that modern profilers are based on, such as [Zoom](http://www.rotateright.com). Also, [totally manual](http://stackoverflow.com/questions/375913/what-can-i-use-to-profile-c-code-in-linux/378024#378024) works [surprisingly well](http://stackoverflow.com/questions/4387895/if-profiler-is-not-the-answer-what-other-choices-do-we-have/4390868#4390868). – Mike Dunlavey Feb 22 '11 at 21:08
  • Very interesting. I've got some studying to do now! – Michael K Feb 22 '11 at 21:15
  • @Michael: It's like the legal concept of joint responsibility. At a point in time, the responsibility for the PC being in an instruction is the joint responsibility of not only that instruction, but every call above it on the stack. Eliminating any one of them would prevent it getting into that particular state. – Mike Dunlavey Feb 22 '11 at 23:27
0

In most cases it's going to be the meatier functions, not the functions called a billion times in a loop.

When you do sample-based profiling (with a tool or by hand), often the biggest hotspots are going to be in tiny leafy calls which do simple things, like a function to compare two integers.

That function often isn't going to benefit from much, if any, optimization. At the very least, those granular hotspots are rarely top priority. It's the function calling that leaf function that might be the troublemaker, or the function calling the function calling the function, like a sub-optimal sorting algorithm. With good tools you can drill down from callee to caller, and also see who spends the most time calling the callee.

It's often a mistake to obsess over callees and not look at the callers in the call graph in a profiling session, unless you are doing things very inefficiently at the micro level. Otherwise you might be overly sweating the small stuff and losing sight of the big picture. Just having a profiler in hand doesn't protect you from obsessing over trivial things; it's just a first step in the right direction.

Also, you have to make sure you are profiling operations that align with things the users actually want to do; otherwise, being totally disciplined and scientific in your measurements and benchmarks is worthless, since it doesn't align with what the customers do with the product.

I had a colleague one time who tuned the hell out of a subdivision algorithm to subdivide a cube into a billion facets, and he took a lot of pride in that... except users don't subdivide simple 6-polygon cubes into a billion facets. The whole thing slowed to a crawl when it tried to run on a production car model with over 100,000 polygons to subdivide, at which point it couldn't even do 2 or 3 levels of subdivision. Put simply, he wrote code that was super-optimized for unrealistically small input sizes and didn't scale at all to real-world use cases, partly because, while a brilliant mathematician, he was unacquainted with how users actually used the product.

You have to optimize real use cases aligned with your users' interests, or else the effort is worse than worthless: optimizations tend to at least somewhat degrade the maintainability of the code, so without user benefit you are left with only the negatives for the codebase.

0

Optimize where the most time is spent, unless there is a particular reason not to (e.g., most of the time is spent doing asynchronous processing where nobody really cares whether it finishes in 5 minutes or 10). The code that is called the most will naturally tend to rack up a relatively large portion of the total elapsed time, simply because even short execution times add up when repeated thousands of times.

Justin Cave
0

You have to work on the code that is taking the most time. Improving code that only accounts for a few percent of the run time can only give you a small improvement.

Have you taken measurements so that you know which code is taking the most time?

kevin cline
0

I used to do benchmarking and marketing for a supercomputer vendor, so beating the competition speed-wise wasn't the most important thing; it was the ONLY thing. That sort of result required that you use a combination of a good algorithm and a data structure that would allow the most CPU-intensive portions to run at peak efficiency. That meant you had to have a pretty good idea of what the most computationally intensive operations were going to be, and what sorts of data structures would allow them to run fastest. Then it was a matter of building the application around those optimized kernels/data structures.

In the more general sense, this is the opposite of typical after-the-fact tuning: you profile, look at the hot spots, and work on the hot spots you think you can speed up. But rarely will this approach give you anything close to the fastest possible implementation.

More generally though (algorithmic blunders notwithstanding), for modern machines you have to think of performance as determined by three things: data access, data access, and data access! Learn about the memory hierarchy (registers, caches, TLBs, pages, etc.) and design your data structures to make as good a use of it as possible. Generally this means you want your loops to run within a compact memory footprint. If you instead just write (or are given) an application and then try to optimize it, you are usually saddled with data structures that make poor use of the memory hierarchy, and changing the data structures usually involves a major refactoring exercise, so you are often stuck.
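One standard shape for the "compact footprint" idea is loop blocking (tiling), sketched here in Python purely for structure; the cache payoff appears in languages with flat contiguous arrays, where each tile stays resident in cache while it is being worked on.

```python
N, BLOCK = 64, 8

def transpose_blocked(m):
    # Process the matrix in BLOCK x BLOCK tiles; each tile of the source
    # and destination is small enough to stay in cache while it is hot,
    # instead of striding across the whole matrix on every access.
    n = len(m)
    out = [[0] * n for _ in range(n)]
    for bi in range(0, n, BLOCK):
        for bj in range(0, n, BLOCK):
            for i in range(bi, min(bi + BLOCK, n)):
                for j in range(bj, min(bj + BLOCK, n)):
                    out[j][i] = m[i][j]
    return out

matrix = [[i * N + j for j in range(N)] for i in range(N)]
result = transpose_blocked(matrix)
```

The result is identical to a naive transpose; only the order of memory accesses changes, which is exactly the kind of data-access restructuring described above.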

Omega Centauri
0

If you want a return on your optimization effort, you must look at the code that takes the most time. My general target is something that accounts for at least 80% of the run time, and within it I aim for a 10-times performance gain. This sometimes requires a major change in how that code is designed. Speeding up 80% of the run time by a factor of 10 gets you a program that runs roughly four times faster overall.

My all-time best performance gain was reducing a run time from 3 days to 9 minutes. The code I optimized went from 3 days to 3 minutes. The application that replaced it reduced this to 9 seconds, but that required a change of language and a complete rewrite.

Optimizing an already-fast application can be a fool's errand. I would still target the area taking the most time. Even if you make something that uses 10% of the time return instantaneously, you still need the other 90%. You quickly hit the law of diminishing returns.
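The arithmetic behind this is Amdahl's law: if a part takes fraction p of the runtime and you speed it up by a factor s, the overall speedup is 1 / ((1 - p) + p / s).

```python
def overall_speedup(p, s):
    # p: fraction of total runtime spent in the optimized part
    # s: speedup factor achieved on that part
    return 1.0 / ((1.0 - p) + p / s)

# Making a 10%-of-runtime section infinitely fast buys only ~1.11x overall.
small_target = overall_speedup(0.10, float("inf"))

# Making an 80%-of-runtime section 10x faster: ~3.6x overall, i.e. the
# "roughly four times faster" figure quoted above.
big_target = overall_speedup(0.80, 10.0)
```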

Whatever resource you are optimizing for, the same rule applies: look for the major resource users and optimize them. Note that if the resource you are optimizing is the system's bottleneck, you may find that all you've done is shift the bottleneck to another resource.

BillThor