10

I'm building an application with lots of components, many of which are third-party, so I only know what I can learn from their documentation.

From time to time, by pure luck, I find out that one of these components is pretty heavy and has an option to cache its work.

I can't stop thinking that if I weren't lucky, one of these components would end up killing the performance of my app and I wouldn't have a clue what to do. I would end up adding more hardware like an idiot.

So how do you find these kinds of bottlenecks? I mean, I wouldn't get a nice report saying "this part is slow", like I do with the database.

For example, today I found that a serializer I was using was heavy and wasn't being cached. Are there repeatable steps I could have followed to discover that the serializer was slowing things down?

gnat
ChocoDeveloper
    You are looking for profiling. Which one you use depends significantly upon the language you are programming in. See [How to see what parts of your java code are run most often?](http://programmers.stackexchange.com/questions/207819/how-to-see-what-parts-of-your-java-code-are-run-most-often) for a java answer. –  Sep 20 '13 at 22:17

4 Answers

21

There are two paths to follow.

  1. Blindly follow hunches.

  2. Get an application profiler.

Let's look at the pros & cons.

Blindly follow hunches

This is:

  • Low capital cost (hunches are free)
  • Exceptionally easy to instrument (there isn't any instrumentation)
  • Relies heavily upon your programming experience to see problems
  • Not terribly accurate at finding issues
  • Great at creating rabbit holes that consume amazing amounts of time but generally don't generate much measurable improvement (assuming you're measuring things at all)

Application Profiling

  • Generally has an acquisition cost; and the really good profilers can get expensive
  • Provides measurements of application performance. The better ones isolate methods that
    1. Consume the most amount of time
    2. Have the most amount of calls
  • Requires you to be able to instrument your application, or at least to manually step through the use cases that are of concern.
  • Can flood you with too much information, so you need to understand what to look for

Depending upon how large your application is, you can get away with following blind hunches and relying upon wall-clock time, or throwing some DateTime calls into your code to keep track of how long execution takes. I have personally taken applications with 24+ hour runtimes and brought them down to anywhere from 15 minutes to 1 - 2 hours using this approach. But they were small applications and I could understand all of the code involved.
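For instance, the "throw some DateTime calls in" approach is only a couple of lines in most languages. Here's a Python sketch, where `slow_step` is a hypothetical stand-in for whatever code path you suspect:

```python
import time

def slow_step(n=200_000):
    # hypothetical stand-in for the code path you suspect is heavy
    return sum(i * i for i in range(n))

start = time.perf_counter()  # monotonic wall-clock timer
result = slow_step()
elapsed = time.perf_counter() - start
print(f"slow_step took {elapsed:.3f}s")
```

Crude, but if you wrap the two or three likeliest suspects this way, you at least replace a hunch with a number.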

On the other hand, I have used profilers a couple of times with really large applications. You need to be able to recreate the use cases that are of most concern, and that can be a pain. Manually stepping through long processes can be problematic. But I have used these tools to track background start-up processes and have been able to generate significant performance improvements for those applications.

To find some profilers, just search for "your_language + profiler" and you will likely have quite a few options to pick from.
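Many languages ship with at least a basic profiler. Python, for example, has cProfile in the standard library; a minimal sketch (the `serialize` function here is a hypothetical stand-in for a heavy third-party serializer like the one in the question):

```python
import cProfile
import io
import pstats

def serialize(obj):
    # hypothetical stand-in for a heavy third-party serializer
    return repr(obj) * 100

def handle_request():
    data = {"id": 1, "payload": list(range(500))}
    return [serialize(data) for _ in range(200)]

profiler = cProfile.Profile()
profiler.enable()
handle_request()
profiler.disable()

# Rank functions by cumulative time; the heavy one floats to the top.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
print(out.getvalue())
```

The two columns to look at correspond to the two measurements above: cumulative time (what consumes the most time) and ncalls (what gets called the most).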


Follow-up to comments:

  1. Mason Wheeler correctly points out that some good profilers can be inexpensive. The functionality provided is most likely the biggest driver for cost. In some ways, it's like any other software product - there is a correlation between cost and features.

  2. There can be a half-way point as ChocoDeveloper pointed out. You can build your own profiler, but this can be a significant investment of time. And you run the same risks as the blind path: if you only know the problem is in MyModule.Foo(), then you can be oblivious to the problems in Module2.Bar(). But if you're aware of that risk, you can make sure you profile everything.

    • You can create a lighter-weight profiler this way by being very specific about what's measured and logged. Some of the better profilers provide this ability too though.
  3. Profilers can be run on production code. It's generally not done as they do have a performance impact (as Doc Brown points out). But if you can't recreate the performance issues anywhere else except production, then you run the profiler in production. You have to understand the risks and potential issues it will create, but it can be done.

  4. My initial response didn't give a lot of credit to building your own profiler. In my experience, I have found it less expensive to go with an already existing profiler than to build my own. Profilers aren't necessarily super, super complicated, but they are another project that you're taking on and will need to make sure is working correctly before you can trust the results. For the organizations I have worked for, building a profiler was tangential to our core competencies so it didn't make business sense for us to do so.

  5. Have a look at the Wikipedia article on Program Optimization if you haven't already. Here are a few choice quotes cited in that article.

"Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you have proven that's where the bottleneck is." — Rob Pike

"The First Rule of Program Optimization: Don't do it. The Second Rule of Program Optimization (for experts only!): Don't do it yet." — Michael A. Jackson

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified" — Donald Knuth

  • `the really good profilers can get expensive.` That definitely depends on the language. For Delphi, for example, easily the most useful profiler is a freeware tool simply called [Sampling Profiler.](http://delphitools.info/samplingprofiler/) – Mason Wheeler Sep 21 '13 at 01:36
  • So profiling is something that is *not* done in production? – ChocoDeveloper Sep 21 '13 at 03:36
  • @ChocoDeveloper: there is an option between those 2 extremes, namely adding some time-measuring functions to your application in critical areas on your own. The advantage of that approach is that you have it fully under your control; for example, you could allow your application to switch on time-logging whenever you want (also in production). The disadvantage is that when you don't know exactly which processes to measure, you may end up adding that measuring function *at a lot of places* in your application. – Doc Brown Sep 21 '13 at 07:21
  • @ChocoDeveloper: on the other hand, it may be possible to find a third-party application profiler which can be used in production, but that depends heavily on your environment (which you still did not describe). One problem, however, which can occur with such a tool, is that full-profiling of your application may slow it down significantly, making production-use infeasible. – Doc Brown Sep 21 '13 at 07:26
  • @ChocoDeveloper, sometimes adding a standard profiler to a production environment is not acceptable, either due to performance degradation, security, or both. If you have the flexibility to do so, then you should because it's the simplest approach. If you cannot, then you may need alternatives. – Brandon Sep 21 '13 at 14:38
  • Another option between the two extremes is a poor man's sampling profiler. Basically, you hit break 10-20 times while your application is running and see where it breaks. This several benefits (can see the call stack, requires less setup than standard profilers, easy) but also has several downsides (less practical with multi-threaded code, code that is especially slow with a debugger attached is over-reported, can be tedious). See [here](http://stackoverflow.com/a/378024/18192) for discussion. To some extent, this is equivalent to running a sampling profiler, but with fewer samples. – Brian Sep 21 '13 at 19:24
  • @Brian: there are OS-level tools that can capture call stacks from each thread, CPU and memory usage at regular intervals. (These tools will require administrator privilege to run.) SysInternals provides a suite of these tools *for free*. Security, or rather the security-mindset (distrust) of the IT administrator, is the only reason for not doing it in production. – rwong Sep 21 '13 at 20:04
8

Glen's answer is pretty accurate and comprehensive, but I'd like to throw Load Testing into the mix as well.

Profiling is difficult and sometimes expensive to do. It's essentially impossible to do in production. But worse than that, if your entire approach to performance management is based on profiling, you won't find out about these problems until it's too late.

If your team/project is managed well enough, you should have load tests, and this is of course assuming you already have all the prerequisites like Continuous Integration and Integration Tests. Good load tests don't just measure stuff, they establish a baseline so it's easy to see sudden or even gradual changes.

When you see a sudden change in the output of your load tests, you know it's due to a change you made since the last test run. And if you're running load tests every day or even every week, it's not hard at all to retrace your steps and find out which commit caused your throughput and/or latency to take a dive.

Load testing may not help you much after you discover a problem in production. But once you've got performance up to acceptable levels, you really should consider investing in a load testing solution so that you can get fast feedback on performance issues, preferably before you're entrenched in a particular architecture or product and can't easily back out.
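A load test in this spirit doesn't need heavy tooling to start with. A minimal Python sketch that records latency percentiles and compares them to a stored baseline (the endpoint and the baseline figure are hypothetical):

```python
import statistics
import time

def endpoint():
    # hypothetical stand-in for the request path under test
    time.sleep(0.002)

def run_load_test(requests=100):
    latencies = []
    for _ in range(requests):
        start = time.perf_counter()
        endpoint()
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    return {
        "median_s": statistics.median(latencies),
        "p95_s": latencies[int(0.95 * len(latencies))],
    }

BASELINE_P95_S = 0.010  # hypothetical figure established from earlier runs

result = run_load_test()
print(result)
if result["p95_s"] > 1.5 * BASELINE_P95_S:
    print("regression: p95 latency is more than 50% over baseline")
```

The point isn't the numbers themselves but the comparison: run it on every build and a regression shows up as a diff against the baseline, tied to a small set of commits.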

Aaronaught
4

Better late than never:

So how do you find these kind of bottlenecks? I mean I wouldn't have a nice report saying "this part is slow"...

Here's how I find them.

Slowness will not always be localized to any "this part", and if it isn't, that doesn't mean you can ignore it.

ADDED: Responding to the comments, I hoped the video would explain it best, but @gnat has a good point. I've written a lot on this, and there's a more scientific explanation here.

In a nutshell, you can get more stack samples per unit of wall-clock time, for example from a profiler (if it's a good one). But the result of having a large number of samples is that you don't get to see the rich information in any of them, only summary measurements, and those are too vague to tell you what you could fix.

For example, there can be some activity distributed around the code, not concentrated in any particular routine, that, if fixed, would save 20% of the time. If you look at 10 or 20 samples, it will appear on about 2 to 4 of them, or more, and you will see it. However, if there are 10,000 samples, 2,000 or more of them will be doing that activity, but since all you get to see are time measurements of routines, you won't see the problem.
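A sketch of what examining a small number of full samples buys you, in Python (the function names are illustrative): `leaf` is the distributed cost, reached from two different callers, and each sample keeps the whole stack instead of collapsing it into per-routine totals:

```python
import sys
import threading
import time
import traceback
from collections import Counter

def leaf():
    # the distributed cost: reached from several call paths
    sum(range(20_000))

def path_a():
    leaf()

def path_b():
    leaf()

def work():
    deadline = time.time() + 1.0
    while time.time() < deadline:
        path_a()
        path_b()

worker = threading.Thread(target=work)
worker.start()

# Take a handful of full stack samples on wall-clock time.
samples = []
for _ in range(10):
    time.sleep(0.05)
    frame = sys._current_frames().get(worker.ident)
    if frame is not None:
        samples.append(traceback.extract_stack(frame))

worker.join()

# Reading each sample whole shows the shared cost and every path to it,
# regardless of which caller it was reached from.
hits = Counter(f.name for stack in samples for f in stack)
print(hits.most_common(3))
```

With 10 samples you can afford to read each stack top to bottom; with 10,000 you can only look at the aggregated counts.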

StackOverflow is full of questions of the form "what does this profiler output mean?" So the common response to much profiler output is either "what is this?" or its cousin "there's no way to speed up this code".

When in fact, there is a way, that you would see if you examined a small number of the samples in detail.

That's the price you pay for all those samples.

Mike Dunlavey
  • Interesting, thanks. I'm not sure I understood it though. Can't you just generate more data samples for your profiler with the same script? – ChocoDeveloper Sep 25 '13 at 14:46
  • would you mind explaining more on what it does and why do you recommend it as answering the question asked? ["Link-only answers"](http://meta.stackoverflow.com/tags/link-only-answers/info "what's this") are not quite welcome at Stack Exchange. Please also note that many corporate networks block youtube, making your answer useless for readers behind these – gnat Sep 25 '13 at 14:56
  • @ChocoDeveloper: Revisiting this 2 years later. It's like interviewing 10 candidates for a job, versus 1000. With the small number you're going to pay close attention to each one. With the large number you're going to skim, missing important details. That's the trouble with most profilers - it's not the taking of samples, it's the conversion to numbers. – Mike Dunlavey Jun 22 '15 at 18:57
1

Performance tuning should be guided by observing which resource is actually critical for the execution time or for the throughput. In a multi-user multi-processing system, this is difficult to assess with a general purpose profiler and largely depends on the type of application.

Go through the following steps as often as necessary:

  1. Define a performance target in terms of time and/or throughput you are aiming at.

  2. Determine the overloaded resource (CPU, RAM, Lan, graphics board, separate database, processes/handles ...).

  3. Find out why the overload occurs.

  4. Change the application to reduce the load.

  5. Go back to step 2 until the target is reached or your time budget is exhausted.
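Step 2 can often be narrowed down with a single measurement before reaching for any tool: compare CPU time to wall-clock time for the slow operation. A Python sketch (the 0.8 threshold is a rule of thumb, not a standard):

```python
import time

def measure(label, fn):
    """Compare CPU time to wall-clock time to guess the limiting resource."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    fn()
    wall = time.perf_counter() - wall0
    cpu = time.process_time() - cpu0
    ratio = cpu / wall if wall > 0 else 0.0
    kind = "CPU-bound" if ratio > 0.8 else "waiting (I/O, locks, network, ...)"
    print(f"{label}: wall={wall:.3f}s cpu={cpu:.3f}s -> likely {kind}")
    return ratio

cpu_ratio = measure("compute", lambda: sum(i * i for i in range(2_000_000)))
wait_ratio = measure("sleep", lambda: time.sleep(0.3))
```

A low ratio says the time is going somewhere other than the CPU, which redirects step 3 toward disk, network, locks or the database rather than the code itself.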

For PC-based applications, Microsoft's Sysinternals Process Explorer gives a good overview of which resource might be limiting performance. Use tools like top, ntop, iostat or vmstat to get an overview on Unix systems. Excessive page file traffic indicates too high memory consumption. Databases like Oracle usually have a tool to find the SQL statements which cause most of the database load.

Make sure that the system under test is free of any interfering external influences (malware scanners for example). Repeat the tests to make your results reliable.

Once you have cured one bottleneck, the next bottleneck might become visible.

Axel Kemper
  • Your step #3 kind of reminds me of [this](http://star.psy.ohio-state.edu/coglab/Miracle.html)... I think that's the real focus of the question here. The point about identifying the overloaded resource is a good one, but often you won't find one under normal circumstances (e.g. due to locking, slow database queries, etc.) Stressing the system generally will cause at least one resource to be overloaded, although it's not *necessarily* going to identify the same problem you're experiencing at lighter load. – Aaronaught Sep 22 '13 at 13:31
  • Yes, you have a point there. But rather than "wild guessing" it usually helps to focus on the resource identified in 2. A profiler can be used more efficiently and effectively if you have narrowed down the tuning options. – Axel Kemper Sep 22 '13 at 14:21