62

A colleague of mine today suggested that we go through all of the queries in our application and to add indices accordingly.

I feel this is premature optimisation because our application is not even released yet. I suggested to monitor for slow queries once we go live and then add indices accordingly.

What is the general consensus when designing your database, should you add a matching index every time you write a new query? Or is it better to just monitor and see how it goes?

Marco de Jongh
  • 897
  • 6
  • 13
  • 32
    It might be a matter of opinion, however I feel that *some* indexes could be added a priori. – Basile Starynkevitch Feb 24 '15 at 12:05
  • 2
    @BasileStarynkevitch Totally agree on that we already have primary key indexes and the works. But where do you draw the line? – Marco de Jongh Feb 24 '15 at 13:45
  • This is a matter of opinion. You need to provide a lot more detail about your case if you want to get anything more than vague generalities. – Walter Mitty Feb 24 '15 at 14:22
  • 1
    My two cents from experience: I was testing some of my early search queries on a subset of our database. The tests I ran were totally fine on my local copy. I then pushed the application to the staging area that hosts the full database. My tests ran in **< 500 ms**, whereas the staging system took several **minutes** to resolve. My boss was thoroughly confused as to why the app wasn't loading. *Explain*-type operations are your friend... At least look for sequential scans on large tables, at the very least! – Chris Cirefice Feb 25 '15 at 04:58
  • 2
    Not adding indexes is like using bubblesort. Most often you won't find any problems when you test it but once your program starts scaling up on live you're in for a whole lot of problems. And indexes can easily make a factor 100 in speed difference. – Pieter B Feb 25 '15 at 07:41
  • 3
    Just always remember: An Index is not a magical thing that will speed up your queries. An Index will incure cost on most DML-Operations and depending on the type can lead to a lot of waiting when many people update the same table. For queries: There are many queries which don't benefit at all from an index, where a FTS is the fastest or where Partitioning does all the work for you. - Only add index where you KNOW they will be beneficial! – Falco Feb 25 '15 at 09:24
  • @Falco that sounds like part of an answer. – dcorking Feb 25 '15 at 12:01
  • A lot of time when I think "oh we can add an index" I find I actually have the wrong query. Maybe that's just me. – corsiKa Feb 25 '15 at 19:45
  • 1
    Can't answer, because I can't find the "answer" button on here... protection may've removed it. Anyway, of "I feel this is premature optimisation because our application is not even released yet. I suggested to monitor for slow queries once we go live and then add indices accordingly." - BOTH opinions are wrong. You should add indexes neither at development time, nor maintenance time, but at test time. If you go straight from dev to release, then you most likely have far greater problems than just a few unnecessary indexes in your DB. – Dewi Morgan Feb 26 '15 at 01:54
  • @DewiMorgan You probably don't have sufficient reputation to answer yet. – bye Feb 26 '15 at 12:16
  • @Poldie Protected questions require a rep of 10. I have 101. Should be enough, but I don't see an "answer" link. I am likely just being blind. Or it's a CSS issue or adblock issue or some such. I've had reply buttons be blocked as ads on other sites before. [Edit: neither using chrome nor turning off ABP caused a button to appear. Conclusion: I am most likely being blind.] – Dewi Morgan Feb 27 '15 at 03:02
  • 1
    @DewiMorgan Actually, I don't see an answer button either, and i've also got 101 reputation! So it's either a bug or the text describing the score you need is incorrect. – bye Feb 27 '15 at 08:45
  • @Poldie Yay! And there was me convinced I was just being an idiot somehow! – Dewi Morgan Feb 27 '15 at 20:04
  • After you've been working with databases for a while, its clear from the point of table creation how you'll be querying the data most of the time. So certain indexes would be pretty obvious to add when first building the tables. Also you don't want to *only* profile queries when they are *slow*. Optimizing fast queries may give you bigger performance boost if they are done more frequently. – GrandmasterB Mar 04 '15 at 20:21

10 Answers10

133

Premature optimization is "optimizing" something because of a vague, intuitive sense that, y'know, this will probably be slow, especially to the detriment of code readability and maintainability. It doesn't mean willfully not following well-established good practices regarding performance.

Sometimes that's a difficult line to draw, but I'd definitely say that not adding any indices before you go live is too-late optimization; this will punish early adopters--your most eager and most important users--and give them a negative view of your product, which they will then spread around in reviews, discussions, etc. Monitoring queries to find pain points that need indexing is a good idea, but I'd make sure to do that no later than the beta.

Mason Wheeler
  • 82,151
  • 24
  • 234
  • 309
  • 11
    Yes, it should be done in the load testing phase – Alvaro Feb 24 '15 at 14:29
  • 152
    Optimising before you know where the slow parts are is premature optimisation. Releasing the thing before you know where the slow parts are is *premature releasing*! – MathematicalOrchid Feb 24 '15 at 16:35
  • 4
    @MathematicalOrchid: That's a great phrasing! May I borrow it elsewhere? – Pieter Geerkens Feb 25 '15 at 23:11
  • I'd add that if you understand the query patterns of your application and have set expectations regarding the amount of data in various tables (and its uniformity or lack thereof), you should be able to set up reasonable indices before any kind of release. You might miss a few, or need to tweak a few, after profiling the production app, but that's better than releasing a database with no indices at all and relying strictly on profiling rather than analysis. Load testing can miss scenarios that common sense would catch. Also: there's generally not much of an excuse for not having a primary key. – Aaronaught Feb 26 '15 at 04:40
  • 3
    @PieterGeerkens Sure, knock yourself out! ;-) I'm just sad that 91+ upvotes don't earn me any rep... heh. – MathematicalOrchid Feb 26 '15 at 09:53
  • Specifically with databases, it's usually recommended to turn indexing off *before* a major update, then resuming the indexing *after*. Even if this a production database, you may have a specific time when you should choose to turn on or turn off indexing. Indexing or not indexing is a feature of performance tuning. Indexing speeds up reads and slows down writes. Not indexing speeds up writes and slows down reads (assuming the indexes are out of date). – phyrfox Feb 26 '15 at 17:23
  • 3
    @MathematicalOrchid should've been an answer. Could run for "smallest-straight-to-the-point" answer ever. – Mindwin Remember Monica Feb 26 '15 at 19:56
48

monitor for slow queries once we go live

because nothing says quality like making your users suffer for a lack of design!

You should know which queries need indexes when you design the tables, you know which columns are being queried on in where clauses and joins. These should be indexed already because what might not be apparent in a live environment may quickly become apparent when the load or data stored increases. What you don't want to be doing when this happens is slapping indexes on every 'slow' query, you'll end up with an index on everything.

gbjbaanb
  • 48,354
  • 6
  • 102
  • 172
  • 10
    Right. Consider indexes as part of the database design. Use indexes to avoid a full table scan for any query which the end-user will commonly be doing in real-time. – A E Feb 24 '15 at 13:29
  • "You should know which queries need indexes when you design the tables" - I do not think so, IMHO you should know which queries need indexes when you design **those queries**. – Doc Brown Feb 24 '15 at 14:08
  • 1
    @DocBrown I'm not so sure, when you design a table you have (or should have) some understanding how its going to be used. A person table will be queried by ID, or possibly surname. If someone starts to access via DoB, address or phone number then you're going to add indexes for every field - and where does that end?! – gbjbaanb Feb 24 '15 at 14:14
  • 4
    @gbjbaanb: it ends when people stop adding features to the product, which might be "never" depending on your methodology. – Steve Jessop Feb 24 '15 at 14:18
  • 1
    @SteveJessop I mean you index according to the primary columns that you want to be accessing. For a person table, you might have a search function (if you forget your username you might search on email for example) but afterwards you always use the ID. So ID is the only one that needs indexing. If you do a lot of searching on other fields you might want an index, this will come out in time, but generally you don't want to index every column just because someone sometime decided to write a non-standard query, but you might utilise a different mechanism for these "one-off" cases. – gbjbaanb Feb 24 '15 at 14:22
  • @gbjbaanb: oh, I see, yes. The way I'd put it is that not all queries need indexes because not all queries need to be fast. This means that the questioner's colleague's suggestion (add indexes for every query) might well be wrong, but I didn't think DocBrown was suggesting that. I thought he was just suggesting that it's fine to use a table in a new way not anticipated when the table was first designed, even if it needs to be fast, and to add that index when the requirement arises. – Steve Jessop Feb 24 '15 at 14:24
  • @SteveJessop cool. I don't think the requirements for DB tables change much without it becoming a different table in all but name. Once you've found the right ID you should keep using it throughout the app, not querying on surname because that's what you have to hand. You should get the ID and query on that instead (for example) but too often people simply add a new index because they didn't write the query to suit the table, they wrote it to suit themselves at the time ... and you end up with indexes on everything. At least that's my take on the question. – gbjbaanb Feb 24 '15 at 14:31
  • 2
    @gbjbaanb: sure, people shouldn't repeatedly look up the *same* surname in a table on account of it being a marginally more convenient handle for them to hold than the proper key for the table. I'd say that's the case whether the table is indexed on surname or not, in fact, since there's something very fishy about a stretch of code that assumes it's all operating on "the same user" but can't quite manage to express this in code by remembering the ID :-) I was imagining cases where the need for reverse-lookup wasn't anticipated until the client mentioned it... – Steve Jessop Feb 24 '15 at 14:34
26

"Premature optimization", in its derogatory sense, means costly optimization that might not be needed. It doesn't mean all optimization implemented before the latest possible point to prevent bankruptcy!

In particular, it's legitimate to optimize based on performance tests before going live, to ensure you can meet some sensible (albeit approximate) requirements for your app to not completely suck.

At a absolute minimum you should load up your database with a plausible amount of test data and check the responsiveness of your app. This isn't premature, since you know it's going to happen, and it will catch any queries that trigger absurdly slow scans. As A E says in a comment:

Use indexes to avoid a full table scan for any query which the end-user will commonly be doing in real-time

At least, for tables that are planned to grow in use.

Then as a shortcut to that, if you have significant experience with the database engine and you've already planned the tests when you write the first cut of the code, then often you'll know without even running it that the query you're writing will be too slow without an index. Of course you're free to pretend you don't know, and watch the test fail before adding the index to make it pass, but there's no reason for known faulty code (because unresponsive) to go live.

Steve Jessop
  • 5,051
  • 20
  • 23
20

I feel this is premature optimisation because our application is not even released yet. I suggested to monitor for slow queries once we go live and then add indices accordingly.

You can't treat your end-users and production environment like quality-assurance. In more words, you're saying that you'll figure it out in production. I don't think that's the right way, and I see that approach go horribly wrong every day.

You need to keep one thing in mind, as you can't paint this with a broad brush.

What is your common workload?

That might sound obvious or dull, but it's significant in practice. If you have 10 queries that make up 98% of your workload (quite common, believe it or not), my recommendation would be a hard analysis before production. With realistic and representative data, make sure those 10 queries are as good as they possibly can be (perfect is a waste of valuable time, and almost not achievable).

For the other 200 queries that make up the 2% of the workload, those are ones that most likely aren't worth a ton of effort, and will make up the corner-case perf troubleshooting oddities in production. That's also a reality, and not a terribly bad thing. But that doesn't mean ignore indexing best practices or make estimated assumptions about data retrieval.

It is common and good practice to figure out database performance prior to production. In fact, there is a relatively common position for this type of thing called a development DBA.

But...

Some take that too far and go crazy adding indexes "just in case". Somebody recommends this is a missing index? Add it, and four other variations. Also a bad idea. You need to not only think about your data retrieval, but what about data modification? The more indexes you have on a table, generally speaking the more overhead you have when you modify data.

Like most things, there is a healthy balance.

As a fun little side note... The pluralization of "Index"

"Indices" are for financial people

"Indexes" are for us

Thomas Stringer
  • 2,237
  • 2
  • 16
  • 19
4

No, it is not premature optimization, but it must be done correctly as any optimization should be.

Here is what I would do:

  1. Load the database with enough test data to mimic a production load. You cannot get this 100% accurate but that is fine: just put enough data in. Does one table have a fixed amount of data? Load it up. Do you have one table that holds a lot of data, e.g. whatever table holds questions on this site? Load a few million records even if just dummy data.
  2. Turn on profiling in your database server.
  3. Bang away at the application using a combination of automated scripts (provides volume) and real users (they know how to break things).
  4. Review the profiling data. Are specific queries slow? Check the explain plans and see if the database server is telling you it wants an index but it does not exist.

Database servers are complex and intelligent pieces of software. They can tell you how to optimize them if you know how to listen.

The keys are to measure performance before and after optimizing and and let the database tell you what it needs.

3

Following proven patterns for known problems (like finding a record by its ID) isn't premature anything. It's just sensible.

That said, indexes aren't always a straightforward business. It's often tough to know during the design phase which indexes your traffic will depend on and which will bottleneck write-operations. So, I'd argue for leveraging some "obvious" schema design best practices (use PK's appropriate for the designed read/write patterns and index FK's); but, don't put an index on anything else until your stress testing demands it.

svidgen
  • 13,414
  • 2
  • 34
  • 60
  • Spending an extra 30 seconds to do something that's almost certain to improve performance and very unlikely to harm it isn't "premature optimization". If 90% of the operations on a table use a particular column as a key, then either indexing it will improve performance, or performance will never be slow enough to matter, and adding code to create the index may take less time than determining whether it's really necessary. – supercat Feb 25 '15 at 00:25
  • @supercat "never" ... Until you start seeing deadlocks in your production environment... – svidgen Feb 25 '15 at 00:53
  • What sorts of realistic scenarios do you envision that would be consistent with 90% of operations using a column as a key, and where adding an index would cause deadlock? – supercat Feb 25 '15 at 01:01
  • @supercat I'm not sure I fully understand your questing. In terms of an active application, almost any increase in the execution time or number of ios has the potential to introduce deadlocks. ... But, more to the point, the presence or absence of an index in *most* applications is negligible until the database reaches a critical size and/or concurrency level. E.g., when all of your indexes no longer fit in memory ... – svidgen Feb 25 '15 at 01:49
  • 1
    The point is, it's tough to know what your query makeup is until typical use cases are run through a stress test (or until you see issues with unexpected user behavior in production). If you have a page that keys off of tablex.fieldy, but it's only hit once for every thousand inserts... The index may result in a net degradation. – svidgen Feb 25 '15 at 01:55
  • That's certainly true, but in that case 0.1% of table operations be using that key, rather than 90%. My point was that in many cases one will know that *at least* 90% of operations will be `select...where id=@@id`, `update...where id=@@id`, etc. If the table never grows beyond 100 items, an index probably isn't going to help much, but if 90% of operations are of the aforementioned style it won't hurt much either. – supercat Feb 25 '15 at 16:18
2

When your application is released, it is too late.

But any proper development process should include performance testing.

Use the results of your performance tests to decide which indexes to add, and verify their effectiveness by repeating the performance tests.

Philipp
  • 23,166
  • 6
  • 61
  • 67
  • When an application is release is really a good time to tweak the indices. Look at this site, stachexchange, you can bet your hat the indices have changed long time after it went live. – LosManos Feb 25 '15 at 20:50
  • @LosManos: Nobody pays to use Stack Exchange. – Lightness Races in Orbit Feb 26 '15 at 18:56
  • @LightnessRacesinOrbit: O contraire, advertisers pay to use Stack Exchange. –  Feb 26 '15 at 20:13
  • @JonofAllTrades: They don't care if we have a few hours of poor performance due to a missing index. My point is that a large, free-to-use community-oriented website with a perpetual distribution cycle is very different from a periodically released, self-contained commercial product. Thus, SE is not a good example. – Lightness Races in Orbit Feb 26 '15 at 21:18
1

Although I don't think every query should be optimized, indexes are so much a part of RDBMS that they need to be given consideration before releasing. When you execute a query, unlike other forms of programming you're not telling the system how to execute it. They develop own plans and almost always base it on the availability of an index. The makeup and volume of data will be considered as well at later times.

Here are some things I would consider:

  1. There are some queries that you should identify in your early development that you just know will be used frequently. Focus on them.
  2. There will be slow queries. By indexing them first, you can then determine if the performance still isn't fast enough and then consider a redesign (Denormalizing may be premature). I'd rather do this before a release. Nobody wants a system where it takes 10 minutes to find something in the inventory.
  3. Indexes may improve query performance but they're not to hinder data modification.
  4. Many systems have tools to analyze your queries, so don't be afraid to use them.

After your initial review, you should follow it up with some considerations for when you should review this again and how you're going to be able to collect the information to do this (monitor usage, get copies of client data , etc.).

I realize you don't want to prematurely optimize, but it's almost certain you will have poor performance without indexing your database. By getting this out of the way, you can determine if there are other areas causing performance issues.

JeffO
  • 36,816
  • 2
  • 57
  • 124
0

It also depends on how many users you expect. You should definitely do some load testing and make sure your database can keep up with 10s to 100s to 1000s of simultaneous requests. Again, it depends on how much traffic you expect, and what areas you expect to be used more than others.

In general, I would fine tune the areas that I expect the user to hit the most first. Then i'd fine tune anything that's slow from the user experience standpoint. Whenever user has to wait for something, they get a bad experience and might get turned down. Not good!

harsimranb
  • 168
  • 6
0

It is a good practice to identify which columns definitely need an index by some upfront analysis. There is a real risk of gradual or unexpected performance degradation in production as database size increases if you have absolutely no indices. The situation you want to avoid is where a commonly run query requires scanning a large number of table rows. It is not premature optimization to add indices to critical columns since you have much of the necessary information available and the potential performance differences are significant (orders of magnitude). There are also situations where the benefit of indices is less clear or more dependent on the data - you can probably defer deciding for some of these cases.

Some questions you need to ask are:

  • What are design limits for the size of each table going to be?

If tables are always going to be small (say < 100 rows), it isn't a disaster if the database has to scan the entire table. It may be beneficial to add an index, but this requires a little more expertise or measurement to determine.

  • How often will each query be run, and what is the required response time?

If the query is run infrequently and doesn't have strict response time requirements (e.g. report generation) and the number of rows is not huge, then it is probably fairly safe to defer adding indices. Again, expertise or measurement can help tell if it's going to be beneficial.

  • Does the query require looking up the table by something aside from the primary key? E.g. filtering by date range, joining on a foreign key?

If these queries are run frequently and touch tables with many rows, then you should seriously consider preemptively adding an index. If you're not sure whether this is the case for a query, you can populate the database with a realistic amount of data, then look at the query plan.

Tim Armstrong
  • 626
  • 4
  • 3