7

I couldn't find a similar enough thread, so I'm making a new one. My question is about reducing database writes. The context: I have an application that increments a number in a database every time a user sends a message (hundreds of messages per second, peaking at thousands), and I'm concerned the database will eventually struggle to cope and become a bottleneck. Saving reads is fairly easy: check the caching layer (Redis/in-process cache) first, and on a miss hit the database and cache the result with a TTL. My question is about saving writes.
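
For reference, a minimal cache-aside read could look like this (ioredis and mysql2/promise assumed; the key, table, and column names are just placeholders):

    // Cache-aside read: try Redis first, fall back to MySQL and cache the result.
    // `redis` is an ioredis client, `pool` is a mysql2/promise pool (assumed).
    async function getMessageCount(userId) {
      const key = `msg_count:${userId}`;            // hypothetical key scheme
      const cached = await redis.get(key);
      if (cached !== null) return Number(cached);

      const [rows] = await pool.query(
        'SELECT message_count FROM users WHERE id = ?', [userId]);
      const count = rows.length ? rows[0].message_count : 0;

      await redis.set(key, count, 'EX', 60);        // cache with a 60 s TTL
      return count;
    }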

It's quite a tricky task because I want to keep both the cache and the database updated. The database doesn't need to be updated on every write, since every cluster uses a shared Redis database. Even if the program crashes, losing a small amount of data that hadn't yet been saved to the database would not be a huge issue.

I was thinking of something like this (assuming the data that needs to be processed is already cached and needs to be updated):

  • Request comes in
  • Cached copy of data is found in Redis and returned to cluster
  • Cached Redis copy of data is updated with new data
  • The updated data is put into a "queue" that is flushed on an interval: a transaction/pipeline is created and the database is updated with every change accumulated since the last flush (see the sketch below).
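
A rough sketch of what I have in mind (ioredis and mysql2/promise assumed; the key names, table, columns, and flush interval are placeholders):

    // Write path: update Redis immediately and remember the user as "dirty".
    async function recordMessage(userId) {
      const count = await redis.incr(`msg_count:${userId}`); // cache stays current
      await redis.sadd('dirty_counters', userId);            // queue for the next flush
      return count;
    }

    // Flush loop: every few seconds, push all dirty counters to MySQL in one transaction.
    // If this crashes, at most a few seconds of updates are lost, which is acceptable here.
    setInterval(async () => {
      const dirty = await redis.spop('dirty_counters', 1000); // grab up to 1000 user ids
      if (dirty.length === 0) return;

      const counts = await redis.mget(dirty.map(id => `msg_count:${id}`));
      const conn = await pool.getConnection();
      try {
        await conn.beginTransaction();
        for (let i = 0; i < dirty.length; i++) {
          await conn.query(
            'UPDATE users SET message_count = ? WHERE id = ?',
            [Number(counts[i]), dirty[i]]);
        }
        await conn.commit();
      } catch (err) {
        await conn.rollback();
      } finally {
        conn.release();
      }
    }, 5000);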

I'm unsure whether this is the best strategy; if anyone has a better one, I'd like to see it, and I'd also appreciate opinions on this approach.

Thank you.

  • Running on MySQL 8 (thinking of switching to MongoDB)
  • Running on Redis (always kept updated)
  • Running on NodeJS (this is not the main scope of the question)
  • Running on Ubuntu 18.04

The counter counts each user's messages, both to show the count to other users and to check whether the user has sent a specific number of messages for some internal processing.

Robert Harvey
Epic Speedy
    Obvious fix: drop the counter. If it is there to give out unique identifiers, why not use a GUID? If it is some form of accounting, why not do it offline? Use a sorted/sharded queue to distribute the load and the relevant data, perform the bulk of the counting in a single pass, and update the system as a bulk operation. – Kain0_0 Aug 31 '20 at 00:09
  • 2
    Please tell us exactly what the counter is required for. – Doc Brown Aug 31 '20 at 05:41
  • 1
    Depends on what is causing that many writes. Writing a good caching solution is challenging, but it can improve performance considerably since data is stored in memory, not on disk. You will need to do some additional analysis of your database and data model. – AjayGohil Aug 31 '20 at 10:11
  • 1
    This sounds interesting, but nevertheless I am voting to close with the "needs details or clarity" reason until the OP edits the question and tells us what the counter is used for. – Doc Brown Aug 31 '20 at 11:51
  • I think the key element in understanding if and how this becomes feasible is: do you need to persist the counter value reliably, or is it acceptable if the count is sometimes inaccurate? – Carmine Ingaldi Aug 31 '20 at 16:12
  • Did you measure if the naive method of just updating a simple counter is insufficient? – Lie Ryan Aug 31 '20 at 17:05
  • Are you sure there really is a problem? The database should be able to aggregate multiple writes. – George Barwood Aug 31 '20 at 19:05

3 Answers

4

Assuming that you really need this counter, does it really have to be constantly updated every time a user sends a message? Could you use a more "granular" counter which increments every time the user sends 10 messages instead? If so, you just reduced write load by 90%!
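
A minimal sketch of this idea, assuming the live count sits in Redis (ioredis) and only every Nth increment is written through to MySQL (the key, table, and column names are invented):

    const WRITE_EVERY = 10; // persist only every 10th message

    async function recordMessage(userId) {
      // Redis keeps the exact, up-to-the-message count.
      const count = await redis.incr(`msg_count:${userId}`);

      // Only every Nth increment touches MySQL, cutting write load by ~90%.
      if (count % WRITE_EVERY === 0) {
        await pool.query(
          'UPDATE users SET message_count = ? WHERE id = ?', [count, userId]);
      }
      return count;
    }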

Another idea would be to keep the counters in memory and write all the updated ones to persistent storage maybe every minute or so. If a server crashes, you lose an average of 30 seconds' worth of updates.
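
In Node.js terms, that could be as simple as an in-process Map flushed on a timer (the table and column names here are assumptions):

    // In-memory counters; a crash loses at most one interval's worth of updates.
    const pending = new Map(); // userId -> increments since the last flush

    function recordMessage(userId) {
      pending.set(userId, (pending.get(userId) || 0) + 1);
    }

    // Once a minute, flush all accumulated increments to MySQL.
    setInterval(async () => {
      const batch = [...pending.entries()];
      pending.clear();
      for (const [userId, delta] of batch) {
        await pool.query(
          'UPDATE users SET message_count = message_count + ? WHERE id = ?',
          [delta, userId]);
      }
    }, 60 * 1000);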

You could even combine both of these ideas to increase your throughput even more.

One more point to think about: is this really a bottleneck, or are you just imagining that it might become one someday? You might burn a lot of engineering effort chasing scalability only to find that the product doesn't succeed and you never get many users anyway. And if the product does succeed, by the time your user base grows large the product may have changed and the counter may no longer be needed.

Alex D
  • Updating every x messages is not so great: if x were 100 and the client stops sending at 397, other users would continue to see 300. Updating every 5 or 10 seconds, and only if there's a change, seems better. – Martin Maat Aug 31 '20 at 21:26
  • @MartinMaat: true, but it really just depends on the details of the application. Even if you set "x" to 2, you have just reduced the number of writes by 50%. Aside from that, increasing the granularity of operations is a basic technique for reducing contention/coordination overhead, which readers of this thread might benefit from learning. (For example, grabbing items from a work queue 10 at a time instead of 1 at a time.) – Alex D Sep 01 '20 at 05:45
  • 1
    A variant of only writing every nth event is to write upon an event with probability 1/n. This has the same result on average but is highly tunable. Can also scale that n depending on the current value of the counter. – amon Sep 01 '20 at 09:37
3

I'd recommend starting with the simplest thing: just increment a plain counter. For most applications this will probably be sufficient, and it is simple to implement.

If you really need to scale this, though, the number of database writes is not the issue. The problem with implementing a shared counter is write contention. Every time you increment a shared counter, you have to take a lock on the current value, read the value, write the updated value, and then release the lock. That lock forces every operation that touches the counter to be serialised.

If you want to scale out a shared counter, you need to convert that cycle into uncontended operations that can run in parallel. For example, instead of updating a count field, you can insert a new row into a table. The task of updating a shared counter then becomes the task of counting the rows in that table. This also opens up optimisation possibilities when you need to run a distributed database, as replicas only need to sync the inserted rows instead of having to serialise updates to a single field.
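
Sketched against a hypothetical user_messages table (mysql2/promise assumed), the write becomes an uncontended INSERT and the count becomes a derived query:

    // Each message becomes its own row; no shared counter is locked on the write path.
    async function recordMessage(userId) {
      await pool.query(
        'INSERT INTO user_messages (user_id, created_at) VALUES (?, NOW())',
        [userId]);
    }

    // The count is derived when needed (and can itself be cached with a short TTL).
    async function getMessageCount(userId) {
      const [rows] = await pool.query(
        'SELECT COUNT(*) AS n FROM user_messages WHERE user_id = ?', [userId]);
      return rows[0].n;
    }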

Counting rows in a table is an O(n) operation, so at first glance this looks slower than a shared counter, but the crucial bit is that counting rows doesn't cause read/write contention, and there are ways to optimise the counting. If you're okay with the count being slightly off from time to time, you can cache the count so that you don't need a full recount every time, or in some databases you can get an estimated row count, which is much faster than an exact count.

If you need an exact count at all times, you can have a background job occasionally clear out (or mark) old rows from the table and store the aggregated semitotal somewhere else. A full recount then just adds the semitotal to the number of new rows still in the table.
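
As a sketch, a background job could periodically fold old rows into a hypothetical user_message_totals table and delete them; an exact count is then the stored semitotal plus whatever rows are still in the live table (table and column names are invented):

    // Fold rows older than a day into the stored semitotal, then remove them.
    async function compactCounts(userId) {
      const conn = await pool.getConnection();
      try {
        await conn.beginTransaction();
        const [rows] = await conn.query(
          `SELECT COUNT(*) AS n FROM user_messages
           WHERE user_id = ? AND created_at < NOW() - INTERVAL 1 DAY`, [userId]);
        await conn.query(
          `INSERT INTO user_message_totals (user_id, semitotal) VALUES (?, ?)
           ON DUPLICATE KEY UPDATE semitotal = semitotal + VALUES(semitotal)`,
          [userId, rows[0].n]);
        await conn.query(
          `DELETE FROM user_messages
           WHERE user_id = ? AND created_at < NOW() - INTERVAL 1 DAY`, [userId]);
        await conn.commit();
      } catch (err) {
        await conn.rollback();
        throw err;
      } finally {
        conn.release();
      }
    }

    // Exact count = stored semitotal + rows still in the live table.
    async function getExactCount(userId) {
      const [[total]] = await pool.query(
        'SELECT COALESCE(SUM(semitotal), 0) AS t FROM user_message_totals WHERE user_id = ?',
        [userId]);
      const [[live]] = await pool.query(
        'SELECT COUNT(*) AS n FROM user_messages WHERE user_id = ?', [userId]);
      return Number(total.t) + live.n;
    }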

Lie Ryan
1

Batch, batch, BATCH! - that is the answer.

Either accumulate updates within a reasonable time window, or flush when the counter has advanced by a reasonably large step.

Make sure you don't send data that hasn't changed. To do that, compute a reasonable diff, which in your case is just the numeric difference between the current counter and its value at the last sync with your DB.
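
A rough sketch of that diff, assuming the live value sits in Redis (ioredis) and only the delta since the last sync is sent to MySQL (all names are invented):

    const lastSynced = new Map(); // userId -> counter value at the last DB sync

    async function flushCounter(userId) {
      const current = Number(await redis.get(`msg_count:${userId}`)) || 0;
      const delta = current - (lastSynced.get(userId) || 0);
      if (delta === 0) return; // nothing changed since the last sync: skip the write

      await pool.query(
        'UPDATE users SET message_count = message_count + ? WHERE id = ?',
        [delta, userId]);
      lastSynced.set(userId, current);
    }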

P.S. Avoid this extra machinery if possible. P.P.S. Measure rather than trusting your intuition; SQL servers can easily handle the load you've mentioned, even on comparatively cheap hardware.

Zazaeil