
This question is about best practices in architecture.

Our Current Architecture

I have a PHP class that accesses MySQL for user info. Let's call it User. User is accessed many times, so we have implemented layers of caching to reduce load.

The first layer is what we call the "per request" cache. After the data has been retrieved from MySQL, we store it in a private property of User. Any subsequent request for the data returns the property instead of re-querying MySQL.

Since the web request lives and dies on a per-request basis, this cache only prevents the application from accessing MySQL more than once in a single request.
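In simplified form, that first layer looks something like this (the class and method names here are illustrative, not our exact code):

    <?php
    // Layer 1: per-request cache in a private property.
    class User
    {
        private $userData = null;   // lives only for the duration of this request

        public function getData($userId)
        {
            if ($this->userData === null) {
                $this->userData = $this->loadFromMysql($userId);
            }
            return $this->userData; // later calls in the same request return this
        }

        private function loadFromMysql($userId)
        {
            // PDO/mysqli query against the users table goes here
        }
    }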

Our second layer is Memcached. When the private property is empty, we first check Memcached for the data. If Memcached is empty we query MySQL for the data, update Memcached, and update the private property of User.
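Extending the sketch above, the full lookup order in getData() comes out roughly like this ($this->memcached is an injected Memcached instance; the key scheme and TTL are just examples):

    public function getData($userId)
    {
        if ($this->userData !== null) {
            return $this->userData;                       // layer 1: per-request
        }

        $key    = 'user:' . $userId;
        $cached = $this->memcached->get($key);

        if ($cached !== false) {                          // layer 2: Memcached hit
            // (a stored boolean false would need a getResultCode() check)
            return $this->userData = $cached;
        }

        $this->userData = $this->loadFromMysql($userId);  // layer 3: MySQL
        $this->memcached->set($key, $this->userData, 300);
        return $this->userData;
    }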

The Question

Our application is a game, and sometimes it is imperative that some data be as up-to-date as possible. In the span of about five minutes, a read request for the user data may happen 10 or 11 times; then an update may occur. Subsequent read requests need to be up to date or game mechanics fail.

So, what we've done is implement a piece of code that is executed when a database update happens. This code sets the key in Memcached with the updated data, so all subsequent requests to Memcached are up to date.
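In simplified form, that update wrapper looks something like this (again, names are illustrative):

    // Write-through: the same wrapper that performs the UPDATE also refreshes
    // both cache layers, so later reads see the new data.
    public function updateData($userId, array $newData)
    {
        $this->writeToMysql($userId, $newData);              // source of truth first
        $this->userData = $newData;                          // layer 1: this request
        $this->memcached->set('user:' . $userId, $newData);  // layer 2: Memcached
    }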

Is this optimal? Are there any performance concerns or other "gotchas" we should be aware of when trying to maintain a sort of "living cache" like this?

Stephen
  • What does this have to do with deleting and re-adding data? – Mike Nakis Dec 29 '11 at 21:17
  • Clarified the question title. – Stephen Dec 30 '11 at 13:55
  • Why not just expire the cached data? Updating it means that you'll need to ensure the update is maintained (so that if new data needs to be updated this way, you'll have to continue to change the update). Expiring the cache means that everything is pulled newly from the database --- and any new updates don't need new changes to the updating code. The downside is that the database load might be higher. – Peter K. Dec 30 '11 at 14:06
  • @Peter Yeah, we thought about that too. If no other problems with our current approach come up, we'll stick with it. Otherwise we may go with what you've described. – Stephen Dec 30 '11 at 14:11
  • @Stephen The approach you describe is called "Write Through Cache", and is a fairly common approach. – Sripathi Krishnan Jan 06 '12 at 13:57
  • Why is this question so focused on memcached? Isn't that true with any cache technology? – JensG Apr 06 '14 at 16:58

2 Answers

My recommendation is to look at your usage profile and your requirements for the cache.

I can see no reason why you would leave stale data in memcached. I think you have picked the right approach, i.e. refresh the cache whenever you update the DB.

In any case, you're going to need a wrapper around your DB update (which you've done). The code that updates the User in the DB and in RAM should also either push the new value to memcached or expire the key in memcached.

For example, if your users normally do an update once per session as part of logging off, there's not much point updating the data in the cache (e.g. a high-score total); you should expire it straight away.

If, however, they are going to update the data (e.g. current game state) and 0.2 seconds later a PHP page hit is going to request it again, you'd want the fresh value pushed into the cache.
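Roughly, the two options inside that update wrapper (the key name is illustrative):

    // Option A: the data will be read again almost immediately - push the new value.
    $memcached->set('user:' . $userId, $newData);

    // Option B: the data is unlikely to be read again this session - just expire it
    // and let the next read repopulate the cache from MySQL.
    $memcached->delete('user:' . $userId);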

jasonk

I wouldn't go about it quite like you outlined. What you need to do is decide whether you actually NEED completely up-to-date data. Then, if you do need it, decide which parts of the data need to be up-to-date at all times and separate them from things that can be cached in your architecture.

For example, you probably want to update your user's email address as soon as they change it, so you don't send out mails to the wrong address, but it's unlikely that the user's date of birth or surname is going to need to be completely up-to-date to provide a decent user experience. (N.B. I'm not using a game-architecture example as I don't know what kind of game to aim it at, and I think this one is fairly easy to understand).

This way you have two clear sets of data: short- and long-term cacheable data. You can probably get away with a cache duration of a minute or so on the short-term data, just to relieve load on the DB, but the long-term data can be left in the cache on a sliding duration for as long as it's used.
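As a rough sketch of that split with memcached (keys and TTLs are placeholders; memcached has no built-in sliding expiry, so you approximate one by re-touching the key when it's read, which needs a reasonably recent client with the binary protocol):

    // Volatile, game-critical data: short TTL, refreshed or expired on write.
    $memcached->set('user:' . $userId . ':state', $gameState, 60);

    // Stable profile data: long TTL, extended each time it is actually read.
    $memcached->set('user:' . $userId . ':profile', $profile, 86400);
    $memcached->touch('user:' . $userId . ':profile', 86400);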

Then you need to deal with updates. I'd first look at using a DB trigger to simply remove items from the cache once they're out of date. That will force your business layer to trigger a cache refresh the next time it requests the data, freeing up some space in the cache if the data isn't being used (for example, if a user changes their email address and then logs out immediately).

If this is going to cause performance issues in the UI (i.e. introduce too much lag while waiting for cache refreshes), you can look at simply triggering the cache refresh as soon as the item is removed from the cache. I'd also look at optimising the DB read times for this small set of data, to ensure that any lag induced in refreshing the cache is minimal (this should be easier as you only need to load the data you really need).
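However the removal is wired up (a plain MySQL trigger can't reach memcached by itself, so in practice it tends to live in the update wrapper or go through a memcached UDF), the expire-on-write idea with the optional immediate re-warm looks roughly like this; key and method names are illustrative:

    $key = 'user:' . $userId . ':profile';
    $memcached->delete($key);                        // next read falls back to MySQL

    // Optional pre-warm: refresh straight away instead of waiting for that read,
    // if the lag on the next page hit would be noticeable.
    $fresh = $this->loadProfileFromMysql($userId);   // small, targeted query
    $memcached->set($key, $fresh, 86400);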

What I would not do, in any circumstance, is add an additional method of filling the cache, as then you will need to maintain the call (and API hooks etc.) in two places.

As for gotchas, the main thing you need to be careful of if you're writing directly to the cache is synchronisation. If many threads try to read while you're doing your silent update you might have some serious invalid data issues, which will defeat the point of trying to keep the data up-to-date in the first place.
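If you do end up with several writers touching the same key, memcached's check-and-set is one way to guard a read-modify-write. A minimal sketch, assuming the php-memcached 3.x get() signature (older versions hand back the CAS token by reference instead):

    $key = 'user:' . $userId;
    do {
        $result = $memcached->get($key, null, Memcached::GET_EXTENDED);
        if ($result === false) {
            break;                    // miss: fall back to the MySQL path instead
        }
        $data = $result['value'];
        $data['gold'] += 10;          // illustrative read-modify-write
        // cas() returns false if another request changed the key since our get(),
        // in which case we re-read and retry.
    } while ($memcached->cas($result['cas'], $key, $data) === false);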

Ed James