-1

One of my web services hits a few hundred thousand visits per day where most of the operations are CRUD and we are thinking to get a cache sitting in front of our SQL database.

I have had implemented cache before in some other projects using Redis. Basically, before I touch the database, I see if there's a cached version of the requested resource (one or a collection of models), otherwise, get data from the database and cache it. This pattern is known as the Cache-aside pattern.

Cache-aside pattern

(Picture from https://docs.microsoft.com/en-us/azure/architecture/patterns/cache-aside)

Here are some of the entities that need to be fetched from the database in every HTTP request:

  • The logged-in user data.
  • Dependencies of the user (preferences, profile picture... etc).

Those entities need to be cached to reduce database queries. But, I have imagined some scenarios that made me ask this question:

Should I make all my business logic depend on the data that comes from the cache?

  • An example scenario: When the user wants to log in, should I compare their credentials directly from the database or I simply fetch them from the cache (a cache miss can lead to reading from the DB)?

My concern here is that sometimes the cache is out of sync with the database which may lead to severe issues.

parse
  • 107
  • 4
  • 4
    "My concern here is that sometimes the cache is out of sync with the database which may lead to severe issues." Only you know what those severe issues might be. You need to perform risk analysis to know whether you can afford to deal with stale data. – Vincent Savard Oct 28 '21 at 12:43
  • 1
    I think Vincent is right - the implicit (wrong) assumption in this question is that this can be answered in general, without investing any thoughts into the specific processes where the data is used. – Doc Brown Oct 28 '21 at 12:57
  • @VincentSavard performing risk analysis seems a wise thing to consider! thank you – parse Oct 28 '21 at 17:09

2 Answers2

2

You should rely on cache for business logic when it makes sense. For Login I would say don't cache credentials and just hit the DB. For showing comments, being potentially stale isn't a big deal. It depends on how bad the out of sync data potential is for a process.

Ryathal
  • 13,317
  • 1
  • 33
  • 48
2

I'll split my answer based on the different needs. Based on what you have described, you are hitting the database for all information about a user multiple times. There are a few approaches on how to deal with that, without resorting to Redis.

#1: Security

Never use cached credentials to authenticate (confirm a user is who they claim to be). However, if you can delegate to some other service than a database that is designed to handle credentials, it would be wise to explore that. Some common options include:

  • LDAPS authentication. It works well if you are developing an enterprise application and the user doesn't have to remember another set of credentials
  • OAuth2 or OpenID providers. Works well for apps on the internet, and users can leverage their Facebook, Github, or Google identities to authenticate
  • PKI. Really only works if the whole enterprise is built around this concept. The client certificate can only be presented to your system if it is valid, which means you don't even need a login form.
  • IDAM server. Depending on your environment, you may need to use an IDentity And Management (IDAM) server to authenticate your users.

Using the external systems to authenticate still requires you to store the user's permissions to your application. That needs to be read in to ensure the user doesn't access things they are not allowed to.

#2: User Information

The ideal here is to only read the user's information once per session. A good way that my team has approached this is to use JWT tokens. The JWT token can be built in a way that it can be validated to ensure it has not been tampered with and has not expired. If you load that token with the information you need to access regularly, you don't have to hit your database or a Redis server. That token needs to be sent with every request.

We generate the token when the user is authenticated, and return it to the front end code. The front end then sends that token to the servers. When the token expires, the front end can either require re-authentication or use a token exchange concept. I.e. we would return 2 tokens, a session token with the current user information and a refresh token to exchange for a new session token.

The main concern here is to ensure your information is current enough. If the front end has to do a token exchange every 5 minutes, you only have that period of time that a user would have those old permissions.

#3: General guidance about business logic and caching

Ideally, cache access, management, and such really should be the responsibility of any "repositories" you have. The business logic will get overly complex if it has to know to retrieve information from the cache or the source database. If you keep that behind a Repository object, it can handle the cache access, invalidation, and management policies so you have only one place to fix things.

Berin Loritsch
  • 45,784
  • 7
  • 87
  • 160
  • "Never use cached credentials to authenticate" is something I needed to hear. We can't delegate auth to other services. We have social auth enabled, but still, we store all data in our db. The JWT won't meet our needs too. We have implemented the repository pattern, but not sure if it's a good practice not to separate it with cache repositories so the developers know where the data is coming from. – parse Oct 28 '21 at 17:15
  • https://softwareengineering.stackexchange.com/a/355398/347725 "Does caching belong in a repository class?" – parse Oct 28 '21 at 17:30