69

I understand that each service in a microservice architecture should have its own database. However, does "having its own database" simply mean having another database within the same database instance, or literally having another database instance?

By this, I don't mean sharing of databases, which is a no-no, but rather the database instance.

For example, if I were using AWS and have 3 services, do I create 3 databases for each service on a single RDS instance or do I create 3 RDS instances each containing a database which is used independently by each of the 3 services?

If using multiple databases on a single RDS instance is the better idea, will it defeat the purpose of having independent services, because:

  1. The RDS instance's resources will be shared amongst services. Will Service A, which may have heavy database usage at a particular time, impact Service B, which uses a different database on the same RDS instance?

  2. All services will be dependent on the database version on that RDS instance.

xenon
  • 8
    It is whatever best meets your specific requirements. – Robert Harvey Jun 19 '18 at 14:28
  • 1
    I'm not sure I would call myself an expert in 'microservices' but you could have any manner of setups and dbs. You could have a db that is read by one service and written to by another. Or alternatively you could only have 1 db (or less technically) for the whole system. – Mark Rogers Jun 19 '18 at 14:48
  • Here's a good read on the matter: https://plainoldobjects.com/2015/09/02/does-each-microservice-really-need-its-own-database-2/ – RandomUs1r Jun 19 '18 at 19:38
  • Read about 'Single Responsibility Principle'. Have you thought about implementing a 'database microservice' that other microservices use? – ChuckCottrill Jun 20 '18 at 00:18

6 Answers

80

Assuming you have some services which can use the same kind of DB system and version, whether you use different databases or different DB instances is a decision you should not need to make at design time. Instead, you should be able to make it at deployment time, as something you can simply configure. Design your services to be agnostic of where other services' databases are hosted.

During operation, you can start with one instance and, if the system works fine, leave it that way. However, if you notice this does not scale well for your system because different databases on one instance share too many resources, you always have the option to use different instances, if that helps.

So a service does not violate the microservice architecture just because you let two of them share some resource - it violates it when sharing the resource becomes mandatory.
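As a minimal sketch of "decide at deployment time": each service reads its own database location from its own configuration, so moving one database to a separate instance is a config change, not a code change. The service names, variable names, host names, and the default port (5432, as for PostgreSQL) are all hypothetical and only serve the illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DbConfig:
    host: str
    port: int
    name: str


def load_db_config(service: str, env=os.environ) -> DbConfig:
    """Read one service's database location from its own variables.

    The <SERVICE>_DB_* naming scheme is an assumption for this sketch.
    """
    prefix = service.upper() + "_DB_"
    return DbConfig(
        host=env[prefix + "HOST"],
        port=int(env.get(prefix + "PORT", "5432")),  # assumed default port
        name=env[prefix + "NAME"],
    )


# Deployment A: two services, two databases, one shared instance.
shared = {
    "ORDERS_DB_HOST": "db-shared.internal", "ORDERS_DB_NAME": "orders",
    "BILLING_DB_HOST": "db-shared.internal", "BILLING_DB_NAME": "billing",
}

# Deployment B: billing moved to its own instance; service code is unchanged.
split = dict(shared, BILLING_DB_HOST="db-billing.internal")
```

Calling `load_db_config("billing", shared)` and `load_db_config("billing", split)` gives the same service two different hosts without the service knowing or caring where the other services' databases live.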

Doc Brown
  • This kind of sounds like a premature optimization. What if the resources consumed never merit extra instances? Then you've wasted time building in the flexibility – reggaeguitar Jun 19 '18 at 18:34
  • 5
    @reggaeguitar: costs for this should normally be negligible - in fact, for a microservice architecture, it may take more effort to centralize the database configuration between different services than to keep the db location for each service individually configurable. Moreover, the whole point of a microservice architecture is high scalability; if one does not need that, one should not decide on microservices in the first place. – Doc Brown Jun 19 '18 at 19:04
  • 1
    @DocBrown That makes sense, thanks for the response! – reggaeguitar Jun 19 '18 at 21:35
  • 2
    @DocBrown I disagree that scalability is the only reason for considering microservices. If 10-15 developers are committing to the same large code base, then the merging, building, testing and deploying steps will take far more time than having one or two developers doing the same for each service. Feature rollouts are so much faster with microservices. Supporting outages is easier (as only a part of the application is not working). We've been doing this for the last two years. Yes, it adds complexity overall, and each service needs to handle network assumptions, but managing releases is so much easier. – narendra-choudhary Dec 24 '19 at 00:33
  • 1
    @DocBrown In the case of the DB, schemas tend to evolve. We add tables, alter existing ones, add constraints, drop some as well. Someone adds ACLs to control access. A few years down the line, tables are interlinked so badly that developers are scared to release frequently out of fear that something will break. And this is how the big-fat database problem starts. Software evolves. Maintenance becomes difficult, and that is another reason the microservices approach of not having shared databases is worth discussing. – narendra-choudhary Dec 24 '19 at 00:43
  • Not suggesting micro-services architecture is a silver bullet (if there is one at all), but there are reasons other than scalability for considering micro-services. – narendra-choudhary Dec 24 '19 at 00:45
32

It really depends on your scalability requirements, and how/if your microservice instances need to cooperate to provide a single result. It helps to know what the trade-offs are:

Keeping it all in one database

  • Easier configuration
  • No coordination or communication with other instances of your service needed
  • Easier to discover your full dataset
  • System performance limited by database performance

Keeping the databases separate

  • The full answer for a request may be spread across microservice instances
  • In that case you have increased communication and negotiation to resolve the request
  • Handling data when you lose that microservice node (even when the database is still up, you can't get at it until a new node with the right configuration is stood back up)
  • Increased configuration complexity

What's the problem you are solving?

In some cases, you are only worried about ephemeral data. If the database goes down, it's no big issue. In those cases you might not even need a database to begin with. Just keep it all in memory and make things blazingly fast. This is the easiest solution to work with.

In other cases, you need the data integrity, but your database is capable of expanding its capacity based on the number of nodes it has. In this case, a single database is probably more than sufficient, and managing its responsiveness independently is the right answer.

There are a number of cases in between. For example, you might have databases that are regionally specific, so for each instance of your service in a different region you have a separate database. Typically sharding databases don't do well across regions, so this is a way to localize the data a bit and control coordination yourself.

Doctrine and Reality

I've read a number of articles about microservices and how modular they should be. The recommendations range from keeping the front end, microservice, and data tier as a whole unit to sharing database and/or front-end code for all instances. Usually, more isolation provides the greatest scalability, but it comes at the cost of increased complexity.

If your microservice is calculation heavy, it makes sense to allow the number of those microservices to scale as needed--sharing the database or even front-end code doesn't hurt or hinder this approach.

The reality is that the specific needs of your project will call for a different set of compromises to get work done in a timely fashion and handle the system load you are measuring (plus a little more). Consider the fully isolated front-end, microservice, and data tier trio to be the lofty goal. The more demand on your system, the closer to that goal you will likely need to be. We aren't all [insert name of highly successful web entity here], and they didn't start out where they are now. Sometimes you just need to start out with a less than perfect situation, and be happy with that.

Berin Loritsch
13

It doesn't matter.

The only scenario where it could theoretically matter is if one service needs to migrate to a different version of the database. But even then, there's no real difference between having separate instances from the start versus migrating that one service from a shared instance to a separate one. I'd actually say that having separate instances only because of this scenario is an example of YAGNI.

Michael Borgwardt
  • 2
    Assuming if a particular service has a heavy usage on a single RDS instance, will it end up eating up the resources on that instance and affect the other services using that same RDS instance? – xenon Jun 19 '18 at 10:17
  • 1
    @xenon: yes, but that is a reason to think about improving RDS performance via tuning, better hardware or clustering, not about changing your system architecture - if that service is leaving capacity for the other services, then it will soon run out of capacity all by itself. Though I guess you could have special requirements that an overloaded service must not affect others. Some RDS may in fact still allow that on a single instance by defining resource caps on a user basis. – Michael Borgwardt Jun 19 '18 at 10:24
  • the scenario where it matters is when the microservice _instance_ has its own state. Then it should be deployed with its own _instance_ db, which may also be a performance bottleneck – Ewan Jun 19 '18 at 12:16
3

An RDS instance is a single box. If you have multiple databases on a single instance then they share the CPU/Memory etc.

If your microservice performance is bound by its database performance, then deploying multiple copies of the microservice, each using a different database but with every database on the same RDS instance, is pointless (except for failover). Your microservice cluster will run at the same speed as a single microservice.

However, I would say that a microservice which is bound by database performance is unusual.

Usually your microservice will get data from a db, perform some logic and write some info back to the database. The performance bottleneck is the logic, not the select and/or insert.

In this case you can simply share the same database across all your microservice instances.
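To make the two cases above concrete, here is a deliberately simplistic model, assuming throughput is capped by whichever is smaller: the combined capacity of the service copies or the capacity of the one database instance they all share. The function name and all the numbers are made up for illustration.

```python
def cluster_throughput(copies: int, per_copy_rps: float, db_instance_rps: float) -> float:
    """Requests/s a cluster can serve when every copy uses the same DB instance.

    Toy model: ignores contention, caching, and network overhead.
    """
    return min(copies * per_copy_rps, db_instance_rps)


# Database-bound: the instance tops out at 200 req/s, so extra copies add nothing.
db_bound_one = cluster_throughput(1, per_copy_rps=500, db_instance_rps=200)
db_bound_four = cluster_throughput(4, per_copy_rps=500, db_instance_rps=200)

# Logic-bound: each copy manages 50 req/s, so adding copies scales the cluster.
logic_bound_four = cluster_throughput(4, per_copy_rps=50, db_instance_rps=1000)
logic_bound_eight = cluster_throughput(8, per_copy_rps=50, db_instance_rps=1000)
```

In the database-bound case both values are equal, which is the "same speed as a single microservice" situation; in the logic-bound case the shared instance does not get in the way until the copies together approach its capacity.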

Ewan
  • I have to question your assertion that the logic is the bottleneck, not the database. In my experience, the *most likely* place to find performance improvements is with the database. – RubberDuck Jun 19 '18 at 13:21
  • hmm yes, but surely those performance improvements are achieved by moving logic _out_ of the db and into the service. Once you have done that, _THEN_ logic is the bottleneck – Ewan Jun 19 '18 at 13:26
  • 1
    Typically, no. Those improvements come from tuning indexes & queries. – RubberDuck Jun 19 '18 at 15:56
  • well, that would fall under the unusual case in my experience. Not that there isn't typically room for those improvements, but that after having removed any really bad stuff the database is still the limiting factor. – Ewan Jun 19 '18 at 16:03
3

I think it might help to be a bit more theoretical here. One of the motivating ideas behind microservices is shared-nothing, message passing processes. A microservice is like an actor in the Actor model. This means each process maintains its own local state and the only way for one process to access the state of another is by sending messages (and even then the other process can respond however it likes to those messages). What is meant by "every microservice has its own database" is really that the state of a process (i.e. microservice) is local and private. To a large extent, this suggests that the "database" should be collocated with the microservice, i.e. the "database" should be stored and executed on the same logical node as the microservice. Different "instances" of the microservice are separate processes and thus should each have their own "database".

A global database, or a database shared between microservices or even between instances of a microservice, would from this perspective constitute shared state. The "appropriate" way to handle this from the microservices perspective is to have the shared database mediated by a "database" microservice. Other microservices that wanted to know about the contents of the database would send messages to that "database microservice". This typically won't eliminate the need for local state (i.e. per microservice instance "databases") for the original microservices! What changes is what that local state represents. Instead of storing "User Sally is an admin", it would store "The database microservice said 'User Sally is an admin' five minutes ago". In other words, beyond any state it controls completely, it would store its beliefs about the state of other microservices.

The benefit of this is that each microservice is self-contained. This makes a microservice an atomic unit of failure. You (mostly) don't have to worry about a microservice in some partially functional state. Of course, the problem has been moved to the network of microservices. A microservice may fail to perform the desired function due to being unable to contact other microservices. The benefit, though, is that the microservice will be in a well-defined state and may well be able to offer degraded or limited service, e.g. by working off out-dated beliefs. The downside is that it is very difficult to get a consistent snapshot of the system as a whole, and there can be quite a lot of (undesired) redundancy and duplication.
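A rough sketch of "storing beliefs" and "working off out-dated beliefs", assuming the other microservice is reached through some `fetch` callable that raises `ConnectionError` when it is unreachable. `Belief`, `BeliefStore`, and `fetch` are hypothetical names for this illustration, not part of any library.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Belief:
    value: str
    observed_at: float  # when the other service last told us this


class BeliefStore:
    """Local, private state: what this service believes about another one."""

    def __init__(self, fetch: Callable[[str], str],
                 clock: Callable[[], float] = time.time):
        self._fetch = fetch      # sends a message to the other service
        self._clock = clock
        self._beliefs: Dict[str, Belief] = {}

    def get(self, key: str) -> Optional[Belief]:
        try:
            value = self._fetch(key)
            self._beliefs[key] = Belief(value, self._clock())
        except ConnectionError:
            # Other service unreachable: keep the last thing we heard, if any.
            pass
        return self._beliefs.get(key)
```

The caller gets the value together with `observed_at`, so it can decide for itself whether a five-minute-old "Sally is an admin" is good enough for degraded service or whether the request should be refused.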

Of course, the suggestion isn't to stick an instance of Oracle into every Docker container. First, not every microservice needs a "database". Some processes don't need any persistent state to work correctly. For example, a microservice that translates between two protocols doesn't necessarily need any persistent state. When persistent state is needed, the word "database" is just a word for "persistent state". It can be a file with JSON in it or a SQLite database or a locally running copy of Oracle if you want or any other means of locally persistently storing data. If the "database" isn't local, then from a pure microservices perspective, it should be treated like a separate (micro)service. To this end, it never makes sense to have an RDS instance be the "database" for a microservice. Again, the perspective is not "a bunch of microservices with their own RDS databases" but "a bunch of microservices that communicate with RDS databases". At this point it makes no difference whether the data is stored in the same database instance or not.

Pragmatically, a microservices architecture adds a huge amount of complexity. This complexity is just the price of seriously dealing with partial failure. For many, it is overkill that is quite possibly not worth the benefits. You should feel free to architect your system in whatever way seems most beneficial. There's a good chance that concerns about simplicity and efficiency can lead to deviations from a pure microservices architecture. The cost will be extra coupling which introduces its own complexities such as invisible interactions between services and restrictions on your freedom to deploy and scale as you please.

Derek Elkins left SE
  • "due to being unable to contact other microservices." - I thought Microservices should never contact other microservices? – Marc Aug 10 '19 at 10:44
1

The goal of keeping a database private to a service is encapsulation. Your microservice is a black box that other services in the system will use via a public interface.

There are two planes on which this encapsulation operates:

  • The first is logical, at the application level. Your service owns some business objects in your system, and it needs to persist state about these objects. That some particular database backs these business objects is just implementation detail. By keeping a separate database, you prevent other services from having backdoor access to your implementation, forcing them to use your public interface instead. The goal here is clean architecture and disciplined programming. Where exactly the database lives is irrelevant on this level, as long as your service has the right connection details so that it can find it.

  • The second level is operational. Even if your design is a perfect black box, as you point out, different work colocated on a single machine may compete for resources. This is a good reason to put separate logical databases on separate machines. As other answers have noted, if your needs are not very demanding and your budget is tight, this is a pragmatic argument for colocation on a single machine. However, as always, tradeoffs: this setup may require more babysitting as your system grows. If budget allows, I nearly always prefer two separate small machines to run two tasks versus sharing one larger machine.
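The logical plane can be sketched roughly like this, assuming a hypothetical `InventoryService` whose storage happens to be SQLite. Other services only ever see `add_stock` and `available`; the table layout and the location of the database (`db_path`) are private details that can change, or move to another machine, without callers noticing.

```python
import sqlite3


class InventoryService:
    """Hypothetical service: the methods are public, the storage is not."""

    def __init__(self, db_path: str = ":memory:"):
        # Where the database lives is a connection detail, not part of the API.
        self._db = sqlite3.connect(db_path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS stock ("
            "sku TEXT PRIMARY KEY, qty INTEGER NOT NULL)"
        )

    def add_stock(self, sku: str, qty: int) -> None:
        # Create the row if it is missing, then add to its quantity.
        self._db.execute(
            "INSERT OR IGNORE INTO stock (sku, qty) VALUES (?, 0)", (sku,)
        )
        self._db.execute(
            "UPDATE stock SET qty = qty + ? WHERE sku = ?", (qty, sku)
        )
        self._db.commit()

    def available(self, sku: str) -> int:
        row = self._db.execute(
            "SELECT qty FROM stock WHERE sku = ?", (sku,)
        ).fetchone()
        return row[0] if row else 0
```

Nothing here settles the operational question: the same class works whether `db_path` points at a file on a shared machine or at storage on a dedicated one.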

ben author