I am currently researching approaches for moving our application to Docker containers and stumbled upon a question to which I could not find a clear answer.

Our application has several separate databases that are currently hosted in one database server. When moving to Docker should we keep the architecture similar (i.e. one container with all databases) or should we use one container per database?

The latter approach seems more "docker" to me. Similarly to not hosting 2 applications in one container, it seems to make sense to also not host 2 databases in one container.

Are there any established best practices? Does it depend on the parameters of the databases in question (size, access frequency, etc.) or the used database server (SQL server, PostgreSQL, etc.)?

As far as I can tell the "container per DB" approach gives more flexibility (e.g. enforce memory limit per DB) at the cost of more overhead (i.e. the database server overhead is incurred once per database instead of just once in total). Are there any other advantages/disadvantages I should consider?
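
For concreteness, here is a rough docker-compose sketch of how I picture the "one container per database" variant (images, service names, limits and credentials are placeholders, not our actual setup):

```yaml
# Hypothetical sketch: one PostgreSQL container per database,
# each with its own data volume and its own memory limit.
version: "2.4"
services:
  orders-db:
    image: postgres:12
    environment:
      POSTGRES_DB: orders
      POSTGRES_PASSWORD: example          # placeholder secret
    volumes:
      - orders-data:/var/lib/postgresql/data
    mem_limit: 2g                         # per-database memory cap
  inventory-db:
    image: postgres:12
    environment:
      POSTGRES_DB: inventory
      POSTGRES_PASSWORD: example          # placeholder secret
    volumes:
      - inventory-data:/var/lib/postgresql/data
    mem_limit: 1g
volumes:
  orders-data:
  inventory-data:
```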

chrischu

  • One container per database and connect them through docker networks if needed. You could wrap the whole logic into a single docker-compose file to ease the process. – Milan Velebit Feb 28 '20 at 12:50

5 Answers

8

Last time I checked, it was not recommended to run databases in Docker.

Simply put, Docker is designed around stateless containers that you can spin up and tear down as required, whereas databases are very stateful indeed!

With a naive dockerized database you would lose all your data if the container crashed. If you spun up a new instance, you would get a blank database.

This might be ideal for development environments, but it's very bad in production.

Now you can do some clever stuff with volumes, but you really have to ask yourself why you are attempting this thing. Databases are generally very mature products, with various backup, fail-over and high availability options built in. Generally you don't want to run them in containers as they already have the concept of containers built in.
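
For reference, the volume trick usually looks something like this minimal sketch (image, names and paths are illustrative only, not a recommendation): the data directory lives on a named volume, so the container itself stays disposable.

```yaml
# Minimal sketch: the container can be destroyed and recreated,
# but the data directory persists on the named volume.
version: "2.4"
services:
  db:
    image: postgres:12
    environment:
      POSTGRES_PASSWORD: example    # placeholder secret
    volumes:
      - db-data:/var/lib/postgresql/data
volumes:
  db-data:
```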

Ewan

  • The concept of volumes and data containers has increased in formality and reliability these days. That's one of the mechanisms Kubernetes uses to inject files from ConfigMaps into a container. Anything that persists data you want later does require mounting a volume. That, combined with the way databases are being designed now, makes them much more friendly to containers. However, EBS-style volumes are required. – Berin Loritsch Feb 28 '20 at 16:42
  • There is a difference between hosting the database engine on Docker (which should be okay) and hosting the database _data_ on Docker (which you shouldn't ever do). A database _engine_ should be fine to be dockerized as long as you keep the actual database data (the .mdf file on SQL Server, for example) on a mounted volume. – T. Sar Feb 28 '20 at 16:43
  • Just as long as you never spin up two looking at the same volume, eh? In my view though, even if you can get it to work you haven't gained anything. It's no longer containerised. – Ewan Feb 28 '20 at 16:47
  • What happens if you try to start two processes looking at the same .mdf file? – JimmyJames Feb 28 '20 at 17:37
  • @JimmyJames Bad things, I suppose. As far as I remember, SQL Server will put a file lock on it, but I've seen cases online where people managed to do eldritch things and attach two engines to the same file, usually causing one of the instances to not work at all with it. – T. Sar Feb 28 '20 at 18:07
  • Sure you can. Where else would you run them? – Thorbjørn Ravn Andersen Feb 28 '20 at 19:10
  • "Last time I checked" When did you check? I've never heard this. Maybe this could help your understanding: https://devops.stackexchange.com/questions/1293/what-are-the-reasons-docker-should-not-be-used-for-databases – GammaGames Feb 28 '20 at 21:35
  • @GammaGames We did a few investigations and had discussions around this a few years back; things may have changed. However, the answer you link to agrees with me: NOT recommended for production. – Ewan Feb 28 '20 at 21:48
  • If it's Kubernetes then it will spin up an EBS volume together with a new database engine slave container, and all you need to manage is the Kubernetes cluster, so it has clear benefits. – ElmoVanKielmo Feb 29 '20 at 01:07
  • @Ewan That's not what I got from the current top answer... It seems to be pros and cons, but it says "wish as I might, I cannot find a technical reason not to run a database in a Docker", from which I don't get "NOT recommended for production". (I don't know if it's a good idea to dockerize, this is only about the link.) – Mark Feb 29 '20 at 20:47
  • "On the production, it will come down to taste, and there at least, I would also prefer the solution that sits best with the specialized DBA/Ops" This answer waxes lyrical about how amazing Docker is and then falls short of a recommendation for production? That's a big "nope" in my eyes. – Ewan Feb 29 '20 at 22:51
  • Databases get all the usual benefits from containers: simple reproducibility, simple sandboxing, and easy resource quotas. Stateless services work especially well in containers, but stateful services get the same benefits. – gntskn Feb 29 '20 at 23:17
  • @GammaGames None of which amounts to a recommendation. If you are so sure, why don't you just link to the official Docker recommendation for databases? – Ewan Mar 02 '20 at 15:40
  • @Ewan I do not share your view of the answer. It doesn't agree with you; it actually says "On the production, it will come down to taste" and does recommend using them if you don't have "decades of experience working bare metal DB servers". As for the cons: not necessary if you have a dedicated DB server (not relevant), if you aren't using a volume you risk data loss (goes against all advice), and trying to use multiple containers with one shared database isn't recommended (solved issue). Surely the amount of official images would indicate their support: https://docs.docker.com/samples/ – GammaGames Mar 02 '20 at 15:57
  • @GammaGames You are reading stuff that just isn't there. No DBA is going to want to use Docker. The Docker images all have the data in the image; they are just for dev. – Ewan Mar 02 '20 at 16:06
  • @Ewan "No DBA is going to want to use Docker" [Citation needed]. Your response of "The Docker images all have the data in the image" is an indication that you don't really understand how mounted volumes work. I would suggest reading up on it a bit, it's an important concept to really understanding containers ;) – GammaGames Mar 02 '20 at 16:17
  • @GammaGames Clearly you didn't need a citation for the linked answer's "some DBAs might recommend this". Several people here have mentioned that it's not recommended to keep your data in the container, for obvious reasons. No-one disagrees with that; what's missing is a solid official recommendation from Docker or a DB vendor _for_ using Docker, with exact instructions on how to make all the complex production DB stuff work for specific databases. "You can use mounted volumes!" just doesn't cut it. – Ewan Mar 02 '20 at 16:27
  • @Ewan "No-one disagrees with that" You were just disagreeing with that in the comment I was directly replying to. I am done here. If you feel the need to learn more you can look it up yourself; there are many examples of official documentation that discuss and walk through deployment with Docker. I can only explain so much when you are so willfully against it; I cannot understand it for you. – GammaGames Mar 02 '20 at 17:21

5

Containers are ultimately just small wrappers around processes (not machines!) and it is helpful to think about them in terms of that. In this case, each database has its own long-lived master process, and so each probably deserves its own container. This would also help scale to tools like Kubernetes in the future, where the containers could be transparently distributed across a cluster.

Using multiple processes in a container is fine and normal, of course, but usually one process will control the others in the same container. For example, a web server may spawn multiple worker processes, but the root process for the container is responsible for its children.

A corollary is that if you do add multiple database servers to one container, then you will likely have to add extra logic to manage the many master processes. For example, if one Postgres instance dies, you’ll need some way to restart just that one instance. If each database master process has its own container, then Kubernetes or Docker can manage this for free.

Edit: After your clarifying comment, I see that you are referring to database data (the “database” within the application, not the database application!). I think the above reasoning is still a helpful framing: a container is just a process separation, and it is orthogonal to your other storage and partitioning concerns. Any reasons you have for or against using multiple processes for multiple databases apply just the same with containers.

I will note that the database data itself should definitely be placed in a volume (and most database images will probably already declare that volume in their Dockerfile.)
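
As a hedged illustration of the Kubernetes point: each database could run as its own StatefulSet with its own persistent volume claim, so restarts and volume reattachment are handled per database. The names, image and sizes below are assumptions, not a prescription.

```yaml
# Hypothetical sketch: one StatefulSet per database. Kubernetes
# restarts the pod if the master process dies and reattaches the
# same persistent volume.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: orders-db
spec:
  serviceName: orders-db
  replicas: 1
  selector:
    matchLabels:
      app: orders-db
  template:
    metadata:
      labels:
        app: orders-db
    spec:
      containers:
        - name: postgres
          image: postgres:12
          env:
            - name: POSTGRES_PASSWORD
              value: example              # placeholder secret
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```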

gntskn

  • Maybe there is a misunderstanding: adding multiple database *servers* to a single container is not what I was asking. I was asking about multiple databases on one server in the container. – chrischu Feb 28 '20 at 13:53
  • @chrischu This answer is pretty well aligned with what I would recommend. You can run a server inside a container, but what you really want to do is make that distinction disappear. In other words, the container is the server. – JimmyJames Feb 28 '20 at 15:08
  • Use a mounted volume for the data though. Otherwise if the container dies, so does your data. – Berin Loritsch Feb 28 '20 at 16:44
  • _Please_ add to this answer that the database **data** should not be on the docker itself but instead on a mounted volume. It is a very important detail that, if missed, has the potential to end up in pain and tears. – T. Sar Feb 28 '20 at 16:46

3

Databases normally use schemas to logically separate unrelated things.

I would suggest that you consider moving each schema into its own Docker instance with its own persistence volume(s).

Also be aware that Kubernetes may kill the pod without much notice. You need to configure your database accordingly.
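
As a sketch of what "configure accordingly" can mean in practice (the names and values are assumptions): give the pod a generous termination grace period and keep the data on a persistent volume claim, so an evicted pod can shut down cleanly and come back with its data.

```yaml
# Hypothetical sketch: allow the database a clean shutdown when
# Kubernetes evicts or reschedules the pod.
apiVersion: v1
kind: Pod
metadata:
  name: reports-db
spec:
  terminationGracePeriodSeconds: 120      # time for a clean shutdown
  containers:
    - name: postgres
      image: postgres:12
      env:
        - name: POSTGRES_PASSWORD
          value: example                  # placeholder secret
      volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: reports-db-data        # assumed pre-existing claim
```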

0

In answer to the question about "having multiple databases on one server in the container" I would say:

  • As previously mentioned, putting a DB in a container is not advised
  • Containers are not persistent; they can be taken down or replicated at any time, which creates its own issues
  • A database per service (micro-service) is desirable and helps with encapsulation and security, which makes them more modular

So, implement the DBs outside the container environment, with one container/service talking to one database.
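
A minimal sketch of that arrangement, assuming a containerised service pointing at a database that lives outside Docker (the image, host and credentials are placeholders):

```yaml
# Hypothetical sketch: the service runs in a container, the database
# stays on an external/managed host and is reached via configuration.
version: "2.4"
services:
  orders-service:
    image: example/orders-service:latest        # placeholder image
    environment:
      DB_HOST: orders-db.internal.example.com   # database outside Docker
      DB_NAME: orders
      DB_USER: orders_app
      DB_PASSWORD: change-me                    # placeholder secret
```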

C J
0

From my (admittedly limited) experience, I would examine the setup from four different perspectives:

  • Data (lifetime/priority)

  • Scale

  • Risk

  • Performance

Example: Web shop

Let's assume we have a simple web shop with an inventory (database A), some shopping carts (database B) and a list of orders (database C).

Let's start with the most important data: the list of orders. Here, risk aversion is key, as losing or corrupting this data could potentially ruin the business. At the same time, it might contain PII (personally identifiable information, like addresses or payment data), which also needs to be protected.

For this, I'd be conservative and go with a more traditional database server setup, as even with docker volumes a dockerized database server cannot fully control how data is actually stored to disk (too many layers of virtualization involved). Don't introduce any unnecessary risk for vital data storage!

Next up, let's have a look at the shopping carts. The basic concept dictates that this data is volatile; shoppers will add and remove items much more often than finalizing an order, i.e. there will be a lot of insertions, updates and removals. Also, while it shouldn't be a regular occurrence, a loss of this kind of data is at worst a slight annoyance for the user.

Here, I'd pay much more attention to the performance and scale: Since most database operations happen here and the risk of accidental loss of data isn't quite as high, I might be tempted to use dockerized database containers for load balancing purposes, especially at a large enough scale. (This depends a lot on actual requirements, so take this with a grain of salt!)

Finally, the inventory. If it's pretty much just static data that gets updated infrequently (a simple price list), then using dockerized caches that pull from a centralized storage might be interesting if performance becomes an issue. (Of course, the original source for the caches itself would need to be kept in a risk-avoiding manner, similar to orders, for liability reasons.)

However, if the inventory also has to keep track of available stock or is much more dynamic (e.g. real-time stock prices, user content, ...), suddenly scale also becomes a much larger issue. At that point, I'd look towards a traditional database server (or maybe a cluster of them).
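
As a hedged sketch of the "dockerized cache in front of a central store" idea for the mostly-static case (images, hostnames and limits are assumptions): the cache container is disposable, while the inventory's source of truth stays on a traditional database server.

```yaml
# Hypothetical sketch: a disposable cache container in front of a
# non-dockerized inventory database. Losing the cache only costs a
# warm-up, not data.
version: "2.4"
services:
  inventory-cache:
    image: redis:6
    command: ["redis-server", "--maxmemory", "256mb", "--maxmemory-policy", "allkeys-lru"]
  shop-frontend:
    image: example/shop-frontend:latest             # placeholder image
    environment:
      INVENTORY_DB_HOST: inventory.db.example.com   # central source of truth
      INVENTORY_CACHE_HOST: inventory-cache
    depends_on:
      - inventory-cache
```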

TL;DR

There are a lot of factors involved, so there is no easy answer. Generally, I tend to prefer a traditional database server setup, unless the data is either short-lived and has low reliability requirements or pretty much static (caches).

hoffmale