
I'm trying to understand containerized apps and databases, along with microservice architecture on Kubernetes. The one part I can't get my head around is the database.

For example, I have a Venues REST API application running in a container, and the Venues database lives inside that same container.

When the first instance crashes for some reason, Kubernetes launches a second instance. Alternatively, several instances of the same service may be running at once, or they might connect to a separate DB service altogether.

What I can't understand is this: when the container holding the DB crashes, won't the newly created container start with an empty database? How is this handled? And if replication is the answer, what about memory and disk usage?

Could someone clarify this and describe the different approaches? If it has already been asked here or somewhere else, please point me in the right direction.

Thanks in advance.

tpaksu

1 Answer


While a database server may run in a container, its storage needs to be an external resource. For example, when running Docker manually, you would mount a volume from the host into the Docker container for the server process to use.

That external storage survives the server process, whether that process is containerized or not. Restarting the process (but with the same storage) will allow the database to recover, if configured appropriately.
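To make the point above concrete, here is a rough Python simulation (not Kubernetes or Docker code): a temporary directory stands in either for the container's own writable layer or for an externally mounted volume, and SQLite stands in for the database server. The function names and the `venues` table are hypothetical, chosen only for illustration.

```python
import os
import shutil
import sqlite3
import tempfile
from typing import List, Optional, Tuple


def run_container(data_dir: str, insert: Optional[str] = None) -> List[str]:
    """Simulate one container lifetime: open the DB stored under data_dir,
    optionally write a row, and return every venue name found."""
    conn = sqlite3.connect(os.path.join(data_dir, "venues.db"))
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS venues (name TEXT)")
        if insert is not None:
            conn.execute("INSERT INTO venues (name) VALUES (?)", (insert,))
            conn.commit()
        return [row[0] for row in conn.execute("SELECT name FROM venues ORDER BY name")]
    finally:
        conn.close()


def demo() -> Tuple[List[str], List[str]]:
    # Case 1: the data lives in the container's own filesystem.
    layer = tempfile.mkdtemp()
    run_container(layer, insert="Blue Note")
    shutil.rmtree(layer)               # the container crashes; its filesystem is discarded
    fresh_layer = tempfile.mkdtemp()   # the replacement container gets a new, empty one
    ephemeral = run_container(fresh_layer)
    shutil.rmtree(fresh_layer)

    # Case 2: the data lives on external storage mounted into each container.
    volume = tempfile.mkdtemp()        # stands in for a host directory or network disk
    run_container(volume, insert="Blue Note")
    persistent = run_container(volume) # the replacement mounts the same storage
    shutil.rmtree(volume)
    return ephemeral, persistent


if __name__ == "__main__":
    ephemeral, persistent = demo()
    print("container-local storage after restart:", ephemeral)  # []
    print("external storage after restart:", persistent)        # ['Blue Note']
```

The only difference between the two cases is whether the replacement process sees the same storage, which is exactly the question's "empty database" scenario versus the setup the answer recommends.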

In a data center setting, the storage is not part of the same machine that the software runs on. Instead, you might have a dedicated disk array that is connected over a storage area network (SAN). In a cloud setting, you would typically rent a block device or virtual drive to store the persistent database data on.

amon
  • Then you would need a separate setup outside of Kubernetes for database safety, right? – tpaksu May 07 '19 at 05:37
  • @tpaksu what do you mean exactly? Kubernetes only manages resources, but does not provide “database safety” by itself. If you want persistent databases you must provide some persistent storage to Kubernetes to manage. – amon May 07 '19 at 09:16
  • I mean Kubernetes provides disaster recovery by duplicating pods. If I put the DB files outside the Kubernetes pods and have the database engine reference them, I'll need a replication mechanism or something else to protect those files. Am I right? – tpaksu May 07 '19 at 09:59
  • @tpaksu if you’re running k8s in a cloud, like Azure or AWS, consider using one of the cloud provider’s database solutions outside of k8s and configuring your pod to connect to that. They’ll have options that make replicating the db for availability easy. – RubberDuck May 07 '19 at 10:02
  • @RubberDuck hi, I'm currently in the learning phase, trying to understand the concepts first. I'm playing with minikube and Docker locally with VirtualBox images. I know third-party data solutions would work smoothly; I'm just trying to learn the correct way(s) to do it if I decide to run everything locally. – tpaksu May 07 '19 at 10:12
  • If you really want to do this in k8s, you need some persistent storage (a `PersistentVolume`, claimed through a `PersistentVolumeClaim`) and a `StatefulSet`. Those are the k8s terms you’ll need to search for, @tpaksu – RubberDuck May 07 '19 at 10:21
  • @RubberDuck thanks for the keywords, I've seen them but didn't have the chance to dive deep. – tpaksu May 07 '19 at 10:38
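Following RubberDuck's pointer, here is a minimal sketch of how a `StatefulSet` and persistent storage fit together. To keep to one language it is Python that prints a Kubernetes manifest as JSON, which `kubectl apply -f` accepts just like YAML. The names `venues-db` and `data`, the `postgres:16` image, the mount path, the storage size and the placeholder password are all assumptions for illustration, not something taken from the thread.

```python
import json


def venues_db_manifests(storage: str = "1Gi") -> dict:
    """Build a headless Service plus a StatefulSet whose pod gets its own
    PersistentVolumeClaim. All names, the image and the password are placeholders."""
    labels = {"app": "venues-db"}
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": "venues-db"},
        "spec": {
            "clusterIP": "None",  # headless: gives each pod a stable DNS name
            "selector": labels,
            "ports": [{"port": 5432, "name": "postgres"}],
        },
    }
    statefulset = {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": "venues-db"},
        "spec": {
            "serviceName": "venues-db",  # must match the headless Service above
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "postgres",
                        "image": "postgres:16",
                        "ports": [{"containerPort": 5432, "name": "postgres"}],
                        "env": [
                            # placeholder only; use a Secret in a real cluster
                            {"name": "POSTGRES_PASSWORD", "value": "change-me"},
                            # keep data in a subdirectory of the mounted volume
                            {"name": "PGDATA", "value": "/var/lib/postgresql/data/pgdata"},
                        ],
                        "volumeMounts": [{"name": "data", "mountPath": "/var/lib/postgresql/data"}],
                    }],
                },
            },
            # One claim per pod (named data-venues-db-0, ...). A recreated pod
            # re-attaches to the same claim instead of starting with an empty disk.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [service, statefulset]}


if __name__ == "__main__":
    print(json.dumps(venues_db_manifests(), indent=2))
```

Assuming the script is saved as `venues_db.py` (a hypothetical name), you could run `python venues_db.py > venues-db.json` and then `kubectl apply -f venues-db.json`. The part that answers the original question is `volumeClaimTemplates`: when the pod crashes, the StatefulSet recreates it under the same name and binds it to the same claim, so the database comes back with its old data. On a default minikube setup the bundled storage provisioner should satisfy the claim with a hostPath-backed volume inside the minikube VM, which is fine for learning but is neither replicated nor protected against losing that VM.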