2

Recently I ran across this question and the nvidia-docker project, Nvidia's implementation of Docker, and it got me wondering where, why, and how this scheme makes sense.

I found some material on the web (e.g. this) stating that this environment is used by video renderers and game developers, as well as by data scientists who need GPU computations. Okay, so here is my question:

Is Docker really needed in scenarios where high-load apps need fast, parallelized computation?

In my understanding, Docker is a cumbersome and redundant element that brings unnecessary virtualization and no gain in parallelization.

Can you give real-life examples where this union makes sense?

Suncatcher
  • see [Why do 'some examples' and 'list of things' questions get closed?](https://softwareengineering.meta.stackexchange.com/a/7538/31260) – gnat May 06 '18 at 16:15
  • Well, the question is rather concrete (feasible or not?). I mentioned examples so that mentors can easily illustrate the answers. – Suncatcher May 06 '18 at 18:17
  • It's an interesting question, and one that I would love to hear other people's comments on, but it fails the singular-answer test. – Michael Shaw May 10 '18 at 13:58

1 Answer

3

Containerization is completely orthogonal to “high load” or “parallelization”. Containerization also does not imply any virtualization, and is better interpreted as sandboxing.

So why do people use containers? Images.

Thanks to image layering, a container image can contain a complete application with all its dependencies (other services, libraries, …) without having to install them on the host system. This makes it feasible to run applications without having to install them permanently, or to run multiple instances of one application, or to run multiple applications with conflicting dependencies.

This has limited benefits for most users, except when running a cluster. Being able to launch a container image instead of permanently installing dependencies is a huge benefit and gives us loads of flexibility on a cluster: tomorrow I might want to run a completely different workload with a different set of dependencies.

You already mentioned two kinds of users where clusters are common: render farms, and scientific computing. These often happen to require GPUs. Specifically, CUDA-based programs require Nvidia GPUs.
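
As a concrete picture of what those workloads boil down to, here is a minimal CUDA sketch: a plain vector addition, invented for illustration rather than taken from any real renderer or scientific package. It compiles with `nvcc` and only does useful work where an Nvidia GPU and its driver are reachable from the process:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Element-wise vector addition: the "hello world" of CUDA compute.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Unified (managed) memory keeps the sketch short; it still needs an
    // Nvidia GPU and driver to be visible to this process, e.g. via
    // nvidia-docker when it runs inside a container.
    float *a = nullptr, *b = nullptr, *c = nullptr;
    cudaError_t err = cudaMallocManaged(&a, bytes);
    if (err == cudaSuccess) err = cudaMallocManaged(&b, bytes);
    if (err == cudaSuccess) err = cudaMallocManaged(&c, bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "No usable CUDA device: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    vecAdd<<<(n + 255) / 256, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f (expected 3.0)\n", c[0]);
    cudaFree(a);
    cudaFree(b);
    cudaFree(c);
    return 0;
}
```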

So there is a desire to use both GPUs and containers for their cluster-management benefits. By itself, this combination does not require a special Docker runtime. You need to configure the container runtime to pass through the necessary resources, possibly allow extra syscalls from the container (which also weakens the sandboxing), and set up the necessary configuration in the container.
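
To make "pass through the necessary resources" concrete from the application's point of view, here is a device-query sketch that uses only the standard CUDA runtime API (nothing from nvidia-docker itself). Run on the host, it lists the visible GPUs; run inside a container whose runtime has not been given the Nvidia device files and driver libraries, it will typically report an error or zero devices:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Reports which GPUs the CUDA runtime can see from inside this process.
// Whether anything shows up inside a container depends on the container
// runtime passing through /dev/nvidia* and the driver libraries
// (the chore that nvidia-docker automates).
int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Visible CUDA devices: %d\n", count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("  %d: %s, %zu MiB global memory\n",
               i, prop.name, prop.totalGlobalMem / (1024 * 1024));
    }
    return 0;
}
```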

But why should you have to configure all of that yourself? And there may be other problems: this is a scenario where GPU drivers are relevant, yet containers cannot load drivers into the kernel. A container might want to use a GPU as an exclusive resource. And using GPUs under Linux can still involve nontrivial configuration.

So clearly there is a desire to make GPU–container combinations less cumbersome. Nvidia-docker seems to be one approach. Kubernetes has experimental support for managing GPU resources and can use either nvidia-docker or a Google plugin as a GPU interface – but apparently has no support for OpenCL yet.

amon
  • Great answer. I wasn't aware of Docker's hardware pass-through capability; it seemed like a purely "soft" platform, utilizing only the host system's network stack. I need to research it more. – Suncatcher May 06 '18 at 18:11
  • However, in resource-greedy applications like science and render farms, images and templating (`different set of dependencies`) are less important than computing speed, and one would probably lean towards computational power in this trade-off. Docker itself eats a decent amount of resources, and that amount is not so small. – Suncatcher May 06 '18 at 18:15
  • 1
    @Suncatcher not necessarily. I do computer support for a research lab. Computing speed is important, but dependency management ends up taking a lot of our time. In many cases we'd be happy to trade some losses in compute efficiency for some simplification in our dependency management. The desire to make this tradeoff may depend on the exact nature of the science you are doing: exploratory bioinformatics vs running a single well established hydrodynamics package. This [recent XKCD](https://xkcd.com/1987/) is relevant. – Charles E. Grant May 06 '18 at 19:14
  • Similarly, in the lab where I work, we are not interested in computing efficiency. We are interested in getting results as quickly as possible. Run time is one component of that, experiment setup another. If I spend half a day configuring dependencies on the cluster, that's half a day those machines are not running the experiments. While we're not currently using containers on the cluster b/c our experiments are fairly self-contained, that's probably going to change soon. – amon May 06 '18 at 19:33