Hi,
in order to train our models inside the Docker container using the GPUs on the underlying host machine, our framework (e.g. Torch / Caffe) needs the CUDA driver / toolkit to be installed.
There are two possible solutions to this problem:
The first is to use nvidia-docker to run the container instead of the normal docker command (https://github.com/NVIDIA/nvidia-docker/wiki/Why%20NVIDIA%20Docker).
On the nvidia-docker website, it says:
"To make the Docker images portable while still leveraging NVIDIA GPUs, the solution used by nvidia-docker is to make the images agnostic of the NVIDIA driver. The required character devices and driver files are mounted when starting the container on the target machine."
So if it is possible to run our container with nvidia-docker, that would be the easiest solution, I think.
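For example, a quick sanity check with this approach (assuming nvidia-docker is installed on the host; the image tag is just an illustration) would be:

```
# Verify GPU access inside a container started with nvidia-docker.
# nvidia-smi becomes available through the driver files the plugin mounts.
nvidia-docker run --rm nvidia/cuda:7.5-cudnn5-devel nvidia-smi
```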
The second solution would be to install the NVIDIA driver inside the Docker container, as described on the nvidia-docker wiki:
"To solve this problem, one of the early solutions that emerged was to fully reinstall the NVIDIA driver inside the container and then pass the character devices corresponding to the NVIDIA GPUs (e.g. /dev/nvidia0) when starting the container. However, this solution was brittle: the version of the host driver had to exactly match driver version installed in the container."
In that case, we would have to know the driver version of the underlying host machine.
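For completeness, here is a rough sketch of what that legacy approach looks like (the device paths are the usual ones, but `my-cuda-image` is a hypothetical image that would need the matching driver installed inside it):

```
# Check the host driver version (it must exactly match the one in the container):
cat /proc/driver/nvidia/version

# Legacy approach: pass the NVIDIA character devices to a plain docker run.
docker run --rm \
  --device=/dev/nvidia0 \
  --device=/dev/nvidiactl \
  --device=/dev/nvidia-uvm \
  my-cuda-image nvidia-smi
```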
Best regards
Michael Mielimonka
Thanks for your answers. This will help a lot to build my Docker container.

Thomas gave a comprehensive answer. I will just add:
> the NVIDIA docker plugin looks at the driver[s] installed on the host (your computer) and mount them inside the container
Specifically, the "drivers" that are "mounted inside the container" are the user-level libraries that talk to the underlying kernel module on the host machine. There is more background information here:
https://github.com/NVIDIA/nvidia-docker/wiki/NVIDIA-driver
We use the 'volume-driver' approach and mount the user level libraries under `/usr/local/nvidia`.
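To make that concrete, here is roughly what this looks like from the host; note that the plugin names its volume after the host driver version, so `367.57` below is only a placeholder:

```
# List the volume created by the nvidia-docker plugin
# (named nvidia_driver_<host driver version>):
docker volume ls | grep nvidia_driver

# The equivalent plain docker invocation that nvidia-docker generates:
docker run --rm \
  --volume-driver=nvidia-docker \
  --volume=nvidia_driver_367.57:/usr/local/nvidia:ro \
  --device=/dev/nvidiactl --device=/dev/nvidia-uvm --device=/dev/nvidia0 \
  nvidia/cuda:7.5-cudnn5-devel nvidia-smi
```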
If you need further guidance or run into issues, please let us know! Also, don't forget to attend tomorrow's webinar, where we will demo how to build a Docker container that you can submit to the Challenge.
Welcome to the DM DREAM Challenge, Michael!
We are using nvidia-docker on the Challenge Cloud. You can start from one of the CUDA Docker images provided by NVIDIA. You can find the list of tags for the image nvidia/cuda here:
https://hub.docker.com/r/nvidia/cuda/
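For instance, you can pull one of these images directly to try it out locally:

```
docker pull nvidia/cuda:7.5-cudnn5-devel
```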
Put the following on the first line of your Dockerfile:
FROM nvidia/cuda:7.5-cudnn5-devel
If you build this Dockerfile, you will obtain an image that already has CUDA and cuDNN installed. You can then install your inference method and its dependencies, starting on line 2 of the Dockerfile. Note that this container can only be run using "nvidia-docker run" (and not "docker run"). When running this container on your computer with "nvidia-docker run", the NVIDIA Docker plugin looks at the drivers installed on the host (your computer) and mounts them inside the container. When the same container is submitted to the Challenge Cloud, the NVIDIA drivers installed on its machines are mounted inside your container. The great advantage of using the NVIDIA Docker plugin is that a container can be run on multiple platforms with different versions of the NVIDIA drivers installed, as long as the CUDA Docker image provided by NVIDIA supports all those drivers.
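As a rough illustration, a minimal Dockerfile for a Python-based method might look like this (the package names and the `infer.py` entry point are placeholders for your own method):

```
FROM nvidia/cuda:7.5-cudnn5-devel

# Install your inference method's dependencies starting on line 2.
RUN apt-get update && apt-get install -y python python-pip && \
    rm -rf /var/lib/apt/lists/*
RUN pip install numpy

# Copy your code into the image; infer.py is a hypothetical entry point.
COPY . /usr/local/src/my-method
WORKDIR /usr/local/src/my-method
ENTRYPOINT ["python", "infer.py"]
```

You would then build it with `docker build -t my-method .` and test it locally with `nvidia-docker run --rm my-method`.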
The Linux distribution installed in the NVIDIA Docker images is Ubuntu; however, NVIDIA also provides CentOS-based images. If you prefer to use CentOS, the first line of your Dockerfile would become:
FROM nvidia/cuda:7.5-cudnn5-devel-centos7
If you intend to use Caffe, you can start from one of the Dockerfiles available here:
https://github.com/BVLC/caffe/tree/master/docker
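If you go that route, one way to use them is to build the GPU image from the Caffe repository and base your own Dockerfile on it (the directory layout below reflects the repository at the time and may differ in your checkout):

```
git clone https://github.com/BVLC/caffe.git
cd caffe
# Build the GPU variant of the Caffe image (path may vary by Caffe version).
docker build -t caffe:gpu docker/standalone/gpu
# Sanity check: run Caffe's device query through nvidia-docker.
nvidia-docker run --rm caffe:gpu caffe device_query -gpu 0
```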