Troubleshooting

Installing and Using DOCKER and NV-DOCKER on CentOS 7

May 5, 2017

DOCKER-ENGINE is a containerization technology that allows you to create, develop, and run applications. In this article we focus primarily on the basic installation steps for DOCKER and NV-DOCKER (a wrapper that NVIDIA provides), and on how DOCKER, working with NV-DOCKER, provides a stable platform for pulling docker images, which are used to create containers. Containers are 'instances' of an environment, created from a docker image. Containers can be run once, or live on as persistent daemon processes; there are examples of both with nvidia-docker below.

Installing and getting DOCKER and NV-DOCKER running in CentOS 7 is a straightforward process:

# Assumes CentOS 7
# Assumes NVIDIA Driver is installed as per requirements ( >= 340.29 )
# Install DOCKER
sudo curl -fsSL https://get.docker.com/ | sh
# Start DOCKER
sudo systemctl start docker
# Add dockeruser, usermod change
sudo adduser dockeruser
sudo usermod -aG docker dockeruser
# Install NV-DOCKER
# GET NVIDIA-DOCKER
wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker-1.0.1-1.x86_64.rpm
# INSTALL
sudo rpm -i /tmp/nvidia-docker*.rpm
# Start NV-DOCKER Service
sudo systemctl start nvidia-docker
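Optionally, you can also enable both services so they start at boot (a common follow-up step, assuming systemd as on CentOS 7):

```shell
# Enable the Docker and nvidia-docker services at boot (CentOS 7 / systemd)
sudo systemctl enable docker
sudo systemctl enable nvidia-docker
```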

After the steps above you should have running Docker and NVIDIA-DOCKER services.

This can be checked via:

[username@host ~]# systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
Active: active (running) since Thu 2017-03-23 20:59:01 PDT; 16h ago
..... truncated ....
[username@host ~]# systemctl status nvidia-docker
● nvidia-docker.service - NVIDIA Docker plugin
Loaded: loaded (/usr/lib/systemd/system/nvidia-docker.service; disabled; vendor preset: disabled)
Active: active (running) since Thu 2017-03-23 20:58:59 PDT; 17h ago
..... truncated ....

Using Docker/NVIDIA-DOCKER

Pull and Run your First Container

For a quick and dirty test of using NVIDIA GPUs in a container, you can do the following either as sudo/root or as dockeruser (via su - dockeruser) from the install instructions above.

Run the following example:

# Instantiate a container from the nvidia-docker command.
# Note that nvidia-docker must be used for any docker command involving "run"
# that you would like to use GPUs with. nvidia-docker is a wrapper that handles
# setting up the environment (container) in relation to GPUs, GPGPU, etc.
nvidia-docker run --rm nvidia/cuda nvidia-smi

Command Explanation:

  • nvidia-docker - the NVIDIA shim/wrapper that helps setup GPUs with DOCKER
  • run - tells nvidia-docker wrapper that you're going to start (instantiate) a container
    • Note that for any command that does not include 'run', you can simply use docker; if you use nvidia-docker, the command is passed through to docker (e.g., docker images displays the docker images on your system, and nvidia-docker images would execute and show the same info)
  • --rm - this tells DOCKER that after the command runs, the container should be stopped/removed
    • This is a very interesting feature/capability. If you think about it, an entire environment is being created, for nvidia-smi to run, and then the container is destroyed. It can be done repeatedly and is very simple and fast.
  • nvidia/cuda - this is the name of an image
    • Note that the first time you run this command, DOCKER will go out and find an image with that name and download the docker image from the hub.docker.com repository. This only happens the first time. You could also run docker pull nvidia/cuda beforehand to be verbose and separate the steps. This one-liner works though.
  • nvidia-smi - this is the command to be run in the container
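The same mechanics work for poking around interactively. As a quick sketch using the same nvidia/cuda image:

```shell
# Start an interactive shell inside a CUDA container
# -i keeps STDIN open, -t allocates a pseudo-TTY, --rm cleans up on exit
nvidia-docker run --rm -it nvidia/cuda /bin/bash
```

Inside the container you can run nvidia-smi, inspect the CUDA installation, and so on; exiting the shell removes the container because of --rm.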

You should get output that looks like the below:

Note that the Pull complete portions (the parts above the nvidia-smi output) are a one-time occurrence: the image is not yet on your system locally, so it is fetched before being launched as a container instance.

[user@host ~]# nvidia-docker run --rm nvidia/cuda nvidia-smi
Using default tag: latest
latest: Pulling from nvidia/cuda
d54efb8db41d: Pull complete
f8b845f45a87: Pull complete
e8db7bf7c39f: Pull complete
9654c40e9079: Pull complete
6d9ef359eaaa: Pull complete
cdfa70f89c10: Pull complete
3208f69d3a8f: Downloading 151.3 MB/421.5 MB
eac0f0483475: Download complete
4580f9c5bac3: Verifying Checksum
6ee6617c19de: Downloading 109 MB/456.1 MB
Fri Mar 24 20:47:52 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48                 Driver Version: 367.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080     Off | 0000:03:00.0      On |                  N/A |
| 27%   34C    P8     7W / 180W |   7725MiB /  8113MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Running a persistent container / NVIDIA DIGITS

The following demonstrates pulling a DIGITS image and running it in daemon/persistent mode.

It should be noted that in order to use DIGITS you will need to provide it data via the -v command line switch when launching the docker container. This switch maps a mount point on the local machine to a mount point within the container, for example: -v /mnt/dataset:/data/dataset. This would map /mnt/dataset on the host machine to /data/dataset in the container. When interacting with DIGITS, you would then see this data when creating datasets, etc., from the Web UI.
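As a sketch, a DIGITS launch that includes such a mount might look like the following (the container name digits-data and the host path /mnt/dataset are illustrative assumptions, not part of the original commands):

```shell
# Launch DIGITS with host data mounted into the container
# /mnt/dataset (host) -> /data/dataset (container); both paths are examples
nvidia-docker run --name digits-data -d -p 5000:5000 \
    -v /mnt/dataset:/data/dataset nvidia/digits
```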

Running nvidia-docker

[user@host~]# NV_GPU=0,1 nvidia-docker run --name digits -d -p 5000:5000 nvidia/digits
6b12a4107569214a3177304ef2c9db0f333e266d0d766d2c8c02e5bbddd3d444 # This is the Instance ID returned by the nvidia-docker run command

Command Explanation:

  • NV_GPU=0,1
    • This is a method of assigning GPU resources to a container which is critical for leveraging DOCKER in a Multi GPU System. This passes GPU ID 0,1 from the host system to the container as resources. Note that if you passed GPU ID 2,3 for example, the container would still see the GPUs as ID 0,1 inside the container, with the PCI ID of 2,3 from the host system.
  • nvidia-docker - the NVIDIA shim/wrapper that helps setup GPUs with DOCKER
  • run - tells nvidia-docker wrapper that you're going to start (instantiate) a container
    • Note that for any command that does not include 'run' in it, you can simply use docker, but if you use nvidia-docker the command gets passed through to docker (E.g docker images display the docker images on your system, nvidia-docker images would also execute and show the same info)
  • --name digits
    • This names your container instance; you need a unique name for each instance created this way. It gives you another way to reference the instance, in addition to the default instance ID hash.
  • -d
    • Instructs DOCKER to run the container detached (in the background), i.e., as a daemonized/persistent container
  • -p 5000:5000
    • This is port mapping. Host port 5000 is being mapped to container port 5000, the DIGITS webserver port.
    • If you run multiple containers/instances of DIGITS, for example, you could do -p 5001:5000 for the next container and you would be able to connect to it at the IP_ADDRESS:5001 location, and still connect to IP_ADDRESS:5000 of the other DIGITS container.
  • nvidia/digits
    • Which image we're launching

After running this command, you could connect to DIGITS at the URL of the host system, at port 5000. It would have access to GPU IDs 0,1 as resources within the container and within DIGITS in that container. If, for example, this was a 4 GPU machine, you could run the following to create another container based on that same image, but expose a different port so that the two containers don't conflict with each other, and specify different GPUs so the containers don't try to utilize the same GPGPU resources.

[user@host~]# NV_GPU=2,3 nvidia-docker run --name digits1 -d -p 5001:5000 nvidia/digits
95e42817050c3e6de88f61473692a71ac0ab0948fe873c06155b95b62dad5554 # Instance ID!

Now you would have another DIGITS instance on port 5001 that would be accessible from a web browser, and this DIGITS installation would have access to GPU IDs 2 and 3 from the host system.
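To confirm which GPUs a container actually sees, one quick check (assuming the nvidia/cuda image from earlier is available) is to list them with nvidia-smi -L under the same NV_GPU setting:

```shell
# List the GPUs visible inside a throwaway container launched with NV_GPU=2,3
# Inside the container they enumerate as GPU 0 and GPU 1
NV_GPU=2,3 nvidia-docker run --rm nvidia/cuda nvidia-smi -L
```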

Check Running nvidia-docker Containers

You can check your running containers/instances by running either nvidia-docker ps or docker ps; see below for an example:

Note the PORTS section, which is very helpful once you have containers up and running, for seeing how ports are mapped.

CONTAINER ID   IMAGE           COMMAND              CREATED          STATUS          PORTS                    NAMES
95e42817050c   nvidia/digits   "python -m digits"   25 seconds ago   Up 24 seconds   0.0.0.0:5001->5000/tcp   digits1
6b12a4107569   nvidia/digits   "python -m digits"   16 hours ago     Up 16 hours     0.0.0.0:5000->5000/tcp   digits
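When you are finished with the persistent containers, they can be stopped and removed by name (a sketch, using the digits and digits1 names from the ps output above):

```shell
# Stop the running DIGITS containers, then remove them
docker stop digits digits1
docker rm digits digits1
```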
