
Exploring NVIDIA NVLink "nvidia-smi" Commands

May 26, 2017

The addition of NVLink to the board architecture has introduced a number of new commands to the nvidia-smi utility, which is used to query NVML / the NVIDIA driver. This blog post explores a few examples of these commands, along with an overview of the complete NVLink syntax/options as of NVIDIA driver revision v375.26.

These commands can be a bit tricky to execute; the nvidia-smi output and specific examples below should help anyone having trouble with the -i switch for targeting specific GPU IDs.

Overall Syntax from nvidia-smi nvlink -h

[root@localhost ~]# nvidia-smi nvlink -h

nvlink -- Display NvLink information.

Usage: nvidia-smi nvlink [options]

Options include:
[-h | --help]: Display help information
[-i | --id]: Enumeration index, PCI bus ID or UUID.

[-l | --link]: Limit a command to a specific link. Without this flag, all link information is displayed.
[-s | --status]: Display link state (active/inactive).
[-c | --capabilities]: Display link capabilities.
[-p | --pcibusid]: Display remote node PCI bus ID for a link.
[-sc | --setcontrol]: Set the utilization counters to count specific NvLink transactions.
The argument consists of an N-character string representing what is meant to be counted:
First character specifies the counter set:
0 = counter 0
1 = counter 1
Second character can be:
c = count cycles
p = count packets
b = count bytes
Next N characters can be any of the following:
n = nop
r = read
w = write
x = reduction atomic requests
y = non-reduction atomic requests
f = flush
d = responses with data
o = responses with no data
z = all traffic

[-gc | --getcontrol]: Get the utilization counter control information showing
the counting method and packet filter for the specified counter set (0 or 1).
[-g | --getcounters]: Display link utilization counter for specified counter set (0 or 1).
[-r | --resetcounters]: Reset link utilization counter for specified counter set (0 or 1).
[-e | --errorcounters]: Display error counters for a link.
[-ec | --crcerrorcounters]: Display per-lane CRC error counters for a link.
[-re | --reseterrorcounters]: Reset all error counters to zero.
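
As a concrete illustration of the -sc argument string described above, the following commands (a sketch based on the help text, not captured from a live system, so output is omitted) would set counter set 0 on GPU 0 to count bytes (b) of all NVLink traffic (z), and then read the control setting back with -gc:

nvidia-smi nvlink -sc 0bz -i 0
nvidia-smi nvlink -gc 0 -i 0

Other filter strings follow the same pattern; for example, 1pr would configure counter set 1 to count read packets.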

Showing NVLink Status for Different GPUs

To show active NVLink connections, you must specify the GPU index via -i #. See the example below:

[root@localhost ~]# nvidia-smi nvlink --status -i 0
Link 0: active
Link 1: active
Link 2: active
Link 3: active
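
The -l flag from the help output above can limit the query to a single link. For example, the following (a sketch, output omitted) would report the state of link 2 on GPU 0 only:

nvidia-smi nvlink --status -i 0 -l 2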

Display & Explore NVLink Capabilities per Link

This query lets you confirm whether each link associated with the GPU index (specified by -i #) supports specific capabilities related to P2P, system memory access, P2P atomics, and SLI.

[root@localhost Stand_Alone_Validation]# nvidia-smi nvlink --capabilities -i 1
Link 0, P2P is supported: true
Link 0, Access to system memory supported: true
Link 0, P2P atomics supported: true
Link 0, System memory atomics supported: false
Link 0, SLI is supported: false
Link 0, Link is supported: false
Link 1, P2P is supported: true
Link 1, Access to system memory supported: true
Link 1, P2P atomics supported: true
Link 1, System memory atomics supported: false
Link 1, SLI is supported: false
Link 1, Link is supported: false
Link 2, P2P is supported: true
Link 2, Access to system memory supported: true
Link 2, P2P atomics supported: true
Link 2, System memory atomics supported: false
Link 2, SLI is supported: false
Link 2, Link is supported: false
Link 3, P2P is supported: true
Link 3, Access to system memory supported: true
Link 3, P2P atomics supported: true
Link 3, System memory atomics supported: false
Link 3, SLI is supported: false
Link 3, Link is supported: false
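
To map each link to the device on its far side, the -p / --pcibusid option from the help output above reports the remote node's PCI bus ID per link. A sketch of the command (output omitted, as it was not captured for this post):

nvidia-smi nvlink -p -i 1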

NVLink Usage Counters

nvidia-smi nvlink -g <counter set> -i <GPU index> lets you view the amount of data that has traversed each NVLink link for the specified counter set (0 or 1).

See the example below, captured before and after running the CUDA simpleP2P sample:

[root@localhost simpleP2P]# nvidia-smi nvlink -g 0 -i 0
Link 0: Rx0: 123511119 KBytes, Tx0: 123511119 KBytes
Link 1: Rx0: 123513999 KBytes, Tx0: 123513039 KBytes
Link 2: Rx0: 123511144 KBytes, Tx0: 123511144 KBytes
Link 3: Rx0: 123511144 KBytes, Tx0: 123512104 KBytes
[root@localhost simpleP2P]# ./simpleP2P
[./simpleP2P] - Starting...
Checking for multiple GPUs...
CUDA-capable device count: 2
> GPU0 = " Quadro GP100" IS capable of Peer-to-Peer (P2P)
> GPU1 = " Quadro GP100" IS capable of Peer-to-Peer (P2P)

Checking GPU(s) for support of peer to peer memory access...
> Peer access from Quadro GP100 (GPU0) -> Quadro GP100 (GPU1) : Yes
> Peer access from Quadro GP100 (GPU1) -> Quadro GP100 (GPU0) : Yes
Enabling peer access between GPU0 and GPU1...
Checking GPU0 and GPU1 for UVA capabilities...
> Quadro GP100 (GPU0) supports UVA: Yes
> Quadro GP100 (GPU1) supports UVA: Yes
Both GPUs can support UVA, enabling...
Allocating buffers (4096MB on GPU0, GPU1 and CPU Host)...
Creating event handles...
cudaMemcpyPeer / cudaMemcpy between GPU0 and GPU1: 69.02GB/s
Preparing host buffer and memcpy to GPU0...
Run kernel on GPU1, taking source data from GPU0 and writing to GPU1...
Run kernel on GPU0, taking source data from GPU1 and writing to GPU0...
Copy data back to host from GPU0 and verify results...
Disabling peer access...
Shutting down...
Test passed
[root@localhost simpleP2P]# nvidia-smi nvlink -g 0 -i 0
Link 0: Rx0: 178271952 KBytes, Tx0: 178271952 KBytes
Link 1: Rx0: 178274832 KBytes, Tx0: 178273872 KBytes
Link 2: Rx0: 178271977 KBytes, Tx0: 178271977 KBytes
Link 3: Rx0: 178271977 KBytes, Tx0: 178272937 KBytes
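
Comparing the two captures, each link accumulated roughly 54.8 million additional KBytes (about 55 GB) of receive and transmit traffic during the simpleP2P run. To start a measurement from a clean baseline instead of subtracting readings, the counters for a given set can be zeroed first (a sketch following the help text above; output omitted):

nvidia-smi nvlink -r 0 -i 0
nvidia-smi nvlink -g 0 -i 0

The error counters have analogous options: -e and -ec display them, and -re resets them to zero.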
