How to solve the 'NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver' error when viewing GPU information with nvidia-smi on Ubuntu

1. Purpose

In this post, I will show you how to fix an error that can appear when viewing GPU information with nvidia-smi on an Ubuntu system.

Sometimes, when we try to view NVIDIA GPU information on an Ubuntu system, we get this error:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. 
Make sure that the latest NVIDIA driver is installed and running.



2. Solution

2.1 What is nvidia-smi?

The NVIDIA System Management Interface (SMI) is a command line utility that helps with managing NVIDIA Graphics Processing Unit (GPU) devices.
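As a small illustration of what the utility can report, the sketch below wraps an nvidia-smi query in a shell function. The function name gpu_summary and the NVIDIA_SMI override variable are made up for this example; --query-gpu and --format are regular nvidia-smi options. It only returns data once the driver is working.

```shell
# Hypothetical helper: print one CSV line per GPU with its index, name,
# driver version and total memory.
# NVIDIA_SMI is an illustrative override so the binary path can be changed;
# it defaults to the nvidia-smi found on PATH.
gpu_summary() {
    "${NVIDIA_SMI:-nvidia-smi}" \
        --query-gpu=index,name,driver_version,memory.total \
        --format=csv,noheader
}
```

Calling gpu_summary on a healthy system prints one line per GPU; on a system with the driver problem described above it fails with the same error message.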

2.2 How to solve the problem?

First, we need dkms to install the NVIDIA GPU driver. DKMS (Dynamic Kernel Module Support) is a framework for building Linux kernel modules whose sources reside outside the kernel source tree. The idea is that DKMS modules are rebuilt automatically when a new kernel is installed. One common cause of this error is a kernel upgrade after which the NVIDIA module was never rebuilt for the new kernel, so rebuilding it with dkms can restore communication with the driver.

sudo apt install dkms

Then we need to find the exact version of the GPU driver whose sources are installed under /usr/src:

ls /usr/src | grep nvidia

Our version is 418.87.00. Yours may differ, so use the version that the command above reports on your machine.
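If you prefer not to read the version off by eye, the listing can be filtered down to the version string alone. This is a minimal sketch, assuming the driver sources follow the usual nvidia-<version> directory naming; the function name nvidia_versions_from_listing is invented for this example.

```shell
# Hypothetical helper: read a directory listing on stdin and print only the
# version part of entries named nvidia-<version>, e.g. nvidia-418.87.00.
# Entries that do not match that pattern are ignored.
nvidia_versions_from_listing() {
    sed -n 's/^nvidia-\([0-9][0-9.]*\)$/\1/p'
}

# Usage on a real system:
#   ls /usr/src | nvidia_versions_from_listing
```

If more than one version is printed, several driver source trees are present and you need to pick the one you actually want to build.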

Then we can use dkms to build and install the NVIDIA driver module for that version:

sudo dkms install -m nvidia -v 418.87.00
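Before retrying, it can help to confirm that dkms really reports the module as installed. The sketch below checks the text printed by dkms status. The function name nvidia_dkms_installed is made up for this example, and the two line formats it accepts ("nvidia, <version>, ..." and "nvidia/<version>, ...") are an assumption based on what older and newer dkms releases typically print.

```shell
# Hypothetical helper: succeed if the `dkms status` text on stdin contains
# an "installed" line for the given nvidia version.
# Assumed line formats:
#   nvidia, 418.87.00, 5.4.0-42-generic, x86_64: installed
#   nvidia/418.87.00, 5.15.0-86-generic, x86_64: installed
nvidia_dkms_installed() {
    grep -Eq "^nvidia(, |/)${1}[,:].*: installed"
}

# Usage on a real system:
#   dkms status | nvidia_dkms_installed 418.87.00 && echo "module installed"
```

If the check fails, the output of dkms status itself usually shows whether the module is only added or built rather than installed.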

Now run nvidia-smi again:

ubuntu@myubuntu:/opt/dbgpt$ sudo nvidia-smi
Thu Oct 12 07:06:38 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A10                     Off | 00000000:17:00.0 Off |                    0 |
|  0%   45C    P0              56W / 150W |      4MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A10                     Off | 00000000:98:00.0 Off |                    0 |
|  0%   45C    P0              56W / 150W |      4MiB / 23028MiB |      4%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

OK, now it works!



3. Summary

In this post, I demonstrated how to solve the "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver" error when viewing GPU information with nvidia-smi on an Ubuntu system: install dkms, find the driver version under /usr/src, and rebuild the driver module with dkms install. That’s it, thanks for reading.