How to solve the 'NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver' error when viewing GPU information with nvidia-smi on Ubuntu?
1. Purpose
In this post, I will show you how to fix an error that can stop you from viewing GPU information with nvidia-smi on an Ubuntu system.
Sometimes, when we try to view NVIDIA GPU information on Ubuntu, we get this error:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver.
Make sure that the latest NVIDIA driver is installed and running.
2. Solution
2.1 What is nvidia-smi?
The NVIDIA System Management Interface (nvidia-smi) is a command-line utility, built on top of the NVIDIA Management Library (NVML), for managing and monitoring NVIDIA GPU devices.
2.2 How to solve the problem?
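nvidia-smi talks to the GPUs through the NVIDIA kernel module, so this error usually means that the module is not loaded for the running kernel. A common cause is a kernel update: the NVIDIA module was built for the old kernel and was never rebuilt for the new one.
First, we need dkms to rebuild and install the NVIDIA driver module. DKMS (Dynamic Kernel Module Support) is a framework for building Linux kernel modules whose sources live outside the kernel source tree; its purpose is to rebuild those modules automatically whenever a new kernel is installed. Install it with: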
sudo apt install dkms
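Before rebuilding anything, you may want to confirm that dkms has no NVIDIA module installed for the kernel you are running. The sketch below is hypothetical: nvidia_built_for_kernel is our own helper name, not part of dkms. It reads dkms status output on stdin and succeeds only if an nvidia entry is marked installed for the given kernel. It assumes the line formats shown in its comments, which can differ between dkms versions.

```shell
# Hypothetical helper (not part of dkms): succeed if `dkms status` output,
# read from stdin, lists an nvidia module as installed for kernel "$1".
# Assumed line formats (kernel versions here are only examples):
#   nvidia, 418.87.00, 5.4.0-91-generic, x86_64: installed    (older dkms)
#   nvidia/418.87.00, 5.15.0-86-generic, x86_64: installed    (newer dkms)
nvidia_built_for_kernel() {
  kernel="$1"
  # Keep only nvidia lines, then require the exact kernel string
  # (fixed-string match) and the "installed" state.
  grep '^nvidia[/,]' | grep -F ", $kernel," | grep -q ': installed'
}

# Usage on a real system:
#   if dkms status | nvidia_built_for_kernel "$(uname -r)"; then
#     echo "nvidia module is installed for this kernel"
#   else
#     echo "nvidia module missing for $(uname -r)"
#   fi
```

If the check fails for the kernel reported by uname -r, continue with the dkms rebuild steps.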
Next, we need the exact version of the installed NVIDIA driver. The driver source lives under /usr/src in a directory named nvidia-<version>, so list it:
ls /usr/src | grep nvidia
Our version is 418.87.00. Use the version listed on your own system in the commands below.
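To script this lookup instead of reading the ls output by eye, here is a minimal sketch. nvidia_src_versions is a hypothetical helper name; it assumes the driver sources sit in directories named nvidia-<version> under /usr/src (another directory can be passed as the first argument, which also makes it easy to test) and prints only the version part of each name.

```shell
# Hypothetical helper: print the version part of every nvidia-<version>
# directory under SRC_DIR (default /usr/src), one per line.
nvidia_src_versions() {
  src_dir="${1:-/usr/src}"
  for path in "$src_dir"/nvidia-*; do
    # If nothing matches, the glob stays literal and this test skips it.
    [ -d "$path" ] || continue
    name="${path##*/}"         # e.g. nvidia-418.87.00
    version="${name#nvidia-}"  # e.g. 418.87.00
    # Skip names whose remainder is not purely digits and dots.
    case "$version" in
      ''|*[!0-9.]*) continue ;;
    esac
    printf '%s\n' "$version"
  done
}
```

If more than one version is printed, choose one deliberately; for example, nvidia_src_versions | sort -V | tail -n 1 picks the highest (sort -V is the GNU coreutils version sort available on Ubuntu).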
Then use dkms to build and install the NVIDIA kernel module for that version:
sudo dkms install -m nvidia -v 418.87.00
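The version lookup and the install command can be wrapped together. This is a sketch under assumptions: nvidia_dkms_install and the DRY_RUN variable are hypothetical names of ours, and the -k option tells dkms which kernel to build for (here the running kernel, taken from uname -r, unless you pass one). By default it only prints the command so you can review it; set DRY_RUN=0 to actually run it with sudo.

```shell
# Hypothetical wrapper: build and install the NVIDIA DKMS module VERSION
# for KERNEL (default: the running kernel). With DRY_RUN unset or 1 it only
# prints the command; with DRY_RUN=0 it runs it via sudo.
nvidia_dkms_install() {
  if [ -z "${1:-}" ]; then
    echo "usage: nvidia_dkms_install VERSION [KERNEL]" >&2
    return 2
  fi
  version="$1"
  kernel="${2:-$(uname -r)}"
  # Reuse the positional parameters to hold the dkms command safely quoted.
  set -- dkms install -m nvidia -v "$version" -k "$kernel"
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf 'sudo %s\n' "$*"
  else
    sudo "$@"
  fi
}

# Usage:
#   nvidia_dkms_install 418.87.00             # prints the command only
#   DRY_RUN=0 nvidia_dkms_install 418.87.00   # actually runs it
```

Printing by default is a deliberate choice: building a kernel module as root is not something a copied snippet should do silently.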
Now run nvidia-smi again:
ubuntu@myubuntu:/opt/dbgpt$ sudo nvidia-smi
Thu Oct 12 07:06:38 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A10                     Off | 00000000:17:00.0 Off |                    0 |
|  0%   45C    P0              56W / 150W |      4MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A10                     Off | 00000000:98:00.0 Off |                    0 |
|  0%   45C    P0              56W / 150W |      4MiB / 23028MiB |      4%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
OK, now it works! nvidia-smi can communicate with the driver again and lists both NVIDIA A10 GPUs.
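For a check you can run from a script (for example after a reboot or a kernel update), nvidia-smi also has a query mode that prints selected fields as CSV. The sketch below assumes the --query-gpu and --format=csv,noheader options of recent nvidia-smi versions; check_nvidia_driver is a hypothetical helper name. On the machine above it would print something like "0, NVIDIA A10, 535.104.12" for each GPU, and it returns non-zero when nvidia-smi is missing or cannot reach the driver.

```shell
# Hypothetical helper: list index, name and driver version of each GPU,
# failing if nvidia-smi is not installed or cannot talk to the driver.
check_nvidia_driver() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found" >&2
    return 127
  fi
  # nvidia-smi exits non-zero when it cannot communicate with the driver.
  if ! nvidia-smi --query-gpu=index,name,driver_version --format=csv,noheader; then
    echo "nvidia-smi failed: is the nvidia kernel module built and loaded?" >&2
    return 1
  fi
}
```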
3. Summary
In this post, I demonstrated how to solve the 'NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver' problem when viewing GPU information with nvidia-smi on an Ubuntu system: rebuild the NVIDIA kernel module with dkms for the installed driver version. That's it, thanks for reading.