Hi guys,
Are you having an issue with CUDA compatibility and trying to resolve it? Well, maybe this post is right for you.
Today I'm going to share my experience of how I resolved my problem installing CUDA/GPU + TensorFlow + Python on my Linux machine, Ubuntu 20.04, after plenty of trial and error and searching on the Internet.
I already have an NVIDIA GPU in my machine. If I check the VGA controller using the command "lspci", my NVIDIA card shows up correctly:
abdusy@troiz:~$ lspci | grep VGA
06:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] (rev a1)
And to verify that everything related to NVIDIA and CUDA is in order, I check with the command "nvidia-smi", which reports the driver version and the highest CUDA version that driver supports.
But when I check TensorFlow with the script below:
import tensorflow as tf

print("Tensorflow version:", tf.__version__)

# Check if GPU is available
if tf.test.is_gpu_available():
    print('CUDA is available! Using GPU for TensorFlow.')
else:
    print('CUDA is not available. Using CPU for TensorFlow.')
It gives "CUDA is not available":
Tensorflow version: 2.10.0
CUDA is not available. Using CPU for TensorFlow.
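By the way, tf.test.is_gpu_available() has been deprecated in TensorFlow 2.x. A minimal sketch of the recommended check, using tf.config.list_physical_devices:

import tensorflow as tf

# List the GPUs TensorFlow can actually see; an empty list means the
# CUDA/cuDNN libraries it was built against could not be loaded.
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    print('CUDA is available! Found GPU(s):', gpus)
else:
    print('CUDA is not available. Using CPU for TensorFlow.')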
However, when I use another library, PyTorch, and check with this script:
import torch

print("Torch version:", torch.__version__)

# Check if CUDA is available
if torch.cuda.is_available():
    # Set the default device to GPU
    device = torch.device('cuda')
    print('CUDA is available! Using GPU for computations.')
else:
    # Set the default device to CPU
    device = torch.device('cpu')
    print('CUDA is not available. Using CPU for computations.')
and it shows that CUDA is available:
Torch version: 2.0.1+cu117
CUDA is available! Using GPU for computations.
So the hardware and driver are fine; it means I have a CUDA compatibility issue on my machine: my TensorFlow build cannot find the CUDA/cuDNN versions it was compiled against, while PyTorch works because it bundles its own.
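To see why, it helps to compare what each framework was built against. A PyTorch wheel with a "+cu117" suffix bundles its own CUDA runtime, while a pip-installed TensorFlow 2.10 links against the system CUDA/cuDNN libraries (it was built against CUDA 11.2 and cuDNN 8.1). A quick sketch to compare the two, assuming both libraries are installed in the same environment:

import tensorflow as tf
import torch

# CUDA runtime bundled with this PyTorch wheel (the "+cu117" suffix)
print('PyTorch built with CUDA:', torch.version.cuda)

# CUDA/cuDNN versions this TensorFlow build expects to find on the system;
# the keys may be absent on a CPU-only build, hence .get()
build = tf.sysconfig.get_build_info()
print('TensorFlow built with CUDA:', build.get('cuda_version'))
print('TensorFlow built with cuDNN:', build.get('cudnn_version'))

If those versions disagree with what is actually installed on the system, TensorFlow falls back to the CPU, which is exactly what happened above.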
SOLUTION
How do we solve this problem?
If you read the CUDA compatibility documentation (https://docs.nvidia.com/deploy/cuda-compatibility/index.html) carefully, you will find everything you need to know about how driver, runtime, and toolkit versions relate to each other.
My first thought was to use Docker, with all the CUDA libraries already built in, just to keep it simple. Because I love everything that is simple :).
Then I found this page (https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tensorflow), which provides TensorFlow containers with comprehensive tools and libraries in a flexible architecture, allowing easy deployment across a variety of platforms and devices.
So, guys, just run this command:
nvidia-docker run -it --rm nvcr.io/nvidia/tensorflow:xx.xx-tfx-py3
In my case I use "nvidia-docker run -it --rm nvcr.io/nvidia/tensorflow:22.01-tf2-py3", because I'm using TensorFlow v2 and Python 3.9.
Once the image has finished downloading, you can run it with this command:
docker run --gpus all -it --rm -v local_dir:container_dir nvcr.io/nvidia/tensorflow:xx.xx-tfx-py3
In my case, I run a command like this: "docker run --network host --gpus all -it --rm -v /home/abdusy:/workspace/my_ml nvcr.io/nvidia/tensorflow:22.01-tf2-py3". This is because I want the container to use the host network, the GPU, and my home directory (mounted at /workspace/my_ml).
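Once inside the container, a quick sanity check (a minimal sketch, reusing the check from earlier) confirms that TensorFlow can now see the GPU:

import tensorflow as tf

# Inside the NGC container, the bundled CUDA/cuDNN libraries match the
# TensorFlow build, so the GPU should show up in this list.
print("Tensorflow version:", tf.__version__)
print("GPUs visible:", tf.config.list_physical_devices('GPU'))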
Voilà!
Watch the video on YouTube: https://youtu.be/RwXTQTMFi0g
Colmar, 06 July 2023 (Summer Time)