Hi Guys,
Do you have an issue with CUDA compatibility and you're trying to resolve it? Well, maybe this post is right for you.
Today I'm gonna share my experience of how I resolved my problem installing CUDA/GPU + TensorFlow + Python on my Linux machine, Ubuntu 20.04, after a lot of trying and searching on the Internet.
The NVIDIA driver is already installed on my machine. If I check my VGA controller with the command "lspci", my NVIDIA card shows up correctly:

 

abdusy@troiz:~$ lspci | grep VGA

06:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] (rev a1)

 

And to verify everything related to NVIDIA and CUDA, I check with the command "nvidia-smi". The header of its output shows the installed driver version and, on the right, the highest CUDA version that driver supports; that number is the ceiling for whatever CUDA runtime you want to run.

(screenshot: nvidia-smi output)

But when I check TensorFlow with the script below:

 

import tensorflow as tf

print("Tensorflow version:", tf.__version__)

# Check if a GPU is available
# (note: tf.test.is_gpu_available() is deprecated in TF 2.x, but still works here)
if tf.test.is_gpu_available():
  print('CUDA is available! Using GPU for TensorFlow.')
else:
  print('CUDA is not available. Using CPU for TensorFlow.')

 

It prints “CUDA is not available”:

Tensorflow version: 2.10.0
CUDA is not available. Using CPU for TensorFlow.
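As a side note, since tf.test.is_gpu_available() is deprecated in TF 2.x, the current way to run the same check is tf.config.list_physical_devices. A minimal sketch:

import tensorflow as tf

# List the physical GPUs TensorFlow can see; an empty list means CPU fallback
gpus = tf.config.list_physical_devices('GPU')
print("TensorFlow version:", tf.__version__)
print("Visible GPUs:", gpus)

On my machine this returned an empty list, which matches the output above.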


However, when I used another library, PyTorch, and checked with this script:

import torch

print("Torch version:", torch.__version__)

# Check if CUDA is available
if torch.cuda.is_available():
  # Set the default device to GPU
  device = torch.device('cuda')
  print('CUDA is available! Using GPU for computations.')
else:
  # Set the default device to CPU
  device = torch.device('cpu')
  print('CUDA is not available. Using CPU for computations.')

 

it shows that CUDA is available:

Torch version: 2.0.1+cu117
CUDA is available! Using GPU for computations.
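And once "device" is set, tensors can be allocated on it directly; a minimal usage sketch (the shape here is just an example):

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Allocate a small tensor on the selected device and confirm where it lives
x = torch.ones(3, 3, device=device)
print(x.device)  # "cuda:0" when the GPU is in use, "cpu" otherwise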


So, it means that I have a CUDA compatibility issue on my machine. The hint is in the version strings: the PyTorch wheel ships with its own CUDA runtime (that's the "+cu117" suffix), while TensorFlow relies on the CUDA/cuDNN libraries installed on the system, and mine didn't match what TensorFlow 2.10 was built against.

 

SOLUTION

Here is how I solved this problem.

If you read the NVIDIA CUDA compatibility guide (https://docs.nvidia.com/deploy/cuda-compatibility/index.html) carefully, you will find everything about how the driver, toolkit, and runtime versions have to line up.

So, how to resolve this issue? Well, first of all, I thought of using Docker with a prebuilt image, just to make it simple. Because I love everything that's simple :).

Then I found the NVIDIA NGC catalog (https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tensorflow), which provides TensorFlow containers with all the tools and libraries already set up, so they can be deployed easily across a variety of platforms and devices.
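If you want to fetch the image up front (the run command below also pulls it on first use), a plain docker pull works; the tag shown here is the one I use later, so adjust it to your release (and depending on your setup you may need to log in to nvcr.io first):

docker pull nvcr.io/nvidia/tensorflow:22.01-tf2-py3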

So, you guys, just go run this command (where xx.xx is the container release and tfx is tf1 or tf2):

 

nvidia-docker run -it --rm nvcr.io/nvidia/tensorflow:xx.xx-tfx-py3

 

In my case I use "nvidia-docker run -it --rm nvcr.io/nvidia/tensorflow:22.01-tf2-py3", because I'm using TensorFlow v2 and Python 3.9. (If you don't have the legacy nvidia-docker wrapper, plain "docker run --gpus all" with the NVIDIA Container Toolkit does the same job.)

After the image finishes downloading, you can run it with this command:

 

docker run --gpus all -it --rm -v local_dir:container_dir nvcr.io/nvidia/tensorflow:xx.xx-tfx-py3

 

In my case, I run a command like this: "docker run --network host --gpus all -it --rm -v /home/abdusy:/workspace/my_ml nvcr.io/nvidia/tensorflow:22.01-tf2-py3". The --network host flag shares the host's network, --gpus all exposes the GPU, and -v mounts my home directory into the container at /workspace/my_ml, so I can use my documents from inside the container.
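Once inside the container, a quick one-liner (using the TF 2.x check from earlier) confirms that TensorFlow now sees the GPU; a non-empty list is what you want:

python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"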

Voila….!

 

View the video on YouTube: https://youtu.be/RwXTQTMFi0g

 

Colmar, 06 July 2023 (Summer Time)