Hi Guys,
Do you have an issue with CUDA compatibility and you're trying to resolve it? Well, maybe this post is right for you.
Today I'm gonna share my experience of how I resolved my problem installing CUDA/GPU + TensorFlow + Python on my Linux machine, Ubuntu 20.04, after a lot of trying and searching on the Internet.
The NVIDIA driver is already installed on my machine. If I check my VGA controller with the command "lspci", my NVIDIA card shows up correctly:

 

abdusy@troiz:~$ lspci | grep VGA

06:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] (rev a1)

 

And to verify everything related to NVIDIA and CUDA, I check with the command "nvidia-smi". The header of its output shows the installed driver version and, on the right, the highest CUDA version that driver supports; that number is the ceiling for whatever CUDA runtime you want to run.

(screenshot: nvidia-smi output)

But when I check TensorFlow with the script below:

 

import tensorflow as tf

print("Tensorflow version:", tf.__version__)

# Check if a GPU is available
# (note: tf.test.is_gpu_available() is deprecated in TF 2.x, but still works here)
if tf.test.is_gpu_available():
  print('CUDA is available! Using GPU for TensorFlow.')
else:
  print('CUDA is not available. Using CPU for TensorFlow.')

 

It prints “CUDA is not available”:

Tensorflow version: 2.10.0
CUDA is not available. Using CPU for TensorFlow.
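As a side note, since tf.test.is_gpu_available() is deprecated in TF 2.x, the current way to run the same check is tf.config.list_physical_devices. A minimal sketch:

import tensorflow as tf

# List the physical GPUs TensorFlow can see; an empty list means CPU fallback
gpus = tf.config.list_physical_devices('GPU')
print("TensorFlow version:", tf.__version__)
print("Visible GPUs:", gpus)

On my machine this returned an empty list, which matches the output above.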


However, when I used another library, PyTorch, and checked with this script:

import torch

print("Torch version:", torch.__version__)

# Check if CUDA is available
if torch.cuda.is_available():
  # Set the default device to GPU
  device = torch.device('cuda')
  print('CUDA is available! Using GPU for computations.')
else:
  # Set the default device to CPU
  device = torch.device('cpu')
  print('CUDA is not available. Using CPU for computations.')

 

it shows that CUDA is available:

Torch version: 2.0.1+cu117
CUDA is available! Using GPU for computations.
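And once "device" is set, tensors can be allocated on it directly; a minimal usage sketch (the shape here is just an example):

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Allocate a small tensor on the selected device and confirm where it lives
x = torch.ones(3, 3, device=device)
print(x.device)  # "cuda:0" when the GPU is in use, "cpu" otherwise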


So, it means that I have a CUDA compatibility issue on my machine. The hint is in the version strings: the PyTorch wheel ships with its own CUDA runtime (that's the "+cu117" suffix), while TensorFlow relies on the CUDA/cuDNN libraries installed on the system, and mine didn't match what TensorFlow 2.10 was built against.

 

SOLUTION

Here is how I solved this problem.

If you read the NVIDIA CUDA compatibility guide (https://docs.nvidia.com/deploy/cuda-compatibility/index.html) carefully, you will find everything about how the driver, toolkit, and runtime versions have to line up.

So, how to resolve this issue? Well, first of all, I thought of using Docker with a prebuilt image, just to make it simple. Because I love everything that's simple :).

Then I found the NVIDIA NGC catalog (https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tensorflow), which provides TensorFlow containers with all the tools and libraries already set up, so they can be deployed easily across a variety of platforms and devices.
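If you want to fetch the image up front (the run command below also pulls it on first use), a plain docker pull works; the tag shown here is the one I use later, so adjust it to your release (and depending on your setup you may need to log in to nvcr.io first):

docker pull nvcr.io/nvidia/tensorflow:22.01-tf2-py3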

So, you guys, just go run this command (where xx.xx is the container release and tfx is tf1 or tf2):

 

nvidia-docker run -it --rm nvcr.io/nvidia/tensorflow:xx.xx-tfx-py3

 

In my case I use "nvidia-docker run -it --rm nvcr.io/nvidia/tensorflow:22.01-tf2-py3", because I'm using TensorFlow v2 and Python 3.9. (If you don't have the legacy nvidia-docker wrapper, plain "docker run --gpus all" with the NVIDIA Container Toolkit does the same job.)

After the image finishes downloading, you can run it with this command:

 

docker run --gpus all -it --rm -v local_dir:container_dir nvcr.io/nvidia/tensorflow:xx.xx-tfx-py3

 

In my case, I run a command like this: "docker run --network host --gpus all -it --rm -v /home/abdusy:/workspace/my_ml nvcr.io/nvidia/tensorflow:22.01-tf2-py3". The --network host flag shares the host's network, --gpus all exposes the GPU, and -v mounts my home directory into the container at /workspace/my_ml, so I can use my documents from inside the container.
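Once inside the container, a quick one-liner (using the TF 2.x check from earlier) confirms that TensorFlow now sees the GPU; a non-empty list is what you want:

python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"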

Voila….!

 

View the video on YouTube: https://youtu.be/RwXTQTMFi0g

 

Colmar, 06 July 2023 (Summer Time)