28 Apr 2021

Day 848

Installing CUDA, PyTorch and TensorFlow

~~Following this: CUDA 10.1 installation on Ubuntu 18.04 LTS | Medium~~ nope, errors

In the same GitHub discussion about installing CUDA on Ubuntu that I’ve already landed on twice, this bit is mentioned: ¹

The very very important thing is that never install “nvidia-driver-***” driver by yourself.

Required nvidia drivers are installed while doing sudo apt install -y cuda=10.0.130-1

Zsh wildcards and apt-get remove

sudo apt remove --autoremove nvidia-* doesn’t work as-is in zsh! The shell expands * against the files in the current directory before apt ever sees the pattern. This explains my CUDA issues: everything seemed to work until I ran the above in a directory containing files with matching names, which were helpfully passed to apt instead of the pattern.

sudo apt remove --autoremove nvidia-\* is the answer.

(or 'nvidia-*')
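A minimal sketch of the difference, using printf instead of apt so nothing gets removed. The temp directory and the file name nvidia-notes.txt are made up for the demo:

```shell
# Hypothetical demo directory with one file whose name matches the pattern
demo_dir=$(mktemp -d)
touch "$demo_dir/nvidia-notes.txt"
cd "$demo_dir" || exit 1

# Unquoted: the shell replaces the pattern with the matching file names
unquoted=$(printf '%s\n' nvidia-*)
# Escaped or quoted: the pattern reaches the command literally
escaped=$(printf '%s\n' nvidia-\*)
quoted=$(printf '%s\n' 'nvidia-*')

echo "unquoted: $unquoted"
echo "escaped:  $escaped"
echo "quoted:   $quoted"
```

When nothing in the directory matches, bash passes the unquoted pattern through unchanged, while zsh by default aborts with "no matches found", which is why the same command can appear to work in one place and not in another.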

Not the first time this has bitten me; it’s at least the third, and every time it was in the context of doing CUDA stuff.

German

“Es funktioniert fabelhaft” (“It works fabulously”) - heard at work

Purging packages

apt --fix-broken install didn’t help as advertised, but purging all the broken packages together with sudo dpkg -P cuda-libraries-10-0 libnvidia-common-390 helped! After this, removing/cleaning up everything else worked. A lot of the output mentioned changes to initramfs; I really hope I’ll be able to boot up next time :(
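To see which packages are stuck before purging them, something like the sketch below could help. It filters dpkg -l style output for anything whose status is not ii (fully installed); other codes include iU (unpacked), iF (half-configured) and rc (removed, config files left). The function name and the sample lines, including their statuses and versions, are placeholders for illustration, not output from my machine:

```shell
# Read `dpkg -l` style lines on stdin and print the names of packages
# whose status column is anything other than "ii"
list_not_fully_installed() {
    awk '/^[a-zA-Z][a-zA-Z][a-zA-Z]? / && $1 != "ii" { print $2 }'
}

# Placeholder sample in the shape of `dpkg -l` package lines
sample='ii  bash                  0.0-0  amd64  placeholder
iU  cuda-libraries-10-0   0.0-0  amd64  placeholder
iF  libnvidia-common-390  0.0-0  all    placeholder'

stuck=$(printf '%s\n' "$sample" | list_not_fully_installed | paste -s -d ' ' -)
echo "not fully installed: $stuck"

# On a real system: dpkg -l | list_not_fully_installed
```

The resulting names can then be passed to sudo dpkg -P in one go, as above.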

Also - if 90% of the tutorials about how to install $thing start with “Remove any traces of installs of $thing you have” it’s a nice sign that something’s shady.

Docker logs

docker logs 09348209840239

i3 skype floating window fix

Skype fix : i3wm:

Option 1: hide the floating window:

for_window [title="^Skype$" floating] move scratchpad

Option 2:

Clever idea. Although, are you talking about the little window that can be disabled in Skype’s “Settings > Calling > Show call window when Skype is in the background”?

Slack show all messages in all channels

In search, before:Tomorrow is a nice catch-all filter

Pytorch installs its own CUDA!

Your system installations of CUDA and cudnn won’t be used, if you install PyTorch binaries with these libs. E.g. conda install pytorch torchvision cudatoolkit=10.1 -c pytorch will install CUDA 10.1 and cudnn in your current conda environment. ²
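A quick way to check which CUDA version a PyTorch install brought along, as a sketch: it assumes python3 is on the PATH, and the function name is made up. torch.version.cuda reports the CUDA version the binaries were built with (it can be None on CPU-only builds), and torch.cuda.is_available() reports whether a GPU is actually usable:

```shell
# Print the CUDA version bundled with PyTorch, if PyTorch is importable
report_torch_cuda() {
    if ! command -v python3 >/dev/null 2>&1; then
        echo "python3 not found"
        return 0
    fi
    if python3 -c 'import torch' >/dev/null 2>&1; then
        python3 -c 'import torch; print("bundled CUDA:", torch.version.cuda); print("GPU usable:", torch.cuda.is_available())'
    else
        echo "torch is not importable in this environment"
    fi
}

torch_report=$(report_torch_cuda)
echo "$torch_report"
```

Run inside the conda environment in question; the system-wide CUDA version does not enter into it.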

Tensorflow CUDA Docker doesn’t need CUDA on host machine, only the nvidia drivers

Nvidia drivers are needed on host machine, but not CUDA! ³

Random / UX / Design?

On TF’s official CUDA install page⁴, the bash listings (that are usually copypasted) contain the standard $ prompt at the beginning: it’s visible, but doesn’t get included when you copy-paste!

Installing CUDA 11.0 using official Tensorflow tutorial

So, hopefully for the last time today, I end up in the official TF tutorial⁴ about installing CUDA, just like the previous couple of times. This time armed with the knowledge that:

PyTorch installs its own CUDA and doesn’t care, as long as GPU drivers are there
The TensorFlow Docker image ships its own CUDA and doesn’t care, as long as GPU drivers are on the host machine
nvidia drivers should not be installed manually; the cuda packages have to pull them in

Snippet:

# Add NVIDIA package repositories
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
sudo apt-get update

wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb

sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update

wget https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/libnvinfer7_7.1.3-1+cuda11.0_amd64.deb
sudo apt install ./libnvinfer7_7.1.3-1+cuda11.0_amd64.deb
sudo apt-get update

# Install development and runtime libraries (~4GB)
sudo apt-get install --no-install-recommends \
    cuda-11-0 \
    libcudnn8=8.0.4.30-1+cuda11.0  \
    libcudnn8-dev=8.0.4.30-1+cuda11.0

# Reboot. Check that GPUs are visible using the command: nvidia-smi

# Install TensorRT. Requires that libcudnn8 is installed above.
sudo apt-get install -y --no-install-recommends libnvinfer7=7.1.3-1+cuda11.0 \
    libnvinfer-dev=7.1.3-1+cuda11.0 \
    libnvinfer-plugin7=7.1.3-1+cuda11.0

Done, no conflicts, no anything, worked better than most Medium tutorials I’ve read today.

# Reboot.

Let’s hope for the best.

UPD: no black screen, booted fine, but nvidia-smi sees no driver.

sudo apt list --installed shows all the CUDA stuff and the nvidia driver as installed:

nvidia-driver-465/unknown,unknown,now 465.19.01-0ubuntu1 amd64 [installed,automatic]

More worryingly, I see mentions of cuda-10-1 and cuda-11-1 together
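To get a cleaner picture of which CUDA versions are mixed together, a sketch like this pulls the distinct cuda-X-Y packages out of apt list --installed style output. The function name is made up, and the cuda lines in the sample use placeholder versions; only the nvidia-driver line is the real one from above:

```shell
# Read `apt list --installed` style lines on stdin and print the distinct
# cuda-<major>-<minor> package names
installed_cuda_metapackages() {
    sed -n 's|^\(cuda-[0-9][0-9]*-[0-9][0-9]*\)/.*|\1|p' | sort -u
}

# Sample input: placeholder versions for the cuda lines
sample='cuda-10-1/unknown,now 0.0.0-1 amd64 [installed]
cuda-11-1/unknown,now 0.0.0-1 amd64 [installed]
nvidia-driver-465/unknown,unknown,now 465.19.01-0ubuntu1 amd64 [installed,automatic]'

found=$(printf '%s\n' "$sample" | installed_cuda_metapackages | paste -s -d ' ' -)
echo "CUDA versions present: $found"

# On a real system: apt list --installed 2>/dev/null | installed_cuda_metapackages
```

More than one name in the output means several CUDA versions are installed side by side.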

Listing processes on Ubuntu

I should use ps axf instead of ps aux, the former gives a nice tree representation

Nvidia CUDA official installer documentation

Yet another place that makes it look easy: CUDA Toolkit 11.0 Download | NVIDIA Developer

serhii.net