Day 848
Installing CUDA, pytorch and tensorflow
Following this: CUDA 10.1 installation on Ubuntu 18.04 LTS | Medium - nope, errors.
In the same GitHub discussion about installing CUDA on Ubuntu that I’ve already been to twice, this bit is mentioned: 1
The very very important thing is that never install “nvidia-driver-***” driver by yourself.
Required nvidia drivers are installed while doing
sudo apt install -y cuda=10.0.130-1
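A way to double-check which driver a cuda package would pull in before actually installing anything (just a sketch, using the plain cuda metapackage as an example; no sudo needed for a simulation):
# dry run: shows everything apt would install, including the nvidia-driver-* package it pulls in
apt-get install --simulate cuda | grep -i nvidia-driver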
Zsh wildcards and apt-get remove
sudo apt remove --autoremove nvidia-*
doesn’t work as-is in zsh! The * gets interpreted as files in the current directory. This explains my CUDA issues: everything seemed to work until I ran the above in a directory containing files with matching names, which the glob helpfully expanded to.
sudo apt remove --autoremove nvidia-\*
is the answer (or 'nvidia-*').
Not the first time this has bitten me; at least the third time, and all of them in the context of doing CUDA stuff.
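A quick way to see what zsh would actually hand to apt before running the real command (print -rl is a zsh builtin that prints one word per line):
print -rl -- nvidia-*      # expands to matching files in the current directory (or errors with "no matches found")
print -rl -- 'nvidia-*'    # stays a literal pattern, which is what apt wants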
German
“Es funktioniert fabelhaft” (“It works fabulously”) - heard at work
Purging packages
apt --fix-broken install
didn’t help as advertised, but removing all the broken packages together with sudo dpkg -P cuda-libraries-10-0 libnvidia-common-390 helped! After this, removing/cleaning up everything else worked.
A lot of this mentioned changes to initramfs; I really hope I’ll be able to boot up next time :(
Also - if 90% of the tutorials about how to install $thing start with “Remove any traces of installs of $thing you have”, it’s a nice sign that something’s shady.
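For reference, a quick way to see which nvidia/cuda packages are actually still installed before and after the purge:
dpkg -l | grep -Ei 'nvidia|cuda'   # lists installed packages matching either name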
Docker logs
docker logs 09348209840239
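A couple of commonly useful flags (the long number above is just an example container ID):
docker logs -f --tail 100 09348209840239   # -f follows the log, --tail limits the backscroll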
i3 skype floating window fix
Option 1: hide the floating window:
for_window [title="^Skype$" floating] move scratchpad
Option 2: disable the little window in Skype itself, under “Settings > Calling > Show call window when Skype is in the background”.
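For completeness, the usual counterpart in the i3 config to bring scratchpad windows back (where $mod is whatever modifier the config defines):
bindsym $mod+minus scratchpad show   # show/cycle through scratchpad windows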
Slack show all messages in all channels
In search, before:Tomorrow is a nice catch-all filter.
Pytorch installs its own CUDA!
Your system installations of CUDA and cudnn won’t be used, if you install PyTorch binaries with these libs. E.g.
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
will install CUDA 10.1 and cudnn in your current conda environment. 2
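A quick way to check which CUDA build the installed PyTorch binaries actually ship with (run inside the conda environment):
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"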
Tensorflow CUDA Docker doesn’t need CUDA on host machine, only the nvidia drivers
Nvidia drivers are needed on the host machine, but not CUDA! 3
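A quick sanity check that containers can see the GPU (assumes the nvidia-container-toolkit is set up on the host; the image tag is just an example):
docker run --gpus all --rm nvidia/cuda:11.0-base nvidia-smi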
Random / UX / Design?
On TF’s official CUDA install page4, the bash listings (the ones that usually get copy-pasted) contain the standard $ at the beginning; it’s visible, but doesn’t get included when you copy!
Installing CUDA 11.0 using official Tensorflow tutorial
So, hopefully for the last time today, I end up in the official TF tutorial4 about installing CUDA, just like the previous couple of times. Armed with the knowledge that:
- pytorch installs its own CUDA and doesn’t care, as long as GPU drivers are there
- Docker installs its own CUDA and doesn’t care, as long as GPU drivers are on the host machine
- Installing nvidia drivers should not be done manually; it has to be done by the cuda packages
Snippet:
# Add NVIDIA package repositories
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update
wget https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/libnvinfer7_7.1.3-1+cuda11.0_amd64.deb
sudo apt install ./libnvinfer7_7.1.3-1+cuda11.0_amd64.deb
sudo apt-get update
# Install development and runtime libraries (~4GB)
sudo apt-get install --no-install-recommends \
    cuda-11-0 \
    libcudnn8=8.0.4.30-1+cuda11.0 \
    libcudnn8-dev=8.0.4.30-1+cuda11.0
# Reboot. Check that GPUs are visible using the command: nvidia-smi
# Install TensorRT. Requires that libcudnn8 is installed above.
sudo apt-get install -y --no-install-recommends libnvinfer7=7.1.3-1+cuda11.0 \
    libnvinfer-dev=7.1.3-1+cuda11.0 \
    libnvinfer-plugin7=7.1.3-1+cuda11.0
Done, no conflicts, no anything, worked better than most Medium tutorials I’ve read today.
# Reboot.
Let’s hope for the best.
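Once it’s back up, a quick way to check whether TF actually sees the GPU (assuming a TF 2.x install in the active environment):
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"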
UPD: no black screen, booted fine, but nvidia-smi sees no driver. sudo apt list --installed shows all the cuda stuff and the nvidia driver as installed:
nvidia-driver-465/unknown,unknown,now 465.19.01-0ubuntu1 amd64 [installed,automatic]
More worryingly, I see mentions of cuda-10-1 and cuda-11-1 together.
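Things worth checking when nvidia-smi can’t find the driver (a debugging sketch, assuming a dkms-built driver package): whether the kernel module is loaded, and whether dkms built it for the running kernel.
lsmod | grep nvidia   # is the nvidia kernel module loaded at all?
dkms status           # was the driver module built for the current kernel?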
list processes ubuntu
I should use ps axf instead of ps aux; the former gives a nice tree representation.
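The two option sets can also be combined:
ps auxf   # full user-oriented listing plus the tree ("forest") view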
Nvidia CUDA official installer documentation
Yet another place that makes it look easy: CUDA Toolkit 11.0 Download | NVIDIA Developer