In the middle of the desert you can say anything you want
Changed the hint I most often use to a better binding:
# Copy url
# map kitty_mod+n>c kitten hints --type path --program @
map kitty_mod+g kitten hints --type path --program @
w track 1728 tag1
automatically ends it at `now`.
w continue
just continues the last thing that was running, by starting an identical interval "now" and running until stopped.
alias icat="kitty +kitten icat"
In zshrc:
autoload -Uz compinit
compinit
# Completion for kitty
kitty + complete setup zsh | source /dev/stdin
scrollback_pager vim - -c 'w! /tmp/kitty_scrollback' -c 'term ++curwin cat /tmp/kitty_scrollback'
Vim 8.0 works. Nice colorful etc.
Adding this makes zsh register the <Esc> key in 0.1 s instead of the default 0.4 s:
export KEYTIMEOUT=1
A Good Vimrc - TODO
I also love his design!
GitHub - softmoth/zsh-vim-mode: Friendly bindings for ZSH’s vi mode
Out of all the various vim plugins for zsh, this is the only one I found that lets you meaningfully work with text objects (`ci'` etc.). Also, the mode indicator works very reliably.
Doesn’t conflict with zsh-evil-registers.
Ubuntu 18.04, qutebrowser etc, as usual. What helped was creating the environment with these options:
python3 scripts/mkvenv.py --pyqt-version 5.14
Should’ve done this a long time ago:
# Pretty-print a JSON file and page it; -R lets less render jq's colors
lq() {
    jq . "$1" -C | less -R
}
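Usage (hypothetical filename): `lq some_file.json`.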
From my kitty config; I should use these more.
# Select a filename and copy it
map kitty_mod+p>c kitten hints --type path --program @
#: Select a line of text and insert it into the terminal.
map kitty_mod+p>o kitten hints --type line --program -
Nicely described: How to switch between multiple GCC and G++ compiler versions on Ubuntu 20.04 LTS Focal Fossa - LinuxConfig.org
# install stuff
$ sudo apt -y install gcc-7 g++-7 gcc-8 g++-8 gcc-9 g++-9
# Add it to update-alternatives
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-7 7
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-7 7
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 8
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-8 8
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-9 9
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-9 9
# choose the default one
$ sudo update-alternatives --config gcc
There are 3 choices for the alternative gcc (providing /usr/bin/gcc).
  Selection    Path             Priority   Status
------------------------------------------------------------
  0            /usr/bin/gcc-9    9          auto mode
  1            /usr/bin/gcc-7    7          manual mode
* 2            /usr/bin/gcc-8    8          manual mode
  3            /usr/bin/gcc-9    9          manual mode

Press <enter> to keep the current choice[*], or type selection number:
From the docs:
--install link name path priority
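Not from the article, just standard commands, but a quick way to double-check what's active after switching:
gcc --version                            # reports the currently selected compiler
sudo update-alternatives --display gcc   # lists all registered alternatives and the active one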
Editable installations (`pip install -e .`) are a thing. TODO - learn more about them.
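A minimal sketch of the workflow (project and package names are hypothetical):
# inside a project that has a setup.py or pyproject.toml
cd myproject
pip install -e .           # installs a link to the source tree instead of copying it
# edits to the source are picked up immediately, no reinstall needed
pip uninstall myproject    # removes the link again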
Given that the standard bindings are not enough for me, and even my additional ones for tabs 10-20 are not enough, I added a third level:
config.bind('1', 'tab-focus 1')
config.bind('2', 'tab-focus 2')
config.bind('3', 'tab-focus 3')
config.bind('4', 'tab-focus 4')
config.bind('5', 'tab-focus 5')
config.bind('6', 'tab-focus 6')
config.bind('7', 'tab-focus 7')
config.bind('8', 'tab-focus 8')
config.bind('9', 'tab-focus 9')
config.bind('0', 'tab-focus 10')
config.bind('<Alt-1>', 'tab-focus 11')
config.bind('<Alt-2>', 'tab-focus 12')
config.bind('<Alt-3>', 'tab-focus 13')
config.bind('<Alt-4>', 'tab-focus 14')
config.bind('<Alt-5>', 'tab-focus 15')
config.bind('<Alt-6>', 'tab-focus 16')
config.bind('<Alt-7>', 'tab-focus 17')
config.bind('<Alt-8>', 'tab-focus 18')
config.bind('<Alt-9>', 'tab-focus 19')
config.bind('<Alt-0>', 'tab-focus 20')
config.bind('<Alt-Ctrl-1>', 'tab-focus 21')
config.bind('<Alt-Ctrl-2>', 'tab-focus 22')
config.bind('<Alt-Ctrl-3>', 'tab-focus 23')
config.bind('<Alt-Ctrl-4>', 'tab-focus 24')
config.bind('<Alt-Ctrl-5>', 'tab-focus 25')
config.bind('<Alt-Ctrl-6>', 'tab-focus 26')
config.bind('<Alt-Ctrl-7>', 'tab-focus 27')
config.bind('<Alt-Ctrl-8>', 'tab-focus 28')
config.bind('<Alt-Ctrl-9>', 'tab-focus 29')
config.bind('<Alt-Ctrl-0>', 'tab-focus -1')
EDIT: Actually, come to think of it - in for a penny, in for a pound!
for i in range(30, 60):
    config.bind(',' + str(i), 'tab-focus ' + str(i))
Takes about 9 seconds to `:config-source` everything, but then works like a charm! And it doesn't seem to make anything else slower (strangely, even startup is as usual).
Opened a README.md and saw it rendered nicely to the left. I can also edit it directly. Wow.
sed Cheat Sheet - very down-to-earth, “praxisnah” (hands-on), I like it. Except for the idiotic scroll-override animations.
I should use `'` for the filter and `"` for any string elements inside it.
select
jq '.results[] | select(.name == "John") | {age}' # Get age for 'John'
Value vs. key-value:
`jq '.something'` gets the content of the field `something`, removing the key.
`jq '. | {something}'` gets the key-value pair of `something`.
$ jq '. | select(.tokens[0]=="Tel") | .tokens[]' mvs.json
"Tel"
":"
$ jq '. | select(.tokens[0]=="Tel") | .tokens' mvs.json
[
"Tel",
":"
]
$ jq '. | select(.tokens[0]=="Tel") | {tokens}' mvs.json
{
"tokens": [
"Tel",
":"
]
}
`| keys` extracts the keys only. jq Cheat Sheet · GitHub is also nice.
TIL that you don't need `jq '. | keys'` - `jq 'keys'` etc. is enough.
jq '.[-2:]'
'sort_by(.foo)'
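Tiny illustrations of both, with made-up input (using -c for one-line output):
$ echo '[1,2,3,4,5]' | jq -c '.[-2:]'
[4,5]
$ echo '[{"foo":3},{"foo":1}]' | jq -c 'sort_by(.foo)'
[{"foo":1},{"foo":3}]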
I think now I’m ready for the holy of holies: jq 1.4 Manual
`{user, title: .titles[]}` will return a `{user, title}` object for each value inside `.titles[]`.
`()` around an expression in a key position means it'll be evaluated: `{(.user): .titles}` will use the value of the key `user`.
$ jq '. | {(.id): .id}' mvs.json
{
"7574": "7574"
}
\(foo)
$ echo "[1,2,3]" | jq '"A string \(.)"'
"A string [1,2,3]"
It's basically analogous to python3's f-strings: f"My f-{string}".
`'.a=23'` will produce an output with `.a` set to 23; it will be created if not there. `.a` in the same filter, after a comma, will still return the old value.
`|=` will "update" the value by running its previous value through the expression:
$ echo '{"one": 23,"two":2}' | jq '.one|=(. | tostring)'
{
"one": "23",
"two": 2
}
`jq -s` (slurp mode) reads the whole input into one array, and previous input can be piped through to it!
`'[...]'` can be used for the same thing - though I can't get this to work. It didn't read the jq-generated multi-line output without commas between items, but jq compact mode outputs one record per line (without commas and not as part of an array), and that gets parsed correctly!
jq compact mode is `jq -c '.' sth.json`.
Before:
{
"id": "7575",
"ner_tags": [
"6",
"6"
],
"tokens": [
"Tel",
":"
]
}
After:
{"id":"7575","ner_tags":["6","6"],"tokens":["Tel",":"]}
How to Create a Shared Directory for All Users in Linux
# Create the group
$ sudo groupadd project
# Add a user to this group
$ sudo usermod -a -G project theuser
# Change the group of the directory
$ sudo chgrp -R project /var/www/reports/
# Turn on the `setgid` bit, so newly created subfiles inherit the same group as the directory,
# and set rwxrwxr-x permissions
$ sudo chmod -R 2775 /var/www/reports/
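Not from the article, but a quick sanity check - the `s` in the group triple shows the setgid bit is set:
$ ls -ld /var/www/reports/
drwxrwsr-x ... project ... /var/www/reports/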
“Which story do you want to tell?” (Heard at work, from R)
`git commit -F filename` allows using a pre-written commit message from a text file.
You can ‘mark’ windows1, a la vim, and then use that as filter - no window classes etc needed - for example, for scratchpads!2
So now I have two scratchpads in i3 config:
bindsym $ms+Shift+plus mark "scratch2", move scratchpad
bindsym $ms+plus [con_mark="scratch2"] scratchpad show
bindsym $ms+Shift+minus mark "scratch", move scratchpad
bindsym $ms+minus [con_mark="scratch"] scratchpad show
The second one originally was meant to be for Ding, but it’s really nice to have it flexible.
Reading “German: An Essential Grammar” by Donaldson, I found this bit: 1
English has a rule that if the time of an event that
occurred in the past is mentioned, then the imperfect must be used, but if
the time is omitted, the perfect is required, e.g.
- He returned from Hamburg yesterday.
- He has returned from Hamburg.
- He has returned from Hamburg yesterday. (not grammatical)
TIL.
zsh-specific - to detach & disown a process, there's `&!`: 2
dolphin &!
Long question and answer about fahren zu/nach/in/…: Richtungen und Ziele
The Yojik Website has the FSI courses FSI Languages Courses and the website as I remember it.
Changed `~/.taskrc` to show any active tasks regardless of anything else in my sprint view:
s () {task s \(project:w or \(sprint:$SPRINT \(+A or +O\)\) or +ACTIVE\) "$*"}
Standard lock command leaves both monitors on.
Reddit3 mentioned two commands:
xset s activate
xset dpms force off
The second one worked for me!
Now I have shiny new screen-lock (and suspend too, while we're at it) keybindings in my i3 config!
bindsym $ms+n exec gnome-screensaver-command -l && xset dpms force off
bindsym $ms+Shift+n exec i3lock -i ~/s/black_lock.png -t -p win -e && systemctl suspend -i
Nvidia has a repo of all docker images it creates, one of them: Torch | NVIDIA NGC
“Das finde ich zielführender als…” - heard at work
docker run --name frontend -p 0:80 frontend:latest
Port 0 gets passed to the kernel, which assigns any free port. 1
To see which one: `docker port somecontainer`.
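Roughly how that plays out (the container name and the kernel-assigned port are made up):
$ docker run -d --name frontend -p 0:80 frontend:latest
$ docker port frontend
80/tcp -> 0.0.0.0:49154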
docker run --gpus device=3 -e NVIDIA_VISIBLE_DEVICES=0 -e CUDA_VISIBLE_DEVICES=0 myservice
Where `device=3` is the GPU id on the host that we want to use.
lspci | grep -i "nvidia"
`-i` == 'ignore case' is actually something I can remember.
Docker will autostart any container with a RestartPolicy of ‘always’ when the docker service initially starts. 1
I can set/unset it in Kitematic, or through the terminal:
docker update --restart=no my-container
Quoting SO: 2
`apt purge --auto-remove <packagename>` purges `packagename` and any packages which are rendered unnecessary by its removal, as well as any other packages which aren't necessary.
`apt autoremove --purge` purges any packages which aren't necessary (marked as "automatically installed" and with no dependent packages).
The first form is what you'd use when manipulating individual packages; the latter is a clean-up operation across all packages.
This seems nice, TODO: Cleaning up with apt-get | Network World
LVM - Debian Wiki is nice and readable. I used this command to backup the headers:
sudo cryptsetup luksHeaderBackup /dev/nvmeXXXXX --header-backup-file headerBackupFile
… and put it somewhere not on the drive I’ll be recovering if it all goes wrong.
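For symmetry, restoring the header later would be the standard counterpart command (not from the wiki page):
sudo cryptsetup luksHeaderRestore /dev/nvmeXXXXX --header-backup-file headerBackupFile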
Aaaand the saga continues!
…since the GPU is an eGPU, apparently I do need to do the harder way: Accelerating Machine Learning on a Linux Laptop with an External GPU | NVIDIA Developer Blog
It is, I can see it:
(17:42:42/10815)~/$ lspci | grep -i VGA
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 620 (rev 07)
0c:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] (rev a1)
but if it wasn't, I'd authorize it and check with `boltctl list`:
(17:43:13/10817)~/$ boltctl list
[...]
● GIGABYTE GV-N1070IXEB-8GD
├─ type: peripheral
├─ name: GV-N1070IXEB-8GD
├─ vendor: GIGABYTE
├─ uuid: # redacted
├─ status: authorized
│ ├─ domain: domain0
│ └─ authflags: none
├─ authorized: Do 29 Apr 2021 07:57:37 UTC
├─ connected: Do 29 Apr 2021 07:57:37 UTC
└─ stored: no
How to setup an eGPU on Ubuntu for TensorFlow describes other things that can go wrong:
I had to disable the following, otherwise my eGPU was not detected:
- Secure Boot
- Thunderbolt Security Level
From this point on, I follow Nvidia’s tutorial 3 unless stated otherwise.
Using quotes means the `*` doesn't have to be escaped:
sudo apt-get purge "nvidia*"
This is a fuller example: 4
sudo rm /etc/apt/sources.list.d/cuda*
sudo apt remove --autoremove nvidia-cuda-toolkit
sudo apt remove --autoremove nvidia-*
Found and manually removed `/etc/apt/sources.list.d/graphics-drivers-ubuntu-ppa-bionic.list`, leaving the `.save` file in place.
As per nvidia’s guide,
sudo apt-get update
sudo apt-get dist-upgrade
To be safe, rebooted.
The existing driver is most likely Nouveau, an open-source driver for NVIDIA GPUs. Because Nouveau doesn’t support eGPU setups, install the NVIDIA CUDA and NVIDIA drivers instead. You must also stop the kernel from loading Nouveau. 3
okay!
Found this: NVIDIA/data-science-stack: NVIDIA Data Science stack tools Read about it here: Ubuntu for machine learning with NVIDIA RAPIDS in 10 min | Ubuntu
Official by nvidia, and seems to do automatically what’s needed for supported systems. Let’s run a script from the internet that installs drivers, loads kernel modules etc.
Source is available, yay for open source: data-science-stack/data-science-stack at master · NVIDIA/data-science-stack
Ran `./data-science-stack setup-system` - it uses sudo, didn't ask for root or anything.
Seems to have installed nvidia driver version 460. Asked to reboot at the end.
Rebooted.
(18:40:30/10909)~/$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Okay, same results as I had before. Confirms that my previous steps weren't any more wrong than the script's.
(18:41:49/10910)~/$ sudo apt list --installed | grep "\(cuda\|nvidia\)"
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
libnccl2/unknown,now 2.9.6-1+cuda11.3 amd64 [installed]
libnvidia-cfg1-460/unknown,now 460.73.01-0ubuntu1 amd64 [installed,automatic]
libnvidia-common-460/unknown,now 460.73.01-0ubuntu1 all [installed,automatic]
libnvidia-compute-460/unknown,now 460.73.01-0ubuntu1 amd64 [installed,automatic]
libnvidia-container-tools/bionic,now 1.4.0-1 amd64 [installed,automatic]
libnvidia-container1/bionic,now 1.4.0-1 amd64 [installed,automatic]
libnvidia-decode-460/unknown,now 460.73.01-0ubuntu1 amd64 [installed,automatic]
libnvidia-encode-460/unknown,now 460.73.01-0ubuntu1 amd64 [installed,automatic]
libnvidia-extra-460/unknown,now 460.73.01-0ubuntu1 amd64 [installed,automatic]
libnvidia-fbc1-460/unknown,now 460.73.01-0ubuntu1 amd64 [installed,automatic]
libnvidia-gl-460/unknown,now 460.73.01-0ubuntu1 amd64 [installed,automatic]
libnvidia-ifr1-460/unknown,now 460.73.01-0ubuntu1 amd64 [installed,automatic]
nvidia-compute-utils-460/unknown,now 460.73.01-0ubuntu1 amd64 [installed,automatic]
nvidia-container-runtime/bionic,now 3.5.0-1 amd64 [installed,automatic]
nvidia-container-toolkit/bionic,now 1.5.0-1 amd64 [installed,automatic]
nvidia-dkms-460/unknown,now 460.73.01-0ubuntu1 amd64 [installed,automatic]
nvidia-docker2/bionic,now 2.6.0-1 all [installed]
nvidia-driver-460/unknown,now 460.73.01-0ubuntu1 amd64 [installed]
nvidia-kernel-common-460/unknown,now 460.73.01-0ubuntu1 amd64 [installed,automatic]
nvidia-kernel-source-460/unknown,now 460.73.01-0ubuntu1 amd64 [installed,automatic]
nvidia-prime/bionic-updates,bionic-updates,now 0.8.16~0.18.04.1 all [installed,automatic]
nvidia-settings/unknown,unknown,now 465.19.01-0ubuntu1 amd64 [installed,automatic]
nvidia-utils-460/unknown,now 460.73.01-0ubuntu1 amd64 [installed,automatic]
xserver-xorg-video-nvidia-460/unknown,now 460.73.01-0ubuntu1 amd64 [installed,automatic]
Also, as usual,
(18:48:34/10919)~/$ lsmod | grep nvi
(18:48:37/10920)~/$
`lspci -k` shows the kernel modules:
0c:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] (rev a1)
Subsystem: Gigabyte Technology Co., Ltd GP104 [GeForce GTX 1070]
Kernel modules: nvidiafb, nouveau
This output implies no nvidia driver is installed on my system5. …though it is.
$ nvidia-settings --version
nvidia-settings: version 465.19.01
`software-properties-gtk` tells me I'm using the proprietary nvidia-driver-460, not 465.
In any case, I can't blacklist nouveau yet, as there are still no nvidia kernel modules.
BUT!
(19:04:04/10946)~/$ dkms status
nvidia, 460.73.01: added
Also, `inxi -Fxxxrz` (found somewhere on the internet):
Graphics: Card-1: Intel UHD Graphics 620 bus-ID: 00:02.0 chip-ID: 8086:5917
Card-2: NVIDIA GP104 [GeForce GTX 1070] bus-ID: 0c:00.0 chip-ID: 10de:1b81
Display Server: x11 (X.Org 1.19.6 ) drivers: modesetting,nvidia (unloaded: fbdev,vesa,nouveau)
So it sees them as there and loaded? Does dkms somehow bypass lsmod etc.?
sudo dkms autoinstall
should autoinstall all added drivers, …let’s hope for the best I guess.
(19:11:47/10958)~/$ sudo dkms autoinstall
Kernel preparation unnecessary for this kernel. Skipping...
applying patch disable_fstack-clash-protection_fcf-protection.patch...patching file Kbuild
Hunk #1 succeeded at 85 (offset 14 lines).
Building module:
cleaning build area...
unset ARCH; [ ! -h /usr/bin/cc ] && export CC=/usr/bin/gcc; env NV_VERBOSE=1 'make' -j8 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.4.0-72-generic IGNORE_XEN_PRESENCE=1 IGNORE_CC_MISMATCH=1 SYSSRC=/lib/modules/5.4.0-72-generic/build LD=/usr/bin/ld.bfd modules......(bad exit status: 2)
ERROR: Cannot create report: [Errno 17] File exists: '/var/crash/nvidia-dkms-460.0.crash'
Error! Bad return status for module build on kernel: 5.4.0-72-generic (x86_64)
Consult /var/lib/dkms/nvidia/460.73.01/build/make.log for more information.
The file is long; the key lines seem to be:
scripts/Makefile.build:269: recipe for target '/var/lib/dkms/nvidia/460.73.01/build/nvidia/nv.o' failed
make[2]: *** [/var/lib/dkms/nvidia/460.73.01/build/nvidia/nv.o] Error 1
Makefile:1754: recipe for target '/var/lib/dkms/nvidia/460.73.01/build' failed
make[1]: *** [/var/lib/dkms/nvidia/460.73.01/build] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-5.4.0-72-generic'
Makefile:80: recipe for target 'modules' failed
make: *** [modules] Error 2
DKMSKernelVersion: 5.4.0-72-generic
Date: Fri Apr 30 18:30:45 2021
DuplicateSignature: dkms:nvidia-dkms-460:460.73.01-0ubuntu1:/var/lib/dkms/nvidia/460.73.01/build/conftest/functions.h:11:2: error: #error acpi_walk_namespace() conftest failed!
Package: nvidia-dkms-460 460.73.01-0ubuntu1
PackageVersion: 460.73.01-0ubuntu1
SourcePackage: nvidia-graphics-drivers-460
Title: nvidia-dkms-460 460.73.01-0ubuntu1: nvidia kernel module failed to build
Smells like a driver/kernel support issue?
The first result when googling `dkms nvidia 460` is this: Can’t get nvidia 460 module to build on Ubuntu 20.04 to support two A100s - GPU Unix Graphics / Linux - NVIDIA Developer Forums
Please check if the build symlink to the headers for dkms exists:
ls /lib/modules/$(uname -r)/build
Otherwise, create it
ln -s /usr/src/linux-headers-$(uname -r) /lib/modules/$(uname -r)/build
Didn’t have it, created it, trying again, same error, deleted the previous log, full output is:
(19:19:54/10967)~/$ sudo dkms autoinstall
Kernel preparation unnecessary for this kernel. Skipping...
applying patch disable_fstack-clash-protection_fcf-protection.patch...patching file Kbuild
Hunk #1 succeeded at 85 (offset 14 lines).
Building module:
cleaning build area...
unset ARCH; [ ! -h /usr/bin/cc ] && export CC=/usr/bin/gcc; env NV_VERBOSE=1 'make' -j8 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.4.0-72-generic IGNORE_XEN_PRESENCE=1 IGNORE_CC_MISMATCH=1 SYSSRC=/lib/modules/5.4.0-72-generic/build LD=/usr/bin/ld.bfd modules.......(bad exit status: 2)
Error! Bad return status for module build on kernel: 5.4.0-72-generic (x86_64)
Consult /var/lib/dkms/nvidia/460.73.01/build/make.log for more information.
The file is full of what looks like syntax errors..?
This charming Chinese website seems to imply the gcc version is to blame: NVIDIA驱动出错:NVIDIA-SMI has failed because it couldn‘t communicate with the NVIDIA driver. Make sure t_sazass的博客-CSDN博客
(19:22:39/10974)~/$ cat /proc/version
Linux version 5.4.0-72-generic (buildd@lgw01-amd64-021) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #80~18.04.1-Ubuntu SMP Mon Apr 12 23:26:25 UTC 2021
sudo apt install gcc-8
sudo update-alternatives --config gcc
sudo update-alternatives --remove-all gcc
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 10
sudo update-alternatives --install /usr/bin/cc cc /usr/bin/gcc-8 10
Let’s retry dkms autoinstall:
(19:26:03/10981)~/$ sudo dkms autoinstall
Kernel preparation unnecessary for this kernel. Skipping...
applying patch disable_fstack-clash-protection_fcf-protection.patch...patching file Kbuild
Hunk #1 succeeded at 85 (offset 14 lines).
Building module:
cleaning build area...
unset ARCH; [ ! -h /usr/bin/cc ] && export CC=/usr/bin/gcc; env NV_VERBOSE=1 'make' -j8 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.4.0-72-generic IGNORE_XEN_PRESENCE=1 IGNORE_CC_MISMATCH=1 SYSSRC=/lib/modules/5.4.0-72-generic/build LD=/usr/bin/ld.bfd modules...............
Signing module:
- /var/lib/dkms/nvidia/460.73.01/5.4.0-72-generic/x86_64/module/nvidia-modeset.ko
- /var/lib/dkms/nvidia/460.73.01/5.4.0-72-generic/x86_64/module/nvidia.ko
- /var/lib/dkms/nvidia/460.73.01/5.4.0-72-generic/x86_64/module/nvidia-uvm.ko
- /var/lib/dkms/nvidia/460.73.01/5.4.0-72-generic/x86_64/module/nvidia-drm.ko
Secure Boot not enabled on this system.
cleaning build area...
DKMS: build completed.
nvidia.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/5.4.0-72-generic/updates/dkms/
nvidia-modeset.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/5.4.0-72-generic/updates/dkms/
nvidia-drm.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/5.4.0-72-generic/updates/dkms/
nvidia-uvm.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/5.4.0-72-generic/updates/dkms/
depmod...
DKMS: install completed.
WOW. WOOOOOW. WOOOOOOOOOOOOOOOOOOOOOO
Without even restarting, after the first command my screen flashed and changed resolution a bit, BUT THEN IT WORKED
(19:34:17/10983)~/$ nvidia-smi
No devices were found
(19:34:20/10984)~/$ nvidia-smi
Fri Apr 30 19:34:22 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01 Driver Version: 460.73.01 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 1070 On | 00000000:0C:00.0 Off | N/A |
| 0% 54C P0 37W / 151W | 7MiB / 8119MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
All these attempts failed because the nvidia dkms module couldn't build - the "syntax errors" came from the old gcc version.
What could I have done differently? Why at no point did I see errors about the kernel module failing to build, and where should I have looked for them? And why syntax errors, instead of something checking the gcc version in use and loudly failing on a mismatch? Why is that Chinese website the only place where I found this fix?
(19:42:57/10995)~/$ lsmod | grep nvidia
nvidia_uvm 1015808 0
nvidia_drm 57344 1
nvidia_modeset 1228800 1 nvidia_drm
nvidia 34123776 17 nvidia_uvm,nvidia_modeset
drm_kms_helper 188416 2 nvidia_drm,i915
drm 491520 15 drm_kms_helper,nvidia_drm,i915
Now let’s hope this survives a restart. And that it works when the eGPU is disconnected.
Following the readme, ran both options in separate terminals:
./data-science-stack list
./data-science-stack build-container
./data-science-stack run-container
and
./data-science-stack list
./data-science-stack build-conda-env
./data-science-stack run-jupyter
The latter seems to be installing CUDA and friends on my computer - didn't expect it, but I need them anyway, I think; I'll let the script handle everything since it already started. It installed conda to `~/conda/`, but again, not sure what I was expecting.
Both have been running for 20+ minutes now.
EDIT: `~/conda/` took 20 GB, filling up my drive and blocking everything; deleted it.
The docker with jupyterlab - tensorflow can’t access the GPU, but pytorch can.
The NVIDIA eGPU tutorial3 continues with offloading Xorg to the GPU - do I want this? Can I use the GPU just for training, and leave Xorg running on the internal one? I probably don’t
As I remember from the last time, X doesn’t start when the GPU is connected at boot but everything’s fine when it gets connected after starting X. When it’s connected, it seems the driver gets loaded and nvidia-smi etc works. That the system works without the eGPU attached is nice! Plug-and-play is nice too.
Installed pytorch in a virtualenv, for cuda 11.1, test snippet says cuda works!
import torch
x = torch.rand(5, 3)
print(x)
print(torch.cuda.is_available())  # should print True
Tensorflow:
>>> import tensorflow as tf
2021-04-30 21:36:12.984883: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
>>> tf.debugging.set_log_device_placement(True)
>>> a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
2021-04-30 21:36:23.055614: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-04-30 21:36:23.058062: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-04-30 21:36:23.115366: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-30 21:36:23.116510: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:0c:00.0 name: GeForce GTX 1070 computeCapability: 6.1
coreClock: 1.721GHz coreCount: 15 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 238.66GiB/s
2021-04-30 21:36:23.116553: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-04-30 21:36:23.119974: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-04-30 21:36:23.120034: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-04-30 21:36:23.121503: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-04-30 21:36:23.121842: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-04-30 21:36:23.125037: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-04-30 21:36:23.125803: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-04-30 21:36:23.125980: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2021-04-30 21:36:23.125996: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Which libcudnn?
Tensorflow’s tutorial (GPU support | TensorFlow) does this:
Install development and runtime libraries (~4GB)
sudo apt-get install --no-install-recommends \
cuda-11-0 \
libcudnn8=8.0.4.30-1+cuda11.0 \
libcudnn8-dev=8.0.4.30-1+cuda11.0
What is the version for CUDA 11.2? cuDNN Archive | NVIDIA Developer has download links. The one for 11.2 is called “cudnn-11.2-linux-x64-v8.1.1.33.tgz”. I plug those versions in, they exist and install fine:
sudo apt-get install libcudnn8=8.1.1.33-1+cuda11.2
sudo apt-get install libcudnn8-dev=8.1.1.33-1+cuda11.2
And tensorflow now works!
2021-04-30 21:42:46.176942: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7440 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:0c:00.0, compute capability: 6.1)
I can’t believe it but wow. It’s finished, it works, X didn’t die, plug-and-play works, no manual driver loading.
All in all, including all the failed attempts, took 5:30h of pure time, according to my time tracking.
The only wrinkle is that X doesn’t start when turning the computer on with the eGPU attached, but I can 100% live with that!
How to Benchmark your GPU on Linux has a fun quote:
This tool is very old, very basic and only tests a small portion of today’s OpenGL capabilities. Back in the old days, it was used to determine if the proprietary driver was installed and running properly as open-source drivers were performing awfully enough to be perfectly noticeable during this test. Nowadays, you won’t notice any difference between the two
Added this to config.py:
config.bind('<Alt-P>', 'set-cmd-text -s :open -p ')
Seen in someone’s config.py on gitlab6:
import glob
import os
for f in glob.glob(str(config.configdir / 'conf.d/*.py')):
    config.source(str(os.path.relpath(f, start=config.configdir)))
Nice examples: i3_config/settings.d at master · kiddico/i3_config · GitHub
i3 doesn’t have any kind of include directive in the config files, sadly. i3 - Source/import file from i3wm config - Stack Overflow is one option:
bindsym $mod+Shift+c exec "cat ~/.config/i3/colors ~/.config/i3/base > ~/.config/i3/config && i3-msg reload"
A keybinding to overwrite the config file and restart i3 with a command.
This looks very interesting, I shouldn’t forget to go through this: Life Hacking His blog with personal examples: Alex Vermeer — Life-Hacking. Climbing. Striving for awesome. Coffee. — Page 2
A non-pdf description of Life Areas with questions and metrics for each.
(He’s the same guy who created the awesome How to Get Motivated: A Guide for Defeating Procrastination poster!)
And let’s remember the classic: Evidence-based advice on how to be successful in any job - 80,000 Hours
Two options I like:7
nohup cmd &
cmd & disown
I feel one of these will become part of many aliases of mine.
And short bash function from the same place:
function dos() {
    # run_disowned and silenced
    run_disowned "$@" 1>/dev/null 2>/dev/null
}
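The snippet assumes a `run_disowned` helper; a minimal version (my guess at what the original answer defines) would be:
function run_disowned() {
    "$@" & disown
}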
debian - What’s the right way to purge recursively with apt? - Unix & Linux Stack Exchange ↩︎
Accelerating Machine Learning on a Linux Laptop with an External GPU | NVIDIA Developer Blog ↩︎ ↩︎ ↩︎
~pvsr/dotfiles: qutebrowser/.config/qutebrowser/config.py - sourcehut git ↩︎
linux - How do I detach a process from Terminal, entirely? - Super User ↩︎
To read: PEP 8 – Style Guide for Python Code | Python.org
I should learn about the search syntax for jira tickets:
assignee = currentuser() and statusCategory != Done ORDER BY updated DESC
Following this: CUDA 10.1 installation on Ubuntu 18.04 LTS | Medium - nope, errors.
In the same GitHub discussion about installing CUDA on Ubuntu that I've visited twice already, this bit is mentioned: 1
The very very important thing is that never install “nvidia-driver-***” driver by yourself.
Required nvidia drivers are installed while doing
sudo apt install -y cuda=10.0.130-1
`sudo apt remove --autoremove nvidia-*` doesn't work as-is in zsh! The `*` gets interpreted as files in the current directory. This explains my CUDA issues: everything seemed to work until I ran the above in a directory containing files with matching names, which got helpfully substituted in.
`sudo apt remove --autoremove nvidia-\*` is the answer (or `'nvidia-*'`).
Not the first time this bites me - at least the third, and all of them in the context of doing CUDA stuff.
“Es funktioniert fabelhaft” - heard at work
`apt --fix-broken install` didn't help as advertised, but removing all the broken packages with `sudo dpkg -P cuda-libraries-10-0 libnvidia-common-390` helped! After that, removing/cleaning up everything else worked.
A lot of this mentioned changes to initramfs, I really hope I’ll be able to boot up next time :(
Also - if 90% of the tutorials about how to install $thing start with “Remove any traces of installs of $thing you have” it’s a nice sign that something’s shady.
docker logs 09348209840239
Option 1: hide the floating window:
for_window [title="^Skype$" floating] move scratchpad
Option 2:
Clever idea. Although, are you talking about the little window that can be disabled in Skype’s “Settings > Calling > Show call window when Skype is in the background”?
In search, `before:Tomorrow` is a nice catch-all filter.
Your system installations of CUDA and cudnn won’t be used, if you install PyTorch binaries with these libs. E.g.
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
will install CUDA 10.1 and cudnn in your current conda environment. 2
Nvidia drivers are needed on host machine, but not CUDA! 3
On TF's official CUDA install page4, the bash listings (which usually get copy-pasted) contain the standard `$` at the beginning - it's visible, but not copy-pastable!
So, hopefully for the last time today, as on the previous couple of attempts, I end up at the official TF tutorial4 about installing CUDA - armed with the knowledge above.
Snippet:
# Add NVIDIA package repositories
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update
wget https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/libnvinfer7_7.1.3-1+cuda11.0_amd64.deb
sudo apt install ./libnvinfer7_7.1.3-1+cuda11.0_amd64.deb
sudo apt-get update
# Install development and runtime libraries (~4GB)
sudo apt-get install --no-install-recommends \
cuda-11-0 \
libcudnn8=8.0.4.30-1+cuda11.0 \
libcudnn8-dev=8.0.4.30-1+cuda11.0
# Reboot. Check that GPUs are visible using the command: nvidia-smi
# Install TensorRT. Requires that libcudnn8 is installed above.
sudo apt-get install -y --no-install-recommends libnvinfer7=7.1.3-1+cuda11.0 \
libnvinfer-dev=7.1.3-1+cuda11.0 \
libnvinfer-plugin7=7.1.3-1+cuda11.0
Done, no conflicts, no anything, worked better than most Medium tutorials I’ve read today.
# Reboot.
Let’s hope for the best.
UPD: no black screen, booted fine, but `nvidia-smi` sees no driver.
`sudo apt list --installed` shows all the cuda stuff and the nvidia driver as installed:
nvidia-driver-465/unknown,unknown,now 465.19.01-0ubuntu1 amd64 [installed,automatic]
More worryingly, I see mentions of cuda-10-1 and cuda-11-1 together
I should use `ps axf` instead of `ps aux`; the former gives a nice tree representation.
Yet another place that makes it look easy: CUDA Toolkit 11.0 Download | NVIDIA Developer
`newgrp docker` has to be run from each shell you'll be using docker from?.. Until you restart.
docker run -d -p 80:80 docker/getting-started
`docker stop` accepts the full name (distracted_perlman), but part of its container_id works too!
The `COPY` instruction in a Dockerfile copies the contents of the directory, but not the directory itself! 1
journalctl
Logs take space (4 GB on my box!). To see how much journalctl specifically takes, and to clean it up: 2
journalctl --disk-usage
sudo journalctl --vacuum-time=3d
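There's also a size-based variant of the cleanup (standard journalctl flag, not from the article):
sudo journalctl --vacuum-size=500M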
New -> Terminal. (Which you can use to access your docker running jupyter-notebook)
$ docker build -t dt2test -f ./docker/Dockerfile .
passes the Dockerfile as an explicit parameter; inside it, paths are relative to the folder you run `docker build` in.
For docker compose:
# docker-compose.yml
version: '3.3'
services:
  yourservice:
    build:
      context: ./
      dockerfile: ./docker/yourservice/Dockerfile
A lot of other nice options at Docker: adding a file from a parent directory - Stack Overflow