Posts

    Day 864

    zsh bracketed paste (don’t run command in terminal when pasting)

    Stop terminal auto executing when pasting a command - Ask Ubuntu:

    • If you copy a newline symbol at the end of whatever you are copying, it gets executed as expected
    • bracketed paste (enabled by default on zsh) disables this behaviour
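
    A rough Python sketch of the mechanism, for illustration only: with bracketed paste on, the terminal wraps pasted text in the escape sequences ESC[200~ and ESC[201~, so the shell can tell a pasted newline from a typed Enter. The function names are made up for this example and are not part of zsh or readline.

```python
PASTE_START = "\x1b[200~"  # sent by the terminal before pasted text
PASTE_END = "\x1b[201~"    # sent by the terminal after pasted text


def wrap_paste(text):
    """What a terminal sends for a paste when bracketed paste mode is on."""
    return PASTE_START + text + PASTE_END


def split_input(stream):
    """Split raw input into typed text and a list of pasted chunks."""
    typed, pasted = [], []
    rest = stream
    while PASTE_START in rest:
        before, _, rest = rest.partition(PASTE_START)
        inside, _, rest = rest.partition(PASTE_END)
        typed.append(before)
        pasted.append(inside)
    typed.append(rest)
    return "".join(typed), pasted


# "ls" typed, then a paste ending in a newline, then a real Enter
typed, pasted = split_input("ls" + wrap_paste("echo hi\n") + "\n")
print(repr(typed))
print(pasted)
```

    Only the newline outside the markers counts as Enter; the pasted one stays literal text until I confirm it.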

    Had unset zle_bracketed_paste in zsh config, likely needed for athame that I don’t use. Removed it, works now.

    To enable in bash,

    echo "set enable-bracketed-paste on" >> ~/.inputrc
    

    I should make an eventual list of dotfiles I use for all remote servers, this will go there 100%.

    Docker COPY copies contents, not directory

    Docker COPY copies contents, not directory

    kitty hint for IPs + python non-capturing (unnamed?) groups

    Added these to kitty config! One for IPs, second IPs+ports:

    map kitty_mod+n>i kitten hints --type regex --regex [0-9]+(?:\.[0-9]+){3} --program @
    map kitty_mod+n>p kitten hints --type regex --regex [0-9]+(?:\.[0-9]+){3}:[0-9]+ --program @
    

    Glad I can still read and understand regexes. The above highlight more than needed, but seems to be kitty’s problem.

    In Python, a group that starts with ?: is a non-capturing group (= not returned in .groups()). In kitty (which uses Python regex syntax), only what’s inside the first capturing group is copied; making the group non-capturing makes it copy the entire match. 1
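
    A quick check of the difference in plain Python, using the same IP pattern as the kitty config. The sample text is made up.

```python
import re

text = "connect to 192.168.0.17:8080 now"

# Same pattern twice: once with a capturing group, once with (?:...)
capturing = re.search(r"[0-9]+(\.[0-9]+){3}", text)
non_capturing = re.search(r"[0-9]+(?:\.[0-9]+){3}", text)

# group(0) is the whole match in both cases
print(capturing.group(0), capturing.groups())
print(non_capturing.group(0), non_capturing.groups())
```

    The capturing variant reports only the last repetition of the group in .groups(), which is probably why kitty copied just a fragment before I added ?:.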


    Day 863

    Remapping a Thinkpad T580 Fn key to Ctrl

    The location of the Fn key on the laptop keyboard is absolutely idiotic and I hate it. Fn keys are usually handled by the hardware and ergo unusable. Now that I have to use the keyboard more, thought I have nothing to lose and tried xev and oh what a wonderful world it gets read as XF86WakeUp! Therefore it can be remapped to something more sensible. … like the Ctrl key it should be.

    Easiest way for me was adding this to autostart:

    xcape -e 'XF86WakeUp=Control_L' -d &
    

    No side effects of the other xcape command xcape -e 'Control_L=Escape' -t 100, it seems to be considered a different Control_L key and clicking it fast doesn’t produce Escape.


    Day 862

    Disable touchpad

    xinput set-prop 13 340 1, where 13 comes from xinput -list

    Dockerfile RUN a lot of commands

    It’s possible to do this instead of prefixing each command with RUN:

    RUN apt-get update && \
        # install base packages
        apt-get install -y -qq apt-utils aptitude wget curl zip unzip sudo kmod git && \
        /usr/bin/python3 -m pip install --upgrade pip
    

    Day 861

    kitty hints

    Changed the hint I most often use to a better binding:

    # Copy url
    # map kitty_mod+n>c kitten hints --type path --program @
    map kitty_mod+g kitten hints --type path --program @
    

    Timewarrior

    • w track 1728 tag1 automatically ends it now.
    • w continue just continues the last thing running by starting something identical starting “now” and continuing till stopped.

    kitty kittens

    kitty autocompletion

    In zshrc:

    autoload -Uz compinit
    compinit
    # Completion for kitty
    kitty + complete setup zsh | source /dev/stdin
    

    kitty scrollback pager

    From Feature Request: Ability to select text with the keyboard (vim-like) · Issue #719 · kovidgoyal/kitty · GitHub:

    scrollback_pager vim - -c 'w! /tmp/kitty_scrollback' -c 'term ++curwin cat /tmp/kitty_scrollback'
    

    Vim 8.0 works. Nice colorful etc.

    zsh vim mode timeout

    Zsh Vi Mode:

    Adding this allows to register the <Esc> key in 0.01 sec instead of the default 0.4 (KEYTIMEOUT is measured in hundredths of a second).

    export KEYTIMEOUT=1
    

    A well-documented vimrc

    A Good Vimrc - TODO

    I also love his design!

    zsh vim mode with objects!

    GitHub - softmoth/zsh-vim-mode: Friendly bindings for ZSH’s vi mode

    Out of all the various vim plugins, this is the only one I found that allows to meaningfully work with objects, like ci' etc. Also the mode indicator works very reliably.

    Doesn’t conflict with zsh-evil-registers.

    English / random

    • “expect and require”

    Day 860

    Qutebrowser crashing - again

    Ubuntu 18.04, qutebrowser etc, as usual. What helped was creating the environment with these options:

    python3 scripts/mkvenv.py --pyqt-version 5.14
    

    jq | less zsh alias

    Should’ve done this a long time ago:

    lq() {
        # -C forces colored output, less -R renders the color codes instead of printing them raw
        jq -C . "$1" | less -R
    }
    

    kitty terminal copy url

    From config; I should use them more.

    # Select a filename and copy it 
    map kitty_mod+p>c kitten hints --type path --program @
    #: Select a line of text and insert it into the terminal.
    map kitty_mod+p>o kitten hints --type line --program -
    

    update-alternatives & installing another gcc

    Nicely described: How to switch between multiple GCC and G++ compiler versions on Ubuntu 20.04 LTS Focal Fossa - LinuxConfig.org

    # install stuff
    $ sudo apt -y install gcc-7 g++-7 gcc-8 g++-8 gcc-9 g++-9
    # Add it to update-alternatives
    sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-7 7
    sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-7 7
    sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 8
    sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-8 8
    sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-9 9
    sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-9 9
    
    # choose the default one
    $ sudo update-alternatives --config gcc
    There are 3 choices for the alternative gcc (providing /usr/bin/gcc).
    
      Selection    Path            Priority   Status
    ------------------------------------------------------------
      0            /usr/bin/gcc-9   9         auto mode
      1            /usr/bin/gcc-7   7         manual mode
    * 2            /usr/bin/gcc-8   8         manual mode
      3            /usr/bin/gcc-9   9         manual mode
    Press <enter> to keep the current choice[*], or type selection number:
    

    From the docs: --install link name path priority
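
    A tiny sketch of what the priority argument is for: in auto mode the alternative with the highest priority is selected, which matches the gcc-9 "auto mode" line in the table above. The dictionary and the function name are illustrative only and not part of update-alternatives.

```python
# path -> priority, as registered with `update-alternatives --install`
alternatives = {
    "/usr/bin/gcc-7": 7,
    "/usr/bin/gcc-8": 8,
    "/usr/bin/gcc-9": 9,
}


def auto_choice(candidates):
    """Pick the path with the highest priority, like auto mode does."""
    return max(candidates, key=candidates.get)


print(auto_choice(alternatives))
```

    Manual mode (what --config sets) simply overrides this choice until switched back to auto.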

    Python pip

    Editable installations (pip install -e .) are a thing. TODO - learn more about them.

    Qutebrowser config - adding bindings for tabs 20-30

    Given that the standard ones are not enough for me, and even my additional ones for 10-20 are not enough, added a third level:

    config.bind('1', 'tab-focus 1')
    config.bind('2', 'tab-focus 2')
    config.bind('3', 'tab-focus 3')
    config.bind('4', 'tab-focus 4')
    config.bind('5', 'tab-focus 5')
    config.bind('6', 'tab-focus 6')
    config.bind('7', 'tab-focus 7')
    config.bind('8', 'tab-focus 8')
    config.bind('9', 'tab-focus 9')
    config.bind('0', 'tab-focus 10')
    config.bind('<Alt-1>', 'tab-focus 11')
    config.bind('<Alt-2>', 'tab-focus 12')
    config.bind('<Alt-3>', 'tab-focus 13')
    config.bind('<Alt-4>', 'tab-focus 14')
    config.bind('<Alt-5>', 'tab-focus 15')
    config.bind('<Alt-6>', 'tab-focus 16')
    config.bind('<Alt-7>', 'tab-focus 17')
    config.bind('<Alt-8>', 'tab-focus 18')
    config.bind('<Alt-9>', 'tab-focus 19')
    config.bind('<Alt-0>', 'tab-focus 20')
    config.bind('<Alt-Ctrl-1>', 'tab-focus 21')
    config.bind('<Alt-Ctrl-2>', 'tab-focus 22')
    config.bind('<Alt-Ctrl-3>', 'tab-focus 23')
    config.bind('<Alt-Ctrl-4>', 'tab-focus 24')
    config.bind('<Alt-Ctrl-5>', 'tab-focus 25')
    config.bind('<Alt-Ctrl-6>', 'tab-focus 26')
    config.bind('<Alt-Ctrl-7>', 'tab-focus 27')
    config.bind('<Alt-Ctrl-8>', 'tab-focus 28')
    config.bind('<Alt-Ctrl-9>', 'tab-focus 29')
    config.bind('<Alt-Ctrl-0>', 'tab-focus -1')
    

    EDIT: Actually, to think of it, in for a penny, in for a pound!

    for i in range(30, 60):
        config.bind(','+str(i), 'tab-focus '+str(i))
    

    Takes about 9 seconds to :config-source everything, but then works like a charm! And doesn’t seem to make anything else slower (strangely, even startup is as usual).
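
    In the same spirit, the thirty hand-written bindings could be generated too. A sketch, assuming it ends up in qutebrowser's config.py where config exists; the helper name tab_bindings is made up, and the print stands in for config.bind so the snippet runs on its own.

```python
def tab_bindings():
    """Return (key, command) pairs for digits, Alt+digits and Alt+Ctrl+digits."""
    pairs = []
    for offset, template in ((0, "{}"), (10, "<Alt-{}>"), (20, "<Alt-Ctrl-{}>")):
        for digit in list(range(1, 10)) + [0]:
            # the 0 key stands for the tenth tab of each group
            tab = offset + (digit if digit else 10)
            pairs.append((template.format(digit), "tab-focus {}".format(tab)))
    return pairs


for key, command in tab_bindings():
    print(key, "->", command)  # in config.py: config.bind(key, command)
```

    One difference from my list: there <Alt-Ctrl-0> is bound to tab-focus -1, while this sketch gives it tab 30.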

    pycharm can parse markdown!

    Opened a README.md, and see it being rendered nicely to the left. I can also edit it directly. Wow.

    Website with references / cheat sheets for a lot of CLI programs

    sed Cheat Sheet - very down-to-earth, “praxisnah”, I like it. Except for the idiotic scrolling override animations

    jq basics - again

    jq Cheat Sheet

    • I should use ' for the filter, " for any string elements inside it

    • select
      • Get full record if it matches something
      • jq '.results[] | select(.name == "John") | {age}' # Get age for 'John'
    • Value VS key-value
      • jq '.something' gets the content of the field something, without the key
      • jq '. | {something}' gets key-value of something
      • Sample:
        $ jq '. | select(.tokens[0]=="Tel") | .tokens[]' mvs.json
        "Tel"
        ":"
        $ jq '. | select(.tokens[0]=="Tel") | .tokens' mvs.json
        [
        "Tel",
        ":"
        ]
        $ jq '. | select(.tokens[0]=="Tel") | {tokens}' mvs.json
        {
        "tokens": [
          "Tel",
          ":"
        ]
        }
        
    • |keys to extract keys only
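
    For comparison, roughly the same select and value-vs-key-value distinction in Python. The sample records are made up; only the shape (a results list with name and age) follows the jq filter above.

```python
import json

data = json.loads(
    '{"results": [{"name": "John", "age": 31}, {"name": "Ann", "age": 27}]}'
)

# jq '.results[] | select(.name == "John")' -> full matching records
matches = [r for r in data["results"] if r["name"] == "John"]

# jq '... | .age' -> value only
ages = [r["age"] for r in matches]

# jq '... | {age}' -> key-value pair
age_objects = [{"age": r["age"]} for r in matches]

# jq 'keys' -> sorted list of keys
keys = sorted(data["results"][0])

print(matches, ages, age_objects, keys)
```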

    jq Cheet Sheet · GitHub is also nice. TIL that you don’t need jq '. | keys'; jq 'keys' etc. is enough.

    • 'del(.tokens)' to delete a key
    • Indexing works like in Python, say jq '.[-2:]'
    • 'sort_by(.foo)'

    I think now I’m ready for the holy of holies: jq 1.4 Manual

    • {user, title: .titles[]} will output one {user, title} object for each value inside .titles[]!
    • Putting ()s around an expression means it’ll be evaluated. {(.user): .titles} will use the value of the key user!
      $  jq '. | {(.id): .id}' mvs.json
      {
      "7574": "7574"
      }
      
    • Putting values inside strings with \(foo)
      $ echo "[1,2,3]" | jq '"A string \(.)"'
      "A string [1,2,3]"
      

      It’s basically synonymous to python3’s f"My f-{string}"

    • '.a=23' will produce an output with .a being set to 23. Will be created if not there.
      • No “change” is being done, the actual value is the same; .a in the same filter after a comma will still return the old value.
    • |= will “update” the value by running its previous value through the expression:
      $ echo '{"one": 23,"two":2}' | jq '.one|=(. | tostring)'
      {
      "one": "23",
      "two": 2
      }
      
    • slurp mode (--slurp / -s) - instead of handling each input object separately, read them all into one list of objects! For more ‘correct’ json.
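
    The update and interpolation bullets, mirrored in Python as a rough analogy (jq semantics are richer, this only covers the two examples above). json.dumps plays the role of tostring for the number here; for values that are already strings jq's tostring leaves them alone, which json.dumps would not.

```python
import json

record = json.loads('{"one": 23, "two": 2}')

# jq '.one |= tostring': build a new object, the input stays untouched
updated = {**record, "one": json.dumps(record["one"])}

# jq '"A string \(.)"': put a JSON value inside a string
message = f"A string {json.dumps([1, 2, 3], separators=(',', ':'))}"

print(updated)
print(message)
print(record)  # still has the number 23
```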

    Python JSON parser + jq compact mode

    Python’s JSON parser didn’t read the jq-generated multi-line output without commas between items, but jq compact mode writes one record per line (without commas and not as part of an array), and this gets parsed correctly when read line by line!

    JQ compact mode is jq -c '.' sth.json

    Before:

    {
      "id": "7575",
      "ner_tags": [
        "6",
        "6"
      ],
      "tokens": [
        "Tel",
        ":"
      ]
    }
    

    After:

    {"id":"7575","ner_tags":["6","6"],"tokens":["Tel",":"]}
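
    A minimal sketch of reading such compact output in Python, one json.loads call per line. The helper name is made up, and the second record is hypothetical, added only so there is more than one line to parse.

```python
import json


def read_json_lines(text):
    """Parse one JSON document per non-empty line (the shape jq -c produces)."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


compact = (
    '{"id":"7575","ner_tags":["6","6"],"tokens":["Tel",":"]}\n'
    '{"id":"7576","ner_tags":["0"],"tokens":["Fax"]}\n'  # hypothetical record
)

records = read_json_lines(compact)
print(len(records), records[0]["tokens"])
```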
    

    Linux - creating a directory accessible to multiple users via a group

    How to Create a Shared Directory for All Users in Linux

    # Create the group
    $ sudo groupadd project
    # Add user to this group
    $ sudo usermod -a -G project theuser
    # Change the group of the directory
    $ sudo chgrp -R project /var/www/reports/
    # Turn on the `setGID` bit, so newly created subfiles inherit the same group as the directory
    # And set the permissions to rwxrwsr-x (2775)
    $ sudo chmod -R 2775 /var/www/reports/
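
    To double-check what 2775 stands for, Python's stat module can render the mode the way ls -l would. Just a sanity check, nothing here touches the filesystem.

```python
import stat

mode = 0o2775  # the octal value passed to chmod above

# Render it as a directory mode string, like `ls -l` shows it
print(stat.filemode(stat.S_IFDIR | mode))

# The leading 2 is the setGID bit
print(bool(mode & stat.S_ISGID))
```

    The s in the group triplet is the setGID bit combined with group execute.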
    

    Day 856

    Presenting stuff

    “Which story do you want to tell?” (Heard at work, from R)

    Git get commit message from file

    git commit -F filename allows to use a pre-written commit message from a textfile.


    Day 855

    i3 scratchpads magic!

    You can ‘mark’ windows 1, a la vim, and then use that as filter - no window classes etc needed - for example, for scratchpads! 2

    So now I have two scratchpads in i3 config:

    bindsym $ms+Shift+plus mark "scratch2", move scratchpad
    bindsym $ms+plus [con_mark="scratch2"]  scratchpad show
    
    bindsym $ms+Shift+minus mark "scratch", move scratchpad
    bindsym $ms+minus [con_mark="scratch"]  scratchpad show
    

    The second one originally was meant to be for Ding, but it’s really nice to have it flexible.


    Day 854

    English

    Reading “German: An Essential Grammar” by Donaldson found this bit: 1

    English has a rule that if the time of an event that occurred in the past is mentioned, then the imperfect must be used, but if the time is omitted, the perfect is required, e.g.

    • He returned from Hamburg yesterday.
    • He has returned from Hamburg.
    • He has returned from Hamburg yesterday. (not grammatical)

    TIL.

    zsh detach and disown

    zsh-specific - to detach & disown a process, there’s &!: 2

    dolphin &!
    

    German / Deutsch

    Long question and answer about fahren zu/nach/in/…: Richtungen und Ziele

    German FSI language courses

    The Yojik Website has the FSI courses (FSI Languages Courses), and the website is as I remember it.

    Taskwarrior

    Changed ~/.taskrc to show any active tasks regardless of anything else in my sprint view:

    s () {task s \(project:w or \(sprint:$SPRINT \(+A or +O\)\) or +ACTIVE\) "$*"}
    

    Turn off screen/monitor with xset

    Standard lock command leaves both monitors on.

    Reddit3 mentioned two commands:

    xset s activate
    xset dpms force off
    

    The second one worked for me!

    Now I have shiny new screen lock (and suspend too, while we are at it) keybinding in i3 config!

    bindsym $ms+n exec gnome-screensaver-command -l && xset dpms force off
    bindsym $ms+Shift+n exec i3lock -i ~/s/black_lock.png -t -p win -e && systemctl suspend -i
    

    Day 853

    Nvidia Docker images

    Nvidia has a repo of all docker images it creates, one of them: Torch | NVIDIA NGC

    German

    “Das finde ich zielführender als…” - heard at work

    Docker - automatically assign a free port

    docker run --name frontend -p 0:80 frontend:latest 1

    Port 0 gets passed to the kernel that assigns any free port.

    To see which one, docker port somecontainer.
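
    The same port-0 trick works for any socket, not just Docker. A small Python sketch that asks the kernel for a free port; the function name is made up.

```python
import socket


def pick_free_port(host="127.0.0.1"):
    """Bind to port 0 and return the port number the kernel picked."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


port = pick_free_port()
print(port)
```

    The socket is closed again right away, so another process could grab the port before I reuse it; Docker avoids that by keeping the binding itself.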

    Docker run container on specific GPU

    docker run --gpus device=3 -e NVIDIA_VISIBLE_DEVICES=0 -e CUDA_VISIBLE_DEVICES=0 myservice
    

    Where the device=3 is the GPU id on the host that we want to use.


    Day 850

    grep ignore case

    lspci | grep -i "nvidia"

    -i == ‘ignore case’ is actually something that I can remember.

    Docker (stop) autostart of container

    Docker will autostart any container with a RestartPolicy of ‘always’ when the docker service initially starts. 1

    I can set/unset it in kitematic, or through terminal:

    docker update --restart=no my-container
    

    apt-get purge, remove --autoremove etc

    Quoting SO: 2

        apt purge --auto-remove <packagename>
    

    purges packagename and any packages which are rendered unnecessary by its removal, as well as any other packages which aren’t necessary.

        apt autoremove --purge
    

    purges any packages which aren’t necessary (marked as “automatically installed” and with no dependent packages).

    The first form is what you’d use when manipulating individual packages; the latter is a clean-up operation across all packages.

    Ways to clean up with apt-get - tutorial

    This seems nice, TODO: Cleaning up with apt-get | Network World

    Backing up LVM disk encryption keys

    LVM - Debian Wiki is nice and readable. I used this command to backup the headers:

     sudo cryptsetup luksHeaderBackup /dev/nvmeXXXXX   --header-backup-file headerBackupFile
    

    … and put it somewhere not on the drive I’ll be recovering if it all goes wrong.

    Setting up Tensorflow and CUDA with an eGPU

    Aaaand the saga continues!

    …since the GPU is an eGPU, apparently I do need to do the harder way: Accelerating Machine Learning on a Linux Laptop with an External GPU | NVIDIA Developer Blog

    Getting the eGPU detected

    It is, I can see it:

    (17:42:42/10815)~/$ lspci | grep -i VGA
    00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 620 (rev 07)
    0c:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] (rev a1)
    

    but if it wasn’t, I’d authorize it and check with boltctl list:

    (17:43:13/10817)~/$ boltctl list
    [...]
     ● GIGABYTE GV-N1070IXEB-8GD
       ├─ type:          peripheral
       ├─ name:          GV-N1070IXEB-8GD
       ├─ vendor:        GIGABYTE
       ├─ uuid:          # redacted
       ├─ status:        authorized
       │  ├─ domain:     domain0
       │  └─ authflags:  none
       ├─ authorized:    Do 29 Apr 2021 07:57:37 UTC
       ├─ connected:     Do 29 Apr 2021 07:57:37 UTC
       └─ stored:        no
    

    How to setup an eGPU on Ubuntu for TensorFlow describes other things that can go wrong:

    I had to disable the following, otherwise my eGPU was not detected:

    • Secure Boot
    • Thunderbolt Security Level

    From this point on, I follow Nvidia’s tutorial 3 unless stated otherwise.

    Purging, cleaning up old broken install attempts, updating and upgrading

    Using quotes means the * doesn’t have to be escaped.

    sudo apt-get purge "nvidia*"
    

    This is a fuller example: 4

    sudo rm /etc/apt/sources.list.d/cuda*
    sudo apt remove --autoremove nvidia-cuda-toolkit
    sudo apt remove --autoremove nvidia-*
    

    Found and manually removed /etc/apt/sources.list.d/graphics-drivers-ubuntu-ppa-bionic.list, leaving the .save file in place.

    As per nvidia’s guide,

    sudo apt-get update
    sudo apt-get dist-upgrade
    

    To be safe, rebooted.

    Downloading the correct drivers

    The existing driver is most likely Nouveau, an open-source driver for NVIDIA GPUs. Because Nouveau doesn’t support eGPU setups, install the NVIDIA CUDA and NVIDIA drivers instead. You must also stop the kernel from loading Nouveau. 3

    okay!

    Change of plan - what is NVIDIA data-science-stack?

    Found this: NVIDIA/data-science-stack: NVIDIA Data Science stack tools Read about it here: Ubuntu for machine learning with NVIDIA RAPIDS in 10 min | Ubuntu

    Official by nvidia, and seems to do automatically what’s needed for supported systems. Let’s run a script from the internet that installs drivers, loads kernel modules etc.

    Source is available, yay for open source: data-science-stack/data-science-stack at master · NVIDIA/data-science-stack

    Ran ./data-science-stack setup-system - uses sudo, didn’t ask for root or anything.

    Seems to have installed nvidia driver version 460. Asked to reboot at the end.

    Rebooted.

    (18:40:30/10909)~/$ nvidia-smi
    NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
    

    okay. Same results I had. Confirms that my prev. steps weren’t wronger than the script.

    (18:41:49/10910)~/$ sudo apt list --installed | grep "\(cuda\|nvidia\)"
    
    WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
    
    libnccl2/unknown,now 2.9.6-1+cuda11.3 amd64 [installed]
    libnvidia-cfg1-460/unknown,now 460.73.01-0ubuntu1 amd64 [installed,automatic]
    libnvidia-common-460/unknown,now 460.73.01-0ubuntu1 all [installed,automatic]
    libnvidia-compute-460/unknown,now 460.73.01-0ubuntu1 amd64 [installed,automatic]
    libnvidia-container-tools/bionic,now 1.4.0-1 amd64 [installed,automatic]
    libnvidia-container1/bionic,now 1.4.0-1 amd64 [installed,automatic]
    libnvidia-decode-460/unknown,now 460.73.01-0ubuntu1 amd64 [installed,automatic]
    libnvidia-encode-460/unknown,now 460.73.01-0ubuntu1 amd64 [installed,automatic]
    libnvidia-extra-460/unknown,now 460.73.01-0ubuntu1 amd64 [installed,automatic]
    libnvidia-fbc1-460/unknown,now 460.73.01-0ubuntu1 amd64 [installed,automatic]
    libnvidia-gl-460/unknown,now 460.73.01-0ubuntu1 amd64 [installed,automatic]
    libnvidia-ifr1-460/unknown,now 460.73.01-0ubuntu1 amd64 [installed,automatic]
    nvidia-compute-utils-460/unknown,now 460.73.01-0ubuntu1 amd64 [installed,automatic]
    nvidia-container-runtime/bionic,now 3.5.0-1 amd64 [installed,automatic]
    nvidia-container-toolkit/bionic,now 1.5.0-1 amd64 [installed,automatic]
    nvidia-dkms-460/unknown,now 460.73.01-0ubuntu1 amd64 [installed,automatic]
    nvidia-docker2/bionic,now 2.6.0-1 all [installed]
    nvidia-driver-460/unknown,now 460.73.01-0ubuntu1 amd64 [installed]
    nvidia-kernel-common-460/unknown,now 460.73.01-0ubuntu1 amd64 [installed,automatic]
    nvidia-kernel-source-460/unknown,now 460.73.01-0ubuntu1 amd64 [installed,automatic]
    nvidia-prime/bionic-updates,bionic-updates,now 0.8.16~0.18.04.1 all [installed,automatic]
    nvidia-settings/unknown,unknown,now 465.19.01-0ubuntu1 amd64 [installed,automatic]
    nvidia-utils-460/unknown,now 460.73.01-0ubuntu1 amd64 [installed,automatic]
    xserver-xorg-video-nvidia-460/unknown,now 460.73.01-0ubuntu1 amd64 [installed,automatic]
    

    Also, as usual,

    (18:48:34/10919)~/$ lsmod | grep nvi
    (18:48:37/10920)~/$
    

    lspci -k shows the kernel modules:

    0c:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] (rev a1)
            Subsystem: Gigabyte Technology Co., Ltd GP104 [GeForce GTX 1070]
            Kernel modules: nvidiafb, nouveau
    

    This output implies no nvidia driver is installed on my system5. …though it is.
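
    What is missing in that output is a "Kernel driver in use:" line, which lspci -k prints when a driver is actually bound to the device. A small sketch that looks for it; the function name is made up, and the sample is the output from above.

```python
def driver_in_use(lspci_k_block):
    """Return the bound kernel driver from one `lspci -k` device block, or None."""
    for line in lspci_k_block.splitlines():
        key, _, value = line.strip().partition(":")
        if key == "Kernel driver in use":
            return value.strip()
    return None


sample = """0c:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] (rev a1)
        Subsystem: Gigabyte Technology Co., Ltd GP104 [GeForce GTX 1070]
        Kernel modules: nvidiafb, nouveau
"""

print(driver_in_use(sample))
```

    "Kernel modules" only lists modules that could handle the device, not what is loaded for it.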

    $ nvidia-settings --version
    nvidia-settings:  version 465.19.01
    

    software-properties-gtk tells me I’m using the proprietary nvidia-driver-460, not 465

    In any case, can’t blacklist nouveau as still there are no ubuntu kernel modules.

    BUT!

    (19:04:04/10946)~/$ dkms status
    nvidia, 460.73.01: added
    

    Also, inxi -Fxxxrz (found somewhere on the internet):

    Graphics:  Card-1: Intel UHD Graphics 620 bus-ID: 00:02.0 chip-ID: 8086:5917
               Card-2: NVIDIA GP104 [GeForce GTX 1070] bus-ID: 0c:00.0 chip-ID: 10de:1b81
               Display Server: x11 (X.Org 1.19.6 ) drivers: modesetting,nvidia (unloaded: fbdev,vesa,nouveau)
    

    So it sees them as there and loaded? Does dkms somehow bypass lsmod etc?

    sudo dkms autoinstall should autoinstall all added drivers, …let’s hope for the best I guess.

    (19:11:47/10958)~/$ sudo dkms autoinstall
    
    Kernel preparation unnecessary for this kernel.  Skipping...
    applying patch disable_fstack-clash-protection_fcf-protection.patch...patching file Kbuild
    Hunk #1 succeeded at 85 (offset 14 lines).
    
    
    Building module:
    cleaning build area...
    unset ARCH; [ ! -h /usr/bin/cc ] && export CC=/usr/bin/gcc; env NV_VERBOSE=1 'make' -j8 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.4.0-72-generic IGNORE_XEN_PRESENCE=1 IGNORE_CC_MISMATCH=1 SYSSRC=/lib/modules/5.4.0-72-generic/build LD=/usr/bin/ld.bfd modules......(bad exit status: 2)
    ERROR: Cannot create report: [Errno 17] File exists: '/var/crash/nvidia-dkms-460.0.crash'
    Error! Bad return status for module build on kernel: 5.4.0-72-generic (x86_64)
    Consult /var/lib/dkms/nvidia/460.73.01/build/make.log for more information.
    

    The file is long; the key part seems to be:

     scripts/Makefile.build:269: recipe for target '/var/lib/dkms/nvidia/460.73.01/build/nvidia/nv.o' failed
     make[2]: *** [/var/lib/dkms/nvidia/460.73.01/build/nvidia/nv.o] Error 1
     Makefile:1754: recipe for target '/var/lib/dkms/nvidia/460.73.01/build' failed
     make[1]: *** [/var/lib/dkms/nvidia/460.73.01/build] Error 2
     make[1]: Leaving directory '/usr/src/linux-headers-5.4.0-72-generic'
     Makefile:80: recipe for target 'modules' failed
     make: *** [modules] Error 2
    DKMSKernelVersion: 5.4.0-72-generic
    Date: Fri Apr 30 18:30:45 2021
    DuplicateSignature: dkms:nvidia-dkms-460:460.73.01-0ubuntu1:/var/lib/dkms/nvidia/460.73.01/build/conftest/functions.h:11:2: error: #error acpi_walk_namespace() conftest failed!
    Package: nvidia-dkms-460 460.73.01-0ubuntu1
    PackageVersion: 460.73.01-0ubuntu1
    SourcePackage: nvidia-graphics-drivers-460
    Title: nvidia-dkms-460 460.73.01-0ubuntu1: nvidia kernel module failed to build
    

    Smells like a driver/kernel support issue?

    First result when googling dkms nvidia 460 is this: Can’t get nvidia 460 module to build on Ubuntu 20.04 to support two A100s - GPU Unix Graphics / Linux - NVIDIA Developer Forums

    Please check if the build symlink to the headers for dkms exists:

    ls /lib/modules/$(uname -r)/build
    

    Otherwise, create it

    ln -s /usr/src/linux-headers-$(uname -r)  /lib/modules/$(uname -r)/build
    

    Didn’t have it, created it, trying again, same error, deleted the previous log, full output is:

    (19:19:54/10967)~/$ sudo dkms autoinstall
    
    Kernel preparation unnecessary for this kernel.  Skipping...
    applying patch disable_fstack-clash-protection_fcf-protection.patch...patching file Kbuild
    Hunk #1 succeeded at 85 (offset 14 lines).
    
    
    Building module:
    cleaning build area...
    unset ARCH; [ ! -h /usr/bin/cc ] && export CC=/usr/bin/gcc; env NV_VERBOSE=1 'make' -j8 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.4.0-72-generic IGNORE_XEN_PRESENCE=1 IGNORE_CC_MISMATCH=1 SYSSRC=/lib/modules/5.4.0-72-generic/build LD=/usr/bin/ld.bfd modules.......(bad exit status: 2)
    Error! Bad return status for module build on kernel: 5.4.0-72-generic (x86_64)
    Consult /var/lib/dkms/nvidia/460.73.01/build/make.log for more information.
    

    The file is full of what looks like syntax errors..?

    This charming Chinese website seems to imply the gcc version is to blame: NVIDIA driver error: NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure t_sazass's blog - CSDN blog

    (19:22:39/10974)~/$ cat /proc/version
    Linux version 5.4.0-72-generic (buildd@lgw01-amd64-021) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #80~18.04.1-Ubuntu SMP Mon Apr 12 23:26:25 UTC 2021
    
    sudo apt install gcc-8
    sudo update-alternatives --config gcc
    sudo update-alternatives --remove-all gcc
    sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 10
    sudo update-alternatives --install /usr/bin/cc cc /usr/bin/gcc-8 10
    

    Let’s retry dkms autoinstall:

    (19:26:03/10981)~/$ sudo dkms autoinstall
    
    Kernel preparation unnecessary for this kernel.  Skipping...
    applying patch disable_fstack-clash-protection_fcf-protection.patch...patching file Kbuild
    Hunk #1 succeeded at 85 (offset 14 lines).
    
    
    Building module:
    cleaning build area...
    unset ARCH; [ ! -h /usr/bin/cc ] && export CC=/usr/bin/gcc; env NV_VERBOSE=1 'make' -j8 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.4.0-72-generic IGNORE_XEN_PRESENCE=1 IGNORE_CC_MISMATCH=1 SYSSRC=/lib/modules/5.4.0-72-generic/build LD=/usr/bin/ld.bfd modules...............
    Signing module:
     - /var/lib/dkms/nvidia/460.73.01/5.4.0-72-generic/x86_64/module/nvidia-modeset.ko
     - /var/lib/dkms/nvidia/460.73.01/5.4.0-72-generic/x86_64/module/nvidia.ko
     - /var/lib/dkms/nvidia/460.73.01/5.4.0-72-generic/x86_64/module/nvidia-uvm.ko
     - /var/lib/dkms/nvidia/460.73.01/5.4.0-72-generic/x86_64/module/nvidia-drm.ko
    Secure Boot not enabled on this system.
    cleaning build area...
    
    DKMS: build completed.
    
    nvidia.ko:
    Running module version sanity check.
     - Original module
       - No original module exists within this kernel
     - Installation
       - Installing to /lib/modules/5.4.0-72-generic/updates/dkms/
    
    nvidia-modeset.ko:
    Running module version sanity check.
     - Original module
       - No original module exists within this kernel
     - Installation
       - Installing to /lib/modules/5.4.0-72-generic/updates/dkms/
    
    nvidia-drm.ko:
    Running module version sanity check.
     - Original module
       - No original module exists within this kernel
     - Installation
       - Installing to /lib/modules/5.4.0-72-generic/updates/dkms/
    
    nvidia-uvm.ko:
    Running module version sanity check.
     - Original module
       - No original module exists within this kernel
     - Installation
       - Installing to /lib/modules/5.4.0-72-generic/updates/dkms/
    
    depmod...
    
    DKMS: install completed.
    

    WOW. WOOOOOW. WOOOOOOOOOOOOOOOOOOOOOO

    Without even restarting, after the first command my screen flashed and changed resolution a bit, BUT THEN IT WORKED

    (19:34:17/10983)~/$ nvidia-smi
    No devices were found
    (19:34:20/10984)~/$ nvidia-smi
    Fri Apr 30 19:34:22 2021
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 460.73.01    Driver Version: 460.73.01    CUDA Version: 11.2     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 1070    On   | 00000000:0C:00.0 Off |                  N/A |
    |  0%   54C    P0    37W / 151W |      7MiB /  8119MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
    

    All these attempts failed because dkms couldn’t build the nvidia module: with the old gcc version the build died with what looked like syntax errors, and switching to gcc-8 fixed it.

    What could I have done differently? Why at no point did I see errors about the kernel module failing to build, where should I have looked for them? And why syntax errors instead of something checking the used gcc version and loudly failing when there was a mismatch? Why is that Chinese website the only place I found this fix?
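
    One place where the failure was visible all along is the dkms status output from earlier, where the module was only "added". A sketch of a check for that; the function name is made up, and the format of the "installed" line is an assumption on my side (the post only shows the "added" form).

```python
def unbuilt_modules(dkms_status_output):
    """Return (module, version) pairs whose dkms state is not 'installed'."""
    pending = []
    for line in dkms_status_output.splitlines():
        if ":" not in line:
            continue
        fields, state = line.rsplit(":", 1)
        if state.strip().startswith("installed"):
            continue
        parts = [p.strip() for p in fields.split(",")]
        if len(parts) >= 2:
            pending.append((parts[0], parts[1]))
    return pending


before_fix = "nvidia, 460.73.01: added\n"
# assumed shape of the line once the module is built and installed
after_fix = "nvidia, 460.73.01, 5.4.0-72-generic, x86_64: installed\n"

print(unbuilt_modules(before_fix))
print(unbuilt_modules(after_fix))
```

    Anything this returns is a module that dkms knows about but never managed to build, and the matching make.log is the next place to look.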

    (19:42:57/10995)~/$ lsmod | grep nvidia
    nvidia_uvm           1015808  0
    nvidia_drm             57344  1
    nvidia_modeset       1228800  1 nvidia_drm
    nvidia              34123776  17 nvidia_uvm,nvidia_modeset
    drm_kms_helper        188416  2 nvidia_drm,i915
    drm                   491520  15 drm_kms_helper,nvidia_drm,i915
    

    Now let’s hope this survives a restart. And that it works when the eGPU is disconnected.

    NVIDIA data-science-stack

    Following the readme, ran both options in separate terminals:

    ./data-science-stack list
    ./data-science-stack build-container
    ./data-science-stack run-container
    

    and

    ./data-science-stack list
    ./data-science-stack build-conda-env
    ./data-science-stack run-jupyter
    

    The latter seems to be installing CUDA and friends on my computer - didn’t expect it, but I need them either way I think, I guess I’ll let the script handle everything since it started. It installed conda to ~/conda/, but again, not sure what I was expecting

    Both running for 20+ minutes now

    EDIT: ~/conda/ took 20 GB, filling up my drive and blocking everything; deleted it

    In the docker container with jupyterlab, tensorflow can’t access the GPU, but pytorch can.

    Carrying on with setting the eGPU up

    The NVIDIA eGPU tutorial3 continues with offloading Xorg to the GPU - do I want this? Can I use the GPU just for training, and leave Xorg running on the internal one? I probably don’t want the offloading.

    Restarting and testing

    As I remember from the last time, X doesn’t start when the GPU is connected at boot but everything’s fine when it gets connected after starting X. When it’s connected, it seems the driver gets loaded and nvidia-smi etc works. That the system works without the eGPU attached is nice! Plug-and-play is nice too.

    Installed pytorch in a virtualenv, for cuda 11.1, test snippet says cuda works!

    import torch
    x = torch.rand(5, 3)  # random 5x3 tensor, shows torch itself works
    print(x)
    
    print(torch.cuda.is_available())  # True if torch can use CUDA
    

    Tensorflow:

    >>> import tensorflow as tf
    2021-04-30 21:36:12.984883: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
    >>> tf.debugging.set_log_device_placement(True)
    >>> a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    2021-04-30 21:36:23.055614: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
    2021-04-30 21:36:23.058062: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
    2021-04-30 21:36:23.115366: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2021-04-30 21:36:23.116510: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
    pciBusID: 0000:0c:00.0 name: GeForce GTX 1070 computeCapability: 6.1
    coreClock: 1.721GHz coreCount: 15 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 238.66GiB/s
    2021-04-30 21:36:23.116553: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
    2021-04-30 21:36:23.119974: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
    2021-04-30 21:36:23.120034: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
    2021-04-30 21:36:23.121503: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
    2021-04-30 21:36:23.121842: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
    2021-04-30 21:36:23.125037: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
    2021-04-30 21:36:23.125803: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
    2021-04-30 21:36:23.125980: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
    2021-04-30 21:36:23.125996: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
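
The warnings in a log like this one can be pulled out mechanically. A minimal sketch, assuming the message format stays exactly as in the dso_loader lines above; the function name is made up and this is not a TensorFlow API.

```python
import re

# Matches the dso_loader warning format seen in the log above.
MISSING_LIB = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_libraries(log_text):
    """Return library names reported as not loadable, in order, without duplicates."""
    seen = []
    for name in MISSING_LIB.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


# Two lines copied from the log above: one success, one failure.
sample = (
    "2021-04-30 21:36:23.125803: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] "
    "Successfully opened dynamic library libcusparse.so.11\n"
    "2021-04-30 21:36:23.125980: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] "
    "Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: "
    "cannot open shared object file: No such file or directory\n"
)

print(missing_libraries(sample))  # ['libcudnn.so.8']
```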
    

    Which libcudnn?

    Tensorflow’s tutorial (GPU support  |  TensorFlow) does this:

    # Install development and runtime libraries (~4GB)
    sudo apt-get install --no-install-recommends \
        cuda-11-0 \
        libcudnn8=8.0.4.30-1+cuda11.0  \
        libcudnn8-dev=8.0.4.30-1+cuda11.0
    

    What is the version for CUDA 11.2? cuDNN Archive | NVIDIA Developer has download links. The one for 11.2 is called “cudnn-11.2-linux-x64-v8.1.1.33.tgz”. I plug those versions in, they exist and install fine:

    sudo apt-get install   libcudnn8=8.1.1.33-1+cuda11.2
    sudo apt-get install   libcudnn8-dev=8.1.1.33-1+cuda11.2
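
The "plug those versions in" step as a sketch: derive the apt pins from the archive file name. Both the archive naming pattern and the `<cudnn>-1+cuda<cuda>` version layout are assumptions generalised from the examples in this post, and the libcudnn8 package names are hard-coded. Whether a given pin actually exists in the repository still has to be checked (e.g. with apt-cache madison libcudnn8).

```python
import re

# Assumed archive naming, generalised from the single file name quoted above.
ARCHIVE = re.compile(
    r"cudnn-(?P<cuda>\d+\.\d+)-linux-x64-v(?P<cudnn>\d+(?:\.\d+)+)\.tgz"
)


def apt_pins(archive_name, packages=("libcudnn8", "libcudnn8-dev")):
    """Turn a cuDNN archive file name into apt 'package=version' pins."""
    match = ARCHIVE.fullmatch(archive_name)
    if match is None:
        raise ValueError(f"unrecognised archive name: {archive_name!r}")
    # Version layout copied from the pins used in this post.
    version = f"{match['cudnn']}-1+cuda{match['cuda']}"
    return [f"{package}={version}" for package in packages]


print(apt_pins("cudnn-11.2-linux-x64-v8.1.1.33.tgz"))
# ['libcudnn8=8.1.1.33-1+cuda11.2', 'libcudnn8-dev=8.1.1.33-1+cuda11.2']
```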
    

    And tensorflow now works!

    2021-04-30 21:42:46.176942: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7440 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:0c:00.0, compute capability: 6.1)
    

    I can’t believe it but wow. It’s finished, it works, X didn’t die, plug-and-play works, no manual driver loading.

    All in all, including all the failed attempts, took 5:30h of pure time, according to my time tracking.

    The only wrinkle is that X doesn’t start when turning the computer on with the eGPU attached, but I can 100% live with that!

    GPU benchmarking linux

    How to Benchmark your GPU on Linux has a fun quote:

    This tool is very old, very basic and only tests a small portion of today’s OpenGL capabilities. Back in the old days, it was used to determine if the proprietary driver was installed and running properly as open-source drivers were performing awfully enough to be perfectly noticeable during this test. Nowadays, you won’t notice any difference between the two

    qutebrowser open a private window

    Added this to config.py:

    config.bind('<Alt-P>', 'set-cmd-text -s :open -p ')
    

    Managing dotfiles with machine-specific configuration

    Qutebrowser import other config files

    Seen in someone’s config.py on gitlab6:

    import glob
    import os
    for f in glob.glob(str(config.configdir / 'conf.d/*.py')):
        config.source(str(os.path.relpath(f, start=config.configdir)))
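
To see what that loop hands to config.source without starting qutebrowser, here is a standalone sketch with pathlib. The function name and the file names are hypothetical. Sorting is my addition: glob returns files in arbitrary order, and for config fragments a fixed load order seems safer.

```python
import tempfile
from pathlib import Path


def conf_d_sources(configdir):
    """Paths of conf.d/*.py relative to configdir, in sorted order."""
    configdir = Path(configdir)
    return [
        str(path.relative_to(configdir))
        for path in sorted(configdir.glob("conf.d/*.py"))
    ]


# Demo on a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    conf_d = Path(tmp) / "conf.d"
    conf_d.mkdir()
    for name in ("20-bindings.py", "10-colors.py", "notes.txt"):
        (conf_d / name).write_text("")
    # Only the .py files are listed, 10-colors.py before 20-bindings.py.
    print(conf_d_sources(tmp))
```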
    

    Random i3 configs

    Nice examples: i3_config/settings.d at master · kiddico/i3_config · GitHub

    i3 doesn’t have any kind of include directive in the config files, sadly. i3 - Source/import file from i3wm config - Stack Overflow is one option:

    bindsym $mod+Shift+c exec "cat ~/.config/i3/colors ~/.config/i3/base > ~/.config/i3/config && i3-msg reload"
    

    A keybinding that rebuilds the config file from its parts and then reloads i3.
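
The same idea as a Python sketch, in case the list of fragments grows. The function name and the fragment contents are made up. Unlike cat ... > config, which truncates the target before anything is written, this writes to a sibling file first and then renames it over the config.

```python
import tempfile
from pathlib import Path


def build_config(fragments, target):
    """Concatenate fragment files, in the given order, into target."""
    text = "".join(Path(fragment).read_text() for fragment in fragments)
    target = Path(target)
    # Write next to the target and rename, so the live config is never left truncated.
    staging = target.with_name(target.name + ".tmp")
    staging.write_text(text)
    staging.replace(target)
    return target


# Demo with throwaway files standing in for ~/.config/i3/colors and ~/.config/i3/base.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "colors").write_text("set $bg #000000\n")
    (root / "base").write_text("bindsym $mod+Return exec i3-sensible-terminal\n")
    config = build_config([root / "colors", root / "base"], root / "config")
    print(config.read_text(), end="")
```

After building, i3-msg reload picks up the new file, as in the keybinding above.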

    To read - life hacking

    This looks very interesting, I shouldn’t forget to go through this: Life Hacking. His blog with personal examples: Alex Vermeer — Life-Hacking. Climbing. Striving for awesome. Coffee. — Page 2

    A non-pdf description of Life Areas with questions and metrics for each.

    (He’s the same guy who created the awesome How to Get Motivated: A Guide for Defeating Procrastination poster!)

    And let’s remember the classic: Evidence-based advice on how to be successful in any job - 80,000 Hours

    Detach process completely from terminal

    Two options I like:7

    • nohup cmd &
    • cmd & disown

    I feel one of these will become part of many aliases of mine.

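    And a short bash function from the same place. It relies on a run_disowned helper, which I assume is a plain "$@" & disown wrapper:

    function run_disowned() {
        # assumed helper: run the command in the background and disown it
        "$@" & disown
    }

    function dos() {
        # run_disowned and silenced
        run_disowned "$@" 1>/dev/null 2>/dev/null
    }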
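
A rough Python analogue of the two options above, for when the process is started from a script instead of a shell. It is a sketch, not an exact equivalent: start_new_session=True makes the child call setsid() (POSIX only), so it leaves the terminal's session, and all three standard streams go to /dev/null. The function name is made up.

```python
import subprocess
import sys


def run_detached(argv):
    """Start argv in its own session with standard streams silenced; don't wait for it."""
    return subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # POSIX only: setsid() in the child
    )


# Demo: launch a trivial child process and show its PID.
proc = run_detached([sys.executable, "-c", "pass"])
print(proc.pid)
```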
    

    Day 849

    Day 848

    Installing CUDA and pytorch and tensorflow

    Followed this: CUDA 10.1 installation on Ubuntu 18.04 LTS | Medium. Nope, errors.

    Day 847

    Docker stuff

    • Making it run as non-root: Post-installation steps for Linux | Docker Documentation
      • newgrp docker apparently has to be run in every shell you’ll be using docker from, until you restart
    • Best tutorial ever can be started with: docker run -d -p 80:80 docker/getting-started
      • It starts as a docker container
      • Very readable and step-by-step
    • Docker compose
    • Random: docker stop accepts the full name (distracted_perlman), but a part of the container ID works too!
    • Unintuitively, the COPY instruction from a Dockerfile copies the contents of the directory, but not the directory itself! 1
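
A rough analogy for the COPY behaviour in Python (this is not how Docker implements it, and the directory names are made up): shutil.copytree with dirs_exist_ok=True also puts the contents of the source directory into the destination, without creating a directory named after the source inside it.

```python
import shutil
import tempfile
from pathlib import Path


def copy_like_dockerfile_copy(src, dst):
    """Copy the contents of src into dst (not src itself) and list what ended up in dst."""
    shutil.copytree(src, dst, dirs_exist_ok=True)  # needs Python 3.8+
    return sorted(p.relative_to(dst).as_posix() for p in Path(dst).rglob("*"))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src = root / "app"         # hypothetical source directory
    dst = root / "image_dest"  # stands in for the destination path in the image
    (src / "sub").mkdir(parents=True)
    (src / "main.py").write_text("print('hi')\n")
    (src / "sub" / "util.py").write_text("")

    # Like `COPY app /image_dest/`: there is no image_dest/app afterwards.
    print(copy_like_dockerfile_copy(src, dst))
    # ['main.py', 'sub', 'sub/util.py']
```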

    Day 843

    Python dataclasses

    Day 842

    Jira old issue view + qutebrowser config setting

    To redirect an issue to the old view, add ?oldIssueView=true.
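
A small sketch of doing that to a URL programmatically, keeping whatever query parameters are already there. The host and issue key in the example are placeholders.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_old_issue_view(url):
    """Return url with oldIssueView=true set, keeping other query parameters."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "oldIssueView"  # avoid adding the parameter twice
    ]
    query.append(("oldIssueView", "true"))
    return urlunsplit(parts._replace(query=urlencode(query)))


# example.atlassian.net and PROJ-123 are placeholders.
print(with_old_issue_view("https://example.atlassian.net/browse/PROJ-123"))
# https://example.atlassian.net/browse/PROJ-123?oldIssueView=true
```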

    Day 841

    Deutsch / German

    “Meetingtourismus oder Papiergenerieren?” (“Meeting tourism or generating paper?”, heard at work)

    Day 840

    Patterns / phrases / Random

    • “It’s not a solution, but it’s an approach” - heard at work, VF

    Day 839

    vim delete all lines not matching pattern

    I’ll memorize the g/... syntax someday.

    Day 838

    Pizza sauce recipes

    I should try doing something more interesting with the passata di pomodoro!

    Day 836

    Deutsch

    die Kaffeesatzleserei - reading coffee grounds, i.e. pure guesswork, like reading tea leaves (heard at work)

