In the middle of the desert you can say anything you want
When training on different GPUs on the same server, I get errors like RuntimeError: DataLoader worker (pid 30141) exited unexpectedly with exit code 1.
The fix was to set the number of workers to 0:
cfg.DATALOADER.NUM_WORKERS = 0
From SO:
[..] the only difference between mAP for object detection and instance segmentation is that when calculating overlaps between predictions and ground truths, one uses the pixel-wise IoU rather than bounding box IoU.
Finding an optimal cutoff point on a ROC curve is largely arbitrary (or rather "depends on what you need"). There are a lot of ways to find one. (Nice list here, but I'd see if I can find a paper with a good overview: data visualization - How to determine best cutoff point and its confidence interval using ROC curve in R? - Cross Validated)
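One common (still somewhat arbitrary) choice from that list is Youden's J statistic, maximizing TPR - FPR; a minimal sketch with sklearn and toy data:

```python
import numpy as np
from sklearn.metrics import roc_curve

# Toy data, purely for illustration
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
y_score = 0.5 * y_true + 0.7 * rng.random(200)

fpr, tpr, thresholds = roc_curve(y_true, y_score)
# Youden's J: the threshold that maximizes TPR - FPR
best_threshold = thresholds[np.argmax(tpr - fpr)]
print(best_threshold)
```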
Nice series of posts on how Detectron2 works inside: Digging into Detectron 2 — part 1 | by Hiroto Honda | Medium
The best way to build intuition about how your model performs is by looking at predictions that it was confident about but got wrong. With FiftyOne, this is easy. For example, let’s create a view into our dataset looking at the samples with the most false positives
More examples of the same: IoU a better detection evaluation metric | by Eric Hofesmann | Towards Data Science
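A minimal sketch of that FiftyOne flow (the dataset and field names are placeholders; assumes detections were evaluated with an eval_key so per-sample false-positive counts exist):

```python
import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.load_dataset("my-detections")  # placeholder name

# Stores per-sample eval_tp / eval_fp / eval_fn counts
dataset.evaluate_detections(
    "predictions", gt_field="ground_truth", eval_key="eval"
)

# Confident predictions only, samples with the most false positives first
view = (
    dataset
    .filter_labels("predictions", F("confidence") > 0.9)
    .sort_by("eval_fp", reverse=True)
)
fo.launch_app(view)
```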
In my text notes, I use indentation heavily, but use bullet-point-dashes (-) and just indentation almost interchangeably:
One two
Three
Four
Five
- six
- seven
- eight
Nine
Ten
- 12
- Thirteen
Next part
From now on: tensor.cpu().numpy() needs to be done when using a GPU.
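For context (a quick illustration; .numpy() only works on CPU tensors, so a CUDA tensor has to be copied to host memory first):

```python
import torch

t = torch.arange(4)
if torch.cuda.is_available():
    t = t.to("cuda")
    # t.numpy() would now raise something like:
    # TypeError: can't convert cuda:0 device type tensor to numpy.
    # Use Tensor.cpu() to copy the tensor to host memory first.

arr = t.cpu().numpy()  # works for both CPU and CUDA tensors
print(arr)
```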
Pasta with seafood in cream sauce, recipe – Italian cuisine: pasta and pizza. «Еда»
NVIDIA Nsight Systems | NVIDIA Developer
Found here (a nice article too): Object Detection from 9 FPS to 650 FPS in 6 Steps | paulbridger.com
Multiprocessing best practices — PyTorch 1.8.0 documentation
TL;DR: torch.multiprocessing is a drop-in replacement for Python's multiprocessing module.
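A minimal sketch of what that buys: tensors sent through torch.multiprocessing share their storage between processes:

```python
import torch
import torch.multiprocessing as mp

def worker(t):
    t.add_(1)  # in-place write, visible to the parent via shared memory

if __name__ == "__main__":
    t = torch.zeros(3)
    t.share_memory_()  # move the tensor's storage into shared memory
    p = mp.Process(target=worker, args=(t,))
    p.start()
    p.join()
    print(t)  # tensor([1., 1., 1.])
```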
If Detectron2 complains about wanting a GPU and finding no CUDA (because there’s none), the script can be set to CPU-only through the settings:
cfg.MODEL.DEVICE = 'cpu'
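In context, a sketch of the usual DefaultPredictor flow with that override (config merging and weights elided):

```python
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
# ... merge model config and weights here as usual ...
cfg.MODEL.DEVICE = 'cpu'  # run inference on the CPU
predictor = DefaultPredictor(cfg)
```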
I should read documentation more often: detectron2.structures — detectron2 0.3 documentation
category_3_detections = instances[instances.pred_classes == 3]
confident_detections = instances[instances.scores > 0.9]
In general about model outputs: Use Models — detectron2 0.3 documentation
mytensor.numpy() is unsurprisingly easy.
Shapely geometries can be processed into a state that supports more efficient batches of operations.
(The Shapely User Manual — Shapely 1.7.1 documentation)
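That's Shapely's prepared-geometry feature; a minimal sketch:

```python
from shapely.geometry import Point, Polygon
from shapely.prepared import prep

polygon = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])
prepared = prep(polygon)  # precomputed form, faster repeated predicates

points = [Point(x, y) for x in range(6) for y in range(6)]
inside = [p for p in points if prepared.contains(p)]
```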
if joined_boxes.geom_type == 'MultiPolygon':
is much cleaner than the isinstance(joined_boxes, MultiPolygon) I’ve been using!
Also - TODO - why is a Polygon that went into a MultiPolygon within() it, if within():
Returns True if the object's boundary and interior intersect only with the interior of the other (not its boundary or exterior).
Their boundaries should touch, so it shouldn't count as within()?
Nice (and one of the only..) graphic explanation: R-tree Spatial Indexing with Python – Geoff Boeing
Shapely has a partial implementation:
Pass a list of geometry objects to the STRtree constructor to create a spatial index that you can query with another geometric object. Query-only means that once created, the STRtree is immutable.
TL;DR:
tree = STRtree(all_geoms)
results = tree.query(query_geom)
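Fleshed out a little (a sketch; in Shapely 1.x, query() returns candidate geometries whose bounding boxes intersect the query, so exact predicates still need checking):

```python
from shapely.geometry import Point, Polygon
from shapely.strtree import STRtree

all_geoms = [Point(i, i).buffer(1) for i in range(10)]
tree = STRtree(all_geoms)  # immutable once built: query-only

query_geom = Polygon([(2, 0), (6, 0), (6, 4), (2, 4)])
candidates = tree.query(query_geom)  # bounding-box matches
hits = [g for g in candidates if g.intersects(query_geom)]
```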
In general, if I'll be working more with shapes, I should hang out in GIS places to absorb approaches and terminology. One of R-Tree's use-cases is, say, "find restaurants inside this block", which can also be solved by blind iteration (but shouldn't be).
Finally got the more familiar keybinding to work, as usual in config.py:
config.bind('<Ctrl-Shift-C>', 'yank selection')
config.bind(',y', 'yank selection')
johnnydep is really cool and visualizes the dependencies of something without installing them (but still downloads them!)
Found .local/share/Trash with 33GB of ..trash in it.
A .whl file is just an archive and can be unzipped. The entire list of dependencies is in yourpackage.dist-info/METADATA; it looks like this:
Requires-Python: >=3.6
Provides-Extra: all
Provides-Extra: dev
Requires-Dist: termcolor (>=1.1)
Requires-Dist: Pillow (>=7.1)
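Since it's just a zip, the standard library can read METADATA without unpacking; a sketch with placeholder file names:

```python
import zipfile

# A wheel is just a zip archive; read METADATA straight out of it.
# File and dist-info names below are placeholders.
with zipfile.ZipFile("yourpackage-1.0-py3-none-any.whl") as whl:
    metadata = whl.read("yourpackage-1.0.dist-info/METADATA").decode()

print(metadata)
```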
..exists, and in general I should pay more attention to the new python versions and their changes.
Ubuntu Manpage: tiffsplit - split a multi-image TIFF into single-image TIFF files
Installs as libtiff-tools; a basename can be used as the prefix for the output files.
When joining/adding two paths (as in discrete math union) located in different layers, the resulting path will be located in the layer selected when doing the joining.
Groups are recursive! Grouping two groups works; ungrouping them gives back the original two groups!
From Multiprocessing vs. Threading in Python: What Every Data Scientist Needs to Know
Processes: instances of a program being executed; don’t share memory space
Threads: components of a process that run in parallel; share memory, variables, code etc.
Race Condition: “A race condition occurs when multiple threads try to change the same variable simultaneously.” (Basically - when order of execution matters)
Starvation: “Starvation occurs when a thread is denied access to a particular resource for longer periods of time, and as a result, the overall program slows down.”
Deadlock: A deadlock is a state when a thread is waiting for another thread to release a lock, but that other thread needs a resource to finish that the first thread is holding onto.
Livelock: Livelock is when threads keep running in a loop but don’t make any progress.
In CPython, the Global Interpreter Lock (GIL) is a (mutex) mechanism to make sure that two threads don’t write in the same memory space.
Basically, “for any thread to perform any function, it must acquire a global lock. Only a single thread can acquire that lock at a time, which means the interpreter ultimately runs the instructions serially.” Therefore, Python multithreading cannot make use of multiple CPUs; multithreading doesn’t help for CPU-intensive tasks, but does where the bottleneck is elsewhere - user interaction, networking, etc. Multiprocessing is what works for CPU-bound tasks without such bottlenecks, like doing stuff with numbers.
Tensorflow uses threading for parallel data transformation; pytorch uses multiprocessing to do that on the CPU.
TODO - why does Tensorflow do that?
Python has two libraries, threading and multiprocessing, with very similar syntax.
Both pictures from the same article above:
From Python Multi-Threading vs Multi-Processing | by Furqan Butt | Towards Data Science:
Concurrency is essentially defined as handling a lot of work or different units of work of the same program at the same time.
Doing a lot of work of the same program at the same time to speed up the execution time.
Parallelism has a narrower meaning.
concurrent.futures for multithreading and multiprocessing
Multithreading:
import concurrent.futures
with concurrent.futures.ThreadPoolExecutor() as executor:
executor.map(function_name, iterable)
This submits a task for each element in iterable, executed by a pool of worker threads.
Multiprocessing works in an extremely similar way:
import concurrent.futures
with concurrent.futures.ProcessPoolExecutor() as executor:
executor.map(function_name, iterable)
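One detail both snippets gloss over: with processes, the if __name__ == "__main__" guard is needed on platforms that spawn rather than fork. A self-contained sketch:

```python
import concurrent.futures

def square(n):
    return n * n

if __name__ == "__main__":
    # swap in ThreadPoolExecutor for I/O-bound work
    with concurrent.futures.ProcessPoolExecutor() as executor:
        print(list(executor.map(square, range(10))))
```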
More about it, as usual, in the docs:
The asynchronous execution can be performed with threads, using ThreadPoolExecutor, or separate processes, using ProcessPoolExecutor. Both implement the same interface, which is defined by the abstract Executor class.
Does concurrent.futures have any tradeoffs compared to doing multiprocessing.Pool() like the following?
pool = multiprocessing.Pool()
pool.map(multiprocessing_func, range(1,10))
pool.close()
Parallelising Python with Threading and Multiprocessing | QuantStart has a nice point:
time python thread_test.py
real 0m2.003s
user 0m1.838s
sys 0m0.161s
Both user and sys approximately sum to the real time => no parallelization (in the general case). After they switch to multiprocessing with two processes, real time drops by half, while user/sys time stays the same: the total CPU time is unchanged, but it is now spread over two CPUs, so we get wall-clock-time benefits.
time output:
Excellent article, copying directly: Where’s your bottleneck? CPU time vs wallclock time
real: the wall clock time.
user: the process CPU time.
sys: the operating system CPU time due to system calls from the process.
In this case the wall clock time was higher than the CPU time, so that suggests the process spent a bunch of time waiting (58ms or so), rather than doing computation the whole time. What was it waiting for? Probably it was waiting for a network response from the DNS server at my ISP.
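The same split can be seen from inside Python (my illustration, not from the article): time.process_time() counts only CPU time, time.perf_counter() counts wall-clock time:

```python
import time

start_wall = time.perf_counter()
start_cpu = time.process_time()

sum(i * i for i in range(10_000_000))  # CPU-bound work
time.sleep(1)  # waiting: wall time passes, CPU time doesn't

wall = time.perf_counter() - start_wall
cpu = time.process_time() - start_cpu
print(f"wall: {wall:.2f}s  cpu: {cpu:.2f}s")  # wall is ~1s larger than cpu
```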
Important: If you have lots of processes running on the machine, those other processes will use some CPU.
Directly copypasting from the article above; “CPU” here is “CPU time” (so user in the output of the command), “second” is “real” (= wall, real-world) time.
If this is a single-threaded process: CPU/second can be at most 1.
If this is a multi-threaded process and your computer has N CPUs and at least N threads, CPU/second can be as high as N.
import threading

counter = 0

def increment():
    global counter
    counter += 1

def thread_task(lock):
    """Task for a thread: calls increment() 100000 times, holding the lock."""
    for _ in range(100000):
        lock.acquire()
        increment()
        lock.release()
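A driver for the above (my addition, not part of the original snippet) makes the race-condition point concrete - with the lock the final count is deterministic, without it increments can get lost:

```python
if __name__ == "__main__":
    lock = threading.Lock()
    threads = [threading.Thread(target=thread_task, args=(lock,))
               for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)  # 200000 on every run, thanks to the lock
```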
This is the script of the DOCTOR program for ELIZA: eliza/doctor.txt at master · wadetb/eliza
The -L option can be specified multiple times within the same command, every time with different ports.
Here’s an example:
ssh me@remote_server -L 8822:REMOTE_IP_1:22 -L 9922:REMOTE_IP_2:22
And an even better solution from there, adding this to ~/.ssh/config:
Host port-forwarding
Hostname remote_server
User me
LocalForward 6007 localhost:6007
LocalForward 6006 localhost:6006
Port 10000
and then just do ssh port-forwarding!
A list of all colors in latex supported via the various packages: color - Does anyone have a newrgbcolor{colourname}{x.x.x} list? - TeX - LaTeX Stack Exchange
Pressing <Ctrl-C> in a Terminal where jupyter-notebook is running will show a list of running kernels/notebooks, which will include the token:
1 active kernel
Jupyter Notebook 6.2.0 is running at:
http://localhost:6007/?token=3563b961b19ac50677d86a0952c821c2396c0255e97229bc
or http://127.0.0.1:6007/?token=3563b961b19ac50677d86a0952c821c2396c0255e97229bc
Nice description: Measuring Object Detection models - mAP - What is Mean Average Precision?
TL;DR: a way to uniformly calculate results of object detection over an entire dataset, accounting for different thresholds (“my 50% confidence is your 80%”). We find thresholds such that recall is 0.0, 0.1, …, 1.0, measure precision at each of those points, and take the mean.
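A sketch of that calculation for a single class (the classic PASCAL VOC 11-point interpolation, where precision at recall r is taken as the best precision at any recall >= r):

```python
import numpy as np

def eleven_point_ap(recalls, precisions):
    """11-point interpolated AP: average over r in 0.0, 0.1, ..., 1.0
    of the best precision achieved at any recall >= r."""
    ap = 0.0
    for r in np.linspace(0.0, 1.0, 11):
        mask = recalls >= r
        ap += precisions[mask].max() if mask.any() else 0.0
    return ap / 11

# recalls/precisions: numpy arrays accumulated over ranked detections;
# mAP is then the mean of this AP over all classes.
```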
A bit more details: rafaelpadilla/Object-Detection-Metrics: Most popular metrics used to evaluate object detection algorithms.