In the middle of the desert you can say anything you want
A really nice Google Colab showing more advanced `Dataset` bits in addition to what’s on the label: Custom Named Entity Recognition with BERT.ipynb - Colaboratory
Pasting this example from there:
```python
import numpy as np
import torch
from torch.utils.data import Dataset


class dataset(Dataset):
    def __init__(self, dataframe, tokenizer, max_len):
        self.len = len(dataframe)
        self.data = dataframe
        self.tokenizer = tokenizer
        self.max_len = max_len

    def __getitem__(self, index):
        # step 1: get the sentence and word labels
        sentence = self.data.sentence[index].strip().split()
        word_labels = self.data.word_labels[index].split(",")

        # step 2: use tokenizer to encode sentence (includes padding/truncation up to max length)
        # BertTokenizerFast provides a handy "return_offsets_mapping" functionality for individual tokens
        encoding = self.tokenizer(sentence,
                                  is_split_into_words=True,  # named is_pretokenized in older transformers versions
                                  return_offsets_mapping=True,
                                  padding='max_length',
                                  truncation=True,
                                  max_length=self.max_len)

        # step 3: create token labels only for first word pieces of each tokenized word
        # (labels_to_ids is a {label: id} dict defined elsewhere in the notebook)
        labels = [labels_to_ids[label] for label in word_labels]
        # code based on https://huggingface.co/transformers/custom_datasets.html#tok-ner
        # create an array of -100 of length max_length
        encoded_labels = np.ones(len(encoding["offset_mapping"]), dtype=int) * -100

        # set only labels whose first offset position is 0 and the second is not 0
        i = 0
        for idx, mapping in enumerate(encoding["offset_mapping"]):
            if mapping[0] == 0 and mapping[1] != 0:
                # overwrite label
                encoded_labels[idx] = labels[i]
                i += 1

        # step 4: turn everything into PyTorch tensors
        item = {key: torch.as_tensor(val) for key, val in encoding.items()}
        item['labels'] = torch.as_tensor(encoded_labels)

        return item

    def __len__(self):
        return self.len
```
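A minimal sketch of how it’d be instantiated (toy data; note that `labels_to_ids` is read as a global inside `__getitem__`, so it has to exist beforehand):

```python
import pandas as pd
from transformers import BertTokenizerFast

labels_to_ids = {"O": 0, "B-PER": 1, "B-LOC": 2}  # toy label map, normally built from the dataset

df = pd.DataFrame({
    "sentence": ["John lives in Berlin"],
    "word_labels": ["B-PER,O,O,B-LOC"],
})
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

training_set = dataset(df, tokenizer, max_len=32)
item = training_set[0]
print(item["input_ids"].shape)  # torch.Size([32]), padded/truncated to max_len
print(item["labels"][:8])       # label ids on first word pieces, -100 everywhere else
```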
For aligning tokens, there’s Code To Align Annotations With Huggingface Tokenizers. It has a repo: LightTag/sequence-labeling-with-transformers: Examples for aligning, padding and batching sequence labeling data (NER) for use with pre-trained transformer models
Also the official tutorial (Token classification) has a function to do something similar:
```python
def tokenize_and_align_labels(examples):
    tokenized_inputs = tokenizer(examples["tokens"], truncation=True, is_split_into_words=True)

    labels = []
    for i, label in enumerate(examples["ner_tags"]):
        word_ids = tokenized_inputs.word_ids(batch_index=i)  # Map tokens to their respective word.
        previous_word_idx = None
        label_ids = []
        for word_idx in word_ids:  # Set the special tokens to -100.
            if word_idx is None:
                label_ids.append(-100)
            elif word_idx != previous_word_idx:  # Only label the first token of a given word.
                label_ids.append(label[word_idx])
            else:
                label_ids.append(-100)
            previous_word_idx = word_idx
        labels.append(label_ids)

    tokenized_inputs["labels"] = labels
    return tokenized_inputs
```
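In the tutorial this gets applied to the whole dataset with a batched `map`; roughly (dataset and checkpoint names as in that tutorial, and `word_ids()` requires a fast tokenizer):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

wnut = load_dataset("wnut_17")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# batched=True, so examples["tokens"] and examples["ner_tags"] arrive as lists of lists
tokenized_wnut = wnut.map(tokenize_and_align_labels, batched=True)
```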
`git rebase -i SHA_of_commit_to_delete^` drops you into the usual interactive screen; there you can change `pick` to `drop` in the first line (or any others) to just delete that commit.
Generally, On undoing, fixing, or removing commits in git seems like The README for that.
- `git branch -d some-branch` deletes a local branch
- `git push origin --delete some-branch` deletes a remote branch

(As usual, remembering that branches are pointers to commits.)
Changing the timeout delay for wrong logins on linux has a lot of details; in my case the TL;DR was:
- `/etc/pam.d/login`: change the number, in microseconds
- `/etc/pam.d/common-auth`: add `nodelay` to `auth [success=1 default=ignore] pam_unix.so nullok_secure nodelay`

The second one also works for everything inheriting from it, which is a lot.
debugging - I have a hardware detection problem, what logs do I need to look into? - Ask Ubuntu:
> Then, causing the problem to happen, and listing the system’s logs in reverse order of modification time: `ls -lrt /var/log`, `tail -n 25` on recently modified log files (for reasonable values of 25), and `dmesg`. Read, wonder, think, guess, test, repeat as needed.
Causing the problem and then looking at the recently modified logs is common sense but brilliant.
And saving `ls -lrt` as “list by modification time”: `-t` is “sort by modification time” and is easy to remember, and `-r` reverses the order so the freshest files end up at the bottom.
When debugging an issue I had with my monitor, I found a mention of `inxi`, which seems to colorfully output basic system (incl. hardware) info.
The post asked for `inxi -SMCGx`; inxi’s help told me `inxi -F` gives the fullest possible output.
Neat!
So, noisetorch says it’s potentially compromised: Release POTENTIAL COMPROMISE · noisetorch/NoiseTorch.
An improvement over the previous, more dramatic formulation: Community code review? · noisetorch/NoiseTorch@b4bb8e6
> This project is dead, i’ve failed you.
Thoughts and prayers (honestly! I loved it), with a heavy heart I keep looking.
Option 1: werman/noise-suppression-for-voice: Noise suppression plugin based on Xiph’s RNNoise
Reading how to install it made me very sad, kept looking.
Saw EasyEffects mentioned, but it runs on Pipewire.
TIL Pipewire is a Pulseaudio replacement.
Installed via this guide: How to install PipeWire on Ubuntu Linux - Linux Tutorials - Learn Linux Configuration
Installed and ran EasyEffects using flatpak:

```sh
flatpak install easyeffects
flatpak run com.github.wwmm.easyeffects
```
EasyEffects’ GUI looks awesome!
Had to choose another input source in pavucontrol; once the input is piped through it, the effect “Noise Reduction” works! Removes both keyboard and random background white noise.
You can even save the config as a preset and make it run automagically on startup!
TIL about `git bisect`. `git help bisect` for help.
TL;DR: uses binary search to find a commit that introduced a change. You run it, it gives you a commit, you tell it if it’s good or bad, and it keeps narrowing down the options.
`git bisect start` -> `git bisect good` -> `git bisect bad` -> `git bisect reset`
HF Datasets’ README links this nice google colab that explains the basics: HuggingFace datasets library - Overview - Colaboratory
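The flavor of it, as a sketch from memory (not copied from the colab):

```python
from datasets import load_dataset

ds = load_dataset("squad", split="train")
print(ds[0]["question"])  # examples behave like plain dicts
print(ds.features)        # typed schema of the columns
# transforms like filter/map are memory-mapped and cached on disk
short = ds.filter(lambda ex: len(ex["context"]) < 500)
```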
I use `# TODO`s for “do later”. If any exist, PyCharm asks me every time before committing whether I really want to.
I guess the idea is to use them to mark things to do before committing, so much smaller scale and here-and-now?
sanitize-filename · PyPI does what it says on the box.
It’s more complex than the simple replacing of `/` that I had in mind: sanitize_filename/sanitize_filename.py · master · jplusplus / sanitize-filename · GitLab
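Usage is a one-liner, if I read the package right (untested sketch):

```python
from sanitize_filename import sanitize

print(sanitize("my/unsafe:file*name?.txt"))  # strips characters invalid in filenames
print(sanitize("CON"))  # Windows-reserved names get handled too, per the source above
```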
And intuition tells me that using external semi-unknown libraries like this might be a security risk.
TODO: what is the best practice for user-provided values that might become filenames?.. Something not smelling of either injection vulns or dependency vulns?