# serhii.net

In the middle of the desert you can say anything you want

• ### Day 1345 (06 Sep 2022)

#### Gimp open PDFs to clean them

Gimp can open PDFs, if you select “open pages as images” instead of the default “as layers”, it will open each page as a separate image.

Then you can use burn/levels/… to improve quality of the scan of the document printed with a printer that’s low on toner.

Also - Goddammit Gimp interface - was looking for the burn tool. It’s hidden behind “Smudge”, had to use right click on it to get the full list. Hate this

• ### Day 1344 (05 Sep 2022)

#### Omegaconf and python configs

OmegaConf is nice and has more features than YACS.

Merging (from the help)

conf = OmegaConf.merge(base_cfg, model_cfg, optimizer_cfg, dataset_cfg)


Bits I can' find explicitly documented anywhere:

OmegaConf.merge() takes the first argument as “base”, and its keys should be a superset of keys in the next one or it errors out (from omegaconf.errors import ConfigKeyError).

It casts arguments automatically, if first argument’s key is a Path and the second is a str the merged one will be a Path(str_from_second_argument), beautiful!

• ### Day 1343 (04 Sep 2022)

#### Setting up again Nextcloud, dav, freshRSS sync etc. for Android phone

New phone, need to set up again sync and friends to my VPS - I’ll document it this time.

This is part of the success story of “almost completely de-Google my life” that’s one of the better changes I ever did.

#### Taskwarrior better use of default values

Previously (Day 728 - serhii.net) I used things like this in my zshrc:

th () {task s project.not:w sprint.not:s "$*"}  Found a better way: ## TASKWARRIOR # All todos from both work and home TW_WORK="rc.default.project:w rc.default.command:s" TW_HOME="rc.default.project: rc.default.command:th" # "Important tasks" TW_I="rc.default.command:i" # Work alias s="task$TW_WORK"
# Home
alias t="task $TW_HOME" # All pending tasks from all projects alias ta="task rc.default.command:next" # "Important" tags - report i alias ti="task$TW_I"


This means: s runs taskwarrior and the s report, which shows work-only tasks; if I do s add whatever the task gets added automatically inside project:w.

For completeness, the code for each of these reports (~/.taskrc):

############
# REPORTS
############
report.s.columns=id,project,tags,due.relative,description
report.s.labels=ID,Project,T,D,Desc
#report.s.sort=due+
report.s.sort=project-/,urgency+
report.s.filter=status:pending  -s
report.s.filter=status:pending ((project:w -s) or (+o or +a or +ACTIVE))

report.i.description='Important / priority'
report.i.columns=id,project,tags,due.relative,description
report.i.labels=ID,Project,T,D,Desc
report.i.sort=project-/,urgency+
report.i.filter=status:pending (+o or +a or +ACTIVE)

report.th.columns=id,project,tags,due.relative,description
report.th.labels=ID,Project,T,D,Desc
report.th.sort=project-/,urgency+
report.th.filter=status:pending  -s
# report.th.filter=status:pending ((project.not:w project.not:l -srv -sd) or (+o or +a or +w or +ACTIVE))
report.th.filter=status:pending ((project.not:w project.not:l -srv -sd) or (+o or +a or +ACTIVE))

#Someday
report.sd.columns=id,start.age,depends,est,project,tags,sprint,recur,scheduled.countdown,due.relative,until.remaining,description,urgency
report.sd.labels=D,Active,Deps,E,Project,Tag,S,Recur,S,Due,Until,Description,Urg
report.sd.filter=status:pending (sprint:s or +sd)

# srv -- for continuously needed tasks like starting to work etc
report.srv.description='srv'
report.srv.columns=id,project,tags,pri,est,description,urgency
report.srv.labels=ID,Project,T,P,E,Description,U
report.srv.sort=urgency-
report.srv.filter=status:pending +srv

# Currently active task - for scripts
report.a.columns=id,description #,project
report.a.labels=ID,D #,P
report.a.filter=+ACTIVE

report.next.filter=status:pending -srv -sd

urgency.user.tag.o.coefficient=10
urgency.user.tag.a.coefficient=5
urgency.user.tag.w.coefficient=3


• ### Day 1324 (16 Aug 2022)

#### Spacy custom tokenizer rules

Problem: tokenizer adds trailing dots to the token in numbers, which I don’t want to. I also want it to split words separated by a dash. Also p.a. at the end of the sentences always became p.a.., the end-of-sentence period was glued to the token.

100,000,000.00, What-ever, p.a..

The default rules for various languages are fun to read:

German:

General for all languages: spaCy/char_classes.py at master · explosion/spaCy

nlp.tokenizer.explain() shows the rules matched when doing tokenization.

Docu about customizing tokenizers and adding special rules: Linguistic Features · spaCy Usage Documentation

Solution:

# Period at the end of line/token
trailing_period = r"\.\$"
new_suffixes = [trailing_period]
suffixes = list(pipeline.Defaults.suffixes) + new_suffixes
suffix_regex = spacy.util.compile_suffix_regex(suffixes)

# Add infix dash between words
bindestrich_infix = r"(?<=[{a}])-(?=[{a}])".format(a=ALPHA)
infixes = list(pipeline.Defaults.infixes)
infixes.append(bindestrich_infix)
infix_regex = compile_infix_regex(infixes)

# Add special rule for "p.a." with trailing period
# Usually two traling periods become a suffix and single-token "p.a.."
special_case = [{'ORTH': "p.a."}, {'ORTH': "."}]

pipeline.tokenizer.suffix_search = suffix_regex.search
pipeline.tokenizer.infix_finditer = infix_regex.finditer


The p.a.. was interesting - p.a. was an explicit special case for German, but the two trailing dots got parsed as SUFFIX for some reason (ty explain()). Still no idea why, but given that special rules override suffixes I added a special rule specifically for that case, p.a.. with two periods at the end, it worked.

• ### Day 1319 (11 Aug 2022)

#### Running modules with pdbpp in python

python3 -m pdb your_script.py is usual

For modules it’s unsurprisingly intuitive:

python3 -m pdb -m your.module.name


For commands etc:

python3 -m pdb -c 'until 320' -m your.module.name


## Globs

Similar to Unix shell ones but without special handling of path bits, identical otherwise, and much simpler than regex:

• * matches everything
• ? matches any single character
• [seq] matches any character in seq
• [!seq] matches any character not in seq

## Use case

I have a list of names, I allow the user to select one or more by providing either a single string or a glob and returning what matches.

First it was two parameters and “if both are passed X takes precedence, but if it doesn’t have matches then fallback is used …”.

Realized that a simple string is a glob matching itself - and I can use the same field for both simplifying A LOT. The users who don’t know about globs can just do strings and everything’s fine. Still unsure if it’s a good idea, but nice to have as option.

Then - OK, what happens if his string is an invalid glob? Will this lead to a “invalid regex” type of exception?

Well - couldn’t find info about this, in the source code globs are converted to regexes and I see no exceptions raised, and couldn’t provoke any errors myself.

Globs with only mismatched brackets etc. always match themselves , but the best one:

>>> fnmatch.filter(['aa]ab','bb'],"aa]*a[bc]")
['aa]ab']


It ignores the mismatched bracket while correctly interpreting the matched ones!

So - I just have to care that a “name” doesn’t happen to be a correctly formulated glob, like [this one].

1. If it’s a string and has a match, return that match
2. Anything else is a glob, warn about globs if glob doesn’t have a match either. (Maybe someone wants a name literally containing glob characters, name is not there but either they know about globs and know it’s invalid now, or they don’t know about them - since they seem to use glob special characters, now it’s a good time to find out)

#### Pycharm shelf and changelists and 'Unshelve silently'

So - shelves! Just found out a really neat way to use them

“Unshelve silently” - never used it and never cared, just now - misclick and I did. It put the content of the shelf in a separate changelist named like the shelf, without changing my active changelist.

This is neat!

One of my main uses for both changelists and shelves are “I need to apply this patch locally but don’t want to commit that”, and this basically automates this behaviour.

• ### Day 1318 (10 Aug 2022)

#### Huggingface utils ExplicitEnum python bits

In the Huggingface source found this bit:

class ExplicitEnum(str, Enum):
"""
Enum with more explicit error message for missing values.
"""

@classmethod
def _missing_(cls, value):
raise ValueError(
f"{value} is not a valid {cls.__name__}, please select one of {list(cls._value2member_map_.keys())}"
)


… wow?

(Pdb++) IntervalStrategy('epoch')
<IntervalStrategy.EPOCH: 'epoch'>
(Pdb++) IntervalStrategy('whatever')
*** ValueError: whatever is not a valid IntervalStrategy, please select one of ['no', 'steps', 'epoch']


Was MyEnum('something') allowed the whole time? God I feel stupid.

• ### Day 1317 (09 Aug 2022)

#### Python sorted sorting with multiple keys

So sorted()’s key= argument can return a tuple, then the tuple values are interpreted as multiple sorting keys!