In the middle of the desert you can say anything you want
Gimp can open PDFs. If you select “open pages as images” instead of the default “as layers”, it opens each page as a separate image.
Then you can use burn/levels/… to improve the quality of a scan of a document that was printed on a printer low on toner.
Also - Goddammit Gimp interface - was looking for the burn tool. It’s hidden behind “Smudge”, had to right-click on it to get the full list. Hate this.
OmegaConf is nice and has more features than YACS.
Merging (from the help)
conf = OmegaConf.merge(base_cfg, model_cfg, optimizer_cfg, dataset_cfg)
Bits I can’t find explicitly documented anywhere:
OmegaConf.merge() takes the first argument as “base”, and its keys should be a superset of the keys in the following ones or it errors out with ConfigKeyError (from omegaconf.errors import ConfigKeyError). As far as I can tell this applies to structured (typed) or struct-mode base configs; a plain dict config just gains the new keys.
It casts values automatically: if the first argument’s key is a Path and the second’s is a str, the merged value will be Path(str_from_second_argument), beautiful!
New phone, need to set up sync and friends to my VPS again - I’ll document it this time.
This is part of the success story of “almost completely de-Google my life”, which is one of the better changes I’ve ever made.
Goal: separate commands running separate taskwarrior reports/filters. But also usable to add tasks etc.
Previously (Day 728 - serhii.net) I used things like this in my zshrc:
th () {task s project.not:w sprint.not:s "$*"}
Found a better way:
## TASKWARRIOR
# All todos from both work and home
TW_WORK="rc.default.project:w rc.default.command:s"
TW_HOME="rc.default.project: rc.default.command:th"
# "Important tasks"
TW_I="rc.default.command:i"
# Work
alias s="task $TW_WORK"
# Home
alias t="task $TW_HOME"
# All pending tasks from all projects
alias ta="task rc.default.command:next"
# "Important" tags - report `i`
alias ti="task $TW_I"
This means: s runs taskwarrior and the s report, which shows work-only tasks; if I do s add whatever, the task gets added automatically inside project:w.
For completeness, the code for each of these reports (~/.taskrc):
############
# REPORTS
############
report.s.description='Work tasks'
report.s.columns=id,project,tags,due.relative,description
report.s.labels=ID,Project,T,D,Desc
#report.s.sort=due+
report.s.sort=project-/,urgency+
report.s.filter=status:pending -s
report.s.filter=status:pending ((project:w -s) or (+o or +a or +ACTIVE))
report.i.description='Important / priority'
report.i.columns=id,project,tags,due.relative,description
report.i.labels=ID,Project,T,D,Desc
report.i.sort=project-/,urgency+
report.i.filter=status:pending (+o or +a or +ACTIVE)
report.th.description='Home tasks'
report.th.columns=id,project,tags,due.relative,description
report.th.labels=ID,Project,T,D,Desc
report.th.sort=project-/,urgency+
report.th.filter=status:pending -s
# report.th.filter=status:pending ((project.not:w project.not:l -srv -sd) or (+o or +a or +w or +ACTIVE))
report.th.filter=status:pending ((project.not:w project.not:l -srv -sd) or (+o or +a or +ACTIVE))
#Someday
report.sd.columns=id,start.age,depends,est,project,tags,sprint,recur,scheduled.countdown,due.relative,until.remaining,description,urgency
report.sd.labels=D,Active,Deps,E,Project,Tag,S,Recur,S,Due,Until,Description,Urg
report.sd.filter=status:pending (sprint:s or +sd)
# srv -- for continuously needed tasks like starting to work etc
report.srv.description='srv'
report.srv.columns=id,project,tags,pri,est,description,urgency
report.srv.labels=ID,Project,T,P,E,Description,U
report.srv.sort=urgency-
report.srv.filter=status:pending +srv
# Currently active task - for scripts
report.a.description='Currently active task'
report.a.columns=id,description #,project
report.a.labels=ID,D #,P
report.a.filter=+ACTIVE
report.next.filter=status:pending -srv -sd
urgency.user.tag.o.coefficient=10
urgency.user.tag.a.coefficient=5
urgency.user.tag.w.coefficient=3
Problem: the tokenizer keeps the trailing period attached to number tokens, which I don’t want. I also want it to split words separated by a dash. Also, p.a. at the end of a sentence always became p.a.., with the end-of-sentence period glued to the token.
Examples: 100,000,000.00, What-ever, p.a..
The default rules for various languages are fun to read:
German:
General for all languages: spaCy/char_classes.py at master · explosion/spaCy
nlp.tokenizer.explain() shows the rules matched when doing tokenization.
Docu about customizing tokenizers and adding special rules: Linguistic Features · spaCy Usage Documentation
Solution:
from spacy.lang.char_classes import ALPHA
from spacy.util import compile_infix_regex, compile_suffix_regex
# `pipeline` is an already loaded spaCy pipeline (the result of spacy.load(...))
# Period at the end of line/token
trailing_period = r"\.$"
new_suffixes = [trailing_period]
suffixes = list(pipeline.Defaults.suffixes) + new_suffixes
suffix_regex = compile_suffix_regex(suffixes)
# Add infix dash between words
bindestrich_infix = r"(?<=[{a}])-(?=[{a}])".format(a=ALPHA)
infixes = list(pipeline.Defaults.infixes)
infixes.append(bindestrich_infix)
infix_regex = compile_infix_regex(infixes)
# Add special rule for "p.a." with trailing period
# Usually two trailing periods become a suffix and single-token "p.a.."
special_case = [{'ORTH': "p.a."}, {'ORTH': "."}]
pipeline.tokenizer.add_special_case("p.a..", special_case)
# Activate the new suffix and infix rules on the tokenizer
pipeline.tokenizer.suffix_search = suffix_regex.search
pipeline.tokenizer.infix_finditer = infix_regex.finditer
The p.a.. was interesting - p.a. was an explicit special case for German, but the two trailing dots got parsed as SUFFIX for some reason (thanks, explain()). Still no idea why, but given that special rules override suffixes, I added a special rule specifically for that case, p.a.. with two periods at the end, and it worked.
python3 -m pdb your_script.py is the usual way.
For modules it’s unsurprisingly intuitive:
python3 -m pdb -m your.module.name
For commands etc:
python3 -m pdb -c 'until 320' -m your.module.name
fnmatch — Unix filename pattern matching — Python 3.10.6 documentation:
Similar to Unix shell ones but without special handling of path bits, identical otherwise, and much simpler than regex:
* matches everything
? matches any single character
[seq] matches any character in seq
[!seq] matches any character not in seq
I have a list of names, and I let the user select one or more of them by providing either a single string or a glob; whatever matches gets returned.
First it was two parameters and “if both are passed X takes precedence, but if it doesn’t have matches then fallback is used …”.
Realized that a simple string is a glob matching itself - and I can use the same field for both, simplifying A LOT. The users who don’t know about globs can just use strings and everything’s fine. Still unsure if it’s a good idea, but nice to have as an option.
Then - OK, what happens if their string is an invalid glob? Will this lead to an “invalid regex” type of exception?
Well - couldn’t find info about this. In the source code globs are converted to regexes and I see no exceptions raised, and I couldn’t provoke any errors myself.
Globs with only mismatched brackets etc. always match themselves, but the best one:
>>> fnmatch.filter(['aa]ab','bb'],"aa]*a[bc]")
['aa]ab']
It ignores the mismatched bracket while correctly interpreting the matched ones!
So - I just have to take care that a “name” doesn’t happen to be a correctly formulated glob, like [this one].
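A minimal sketch of the behaviour described above, using only the standard library. The names in the list and the select() helper are made up for illustration; only fnmatch.filter() is the real API.

```python
import fnmatch

# Hypothetical list of selectable names, including some awkward ones
names = ["train", "train_small", "test", "aa]ab", "a[b", "[x]", "x"]


def select(pattern, candidates):
    """Return every candidate matching the pattern (plain string or glob)."""
    return fnmatch.filter(candidates, pattern)


# A plain string is a glob that matches only itself
plain = select("train", names)

# A real glob can match several names
globbed = select("train*", names)

# Mismatched brackets are treated as literal characters, no exception raised
stray_closing = select("aa]*", names)
stray_opening = select("a[b", names)

# The caveat: "[x]" is a valid glob for the single character "x",
# so it selects "x" and NOT the name that is literally spelled "[x]"
valid_glob = select("[x]", names)

print(plain, globbed, stray_closing, stray_opening, valid_glob)
```

If names that look like valid globs ever become a real concern, fnmatch has no escape helper as far as I know; wrapping the special characters in brackets by hand (e.g. "[[]x]") should work, but I have not needed it.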
So - shelves! Just found out a really neat way to use them
“Unshelve silently” - never used it and never cared, just now - misclick and I did. It put the content of the shelf in a separate changelist named like the shelf, without changing my active changelist.
This is neat!
One of my main uses for both changelists and shelves is “I need to apply this patch locally but don’t want to commit that”, and this basically automates this behaviour.
In the Huggingface source found this bit:
class ExplicitEnum(str, Enum):
    """
    Enum with more explicit error message for missing values.
    """

    @classmethod
    def _missing_(cls, value):
        raise ValueError(
            f"{value} is not a valid {cls.__name__}, please select one of {list(cls._value2member_map_.keys())}"
        )
… wow?
(Pdb++) IntervalStrategy('epoch')
<IntervalStrategy.EPOCH: 'epoch'>
(Pdb++) IntervalStrategy('whatever')
*** ValueError: whatever is not a valid IntervalStrategy, please select one of ['no', 'steps', 'epoch']
Was MyEnum('something') allowed the whole time? God I feel stupid.
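And yes, it was: calling an Enum class with a value looks up the member with that value, while square brackets look it up by name. A small self-contained sketch of the same pattern; the Color enum is a made-up example, not something from the Huggingface code:

```python
from enum import Enum


class Color(str, Enum):
    """Hypothetical enum, used only to illustrate lookup by value."""

    RED = "red"
    GREEN = "green"

    @classmethod
    def _missing_(cls, value):
        # Called by Enum when no member has the requested value
        raise ValueError(
            f"{value!r} is not a valid {cls.__name__}, "
            f"please select one of {[member.value for member in cls]}"
        )


by_value = Color("red")   # lookup by value
by_name = Color["RED"]    # lookup by name

try:
    Color("blue")
except ValueError as err:
    message = str(err)

print(by_value, by_name, message)
```

Because of the str mixin, members also compare equal to their plain string values, so Color.RED == "red" is True.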
So the function passed as sorted()’s key= argument can return a tuple; tuples are compared element by element, so the tuple values act as multiple sorting keys!
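A quick sketch with made-up data: sort by age first, then by name ignoring case. Negating a numeric element reverses the order for just that key.

```python
# Hypothetical (name, age) records
people = [("Bob", 25), ("alice", 30), ("Carol", 25)]

# Primary key: age ascending; secondary key: case-insensitive name
by_age_then_name = sorted(people, key=lambda p: (p[1], p[0].lower()))

# Primary key: age descending (negated); secondary key: name ascending
by_age_desc_then_name = sorted(people, key=lambda p: (-p[1], p[0].lower()))

print(by_age_then_name)
print(by_age_desc_then_name)
```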