In the middle of the desert you can say anything you want
Context: 220120-1959 taskwarrior renaming work tasks from previous work
Just tested this: DAMN!
User
Can you, in English, name one word for each of these tasks:
1. Rhymes with "chair"
2. Is a number larger than eleven
3. Has two letters "a"
4. Ends with the letter "k"
5. In the sentence "the cat had four paws and a good mood" is BEFORE the word "paws"
Also:
6. A sentence that starts with the word "dogs"
7. A sentence that ends with the word "beaver"
8. A sentence that uses the word "metal" twice
https://chat.openai.com/share/3fdfaf05-5c13-44eb-b73f-d66f33b73c59
lmentry/data/all_words_from_category.json at main · aviaefrat/lmentry
Not all of it needs code and regexes! lmentry/data/bigger_number.json at main · aviaefrat/lmentry
I can really do a small lite-lite subset containing only tasks that are evaluatable as a dataset.
// minimal, micro, pico
Plan:
Decision on [[231010-1003 Masterarbeit Tagebuch#LMentry-micro-UA]]: doing a smaller version works!
Will contain only a subset of tasks, the ones not needing regex. They are surprisingly many.
The code will generate a json dataset for all tasks.
Problem: ‘1’ -> один/перший/(на) першому (місці)/першою, i.e. one number maps to many inflected surface forms.
Existing solutions:
Created my own! TODO document
TODO https://webpen.com.ua/pages/Morphology_and_spelling/numerals_declination.html
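A toy illustration of the problem (my own, NOT the actual solution): the same number needs different surface forms depending on grammatical role and gender.

```python
# Toy illustration, not the real implementation: one number,
# many Ukrainian surface forms depending on role and gender.
FORMS = {
    1: {"cardinal": "один", "ordinal_masc": "перший", "ordinal_fem": "перша"},
    2: {"cardinal": "два", "ordinal_masc": "другий", "ordinal_fem": "друга"},
}

def numeral(n: int, form: str = "cardinal") -> str:
    """Return one hard-coded surface form for a small number."""
    return FORMS[n][form]
```

A real solution has to generate these forms for arbitrary numbers, which is where the morphology analyzer comes in.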
```python
Parse(word='перша', tag=OpencorporaTag('ADJF,compb femn,nomn'), normal_form='перший', score=1.0, methods_stack=((DictionaryAnalyzer(), 'перша', 76, 9),))
```
The `compb` grammeme is nowhere in the docs; I found it only in the Ukrainian dict converter's tagsets mapping: LT2OpenCorpora/lt2opencorpora/mapping.csv at master · dchaplinsky/LT2OpenCorpora. I assume it should get converted to `comp`, but it doesn't; yet another future bug report for pymorphy4.
pymorphy2 doesn't add the `sing` tag for Ukrainian singular words, so any inflection that deals with number fails.
Same issue I had in 231024-1704 Master thesis task CBT
Found a way around it:
```python
@staticmethod
def _add_sing_to_parse(parse: Parse) -> Parse:
    """
    pymorphy sometimes doesn't add singular for Ukrainian
    (and fails when it needs to inflect it to plural etc.);
    this creates a new Parse with that added.
    """
    if parse.tag.number is not None:
        return parse
    new_tag_str = str(parse.tag)
    new_tag_str += ",sing"
    new_tag = parse._morph.TagClass(tag=new_tag_str)
    new_best_parse = Parse(
        word=parse.word,
        tag=new_tag,
        normal_form=parse.normal_form,
        score=parse.score,
        methods_stack=parse.methods_stack,
    )
    new_best_parse._morph = parse._morph
    return new_best_parse
```
```python
# Not needed for LMentry, but I'll need it for CBT anyway...
@staticmethod
def _make_agree_with_number(parse: Parse, n: int) -> Parse:
    grams = parse.tag.numeral_agreement_grammemes(n)
    new_parse = Numbers._inflect(parse=parse, new_grammemes=grams)
    return new_parse
```
`parse._morph` is the MorphAnalyzer instance; without it added, inflections of that Parse fail. `TagClass` follows the recommendation of the pymorphy2 docs, which say to prefer it over creating a new `OpencorporaTag` directly, even though both return the same class.
Words of different lengths, alphabetical order of words, etc. The main relationship is `kind=less|more`, where `less` means "word closer to the beginning of the alphabet", "smaller number", "word with fewer letters" etc., and `more` is the opposite.
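A minimal sketch (my own illustration, not the benchmark's code) of how such a `less`/`more` relation could resolve for the alphabet-order task:

```python
def which_word(t1: str, t2: str, kind: str = "less") -> str:
    """Resolve the less/more relation for alphabetical order:
    'less' = the word closer to the beginning of the alphabet."""
    earlier, later = sorted([t1, t2])
    return earlier if kind == "less" else later
```

The same `kind` flag could dispatch to `len()` for word-length tasks or `int()` for number comparisons.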
https://chat.openai.com/share/b52baed7-5d56-4823-af3e-75a4ea8d5b8c: 1.5 errors, but I’m not sure myself about the fourth one.
```python
LIST = [
    "Яке слово стоїть ближче до початку алфавіту: '{t1}' чи '{t2}'?",
    "Що є далі в алфавіті: '{t1}' чи '{t2}'?",
    "Між '{t1}' та '{t2}', яке слово розташоване ближче до кінця алфавіту?",
    # TODO - "в алфавіті"?
    "У порівнянні '{t1}' і '{t2}', яке слово знаходиться ближче до A в алфавіті?",
    # ChatGPT used the wrong grammatical case (відмінок) below:
    # "Визначте, яке з цих слів '{t1}' або '{t2}' знаходиться далі по алфавіті?",
]
```
I want a dataset with multiple configs.
```python
starts = "(starts|begins)"
base_patterns = [
    rf"The first letter is {answer}",
    rf"The first letter {of} {word} is {answer}",
    rf"{answer} is the first letter {of} {word}",
    rf"{word} {starts} with {answer}",
    rf"The letter that {word} {starts} with is {answer}",
    rf"{answer} is the starting letter {of} {word}",
    rf"{word}: {answer}",
    rf"First letter: {answer}",
]
```
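For illustration, a hedged sketch of how templates like these might be compiled and used for scoring. The fillers `of`, `word`, and `answer` here are my hypothetical stand-ins; the real ones live in the LMentry scorers.

```python
import re

# Hypothetical stand-ins for the template fillers used above.
of = r"(of|in)"
word = r'"?cat"?'
answer = r'"?c"?'
starts = r"(starts|begins)"

base_patterns = [
    rf"The first letter is {answer}",
    rf"The first letter {of} {word} is {answer}",
    rf"{word} {starts} with {answer}",
]

def matches(prediction: str) -> bool:
    """Score a model prediction: does it fully match any accepted pattern?"""
    return any(re.fullmatch(p, prediction, re.IGNORECASE) for p in base_patterns)
```

This is the core trick: instead of free-form answer parsing, each task ships a list of acceptable answer shapes.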
For more: lmentry/lmentry/scorers/more_letters_scorer.py at main · aviaefrat/lmentry
Another dictionary I found: slavkaa/ukraine_dictionary: a dictionary of Ukrainian words (words, word forms, syntactic data, literary sources).
All basically need words and their categories, e.g. Animals: dog/cat/raccoon.
I wonder how many different categories I’d need
Ah, the O.G. benchmark has 5 categories: lmentry/resources/nouns-by-category.json at main · aviaefrat/lmentry
Anyway - I can find no easy dictionary about this.
options:
for all-in-one:
```sh
> grep -o "_\(.*\)(" all-in-one-file.txt | sort | uniq -c
     49 _action(
      8 _action-and-condition(
     58 _holonym(
    177 _hyponym(
     43 _meronym(
     12 _related(
     51 _sister(
    102 _synonym(
```
Looking through it, it's sadly prolly too small. 2009's hyponym.txt is nice and much easier to parse.
Ideas: WordNet Search - 3.1 Ask it to give me a list of:
<_(@bm_lmentry) “LMentry: A language model benchmark of elementary language tasks” (2022) / Avia Efrat, Or Honovich, Omer Levy: z / / 10.48550/ARXIV.2211.02069 _> ↩︎
API Reference (auto-generated) — Морфологический анализатор pymorphy2 ↩︎
Created pchr8/pymorphy-spacy-disambiguation: A package that picks the correct pymorphy2 morphology analysis based on morphology data from spacy to easily include it in my current master thesis code.
Later on: releases, PyPI etc., but for now I just wanted to install it from GitHub, and wanted to know the minimum I can do to make it installable from GitHub through pip.
To my surprise, `pip install git+https://github.com/pchr8/pymorphy-spacy-disambiguation` worked as-is! Apparently pip is smart enough to parse the poetry project and run the correct commands.
`poetry add git+https://github.com/pchr8/pymorphy-spacy-disambiguation` works just as well.
Otherwise, locally, `poetry build` creates a `./dist` directory with the package as installable/shareable files.
Also, TIL: `poetry show` and `poetry show --tree --why colorama` show a neat colorful tree of package dependencies in the project.
I give all my doctoral students a copy of the following great paper (and I’ve used a variant of the check list at the end for years - avoids errors when working on multiple papers with multiple international teams in parallel) http://www-mech.eng.cam.ac.uk/mmd/ashby-paper-V6.pdf
I’ll write here the main points from each of the linked PDF, copyright belongs to the original authors ofc.
How to Write a Paper
Mike Ashby
Engineering Department, University of Cambridge, Cambridge
6th Edition, April 2005
This brief manual gives guidance in writing a paper about your research. Most of the advice applies equally to your thesis or to writing a research proposal.
This is based on the 2016 version of the paper; more versions are here: https://news.ycombinator.com/item?id=38446418#38449638, with the link to the 2016 version being https://web.archive.org/web/20220615001635/http://blizzard.cs.uwaterloo.ca/keshav/home/Papers/data/07/paper-reading.pdf
When you can’t write, it is because you don’t know what you want to say. The first job is to structure your thinking.
Don’t yet think of style, neatness or anything else. Just add, at the appropriate place on the sheet, your thoughts.
Avoid clichés (standard formalised phrases): they are corpses devoid of the vitality which makes meaning spring from the page.
How to Read a Paper
S. Keshav
David R. Cheriton School of Computer Science, University of Waterloo
Waterloo, ON, Canada
keshav@uwaterloo.ca
http://ccr.sigcomm.org/online/files/p83-keshavA.pdf
I have a pytest of a function that uses Python's `@lru_cache`:
```python
cacheinfo = gbif_get_taxonomy_id.cache_info()
assert cacheinfo.hits == 1
assert cacheinfo.misses == 2
```
The LRU cache gets preserved between test runs, breaking test independence and making such bits fail.
Enter pytest-antilru · PyPI, which resets the LRU cache between test runs. Installing it as a Python package is all there is to it.
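A self-contained sketch of the failure mode, with `lookup` as a hypothetical stand-in for `gbif_get_taxonomy_id`. The assertions pass on the first run, but since the cache lives on the function object, a second test in the same process would see stale hit/miss counts.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def lookup(name: str) -> int:
    # Hypothetical stand-in for an expensive API call.
    return len(name)

lookup("cat")   # miss
lookup("dog")   # miss
lookup("cat")   # hit
info = lookup.cache_info()
assert info.hits == 1 and info.misses == 2
# The cache survives into any later test in the same process,
# which is exactly what pytest-antilru resets between tests.
```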
I needed argparse to accept yes/no decisions, to be used inside a Dockerfile, which has no if/else logic; all solutions except accepting a string parameter like true/false seemed ugly.
The standard Linux `--do-thing` and `--no-do-thing` flags were also impossible to do within Docker if I want to use an env variable etc., unless I literally set the variable to `--do-thing`, which is a mess for many reasons.
I had 40 tabs open, because apparently this is not a solved problem, and all ideas I had felt ugly.
How do I convert strings to bools in a good way? (`bool` alone is not an option, because `bool('False')` is `True`, etc.)
A basic `if value == "true"` would work, but let's support other things as a bonus, because why not.
My first thought was to see what YAML does, but then I found `distutils.util.strtobool`, deprecated in 3.12: 9. API Reference — Python 3.9.17 documentation.
It converts y, yes, t, true, on, 1 / n, no, f, false, off, 0 into boolean `True`/`False`.
The code: the only reason it's a separate function (and not a lambda inside the `type=` parameter) is that I wanted a custom ValueError and to add the deprecation warning, as if Python would let me forget. A one-liner was absolutely possible here as well.
```python
def _str_to_bool(x: str):
    """Converts value to a boolean.

    Currently uses (the rules from) distutils.util.strtobool:
    (https://docs.python.org/3.9/distutils/apiref.html#distutils.util.strtobool)
        True values are y, yes, t, true, on and 1
        False values are n, no, f, false, off and 0
        ValueError otherwise.

    ! distutils.util.strtobool is deprecated in python 3.12
    TODO solve it differently by then

    Args:
        x (str): value
    """
    try:
        res = bool(strtobool(str(x).strip()))
    except ValueError as e:
        logger.error(
            f"Invalid str-to-bool value '{x}'. Valid values are: "
            "y,yes,t,true,on,1 / n,no,f,false,off,0."
        )
        raise e
    return res


# inside argparse
parser.add_argument(
    "--skip-cert-check",
    help="Whether to skip a cert check (%(default)s)",
    type=_str_to_bool,
    default=SKIP_CERT_CHECK,
)
```
This allows passing explicit values instead of `--do-thing`/`--no-do-thing` flags. `distutils` is deprecated in 3.12 though :(
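For the 3.12 TODO above, a minimal drop-in sketch (my own, not from the stdlib) with the same truth table as `strtobool`:

```python
# Minimal replacement sketch for the deprecated distutils.util.strtobool,
# using the truth table documented above: y,yes,t,true,on,1 / n,no,f,false,off,0.
_TRUE = {"y", "yes", "t", "true", "on", "1"}
_FALSE = {"n", "no", "f", "false", "off", "0"}

def str_to_bool(value: str) -> bool:
    v = str(value).strip().lower()
    if v in _TRUE:
        return True
    if v in _FALSE:
        return False
    raise ValueError(f"invalid truth value {value!r}")
```

Unlike `distutils.util.strtobool` it returns a real `bool` instead of 0/1, so the `bool(...)` wrapper becomes unnecessary.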
YAML is known for its bool handling: Boolean Language-Independent Type for YAML™ Version 1.1.
Regexp:
```
y|Y|yes|Yes|YES|n|N|no|No|NO
|true|True|TRUE|false|False|FALSE
|on|On|ON|off|Off|OFF
```
I don't like it and think it creates more issues than it solves, e.g. the "Norway problem" ([[211020-1304 YAML Norway issues]]), but for CLI I think that's okay enough.
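The Norway problem can be demonstrated by transcribing that spec regexp into Python (a sketch of the YAML 1.1 rule, not PyYAML's actual implementation):

```python
import re

# The YAML 1.1 boolean regexp from the spec, transcribed verbatim.
YAML_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)

def looks_like_yaml_bool(s: str) -> bool:
    return YAML_BOOL.fullmatch(s) is not None
```

So the country code `NO` silently becomes a boolean, while `Norway` stays a string; that is the whole problem in two lines.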
Using Kubernetes envFrom for environment variables describes how to get env variables from config map or secret, copying here:
```yaml
#####################
### deployment.yml
#####################
# Use envFrom to load Secrets and ConfigMaps into environment variables
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: mans-not-hot
  labels:
    app: mans-not-hot
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mans-not-hot
  template:
    metadata:
      labels:
        app: mans-not-hot
    spec:
      containers:
        - name: app
          image: gcr.io/mans-not-hot/app:bed1f9d4
          imagePullPolicy: Always
          ports:
            - containerPort: 80
          envFrom:
            - configMapRef:
                name: env-configmap
            - secretRef:
                name: env-secrets

#####################
### env-configmap.yml
#####################
# Use config map for not-secret configuration data
apiVersion: v1
kind: ConfigMap
metadata:
  name: env-configmap
data:
  APP_NAME: Mans Not Hot
  APP_ENV: production

#####################
### env-secrets.yml
#####################
# Use secrets for things which are actually secret like API keys, credentials, etc
# Base64 encode the values stored in a Kubernetes Secret: $ pbpaste | base64 | pbcopy
# The --decode flag is convenient: $ pbpaste | base64 --decode
apiVersion: v1
kind: Secret
metadata:
  name: env-secrets
type: Opaque
data:
  DB_PASSWORD: cDZbUGVXeU5e0ZW
  REDIS_PASSWORD: AAZbUGVXeU5e0ZB
```
(via @caiquecastro)
This is neater than what I used before, listing literally all of them:
```yaml
spec:
  containers:
    - name: name
      image: image
      env:
        - name: BUCKET_NAME
          valueFrom:
            configMapKeyRef:
              name: some-config
              key: BUCKET_NAME
```
Wanted to do coloring and remembered about Krita; the tutorial about flat coloring (Flat Coloring — Krita Manual 5.2.0 documentation) mentioned the Colorize Mask, and it's awesome!
I needed to actually understand it, and even had to watch a video tutorial (Tutorial: Coloring with "Colorize-mask" in Krita - YouTube), but it was so worth it!
It’s basically a bucket fill tool on steroids, and even might be reason enough to move away from Inkscape for some of these tasks!
Cleaned lineart:
Mask (red is transparent):
Result:
Result with random brushes moon texture below it:
Interesting bits:
Multiply, but if there's anything else below it it'll be a mess; sometimes it should just be converted to a paint layer with the correct settings to see what it will look like in the end.

Heard the expression "roter Faden", googled it, and it's actually interesting and relevant.
In a scientific context, it’s the main topic / leitmotiv / … of the text. You ask a question, and all parts of the text should work together to answer it, relating to it in a clear way.
Excellent (PDF) link on this exact topic in scientific writing & an itemized list of ways to make it clear: https://www.uni-osnabrueck.de/fileadmin/documents/public/1_universitaet/1.3_organisation/sprachenzentrum/schreibwerkstatt/Roter_Faden_Endversion.pdf
TODO hypothetically save it from link rot somewhere
Also:
wolph/python-progressbar: Progressbar 2 - A progress bar for Python 2 and Python 3 - "pip install progressbar2" - a really cool, flexible progress bar.
Also: progressbar.widgets — Progress Bar 4.3b.0 documentation:
Examples of markers:
- Smooth: ` ▏▎▍▌▋▊▉█` (default)
- Bar: ` ▁▂▃▄▅▆▇█`
- Snake: ` ▖▌▛█`
- Fade in: ` ░▒▓█`
- Dots: ` ⡀⡄⡆⡇⣇⣧⣷⣿`
- Growing circles: ` .oO`
You can export your own papers as a single file, and the entire Internet tells you how. But if you're NOT the author, this is a workaround I found: