In the middle of the desert you can say anything you want
Really nice, and the blog post introducing it has a lot of general info about datasets that I found very interesting.
From python - How do I type hint a method with the type of the enclosing class? - Stack Overflow:
If you have a classmethod and want to annotate the return value as that same class you’re now defining, you can actually do the logical thing!
from __future__ import annotations

class Whatever:
    # ...
    @classmethod
    def what(cls) -> Whatever:
        return cls()
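For comparison, a small sketch of the older spelling that works without the `__future__` import: writing the class name as a string. The `label` attribute and the `Special` subclass are made-up names, only there to show that `cls()` builds whichever class the method was called on.

```python
class Whatever:
    def __init__(self, label: str = "default") -> None:
        self.label = label

    @classmethod
    def what(cls) -> "Whatever":
        # The quoted name is only stored as a string here,
        # so it does not matter that the class is still being defined.
        return cls()


class Special(Whatever):
    # Hypothetical subclass, used only for the demonstration below.
    pass


print(type(Whatever.what()).__name__)  # Whatever
print(type(Special.what()).__name__)   # Special
```

The annotation stays `Whatever` for the subclass too, which is still true, since every `Special` is a `Whatever`.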
It started with writing type hints for a complex dict, which led me to TypedDict, and slowly turned into “why can’t I just use a dataclass, as with the rest?”.
Found two libraries:
MyClass(**somedict)
field_names are converted to camelcase fieldNames; one can disable that from settings: Extending from Meta — Dataclass Wizard 0.22.1 documentation
TIL another bit I won’t ever use: 21. for/else — Python Tips 0.1 documentation
This exists:
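for a in whatever:
    a.whatever()
else:
    # Runs whenever the loop finishes without a break,
    # which includes (but is not limited to) an empty whatever.
    print("Looped over whatever without a break!")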
Found it after having a wrong indentation of an else that put it inside the for loop.
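A minimal sketch of where for/else is actually handy: the else branch runs only if the loop was never left through break. The function name `find_first_negative` is made up for this example.

```python
def find_first_negative(numbers):
    """Return the first negative number, or None if there is none."""
    for n in numbers:
        if n < 0:
            found = n
            break  # leaving via break skips the else branch
    else:
        # Reached when the loop ran to the end (or never ran at all).
        found = None
    return found


print(find_first_negative([3, -1, 4]))  # -1
print(find_first_negative([3, 1, 4]))   # None
print(find_first_negative([]))          # None
```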
Found at least three:
Spent hours tracking down a bug that boiled down to:
A if args.sth.lower == "a" else B
Guess what - args.sth.lower is a callable, and will never be equal to a string. So args.sth.lower == "a" is always False.
Of course I needed args.sth.lower().
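The bug is easy to reproduce in isolation. A small sketch, with a hand-built `argparse.Namespace` standing in for the real parsed arguments:

```python
import argparse

# Stand-in for the parsed arguments; the value "A" is made up.
args = argparse.Namespace(sth="A")

# Comparing the bound method itself to a string: always False, no error raised.
without_call = args.sth.lower == "a"

# Calling the method first gives the intended comparison.
with_call = args.sth.lower() == "a"

print(callable(args.sth.lower))  # True
print(without_call)              # False
print(with_call)                 # True
```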
Python sets have two kinds of methods:
a.intersection(b), which returns the intersection
a.intersection_update(b), which updates a by removing elements not found in b
It calls the function-like ones (that return the result) operators, as opposed to the ‘_update’ ones.
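A quick sketch of the difference, with made-up example sets: the plain method returns a new set and leaves `a` alone, while the `_update` variant changes `a` in place and returns `None`.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

common = a.intersection(b)           # new set, a is untouched
a_before_update = set(a)             # copy, to compare afterwards

returned = a.intersection_update(b)  # mutates a, returns None

print(common)           # {3, 4}
print(a_before_update)  # {1, 2, 3, 4}
print(a)                # {3, 4}
print(returned)         # None
```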
Previously: 220622-1744 Directory structure for python research-y projects, 220105-1142 Order of directories inside a python project
Datasets.
HF has recommendations about how to Structure your repository, where/how to put .csv/.json files in various splits/shards/configurations.
These dataset structures are also ones that can be easily loaded with load_dataset(), despite being CSV/JSON files.
Filenames containing ‘train’ are considered part of the train split, same for ‘test’ and ‘valid’.
And indeed I could without issues create a Dataset through ds = datasets.load_dataset(my_directory_with_jsons).
Given an argument -l, I needed to pass multiple values to it.
python - How can I pass a list as a command-line argument with argparse? - Stack Overflow is an extremely detailed answer with all options, but the TL;DR is:
nargs:
parser.add_argument('-l','--list', nargs='+', help='<Required> Set flag', required=True)
# Use like:
# python arg.py -l 1234 2345 3456 4567
action='append':
parser.add_argument('-l','--list', action='append', help='<Required> Set flag', required=True)
# Use like:
# python arg.py -l 1234 -l 2345 -l 3456 -l 4567
Details about values for nargs:
# This is the correct way to handle accepting multiple arguments.
# '+' == 1 or more.
# '*' == 0 or more.
# '?' == 0 or 1.
# An int is an explicit number of arguments to accept.
parser.add_argument('--nargs', nargs='+')
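To see what the parser actually hands back, here is a small sketch that feeds argument lists straight to `parse_args()` instead of going through the shell. The `--tag` and `--extra` options are made up for the example.

```python
import argparse

parser = argparse.ArgumentParser()
# One flag, one or more values after it.
parser.add_argument("-l", "--list", nargs="+", type=int)
# The flag is repeated, one value per occurrence.
parser.add_argument("-t", "--tag", action="append")
# Zero or more values: the bare flag is allowed.
parser.add_argument("-x", "--extra", nargs="*")

args = parser.parse_args(
    ["-l", "1234", "2345", "-t", "a", "-t", "b", "-x"]
)

print(args.list)   # [1234, 2345]
print(args.tag)    # ['a', 'b']
print(args.extra)  # []
```

Both variants end up as a Python list; the difference is only in how the values are typed on the command line.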
Related, a couple of days ago used nargs to allow an empty value (explicitly passing -o without an argument that becomes a None) while still providing a default value that’s used if -o is omitted completely:
parser.add_argument(
    "--output-dir",
    "-o",
    help="Target directory for the converted .json files. (%(default)s)",
    type=Path,
    default=DEFAULT_OUTPUT_DIR,
    nargs="?",
)
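The three cases can be checked directly. A self-contained sketch, where the value of `DEFAULT_OUTPUT_DIR` and the helper `build_parser` are assumptions made for the example:

```python
import argparse
from pathlib import Path

# Hypothetical default; the real value is not shown in these notes.
DEFAULT_OUTPUT_DIR = Path("converted")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--output-dir",
        "-o",
        help="Target directory for the converted .json files. (%(default)s)",
        type=Path,
        default=DEFAULT_OUTPUT_DIR,
        nargs="?",  # const is left at its default, None
    )
    return parser


parser = build_parser()

omitted = parser.parse_args([]).output_dir            # default is used
bare = parser.parse_args(["-o"]).output_dir           # const (None) is used
given = parser.parse_args(["-o", "out"]).output_dir   # converted by type=Path

print(omitted)  # converted
print(bare)     # None
print(given)    # out
```

If a bare -o should mean something other than None, passing const=... to add_argument sets that value.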