In the middle of the desert you can say anything you want
TIL about git bisect.
Run git help bisect for the full help.
TL;DR: it uses binary search to find the commit that introduced a change. You start it, it checks out a commit, you tell it whether that commit is good or bad, and it keeps narrowing down the range until only one candidate is left.
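git bisect start
-> git bisect good   # the checked-out commit does not have the problem
-> git bisect bad    # the checked-out commit has the problem
-> git bisect reset  # done: go back to where you were before bisect start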
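The search git bisect performs can be sketched in a few lines of Python. This is only an illustration of the idea, assuming a linear history given as a list ordered oldest to newest, where the first commit is known good and the last is known bad; real git bisect also handles merge-heavy histories, and git bisect run can automate the good/bad answers with a script. first_bad and is_bad are hypothetical names, not part of git.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(commits: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the first bad commit in a linear history.

    Assumes commits is ordered oldest -> newest, commits[0] is known good
    and commits[-1] is known bad (like marking good/bad before bisecting).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # This is the "git bisect good" / "git bisect bad" answer for mid.
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


history = [f"c{i}" for i in range(10)]
print(first_bad(history, lambda c: int(c[1:]) >= 7))  # prints c7
```

With 10 commits it only needs about log2(10), i.e. three or four, checks instead of testing every commit.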
HF Datasets’ README links this nice Google Colab that explains the basics: HuggingFace datasets library - Overview - Colaboratory
I use # TODOs for “do later”.
If any exist, PyCharm asks me before every commit whether I really want to commit.
I guess the idea is to use them to mark things to do before committing, so on a much smaller, here-and-now scale?
sanitize-filename · PyPI does what it says on the box.
It’s more complex than the simple “replace /” I had in mind: sanitize_filename/sanitize_filename.py · master · jplusplus / sanitize-filename · GitLab
And intuition tells me that using external, semi-unknown libraries like this might be a security risk.
TODO: what is the best practice for user-provided values that might become filenames? Something that smells of neither injection vulns nor dependency vulns?
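One stdlib-only direction for that TODO, sketched under my own assumptions rather than taken from sanitize-filename: whitelist a small set of characters instead of blacklisting bad ones, so path separators can never get through, and then double-check that the final path stays inside the intended directory. safe_filename and safe_path are hypothetical helpers; they do not handle Windows reserved names like CON, so treat this as a starting point, not a vetted solution.

```python
import re
import unicodedata
from pathlib import Path

# Whitelist: any run of characters outside this set becomes "_".
# Path separators ("/" and "\\") are therefore never let through.
_DISALLOWED = re.compile(r"[^A-Za-z0-9._-]+")


def safe_filename(name: str, max_len: int = 255) -> str:
    """Turn an untrusted string into a single, harmless path component."""
    # Fold accented characters to ASCII where possible, drop the rest.
    name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    name = _DISALLOWED.sub("_", name)
    # Leading dots would allow ".", ".." and hidden files; also cap the length.
    name = name.lstrip(".")[:max_len]
    return name or "unnamed"


def safe_path(base_dir: str, name: str) -> Path:
    """Join a sanitized name onto base_dir and check it stays inside base_dir."""
    base = Path(base_dir).resolve()
    target = (base / safe_filename(name)).resolve()
    if target.parent != base:
        raise ValueError(f"refusing to write outside {base}: {name!r}")
    return target


print(safe_filename("../../etc/passwd"))  # prints _.._etc_passwd
print(safe_filename("café report.txt"))   # prints cafe_report.txt
```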
To skip slow tests, I first marked them with a custom marker:
import pytest

@pytest.mark.slow
def test_bioconv(tmp_path):
    ...
Then, in the run configuration, I added this pytest parameter:
-m "not slow"
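Pytest warns about unknown markers (PytestUnknownMarkWarning), so it is worth registering slow somewhere. A minimal sketch, assuming a conftest.py at the root of the tests; a markers entry in pytest.ini or pyproject.toml would do the same job.

```python
# conftest.py
def pytest_configure(config):
    # Register the custom "slow" marker so @pytest.mark.slow is a known marker.
    config.addinivalue_line(
        "markers", "slow: marks tests as slow (deselect with '-m \"not slow\"')"
    )
```

The same selection works outside the IDE: pytest -m "not slow" skips them, pytest -m slow runs only them.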
Using the Python defaultdict Type for Handling Missing Keys – Real Python
Python’s defaultdict is powerful; copying an example from the excellent Real Python page above:
from collections import defaultdict
Once it’s imported, things like these become possible:
>>> def_dict = defaultdict(list)  # Pass list to .default_factory
>>> def_dict['one'] = 1  # Add a key-value pair
>>> def_dict['missing']  # Access a missing key returns an empty list
[]
>>> def_dict['another_missing'].append(4)  # Modify a missing key
>>> def_dict
defaultdict(<class 'list'>, {'one': 1, 'missing': [], 'another_missing': [4]})
God, how many times have I written ugly (or overkill dataclass-based) code for “if the key is in the dict, append to it; if not, create an empty list first”.
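To make that pattern concrete, here is a small side-by-side sketch: grouping words by their first letter, once with the manual key check and once with defaultdict(list). The function names and the word list are just illustrative.

```python
from collections import defaultdict


def group_manual(words):
    """The old way: check for the key before appending."""
    groups = {}
    for word in words:
        if word[0] not in groups:
            groups[word[0]] = []
        groups[word[0]].append(word)
    return groups


def group_defaultdict(words):
    """defaultdict(list) creates the empty list on first access."""
    groups = defaultdict(list)
    for word in words:
        groups[word[0]].append(word)
    return dict(groups)  # plain dict, so later missing keys raise KeyError again


words = ["apple", "avocado", "banana", "blueberry", "cherry"]
print(group_defaultdict(words))
# prints {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```

Converting back to a plain dict at the end is a design choice: it stops accidental lookups from silently inserting empty lists later on.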
Saw this in spaCy’s iob_utils.py:
# Fallbacks to make backwards-compat easier
offsets_from_biluo_tags = biluo_tags_to_offsets
spans_from_biluo_tags = biluo_tags_to_spans
biluo_tags_from_offsets = offsets_to_biluo_tags
I hope I never need this but it’s kinda cool!
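A sketch of the same trick with made-up names (tags_to_offsets and friends are hypothetical, not spaCy’s code). The plain alias is what iob_utils.py does: both names point to the very same function object. The deprecated_alias wrapper is my own variant, not something spaCy does there; it nudges callers of the old name with a DeprecationWarning.

```python
import functools
import warnings
from typing import Callable


def tags_to_offsets(tags):
    """Hypothetical 'new' name: indices of all tags that aren't 'O'."""
    return [i for i, tag in enumerate(tags) if tag != "O"]


# Plain fallback alias, spaCy-style: same object, zero overhead.
offsets_from_tags = tags_to_offsets


def deprecated_alias(new_func: Callable, old_name: str) -> Callable:
    """Wrap new_func so calls under old_name emit a DeprecationWarning."""
    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated, use {new_func.__name__} instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this wrapper
        )
        return new_func(*args, **kwargs)
    return wrapper


old_offsets_from_tags = deprecated_alias(tags_to_offsets, "old_offsets_from_tags")
```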
Pytest has a nice tmp_path fixture that creates a unique temporary directory for each test and returns it as a pathlib.Path:
# content of test_tmp_path.py
CONTENT = "content"
def test_create_file(tmp_path):
d = tmp_path / "sub"
d.mkdir()
p = d / "hello.txt"
p.write_text(CONTENT)
assert p.read_text() == CONTENT
assert len(list(tmp_path.iterdir())) == 1
Explicitly adding breakpoint() in a Python script is equivalent to setting a PyCharm debugger breakpoint at that line, as long as the script runs under PyCharm’s debugger (outside it, breakpoint() drops into pdb by default).
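Why this works: breakpoint() does not hardcode a debugger, it just calls sys.breakpointhook(), which defaults to pdb and which a debugger can replace (PEP 553). The sketch below swaps in a hypothetical recording hook instead of a real debugger, just to show the mechanism; setting the environment variable PYTHONBREAKPOINT=0 makes the default hook skip all breakpoint() calls.

```python
import sys

calls = []


def recording_hook(*args, **kwargs):
    # Stand-in for a debugger: only record that breakpoint() was reached.
    calls.append((args, kwargs))


def compute(x):
    y = x * 2
    breakpoint()  # calls sys.breakpointhook(); pdb unless a debugger replaced it
    return y + 1


sys.breakpointhook = recording_hook
try:
    result = compute(20)
finally:
    sys.breakpointhook = sys.__breakpointhook__  # restore the default (pdb) hook

print(result, len(calls))  # prints 41 1
```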
If you have a module inside another module, say two inside one, the syntax for running it from the CLI is the same as the one used when importing it (import one.two).
Assuming your working directory contains ./one/two/:
python3 -m one.two --whatever
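A self-contained sketch that builds a throwaway one/two layout in a temporary directory and runs it with -m. It assumes two is a package (a directory, as in ./one/two/), in which case python runs one/two/__main__.py; the file contents are made up for the demo. With -m, the working directory goes on sys.path, which is why cwd=root matters, and everything after the module name ends up in sys.argv.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_nested_module(args):
    """Create ./one/two/ in a temp dir and run `python -m one.two <args>`."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        pkg = root / "one" / "two"
        pkg.mkdir(parents=True)
        (root / "one" / "__init__.py").write_text("")
        (pkg / "__init__.py").write_text("")
        # For a package, `python -m one.two` executes one/two/__main__.py.
        (pkg / "__main__.py").write_text("import sys\nprint(__name__, sys.argv[1:])\n")
        completed = subprocess.run(
            [sys.executable, "-m", "one.two", *args],
            cwd=root,  # -m resolves one.two relative to the working directory
            capture_output=True,
            text=True,
            check=True,
        )
        return completed.stdout.strip()


print(run_nested_module(["--whatever"]))  # prints __main__ ['--whatever']
```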