In the middle of the desert you can say anything you want
TIL about git bisect.
git help bisect for help.
TL;DR: uses binary search to find a commit that introduced a change. You run it, it gives you a commit, you tell it if it’s good or bad, and it keeps narrowing down the options.
git bisect start -> git bisect good / git bisect bad (repeat for each commit it checks out, until it names the first bad one) -> git bisect reset
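To make the "keeps narrowing down the options" part concrete, here is a minimal sketch of the same binary search in plain Python. This is not git's actual implementation; `first_bad_commit`, `commits` and `is_bad` are hypothetical names, and the sketch assumes a linear history where the first entry is known good and the last is known bad.

```python
def first_bad_commit(commits, is_bad):
    """Return the first bad commit in a list ordered oldest to newest.

    Assumes commits[0] is known good and commits[-1] is known bad,
    and that every commit after the first bad one is bad as well.
    """
    good, bad = 0, len(commits) - 1  # indices of a known-good and a known-bad commit
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):   # the "git bisect bad" answer
            bad = mid
        else:                      # the "git bisect good" answer
            good = mid
    return commits[bad]


# Hypothetical history c0..c9 where the change was introduced in c6.
history = [f"c{i}" for i in range(10)]
checked = []

def is_bad(commit):
    checked.append(commit)
    return int(commit[1:]) >= 6

culprit = first_bad_commit(history, is_bad)
```

With ten commits only three of them need to be checked by hand, which is the whole appeal.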
HF Datasets’ README links this nice Google Colab that explains the basics: HuggingFace datasets library - Overview - Colaboratory
I use # TODOs for “Do later”.
If they exist, PyCharm asks me every time before committing if I really want to.
I guess the idea is to use them to mark things to do before committing, so much smaller scale and here-and-now?
sanitize-filename · PyPI does what it says on the box.
It’s more complex than the replace--/ that I had in mind: sanitize_filename/sanitize_filename.py · master · jplusplus / sanitize-filename · GitLab
And intuition tells me using external semi-unknown libraries like this might be a security risk.
TODO - what is the best practice for user-provided values that might become filenames?.. Something not smelling of either injection vulns or dependency vulns?
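One possible stdlib-only direction, not a settled best practice: keep an allow-list of characters instead of trying to blacklist bad ones. The sketch below is an assumption-heavy illustration; `safe_filename`, its `fallback` and `max_len` parameters are all made up here, and it does not handle things like Windows reserved names (CON, NUL, ...).

```python
import re

# Everything outside this allow-list gets replaced (assumption: ASCII-only names are acceptable).
_NOT_ALLOWED = re.compile(r"[^A-Za-z0-9._-]+")


def safe_filename(user_value: str, fallback: str = "unnamed", max_len: int = 100) -> str:
    """Reduce an arbitrary user-provided string to a conservative file name."""
    # Collapse every run of disallowed characters (slashes, spaces, NULs, ...) into one underscore.
    cleaned = _NOT_ALLOWED.sub("_", user_value)
    # Truncate, then strip leading/trailing dots and underscores so ".." or ".hidden" cannot survive.
    cleaned = cleaned[:max_len].strip("._")
    # Never return an empty name.
    return cleaned or fallback
```

For example, `safe_filename("../../etc/passwd")` gives `etc_passwd`. The stricter alternative is to never use the user value as a name at all and store files under a generated ID instead.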
To skip slow tests, first I marked them as…
import pytest

@pytest.mark.slow
def test_bioconv(tmp_path):
    ...
then, in the running configuration, I added the pytest params:
-m "not slow"
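A custom mark like this usually also wants to be registered, otherwise pytest warns about an unknown mark. A minimal sketch of doing that from a conftest.py, assuming the mark is called slow as above (the description text is made up):

```python
# conftest.py
def pytest_configure(config):
    # Register the custom "slow" marker so pytest does not warn about it.
    config.addinivalue_line(
        "markers",
        'slow: marks tests as slow (deselect with -m "not slow")',
    )
```

The same registration can live in the markers section of pytest.ini or pyproject.toml instead.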
Using the Python defaultdict Type for Handling Missing Keys – Real Python
Python defaultdict is powerful, copying example from the excellent Real Python page above:
from collections import defaultdict, then things like:
>>> def_dict = defaultdict(list) # Pass list to .default_factory
>>> def_dict['one'] = 1 # Add a key-value pair
>>> def_dict['missing'] # Access a missing key returns an empty list
[]
>>> def_dict['another_missing'].append(4) # Modify a missing key
become possible.
God, how many times have I written ugly (or overkill-dataclasses) code for “if there’s a key in the dict, append to it, if not - create an empty list”
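A small sketch of exactly that "append or create" case, next to the plain-dict version it replaces. The word list and variable names are made up for illustration.

```python
from collections import defaultdict

words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# defaultdict: a missing key silently becomes an empty list on first access.
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

# Plain-dict equivalent, for comparison.
plain = {}
for word in words:
    plain.setdefault(word[0], []).append(word)

# Gotcha: merely reading a missing key inserts it.
had_z_before = "z" in by_letter
_ = by_letter["z"]
has_z_after = "z" in by_letter
```

The last three lines show the one thing to watch out for: looking up a missing key is enough to create it, so use `in` or `.get()` when you only want to check.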
Saw this in spacy’s iob_utils.py:
# Fallbacks to make backwards-compat easier
offsets_from_biluo_tags = biluo_tags_to_offsets
spans_from_biluo_tags = biluo_tags_to_spans
biluo_tags_from_offsets = offsets_to_biluo_tags
I hope I never need this but it’s kinda cool!
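The trick works because a function is just an object bound to a name, so the old name can point at the same function. A minimal sketch with hypothetical function names (not spaCy's), plus a variant that also warns callers of the old name:

```python
import warnings


def offsets_to_tags(offsets):
    """Hypothetical 'new' function: turn (start, end) pairs into tag strings."""
    return [f"tag-{start}-{end}" for start, end in offsets]


# Plain alias: the old name is bound to the very same function object.
tags_from_offsets = offsets_to_tags


def old_tags_from_offsets(offsets):
    """Louder variant: still works, but tells the caller to switch names."""
    warnings.warn(
        "old_tags_from_offsets is deprecated, use offsets_to_tags",
        DeprecationWarning,
        stacklevel=2,
    )
    return offsets_to_tags(offsets)
```

The plain alias costs nothing at runtime; the wrapper is the option if the old name should eventually go away.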
Pytest has a nice tmp_path fixture that creates a temporary directory and returns its Path [1]:
# content of test_tmp_path.py
CONTENT = "content"
def test_create_file(tmp_path):
    d = tmp_path / "sub"
    d.mkdir()
    p = d / "hello.txt"
    p.write_text(CONTENT)
    assert p.read_text() == CONTENT
    assert len(list(tmp_path.iterdir())) == 1
Explicitly adding breakpoint() in a Python script is equivalent to setting a PyCharm debugger breakpoint at that point in the file, at least when the script is run under the PyCharm debugger.
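The mechanism that makes this possible is that breakpoint() just calls sys.breakpointhook(), which any tool can replace (by default it drops into pdb). My assumption is that the PyCharm debugger installs its own hook there. A minimal sketch with a made-up recording hook instead of a real debugger:

```python
import sys

calls = []


def recording_hook(*args, **kwargs):
    # Stand-in for a debugger: just note that the breakpoint was reached.
    calls.append("hit")


original_hook = sys.breakpointhook
sys.breakpointhook = recording_hook  # what a debugger would do on startup


def compute(x):
    breakpoint()  # calls sys.breakpointhook(), i.e. recording_hook here
    return x * 2


result = compute(21)
sys.breakpointhook = original_hook  # restore whatever was installed before
```

With the default hook, setting the environment variable PYTHONBREAKPOINT=0 turns all breakpoint() calls into no-ops, which is handy if one slips into a commit.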
If you have a module nested inside a package, say two inside one, the syntax for running it from the CLI is the same dotted path used when importing it (import one.two).
Assuming your working directory contains ./one/two/:
python3 -m one.two --whatever
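A minimal sketch of what the module being run could look like, assuming two is a plain file one/two.py (if two is itself a directory, the same code would go into one/two/__main__.py). The --whatever flag is taken from the command above; everything else is made up.

```python
# one/two.py
import argparse


def main(argv=None):
    parser = argparse.ArgumentParser(prog="one.two")
    parser.add_argument("--whatever", action="store_true", help="hypothetical flag")
    args = parser.parse_args(argv)  # argv=None means: read sys.argv[1:]
    return f"whatever={args.whatever}"


# True when started via "python3 -m one.two", False when imported with "import one.two".
if __name__ == "__main__":
    print(main())
```

Running with -m also puts the working directory on sys.path, so imports like `from one import something` inside two.py keep working, which is not the case with `python3 one/two.py`.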