In the middle of the desert you can say anything you want
TIL about git bisect. See git help bisect for help.
TL;DR: uses binary search to find a commit that introduced a change. You run it, it gives you a commit, you tell it if it’s good or bad, and it keeps narrowing down the options.
git bisect start
-> git bisect good
-> git bisect bad
-> git bisect reset
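The narrowing-down is plain binary search. A minimal sketch of the idea in Python, assuming a linear history where every commit after the first bad one is also bad; first_bad_commit and is_bad are made-up names for illustration, not anything git exposes:

```python
def first_bad_commit(commits, is_bad):
    """Return the first commit for which is_bad(commit) is True.

    Assumes commits are ordered oldest to newest, the first one is good,
    the last one is bad, and once a commit is bad all later ones are too.
    """
    good, bad = 0, len(commits) - 1  # indices of a known-good and a known-bad commit
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # like answering "git bisect bad"
        else:
            good = mid  # like answering "git bisect good"
    return commits[bad]


# Hypothetical history: the bug first appears in commit "c5".
history = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
culprit = first_bad_commit(history, lambda c: int(c[1:]) >= 5)
```

With n commits between the known-good and known-bad ends, this needs roughly log2(n) checks, which is why bisecting even a long history stays quick.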
HF Datasets’ README links to this nice Google Colab notebook that explains the basics: HuggingFace datasets library - Overview - Colaboratory
To skip slow tests, first I marked them as…
import pytest
@pytest.mark.slow
def test_bioconv(tmp_path):
    ...
Then, in the run configuration, I added the pytest parameters:
-m "not slow"
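Pytest warns about marks it doesn't know, so it may be worth registering slow as well. A minimal sketch of a conftest.py that does this through the pytest_configure hook; the description text after "slow:" is my own wording:

```python
# conftest.py
def pytest_configure(config):
    # Register the custom "slow" marker so pytest recognises it
    # instead of warning about an unknown mark.
    config.addinivalue_line(
        "markers", 'slow: marks a test as slow (deselect with -m "not slow")'
    )
```

The same registration can also live in the markers section of pytest.ini; the hook is just the variant that needs no extra config file.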
Using the Python defaultdict Type for Handling Missing Keys – Real Python
Python's defaultdict is powerful. Copying the example from the excellent Real Python page above: after from collections import defaultdict, things like:
>>> def_dict = defaultdict(list) # Pass list to .default_factory
>>> def_dict['one'] = 1 # Add a key-value pair
>>> def_dict['missing'] # Access a missing key returns an empty list
[]
>>> def_dict['another_missing'].append(4) # Modify a missing key
become possible.
God, how many times have I written ugly (or overkill-dataclasses) code for “if there’s a key in the dict, append to it, if not - create an empty list”
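For comparison, a small sketch of that exact "append if the key exists, otherwise create the list" pattern, first with a plain dict and then with defaultdict. The word list is made up for illustration:

```python
from collections import defaultdict

words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# Plain dict: check for the key, create the list if needed, then append.
by_letter_plain = {}
for word in words:
    if word[0] not in by_letter_plain:
        by_letter_plain[word[0]] = []
    by_letter_plain[word[0]].append(word)

# defaultdict: the missing list is created on first access.
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)
```

Both end up with the same contents; the defaultdict version just drops the membership check.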
Saw this in spaCy’s iob_utils.py:
# Fallbacks to make backwards-compat easier
offsets_from_biluo_tags = biluo_tags_to_offsets
spans_from_biluo_tags = biluo_tags_to_spans
biluo_tags_from_offsets = offsets_to_biluo_tags
I hope I never need this but it’s kinda cool!
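It works because a Python function is just an object bound to a name, so a plain assignment gives it a second name. A tiny sketch with made-up function names (not spaCy's):

```python
def tags_to_spans(tags):
    """New, preferred name: pair each tag with its position."""
    return list(enumerate(tags))


# Old name kept as an alias so existing callers keep working.
spans_from_tags = tags_to_spans

result = spans_from_tags(["B", "I", "O"])
```

Both names point at the very same function object, so there is no wrapper and no extra call overhead.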
Pytest has a nice tmp_path fixture that creates a temporary directory and returns its Path:
# content of test_tmp_path.py
CONTENT = "content"
def test_create_file(tmp_path):
    d = tmp_path / "sub"
    d.mkdir()
    p = d / "hello.txt"
    p.write_text(CONTENT)
    assert p.read_text() == CONTENT
    assert len(list(tmp_path.iterdir())) == 1
Explicitly adding breakpoint() in a Python script is equivalent to adding a PyCharm debugger breakpoint at that point in the file.
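Under the hood, breakpoint() simply calls sys.breakpointhook(), which by default drops into pdb; replacing that hook is one way for a debugger to take over. A minimal sketch that swaps in a recording hook to show the dispatch (recording_hook is a made-up name, and how PyCharm actually wires itself in is not shown here):

```python
import sys

calls = []


def recording_hook(*args, **kwargs):
    # Stand-in for a real debugger: just note that we were called.
    calls.append("hit")


original_hook = sys.breakpointhook
sys.breakpointhook = recording_hook
try:
    breakpoint()  # dispatches to sys.breakpointhook, i.e. recording_hook
finally:
    sys.breakpointhook = original_hook  # always restore the previous hook
```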
If you have a module inside another module, say two inside one, the syntax for running it from the CLI is the same as the one used when importing it (import one.two).
Assuming your working directory contains ./one/two/:
python3 -m one.two --whatever
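The same lookup can be tried from Python with the standard library's runpy module, which is the machinery behind the -m switch. A sketch that builds a throwaway one/two/ package in a temporary directory and runs it; the __init__.py and __main__.py files and their contents are my assumptions about the layout:

```python
import runpy
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Build one/two/ with a __main__.py, which is what "-m one.two" executes
    # when two is a package (a directory) rather than a single file.
    two = Path(tmp) / "one" / "two"
    two.mkdir(parents=True)
    (two.parent / "__init__.py").write_text("")
    (two / "__init__.py").write_text("")
    (two / "__main__.py").write_text("RAN_AS = __name__\n")

    sys.path.insert(0, tmp)  # stands in for "working directory contains ./one/"
    try:
        # Roughly what "python3 -m one.two" does.
        namespace = runpy.run_module("one.two", run_name="__main__")
    finally:
        sys.path.remove(tmp)

ran_as = namespace["RAN_AS"]  # "__main__"
```

If two were a single file (one/two.py) instead of a directory, the same dotted name would run that file directly.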
Use requirements.txt | PyCharm
Tools -> Sync Python Requirements
This syncs the actual project requirements and possibly the installed packages with the given requirements.txt
There’s also a plugin that autodetects requirements.txt in the root of the project and then suggests installing missing packages from it.
WT recommended Streamlit • The fastest way to build and share data apps
“Streamlit turns data scripts into shareable web apps in minutes. All in pure Python. No front‑end experience required.”
Sample demos:
Other examples are in the Gallery • Streamlit
Awesome Streamlit is freaking awesome.
Connects well to explorables etc., and would replace about 30% of my use-cases for jupyter notebook. Especially random small demos, ones I don’t do because I don’t want to mess with interactive graphs in Jupyterlab or re-learn d3.
Speaking of d3 - I should rewrite Flappy Words in it!