In the middle of the desert you can say anything you want
allenai/olmocr: Toolkit for linearizing PDFs for LLM datasets/training
Online demo: https://olmocr.allenai.org/
curl -X 'POST' \
  'http://localhost:8001/tokenize' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "mistralai/Mistral-Small-24B-Instruct-2501",
    "prompt": "my prompt",
    "add_special_tokens": true
  }'
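A minimal Python equivalent of the curl above, as a sketch: the same local /tokenize endpoint, model name, and payload are assumed from the request shown.
import requests

resp = requests.post(
    "http://localhost:8001/tokenize",
    json={
        "model": "mistralai/Mistral-Small-24B-Instruct-2501",
        "prompt": "my prompt",
        "add_special_tokens": True,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # token ids / counts, depending on what the server returns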
uv add jupyterlab-vim
uv run jupyter labextension list
uv run jupyter labextension enable jupyterlab_vim
uv run jupyter lab
def pdb_debug(
    main_func,
):  # pylint: disable=missing-function-docstring, broad-exception-caught
    """Run a function, optionally dropping into pdb on exception."""

    def wrapper():
        import pdb
        import sys
        import traceback

        try:
            main_func()
        except Exception:
            # PDB is a module-level flag, read from .env
            if PDB:
                _, _, tb = sys.exc_info()
                traceback.print_exc()
                pdb.post_mortem(tb)
            else:
                # re-raise with the original traceback intact
                raise

    return wrapper
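A minimal usage sketch (the main entrypoint and the PDB flag are placeholders):
@pdb_debug
def main():
    raise ValueError("boom")

if __name__ == "__main__":
    main()  # with PDB truthy, prints the traceback and drops into pdb.post_mortem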
A spin on the usual: 220214-1756 python run pdb on exception
apt install progress
Xfennec/progress: Linux tool to show progress for cp, mv, dd, … (formerly known as cv)
progress -w gives the status of running copy/mv operations
rsync -aP shows per-file progress too
pip install -U "huggingface_hub[cli]"
# either of
hf auth login
hf auth login --token $HF_TOKEN
# models
hf download adept/fuyu-8b --cache-dir ./path/to/cache
// TODO vllm: will it be VLLM_CACHE_ROOT or HF_HOME?
Also: Troubleshooting - vLLM: they literally recommend downloading the model first via the hf CLI and passing the full path.
I've cumulatively lost hours on these things this month.
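A Python alternative for the "download first, pass the full path" approach, as a sketch (repo id and cache dir reuse the examples above):
from huggingface_hub import snapshot_download

# downloads the repo (or reuses the cached copy) and returns the local directory
local_path = snapshot_download(
    "adept/fuyu-8b",
    cache_dir="./path/to/cache",
)
print(local_path)  # pass this path to vLLM instead of the repo id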
from pydantic import ConfigDict

MODEL_CONFIG = ConfigDict(
    serialize_by_alias=True,  # why doesn't this, alone, work?
)
Guess why? Because I have pydantic 2.10, the config option above was only introduced in 2.11, and 2.10 just quietly lets me set the unknown config value.
(Configuration - Pydantic)
(Ty ty for picking up on this)
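A guard that would have made this fail loudly, as a sketch (assumes the packaging library is available):
import pydantic
from packaging.version import Version

# serialize_by_alias only exists since pydantic 2.11; older versions silently
# accept the unknown ConfigDict key, so check the version explicitly
assert Version(pydantic.VERSION) >= Version("2.11"), (
    f"serialize_by_alias needs pydantic>=2.11, found {pydantic.VERSION}"
)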
Next. Configuration - Pydantic
ConfigDict(
    extra="forbid",  # disallow obj.invalid_field = "whatever" (and unknown kwargs at init)
)
For my own models as well: setting obj.name = 'test' when you want obj.step_name is almost never a good idea.
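A tiny sketch of what that buys (model and field names are made up):
from pydantic import BaseModel, ConfigDict

class Step(BaseModel):
    model_config = ConfigDict(extra="forbid")
    step_name: str

s = Step(step_name="extract")
# s.name = "test"                 # raises: "Step" object has no field "name"
# Step(step_name="x", name="y")   # raises a ValidationError: extra inputs are not permitted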
And again about serialize_by_alias: it will be the default in pydantic v3, which I welcome, because if you forget model_dump(by_alias=True), the model gets dumped with unexpected names, which are then quietly dropped when you initialize a new model from that dict via e.g. NewModel(**old_model.model_dump()).
(Should’ve validated anyway, but…)
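A minimal sketch of that failure mode (the model, field, and alias names are made up):
from pydantic import BaseModel, Field

class Item(BaseModel):
    step_name: str = Field(default="", alias="stepName")

old = Item(stepName="extract")
dumped = old.model_dump()          # {'step_name': 'extract'}: field name, not the alias
copy = Item(**dumped)              # 'step_name' is not the alias, so it gets quietly ignored
print(copy.step_name)              # '': the value is silently gone
good = Item(**old.model_dump(by_alias=True))  # round-trips as expected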
~/.pdbrc gets read by both of them, and can import stuff and use aliases!
# ! makes it Python code to be executed
!import rich
!from rich.pretty import pprint as rpprint
# alternative if not !importing it above in one command:
# alias P !import rich; rpprint(%1)
alias I rich.inspect(%1)
alias ppp rpprint(%1)
print("Custom commands:")
print("\t I $thing — rich inspect $thing")
print("\t ppp $thing — rich pretty print $thing")
EDIT: the above works only if rich is already imported.
I found out about uv self update, which took me from 0.5 to 0.8; I read the CHANGELOGs etc., and many neat things exist.
upgrade --all · Issue #1419 · astral-sh/uv
uv add -U pydantic for one package, uv lock --upgrade for all of them?
uv build the package you want to use, then uv add ../other/dist/package-2.4.0-py3-none-any.whl
[tool.uv.sources]
coral-ucf = { path = "../other/dist/package-2.4.0-py3-none-any.whl" }
Copied from the old Semantic Wiki. Half of the old links are gone, yet another reminder of bit rot and memento mori.