In the middle of the desert you can say anything you want

13 Mar 2024

Huggingface Hub prefers zip archives because they support streaming

Random nugget from huggingface/datasets issue #5687, "Document to compress data files before uploading":

  • gz, to compress individual files
  • zip, to compress and archive multiple files; zip is preferred rather than tar because it supports streaming out of the box
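My reading of "out of the box" (an assumption, the issue doesn't spell it out): a zip archive carries an index of its members, so a reader can jump straight to one file and decompress it lazily, while a tar archive has no index and has to be scanned from the start. A minimal stdlib-only sketch of that property, not of how `datasets` does it internally; the file names and helper functions are made up for illustration:

```python
import io
import zipfile


def build_archive() -> bytes:
    """Build a small in-memory zip with two (made-up) data files."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("train.jsonl", '{"id": 0}\n{"id": 1}\n')
        zf.writestr("test.jsonl", '{"id": 2}\n')
    return buf.getvalue()


def read_member_lines(archive: bytes, member: str) -> list[str]:
    """Read one member line by line without extracting the archive."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        # zf.open() uses the archive's index to locate the member and
        # returns a file-like object that decompresses as it is read.
        with zf.open(member) as fh:
            return [line.decode("utf-8").rstrip("\n") for line in fh]
```

`read_member_lines(build_archive(), "test.jsonl")` only touches the second file; the first one is never decompressed.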

(Streaming, TL;DR: for very large datasets, don't download the whole thing; pass streaming=True to load_dataset() and iterate over the examples as they arrive.)
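For future me, roughly what that looks like. A sketch, assuming the `datasets` library is installed and that the dataset has a `train` split; the `first_examples` helper and the repo id in the usage line are placeholders of mine, not real names:

```python
from itertools import islice


def first_examples(repo_id: str, n: int = 5, split: str = "train") -> list:
    """Return the first n examples of a Hub dataset without downloading it all."""
    # Imported here so this snippet can be loaded without the dependency.
    from datasets import load_dataset

    # streaming=True makes load_dataset return an iterable dataset that
    # fetches examples on demand instead of downloading everything first.
    stream = load_dataset(repo_id, split=split, streaming=True)

    # islice stops after n examples, so only those get pulled.
    return list(islice(stream, n))
```

Usage would be something like `first_examples("some-user/some-big-dataset", n=3)`, with a real repo id in place of the placeholder.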
