Hugging Face Hub prefers zip archives because they support streaming
Random nugget from the GitHub issue "Document to compress data files before uploading" (Issue #5687, huggingface/datasets):
- gz, to compress individual files
- zip, to compress and archive multiple files; zip is preferred rather than tar because it supports streaming out of the box
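To make the zip point concrete, here is a minimal sketch using only Python's standard library. It assumes the property that matters is zip's per-archive index, which lets a reader open one member without unpacking the others; the issue itself does not spell out the reason. The file names and contents are made up for illustration.

```python
import io
import zipfile

# Build a small archive in memory; names and contents are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("train.csv", "id,text\n1,hello\n2,world\n")
    archive.writestr("test.csv", "id,text\n3,again\n")

# Reopen the archive, list its members, and read one member line by line
# without extracting anything to disk.
buffer.seek(0)
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    with archive.open("test.csv") as member:
        first_line = member.readline().decode("utf-8").rstrip("\n")

print(names)
print(first_line)
```

A compressed tar archive has no such index, so reaching a given member generally means reading through everything stored before it.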
(Streaming: https://huggingface.co/docs/datasets/v2.4.0/en/stream TL;DR: for very large datasets, don't download the entire dataset; pass streaming=True to load_dataset().)
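A rough sketch of what that looks like in practice. The helper name peek and its loader parameter are my own additions for illustration (the parameter only exists so the function can be tried without network access), and the repo id in the usage note is a placeholder, not a real dataset.

```python
from itertools import islice


def peek(repo_id, n=3, split="train", loader=None):
    """Return the first n examples of a dataset without downloading all of it."""
    if loader is None:
        # Imported lazily so this module loads even where `datasets` is absent.
        from datasets import load_dataset as loader
    # streaming=True yields an iterable dataset that fetches examples on demand.
    dataset = loader(repo_id, split=split, streaming=True)
    return list(islice(dataset, n))
```

Usage would be something like peek("username/my_large_dataset", n=5), with the placeholder swapped for a real repo id on the Hub.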
In the middle of the desert I can say anything I want.