20 Jul 2022

Dataset files structure Huggingface recommendations

Datasets.

HF has recommendations about how to Structure your repository, where/how to put .csv/.json files in various splits/shards/configurations.

These dataset structures are also ones that can be easily loaded with load_dataset(), despite being CSV/JSON files.

Filenames containing ’train’ are considered part of the train split, same for ’test’ and ‘valid’

And indeed I could without issues create a Dataset through ds = datasets.load_dataset(my_directory_with_jsons).

Nel mezzo del deserto posso dire tutto quello che voglio.

serhii.net