Hugging Face recommendations for dataset file structure
Previously: 220622-1744 Directory structure for python research-y projects, 220105-1142 Order of directories inside a python project
Datasets.
HF has recommendations on how to "Structure your repository": where and how to put .csv/.json files for the various splits, shards, and configurations.
These dataset structures can also be loaded easily with load_dataset(), despite being plain CSV/JSON files.
Filenames containing 'train' are considered part of the train split; the same goes for 'test' and 'valid' (the latter ends up in the validation split).
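To make the naming rule concrete, here is a rough sketch of how a filename could be mapped to a split. It is my own simplified illustration, not the library's actual matching code: `SPLIT_KEYWORDS` and `guess_split` are hypothetical names, and plain substring matching is an assumption (it would also match 'test' inside 'latest', for example).

```python
from pathlib import PurePath
from typing import Optional

# Hypothetical keyword table for illustration only; the real library
# recognizes more keywords and uses stricter matching rules.
SPLIT_KEYWORDS = {
    "train": ("train",),
    "test": ("test",),
    "validation": ("valid",),
}


def guess_split(filename: str) -> Optional[str]:
    """Return the split a filename would fall into, or None if no keyword matches."""
    stem = PurePath(filename).stem.lower()
    for split, keywords in SPLIT_KEYWORDS.items():
        # Naive substring check: good enough to show the idea.
        if any(keyword in stem for keyword in keywords):
            return split
    return None
```

So a file called my_train_data.csv would land in train, and valid.json in validation.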
And indeed, I could create a Dataset without issues through ds = datasets.load_dataset(my_directory_with_jsons).
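A minimal sketch of what such a directory could look like and how it might be loaded. The file names, the `text`/`label` fields, and the helpers `write_demo_dataset` and `load_demo` are all made up for illustration; only `datasets.load_dataset` is the real library call, and it needs the third-party `datasets` package installed.

```python
import json
from pathlib import Path
from typing import List


def write_demo_dataset(root: Path) -> List[Path]:
    """Write three tiny JSON Lines files whose names encode their split."""
    root.mkdir(parents=True, exist_ok=True)
    # Hypothetical toy records; any consistent set of columns would do.
    rows = {
        "train.jsonl": [{"text": "good", "label": 1}, {"text": "bad", "label": 0}],
        "test.jsonl": [{"text": "fine", "label": 1}],
        "valid.jsonl": [{"text": "awful", "label": 0}],
    }
    for name, records in rows.items():
        with open(root / name, "w", encoding="utf-8") as f:
            for record in records:
                # One JSON object per line.
                f.write(json.dumps(record) + "\n")
    return sorted(root.iterdir())


def load_demo(root: Path):
    """Load the directory with Hugging Face datasets (imported lazily)."""
    import datasets  # third-party: pip install datasets

    return datasets.load_dataset(str(root))
```

After `write_demo_dataset(Path("my_dataset"))`, calling `load_demo(Path("my_dataset"))` should give back one split per file, with the valid file showing up as validation, assuming a reasonably recent version of datasets.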