Dataset file structure: Hugging Face recommendations
HF has recommendations on how to structure your repository (the "Structure your repository" guide), covering where and how to place .csv/.json files across splits, shards, and configurations.
Datasets structured this way can also be loaded directly with load_dataset(), even though they are plain CSV/JSON files.
Files whose names contain ‘train’ are treated as part of the train split, and the same goes for ‘test’ and ‘valid’.
And indeed, I was able to load my files without any issues with
ds = datasets.load_dataset(my_directory_with_jsons)
which returns a DatasetDict with one entry per inferred split.
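To make the filename rule concrete, here is a small stdlib-only sketch that groups the JSON files in a directory by split, roughly the way load_dataset() did for my directory. It is a simplified, hypothetical approximation: the keyword table, splitting filenames on -, _, . and spaces, and the helper names infer_split/group_by_split are my own assumptions. The actual matching in the datasets library recognizes more keywords and patterns, so check the HF docs for the exact rules.

```python
import re
import tempfile
from pathlib import Path
from typing import Dict, List, Optional

# Hypothetical keyword table: keyword found in a filename -> split name.
# The real `datasets` rules cover more keywords than this.
SPLIT_KEYWORDS = {
    "train": "train",
    "test": "test",
    "valid": "validation",
    "validation": "validation",
}


def infer_split(filename: str) -> Optional[str]:
    """Return the split a filename maps to in this sketch, or None."""
    # Break the stem into tokens so "trainer.json" does not match "train"
    # while "data-test-00001.json" (a shard) still matches "test".
    tokens = re.split(r"[-_. ]+", Path(filename).stem.lower())
    for keyword, split in SPLIT_KEYWORDS.items():
        if keyword in tokens:
            return split
    return None


def group_by_split(directory: str) -> Dict[str, List[str]]:
    """Group the .json/.jsonl files in `directory` by their inferred split."""
    groups: Dict[str, List[str]] = {}
    for path in sorted(Path(directory).iterdir()):
        if path.suffix not in {".json", ".jsonl"}:
            continue  # ignore non-JSON files such as notes or READMEs
        split = infer_split(path.name)
        if split is not None:
            groups.setdefault(split, []).append(path.name)
    return groups


# Usage: build a throwaway directory and see how the files get grouped.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["train.jsonl", "test.jsonl", "my_valid_data.json", "notes.txt"]:
        (Path(tmp) / name).write_text('{"text": "hello"}\n')
    print(group_by_split(tmp))
```

Tokenizing the filename rather than doing a plain substring check is a deliberate choice in this sketch, so that unrelated words that merely contain "train" or "test" are not pulled into a split.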