Hugging Face Datasets metadata
A DatasetInfo object contains dataset metadata such as the version, description, and license.
Adding the pre-existing attributes is described here: Create a dataset loading script. But apparently you can't add custom ones through it.
Option 1 - subclass DatasetBuilder
Build and load touches on the topic and suggests subclassing BuilderConfig, the class that the DatasetBuilder then uses.
Option 2 - you can subclass the Dataset
Fine-tuning with custom datasets — transformers 3.2.0 documentation
It shows an example, though not for this problem, and I don't really like it, but whatever.
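The linked page's pattern is a map-style dataset class (__init__, __getitem__, __len__) wrapping tokenizer encodings and labels. A stripped-down sketch with an added metadata attribute; it drops torch so it runs standalone (in practice you would inherit from torch.utils.data.Dataset and return tensors). The class name, token ids, and converter_version are made up for illustration.

```python
class EncodedDataset:
    """Map-style dataset with a hypothetical free-form `metadata` dict."""

    def __init__(self, encodings, labels, metadata=None):
        self.encodings = encodings  # feature name -> list of per-example values
        self.labels = labels
        self.metadata = dict(metadata or {})

    def __getitem__(self, idx):
        # Pick the idx-th value of every encoded feature, then add the label.
        item = {key: val[idx] for key, val in self.encodings.items()}
        item["labels"] = self.labels[idx]
        return item

    def __len__(self):
        return len(self.labels)


train = EncodedDataset(
    encodings={"input_ids": [[101, 2307, 102], [101, 6659, 102]]},
    labels=[1, 0],
    metadata={"converter_version": "0.3.1"},
)
print(len(train), train[0], train.metadata["converter_version"])
```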
The best solution
Ended up just not adding metadata: I basically needed things that can be recovered anyway from a Features object with ClassLabels.
The lack of easy support for custom metadata is really strange to me. It sounds like something many people would find useful (e.g. “Dataset created with version XX of converter program”), and I see no reason why HF doesn't support it.
I have a strong intuitive feeling that I'm misunderstanding the logic on some level, and that the answer I need is closer in spirit to “why would you want to add custom attributes to X, you could just ….”
Does everyone use separate key/values in the dataset itself or something?
EDIT: https://huggingface.co/datasets/allocine/edit/main/README.md (the allocine dataset card) is a cool example.
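That README is a dataset card whose YAML header holds the metadata. A sketch of building such a header with huggingface_hub's DatasetCardData, assuming (as it appears from the class accepting extra keyword arguments) that a custom key like the hypothetical converter_version is simply carried along next to the standard fields.

```python
from huggingface_hub import DatasetCardData

card_data = DatasetCardData(
    language="fr",
    license="mit",
    converter_version="0.3.1",  # hypothetical custom key, kept as an extra field
)

print(card_data.converter_version)
print(card_data.to_dict())
```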