serhii.net

In the middle of the desert you can say anything you want

29 May 2023

Plants datasets taxonomy

Previously: 230507-2308 230507-1623 Plants paper notes

I’m gathering them here: https://docs.google.com/spreadsheets/d/15fJR2PtEgJ7EWWPETDvquisNU_IBlGVKHezsMj76Y7M/edit?usp=sharing

Picture from the paper, kept here because it isn’t published: Content.jpg

Rules:

  • yes/no is 1/0
  • links are comma-separated, starting with the ‘best’ link
  • lists are comma-separated, can contain spaces (TODO: nums are awkward)
  • DOIs are preferred as links
  • num images/classes count only the plants (there is a separate column for the entire dataset)
    • if the dataset contains train/test/val splits, the available ones are summed up
    • is_composed->dataset_parts also focuses on plants only
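
The rules above could be applied to exported spreadsheet cells roughly like this. It is a minimal sketch, not the actual tooling behind the spreadsheet: the function names are hypothetical, and it assumes every cell arrives as a plain string.

```python
from typing import Dict, List, Optional


def parse_flag(cell: str) -> bool:
    """yes/no columns are stored as 1/0."""
    value = cell.strip()
    if value not in {"0", "1"}:
        raise ValueError(f"expected 0 or 1, got {cell!r}")
    return value == "1"


def parse_list(cell: str) -> List[str]:
    """Comma-separated list; individual items may contain spaces."""
    return [item.strip() for item in cell.split(",") if item.strip()]


def best_link(cell: str) -> Optional[str]:
    """Links are comma-separated, with the 'best' one first."""
    links = parse_list(cell)
    return links[0] if links else None


def total_images(split_sizes: Dict[str, int]) -> int:
    """Sum whichever of the train/test/val splits are available."""
    return sum(split_sizes.values())
```

For example, `parse_list("flower, leaf, growth form")` keeps `growth form` as one item, and `total_images({"train": 100, "test": 20})` sums only the splits that exist. The numbers here are made up for illustration.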

Columns

num_classes:

  • whatever is the y in the dataset; for Pl@ntNet-big it’s species (not families/genera)
  • num classes/images are ONLY FOR THE PLANTS PART OF THE DATASET
  • num_XXX_full for the entire dataset with all supercategories (plants, animals, w/e)

additional_metadata (per-pic):

  • organ (= flower, leaf, …; first seen in pl@ntnet300k)
  • license (of picture)
  • author (of picture)
  • habitat, growth form (plantclef)
  • bbox for bounding boxes (TODO: OR SEGMENTATIONS)
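
To keep this column consistent across rows, each cell could be checked against the values listed above. This is only a sketch: the exact spellings in `KNOWN_METADATA` (e.g. `growth form` with a space) and the helper name are my assumptions, not something the spreadsheet enforces.

```python
from typing import List

# Assumed spellings of the per-picture metadata values listed above.
KNOWN_METADATA = {"organ", "license", "author", "habitat", "growth form", "bbox"}


def unknown_metadata(cell: str) -> List[str]:
    """Return the items of a comma-separated cell that are not in the known vocabulary."""
    items = [item.strip() for item in cell.split(",") if item.strip()]
    return [item for item in items if item not in KNOWN_METADATA]
```

An empty result means the cell only uses known values; anything returned is either a typo or a new kind of metadata worth adding to the list.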

generation_type:

  • user_generated

constantly_updated: True if, like the big Pl@ntNet dataset, it’s being continuously updated

DONE: 1 if I finished that row in the spreadsheet
