Plant datasets taxonomy prep
This is the full list of datasets I care about, re [230529-1413 Plants datasets taxonomy], for 230507-2308 230507-1623 Plants paper notes
-
GBIF
- Search
- Seems to be a central hub for both similar and dissimilar datasets, with a centralized API
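A minimal sketch of what a query against that API might look like, assuming the public GBIF occurrence search endpoint at `https://api.gbif.org/v1/occurrence/search` with its `datasetKey` and `limit` parameters. The helper name `occurrence_search_url` is hypothetical, and the example key is the Pl@ntNet observations dataset key that appears in these notes. The block only builds the URL; fetching and parsing the response is left out.

```python
from urllib.parse import urlencode

# Assumed base URL of the public GBIF occurrence search API.
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def occurrence_search_url(dataset_key: str, limit: int = 0) -> str:
    """Build an occurrence search URL for one dataset (hypothetical helper).

    With limit=0 no records are requested, so the response would mainly
    be useful for its total count of matching occurrences.
    """
    query = urlencode({"datasetKey": dataset_key, "limit": limit})
    return f"{GBIF_OCCURRENCE_SEARCH}?{query}"


# Dataset key copied from the Pl@ntNet observations link in these notes.
PLANTNET_KEY = "7a3679ef-5582-4aaa-81f0-8c2545cafc81"
url = occurrence_search_url(PLANTNET_KEY)
print(url)
```

Note that the website link uses `dataset_key` while the API parameter is assumed here to be `datasetKey`; check the GBIF API docs before relying on it.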
-
Pl@ntNet
- FULL dataset: Pl@ntNet observations
- https://www.gbif.org/occurrence/search?dataset_key=7a3679ef-5582-4aaa-81f0-8c2545cafc81
- Lists the number of species and offers yet another explorer: World flora: Species - Pl@ntNet identify
- datamanager @ data
- Not sure what this is, but it sounds interesting; maybe connected to downloading their data
- OpenReview review of their paper: Pl@ntNet-300K: a plant image dataset with high label ambiguity and a long-tailed distribution | OpenReview
-
- portuguese-only?
-
Flora-On: Flora de Portugal Interactiva. (2023). Sociedade Portuguesa de Botânica. www.flora-on.pt. Accessed on 29-5-2023.
- Used in @herediaLargeScalePlantClassification2017 (2017)
-
iNaturalist-xxx
- Has 2017-2018-… variants
- Train/val plant counts are given in the paper; test-set counts are unclear, and the data would need to be downloaded to calculate them
- Consensus (Kaggle, Papers with Code, etc.) seems to be to use train/val for all of them
- Competitions
- All: visipedia/inat_comp: iNaturalist competition details
- 2017:
- (seems to be the coolest/most-used dataset)
- Plantae, Insecta, Aves, Reptilia, Mammalia, Fungi, Amphibia, Mollusca, Animalia, Arachnida, Actinopterygii, Chromista
- Rows contain: supercategory (Animalia, etc.), per-image attribution and license, GBIF (id, link, ..), others
- Sample rows from dataset: iNat competition GBIF info - Google Sheets
- @vanhornINaturalistSpeciesClassification2018 (2018)
- Added the rest to the spreadsheet; 2019 is the only one with missing per-category details
-
iNaturalist
- Website is mirrored in GBIF
- You can filter on the website and get counts; GBIF also has a viewer that can do some of this, but it cannot give a list of species
- ‘research grade’ are the good observations: Help · iNaturalist
- SPECIES != IDENTIFIER! The latter also seems to include families etc.
- “iNaturalist Research-grade Observations”
- On GBIF is the entire dataset: Occurrence search
- This is the filter they used on iNaturalist: Observations · iNaturalist
- Plants-only GBIF: Occurrence search
- GBIF file with ‘species list’: Download
- It contains 129k records, but they aren’t all species; some are families etc.
- Corresponding iNat filter: Observations · iNaturalist
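The species-versus-identifier point above could be checked with a sketch like the following. It assumes the downloaded species list is a tab-separated file with a `taxonRank` column holding values such as `SPECIES`, `GENUS`, or `FAMILY`; the column name, the made-up sample rows, and the helper `count_by_rank` are assumptions for illustration, not taken from the actual download.

```python
import csv
import io
from collections import Counter


def count_by_rank(tsv_file, rank_column="taxonRank"):
    """Count records per taxonomic rank in a tab-separated species list.

    `rank_column` is an assumed column name; adjust it to the real header.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return Counter(row[rank_column] for row in reader)


# Made-up sample rows standing in for the real download.
SAMPLE = (
    "taxonKey\tscientificName\ttaxonRank\n"
    "1\tBellis perennis\tSPECIES\n"
    "2\tQuercus\tGENUS\n"
    "3\tRosaceae\tFAMILY\n"
    "4\tTaraxacum officinale\tSPECIES\n"
)

counts = count_by_rank(io.StringIO(SAMPLE))
species_only = counts["SPECIES"]
print(f"{species_only} species out of {sum(counts.values())} records")
```

For the real file, pass an opened file handle instead of the `io.StringIO` sample; the species-only count is the number to compare against the iNaturalist filter.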
- Plant seedlings dataset
- https://github.com/TheSaintIndiano/Plant-Seedlings-Classification/blob/master/Seedlings.ipynb
- Number of images based on that notebook: 390 + 611 + 231 + 496 + 221 + 475 + 287 + 385 + 221 + 654 + 516 + 263
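The sum above can be totalled with a few lines; this sketch only adds up the numbers as written in the note and assumes each number corresponds to one class of the dataset.

```python
# Image counts copied from the note (assumed to be one number per class).
counts = [390, 611, 231, 496, 221, 475, 287, 385, 221, 654, 516, 263]

total = sum(counts)
print(f"{len(counts)} entries, {total} images in total")
```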
-
Flora Incognita
- A lot of conflicting info!
- Main site: Flora Incognita | EN – The Flora Incognita app – Interactive plant species identification
- 16k+ plant types
- 2020 paper giving some numbers about the dataset: Multi-view classification with convolutional neural networks | PLOS ONE
- 775 classes
- 2021 paper about the app, with some details about the number of plants and a comparison to other datasets: The Flora Incognita app – Interactive plant species identification - Mäder - 2021 - Methods in Ecology and Evolution - Wiley Online Library
- 4851 classes in Table 1
- Flora Capture: a citizen science application for collecting structured plant observations - PMC
- Flowers, leaves or both? How to obtain suitable images for automated plant identification - PubMed is about the number of images for classification, and says:
“we curated a partly crowd-sourced image dataset, comprising 50,500 images of 101 species.”
- Paper using the FI app, not the FI dataset itself! Plant image identification application demonstrates high accuracy in Northern Europe | AoB PLANTS | Oxford Academic
“2494 observations with 3199 images from 588 species, 365 genera and 89 families”
-
PlantCLEF
- PlantCLEF2021
- PlantCLEF 2021 | ImageCLEF / LifeCLEF - Multimedia Retrieval in CLEF
- TRAIN: 321,270 herbarium sheets, 6,316 field photos, 354 observations with both herbarium sheets and field photos; TEST SET: “3,186 photos in the field related to 638 plant observations (about 5 pictures per plants on average).”
- Dataset here: https://zenodo.org/record/3658343#.ZDe9x1qxVhE
REALLY NICE OVERVIEW PAPER with a really good overview of the existing datasets! Frontiers | Plant recognition by AI: Deep neural nets, transformers, and kNN in deep embeddings
-
Flavia
-
Datasets | The Leaf Genie has a list of leaf datasets! TODO
-
Herbarium 2021
- Huge dataset and a paper linking to smaller ones; preliminarily added them to the spreadsheet
- [[2105.13808] The Herbarium 2021 Half-Earth Challenge Dataset](https://arxiv.org/abs/2105.13808) (@delutioHerbarium2021HalfEarth2021, 2021)
Next steps
Spreadsheet
- Update it for all the sub-datasets if practical - e.g. web and friends if needed
- Done
Datasets
- Nice picture of who stole from whom
- Done