07 May 2023

Plants paper notes

Key info

PlantCLEF

PlantCLEF 2021 and 2022 summary papers, no doi :(
- 2021¹:
- 2022:²
- other years
Latest datasets not available, previous ones use eol and therefore are a mix of stuff
Tasks and datasets differ by year (=can’t reliably do baseline), and main ideas differ too:
- 2021: use Herbaria to supplement lacking real-life photos
  - Best methods were the ones that used domain-specific adaptations as opposed to simple CNNs
- 2022: multi-image(/metadata) class. problem with A LOT of classes (80k)
  - Classes mean a lot of gimmicks to handle this memory-wise
Why this doesn’t work for us:
- datasets not available!
- the ones that are are a mix of stuff
- A lot of methods that work there well are specific to the task, as opposed to the general thing
- People can use their own datasets for training
- ~~Metrics: MRR (=not comparable to some other literature, even if there were results on the same dataset)~~

PlantNet300k³ paper

Dataset is a representative subsample of the big PlantNet dataaset that “covers over 35K species illustrated by nearly 12 million validated images”
- Subset has “306,146 plant images covering 1,081 species.
Long-tailed distribution of classes:
- 80% of the species account for only 11% of the total number of images”
- Top1 accuracy is OK, but not meaningful
- Macro-average top-1 accuracy differs by A LOT
The paper does a baselines using a lot of networks

Useful stuff

Citizen science

Citizen science - Wikipedia

Citizen science (similar to [..] participatory/volunteer monitoring) is scientific research conducted with participation from the general public

most citizen science research publications being in the fields of biology and conservation
can mean multiple things, usually using citizens acting volunteers to help monitor/classify/.. stuff (but also citizens initiating stuff; also: educating the public about scientific methods, e.g. schools)
- allowed users to upload photos of a plant species and its components, enter its characteristics (such as color and size), compare it against a catalog photo and classify it. The classification results are juried by crowdsourced ratings.⁴

Papers

Paper about using Pl@ntNet⁵ for CS:
- “Here we present two Pl@ntNet citizen science initiatives used by conservation practitioners in Europe (France) and Africa (Kenya).”
- paper citing it are interesting: Bonnet: How citizen scientists contribute to monitor… - Google Scholar
- Pl@ntNet can be
  - limited for subsets of plants
  - limiting plants based on GPS coordinates
  - made to train better certain species by manually adding good examples as done in the Lewa Conservatory in Kenya
Assessing accuracy in citizen science-based plant phenology monitoring | SpringerLink <@fuccilloAssessingAccuracyCitizen2015 (2015) z>
- Volunteers demonstrated greatest overall accuracy identifying unfolded leaves, ripe fruits, and open flowers.
- Maybe we’ll want to compare the areas where people are better at than ML in our paper?
Similar to the above, but detecting weeds:Assessing citizen science data quality: an invasive species case study <@crallAssessingCitizenScience2011 Assessing citizen science data quality (2011) z>
- Compare to the paper about detecting weeds with DL: <@chenPerformanceEvaluationDeep2021 (2021) z>

Centralized repositories of stuff

GBIF (ofc)
https://bien.nceas.ucsb.edu/bien/ more than 200k observations, and
- This:
  - Georeferenced plant observations from herbarium, plot, and trait records;
  - Plot inventories and surveys;
  - Species geographic distribution maps;
  - Plant traits;
  - A species-level phylogeny for all plants in the New World;
  - Cross-continent, continent, and country-level species lists.
No names known to me in their Data contributors

Biodiversity

Really nice paper: <@ortizReviewInteractionsBiodiversity2021 A review of the interactions between biodiversity, agriculture, climate change, and international trade (2021) z/d>
TL;DR climate change is not the worst wrt biodiversity

Positioning / strategy

Main bits

Plant classification as a method to monitor biodiversity in the context of citizen science

Why plant classification is hard

A lot of cleanly labeled herbaria, few labeled pictures (esp. tropical), but trasferring learned stuff from herbarium sheets to field photos is challenging:
- (e.g. strong colour variation and the transformation of 3D objects after pressing like fruits and flowers) <@waldchenMachineLearningImage2018 (2018) z>
- PlantCLEF2021 was entirely dedicated to using herbaria+photos, and there domain adaptations (joint representation space between herb+field) dramatically outperform best classical CNN, esp. the most difficult plants.<@goeau2021overview (2021) z><@goeauAIbasedIdentificationPlant2021 (2021) z>
Connected to the above: lab-based VS field-based investigations
- lab-based has strict protocols for aquisition, people with mobile phones don’t
  - “Lab-based setting is often used by biologist that brings the specimen (e.g. insects or plants) to the lab for inspecting them, to identify them and mostly to archive them. In this setting, the image acquisition can be controlled and standardised. In contrast to field-based investigations, where images of the specimen are taken in-situ without a controllable capturing procedure and system. For fieldbased investigations, typically a mobile device or camera is used for image acquisition and the specimen is alive when taking the picture (Martineau et al., 2017). ”<@waldchenMachineLearningImage2018 (2018) z>
Phenology (growth stages / seasons -> flowers) make life harder
- Plants sometimes have strong phenology (like bright red flowers) that make it more different and easier to find (esp. here in detecting them in satellite pictures: <@pearseDeepLearningPhenology2021 (2021) z>, but there DL failed less without flowers than non-DL), but sometimes don’t
- And ofc. a plant with and without flowers looks like a totally different plant
- Related:
  - Plant growth form has been the most helpful species metadata in PlantCLEF2021, but some plants at different stages of growth look like different plant stages.
Intra-species variability
- Plants of the same species can have flowers of different colors: (<@NEURIPS_DATASETS_AND_BENCHMARKS2021_7e7757b1 (2021) z/d>)
  - Esp. if only a part of the plant is photographed
The Pl@ntNet-300k paper mentions
- epistemic (model) uncertainty (flowers etc.)
- aleatoric (data) uncertainty (small information given to make a decision)
  - Plants belonging to the same genus can be visually very similar to each other: (same paper)
long-tailed distribution, which: <@walkerHarnessingLargeScaleHerbarium2022 (2022) z/d>
- is representative of RL
- is a problem because DL is “data-hungry”
some say there are a lot of mislabeled specimens <@goodwinWidespreadMistakenIdentity2015 (2015) z/d>

Datasets

EDIT separate post about this: 230529-1413 Plants datasets taxonomy

We can classify existing datasets in two types:
- Pl@ntNet / iNaturalist? / …: people with phones
- Clean standardized things like the Plant seedling classification dataset (<@giselssonPublicImageDatabase2017 (2017) z>), common weeds in Denmark dataset <@leminenmadsenOpenPlantPhenotype2020 (2020) z/d> etc.
  - I’d put leaf datasets in this category too
  - FloraIncognita is an interesting case:
    - FloraCapture requests contributors to photograph plants from at least five precisely defined perspectives
There are some special datasets, satellite and whatever, but especially:
- Leaf datasets exist and are used surprisingly often (if not exclusively) in overviews like the one we want to do:
  - (pic from <@kayaAnalysisTransferLearning2019 (2019) z/d>)
  - ds-peruvian, as used in <@bermejoComparativeStudyPlant2022 (2022) z/d>
- Flower datasets / “Natural flower classification”
  - is one of challenging aspects of plant classification because of non-rigid deformation, illumination changes and inter-class similarity. (<@al-qurranPlantClassificationWild2022 Plant classification in the wild (2022) z/d>)
- Seedlings etc. seem to be useful in industry (and they go hand-in-hand with weed-control)
- Fruit classification/segmentation and other very specific stuff we don’t really care about (<@mamatAdvancedTechnologyAgriculture2022 Advanced Technology in Agriculture Industry by Implementing Image Annotation Technique and Deep Learning Approach (2022) z/d> has an excellent overview of these)
Additional info present in datasets or useful:
- PlantCLEF2021 had additional metadata at the species level: growth form, habitat (forest, wetland, ..), and three others
- - PlantCLEF2022: 3/5 taxonomic levels where used in various ways. Taxonomic loss is a thing (TODO - was this useful?)⁶
- Pl@ntNet and FloraIncognita apps (can) use GPS coordinates during the decision
- TODO Phenology / phenological stage: is this true to begin with?

Research questions similar to ours

Plant classification (a.k.a. species identification) on pictures

Things like ecology and habitat loss, citizen science etc.
Industry:
- Weed detection

Crop identification (sattellites)

Crop stage identification / phenology (sattellites)

Paper outline sketch

Introduction

Tasks about plants are important
Ecology: global warming etc., means different distribution of plant species, phenology stages changed, broken balances and stuff and one needs to track it; herbaria and digitization / labeling of herbaria
Industry: crops stages identification, crops/weeds identification, fruit ripeness identification, etc. long list
automatic methods have been used, starting from SVM/manual-feature-xxx, later - DL
DL has been especially nice and improved stuff in all of these different sub-areas, show the examples that compare DL-vs-non-DL in the narrow fields
The closest relevant thing is PlantCLEF competition that’s really really nice but \textbf{TODO what are we doing that PlantCLEF isn’t?}
Goal of this paper is:
1. Do a short overview of the tasks-connected-to-plants that exist and are usually tackled using AI magic
2. Along the way: WHICH AI magic is usually used for the tasks that are formalized as image classification (TODO and object segmentation/detection?)
3. Show that while

http://ceur-ws.org/Vol-2936/paper-122.pdf / <@goeau2021overview (2021) z> ↩︎
https://hal-lirmm.ccsd.cnrs.fr/lirmm-03793591/file/paper-153.pdf / <@goeau2022overview (2022) z> ↩︎
<@NEURIPS_DATASETS_AND_BENCHMARKS2021_7e7757b1 (2021) z> ↩︎
IBM and SAP open up big data platforms for citizen science | Guardian sustainable business | The Guardian ↩︎
<@bonnetHowCitizenScientists2020 (2020) z> ↩︎
Deep Learning with Taxonomic Loss for Plant Identification - PMC ↩︎

Nel mezzo del deserto posso dire tutto quello che voglio.

serhii.net

Plants paper notes

Key info

PlantCLEF

PlantNet300k3 paper

Useful stuff

Citizen science

Papers

Centralized repositories of stuff

Biodiversity

Positioning / strategy

Main bits

Why plant classification is hard

Datasets

Research questions similar to ours

Plant classification (a.k.a. species identification) on pictures

Crop identification (sattellites)

Crop stage identification / phenology (sattellites)

Paper outline sketch

Introduction

PlantNet300k³ paper