01 Nov 2022

DL 4 - Images and image segmentation

Image segmentation

Motivation:
- Medicine, autonomous driving, etc.
Image segmentation
- Assign one or more classes to each pixel
- Does not distinguish between different instances
Data
- In: (x,y,color)
- Out: (x,y,m) where m is the number of masks/classes
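
A minimal sketch of the In/Out shapes above, assuming NumPy arrays. The image size, the number of classes m = 5 and the random labels are made up for illustration. It shows the single-label case (exactly one class per pixel); with several classes per pixel the masks would simply overlap.

```python
import numpy as np

H, W, m = 4, 6, 5                          # hypothetical image size and number of classes
rng = np.random.default_rng(0)

image = rng.random((H, W, 3))              # In: (x, y, color), here RGB
labels = rng.integers(0, m, size=(H, W))   # one class id per pixel (made-up data)

# Out: (x, y, m), one binary mask per class (one-hot along the last axis)
masks = np.eye(m, dtype=np.uint8)[labels]

print(image.shape, masks.shape)            # (4, 6, 3) (4, 6, 5)
```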

/D Backbone is the encoder part
- Often interchangeable
/D Deconvolution, generally aka Upsampling
- The opposite of convolution: maps a small feature map back to a picture of (basically) the original size
- “Reconstruction of higher-dimensional representations from low-dimensional input”
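
A rough sketch of the upsampling idea, assuming that "deconvolution" here means a transposed convolution (single channel, square kernel, no padding). The function name and the all-ones kernel are illustrative only; in a real network the kernel is learned.

```python
import numpy as np

def transposed_conv2d(x, kernel, stride=2):
    """Each input value stamps a scaled copy of the kernel into the (larger) output."""
    h, w = x.shape
    k = kernel.shape[0]                    # square kernel assumed
    out = np.zeros(((h - 1) * stride + k, (w - 1) * stride + k))
    for i in range(h):
        for j in range(w):
            out[i * stride:i * stride + k, j * stride:j * stride + k] += x[i, j] * kernel
    return out

x = np.array([[1.0, 2.0],
              [3.0, 4.0]])
kernel = np.ones((2, 2))                   # hypothetical kernel, normally learned
y = transposed_conv2d(x, kernel, stride=2)
print(y.shape)                             # (4, 4)
```

With this particular kernel and stride the result is plain nearest-neighbour upsampling: every input value is copied into its own 2x2 block.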
/D Unpooling - opposite of Maxpooling
- How:
  - Pooling indices - remember where the max of each window was located, and put it back there later, leaving 0 in the non-max places
  - OR same as above, but write the max value everywhere in the window (instead of 0)
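
Both unpooling variants above can be sketched in NumPy roughly like this, assuming non-overlapping k x k windows on a single-channel map. The function names and the `fill_window` flag are made up for this sketch.

```python
import numpy as np

def max_pool_with_indices(x, k=2):
    """Non-overlapping k x k max pooling; also returns the argmax inside each window."""
    h, w = x.shape
    windows = (x.reshape(h // k, k, w // k, k)
                .transpose(0, 2, 1, 3)
                .reshape(h // k, w // k, k * k))
    return windows.max(axis=-1), windows.argmax(axis=-1)

def unpool(pooled, indices, k=2, fill_window=False):
    """Put each max back at its remembered position (0 elsewhere),
    or, with fill_window=True, copy the max into the whole window."""
    ph, pw = pooled.shape
    if fill_window:
        windows = np.repeat(pooled[..., None], k * k, axis=-1)
    else:
        windows = np.zeros((ph, pw, k * k), dtype=pooled.dtype)
        np.put_along_axis(windows, indices[..., None], pooled[..., None], axis=-1)
    return windows.reshape(ph, pw, k, k).transpose(0, 2, 1, 3).reshape(ph * k, pw * k)

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 5],
              [0, 1, 9, 2],
              [7, 3, 4, 6]])
pooled, idx = max_pool_with_indices(x)
sparse = unpool(pooled, idx)                    # max at its old place, 0 elsewhere
dense = unpool(pooled, idx, fill_window=True)   # max repeated in the whole window
print(pooled)
print(sparse)
print(dense)
```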
Architecture
- Usually symmetrical

Uses bits from pre-DL times
Also multiple losses etc., and lateral connections between downsampled and upsampled parts
Use features from different … TODO Sl.174

We care about borders when parking or doing surgery - segmentation
But sometimes we care also about the presence and classes + specific instances of the same object class
Data:
- Bounding box for each object: x, y, width, height (xywh) + class c
TODO /D Object segmentation Sl.180
For each class, you get an xy picture with 0 where there are no instances, 1 for the first instance, 2 for the second, etc.
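
A small sketch linking the two data formats above: a per-class instance map (0 = no instance, 1 = first instance, ...) and the xywh + c boxes derived from it. The map is made up, and taking (x, y) as the top-left corner is an assumption, since the notes do not fix the convention.

```python
import numpy as np

# Hypothetical 5x6 instance map for ONE class: 0 = no instance, 1 = first, 2 = second
instance_map = np.array([
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
])

def boxes_from_instance_map(instance_map, class_id):
    """Return one (x, y, w, h, c) tuple per instance; (x, y) is the top-left corner."""
    boxes = []
    for inst in np.unique(instance_map):
        if inst == 0:                      # 0 means "no instance here"
            continue
        rows, cols = np.nonzero(instance_map == inst)
        x, y = cols.min(), rows.min()
        w, h = cols.max() - x + 1, rows.max() - y + 1
        boxes.append((int(x), int(y), int(w), int(h), class_id))
    return boxes

boxes = boxes_from_instance_map(instance_map, class_id=0)
print(boxes)                               # [(1, 0, 2, 2, 0), (4, 2, 2, 3, 0)]
```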

Input -> Regions of interests -> Feature extraction -> Classification
Questions:
- How to find different instances?
- How to find different parts of the same instance? (Wheels look very different from the rest of the car, but they are still part of the same car.)

Region-based CNN
One of the first CNNs for object detection
Approach:
- R-CNN selective search:
  - Selective-search algo for object candidates (Sl.186)
  - Hierarchical clustering of similar regions (color, texture, brightness, etc.)
  - Merge the most similar neighbouring regions step by step; the resulting regions are the candidates
- Then you crop and resize the candidates to a similar size
- TODO really interesting feature bits and saving them to disk
- Then use a linear SVM to predict the target classes
- Bounding-box regression to correct the BBs of the candidates
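
Two of the steps above, sketched under assumptions: cropping and resizing the candidates to a common size (nearest-neighbour here, for brevity), and applying predicted offsets to a candidate box. The offset parametrisation (centre shift relative to box size, log-scale for width and height) is the one I know from the R-CNN paper; the slides may define it differently. Function names, sizes and boxes are hypothetical, and the CNN features and the SVM are left out.

```python
import numpy as np

def crop_and_resize(image, box, size=8):
    """Crop box = (x, y, w, h) from image and resize it to size x size (nearest neighbour)."""
    x, y, w, h = box
    crop = image[y:y + h, x:x + w]
    rows = np.arange(size) * h // size     # source row for every output row
    cols = np.arange(size) * w // size     # source column for every output column
    return crop[rows[:, None], cols]

def apply_box_deltas(box, deltas):
    """Correct a candidate box with predicted offsets (tx, ty, tw, th)."""
    x, y, w, h = box
    tx, ty, tw, th = deltas
    cx = x + w / 2 + tx * w                # shift the centre relative to the box size
    cy = y + h / 2 + ty * h
    new_w, new_h = w * np.exp(tw), h * np.exp(th)   # scale width and height
    return (cx - new_w / 2, cy - new_h / 2, new_w, new_h)

image = np.arange(20 * 30 * 3).reshape(20, 30, 3)   # made-up 20x30 RGB image
candidates = [(2, 3, 10, 6), (5, 5, 20, 12)]        # made-up candidate boxes (x, y, w, h)
patches = np.stack([crop_and_resize(image, b) for b in candidates])
print(patches.shape)                                # (2, 8, 8, 3)

unchanged = apply_box_deltas((10, 20, 40, 20), (0.0, 0.0, 0.0, 0.0))
shifted = apply_box_deltas((10, 20, 40, 20), (0.1, 0.0, 0.0, 0.0))
```

All-zero offsets leave the box as it is; tx = 0.1 moves it right by 10% of its width.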
Disadvantages:
- Several independent components
- Needs too much time and storage space etc.
- Slow

TODO, Sl.200 +
