DL 4 - Images and image segmentation
Image segmentation
- Motivation:
- Medicine, autonomous driving, etc.
- Image segmentation
- Assign one or more classes to each pixel
- Does not distinguish between different instances of the same class
- Data
- In: $(x, y, color)$
- Out: $(x, y, m)$, where $m$ is the number of masks/classes
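A minimal sketch of the output format, assuming the simple case where every pixel has exactly one class (the notes also allow several classes per pixel). The function name `to_masks` and the toy label map are made up for illustration.

```python
def to_masks(label_map, num_classes):
    """Turn an (x, y) map of class ids into an (x, y, m) stack of binary masks."""
    return [
        [[1 if cls == k else 0 for k in range(num_classes)] for cls in row]
        for row in label_map
    ]


# Toy 2x3 "image" where each pixel already carries its class id (0, 1 or 2).
label_map = [
    [0, 0, 1],
    [0, 2, 1],
]
masks = to_masks(label_map, num_classes=3)

# Shape is (rows, columns, m): one 0/1 entry per class for every pixel.
print(len(masks), len(masks[0]), len(masks[0][0]))  # 2 3 3
print(masks[1][1])  # [0, 0, 1] -> this pixel belongs to class 2
```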
Networks
Fully convolutional network
- /D Backbone is the encoder part
- Often interchangeable
- /D Deconvolution, more generally known as upsampling
- Roughly the reverse of a convolution: brings the feature map back up to a picture of the same size as the input
- “Reconstruction of higher-dimensional representations from low-dimensional input”
- /D Unpooling - opposite of Maxpooling
- How:
- Pooling indices: remember where the maximum of each pooling window was located, and put it back there later, leaving 0 in the non-max places
- OR same as above, but write the max value to every position of the window (instead of 0)
- Architecture
- Usually symmetrical
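Both unpooling variants from the notes can be sketched in plain Python. This is only an illustration under assumptions: a fixed 2x2 window, even image dimensions, and made-up helper names `max_pool_2x2` / `unpool_2x2`.

```python
def max_pool_2x2(img):
    """2x2 max pooling; also returns the (row, col) position of each maximum."""
    pooled, indices = [], []
    for r in range(0, len(img), 2):
        p_row, i_row = [], []
        for c in range(0, len(img[0]), 2):
            window = [(img[r + dr][c + dc], (r + dr, c + dc))
                      for dr in (0, 1) for dc in (0, 1)]
            value, pos = max(window, key=lambda item: item[0])
            p_row.append(value)
            i_row.append(pos)
        pooled.append(p_row)
        indices.append(i_row)
    return pooled, indices


def unpool_2x2(pooled, indices, fill_with_max=False):
    """Undo the size reduction: zeros + max at its old place, or max everywhere."""
    height, width = 2 * len(pooled), 2 * len(pooled[0])
    out = [[0] * width for _ in range(height)]
    for pr, row in enumerate(pooled):
        for pc, value in enumerate(row):
            if fill_with_max:
                # Variant 2: copy the max into the whole 2x2 window.
                for dr in (0, 1):
                    for dc in (0, 1):
                        out[2 * pr + dr][2 * pc + dc] = value
            else:
                # Variant 1: use the remembered pooling index, rest stays 0.
                r, c = indices[pr][pc]
                out[r][c] = value
    return out


img = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [2, 6, 3, 1],
]
pooled, indices = max_pool_2x2(img)
print(pooled)                                          # [[4, 5], [6, 7]]
print(unpool_2x2(pooled, indices))                     # maxima back in place
print(unpool_2x2(pooled, indices, fill_with_max=True)) # maxima repeated
```

Either way the output has the original size again, but the non-max values are gone for good; unpooling only restores the layout, not the information.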
U-Net (2015)
- FCNN works well
- There are other networks for special cases
- U-Net was created for medical image segmentation, where we have much smaller datasets
- Named after the U-form of the network
- Major contributions:
- A lot of info gets lost at the image borders
- -> Padding is done by mirroring what’s inside!
2 1 [1 2 3 4 5] 5 4
- Connection between contraction and expansion
- Connects the two sides of the U so that info from the original-resolution feature maps reaches the upsampled ones
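The mirroring example `2 1 [1 2 3 4 5] 5 4` can be reproduced in one dimension with a few lines. A sketch only: `mirror_pad` is a made-up helper, and it uses the convention in which the border value itself is repeated (what `numpy.pad` calls `symmetric`), because that is what the example in these notes shows; a real network would pad 2-D feature maps.

```python
def mirror_pad(values, pad):
    """Pad a 1-D list on both sides by mirroring its own border values."""
    if pad > len(values):
        raise ValueError("pad must not exceed the length of the input")
    left = values[:pad][::-1]                   # first `pad` values, reversed
    right = values[len(values) - pad:][::-1]    # last `pad` values, reversed
    return left + values + right


print(mirror_pad([1, 2, 3, 4, 5], 2))  # [2, 1, 1, 2, 3, 4, 5, 5, 4]
```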
Feature pyramid network
- Uses ideas from pre-deep-learning times
- Also multiple losses etc., and lateral connections between the downsampled and upsampled parts
- Use features from different … TODO Sl.174
Object detection
- When we care about exact borders (parking, surgery), we use segmentation
- But sometimes we also care about the presence and classes of objects, plus the specific instances of the same object class
- Data:
- Bounding box for each object, $xywh + c$ (position, width, height, plus class)
- TODO D/ Object segmentation Sl.180
- For each class, you get an $x \times y$ picture with 0 where there are no instances, 1 for the first instance, 2 for the second, etc.
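The two data formats are related: from a per-class instance map you can read off one bounding box per instance. A rough sketch, assuming that `x, y` means the top-left corner of the box (the notes do not say whether it is the corner or the centre) and using a made-up helper `boxes_from_instance_map`.

```python
def boxes_from_instance_map(instance_map):
    """Return {instance_id: (x, y, w, h)} for a map where 0 means background."""
    extents = {}  # instance_id -> (x_min, y_min, x_max, y_max)
    for y, row in enumerate(instance_map):
        for x, inst in enumerate(row):
            if inst == 0:
                continue
            x0, y0, x1, y1 = extents.get(inst, (x, y, x, y))
            extents[inst] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return {inst: (x0, y0, x1 - x0 + 1, y1 - y0 + 1)
            for inst, (x0, y0, x1, y1) in extents.items()}


# Instance map for one class (say "car"): 0 = no car, 1 = first car, 2 = second car.
car_map = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 0, 2],
]
print(boxes_from_instance_map(car_map))  # {1: (1, 0, 2, 2), 2: (4, 1, 1, 2)}
```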
Typical object detection pipeline
- Input -> Regions of interest -> Feature extraction -> Classification
- Questions:
- How to find different instances?
- How to find different parts of the same instance? (Wheels look very different from the rest of the car, but they are still part of the same car)
R-CNN (2014)
- Region-based CNN
- One of the first CNNs for object detection
- Approach:
- R-CNN selective search:
- Selective-search algorithm for object candidates (Sl.186)
- Hierarchical clustering of similar regions (color, texture, brightness, etc.)
- Keep merging similar regions until you get object candidates
- Then you crop and resize the candidates to a common size
- TODO really interesting feature bits and saving them to disk
- Then use a linear SVM to predict the target classes
- Bounding-box regression to correct the bounding boxes of the candidates
- Disadvantages:
- Several independent components
- Needs too much time and storage space, etc.
- Slow
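The bounding-box regression step can be sketched like this. The notes do not spell out the parametrisation, so the following is an assumption: it uses the centre/size offsets commonly used in the R-CNN family (shift the centre relative to the box size, scale width and height through an exponential). `apply_box_deltas` and the numbers are made up.

```python
import math


def apply_box_deltas(box, deltas):
    """Correct a candidate box (cx, cy, w, h) with predicted (tx, ty, tw, th)."""
    cx, cy, w, h = box
    tx, ty, tw, th = deltas
    return (
        cx + w * tx,        # shift the centre by a fraction of the width
        cy + h * ty,        # shift the centre by a fraction of the height
        w * math.exp(tw),   # rescale the width (always stays positive)
        h * math.exp(th),   # rescale the height
    )


candidate = (50.0, 40.0, 20.0, 10.0)

# All-zero deltas leave the candidate untouched.
print(apply_box_deltas(candidate, (0.0, 0.0, 0.0, 0.0)))

# Move slightly right and up, and make the box twice as wide.
print(apply_box_deltas(candidate, (0.1, -0.2, math.log(2.0), 0.0)))
```

Predicting relative offsets instead of absolute coordinates keeps the regression targets small and independent of where in the image the candidate sits.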
Fast R-CNN
- Improvements
Faster R-CNN
- …
Mask R-CNN - Architecture
- TODO Sl.190+
- One network with 3 different outputs
- Region proposal Network aka RPN
- Predicts the regions
- Anchor boxes with predefined sizes
- Thousands of boxes per image
- “Anchors” for possible objects
- Described by: scale + aspect ratio
- A sliding window processes the whole image and places anchor boxes at each position
- … magic (TODO, Sl.199)
- For each box we get a score for whether it’s an object or background etc.
- Boxes with $IoU > 0.7$ against a ground-truth box are then used as objects for training
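A small sketch of anchor boxes and the $IoU > 0.7$ rule. Everything concrete here is an assumption for illustration: the stride of 16, the scales, the aspect ratios, the ground-truth box, and the helper names `make_anchors` and `iou`. It only shows the bookkeeping, not the RPN itself.

```python
def make_anchors(cx, cy, scales, aspect_ratios):
    """Anchor boxes (x0, y0, x1, y1) centred on (cx, cy), one per scale/ratio pair."""
    anchors = []
    for scale in scales:
        for ratio in aspect_ratios:      # ratio = width / height
            w = scale * ratio ** 0.5     # area stays scale * scale
            h = scale / ratio ** 0.5
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


# "Sliding window": visit a grid of positions and drop the same anchors at each one.
anchors = [
    anchor
    for cx in range(0, 64, 16)
    for cy in range(0, 64, 16)
    for anchor in make_anchors(cx, cy, scales=(32, 48), aspect_ratios=(0.5, 1.0, 2.0))
]

ground_truth = (10.0, 10.0, 54.0, 54.0)  # hypothetical labelled object
positives = [a for a in anchors if iou(a, ground_truth) > 0.7]

print(len(anchors))  # 16 positions * 2 scales * 3 ratios = 96
print(positives)     # anchors that would count as "object" during training
```

Even on this tiny 4x4 grid there are 96 anchors, which makes it plausible that a real image ends up with thousands of boxes.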
YOLO
TODO, Sl.200 +