DL 4 - Images and image segmentation
Image segmentation
- Motivation:
- Medicine, autonomous driving, etc.
 
 - Image segmentation
- Assign one or more classes to each pixel
 - Does not distinguish between different instances of the same class
 
 - Data
- In: (x, y, color)
 - Out: (x, y, m), where m is the number of masks/classes
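A minimal sketch of the output format, assuming each pixel carries exactly one class id (with several classes per pixel, a pixel would simply have several 1s). The name `to_class_masks` and the toy label map are made up for illustration.

```python
def to_class_masks(label_map, num_classes):
    """Turn an (x, y) map of class ids into an (x, y, m) stack of binary masks."""
    return [
        [[1 if label == k else 0 for k in range(num_classes)] for label in row]
        for row in label_map
    ]


# Toy 2x2 "image" with m = 3 classes (0 = background, 1 and 2 = object classes).
label_map = [[0, 1],
             [2, 1]]
masks = to_class_masks(label_map, num_classes=3)
print(masks[1][0])  # mask vector of the pixel in row 1, column 0
```

Each pixel ends up with a vector of length m, which is the (x, y, m) shape from the notes above.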
 
Networks
Fully convolutional network
- /D Backbone: the encoder part of the network
- Often interchangeable
 
 - /D Deconvolution (transposed convolution), a form of upsampling
- Roughly the opposite of a convolution: it enlarges a small feature map again, typically back to the size of the input image
 - “Reconstruction of higher-dimensional representations from low-dimensional input”
 
 - /D Unpooling: the opposite of max pooling
- How:
- Pooling indices: remember where the maximum of each pooling window was located, write it back to that position later, and leave 0 in the non-max positions
 - OR same as above, but write the maximum to every position of the window (instead of 0)
 
 
 - Architecture
- Usually symmetrical
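The two unpooling variants and the idea of a deconvolution can be sketched in plain Python. This is a toy illustration, not a framework implementation: it assumes 2x2 pooling windows with stride 2, even image sizes, and a 1D signal for the transposed convolution. All function names are made up.

```python
def max_pool_2x2(image):
    """2x2 max pooling (stride 2); also returns the position of each maximum."""
    pooled, indices = [], []
    for r in range(0, len(image), 2):
        pooled_row, index_row = [], []
        for c in range(0, len(image[0]), 2):
            window = [(image[r + dr][c + dc], (r + dr, c + dc))
                      for dr in range(2) for dc in range(2)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def unpool_with_indices(pooled, indices, height, width):
    """Variant 1: put each maximum back at its remembered position, 0 elsewhere."""
    output = [[0] * width for _ in range(height)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            output[r][c] = value
    return output


def unpool_fill(pooled):
    """Variant 2: write each maximum to every position of its 2x2 window."""
    output = []
    for row in pooled:
        expanded = [value for value in row for _ in range(2)]
        output.append(expanded)
        output.append(list(expanded))
    return output


def transposed_conv_1d(values, kernel, stride=2):
    """Each input value adds a scaled copy of the kernel to the (larger) output."""
    output = [0] * ((len(values) - 1) * stride + len(kernel))
    for i, value in enumerate(values):
        for j, weight in enumerate(kernel):
            output[i * stride + j] += value * weight
    return output


image = [[1, 2, 0, 3],
         [3, 4, 1, 0],
         [0, 0, 5, 6],
         [1, 2, 7, 8]]
pooled, indices = max_pool_2x2(image)
print(unpool_with_indices(pooled, indices, 4, 4))
print(unpool_fill(pooled))
print(transposed_conv_1d([1, 2], kernel=[1, 1]))
```

In a real network the kernel weights of the transposed convolution are learned, whereas unpooling has no parameters.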
 
 
U-Net (2015)
- FCNs work well in general
 - There are other networks for special cases
 - U-Net was created for medical imaging, where datasets are much smaller
 - Named after the U-shape of the network
 - Major contributions:
- A lot of information gets lost at the image borders
 - -> Padding is done by mirroring the image content, e.g. 2 1 [1 2 3 4 5] 5 4
 - Connection between contraction and expansion
- Connects the two sides of the U so that information from the higher-resolution contracting side reaches the upsampled expanding side
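A small sketch of mirror padding that reproduces the example from the notes, where the border values themselves are repeated. The helper names are hypothetical, and the 2D version simply pads rows first and then columns.

```python
def mirror_pad_1d(values, pad):
    """Pad a list on both sides by mirroring its own border values."""
    if pad > len(values):
        raise ValueError("pad must not exceed the sequence length")
    left = values[:pad][::-1]
    right = values[len(values) - pad:][::-1]
    return left + list(values) + right


def mirror_pad_2d(image, pad):
    """Mirror-pad every row, then mirror the padded rows vertically."""
    padded_rows = [mirror_pad_1d(row, pad) for row in image]
    return [list(row) for row in mirror_pad_1d(padded_rows, pad)]


print(mirror_pad_1d([1, 2, 3, 4, 5], 2))
print(mirror_pad_2d([[1, 2], [3, 4]], 1))
```

This variant corresponds to what NumPy's `np.pad` calls `mode="symmetric"`; `mode="reflect"` would mirror without repeating the border value.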
 
 
 
Feature pyramid network
- Reuses ideas from pre-DL times
 - Also multiple losses etc., and lateral connections between the downsampled and upsampled parts
 - Use features from different … TODO Sl.174
 
Object detection
- When exact borders matter (parking, surgery), we want segmentation
 - But sometimes we also care about the presence and classes of objects, and about specific instances of the same object class
 - Data:
- Bounding box for each object: (x, y, w, h) + class c
 - TODO D/ Object segmentation Sl.180
 - For each class, you get an (x, y) picture that is 0 where there are no instances, 1 for the first instance, 2 for the second, etc.
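The two data formats are related: from an instance map as described above, one bounding box per instance can be derived. A minimal sketch, assuming (x, y) denotes the top-left corner of the box (the notes do not say whether it is the corner or the center); `instance_boxes` is a made-up helper.

```python
def instance_boxes(instance_map):
    """Return {instance_id: (x, y, w, h)} for a 2D map where 0 means 'no instance'."""
    extents = {}
    for y, row in enumerate(instance_map):
        for x, instance_id in enumerate(row):
            if instance_id == 0:
                continue
            x0, y0, x1, y1 = extents.get(instance_id, (x, y, x, y))
            extents[instance_id] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return {
        instance_id: (x0, y0, x1 - x0 + 1, y1 - y0 + 1)
        for instance_id, (x0, y0, x1, y1) in extents.items()
    }


# Instance map for one class: two instances, labelled 1 and 2.
instance_map = [[0, 1, 1, 0],
                [0, 1, 1, 0],
                [0, 0, 0, 2],
                [0, 0, 2, 2]]
boxes = instance_boxes(instance_map)
print(boxes)
```

The class label c would come from which per-class map the instance was found in.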
Typical object detection pipeline
- Input -> Regions of interest -> Feature extraction -> Classification
 - Questions:
- How to find different instances?
 - How to find different parts of the same instance? (Wheels look very different from the rest of the car, but they are still part of the same car.)
 
 
R-CNN (2014)
- Region-based CNN
 - One of the first CNNs for object detection
 - Approach:
- R-CNN selective search:
- Selective search algorithm proposes object candidates (Sl.186)
 - Hierarchical clustering of similar regions (color, texture, brightness, etc.)
 - Repeatedly merge the most similar regions to obtain the candidate regions
 
 - Then you crop and resize the candidates to a common size
 - TODO really interesting feature bits and saving them to disk
 - Then use a linear SVM to predict the target classes
 - Bounding box regression to correct the bounding boxes of the candidates
 
 - Disadvantages:
- Several independent components
 - Needs too much time and storage space, etc.
 - Slow
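The bounding box regression step can be sketched as follows. The notes do not spell out the parametrization, so this assumes the one commonly used in the R-CNN family: a box is given by its center and size, the regressor predicts offsets (dx, dy, dw, dh), the center shifts relative to the box size, and the size is scaled exponentially. `apply_box_deltas` is a hypothetical name.

```python
import math


def apply_box_deltas(box, deltas):
    """Correct a candidate box (cx, cy, w, h) with predicted offsets (dx, dy, dw, dh)."""
    cx, cy, w, h = box
    dx, dy, dw, dh = deltas
    return (
        cx + w * dx,       # shift the center by a fraction of the box width
        cy + h * dy,       # shift the center by a fraction of the box height
        w * math.exp(dw),  # scale the width (exp keeps it positive)
        h * math.exp(dh),  # scale the height
    )


candidate = (10.0, 20.0, 4.0, 8.0)
corrected = apply_box_deltas(candidate, (0.5, -0.25, 0.0, 0.0))
print(corrected)
```

Predicting relative offsets rather than absolute coordinates keeps the regression targets independent of the size of the candidate box.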
 
 
Fast R-CNN
- Improvements
 
Faster R-CNN
- …
 
Mask R-CNN - Architecture
- TODO Sl.190+
 - One network with 3 different outputs
 - Region Proposal Network (RPN)
- Predicts the regions
 - Anchor boxes with predefined sizes
 - Thousands of boxes per image
 - “Anchors” for possible objects
 - Described by: scale + aspect ratio
 - A sliding window processes the whole image and places anchor boxes at each position
 - … magic (TODO, Sl.199)
 - For each box we get a score for whether it is an object or background, etc.
 - Boxes with $IoU > 0.7$ (overlap with a ground-truth box) are then used as objects for training
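Anchor generation and the IoU criterion can be sketched like this. It is a simplified illustration in image coordinates: boxes are (x1, y1, x2, y2), the aspect ratio is taken as width / height, and the stride, scales, and ratios are arbitrary example values. All function names are made up, and the actual RPN details from the slides are not reproduced here.

```python
import math


def make_anchors(cx, cy, scales, ratios):
    """Anchor boxes (x1, y1, x2, y2) centered at (cx, cy), one per scale/ratio pair."""
    anchors = []
    for scale in scales:
        for ratio in ratios:
            w = scale * math.sqrt(ratio)
            h = scale / math.sqrt(ratio)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def sliding_window_anchors(width, height, stride, scales, ratios):
    """Place the same set of anchors at every sliding-window position."""
    return [anchor
            for cy in range(stride // 2, height, stride)
            for cx in range(stride // 2, width, stride)
            for anchor in make_anchors(cx, cy, scales, ratios)]


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - intersection
    return intersection / union if union > 0 else 0.0


def label_anchors(anchors, ground_truth_boxes, threshold=0.7):
    """True for anchors that overlap some ground-truth box with IoU > threshold."""
    return [any(iou(anchor, gt) > threshold for gt in ground_truth_boxes)
            for anchor in anchors]


anchors = sliding_window_anchors(64, 64, stride=16,
                                 scales=[8, 16, 32], ratios=[0.5, 1.0, 2.0])
labels = label_anchors(anchors, ground_truth_boxes=[(0.0, 0.0, 16.0, 16.0)])
print(len(anchors), sum(labels))
```

Even on this tiny 64x64 example there are 144 anchors, which shows how quickly the count reaches thousands on real images.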
 
 
YOLO
TODO, Sl.200 +
				