serhii.net

In the middle of the desert you can say anything you want

01 Nov 2022

DL 4 - Images and image segmentation

Image segmentation

  • Motivation:
    • Medicine, autonomous driving, etc.
  • Image segmentation
    • Assign one or more classes to each pixel
    • Does not distinguish between different instances of the same class
  • Data
    • In: (x,y,color)
    • Out: (x,y,m) where m is the number of masks/classes
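A small NumPy sketch of this data format. It assumes each pixel carries exactly one class id (single-label segmentation) and turns an (H, W) label map into an (H, W, m) stack of binary masks. In the multi-label case a pixel could simply have several 1s. The function name and toy values are illustrative.

```python
import numpy as np

def label_map_to_masks(label_map: np.ndarray, num_classes: int) -> np.ndarray:
    """Turn an (H, W) map of class ids into an (H, W, m) stack of binary masks."""
    # Compare every pixel against every class id; broadcasting gives shape (H, W, m).
    masks = label_map[..., None] == np.arange(num_classes)
    return masks.astype(np.uint8)

label_map = np.array([[0, 0, 1],
                      [2, 1, 1]])  # toy 2x3 image, 3 classes (0 = background)
masks = label_map_to_masks(label_map, num_classes=3)
print(masks.shape)    # (2, 3, 3)
print(masks[..., 1])  # binary mask for class 1
```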

Networks

Fully convolutional network

  • /D Backbone is the encoder part
    • Often interchangeable
  • /D Deconvolution, more generally: upsampling
    • Roughly the opposite of a convolution: gets back a feature map of (about) the original image size
    • “Reconstruction of higher-dimensional representations from low-dimensional input”
  • /D Unpooling - opposite of Maxpooling
    • How:
      • Pooling indices: remember where the max value was located, put it back there later, and leave 0 in the non-max places
      • OR the same as above, but write the max value into every position of the window (instead of 0)
  • Architecture
    • Usually symmetrical
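The two unpooling variants above can be sketched in NumPy for a single-channel (H, W) array with 2x2 windows and stride 2 (an assumption for simplicity; H and W must be even). Pooling records the position of each max inside its window; unpooling either puts the max back only there (zeros elsewhere) or copies it into the whole window. Function names are hypothetical.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling (stride 2) on an (H, W) array; also returns argmax positions."""
    h, w = x.shape
    # Group pixels into non-overlapping 2x2 windows: shape (H/2, W/2, 4).
    windows = x.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3).reshape(h // 2, w // 2, 4)
    idx = windows.argmax(axis=-1)  # position of the max inside each window (0..3)
    return windows.max(axis=-1), idx

def max_unpool_2x2(pooled, idx, fill_max=False):
    """Inverse of max_pool_2x2.

    fill_max=False: put each max back at its remembered position, 0 elsewhere.
    fill_max=True:  copy the max into every position of its 2x2 window.
    """
    ph, pw = pooled.shape
    if fill_max:
        windows = np.repeat(pooled[..., None], 4, axis=-1)
    else:
        windows = np.zeros((ph, pw, 4), dtype=pooled.dtype)
        np.put_along_axis(windows, idx[..., None], pooled[..., None], axis=-1)
    # Undo the window grouping to get back a (2*ph, 2*pw) array.
    return windows.reshape(ph, pw, 2, 2).transpose(0, 2, 1, 3).reshape(ph * 2, pw * 2)

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 5],
              [0, 1, 7, 2],
              [3, 2, 1, 1]])
pooled, idx = max_pool_2x2(x)
print(pooled)                                 # [[4 5] [3 7]]
print(max_unpool_2x2(pooled, idx))            # maxes back in place, zeros elsewhere
print(max_unpool_2x2(pooled, idx, fill_max=True))
```

Only the indices variant restores the exact spatial layout of the maxima; the fill variant is closer to nearest-neighbour upsampling.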

U-Net (2015)

  • FCNs work well
  • There are other networks for special cases
  • U-Net was created for biomedical images, where datasets are much smaller
  • Named after the U shape of the network
  • Major contributions:
    • A lot of information gets lost at the image borders
    • -> Padding is done by mirroring the image content at the border!
      e.g. 2 1 [1 2 3 4 5] 5 4
    • Connection between the contracting and expanding paths
      • Connects matching levels of the U, passing high-resolution information from the input side to the upsampled feature maps
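The mirror-padding example above corresponds to NumPy's `np.pad` with `mode="symmetric"` (the edge value is repeated). `mode="reflect"` mirrors around the edge value instead; which of the two a given U-Net implementation uses is not stated in these notes, so both are shown. The image size is arbitrary.

```python
import numpy as np

row = np.array([1, 2, 3, 4, 5])
mirrored = np.pad(row, 2, mode="symmetric")  # repeats the edge value, like the example
reflected = np.pad(row, 2, mode="reflect")   # mirrors around the edge value instead
print(mirrored)   # [2 1 1 2 3 4 5 5 4]
print(reflected)  # [3 2 1 2 3 4 5 4 3]

# For an (H, W, C) image, pad only the two spatial axes, not the channel axis.
image = np.arange(4 * 5 * 3).reshape(4, 5, 3)
padded = np.pad(image, ((2, 2), (2, 2), (0, 0)), mode="symmetric")
print(padded.shape)  # (8, 9, 3)
```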

Feature pyramid network

  • Reuses ideas from pre-DL computer vision (image pyramids)
  • Also multiple losses etc., and lateral connections between the downsampled and upsampled parts
  • Uses features from different … TODO Sl.174

Object detection

  • When parking or doing surgery we care about exact borders: segmentation
  • But sometimes we also care about which objects are present, their classes, and the individual instances of the same class
  • Data:
    • A bounding box for each object: xywh + class c
  • TODO /D Object segmentation Sl.180
  • For each class you get an x-by-y map with 0 where there are no instances, 1 for the first instance, 2 for the second, etc.
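A sketch of how such a per-class instance map relates to the bounding-box format above: each non-zero id becomes one binary mask plus a tight box. It assumes x, y is the top-left corner (xywh is sometimes defined with the box center instead); the function name and class id 3 are hypothetical.

```python
import numpy as np

def instances_from_map(instance_map, class_id):
    """Split one class's instance map (0 = none, 1, 2, ... = instance ids)
    into per-instance binary masks and (x, y, w, h, c) boxes."""
    results = []
    for inst_id in np.unique(instance_map):
        if inst_id == 0:  # 0 means "no instance of this class here"
            continue
        mask = instance_map == inst_id
        ys, xs = np.nonzero(mask)
        x, y = xs.min(), ys.min()                  # top-left corner (assumed convention)
        w, h = xs.max() - x + 1, ys.max() - y + 1  # tight box, pixels counted inclusively
        results.append((mask, (int(x), int(y), int(w), int(h), class_id)))
    return results

car_map = np.array([[0, 1, 1, 0, 0],
                    [0, 1, 1, 0, 2],
                    [0, 0, 0, 0, 2]])  # two instances of one (hypothetical) class
results = instances_from_map(car_map, class_id=3)
for mask, box in results:
    print(box)  # (1, 0, 2, 2, 3), then (4, 1, 1, 2, 3)
```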

Typical object detection pipeline

  • Input -> Regions of interest -> Feature extraction -> Classification
  • Questions:
    • How to find different instances?
    • How to find different parts of the same instance? (Wheels look very different from the rest of the car, but they are still part of the same car.)

R-CNN (2014)

  • Region-based CNN
  • One of the first CNNs for object detection
  • Approach:
    • R-CNN selective search:
      • Selective search algorithm for object candidates (Sl.186)
      • Hierarchical grouping of similar regions (color, texture, brightness, etc.)
      • Iteratively merge the most similar regions
    • Then the candidates are cropped and resized to the same fixed size
    • TODO really interesting bit: CNN features are extracted for each candidate and saved to disk
    • Then linear SVMs classify the candidates into the target classes
    • Bounding box regression corrects the bounding boxes of the candidates
  • Disadvantages:
    • Several independent components
    • Needs a lot of time and disk space
    • Slow
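For the bounding-box regression step, here is a sketch of the box parameterization from the R-CNN paper (reused by Fast/Faster R-CNN). The regressor predicts the shift of the proposal's center relative to its width/height, plus a log-space scale change. Boxes are (cx, cy, w, h); the numbers in the usage example are made up.

```python
import numpy as np

def encode(proposal, gt):
    """Regression targets (tx, ty, tw, th) that move a proposal box onto a ground-truth box.
    Boxes are (cx, cy, w, h): center coordinates plus width and height."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = gt
    return np.array([(gx - px) / pw,   # center shift, relative to box width
                     (gy - py) / ph,   # center shift, relative to box height
                     np.log(gw / pw),  # log-space width change
                     np.log(gh / ph)]) # log-space height change

def decode(proposal, t):
    """Apply predicted offsets t to a proposal box (inverse of encode)."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = t
    return np.array([px + pw * tx, py + ph * ty, pw * np.exp(tw), ph * np.exp(th)])

proposal = (50.0, 40.0, 20.0, 10.0)
gt = (54.0, 42.0, 40.0, 10.0)
t = encode(proposal, gt)    # tx = 0.2, ty = 0.2, tw = log 2, th = 0
print(decode(proposal, t))  # recovers gt (up to floating-point error)
```

Normalizing by the proposal size makes the targets scale-invariant, so one regressor can handle both small and large candidates.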

Fast R-CNN

  • Improvements

Faster R-CNN

Mask R-CNN - Architecture

  • TODO Sl.190+
  • One network with 3 different outputs
  • Region Proposal Network aka RPN
    • Predicts the regions
    • Anchor boxes with predefined sizes
    • Thousands of boxes per image
    • “Anchors” for possible objects
    • Described by: scale + aspect ratio
    • A sliding window processes the whole image and places anchor boxes at each position
    • … magic (TODO, Sl.199)
    • For each box we get a score for whether it’s an object or background (bg)
    • During training, boxes with $IoU > 0.7$ (with a ground-truth box) are used as objects
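A simplified NumPy sketch of the anchor step: place anchors of several scales and aspect ratios at every sliding-window position, compute IoU with a ground-truth box, and label them. Assumptions not taken from these notes: boxes are (x1, y1, x2, y2); scale means sqrt(area) and ratio means height / width; the stride of 16 and the sizes in the usage example are arbitrary; the background threshold 0.3 and the -1 "ignore" label follow the original Faster R-CNN setup. The real RPN also marks the best-matching anchor for each ground-truth box as positive, which is left out here. All function names are hypothetical.

```python
import numpy as np

def make_anchors(cx, cy, scales, ratios):
    """Anchors (x1, y1, x2, y2) centered at (cx, cy), one per (scale, aspect ratio) pair.
    Assumed convention: scale = sqrt(box area), ratio = height / width."""
    boxes = []
    for s in scales:
        for r in ratios:
            w, h = s / np.sqrt(r), s * np.sqrt(r)
            boxes.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(boxes)

def anchors_for_image(feat_h, feat_w, stride, scales, ratios):
    """Slide over every feature-map position and map its center back to image pixels."""
    return np.concatenate([make_anchors((j + 0.5) * stride, (i + 0.5) * stride, scales, ratios)
                           for i in range(feat_h) for j in range(feat_w)])

def box_area(b):
    return (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])

def iou(boxes, gt):
    """IoU of each row of boxes (N, 4) with one ground-truth box gt (4,)."""
    ix1 = np.maximum(boxes[:, 0], gt[0])
    iy1 = np.maximum(boxes[:, 1], gt[1])
    ix2 = np.minimum(boxes[:, 2], gt[2])
    iy2 = np.minimum(boxes[:, 3], gt[3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    return inter / (box_area(boxes) + box_area(gt) - inter)

def label_anchors(anchors, gt, pos_thresh=0.7, neg_thresh=0.3):
    """1 = object, 0 = background, -1 = ignored during training."""
    overlaps = iou(anchors, gt)
    labels = np.full(len(anchors), -1)
    labels[overlaps > pos_thresh] = 1
    labels[overlaps < neg_thresh] = 0
    return labels

anchors = anchors_for_image(4, 4, stride=16, scales=(32, 64), ratios=(0.5, 1.0, 2.0))
gt = np.array([8.0, 8.0, 40.0, 40.0])
labels = label_anchors(anchors, gt)
print(anchors.shape)  # (96, 4): 4x4 positions x 6 anchors each
print((labels == 1).sum(), "positive anchors")
```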

YOLO

TODO, Sl.200 +
