
25 Oct 2022

DL 3: CNN

Convolutional neural networks

  • Motivation: if we just connect everything to the next layer, we:
    • get a lot of parameters
    • lose the position information

Convolutions

  • The convolution is applied once per color channel, using a sliding window to cover the whole image
  • Local features are determined by filters/kernels
  • TODO /D Filter Sl. 120 (?)
  • TODO Parameter sharing - the same filter is used for all patches
  • With each convolution layer, the filters become more complex and detect more complex features
  • Output shape
    • Influenced by filter size, padding, stride, etc.
    • There are online calculators - link in sl.122 - “convnet calculator”
  • TODO The filter count (Filterzahl) determines the number of output channels:
      • Input: 3 RGB channels; using 8 filters gives 8 channels as output
  • We have four parameters: filter size, filter count, stride, and padding
  • TODO Terminology:
    • Schrittweite == stride
    • Filter == kernel (?)
    • Filterzahl == filter count
  • Parameters:
    • Padding:
      • “valid” == only complete kernels == no padding
      • Zero-padding:
        • “same” - a border of zeros around the input, so the output size stays the same
    • Schrittweite / stride
      • Stride < filter size -> overlapping convolutions
      • TODO S125
  • Dilated convolutions
    • Bigger receptive field without extra parameters or computation
    • Basically, instead of looking at every pixel, the kernel samples every second one
    • Almost never used nowadays
  • Dimensions:
    • Conv1D is for 1D data like … temperature time series (?)
    • Or you can use different filter sizes, e.g. for a feature covering temperature for …
  • Keras:
    • Conv1D, Conv2D, Conv3D
    • predictable, consistent parameters (filters, kernel_size, strides, padding) - see the sketch below
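
A minimal Keras sketch (assuming TensorFlow 2.x; the image size and filter counts are made up for illustration) of how filter count, padding, and stride determine the output shape and the shared parameter count:

```python
import tensorflow as tf

# Toy input: one 32x32 RGB image (3 channels).
x = tf.random.normal((1, 32, 32, 3))

# 8 filters of size 3x3, "same" zero-padding, stride 1:
# spatial size stays 32x32, the channel count becomes 8.
same = tf.keras.layers.Conv2D(filters=8, kernel_size=3, strides=1, padding="same")
print(same(x).shape)        # (1, 32, 32, 8)
# Parameter sharing: one 3x3x3 kernel per filter plus a bias, reused over all patches.
print(same.count_params())  # 3*3*3*8 + 8 = 224

# "valid" padding (only complete kernels) with stride 2: the spatial size shrinks.
valid = tf.keras.layers.Conv2D(filters=8, kernel_size=3, strides=2, padding="valid")
print(valid(x).shape)       # (1, 15, 15, 8)
```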

Image classification

  • Motivation:
    • Medicine, quality assurance, sorting
  • Picture -> one or more classes
  • Input picture -> Convolution (feature maps) -> Max-pooling -> Fully-connected layer (S.132) - see the sketch below
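
A minimal Keras sketch of that pipeline (input size and class count are made-up placeholders, not from the slides):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(64, 64, 3)),                          # input picture
    layers.Conv2D(16, 3, padding="same", activation="relu"),  # convolution -> feature maps
    layers.MaxPooling2D(2),                                   # max-pooling (downsampling)
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),                   # fully-connected classifier
])
model.summary()
```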

Max-pooling

TODO
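
A tiny sketch of what max-pooling does (toy values, TensorFlow/Keras assumed): each 2x2 window is reduced to its maximum, halving the spatial size.

```python
import tensorflow as tf

# Toy 4x4 single-channel "image", batch size 1.
x = tf.constant([[ 1.,  2.,  5.,  6.],
                 [ 3.,  4.,  7.,  8.],
                 [ 9., 10., 13., 14.],
                 [11., 12., 15., 16.]])
x = tf.reshape(x, (1, 4, 4, 1))

# 2x2 max-pooling with stride 2: every 2x2 window collapses to its maximum.
pooled = tf.keras.layers.MaxPooling2D(pool_size=2)(x)
print(tf.reshape(pooled, (2, 2)))
# [[ 4.  8.]
#  [12. 16.]]
```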

Receptive field

  • How much does a neuron “see”
    • -> and learn
  • A neuron in layer 3 sees many more neurons from previous layers, like a pyramid - see the sketch below
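
A small worked sketch (not from the slides) of how the receptive field grows with depth, using the standard recursion for stacked convolutions:

```python
# Standard receptive-field recursion for a stack of conv layers:
#   r_l = r_{l-1} + (k_l - 1) * j_{l-1},   j_l = j_{l-1} * s_l
# where k = kernel size, s = stride and j = "jump" (distance between
# neighbouring neuron centres, measured in input pixels).
def receptive_field(conv_layers):
    r, j = 1, 1
    for k, s in conv_layers:
        r += (k - 1) * j
        j *= s
    return r

# Three stacked 3x3 convolutions, stride 1: a layer-3 neuron sees a 7x7 input patch.
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # 7
# With stride 2 the pyramid widens much faster: 15x15 after three layers.
print(receptive_field([(3, 2), (3, 2), (3, 2)]))  # 15
```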

Deep CNNs

  • The receptive field becomes bigger and bigger with the depth of the layers (S. 135)
  • The deeper you go, the more filters you have, and therefore the more complex the features are.
  • At the end you have a “picture” of size 1x1xXXXXX features etc., until you get the output probabilities for the target classes.
  • Example
    • digitization of floor plans (Grundrisse): everything worked except the doors - the receptive field was just not big enough
    • the receptive field should be big enough for what you want to classify at the end
  • Jupyter notebook example - a lot of lines and layers but few params - contrasts with how many we’d have had if we had used dense layers for this
  • ImageNet
    • Nice visualization S. 139
    • TL;DR Transformers etc. work well, both for Vision and for NLP, if they have enough compute+data
      • For English we do, for the other languages we don’t
      • -> CNNs still matter for these cases
  • History of the best vision models
    • AlexNet - 2012
      • many more params than the previous 1998 model (LeNet)
      • Stride < filter size, so the sliding windows overlap
      • ReLU instead of tanh
        • 6 times faster to train
        • actually converges faster by itself too!
      • Dropout for regularization
        • Now standard, very interesting then
      • Input data:
        • Scaled to 256
        • Augmentations: flips, crops, colors
      • For testing: 5 crops and their flips, averaging the predictions of these 10 pictures

    • InceptionNet/GoogleNet - 2014
      • 10 times fewer params but much deeper
      • Inception blocks:
        • For each block, multiple convolutions with different filter sizes (but “same” padding) are applied in parallel
        • plus a max-pooling branch
        • the outputs are concatenated at the end
      • 1x1 convolutions learn cross-channel features
        • basically decreasing the number of parameters to make it faster? TODO
      • Auxiliary loss
        • to avoid vanishing gradients
        • help to calculate the final loss
        • N auxiliary classifiers, each with its own loss attached closer to the input, and they help training
      • TL;DR just much more efficient -> fewer params
    • ResNet
      • Residual connections / skip connections / lateral connections / identity shortcut connections
        • They serve to minimize the error
          • they make sure that information does not get lost
          • Either we get better, or we stay the same - we make sure we have the option
          • F(x) + x - see the sketch after this list
        • Prevents vanishing gradients - useful in a lot of scenarios
        • Similar to highway networks, though those are the more generic solution
      • Architecture - VGG19 etc. - a LOT of layers
        • pictured either as 4 blocks or as all 34 layers; the representations are equivalent
        • ResNet-18, 34, 101, 152, etc.
      • Dimensionality reduction via increasing the stride (and not via pooling)
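
A minimal sketch of the residual idea F(x) + x in the Keras functional API (filter counts and input shape are made up; real ResNet blocks also use batch normalization and, where shapes change, a strided 1x1 projection on the shortcut):

```python
from tensorflow import keras
from tensorflow.keras import layers

def residual_block(x, filters):
    """y = F(x) + x: if F learns nothing useful, the block can fall back to identity."""
    f = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    f = layers.Conv2D(filters, 3, padding="same")(f)  # F(x)
    y = layers.Add()([f, x])                          # skip / identity shortcut connection
    return layers.Activation("relu")(y)

inputs = keras.Input(shape=(32, 32, 16))  # made-up shape; channels must match `filters`
outputs = residual_block(inputs, 16)
model = keras.Model(inputs, outputs)
model.summary()
```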