DL 3: CNN
Convolutional neural networks
- Motivation: if we just connect everything to the next layer, we:
    - Get a lot of params
    - Lose the position information
Convolutions
- You apply it per colour channel, using a sliding window to cover the whole image
- Local features are determined by the filter/kernel
- TODO /D Filter Sl. 120 (?)
- TODO Param. sharing - same filter for all patches
- With each additional convolution layer, the filters become more complex and capture more complex features
- Output shape
- Influenced by everything: filter size, padding, stride, etc.
- There are online calculators - link on sl. 122 - “convnet calculator” (see also the shape-check sketch after the parameters list below)
- TODO The number of filters determines how many output channels:
- Input: 3 RGB channels, apply 8 filters, get 8 output channels
- We have 4 parameters: filter size, number of filters, stride, padding
- TODO Terminology:
- Schrittweite == stride
- Filter == Kernel (?)
- Filter count == Filterzahl
- Parameters:
- Padding:
- “valid” == only complete kernels == no padding
- Zero-padding:
- “same” - a border of zeros around the input, so the output size stays the same
- Schrittweite / stride
- Stride < filter size -> overlapping convolutions
- TODO S125
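- A quick shape sanity check without the online calculator (a minimal sketch; the 32x32 input and the 8 filters are made-up examples):

```python
import tensorflow as tf

def conv_output_size(n, k, padding, stride):
    """Spatial output size of a convolution.
    n: input size, k: filter size, stride: Schrittweite."""
    if padding == "same":    # zero-pad so that output size = ceil(n / stride)
        return -(-n // stride)
    if padding == "valid":   # only complete kernels, no padding
        return (n - k) // stride + 1
    raise ValueError(padding)

print(conv_output_size(32, 3, "valid", 1))  # 30
print(conv_output_size(32, 3, "same", 1))   # 32
print(conv_output_size(32, 3, "same", 2))   # 16

# Cross-check with Keras: 3 RGB input channels, 8 filters -> 8 output channels
y = tf.keras.layers.Conv2D(8, 3, strides=2, padding="same")(tf.zeros((1, 32, 32, 3)))
print(y.shape)  # (1, 16, 16, 8)
```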
- Dilated convolutions
- Bigger receptive field without paying for it with extra parameters or computation
- Basically, instead of looking at every pixel, the filter samples every second one (gaps inside the kernel)
- Almost never used nowadays
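- In Keras this is the `dilation_rate` argument of the conv layers; a minimal sketch (shapes made up) showing that a dilated 3x3 kernel covers a 5x5 area while keeping the parameter count of a normal 3x3 conv:

```python
import tensorflow as tf

# 3x3 kernel that samples every second pixel -> effective 5x5 receptive field
dilated = tf.keras.layers.Conv2D(8, 3, dilation_rate=2, padding="same")
y = dilated(tf.zeros((1, 32, 32, 3)))
print(y.shape)                 # (1, 32, 32, 8)
print(dilated.count_params())  # 3*3*3*8 + 8 = 224, same as an undilated 3x3 conv
```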
- Dimensions:
- Conv1D is for 1D sequences - things like temperature over time
- Or you can use different filter sizes, e.g. a feature covering temperature for ...
- Keras:
- Conv1D, Conv2D, Conv3D
- predictable parameters (the same arguments across all three)
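- i.e. only the expected input rank changes between them (a minimal sketch, shapes made up):

```python
import tensorflow as tf
from tensorflow.keras import layers

# 1D: (timesteps, channels), e.g. a temperature time series
print(layers.Conv1D(16, 3, padding="same")(tf.zeros((1, 100, 4))).shape)         # (1, 100, 16)
# 2D: (height, width, channels), e.g. images
print(layers.Conv2D(16, 3, padding="same")(tf.zeros((1, 64, 64, 3))).shape)      # (1, 64, 64, 16)
# 3D: (depth, height, width, channels), e.g. video or volumetric scans
print(layers.Conv3D(16, 3, padding="same")(tf.zeros((1, 16, 64, 64, 1))).shape)  # (1, 16, 64, 64, 16)
```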
Image classification
- Motivation:
- Medicine, quality assurance, sorting
- Picture -> one or multiple classes
- Input picture -> Convolution (feature maps) -> Max-pooling -> Fully-connected layer (S. 132)
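- A minimal Keras sketch of that pipeline (the 64x64 input, the layer sizes and the 10 classes are made-up placeholders, not the values from the slides):

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),                         # input picture
    layers.Conv2D(16, 3, padding="same", activation="relu"),   # convolution -> feature maps
    layers.MaxPooling2D(2),                                    # max-pooling
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),                    # fully-connected classifier
])
model.summary()
```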
Max-pooling
TODO
Receptive field
- How much does a neuron “see”
- -> and learn
- A neuron in layer 3 sees many more inputs from the previous layers, like a pyramid
Deep CNNs
- The receptive field becomes bigger and bigger with the depth of the network (S. 135)
- The deeper you go, the more filters you have, and therefore the more complex the features become.
- At the end you have a “picture” of size 1x1xXXXXX features, etc., until you get the output probabilities for the target classes.
- Example
- Digitalisation of floor plans: everything worked except the doors - the receptive field was just not big enough.
- It should be big enough for whatever you want to classify at the end (see the sketch below).
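- A rough way to check that, using the standard receptive-field recurrence (a minimal sketch; the layer list below is a made-up example, not the floor-plan model):

```python
def receptive_field(conv_layers):
    """conv_layers: list of (kernel_size, stride) per conv/pooling layer.
    Returns how many input pixels one output neuron 'sees'."""
    rf, jump = 1, 1
    for k, s in conv_layers:
        rf += (k - 1) * jump   # each layer widens the field by (k-1) * current step size
        jump *= s              # striding increases the step between neighbouring neurons
    return rf

# e.g. three 3x3 convs, the middle one with stride 2
print(receptive_field([(3, 1), (3, 2), (3, 1)]))  # 9
```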
- Jupyter notebook example - a lot of lines and layers but few params - contrasts with what we would have had if we used dense layers for this
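- The contrast is easy to reproduce (a minimal sketch with an arbitrary 64x64 RGB input, not the notebook's exact model):

```python
import tensorflow as tf
from tensorflow.keras import layers

inp = tf.keras.Input(shape=(64, 64, 3))

# Convolution: params depend only on the kernel and channel counts
conv_model = tf.keras.Model(inp, layers.Conv2D(32, 3, padding="same")(inp))
print(conv_model.count_params())   # 3*3*3*32 + 32 = 896

# Dense layer on the flattened image: params scale with the image size
dense_model = tf.keras.Model(inp, layers.Dense(32)(layers.Flatten()(inp)))
print(dense_model.count_params())  # 64*64*3*32 + 32 = 393,248
```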
- ImageNet
- Nice visualization S. 139
- TL;DR Transformers etc. work well, both for Vision and for NLP, if they have enough compute+data
- For English we do, for the other languages we don’t
- -> CNNs still matter for these cases
- History of the best vision models
- AlexNet (2012)
- many more params than the previous 1998 model (LeNet)
- Stride < filter size, so the sliding windows overlap
- ReLU instead of tanh
- 6 times faster to train
- actually converges faster by itself too!
- Dropout for regularization
- Now standard, but a novelty back then
- Input data:
- Scaled to 256x256
- Augmentations: flips, crops, colors
- For testing: 5 crops & their flips; the predictions over these 10 pictures were averaged
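- A hedged sketch of that test-time trick (`model` and the crop size are placeholders; the paper uses 224x224 crops from the 256x256 images):

```python
import tensorflow as tf

def ten_crop_predict(model, image, crop=224):
    """5 crops (4 corners + centre) plus their horizontal flips,
    prediction averaged over the resulting 10 pictures."""
    h, w = image.shape[0], image.shape[1]
    offsets = [(0, 0), (0, w - crop), (h - crop, 0), (h - crop, w - crop),
               ((h - crop) // 2, (w - crop) // 2)]
    crops = [image[y:y + crop, x:x + crop, :] for y, x in offsets]
    crops += [tf.image.flip_left_right(c) for c in crops]   # the 5 flipped versions
    batch = tf.stack(crops)                                  # (10, crop, crop, 3)
    return tf.reduce_mean(model(batch), axis=0)              # average the 10 predictions
```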
- InceptionNet / GoogLeNet - 2014
- 10 times fewer params but much deeper
- Inception blocks:
- In each block, multiple convolutions with different filter sizes (but “same” padding) are applied in parallel
- plus a max-pooling branch
- The outputs of all branches are concatenated at the end
- 1x1 convolutions learn cross-channel features
- basically decreasing the number of params to make it faster? TODO (see the sketch at the end of this GoogLeNet block)
- Auxiliary loss
- to avoid vanishing gradients
- contribute to the final loss
- N auxiliary classifiers, each with its own loss attached closer to the early layers, and they help training
- TL;DR just much more efficient -> fewer params
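- A hedged sketch of both ideas in one toy model (all filter counts, shapes, the 10 classes and the 0.3 loss weight are illustrative placeholders, not GoogLeNet's actual configuration): a 1x1 conv first shrinks the channels before the expensive 5x5 conv, and an auxiliary softmax head attached to an intermediate layer gets its own, down-weighted loss:

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(32, 32, 256))

# Without a bottleneck, a 5x5 conv on 256 channels costs 5*5*256*64 + 64 = 409,664 params.
# With a 1x1 bottleneck down to 32 channels first:
# (1*1*256*32 + 32) + (5*5*32*64 + 64) = 8,224 + 51,264 = 59,488 params.
x = layers.Conv2D(32, 1, activation="relu")(inputs)               # 1x1: cross-channel features
x = layers.Conv2D(64, 5, padding="same", activation="relu")(x)

# Auxiliary classifier branching off an intermediate layer
aux = layers.GlobalAveragePooling2D()(x)
aux_out = layers.Dense(10, activation="softmax", name="aux")(aux)

x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
main = layers.GlobalAveragePooling2D()(x)
main_out = layers.Dense(10, activation="softmax", name="main")(main)

model = tf.keras.Model(inputs, [main_out, aux_out])
model.compile(optimizer="adam",
              loss={"main": "categorical_crossentropy",
                    "aux": "categorical_crossentropy"},
              loss_weights={"main": 1.0, "aux": 0.3})   # aux loss is injected closer to the early layers
```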
- ResNet
- Residual connections / skip connections / lateral connections / identity shortcut connections
- They serve to minimize the error
- They make sure that information is not lost
- Either we get better, or we stay the same - the skip guarantees we at least have that option
- F(x) + x (see the sketch at the end of this ResNet block)
- Prevents vanishing gradients - useful in a lot of scenarios
- Similar to highway networks - though those are the more general solution
- Architecture - VGG19 etc. - a LOT of layers
- pictured either as 4 layers or as 34 layers; the two representations are equivalent
- ResNet-18, -34, -101, -152, etc.
- Dimensionality reduction via increasing the stride (not via pooling)
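- A minimal sketch of one residual block in Keras (the filter count and input shape are arbitrary; this is just the F(x) + x idea, not a full ResNet):

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    """y = F(x) + x: the block only has to learn the residual F(x)."""
    shortcut = x                                                     # identity shortcut connection
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)                 # F(x)
    y = layers.Add()([y, shortcut])                                  # F(x) + x
    return layers.Activation("relu")(y)

inputs = tf.keras.Input(shape=(32, 32, 64))   # channel count must match `filters`
outputs = residual_block(inputs, 64)
tf.keras.Model(inputs, outputs).summary()
```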