DL 3: CNN
Convolutional neural networks
- Motivation: if we just connect everything to the next layer, we:
 - Get a lot of params
 - Lose the position information
 
Convolutions
- The convolution is applied per color channel, using a sliding window to cover the whole image
 - Local features are determined by the filter/kernel
 - TODO 2D filter, sl. 120 (?)
 - TODO Parameter sharing - the same filter is used for all patches
 - With each additional convolution layer, the filters detect increasingly complex features
 - Output shape
- Influenced by filter size, padding, stride, etc. (see the sketch below)
 - There are online calculators - link on sl. 122 - “convnet calculator”
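A minimal sketch (my own, not from the slides) of the standard output-size relation those calculators implement:

```python
import math

def conv_output_size(n_in, kernel, stride=1, padding="valid"):
    """Standard conv output-size relation (assumes a square input and kernel).

    'valid': no padding, only complete kernel positions.
    'same' : zero-padding so that with stride 1 the size is preserved.
    """
    if padding == "valid":
        return math.floor((n_in - kernel) / stride) + 1
    if padding == "same":
        return math.ceil(n_in / stride)
    raise ValueError(padding)

print(conv_output_size(28, 3))                  # 26 (valid, stride 1)
print(conv_output_size(28, 3, padding="same"))  # 28 (size preserved)
print(conv_output_size(28, 3, stride=2))        # 13
```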
 
 - TODO - the number of filters (Filterzahl) determines the number of output channels:
- Input: 3 RGB channels, apply 8 filters, get 8 output channels (quick check below)
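A quick Keras sanity check of that example (the 64x64 input size is my own toy choice):

```python
from tensorflow import keras
from tensorflow.keras import layers

# RGB input (3 channels), 8 filters of size 3x3 -> 8 output channels.
inputs = keras.Input(shape=(64, 64, 3))
outputs = layers.Conv2D(filters=8, kernel_size=3, padding="same")(inputs)
model = keras.Model(inputs, outputs)

print(model.output_shape)    # (None, 64, 64, 8) -- channel count == filter count
print(model.count_params())  # 3*3*3*8 weights + 8 biases = 224
```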
 
 
 - A convolution layer has 4 (hyper)parameters: filter size, filter count, stride, and padding
 - TODO Terminology:
- Schrittweite == stride
 - Filter == Kernel (?)
 - Filter count == Filterzahl
 
 - Parameters:
- Padding:
- “valid” == only complete kernels == no padding
 - Zero-padding:
- “same” - a border of zeros around the input, so the size stays the same
 
 
 - Schrittweite / stride
- Stride < filter size -> overlapping convolutions
 - TODO S125
 
 
 - Dilated convolutions
- Bigger receptive field without paying for it in extra parameters
 - Basically, instead of covering adjacent pixels, the kernel samples every second pixel (see the sketch below)
 - Almost never used nowadays
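A small Keras sketch of a dilated convolution (toy sizes, my assumption): the 3x3 kernel with `dilation_rate=2` covers a 5x5 region without any extra parameters:

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(32, 32, 1))
dilated = layers.Conv2D(filters=4, kernel_size=3, dilation_rate=2,
                        padding="same")(inputs)
model = keras.Model(inputs, dilated)

# Same parameter count as a plain 3x3 conv (3*3*1*4 + 4 = 40),
# but each output value now "sees" a 5x5 neighbourhood of the input.
print(model.count_params())  # 40
```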
 
 - Dimensions:
- Conv1D is for 1D data, e.g. a temperature series
 - Or you can use different filter sizes, e.g. a filter covering temperature for …
 
 - Keras:
- Conv1D, Conv2D, Conv3D
 - predictable, consistent parameters across all three (sketch below)
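A tiny sketch with made-up input shapes, showing that the three variants take the same arguments:

```python
from tensorflow import keras
from tensorflow.keras import layers

# All three take the same arguments (filters, kernel_size, strides, padding, ...).
x1 = keras.Input(shape=(100, 1))         # 1D: e.g. a temperature series
x2 = keras.Input(shape=(64, 64, 3))      # 2D: images
x3 = keras.Input(shape=(16, 64, 64, 1))  # 3D: e.g. video / volumetric data

print(layers.Conv1D(8, 3, padding="same")(x1).shape)  # (None, 100, 8)
print(layers.Conv2D(8, 3, padding="same")(x2).shape)  # (None, 64, 64, 8)
print(layers.Conv3D(8, 3, padding="same")(x3).shape)  # (None, 16, 64, 64, 8)
```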
 
 
Image classification
- Motivation:
- Medicine, quality assurance, sorting
 
 - Pic -> one or multiple classes
 - Input picture -> Convolution (feature maps) -> Maxpooling -> Fully-connected layer (S.132)
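A minimal Keras sketch of that pipeline (input size and class count are my own toy assumptions, not the slide's exact model):

```python
from tensorflow import keras
from tensorflow.keras import layers

num_classes = 10  # assumption: 10 target classes

model = keras.Sequential([
    keras.Input(shape=(64, 64, 3)),                            # input picture
    layers.Conv2D(16, 3, padding="same", activation="relu"),   # convolution -> feature maps
    layers.MaxPooling2D(pool_size=2),                          # max-pooling
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(num_classes, activation="softmax"),           # fully-connected output
])
model.summary()
```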
 
Max-pooling
TODO
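Still TODO in the notes; as a stopgap, a tiny NumPy sketch of what 2x2 max-pooling does (take the maximum of each 2x2 patch, which halves height and width):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max-pooling with stride 2 on a (H, W) array (H and W assumed even)."""
    h, w = x.shape
    patches = x.reshape(h // 2, 2, w // 2, 2)
    return patches.max(axis=(1, 3))

x = np.array([[1, 2, 0, 1],
              [3, 4, 1, 0],
              [0, 1, 5, 6],
              [2, 0, 7, 8]])
print(max_pool_2x2(x))
# [[4 1]
#  [2 8]]
```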
Receptive field
- How much of the input does a neuron “see”
- -> and therefore learn from
 
 - A neuron in layer 3 sees many more neurons from the previous layers - the receptive field widens like a pyramid (rough calculation below)
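A rough sketch of the standard receptive-field recursion (the layer setup is my own example): each layer adds (kernel - 1) times the product of all previous strides:

```python
def receptive_field(layer_specs):
    """layer_specs: list of (kernel_size, stride) tuples, from input to output."""
    rf, jump = 1, 1  # receptive field size; distance between neighbouring outputs
    for kernel, stride in layer_specs:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

# Three 3x3 convs with stride 1: a layer-3 neuron sees a 7x7 input patch.
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # 7
# With stride-2 layers the field grows much faster.
print(receptive_field([(3, 2), (3, 2), (3, 2)]))  # 15
```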
 
Deep CNNs
- The receptive field becomes bigger and bigger with the depth of the layers (S. 135)
 - The deeper you go, the more filters you get, and therefore the more complex the features are.
 - At the end you have a “picture” of size 1x1xXXXXX features, until you get the output probabilities for the target classes.
 - Example
- digitalization of floor plans: everything worked except the doors - the receptive field was just not big enough.
 - the receptive field should be big enough for whatever you want to classify at the end
 
 - Jupyter notebook example - a lot of lines and layers but few params - contrasts with the count we’d have had if we used dense layers for this (rough comparison below)
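A back-of-the-envelope comparison (toy numbers of my own) of why the conv model needs so few parameters compared to a dense layer on the same input:

```python
# Toy input: a 64x64 RGB image.
h, w, c = 64, 64, 3

# One Conv2D layer: 32 filters of size 3x3 over 3 input channels.
conv_params = 3 * 3 * c * 32 + 32
print(conv_params)   # 896 -- independent of the image size

# One Dense layer with 32 units on the flattened image.
dense_params = h * w * c * 32 + 32
print(dense_params)  # 393248 -- grows with every extra pixel
```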
 - ImageNet
- Nice visualization S. 139
 - TL;DR Transformers etc. work well, both for Vision and for NLP, if they have enough compute+data
- For English we do, for the other languages we don’t
 - -> CNNs still matter for these cases
 
 
 - History of the best vision models
- AlexNet (2012)
- many more params than the previous 1998 model (LeNet)
 - Stride < filter size, so the sliding windows overlap.
 - ReLU instead of tanh
- 6 times faster to train
 - actually converges faster by itself too!
 
 - Dropout for regularization
- Now standard, very interesting then
 
 - Input data:
- Scaled to 256x256
 - Augmentations: flips, crops, colors
 
 - For testing: 5 crops plus their flips, averaging the predictions over these 10 pictures (sketch below)
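A sketch of that test-time averaging; `model` and `crops` are placeholders of my own, not the original AlexNet code:

```python
import numpy as np

def predict_with_tta(model, crops):
    """Average the predictions over 5 crops and their horizontal flips."""
    views = list(crops) + [np.flip(c, axis=1) for c in crops]  # 10 views total
    preds = model.predict(np.stack(views))                     # (10, num_classes)
    return preds.mean(axis=0)                                  # averaged class probabilities
```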
 
- InceptionNet/GoogLeNet - 2014
- 10 times fewer params but much deeper
 - Inception blocks:
- In each block, multiple convolutions with different kernel sizes (but “same” padding) are applied in parallel
 - plus a max-pooling branch
 - the outputs are concatenated at the end along the channel axis (sketch below)
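A simplified Inception-style block in the Keras functional API (filter counts are my own and much smaller than in the real GoogLeNet): parallel convolutions with different kernel sizes plus a pooling branch, all with “same” padding so the outputs can be concatenated along the channels:

```python
from tensorflow import keras
from tensorflow.keras import layers

def inception_block(x):
    # Parallel branches with different kernel sizes, all with "same" padding.
    b1 = layers.Conv2D(16, 1, padding="same", activation="relu")(x)
    b2 = layers.Conv2D(16, 3, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(16, 5, padding="same", activation="relu")(x)
    b4 = layers.MaxPooling2D(pool_size=3, strides=1, padding="same")(x)
    # Same spatial size everywhere, so we can concatenate along the channel axis.
    return layers.Concatenate()([b1, b2, b3, b4])

inputs = keras.Input(shape=(32, 32, 64))
outputs = inception_block(inputs)
print(keras.Model(inputs, outputs).output_shape)  # (None, 32, 32, 112)
```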
 
 - 1x1 convolutions learn cross-channel features
- basically they decrease the number of params to make it faster? TODO (bottleneck example below)
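The usual bottleneck argument, with example numbers of my own: a 1x1 convolution first reduces the channel count, which makes the following 5x5 convolution much cheaper:

```python
# 5x5 conv directly on 256 input channels, producing 64 channels:
direct = 5 * 5 * 256 * 64 + 64
print(direct)      # 409664

# 1x1 conv down to 32 channels first, then the 5x5 conv:
bottleneck = (1 * 1 * 256 * 32 + 32) + (5 * 5 * 32 * 64 + 64)
print(bottleneck)  # 59488 -- roughly 7x fewer parameters
```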
 
 - Auxiliary loss
- to avoid vanishing gradients
 - they contribute to the final loss
 - N auxiliary classifiers, each with its own loss attached closer to the input, which helps the gradients reach the early layers (sketch below)
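A sketch of the idea in Keras (toy architecture of my own, not GoogLeNet itself): an extra classifier branch on an intermediate layer gets its own, down-weighted loss:

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(64, 64, 3))
x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D(2)(x)

# Auxiliary classifier branch attached to an intermediate feature map.
aux = layers.GlobalAveragePooling2D()(x)
aux_out = layers.Dense(10, activation="softmax", name="aux")(aux)

x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)
main_out = layers.Dense(10, activation="softmax", name="main")(x)

model = keras.Model(inputs, [main_out, aux_out])
model.compile(optimizer="adam",
              loss={"main": "categorical_crossentropy",
                    "aux": "categorical_crossentropy"},
              loss_weights={"main": 1.0, "aux": 0.3})  # aux loss only contributes partially
```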
 
 - TL;DR just much more efficient -> fewer params
 
 - ResNet
- Residual connections / skip connections / lateral connections / identity shortcut connections
- They serve to minimize the error
- They make sure that information is not lost
 - Either the block improves on the input or it stays the same - we make sure we always have that option
 - F(x)+x
 
 - Prevents the vanishing gradient - useful in a lot of scenarios
 - Similar to highway networks - though they are the generic solution
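A minimal residual block sketch in Keras (filter sizes are my own): the block outputs F(x) + x, so the identity is always available if the convolutions learn nothing useful:

```python
from tensorflow import keras
from tensorflow.keras import layers

def residual_block(x, filters):
    # F(x): two convolutions on the input
    f = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    f = layers.Conv2D(filters, 3, padding="same")(f)
    # F(x) + x: the skip / identity shortcut connection
    out = layers.Add()([f, x])
    return layers.Activation("relu")(out)

inputs = keras.Input(shape=(32, 32, 64))
outputs = residual_block(inputs, 64)  # filters must match the input channels for the addition
model = keras.Model(inputs, outputs)
```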
 
 - Architecture - VGG19 etc. - a LOT of layers
- pictured either as 4 layers or 34 layers; the depictions are equivalent
 - ResNet18,34,101,152 etc.
 
 - Dimensionality reduction by increasing the stride (and not by pooling)
 
 
 
				