On this page I try to distinguish the various approaches now used for building neural networks (NN). For example, using convolutions or an LSTM is an architecture approach. Using batch normalization or not, however, is an optimization, because it can be added to any network without changing the architecture.
- Convs with various kernels
- 1x1 convolutions from Network-in-Network (NiN). With them we can reduce the number of features > perform a usual convolution > increase the number of features again
- Average pooling layer as part of the last classifier
- Inception module (parallel computation of various filters, with 1x1 convs, followed by concatenation of their outputs)
- Flattened convolutions(Cx1, 1xC kernels)
- Bypassing features over two layers(as in ResNet)
- Concatenating features from current layer with features from previous ones(as in DenseNet)
- Inception V4 - combines the ResNet feature-propagation approach with the Inception module.
- Combine Inception Block with DenseNet approach.
- Blog post about various Neural Networks for image classification.
- Xception block with separable convolutions (with or without ReLU after them).
- Depthwise separable convolution filters initial paper
- LSTM or GRU cell
- attention mechanisms
- Various activation functions
- Max pooling or average pooling
- Use conv with stride without overlapping, instead of average/max pooling
- 1x1 convs and then 3x3 convs separable by channel
- 3x3 convs separable by channel and then 1x1 convs over all features
- Batch norm
- Regularization loss
- Various learning rates
- Dataset augmentation
- Train the network at one image size (224x224) and then fine-tune it for fewer epochs at a larger size (448x448, for example)
- Train an image detection network with an image classification dataset
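To make the 1x1 bottleneck and channel-separable items in the list above more concrete, here is a rough sketch that only counts weights (biases ignored). The channel count of 256 and the bottleneck width of 64 are assumptions picked for illustration, and `conv_params` is a hypothetical helper, not something from any of the referenced papers.

```python
def conv_params(c_in, c_out, kh, kw, groups=1):
    """Number of weights in a conv layer (no bias).

    groups=1 is a usual convolution; groups=c_in with c_out=c_in
    gives a convolution that is separable by channel (depthwise).
    """
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * c_out * kh * kw


c = 256        # assumed number of input/output features
reduced = 64   # assumed bottleneck width

# Usual 3x3 convolution over all features.
plain = conv_params(c, c, 3, 3)

# 1x1 reduce > usual 3x3 convolution > 1x1 increase.
bottleneck = (
    conv_params(c, reduced, 1, 1)
    + conv_params(reduced, reduced, 3, 3)
    + conv_params(reduced, c, 1, 1)
)

# 3x3 conv separable by channel, then 1x1 conv over all features.
separable = conv_params(c, c, 3, 3, groups=c) + conv_params(c, c, 1, 1)

print(plain, bottleneck, separable)  # 589824 69632 67840
```

With these assumed sizes both variants need roughly an eighth to a ninth of the weights of the usual 3x3 convolution. How that affects accuracy depends on the network and is not shown here.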
A systematic evaluation of CNN modules:
- Link to initial paper
- use ELU non-linearity without batchnorm or ReLU with it.
- apply a learned colorspace transformation of RGB.
- use the linear learning rate decay policy.
- use a sum of the average and max pooling layers.
- use mini-batch size around 128 or 256. If this is too big for your GPU, decrease the learning rate proportionally to the batch size.
- use fully-connected layers as convolutional and average the predictions for the final decision.
- when investing in increasing training set size, check if a plateau has not been reached.
- cleanliness of the data is more important than the size.
- if you cannot increase the input image size, reduce the stride in the consequent layers; it has roughly the same effect.
- if your network has a complex and highly optimized architecture, like e.g. GoogLeNet, be careful with modifications.
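Three of the recommendations above (scaling the learning rate with the batch size, linear learning rate decay, and the sum of average and max pooling) are easy to write down directly. The following is a minimal sketch in plain Python. The function names, the base values (learning rate 0.1 at batch size 256) and the non-overlapping 2x2 window are assumptions for illustration, not the paper's implementation.

```python
def scaled_lr(base_lr, base_batch, batch):
    """Decrease the learning rate proportionally to the batch size."""
    return base_lr * batch / base_batch


def linear_decay(lr0, step, total_steps):
    """Linear learning rate decay from lr0 at step 0 to 0 at total_steps."""
    return lr0 * max(0.0, 1.0 - step / total_steps)


def avg_plus_max_pool(x, k=2):
    """Sum of average pooling and max pooling over non-overlapping
    k x k windows of a 2D feature map given as a list of lists."""
    h, w = len(x), len(x[0])
    out = []
    for i in range(0, h - k + 1, k):
        row = []
        for j in range(0, w - k + 1, k):
            window = [x[i + di][j + dj] for di in range(k) for dj in range(k)]
            row.append(sum(window) / len(window) + max(window))
        out.append(row)
    return out


# Batch of 64 instead of the assumed reference 256: a quarter of the rate.
lr = scaled_lr(0.1, 256, 64)

# Halfway through training the rate has dropped to half of its start value.
lr_mid = linear_decay(lr, 50, 100)

# One 2x2 window: average 2.5 plus max 4.
pooled = avg_plus_max_pool([[1, 2], [3, 4]])
```

In a real network the pooling would run per channel on tensors; the list-based version only shows the arithmetic.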