ML Knowledge Base
Activation Functions
# TODO: explain how they differ and which types work better for which tasks
Activation functions add nonlinearity to the model, which lets it produce a nonlinear decision boundary. Without them, any combination of weights and biases across layers would reduce to a single linear model. You can read more about other types of activations here.
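To see why a model without activations stays linear, here is a minimal sketch using scalar "layers". The function names and the weight values are illustrative assumptions, not part of any library.

```python
def linear(w, b, x):
    """A single linear 'layer' on a scalar input: w * x + b."""
    return w * x + b


def two_linear_layers(x, w1=2.0, b1=1.0, w2=-3.0, b2=0.5):
    """Two stacked linear layers with no activation in between."""
    return linear(w2, b2, linear(w1, b1, x))


def collapsed_layer(x, w1=2.0, b1=1.0, w2=-3.0, b2=0.5):
    """One linear layer with weight w2*w1 and bias w2*b1 + b2.

    Algebraically: w2 * (w1 * x + b1) + b2 == (w2 * w1) * x + (w2 * b1 + b2).
    """
    return linear(w2 * w1, w2 * b1 + b2, x)
```

For any input, `two_linear_layers(x)` and `collapsed_layer(x)` agree, so the extra layer adds no expressive power until a nonlinear activation is placed between the two.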
 Sigmoid function (or logistic function)

\(\sigma(z) = \frac{1}{1 + e^{-z}}\)
Properties: \(\sigma(\infty)\approx 1\), \(\sigma(-\infty)\approx 0\), and note that \(\sigma(0)=0.5\).
Note: the sigmoid function (\(\sigma\)) is the same as the logistic function, so sigmoid neurons are also called logistic neurons.
Use it to generate probabilities for discrete classification tasks in which the classes are independent and not mutually exclusive. For instance, a picture can contain both an elephant and a dog at the same time.
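A minimal sketch of the logistic function \(1/(1+e^{-z})\) applied to independent labels. The label names and raw scores below are made-up values for illustration; the two-branch form is one common way to avoid overflow for large \(|z|\).

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^{-z}), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^{z} / (1 + e^{z}).
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical raw scores (logits), one per independent label.
logits = {"elephant": 2.0, "dog": 1.0, "cat": -3.0}

# Each label gets its own probability; they are not forced to sum to 1.
probs = {label: sigmoid(z) for label, z in logits.items()}
```

Here both "elephant" and "dog" end up above 0.5, which matches the multi-label case where one picture can contain several classes.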
 Softmax function

\(a^L_j = \frac{e^{z^L_j}}{\sum_k e^{z^L_k}}\)
The output activations from softmax are guaranteed to sum to 1.
Use it to generate probabilities for discrete classification tasks in which the classes are mutually exclusive (each entry belongs to exactly one class). For instance, a picture can contain an elephant or a dog, but not both.
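A minimal sketch of softmax over a list of scores. Subtracting the maximum score before exponentiating is an added assumption for numerical stability; it does not change the result because the common factor cancels in the ratio. The input scores are made-up values.

```python
import math


def softmax(z):
    """Softmax over a list of scores, shifted by max(z) for stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three mutually exclusive classes.
probs = softmax([2.0, 1.0, -3.0])
```

The resulting probabilities sum to 1 and preserve the ordering of the scores, so the class with the largest score gets the largest probability.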
Cost Functions
Note that a cost function should be non-negative.
A cost function is also called a loss function or an objective function.
 Quadratic cost function

\(C = \frac{1}{2n}\sum_{x}(y - a)^2\)
 \(y\) - the target output
 \(a\) - the network output
 \(n\) - the total number of training inputs
Also called mean squared error, or just MSE. The factor \(\frac{1}{2}\) is a convention that simplifies the derivative.
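A minimal sketch of the quadratic cost for scalar outputs, assuming the targets and network outputs are given as two equal-length lists (one entry per training input). The function name is illustrative.

```python
def quadratic_cost(targets, outputs):
    """Quadratic cost: sum of squared errors divided by 2n."""
    n = len(targets)
    squared_errors = ((y - a) ** 2 for y, a in zip(targets, outputs))
    return sum(squared_errors) / (2 * n)
```

A perfect prediction gives a cost of 0, and the cost grows with the squared distance between target and output, so it is never negative.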
 Cross-entropy cost function

\(C = -\frac{1}{n} \sum_x \left[y_t \ln y_o + (1 - y_t) \ln (1 - y_o) \right]\)
 \(n\) - the total number of items of training data
 \(x\) - a training input; the sum runs over all training inputs
 \(y_t\) - the corresponding target output
 \(y_o\) - the network output (predicted value). Can be replaced with \(a\).
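A minimal sketch of the binary cross-entropy cost, \(-\frac{1}{n}\sum_x [y_t \ln y_o + (1 - y_t)\ln(1 - y_o)]\), assuming targets in \(\{0, 1\}\) and outputs in \([0, 1]\) given as equal-length lists. The `eps` clipping is an added assumption to avoid taking \(\ln 0\); it is not part of the formula itself.

```python
import math


def cross_entropy_cost(targets, outputs, eps=1e-12):
    """Binary cross-entropy averaged over all training inputs."""
    n = len(targets)
    total = 0.0
    for y_t, y_o in zip(targets, outputs):
        # Clip the prediction away from 0 and 1 so the logs stay finite.
        y_o = min(max(y_o, eps), 1.0 - eps)
        total += y_t * math.log(y_o) + (1.0 - y_t) * math.log(1.0 - y_o)
    # The leading minus sign makes the cost non-negative.
    return -total / n
```

Confident correct predictions give a cost close to 0, while confident wrong predictions are penalized heavily.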
Regularization approaches
 L1 regularization
 to be filled
 L2 regularization
 to be filled
 Dropout
 to be filled
Evaluation metrics
to be filled