# ML Knowledge Base

## Activation Functions

# TODO: explain how activation functions differ and which types are better for which tasks

Activation functions add nonlinearity to the model, which lets it produce a non-linear decision boundary. Without them, any combination of weights and layers would collapse into a linear model. You can read more about other types of activations here.

Sigmoid function (or logistic function)

$\sigma(z) = \frac{1}{1 + e^{-z}}$

Properties: $\sigma(z) \to 1$ as $z \to \infty$, $\sigma(z) \to 0$ as $z \to -\infty$, and note that $\sigma(0) = 0.5$.

Note: the sigmoid function ($\sigma$) is the same as the logistic function, so sigmoid neurons are also called logistic neurons.

Sigmoid generates probabilities for discrete classification tasks in which the classes are independent and not mutually exclusive. For instance, a picture can contain both an elephant and a dog at the same time.
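As a minimal sketch of the multi-label case described above, the snippet below applies the sigmoid to each class score independently. The function name `sigmoid`, the example scores, and the branch on the sign of `z` (used here only to avoid overflow for large negative inputs) are illustrative assumptions, not part of the original notes.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^{-z}), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z).
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical raw scores for two labels that may both be present.
scores = {"elephant": 2.0, "dog": 0.5}

# Each label gets its own probability; they are not forced to sum to 1.
probs = {label: sigmoid(z) for label, z in scores.items()}
```

Because every label is scored on its own, both probabilities can be above 0.5 at once, which matches the "elephant and dog in the same picture" example.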

Softmax function

$a^L_j = \frac{e^{z^L_j}}{\sum_k e^{z^L_k}}$

The output activations from softmax are guaranteed to always sum up to 1.

Softmax generates probabilities for discrete classification tasks in which the classes are mutually exclusive (each entry is in exactly one class). For instance, a picture can contain an elephant or a dog, but not both.
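The softmax formula can be sketched in a few lines of plain Python. The function name `softmax` and the example inputs are assumptions made for illustration; subtracting the maximum score before exponentiating is a common stability trick that does not change the result, since the shift cancels in the ratio.

```python
import math

def softmax(zs):
    """Return softmax activations for a list of weighted inputs z_j."""
    m = max(zs)  # shift by the max to keep exp() from overflowing
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical output-layer inputs for three mutually exclusive classes.
activations = softmax([2.0, 0.5, -1.0])
```

The activations keep the ordering of the inputs and sum to 1 (up to floating-point rounding), so they can be read as a probability distribution over the classes.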

## Cost Functions

Note that a cost function should be non-negative.

A cost function is also called a loss function or objective function.

$C = \frac{1}{2n}\sum_{x}(y - a)^2$

• $y$ - the target output
• $a$ - the network output
• $n$ - the total number of training inputs

This cost is called the mean squared error, or just MSE (here with an extra factor of $\frac{1}{2}$).
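A minimal sketch of this cost for scalar outputs follows, with the sum running over the training inputs. The function name `quadratic_cost` and the sample values are hypothetical and chosen only for illustration.

```python
def quadratic_cost(targets, outputs):
    """Quadratic cost: 1/(2n) times the sum of (y - a)^2 over all inputs."""
    n = len(targets)
    return sum((y - a) ** 2 for y, a in zip(targets, outputs)) / (2 * n)

# Two training inputs: the first is predicted exactly, the second is off by 1.
cost = quadratic_cost([1.0, 0.0], [1.0, 1.0])
```

The cost is zero only when every output equals its target, and it is never negative because each term is a square.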

Cross-entropy cost function

$C = -\frac{1}{n} \sum_x \left[y_t \ln y_o + (1-y_t ) \ln (1-y_o) \right]$

• $n$ - the total number of items of training data
• $x$ - the sum is over all training inputs
• $y_t$ - corresponding target output
• $y_o$ - the network output (predicted value). Can be replaced with $a$.
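The cross-entropy formula above can be sketched as follows for a single output per training input. The function name `cross_entropy_cost` and the `eps` clipping parameter are assumptions added here; clipping is not part of the formula itself and only guards against taking $\ln 0$ when an output is exactly 0 or 1.

```python
import math

def cross_entropy_cost(targets, outputs, eps=1e-12):
    """Binary cross-entropy averaged over n training inputs."""
    n = len(targets)
    total = 0.0
    for y_t, y_o in zip(targets, outputs):
        # Assumption: clip the output away from 0 and 1 so log() is defined.
        y_o = min(max(y_o, eps), 1.0 - eps)
        total += y_t * math.log(y_o) + (1.0 - y_t) * math.log(1.0 - y_o)
    return -total / n

# Confident correct predictions cost less than confident wrong ones.
good = cross_entropy_cost([1.0, 0.0], [0.9, 0.1])
bad = cross_entropy_cost([1.0, 0.0], [0.1, 0.9])
```

Both logarithms are of values in $(0, 1)$ and therefore negative, so the leading minus sign keeps the cost non-negative.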

## Regularization approaches

L1 regularization
to be filled
L2 regularization
to be filled
Dropout
to be filled

to be filled