Neural Networks
These are exam preparation notes; they are subpar and certainly not of divine quality.
See the index of all articles in this series.
Connectionist Neurons
A connectionist neuron generally has a number of inputs $x_1, \dots, x_d$, each multiplied by a weight $w_i$.
A typical function would look like this: $y = f\left(\sum_i w_i x_i\right)$. The part in the brackets is later referred to as the activation $a$.
Some of the functions used for the transfer function $f$:
Logistic function: $f(a) = \frac{1}{1 + e^{-a}}$
Hyperbolic tangent: $f(a) = \tanh(a)$
Linear neuron: $f(a) = a$
Binary neuron: $f(a) = \operatorname{sign}(a)$
TODO: Transformation
The first weight or input is a bias node which is always 1. It is not always written explicitly in the equations, as it can simply be absorbed into the weighted sum.
Reasons for nonlinear transfer functions:
- with linear transfer functions, multiple layers could be collapsed into a single equivalent layer (main reason)
- sign function for classification problems (0/1 decisions)
- logistic sigmoid for probabilities (0..1)
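A minimal sketch of such a neuron in Python/NumPy (the function names and example values are mine, not from the lecture):

```python
import numpy as np

# A connectionist neuron: weighted sum of the inputs (the activation a),
# followed by a transfer function f.

def activation(w, x):
    x = np.concatenate(([1.0], x))   # prepend the constant bias input x_0 = 1
    return np.dot(w, x)              # a = sum_i w_i * x_i

def logistic(a):
    return 1.0 / (1.0 + np.exp(-a))  # squashes a into (0, 1)

def neuron(w, x, f=np.tanh):
    return f(activation(w, x))       # y = f(a)

w = np.array([0.1, 0.5, -0.3])       # w[0] is the bias weight
x = np.array([2.0, 1.0])
print(neuron(w, x, logistic), neuron(w, x, np.sign), neuron(w, x, lambda a: a))
```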
Important variables: the inputs $x$, the weights $w$, the activation $a$, and the output $y$.
Types of Neural Networks
- Recurrent Neural Networks: there can be loops in the graph
- Feedforward Neural Networks: no loops (a DAG)
- Radial Basis Function Networks
Typical use case: prediction of attributes. MLPs are universal approximators and always map from an input space $\mathbb{R}^d$ to an output space $\mathbb{R}^m$.
NN in Regression
The Hessian matrix is the second derivative of the error function with respect to the weights.
The Jacobian matrix is the corresponding matrix of first derivatives.
The Hessian is often too computationally expensive to compute, so gradient descent with backpropagation is typically used instead of Newton's method.
Generalization Error
The generalization error is the expected error on new, unseen data from the underlying distribution.
ERM
Empirical risk minimization: since the true distribution is unknown, the average error over the training set is minimized instead.
Test Error
The error measured on held-out data; it serves as an estimate of the generalization error.
The empirical (training) error is reduced using gradient descent, $w \leftarrow w - \eta \nabla_w E$, where $\eta$ is the learning rate.
The error is usually the quadratic error, $E = \frac{1}{2} \sum_n (y_n - t_n)^2$.
Its derivative, $\frac{\partial E}{\partial y_n} = y_n - t_n$, is trivial and is later used in backpropagation.
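As a small sketch (Python/NumPy, names are illustrative), the quadratic error and the derivative that backpropagation starts from:

```python
import numpy as np

# Quadratic error E = 1/2 * sum_n (y_n - t_n)^2 and its derivative dE/dy_n = y_n - t_n,
# the quantity backpropagation starts from at the output layer.

def quadratic_error(y, t):
    return 0.5 * np.sum((y - t) ** 2)

def quadratic_error_grad(y, t):
    return y - t

y = np.array([0.8, 0.2, 0.5])   # predictions
t = np.array([1.0, 0.0, 0.5])   # targets
print(quadratic_error(y, t), quadratic_error_grad(y, t))
```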
Backpropagation
In backpropagation the weights of the neural network are adjusted so that the training error is reduced. This is achieved by:
- Calculating the prediction (forward pass)
- Calculating the error of the prediction against the target
- Going back layer by layer and calculating the delta for each layer
It would be possible to compute the gradient of every weight separately by applying the chain rule from scratch, but that is a lot more computationally expensive than backpropagation, which reuses the deltas of the later layers.
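A minimal backpropagation sketch for a one-hidden-layer network with tanh hidden units, a linear output, and quadratic error; the architecture, sizes, and names are my own choice and only illustrate the delta scheme above:

```python
import numpy as np

# Forward pass, then propagate deltas back layer by layer and update the weights.
rng = np.random.default_rng(0)
W1, W2 = rng.standard_normal((4, 3)), rng.standard_normal((1, 4))   # weights
x, t, lr = rng.standard_normal(3), np.array([0.5]), 0.1

for step in range(100):
    # forward pass: prediction
    h = np.tanh(W1 @ x)                       # hidden layer
    y = W2 @ h                                # linear output
    # error delta at the output (derivative of 1/2 * (y - t)^2)
    delta2 = y - t
    # go back one layer: the hidden delta reuses delta2
    delta1 = (W2.T @ delta2) * (1 - h ** 2)   # tanh'(a) = 1 - tanh(a)^2
    # gradient descent update
    W2 -= lr * np.outer(delta2, h)
    W1 -= lr * np.outer(delta1, x)

print((y - t).item())   # the remaining error should be close to 0
```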
Regularization in Deep Learning
- Dropout: during training, randomly ignore (drop) neurons so the network cannot rely on individual units
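A sketch of the idea (Python/NumPy, names are mine); the rescaling by 1/p is the common "inverted dropout" variant, so the expected activation stays the same:

```python
import numpy as np

# Dropout: randomly zero out neurons during training.
rng = np.random.default_rng(0)

def dropout(h, p_keep=0.8, training=True):
    if not training:
        return h                          # at test time all neurons are used
    mask = rng.random(h.shape) < p_keep   # each neuron survives with probability p_keep
    return h * mask / p_keep              # rescale so the expected value is unchanged

h = np.ones(10)
print(dropout(h))                  # roughly 80% of the entries survive, scaled to 1.25
print(dropout(h, training=False))
```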
Architectures
Convolutional layer
A layer whose neurons are connected only to selected neurons of the previous layer. In image recognition, for example, a neuron is connected only to a small patch of adjacent pixels (a tensor), and the same weights are shared across positions.
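A sketch of the local connectivity in one dimension (Python/NumPy, the kernel and values are illustrative): each output is computed from a small window of adjacent inputs with the same shared weights.

```python
import numpy as np

# A convolutional layer in one dimension: every output neuron is connected only
# to a small window of adjacent inputs, and the same weight kernel is reused
# at every position.

def conv1d(x, kernel):
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

pixels = np.array([0., 0., 1., 1., 1., 0., 0.])
edge_kernel = np.array([-1., 0., 1.])     # responds to intensity changes
print(conv1d(pixels, edge_kernel))        # large magnitude at the edges of the bright region
```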
Spatial/Feature pooling
The goal is to detect features in an image even though the image may be rotated, translated, etc. For example, there can be three different detection units for a specific pattern (one per orientation), whose outputs are aggregated by a neuron with a max() function, so the feature is recognized regardless of its orientation.
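A tiny sketch of the idea (Python/NumPy, the numbers are made up):

```python
import numpy as np

# Feature pooling: several detection units respond to differently oriented
# versions of the same pattern; a max() neuron reports whether any of them
# fired, making the detection invariant to the orientation.
detector_outputs = np.array([0.1, 0.9, 0.2])   # e.g. 0 deg, 45 deg, 90 deg detectors
print(np.max(detector_outputs))                # 0.9 -> the feature was found

# Spatial max pooling over a 2x2 neighbourhood gives (some) translation invariance:
feature_map = np.array([[0.1, 0.8],
                        [0.2, 0.3]])
print(feature_map.max())                       # pooled response of the 2x2 block
```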
Auto-Encoders
Unfortunately excluded from the exam, therefore neglected here
Basically you take an image of what you want to recognize and push it through your network. What you get is a "compressed" version of the image (there is a lot less information in the final layers). At the beginning of training this will be just noise / randomness. A second, decoder network (typically a mirrored architecture, sometimes with tied weights) then reconstructs the original image.
The reconstruction can then be compared to the original image to generate error values.
This way the two networks can be trained to meaningfully abstract from images without needing labelled images.
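Even though it is not exam relevant, a tiny untrained sketch of the idea (Python/NumPy, linear layers, names are mine):

```python
import numpy as np

# Autoencoder idea: compress the input with an encoder, reconstruct it with a
# decoder, and use the reconstruction error as the training signal, so no
# labels are needed. Untrained weights just produce noise.
rng = np.random.default_rng(0)

W_enc = rng.standard_normal((2, 8))    # 8 inputs -> 2 "compressed" values
W_dec = rng.standard_normal((8, 2))    # decoder: often a mirrored architecture

x = rng.standard_normal(8)             # e.g. a flattened image patch
code = np.tanh(W_enc @ x)              # compressed representation
x_hat = W_dec @ code                   # reconstruction
reconstruction_error = 0.5 * np.sum((x_hat - x) ** 2)
print(reconstruction_error)            # this error is what both networks are trained on
```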
Time Series
In a time series it is often assumed that $y_t$ depends only on a short window of past values. One way to exploit this are convolutions over time, where some neurons look "back" in time over the last few inputs.
Recurrent NN
The neural network is "shifted" (unrolled) through time.
All previous inputs are summarized in a hidden state vector $h_t$, which is updated with a recurrent weight matrix $W$ and an input weight matrix $U$: $h_t = f(W h_{t-1} + U x_t)$. The output is read out from $h_t$ with a third weight matrix, here called $V$: $y_t = V h_t$.
With $n$ timesteps, the cost function is the sum of the per-timestep errors: $E = \sum_{t=1}^{n} E_t(y_t, \hat{y}_t)$.
The weights $W$, $U$ and $V$ are shared across all timesteps.
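A sketch of the recurrence (Python/NumPy); $W$ and $U$ follow the notation of these notes, while the output matrix $V$ and all sizes are my own additions:

```python
import numpy as np

# Simple recurrent network: the hidden state h_t summarizes all previous inputs.
# h_t = tanh(W h_{t-1} + U x_t),   y_t = V h_t
rng = np.random.default_rng(0)
hidden, inputs = 4, 3
W = rng.standard_normal((hidden, hidden))   # recurrent weights (shared over time)
U = rng.standard_normal((hidden, inputs))   # input weights
V = rng.standard_normal((1, hidden))        # output weights

h = np.zeros(hidden)
for x_t in rng.standard_normal((5, inputs)):   # 5 timesteps of input
    h = np.tanh(W @ h + U @ x_t)
    y_t = V @ h
    print(y_t)
```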
Backpropagation through time
Works just like regular backpropagation:
- Unroll the network and assume that all per-timestep copies of the weights are independent.
- Compute the gradients with ordinary backpropagation.
- Average the per-timestep gradients for the shared weight update (see the sketch below).
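A sketch of this on a scalar linear RNN $h_t = w\,h_{t-1} + x_t$ with a quadratic error on the last state (Python, names are mine). Each timestep's copy of $w$ gets its own gradient, and the shared-weight gradient is their sum; averaging instead, as in the notes, only rescales the learning rate.

```python
import numpy as np

def forward(w, xs):
    hs = [0.0]                       # h_0 = 0
    for x in xs:
        hs.append(w * hs[-1] + x)    # h_t = w * h_{t-1} + x_t
    return hs

def bptt_grad(w, xs, target):
    hs = forward(w, xs)
    T = len(xs)
    dL_dhT = hs[-1] - target         # derivative of 0.5 * (h_T - target)^2
    grads_per_copy = []
    for t in range(1, T + 1):        # treat the weight used at step t as its own copy
        dhT_dht = w ** (T - t)       # chain of downstream recurrent factors
        grads_per_copy.append(dL_dhT * dhT_dht * hs[t - 1])
    return sum(grads_per_copy)       # shared-weight gradient = sum over the copies

# Check against a numerical gradient.
w, xs, target = 0.9, [1.0, 0.5, -0.3], 2.0
eps = 1e-6
num = ((forward(w + eps, xs)[-1] - target) ** 2 / 2
       - (forward(w - eps, xs)[-1] - target) ** 2 / 2) / (2 * eps)
print(bptt_grad(w, xs, target), num)   # the two values should agree closely
```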
Exploding / Vanishing gradient
One problem of RNNs is that activity (and the gradients) often either vanishes or explodes over time, because the same recurrent weights are applied at every timestep: if the largest eigenvalue of $W$ is below 1 the state shrinks towards zero, if it is above 1 it blows up.
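A small demonstration (Python/NumPy, the matrices are random and illustrative): iterating a linear recurrence with spectral radius below or above 1 makes the state norm shrink or grow.

```python
import numpy as np

rng = np.random.default_rng(0)

def run(spectral_radius, steps=50):
    W = rng.standard_normal((20, 20))
    W *= spectral_radius / np.abs(np.linalg.eigvals(W)).max()   # rescale spectral radius
    h = rng.standard_normal(20)
    for _ in range(steps):
        h = W @ h              # linear recurrence, no input, no nonlinearity
    return np.linalg.norm(h)

print(run(0.9))   # typically shrinks towards 0 (vanishing)
print(run(1.1))   # typically blows up         (exploding)
```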
Echo State Networks
Echo state networks set $W$ and $U$ randomly such that the eigenvalues satisfy $|\lambda_i| \approx r$, and only the output weights are trained. (TODO: why is $r$ chosen in the range 1.3 to 3?)
Leaky Units:
There are units that specialize in long- or short-term memory. This depends on a factor $\alpha$ in the state update $h_t = (1 - \alpha)\,h_{t-1} + \alpha\,\tilde{h}_t$: a small $\alpha$ keeps the old state around for a long time (long-term memory), a large $\alpha$ follows the current input (short-term memory).
LSTM
- Delay update of hidden layer
- Special transfer function (only retrieve state in certain cases)
Radial Basis Function Networks
Also see Wikipedia.
A radial basis function is a function that depends only on the distance from a center (usually the Euclidean distance).
A Gaussian function is often used: $\phi_j(x) = \exp\left(-\frac{\lVert x - c_j \rVert^2}{2 \sigma_j^2}\right)$
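As a sketch (Python/NumPy, names are mine):

```python
import numpy as np

# Gaussian radial basis function: the response depends only on the (Euclidean)
# distance of x from the centroid c and decays with the width sigma.

def gaussian_rbf(x, c, sigma):
    return np.exp(-np.sum((x - c) ** 2) / (2 * sigma ** 2))

c = np.array([0.0, 0.0])
print(gaussian_rbf(np.array([0.0, 0.0]), c, sigma=1.0))   # 1.0 at the centroid
print(gaussian_rbf(np.array([2.0, 0.0]), c, sigma=1.0))   # ~0.14 two widths away
```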
Learning with RBFs
Three different parameters:
- centroid (center of the basis function)
- range of influence
- weights of the output layer
The 2-step learning procedure is an alternative to learning all parameters jointly by gradient descent.
- Find centroids and variances
- Determine output weights
Find centroids and variances
Use k-means clustering to find the centroids.
Choose the variances $\sigma_j^2$ from the spread of the data, e.g. from the variance of the points assigned to each cluster or the distance to the neighbouring centroids.
Determine output weights
The output weights are found by minimizing the quadratic error, with M := number of RBFs:
$E = \frac{1}{2} \sum_n \left( t_n - \sum_{j=1}^{M} w_j \phi_j(x_n) \right)^2$
Pseudo-inverse
Because the error is linear in the output weights, this is a linear least-squares problem and can be solved in closed form with the pseudo-inverse, $w = \Phi^{+} t$; gradient descent would also work, but is not required.
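A sketch of the whole 2-step procedure on toy data (Python/NumPy; the data, the number of basis functions, and the width heuristic are my own choices):

```python
import numpy as np

# Step 1: place the centroids with k-means and pick the widths.
# Step 2: solve for the output weights in closed form with the pseudo-inverse.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
t = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)   # noisy targets

# Step 1a: k-means clustering to find M centroids.
M = 8
centroids = X[rng.choice(len(X), M, replace=False)]
for _ in range(20):
    labels = np.argmin(((X[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
    centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                          else centroids[j] for j in range(M)])

# Step 1b: choose the width, here simply from the average spread of the centroids.
sigma = np.mean(np.abs(centroids - centroids.mean(axis=0)))

# Step 2: design matrix of RBF activations, output weights via the pseudo-inverse.
Phi = np.exp(-((X[:, None, :] - centroids[None]) ** 2).sum(-1) / (2 * sigma ** 2))
w = np.linalg.pinv(Phi) @ t            # minimizes the quadratic error in closed form

print(np.mean((Phi @ w - t) ** 2))     # small residual error on the training data
```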
MLP vs RBF
RBF networks converge fast, because only a few parameters need to be changed per training point: basis functions far away from the point have negligible influence.
RBF networks suffer from the curse of dimensionality: the number of basis functions needed grows exponentially with the input dimension.
RBFs can also be used as kernel functions that map non-linear data into a space where it becomes linear, so that linear regression can be applied.
RBFs are useful for low-dimensional data.