Following the principle "You have only understood something thoroughly if you can explain it", here are my prep notes for Machine Intelligence II. Unless a source is indicated, the content comes from the lecture slides.

Note: This was primarily written for my own understanding, so it might contain incomplete explanations.

Chapters

General Terms and tools

A lot of the different methods rely on the same general machinery, which gets reused throughout, so it is collected here. Need a refresher on matrix multiplication? Oh, and the dot product is the same thing as the scalar product.

Centered Data

Centering data means moving its center of mass to 0: for each dimension, compute the average over all data points and then subtract it from every data point.

$$X \leftarrow X - \frac{1}{p} \sum_{\alpha=1}^{p} x^{(\alpha)}$$

The subtracted mean is also called the first moment.

or with numpy:

import numpy as np

# x is our data matrix of shape (p, n): p data points, n dimensions
x_centered = x - np.mean(x, axis=0)

Covariance matrix

Assuming p centered data points $x^{(\alpha)}$, the covariance matrix is

$$C_{ij} = \frac{1}{p} \sum_{\alpha=1}^{p} x_i^{(\alpha)} x_j^{(\alpha)} \quad \text{or} \quad C = \frac{1}{p} X^T X$$
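As a quick sanity check of the formula in numpy (the data and variable names here are made up by me):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 3))                 # p = 1000 points, n = 3 dimensions
x = x - np.mean(x, axis=0)                     # center first

C = (x.T @ x) / x.shape[0]                     # C = (1/p) X^T X
print(np.allclose(C, np.cov(x.T, bias=True)))  # matches numpy's biased covariance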

Whitened Data

Whitening transforms your data matrix so that its covariance matrix becomes the identity matrix. The whitened data is then uncorrelated (though not necessarily statistically independent). This is useful, e.g., for finding outliers.
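A minimal sketch of one way to get there (PCA whitening via the eigendecomposition of C; the lecture may use a different route):

import numpy as np

rng = np.random.default_rng(1)
# Made-up correlated data: mix independent noise with a full-rank matrix
x = rng.normal(size=(1000, 3)) @ np.array([[2.0, 0, 0], [1, 1, 0], [0, 0, 0.5]])
x = x - np.mean(x, axis=0)                    # whitening assumes centered data

C = (x.T @ x) / x.shape[0]                    # covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)          # C = E diag(lambda) E^T
x_white = (x @ eigvecs) / np.sqrt(eigvals)    # scale each principal axis to unit variance

C_white = (x_white.T @ x_white) / x_white.shape[0]
print(np.allclose(C_white, np.eye(3)))        # covariance is now the identity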

Kullback-Leibler-Divergence

The Kullback-Leibler divergence measures the difference between two probability distributions, in this example $P$ and $\hat{P}$. It is often read as a distance, but it is not a true metric: it is not symmetric.

$$D_{KL}\left[P(x), \hat{P}(x; w)\right] = \int \mathrm{d}x\, P(x) \ln \frac{P(x)}{\hat{P}(x; w)} \overset{!}{=} \min_{w}$$
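For discrete distributions the integral becomes a sum. A minimal numpy sketch with two made-up distributions p and q:

import numpy as np

# Two example probability distributions over the same 4 states
p = np.array([0.1, 0.2, 0.3, 0.4])
q = np.array([0.25, 0.25, 0.25, 0.25])

# D_KL[p, q] = sum_i p_i * ln(p_i / q_i)
print(np.sum(p * np.log(p / q)))  # >= 0, and 0 only if p == q
print(np.sum(q * np.log(q / p)))  # different value: KL is not symmetric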

Jacobian Matrix

For a function $f: \mathbb{R}^n \to \mathbb{R}^m$ the Jacobian matrix collects all first-order partial derivatives:

$$J = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{bmatrix}$$
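A handy way to sanity-check a hand-derived Jacobian is finite differences. A sketch, using a toy function f of my own (not from the lecture):

import numpy as np

def f(x):
    # Example f: R^2 -> R^2
    return np.array([x[0] ** 2 * x[1], 5 * x[0] + np.sin(x[1])])

def numerical_jacobian(f, x, eps=1e-6):
    # Approximate J[i, j] = df_i/dx_j via central differences
    m, n = f(x).size, x.size
    J = np.zeros((m, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = eps
        J[:, j] = (f(x + step) - f(x - step)) / (2 * eps)
    return J

x = np.array([1.0, 2.0])
print(numerical_jacobian(f, x))
# Analytic Jacobian at (1, 2): [[2*x0*x1, x0^2], [5, cos(x1)]] = [[4, 1], [5, cos(2)]]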

Mercer's theorem

From the slides:

Every positive semidefinite kernel k corresponds to a scalar product in some metric feature space
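One practical consequence: the Gram matrix of such a kernel on any finite set of points is positive semidefinite. A small numerical check I added, using the Gaussian (RBF) kernel, which is known to be positive semidefinite:

import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=(50, 3))                  # 50 arbitrary points

# Gaussian (RBF) kernel: k(a, b) = exp(-||a - b||^2 / (2 * sigma^2)), sigma = 1
sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / 2.0)

# All eigenvalues of the Gram matrix are >= 0 (up to numerical noise)
print(np.linalg.eigvalsh(K).min() >= -1e-10)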

Markov Process

A Markov process depends only on the most recent state, i.e. the probabilities of which state it will move to next are independent of all earlier states.
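A tiny simulation of a Markov chain with a made-up transition matrix T, where T[i, j] is the probability of going from state i to state j:

import numpy as np

rng = np.random.default_rng(3)

# Made-up transition matrix: each row sums to 1
T = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])

state = 0
trajectory = [state]
for _ in range(10):
    # The next state only depends on the current one (Markov property)
    state = rng.choice(3, p=T[state])
    trajectory.append(state)
print(trajectory)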

Variance

$$\sigma^2 = E\left[(x - \mu)^2\right]$$

Discrete

$$\sigma^2 = \frac{1}{p} \sum_{\alpha=1}^{p} \left(x^{(\alpha)} - \mu\right)^2$$
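This is exactly what np.var computes (it uses the biased 1/p normalization by default), shown here on a small made-up sample:

import numpy as np

x = np.array([1.0, 2.0, 4.0, 5.0])
mu = np.mean(x)
print(np.sum((x - mu) ** 2) / x.size)  # the formula above: 2.5
print(np.var(x))                       # identical result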
