Chapters

Independent Component Analysis (ICA)

ICA allows the reconstruction of mixed signals. This could for example be multiple speakers on one audio track.

Requirements

  • Needs some prior knowledge
  • The number of sources to be recovered is a parameter to the algorithm. It is possible to choose a higher number of sources and afterwards remove sources that are only noise.

Limitations

  • sources must have non-Gaussian distributions
  • source amplitudes cannot be recovered

p number of observations

N number of dimensions

pN+N2 unknowns

Syntax

x observations

A Mixing Matrix (must be invertible for ICA to work)

s Sources

s^ Recovered/Estimated Sources

W Unmixing matrix

General principal

x=Ass^=Wx

Different methods leverage different prior knowledge

Cost functions

  • Vanishing cross-correlation functions (QDIAG, FFDIAG) (more or less invented by the institute)
  • non-Gaussianity (fastICA)
  • Statistical Independence DKL(Ps(s^))
  • Infomax H(u^)

Infomax ICA

Variables

  • See above
  • P Probability distribution of th target space
  • H our cost function
  • u^i=f^i(jwijxj)
  • u^ is the vector with the results of our approximation function
  • f^i is a freely chosen sigmoid cdf function that depends on s^i. |f^i(s^i)|=P^si(s^i)f^i(s^i)=s^idyP^si(y)
  • e(α) training error of a specific datapoint.

Infomax ICA uses empirical risk minimization(ERM) to do gradient ascent.

Example for f^i

f^(y)=11+exp(y)f^(y)f^(y)=12f^(y)

We want to maximize EG which relies on the real distribution of the data. Since this distribution is unkown we use the proxy error function ET. ET uses a sum over all the data points instead of an integral over the real distribution. Otherwise the two are identical.

EG=ln|detW|+dxPx(x){l=1Nln(f^l)(k=1Nwlkxk)}ET=ln|detW|+1pα=1p{l=1Nln(f^l)(k=1Nwlkxk)}

The Infomax cost function Infomax is the measure of how much mutual information two random variables have. It can also be expressed as the KL divergence(is equivalent?). We try to minimize the mutual information, because we believe that our sources are independent This is why we need to maximize the cost function

H=du^Pu(u^)lnPuu^=!max

Possible exam task Independent source signals work better if they match f^i.

Find a suitable f^i

Gradient ascent for w

  • Batch Learning Δwij=ETwij=ϵpαpe(α)wij
  • Online Learning Δwij=εe(α)wij Taylor approximation used because inverse is hard to find.

Natural Gradient

The natural gradient is faster than normal gradient descent.

While normal gradient changes the parameters by a fixed rate, the natural gradient changes the outcome distribution by a constant distance. This distance in distributions is measured by the KL Divergence.1

Step size is normalized at each step dZ=dWW1

Second Order (Blind)Source Separation

SOBSS allows the separation of sources after temporal shift or some other form of noise. sources are not iid but assumed to be time correlated / sequential.

Principle (As ambiguous as necessary because I just transcribed this in the lecture) Try different w so that s^ matches in height.

Parameters

  • τ time shift

Cost function minimization depends on the algorithm used on top.

For additional robustness remove τ=0 from the set of possible solutions. This avoids some trivial solutions.

FastICA

Prerequisites

  • Whitened data
  • PCA
  • Kurtosis is known (prior knowledge)

Limitations

  • sensitive to outliers

Additional Parameters

  • u

A is orthogonal matrix.

Find the maximum negentropy based on contrast function.

|kurt(s^)|=!maxzs.t.var(s^)=z12+z22=!1kurt(s^)=z14+z24

We can show that

zopt=(0±1) or zopt=(±10)

Optimize for curtosis

  • Batch learning
  • Online Learning

Batch Learning

  • loop
    • Initialize b with random vector of unit length
    • δb=εsgn[kurt(bTu)]u(bTu)3
    • normalize b

Kurtosis

  • <0: sub-gaussian. (looks like rectangle) uniform
  • 0: Gaussian normal
  • >0: super-gaussian(looks like triangle) laplace

Footnotes