Source Separation (ICA)
Chapters
- General Terms and tools
- PCA
  - PCA
  - Hebbian Learning
  - Kernel-PCA
- Source Separation
  - ICA
  - Infomax ICA
  - Second Order Source Separation
  - FastICA
- Stochastic Optimization
- Clustering
  - k-means Clustering
  - Pairwise Clustering
  - Self-Organising Maps
  - Locally Linear Embedding
- Estimation Theory
  - Density Estimation
    - Kernel Density Estimation
    - Parametric Density Estimation
  - Mixture Models - Estimation Models
Independent Component Analysis (ICA)
ICA recovers the original source signals from a set of mixed signals, for example separating multiple speakers that were recorded together on the same audio tracks (the cocktail-party problem).
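A minimal sketch of this idea, using scikit-learn's FastICA (covered at the end of this chapter); the two synthetic signals and the mixing matrix are made up purely for illustration:

```python
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)

# Two independent, non-Gaussian source signals ("speakers").
S = np.c_[np.sign(np.sin(3 * t)),   # square wave
          (t % 1.0) - 0.5]          # sawtooth wave

# Unknown mixing: each observed track is a different linear mixture.
A = np.array([[1.0, 0.5],
              [0.4, 1.0]])
X = S @ A.T                         # shape (n_samples, n_mixtures)

# ICA estimates the unmixing and recovers the sources.
ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)
```

The recovered components come back in arbitrary order and with arbitrary scale, which is exactly the limitation listed below.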
Requirements
- Needs some prior knowledge.
- The number of sources to be recovered is a parameter of the algorithm. It is possible to choose a higher number of sources and afterwards discard the recovered components that contain only noise.
Limitations
- The sources must have non-Gaussian distributions (at most one source may be Gaussian).
- Source amplitudes cannot be recovered, and neither can the order of the sources.
Syntax
The observed signals are modelled as a linear mixture of independent sources, $\mathbf{x} = A\,\mathbf{s}$. ICA estimates an unmixing matrix $W$ such that the outputs $\mathbf{u} = W\mathbf{x}$ recover the sources (up to scaling and permutation).
General principle
Different methods leverage different prior knowledge.
Cost functions
- Vanishing cross-correlation functions (QDIAG, FFDIAG) (more or less invented by the institute)
- Non-Gaussianity (FastICA)
- Statistical independence
- Infomax
Infomax ICA
Variables
- See above for the general ICA notation.
- $\hat{P}$: the probability distribution of the target space; it appears in our cost function.
- $\mathbf{u} = W\mathbf{x}$: the vector with the results of our approximation.
- $\hat{f}$: a freely chosen sigmoid (CDF) function, applied to the unmixed outputs $\mathbf{u}$.
- $e_t$: the training error of a specific data point.
Infomax ICA is set up as empirical risk minimization (ERM); the resulting empirical cost is optimized by gradient ascent.
Example for $\hat{f}$: the logistic sigmoid $\hat{f}(u) = 1 / (1 + e^{-u})$ is a common choice.
We want to maximize the mutual information between the inputs and the (squashed) outputs $\hat{f}(W\mathbf{x})$, which for this deterministic, invertible mapping is equivalent to maximizing the output entropy.
The Infomax cost function: mutual information measures how much information random variables share; it can equivalently be expressed as a KL divergence between the joint distribution and the product of the marginals (see the formula below). Because we assume the sources to be independent, we want the mutual information between the estimated sources to be as small as possible; minimizing it corresponds to maximizing the output entropy, which is why we maximize the Infomax cost function by gradient ascent.
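For reference, the standard definition (not specific to these slides): the mutual information of the estimated sources equals the KL divergence between their joint density and the product of their marginals,

$$
I(u_1,\dots,u_N)
= D_{\mathrm{KL}}\!\left( P(\mathbf{u}) \,\Big\|\, \prod_{i=1}^{N} P_i(u_i) \right)
= \int P(\mathbf{u}) \, \ln \frac{P(\mathbf{u})}{\prod_{i=1}^{N} P_i(u_i)} \, \mathrm{d}\mathbf{u} \;\ge\; 0 ,
$$

and it is zero exactly when the components are independent.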
Possible exam task
- Separation of independent source signals works best if the chosen $\hat{f}$ matches their cumulative distribution.
- Find a suitable $\hat{f}$ for a given source distribution.
Gradient ascent for $W$
- Batch learning: one update per pass, using the gradient averaged over the whole dataset.
- Online learning: one update per data point.

A Taylor approximation is used because the inverse $(W^{\top})^{-1}$ that appears in the gradient is hard to compute (see the sketch below).
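A minimal numpy sketch of both variants, assuming the logistic sigmoid as $\hat{f}$ (the notes leave the concrete choice open); function names and step sizes are illustrative. The `np.linalg.inv(W.T)` term is the inverse mentioned above:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def infomax_gradient(W, X):
    """Ordinary Infomax gradient, averaged over the columns (data points) of X.

    Assumes the logistic sigmoid as f_hat, so the gradient is
    (W^T)^{-1} + (1 - 2 * f_hat(W X)) X^T / T.
    """
    T = X.shape[1]
    S_hat = sigmoid(W @ X)                       # squashed outputs
    return np.linalg.inv(W.T) + (1.0 - 2.0 * S_hat) @ X.T / T

def infomax_batch(X, n_iter=500, eta=0.01, seed=0):
    """Batch learning: one update per pass over the whole dataset X (shape (d, T))."""
    rng = np.random.default_rng(seed)
    d = X.shape[0]
    W = np.eye(d) + 0.1 * rng.normal(size=(d, d))
    for _ in range(n_iter):
        W += eta * infomax_gradient(W, X)        # gradient ascent step
    return W

def infomax_online(X, n_epochs=20, eta=0.01, seed=0):
    """Online learning: one update per data point."""
    rng = np.random.default_rng(seed)
    d, T = X.shape
    W = np.eye(d) + 0.1 * rng.normal(size=(d, d))
    for _ in range(n_epochs):
        for t in rng.permutation(T):
            x = X[:, [t]]                        # keep the column shape (d, 1)
            s_hat = sigmoid(W @ x)
            W += eta * (np.linalg.inv(W.T) + (1.0 - 2.0 * s_hat) @ x.T)
    return W
```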
Natural Gradient
The natural gradient converges faster than ordinary gradient descent/ascent.
While the ordinary gradient changes the parameters by a fixed rate, the natural gradient changes the outcome distribution by a constant distance; this distance between distributions is measured by the KL divergence.1
The step size is thereby normalized at each step (see the sketch below).
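Right-multiplying the ordinary Infomax gradient by $W^{\top}W$ gives the natural-gradient form, which needs no matrix inverse. Again a sketch with the logistic sigmoid assumed as $\hat{f}$:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def infomax_natural_gradient_step(W, X, eta=0.01):
    """One natural-gradient Infomax update on the whole batch X (shape (d, T)).

    Right-multiplying the ordinary gradient (W^T)^{-1} + (1 - 2*s_hat) X^T
    by W^T W turns it into (I + (1 - 2*s_hat) U^T) W with U = W X,
    so no matrix inverse has to be computed.
    """
    d, T = X.shape
    U = W @ X                    # unmixed outputs
    S_hat = sigmoid(U)           # squashed outputs (assumed source CDFs)
    return W + eta * (np.eye(d) + (1.0 - 2.0 * S_hat) @ U.T / T) @ W
```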
Second Order (Blind) Source Separation
SOBSS separates sources by exploiting their temporal structure: the samples are not assumed to be i.i.d. but time-correlated / sequential. Because white noise does not contribute to time-lagged correlations, the approach is also robust to this form of noise.
Principle (as ambiguous as necessary, because I just transcribed this in the lecture)
Try different time lags and require the time-lagged cross-correlations between the estimated sources to vanish.
Parameters
- the time shift (lag)
How the cost function is minimized depends on the algorithm used on top (e.g. QDIAG, FFDIAG); a toy single-lag sketch follows below.
For additional robustness remove 
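A toy sketch of the single-lag special case (essentially the AMUSE algorithm) rather than the QDIAG/FFDIAG joint diagonalization mentioned above; the lag `tau` and all names are illustrative:

```python
import numpy as np

def amuse_like_sobss(X, tau=1):
    """Toy second-order BSS with a single time lag (AMUSE-style).

    X: mixed signals of shape (d, T). Returns the unmixing matrix W
    and the estimated sources W @ (X - mean).
    """
    d, T = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)

    # 1) PCA-whiten: Z = V Xc has identity covariance.
    evals, evecs = np.linalg.eigh(Xc @ Xc.T / T)
    V = np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    Z = V @ Xc

    # 2) Symmetrized time-lagged correlation of the whitened data.
    C_tau = Z[:, :-tau] @ Z[:, tau:].T / (T - tau)
    C_tau = 0.5 * (C_tau + C_tau.T)

    # 3) Eigenvectors of C_tau give the remaining rotation; this works
    #    when the sources have distinct lagged autocorrelations.
    _, R = np.linalg.eigh(C_tau)
    W = R.T @ V
    return W, W @ Xc
```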
FastICA
Prerequisites
- Whitened data
  - e.g. via PCA (see the whitening sketch below)
- The kurtosis of the sources is known (prior knowledge)
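A minimal PCA-whitening helper, as a sketch of this prerequisite (dimensionality reduction and the handling of near-zero eigenvalues are omitted):

```python
import numpy as np

def pca_whiten(X):
    """PCA whitening: returns (Z, V) with Z = V (X - mean) having identity covariance.

    X: mixed signals of shape (d, T), one signal per row.
    """
    Xc = X - X.mean(axis=1, keepdims=True)
    C = Xc @ Xc.T / Xc.shape[1]                    # sample covariance
    evals, evecs = np.linalg.eigh(C)               # PCA of the mixtures
    V = np.diag(1.0 / np.sqrt(evals)) @ evecs.T    # whitening matrix
    return V @ Xc, V
```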
 
Limitations
- Sensitive to outliers (kurtosis involves fourth moments, which react strongly to extreme values)
 
Additional Parameters
- The contrast function used to approximate the negentropy.

FastICA finds the maximum of the negentropy, approximated via a contrast function. We can show that, on whitened data with the contrast $G(u) = u^{4}$, maximizing this approximation amounts to maximizing the absolute kurtosis of the projection $\mathbf{w}^{\top}\mathbf{x}$.
Optimize for kurtosis
- Batch learning
- Online learning
 
Batch Learning
- loop
    
- Initialize 
 with random vector of unit length - 
 - normalize 
 
 - Initialize 
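A sketch of this loop for a single component, assuming the kurtosis-based fixed-point update on whitened data `Z`; deflation for extracting further components is omitted:

```python
import numpy as np

def fastica_one_unit(Z, n_iter=100, tol=1e-6, seed=0):
    """One-unit FastICA with the kurtosis contrast.

    Z: whitened data of shape (d, T) (zero mean, identity covariance).
    Returns a unit vector w; w @ Z is one estimated source signal.
    """
    rng = np.random.default_rng(seed)
    d, T = Z.shape
    w = rng.normal(size=d)
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ Z                                   # current projection, shape (T,)
        w_new = (Z * y**3).mean(axis=1) - 3.0 * w   # fixed-point update <z (w^T z)^3> - 3w
        w_new /= np.linalg.norm(w_new)              # re-normalize to unit length
        if abs(abs(w_new @ w) - 1.0) < tol:         # converged (up to sign)
            return w_new
        w = w_new
    return w
```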
 
Kurtosis
- < 0: sub-Gaussian (looks like a rectangle), e.g. the uniform distribution
- = 0: Gaussian, i.e. the normal distribution
- > 0: super-Gaussian (looks like a triangle, peaked), e.g. the Laplace distribution (see the numerical check below)
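A quick numerical check of these signs with SciPy's excess-kurtosis estimator (sample size and distribution parameters chosen arbitrarily):

```python
import numpy as np
from scipy.stats import kurtosis   # Fisher definition: excess kurtosis, 0 for a Gaussian

rng = np.random.default_rng(0)
n = 100_000
samples = {
    "uniform (sub-Gaussian)":   rng.uniform(-1, 1, n),
    "normal (Gaussian)":        rng.normal(0, 1, n),
    "Laplace (super-Gaussian)": rng.laplace(0, 1, n),
}
for name, x in samples.items():
    print(f"{name:26s} kurtosis ≈ {kurtosis(x):+.2f}")
# Expected roughly: uniform -1.2, normal 0.0, Laplace +3.0
```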
 
Footnotes
1. Kevin Frans 2016, A[sic!] intuitive explanation of natural gradient descent ↩