Estimation theory - Kernel Density Estimation
Chapters
- General Terms and tools
- PCA
  - PCA
  - Hebbian Learning
  - Kernel-PCA
- Source Separation
  - ICA
  - Infomax ICA
  - Second Order Source Separation
  - FastICA
  - Stochastic Optimization
- Clustering
  - k-means Clustering
  - Pairwise Clustering
  - Self-Organising Maps
  - Locally Linear Embedding
- Estimation Theory
  - Density Estimation
  - Kernel Density Estimation
  - Parametric Density Estimation
  - Mixture Models - Estimation Models
Density Estimation
The goal of density estimation is to estimate, from a finite set of data points, the probability density $\hat{P}(\underline{x})$ at every coordinate $\underline{x}$ of the vector space.
There are two approaches:
- parametric (model based)
  - Gaussian Densities
- nonparametric (data driven)
  - Kernel Density Estimate
Kernel Density Estimation (illustrated with the Gliding Histogram)
Parameter: $h$, the width of the rectangle.
Histogram Kernel
$$K(\underline{u}) = \begin{cases} 1 & \text{if } |u_j| \le \frac{1}{2} \text{ for all components } j \\ 0 & \text{otherwise} \end{cases}$$
Here $\underline{x}$ are the coordinates at which we want to measure the density, and $\underline{u} = \frac{\underline{x} - \underline{x}^{(\alpha)}}{h}$ is the normalized distance between $\underline{x}$ and a data point $\underline{x}^{(\alpha)}$. The normalization to $1/2$ makes the kernel answer the question: does the vector given by $\frac{\underline{x} - \underline{x}^{(\alpha)}}{h}$ lie inside the unit hypercube around the origin, i.e. does $\underline{x}^{(\alpha)}$ lie inside the rectangle of width $h$ centered at $\underline{x}$?
The estimate of the density is then
$$\hat{P}(\underline{x}) = \frac{1}{p\,h^{d}} \sum_{\alpha=1}^{p} K\!\left(\frac{\underline{x} - \underline{x}^{(\alpha)}}{h}\right)$$
where $h$ is the width of the rectangle, $d$ the number of dimensions, and $p$ the number of data points.
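A minimal NumPy sketch of this estimator (the function names and toy data are illustrative, not from the lecture):

```python
import numpy as np

def box_kernel(u):
    """Histogram kernel K(u): 1 if every component of u lies in [-1/2, 1/2]."""
    return np.all(np.abs(u) <= 0.5, axis=-1).astype(float)

def gliding_histogram(x, data, h):
    """Estimate P_hat(x) = 1 / (p * h^d) * sum_alpha K((x - x_alpha) / h)."""
    p, d = data.shape
    u = (x - data) / h                    # (p, d): scaled distance to every data point
    return box_kernel(u).sum() / (p * h**d)

# toy data: 200 samples from a 1D standard normal (illustrative only)
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 1))
print(gliding_histogram(np.array([0.0]), data, h=0.5))   # density near the mode, ~0.4
```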
Drawbacks of Gliding Histograms
- “Bumpy”: the estimate jumps whenever a data point enters or leaves the rectangle (especially noticeable with few data points or high dimensionality)
- The rectangle is not really a good kernel choice (its hard edges make the estimate discontinuous)
- The optimal size of $h$ is non-trivial to find and needs model selection: a lower $h$ leads to overfitting (a spiky estimate), a larger $h$ to oversmoothing; see the sketch after this list
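One common way to carry out this model selection (a hypothetical setup, not spelled out in the notes) is to score candidate widths by the average log-density they assign to held-out data:

```python
import numpy as np

def box_density(x, data, h):
    """Gliding-histogram estimate at x (same formula as above)."""
    p, d = data.shape
    inside = np.all(np.abs((x - data) / h) <= 0.5, axis=-1)
    return inside.sum() / (p * h**d)

rng = np.random.default_rng(1)
samples = rng.normal(size=(300, 1))
train, val = samples[:200], samples[200:]

# score each candidate width by the average log-density assigned to held-out
# points: a tiny h leaves many validation points at density 0 (overfitting),
# a huge h flattens the estimate (oversmoothing)
for h in [0.1, 0.5, 1.0, 2.0]:
    dens = np.array([box_density(x, train, h) for x in val])
    score = np.mean(np.log(np.maximum(dens, 1e-12)))
    print(f"h = {h:3.1f}  held-out mean log-density = {score:+.2f}")
```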
**Alternative: Gaussian kernel**
A Gaussian kernel can be used instead of the rectangle, which removes most of these side effects: every data point contributes with a weight that decays smoothly with distance, so the estimate is no longer bumpy.
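In practice such an estimate is available off the shelf, e.g. in SciPy; the sketch below assumes synthetic 1D data, and `bw_method` plays the role of the width $h$ (SciPy additionally scales it by the standard deviation of the data):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
data = rng.normal(size=200)                # 1D samples

# one Gaussian bump is centered on every data point; bw_method scales
# the kernel width
kde = gaussian_kde(data, bw_method=0.3)
print(kde(np.array([0.0, 1.0, 2.0])))      # smooth density estimates at three points
```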
Parametric Density Estimation
Parametric density estimation finds a good value for the parameter vector $\underline{w}$ of a chosen family of parametric density functions $$\hat{P}(\underline{x}; \underline{w})$$ (for a Gaussian family, for example, $\underline{w}$ collects the mean and the covariance).
Cost function for model selection: a standard choice is the negative log-likelihood of the training data, $$E_T(\underline{w}) = -\frac{1}{p} \sum_{\alpha=1}^{p} \ln \hat{P}(\underline{x}^{(\alpha)}; \underline{w}).$$
Problem: minimizing the training cost alone leads to overfitting.
==> We need to evaluate the cost on data that was not used for fitting (a held-out validation set) when selecting the model.
Alternative approach: select the model that gives the highest probability to the already known data points (maximum likelihood).
The optimization itself can usually be done by simple gradient descent on the cost; see the sketch below.
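A minimal sketch of such a fit, assuming a 1D Gaussian family with parameters $\underline{w} = (\mu, \ln\sigma)$ and the negative log-likelihood cost from above (data and learning rate are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=3.0, scale=1.5, size=500)     # observed data (illustrative)

# model family: 1D Gaussian with parameters w = (mu, log_sigma);
# cost: E_T(w) = -1/p * sum_alpha ln P_hat(x_alpha; w)
mu, log_sigma = 0.0, 0.0
lr = 0.1
for _ in range(500):
    sigma = np.exp(log_sigma)
    z = (x - mu) / sigma
    grad_mu = -np.mean(z) / sigma                # dE_T / d mu
    grad_log_sigma = 1.0 - np.mean(z ** 2)       # dE_T / d log_sigma
    mu -= lr * grad_mu
    log_sigma -= lr * grad_log_sigma

print(f"estimated mu = {mu:.3f}, sigma = {np.exp(log_sigma):.3f}")
# note: for a Gaussian the maximum-likelihood solution is also available in
# closed form (mu = mean of the data, sigma = its standard deviation)
```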
Conditions for multivariate cases: at an optimum the gradient of the cost with respect to all parameters must vanish, $\nabla_{\underline{w}} E_T(\underline{w}) = 0$.
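As a standard worked example of these conditions (not taken from the notes above): for a multivariate Gaussian family, setting the gradient of $E_T$ with respect to the mean and covariance to zero yields the closed-form maximum-likelihood solution
$$\hat{\underline{\mu}} = \frac{1}{p} \sum_{\alpha=1}^{p} \underline{x}^{(\alpha)}, \qquad \hat{\underline{\underline{\Sigma}}} = \frac{1}{p} \sum_{\alpha=1}^{p} \left(\underline{x}^{(\alpha)} - \hat{\underline{\mu}}\right)\left(\underline{x}^{(\alpha)} - \hat{\underline{\mu}}\right)^{\top}.$$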