
Density Estimation

The goal of density estimation is to estimate the probability density at every point of the vector space from a finite set of data points.

There are two approaches

  • parametric (model based)
    • Gaussian Densities
  • nonparametric (data driven)
    • Kernel Density Estimate

Kernel Density Estimation (illustrated with the Gliding Histogram)

Parameter

  • h: width of the rectangle (edge length of the hypercube)

Histogram Kernel H

  • x is the point at which we want to estimate the density
  • u is the distance between two points, scaled by the width h; the threshold of 1/2 (rather than 1) means the rectangle of width h is centered on x

Does the vector given by u end inside the rectangle of width h (H = 1) or outside (H = 0)?

$$H(\underline u) = \begin{cases} 1, & |u_j| < \frac{1}{2},\ j = 1, \dots, n \\ 0, & \text{else} \end{cases}$$

The estimation of density

  • h width of the rectangle
  • n number of dimensions
  • p number of data points
$$\hat{P}(\underline x) = \underbrace{\frac{1}{h^n}}_{\text{divide by volume with width } h} \cdot \frac{1}{p} \sum_{\alpha=1}^{p} H\!\left(\frac{\underline x - \underline x^{(\alpha)}}{h}\right)$$
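As a minimal sketch of this formula (function names are mine, not from the lecture), the following Python snippet evaluates the gliding-histogram estimate for a data set stored as a (p, n) NumPy array:

```python
import numpy as np

def histogram_kernel(u):
    """H(u): 1 if every component satisfies |u_j| < 1/2, else 0."""
    return np.all(np.abs(u) < 0.5, axis=-1).astype(float)

def gliding_histogram_density(x, data, h):
    """P_hat(x) = 1/(h^n) * 1/p * sum_alpha H((x - x_alpha) / h)."""
    p, n = data.shape
    u = (x - data) / h                     # shape (p, n): scaled distances to each data point
    return histogram_kernel(u).sum() / (p * h**n)

# usage: density at the origin from 200 samples of a 2-D standard normal
rng = np.random.default_rng(0)
data = rng.standard_normal((200, 2))
print(gliding_histogram_density(np.zeros(2), data, h=0.5))
```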

Drawbacks of Gliding Histograms

  • “Bumpy”: the estimate jumps whenever a data point enters or leaves the rectangle (especially with few data points or high dimensionality)
  • The rectangular kernel is not really a good choice (hard edges, every point inside is weighted equally)
  • The optimal size of h is non-trivial and needs model selection; a smaller h leads to overfitting

**Alternatively: Gaussian Kernel**

A Gaussian kernel can be used instead of the rectangle, which removes most of these side effects: the estimate becomes smooth, since each data point contributes a continuously decaying weight.
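A sketch of the same estimator with a Gaussian kernel; the isotropic standard-normal kernel scaled by h is my choice of normalization, not taken from the notes:

```python
import numpy as np

def gaussian_kde_density(x, data, h):
    """KDE with an isotropic Gaussian kernel of width h instead of the rectangle."""
    p, n = data.shape
    u = (x - data) / h                                    # scaled distances, shape (p, n)
    kernel = np.exp(-0.5 * np.sum(u**2, axis=1)) / (2 * np.pi) ** (n / 2)
    return kernel.sum() / (p * h**n)
```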

Parametric Density Estimation

Note: for a Gaussian model, μ (the mean) and Σ (the covariance matrix) are the parameters; together they compose w.

Parametric density estimation introduces a cost-based model selection framework, which can also be used to find a good value for h.

Family of parametric density functions: $\hat{P}(\underline x; \underline w)$

Cost function for model selection

$$E^T = -\frac{1}{p} \sum_{\alpha=1}^{p} \ln \hat{P}(\underline x^{(\alpha)}; \underline w) \overset{!}{=} \min_{\underline w}$$
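A small sketch of this training cost, assuming `density` is any callable $\hat{P}(\underline x; \underline w)$, e.g. one of the estimators sketched above (the function name is mine):

```python
import numpy as np

def training_cost(density, data):
    """E^T = -1/p * sum_alpha ln P_hat(x^(alpha); w): mean negative log-density on the training set."""
    return -np.mean([np.log(density(x)) for x in data])
```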

Problem: Minimizing the training costs leads to overfitting

==> We need EG, the generalization cost, but it relies on knowledge of the true density P ==> use a proxy function

$$\hat{E}^G = \frac{1}{p} \sum_{\beta=1}^{p} e^{(\beta)}$$
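The notes do not say how the individual costs $e^{(\beta)}$ are obtained; a common proxy is cross-validation. The sketch below assumes a 1-D Gaussian model fitted on each training split and scored by its negative log-likelihood on the held-out split; the fold count and all names are illustrative assumptions:

```python
import numpy as np

def neg_log_likelihood(test, mu, var):
    """e = -<ln P_hat> on held-out data for a 1-D Gaussian model (illustrative choice)."""
    return np.mean(0.5 * np.log(2 * np.pi * var) + (test - mu) ** 2 / (2 * var))

def proxy_generalization_cost(data, n_folds=5):
    """E_hat^G: average held-out cost e^(beta) over cross-validation folds."""
    folds = np.array_split(data, n_folds)
    errors = []
    for beta in range(n_folds):
        train = np.concatenate([f for i, f in enumerate(folds) if i != beta])
        mu, var = train.mean(), train.var()          # fit w = (mu, sigma^2) on the training part
        errors.append(neg_log_likelihood(folds[beta], mu, var))
    return np.mean(errors)
```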

Alternative approach: Select the parameters w that give the highest probability (likelihood) to the already observed data points. Since the logarithm is monotone, maximizing this likelihood is equivalent to minimizing the negative sum of log-probabilities:

$$\hat{P}(\{\underline x^{(\alpha)}\}; \underline w) \overset{!}{=} \max_{\underline w} \;\Leftrightarrow\; -\sum_{\alpha=1}^{p} \ln \hat{P}(\underline x^{(\alpha)}; \underline w) \overset{!}{=} \min$$

The minimization can probably be done with simple gradient descent (see the sketch below); for a Gaussian model it can also be solved analytically, as in the next section.
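As an illustration of the gradient descent idea (my sketch, not from the notes), here $E^T$ is minimized for a 1-D Gaussian model with parameters w = (μ, σ²):

```python
import numpy as np

def fit_gaussian_gradient_descent(data, lr=0.1, steps=1000):
    """Minimize E^T = -<ln P_hat> for a 1-D Gaussian by gradient descent on w = (mu, var)."""
    mu, var = 0.0, 1.0
    for _ in range(steps):
        d = data - mu
        grad_mu = -np.mean(d) / var                             # dE^T / dmu
        grad_var = 0.5 / var - np.mean(d ** 2) / (2 * var**2)   # dE^T / dvar
        mu -= lr * grad_mu
        var = max(var - lr * grad_var, 1e-8)                    # keep the variance positive
    return mu, var

# usage: the result should approach the sample mean and (biased) sample variance
rng = np.random.default_rng(1)
sample = rng.normal(2.0, 1.5, size=500)
print(fit_gaussian_gradient_descent(sample))
```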

Conditions for the multivariate Gaussian case

$$\frac{\partial E^T}{\partial \underline\mu} = 0 \;\Rightarrow\; \underline\mu = \frac{1}{p} \sum_{\alpha=1}^{p} \underline x^{(\alpha)}$$

$$\frac{\partial E^T}{\partial \Sigma} = 0 \;\Rightarrow\; \Sigma = \frac{1}{p} \sum_{\alpha=1}^{p} (\underline x^{(\alpha)} - \underline\mu)(\underline x^{(\alpha)} - \underline\mu)^T$$
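The same conditions as a minimal code sketch (function name mine): the ML estimates are the sample mean and the biased sample covariance, dividing by p rather than p − 1:

```python
import numpy as np

def fit_gaussian_ml(data):
    """Closed-form ML estimates for a multivariate Gaussian: mu and (biased) covariance Sigma."""
    p, n = data.shape
    mu = data.mean(axis=0)                 # mu = 1/p * sum_alpha x^(alpha)
    centered = data - mu
    sigma = centered.T @ centered / p      # Sigma = 1/p * sum_alpha (x - mu)(x - mu)^T
    return mu, sigma
```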

Mixture Models - EM
