Generalizing Linear Discriminant Analysis



Linear Discriminant Analysis

Objective:
- Project a feature space (a dataset of n-dimensional samples) onto a smaller subspace
- Maintain the class separation

Reasons:
- Reduce computational cost
- Minimize overfitting

We want to reduce dimensionality while preserving the ability to discriminate between classes. Figures from [1].

A first idea: project onto a direction w and choose the w that separates the projected class means the most. The projected mean of class i is

\tilde{\mu}_i = \frac{1}{N_i} \sum_{x \in \omega_i} w^T x = w^T \mu_i

and the objective would be to maximize

J(w) = |\tilde{\mu}_1 - \tilde{\mu}_2| = |w^T (\mu_1 - \mu_2)|.

Equations from [1]. Figure from [1]. As the figure suggests, the distance between projected means is not a good criterion on its own, because it ignores the within-class scatter along the chosen direction.
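To make the idea concrete, here is a minimal NumPy sketch (my own, not code from [1]) that projects synthetic two-class data onto the direction joining the class means; the data and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data in 2-D (illustrative assumption, not from [1]).
X1 = rng.multivariate_normal(mean=[0, 0], cov=[[3, 1], [1, 1]], size=100)
X2 = rng.multivariate_normal(mean=[4, 4], cov=[[3, 1], [1, 1]], size=100)

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)

# Naive choice: project onto the direction joining the class means.
w = mu2 - mu1
w /= np.linalg.norm(w)

y1, y2 = X1 @ w, X2 @ w  # projected (1-D) samples
print("projected mean separation:", abs(y2.mean() - y1.mean()))
```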

Fisher's solution: maximize the distance between the projected means, normalized by a measure of the within-class scatter. The scatter of the projected samples of class i is

\tilde{s}_i^2 = \sum_{y \in \omega_i} (y - \tilde{\mu}_i)^2

and Fisher's criterion to maximize is

J(w) = \frac{|\tilde{\mu}_1 - \tilde{\mu}_2|^2}{\tilde{s}_1^2 + \tilde{s}_2^2}.

Equations from [1]. Figure from [1].
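A small helper (my sketch, not code from [1]) that evaluates Fisher's criterion for a candidate direction, using the projected means and scatters defined above:

```python
def fisher_criterion(w, X1, X2):
    """Fisher's criterion J(w) = (m1 - m2)^2 / (s1^2 + s2^2) for direction w."""
    y1, y2 = X1 @ w, X2 @ w              # project each class onto w
    m1, m2 = y1.mean(), y2.mean()        # projected class means
    s1 = ((y1 - m1) ** 2).sum()          # projected scatter of class 1
    s2 = ((y2 - m2) ** 2).sum()          # projected scatter of class 2
    return (m1 - m2) ** 2 / (s1 + s2)
```

Evaluated on the mean-difference direction from the previous sketch, this gives a baseline that Fisher's optimum must beat.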

How do we get the optimum w*? We must express J(w) as a function of w. Define the class scatter matrices and the within-class scatter matrix in the original feature space,

S_i = \sum_{x \in \omega_i} (x - \mu_i)(x - \mu_i)^T, \qquad S_W = S_1 + S_2,

so that the projected scatters become \tilde{s}_1^2 + \tilde{s}_2^2 = w^T S_W w. Similarly, the separation of the projected means becomes

(\tilde{\mu}_1 - \tilde{\mu}_2)^2 = w^T S_B w, \qquad S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T,

where S_B is the between-class scatter matrix. The criterion is then the generalized Rayleigh quotient

J(w) = \frac{w^T S_B w}{w^T S_W w}.

Setting the derivative of J(w) to zero yields the generalized eigenvalue problem S_B w = J(w) S_W w. Because S_B w always points along (\mu_1 - \mu_2), the solution can be written in closed form:

w^* = S_W^{-1} (\mu_1 - \mu_2).

Equations from and modified from [1].
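Putting the derivation into code: a minimal sketch (my own, following the closed form above, not code from [1]) that builds S_W from the class scatter matrices and solves for w*; variable names match the earlier sketches.

```python
import numpy as np

def fisher_direction(X1, X2):
    """Optimal two-class Fisher direction w* = S_W^{-1} (mu1 - mu2)."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - mu1).T @ (X1 - mu1)   # class-1 scatter matrix
    S2 = (X2 - mu2).T @ (X2 - mu2)   # class-2 scatter matrix
    Sw = S1 + S2                     # within-class scatter S_W
    w = np.linalg.solve(Sw, mu1 - mu2)
    return w / np.linalg.norm(w)
```

On the synthetic data above, `fisher_criterion(fisher_direction(X1, X2), X1, X2)` is at least as large as the criterion for the naive mean-difference direction, as the derivation guarantees.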

How to generalize to more than two classes: instead of a single projection vector, we compute a matrix W of projections. With C classes, the within-class scatter becomes

S_W = \sum_{i=1}^{C} S_i, \qquad S_i = \sum_{x \in \omega_i} (x - \mu_i)(x - \mu_i)^T,

and the between-class scatter becomes

S_B = \sum_{i=1}^{C} N_i (\mu_i - \mu)(\mu_i - \mu)^T,

where \mu is the overall mean. The criterion generalizes to

J(W) = \frac{|W^T S_B W|}{|W^T S_W W|}.

Here, W is a projection matrix; its optimal columns are the leading eigenvectors of S_W^{-1} S_B. Equations from [1].
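To make the multi-class recipe concrete, a minimal NumPy sketch of the eigenproblem above (my own code, not from [1]); `lda_projection` and its arguments are illustrative names.

```python
import numpy as np

def lda_projection(X, y, n_components):
    """Columns of W = leading eigenvectors of S_W^{-1} S_B."""
    mu = X.mean(axis=0)                        # overall mean
    d = X.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        Sw += (Xc - mu_c).T @ (Xc - mu_c)      # within-class scatter S_W
        diff = (mu_c - mu).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)        # between-class scatter S_B
    # Generalized eigenproblem S_B w = lambda S_W w.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(eigvals.real)[::-1]     # sort by decreasing eigenvalue
    return eigvecs[:, order[:n_components]].real
```

Since rank(S_B) is at most C - 1, only the first C - 1 eigenvalues can be nonzero, which is exactly the (C - 1)-projection limit noted on the next slide.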

Limitations of LDA:
- It is a parametric method.
- It produces at most (C - 1) projections.

Benefits of LDA:
- Linear decision boundaries, which are easy for humans to interpret and easy to implement.
- Good classification results.

Flexible Discriminant Analysis

Flexible Discriminant Analysis recasts the LDA problem as a linear regression problem.

- "Differences between LDA and FDA and what criteria can be used to pick one for a given task?" (Tavish)
- Linear regression can be generalized into more flexible, nonparametric forms of regression, whereas LDA is parametric (it estimates class means and a common covariance).
- In particular, FDA expands the set of predictors via basis expansions; a sketch of this idea follows the figure below.

Figure from [2].
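One simple way to realize the basis-expansion idea (a sketch under my own assumptions, not the optimal-scoring algorithm from [2]) is to expand the predictors with polynomial terms and then run ordinary LDA on the expanded features:

```python
from sklearn.datasets import make_moons
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Two interleaved half-moons: not linearly separable in the original space.
X, y = make_moons(n_samples=300, noise=0.25, random_state=0)

# Plain LDA: a single linear boundary in the original features.
lda = LinearDiscriminantAnalysis().fit(X, y)

# FDA-flavored model: degree-3 polynomial basis expansion, then LDA.
fda_like = make_pipeline(PolynomialFeatures(degree=3, include_bias=False),
                         LinearDiscriminantAnalysis()).fit(X, y)

print("LDA accuracy:         ", lda.score(X, y))
print("expanded-LDA accuracy:", fda_like.score(X, y))
```

The boundary is linear in the expanded features but nonlinear in the original ones, which is the essential gain FDA offers over LDA.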

Penalized Discriminant Analysis

Fit an LDA model, but penalize the coefficients to be smoother, directly curbing the overfitting problem.

Positively correlated predictors lead to noisy, negatively correlated coefficient estimates, and this noise results in unwanted sampling variance. Example: neighboring pixel intensities in images.

Images from [2].
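With the simplest penalty (a multiple of the identity), PDA reduces to ridge-like shrinkage of the within-class covariance, which scikit-learn's LDA exposes via its `shrinkage` option. A minimal sketch, where the correlated-predictor data is an illustrative assumption:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Many correlated predictors, few samples: the regime where plain LDA gets noisy.
n, d = 60, 100
base = rng.normal(size=(n, 1))
X = base + 0.1 * rng.normal(size=(n, d))            # highly correlated columns
y = (base.ravel() + 0.5 * rng.normal(size=n) > 0).astype(int)

# Penalized flavor: shrink the within-class covariance toward a multiple of I.
pda_like = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto").fit(X, y)
print("shrunk-LDA accuracy:", pda_like.score(X, y))
```

Full PDA as described in [2] uses a structured penalty matrix (e.g. one that favors spatially smooth coefficients for images), which this sketch does not implement.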

Mixture Discriminant Analysis

Instead of enlarging the set of predictors (FDA) or smoothing the coefficients of the predictors (PDA), and instead of modeling each class with a single Gaussian:

- Model each class as a mixture of two or more Gaussian components,
- with all components sharing the same covariance matrix.

Image from [2].
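scikit-learn has no MDA class, but the idea can be approximated with one Gaussian mixture per class (a sketch under my own assumptions: true MDA from [2] shares one covariance across all components of all classes, whereas `covariance_type="tied"` only ties components within each class):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Two classes, each genuinely multimodal (two blobs per class).
X, blob = make_blobs(n_samples=400, centers=4, random_state=0)
y = blob % 2   # blobs 0, 2 -> class 0; blobs 1, 3 -> class 1

# One Gaussian mixture per class; "tied" shares a covariance within the class.
models, priors = [], []
for c in (0, 1):
    Xc = X[y == c]
    models.append(GaussianMixture(n_components=2, covariance_type="tied",
                                  random_state=0).fit(Xc))
    priors.append(len(Xc) / len(X))

# Classify by the larger class-conditional log-density plus log-prior.
scores = np.stack([m.score_samples(X) + np.log(p)
                   for m, p in zip(models, priors)], axis=1)
pred = scores.argmax(axis=1)
print("training accuracy:", (pred == y).mean())
```

A single-Gaussian-per-class model would have to place each class mean between its two blobs; the mixture model captures the multimodal structure directly.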

Sources

1. Gutierrez-Osuna, Ricardo. "CSCE 666 Pattern Analysis – Lecture 10." http://research.cs.tamu.edu/prism/lectures/pr/pr_l10.pdf
2. Hastie, Trevor, et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction.
3. Raschka, Sebastian. "Linear Discriminant Analysis bit by bit." http://sebastianraschka.com/Articles/2014_python_lda.html

END.
