Linear discriminant analysis (LDA) is an approach used in supervised machine learning to solve multi-class classification problems. LDA separates data belonging to multiple classes, each described by multiple features, through dimensionality reduction. The technique is important in data science because it reduces the number of input features while preserving the information that distinguishes the classes, which helps optimize machine learning models.
Linear discriminant analysis, also known as normal discriminant analysis (NDA) or discriminant function analysis (DFA), follows a generative model framework. This means LDA algorithms model the data distribution for each class and use Bayes' theorem¹ to classify new data points. Bayes' theorem calculates conditional probabilities, that is, the probability of an event given that some other event has occurred. LDA algorithms make predictions by using Bayes' theorem to calculate the probability that an input data point belongs to a particular output class. For a review of Bayesian statistics and how it impacts supervised learning algorithms, see Naïve Bayes classifiers.
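As a reminder of the standard form of the theorem (a general statement, not quoted from this article), the posterior probability of class wk given an input x is:

P(wk | x) = P(x | wk) * P(wk) / P(x)

LDA models the class-conditional likelihood P(x | wk) with a Gaussian distribution for each class and assigns x to the class with the highest posterior probability.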
LDA works by identifying a linear combination of features that separates or characterizes two or more classes of objects or events. It does this by projecting data with two or more dimensions onto a lower-dimensional space so that it can be more easily classified; the technique is therefore sometimes referred to as dimensionality reduction. This versatility means LDA can be used for multi-class classification problems, unlike logistic regression, which is limited to binary classification. LDA is thus often applied to enhance the operation of other classification algorithms such as decision trees, random forests or support vector machines (SVMs).
Linear discriminant analysis (LDA) is based on Fisher's linear discriminant, a statistical method developed by Sir Ronald Fisher in the 1930s and later generalized by C. R. Rao to handle more than two classes. Fisher's method aims to identify a linear combination of features that discriminates between two or more classes of labeled objects or events.
Fisher’s method reduces dimensions by separating classes of projected data. Separation means maximizing the distance between the projected means and minimizing the projected variance within classes.
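A common textbook way to express this trade-off (a standard formulation, not quoted from this article) is the Fisher criterion:

J(w) = (wᵀ S_B w) / (wᵀ S_W w)

Here, S_B is the between-class scatter matrix, S_W is the within-class scatter matrix and w is a candidate projection direction. The direction that maximizes J pushes the projected class means as far apart as possible relative to the spread of the projected data within each class.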
Suppose that a bank is deciding whether to approve or reject loan applications. The bank uses two features to make this decision: the applicant's credit score and annual income.
Here, the two features are plotted on a 2-dimensional (2D) plane with an X–Y axis. If we tried to classify approvals using just one of these features, we might observe significant overlap between the two classes. By applying LDA, we can combine the two features to create a new axis, project the data onto that new axis and separate the approved and rejected applications with a single straight line.
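A minimal sketch of this idea in Python with scikit-learn, using made-up credit score and income figures rather than any real applicant data, might look like this:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Synthetic applicants: [credit score, annual income in thousands of dollars]
approved = rng.normal(loc=[720, 85], scale=[30, 15], size=(100, 2))
rejected = rng.normal(loc=[620, 55], scale=[30, 15], size=(100, 2))

X = np.vstack([approved, rejected])
y = np.array([1] * 100 + [0] * 100)  # 1 = approve, 0 = reject

# With two classes, LDA can produce at most one discriminant axis
lda = LinearDiscriminantAnalysis(n_components=1)
X_projected = lda.fit_transform(X, y)  # two features collapsed onto one new axis

print(X_projected.shape)         # (200, 1)
print(lda.predict([[690, 70]]))  # predicted class for a new applicant
```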
To create this new axis and reduce dimensionality, LDA follows two criteria:
- Maximize the distance between the means of the different classes.
- Minimize the variation (scatter) within each individual class.
LDAs operate by projecting a feature space, that is, a dataset with n dimensions, onto a smaller subspace of k dimensions, where k is less than or equal to n − 1, without losing class information. An LDA model comprises the statistical properties that are calculated for the data in each class. Where there are multiple features or variables, these properties are calculated over the multivariate Gaussian distribution³.
These multivariate properties are the mean vector of each class and the covariance matrix, which together define each class's Gaussian distribution.
The statistical properties that are estimated from the data set are fed into the LDA function to make predictions and create the LDA model. There are some constraints to bear in mind, as the model assumes the following:
- The input features for each class are approximately Gaussian (normally) distributed.
- All classes share the same covariance matrix.
- The observations are independent of one another.
For these reasons, LDA may not perform well in high-dimensional feature spaces.
Dimensionality reduction involves projecting the data points onto a new, lower-dimensional axis along which the classes can be separated. Mathematically, these linear transformations are analyzed using eigenvectors and eigenvalues. Imagine you have mapped out a data set with multiple features, resulting in a multi-dimensional scatterplot. Eigenvectors provide the "direction" within the scatterplot. Eigenvalues denote the importance of this directional data. A high eigenvalue means the associated eigenvector is more important for separating the classes.
During dimensionality reduction, the eigenvectors are calculated from the data set and collected in two scatter matrices:
- The between-class scatter matrix, which captures how far apart the class means are from one another.
- The within-class scatter matrix, which captures how widely the samples in each class spread around their own class mean.
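As a rough illustration of the underlying linear algebra (a simplified sketch with placeholder names, not code from this article), the discriminant directions can be obtained from the eigenvectors of the inverted within-class scatter matrix multiplied by the between-class scatter matrix:

```python
import numpy as np

def lda_directions(X, y, k):
    """Return the top-k LDA projection directions for data X with labels y."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    n_features = X.shape[1]

    S_W = np.zeros((n_features, n_features))  # within-class scatter
    S_B = np.zeros((n_features, n_features))  # between-class scatter
    for c in classes:
        X_c = X[y == c]
        mean_c = X_c.mean(axis=0)
        S_W += (X_c - mean_c).T @ (X_c - mean_c)
        diff = (mean_c - overall_mean).reshape(-1, 1)
        S_B += len(X_c) * (diff @ diff.T)

    # Eigenvectors of S_W^-1 S_B; larger eigenvalues mark more discriminative directions
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs.real[:, order[:k]]
```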
To use LDA effectively, it's essential to prepare the data set beforehand. These are the steps and best practices for implementing LDA; a code sketch combining them follows the list:
1. Preprocess the data to ensure that it is normalized and centered
This can be done with a tool such as scikit-learn's StandardScaler, which standardizes each feature to zero mean and unit variance.
2. Choose an appropriate number of dimensions for the lower-dimensional space
This is achieved by passing the n_components parameter to the LDA, which sets the number of linear discriminants to retrieve.
3. Regularize the model
Regularization aims to prevent overfitting, where the statistical model fits its training data too closely and loses accuracy on new data. In scikit-learn, for example, regularization is available through the shrinkage parameter when the lsqr or eigen solver is used.
4. Use cross-validation to evaluate model performance
You can evaluate classifiers such as LDA by plotting a confusion matrix, with actual class values as rows and predicted class values as columns. A confusion matrix makes it easy to see whether a classifier is confusing two classes, that is, mislabeling one class as another. For example, consider a 10 x 10 confusion matrix for a classifier predicting handwritten digits from zero through 9. Actual digits are plotted as rows on the y-axis and predictions are plotted as columns on the x-axis. To see how many times the classifier confused images of 4s and 9s, you would check the row labeled 4 and the column labeled 9.
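Taken together, the four steps above might look like the following scikit-learn sketch; the Iris dataset is used here purely as a stand-in for whatever labeled data you are working with:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Step 1: normalize and center the features
# Step 2: keep 2 discriminants (at most number of classes - 1)
# Step 3: shrinkage regularization (requires the 'lsqr' or 'eigen' solver)
model = make_pipeline(
    StandardScaler(),
    LinearDiscriminantAnalysis(n_components=2, solver="eigen", shrinkage="auto"),
)

# Step 4: cross-validation to estimate performance
scores = cross_val_score(model, X, y, cv=5)
print("Cross-validated accuracy:", scores.mean())

# Confusion matrix on a held-out split: rows are actual classes, columns are predictions
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model.fit(X_train, y_train)
print(confusion_matrix(y_test, model.predict(X_test)))
```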
The linear discriminant function helps make decisions in classification problems by separating data points based on features and classifying them into different classes or categories. The computation can be summarized in these key steps: calculate the between-class variance, calculate the within-class variance and then construct the projection that best separates the classes.
The between-class variance is the separability between classes—the distance between the class means.
The within-class variance is the spread within each class, that is, the distance between each class mean and the samples belonging to that class.
LDA seeks the projection that maximizes the between-class variance while minimizing the within-class variance. For two classes, the linear discriminant function can be represented mathematically as follows.
δ(x) = x * (μ0 − μ1) / σ² − (μ0² − μ1²) / (2σ²) + ln(P(w0) / P(w1))
Where:
- δ(x) is the discriminant score for the input value x
- x is the value of the input feature (in the example below, an applicant's credit score)
- μ0 and μ1 are the means of class w0 and class w1
- σ² is the variance shared by the two classes
- P(w0) and P(w1) are the prior probabilities of the two classes
Let's use the equation to work through a loan approval example. To recap, the bank is deciding whether to approve or reject loan applications. The bank uses two features to make this decision: the applicant's credit score (x) and annual income. The bank has collected historical data on previous loan applicants and whether the loans were approved.
Using the linear discriminant function, the bank can calculate a score (δ(x)) for each loan application.
The equation for the linear discriminant function might look similar to this:
δ(x) = x * (μ0 − μ1) / σ² − (μ0² − μ1²) / (2σ²) + ln(P(w0) / P(w1))
The bank computes the linear discriminant function for each loan application.
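For instance, with purely illustrative numbers (these figures are hypothetical, not drawn from any real data set): suppose approved applicants (class w0) have a mean credit score of μ0 = 700, rejected applicants (class w1) have μ1 = 600, the shared variance is σ² = 2,500 and the priors are P(w0) = 0.6 and P(w1) = 0.4. For an applicant with credit score x = 680:

δ(680) = 680 * (700 − 600) / 2500 − (700² − 600²) / (2 * 2500) + ln(0.6 / 0.4)
       ≈ 27.2 − 26.0 + 0.41
       ≈ 1.6

Because the score is positive, this application falls on the "approve" side of the decision boundary; a negative score would fall on the "reject" side.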
The bank can thus automate its loan approval process, making quicker and more consistent decisions while minimizing human bias.
These are typical scenarios where LDA can be applied to tackle complex problems and help organizations make better decisions.
To mitigate risk, financial institutions must identify and minimize credit default. LDA can help distinguish applicants who are likely to default on loans from those who are creditworthy by sifting through financial factors and behavioral data.
Fast and accurate disease diagnosis is crucial for effective treatment. Hospitals and healthcare providers must interpret an immense amount of medical data. LDA helps simplify complex data sets and improve diagnostic accuracy by identifying patterns and relationships in patient data.
For effective marketing, e-commerce businesses must be able to categorize diverse customer bases. LDA is pivotal in segmenting customers, enabling e-commerce companies to tailor their marketing strategies for different customer groups. The outcome is more personalized shopping experiences, increasing customer loyalty and sales.
Producing high-quality goods while minimizing defects is a fundamental challenge. Sensor data from machinery can be used with LDA to identify patterns associated with defects. By detecting irregularities in real-time, manufacturers can take immediate corrective actions, and they can improve product quality and reduce wastage.
You can maximize your advertising budget by targeting the right audience with personalized content, but identifying those respective audience segments can be difficult. LDA can simplify this process by classifying customer attributes and behaviors, enhancing the customization of advertising campaigns. This approach can lead to a higher return on investment (ROI) and a better customer experience.
To delve deeper into linear discriminant analysis with Python and leverage the scikit-learn library, you can explore the tutorial "Learn classification algorithms using Python and scikit-learn" in IBM watsonx™. The tutorial walks you through the basics of solving a classification-based machine learning problem using Python and scikit-learn (also known as sklearn).
In the step-by-step tutorial, you first import the Python libraries needed to work with the Iris dataset, perform data preprocessing, and create and evaluate your LDA model:
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn
import seaborn as sns
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
```
If any of these libraries are not installed, you can install them with pip install.
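As a minimal continuation of those imports (an illustrative sketch, not the tutorial's exact code), you might standardize the Iris measurements, reduce them with LDA and then train a random forest on the projected features:

```python
# Illustrative sketch continuing from the imports above; not the tutorial's exact code
iris = sns.load_dataset("iris")
X = iris.drop(columns="species").values
y = LabelEncoder().fit_transform(iris["species"])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardize, then project the 4 flower measurements onto 2 linear discriminants
scaler = StandardScaler().fit(X_train)
lda = LinearDiscriminantAnalysis(n_components=2)
X_train_lda = lda.fit_transform(scaler.transform(X_train), y_train)
X_test_lda = lda.transform(scaler.transform(X_test))

# Train a downstream classifier on the reduced features
clf = RandomForestClassifier(random_state=42).fit(X_train_lda, y_train)
y_pred = clf.predict(X_test_lda)

print("Accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```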
See also this scikit-learn documentation for an overview of key parameters, attributes and general examples of Python implementations using sklearn.discriminant_analysis.LinearDiscriminantAnalysis.
Understanding the advantages and limitations of linear discriminant analysis (LDA) is crucial when applying it to various classification problems. Knowledge of tradeoffs helps data scientists and machine learning practitioners make informed decisions about its suitability for a particular task.
- Shared mean distributions: LDA encounters challenges when class distributions share means. LDA struggles to create a new axis that linearly separates both classes. As a result, LDA might not effectively discriminate between classes with overlapping statistical properties. For example, imagine a scenario in which two species of flowers have highly similar petal length and width. LDA may find it difficult to separate these species based on these features alone. Alternative techniques, such as nonlinear discriminant analysis methods, are preferred here.
- Not suitable for unlabeled data: LDA is a supervised learning algorithm, that is, it classifies or separates labeled data. In contrast, principal component analysis (PCA), another dimensionality reduction technique, ignores class labels entirely and instead preserves as much variance as possible, as the short sketch below illustrates.
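A quick way to see the difference (an illustrative comparison, not code from this article):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# PCA chooses directions of maximum variance and never looks at the labels
X_pca = PCA(n_components=2).fit_transform(X)

# LDA chooses directions that best separate the labeled classes, so it needs y
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)
```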
1 James Joyce, Bayes' Theorem, Stanford Encyclopedia of Philosophy, 2003
2 Dan A. Simovici, Lecture notes on Fisher Linear Discriminant, 2013
3 Penn State Eberly College of Science, Linear Discriminant Analysis, 2023
4 J. T. Oates, Lecture notes on Linear Discriminant Analysis, 2014
5 Guangliang Chen, Lecture notes on Linear Discriminant Analysis (LDA), 2020
6, 7 scikit-learn, Linear and Quadratic Discriminant Analysis, 2023