Date: 2024-08-01

Machine learning algorithms fall into five broad categories: supervised learning, unsupervised learning, self-supervised learning, reinforcement learning and semi-supervised learning.

1. Supervised machine learning

Supervised machine learning is a type of machine learning where the model is trained on a labeled dataset (i.e., the target or outcome variable is known). For instance, if data scientists were building a model for tornado forecasting, the input variables might include date, location, temperature, wind flow patterns and more, and the output would be the actual tornado activity recorded for those days.

Supervised learning is commonly used for risk assessment, image recognition, predictive analytics and fraud detection, and comprises several types of algorithms.

Regression algorithms: predict continuous output values (e.g., temperature, salary) by modeling the relationship between the input variables and a real-valued target. That relationship may be linear, as in linear regression, or nonlinear, as in random forest and gradient boosting, among other subtypes.
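For the simplest case named above, linear regression with a single input variable, the fit can be sketched in a few lines. This is a minimal illustration using ordinary least squares; the function names `fit_line` and `predict` and the salary figures are hypothetical, chosen only for the example.

```python
# Ordinary least squares for one input variable (illustrative data only).
def fit_line(xs, ys):
    """Return (slope, intercept) minimizing squared error on labeled pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by the variance of x gives the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical labeled dataset: years of experience -> salary (thousands).
years = [1, 2, 3, 4, 5]
salary = [40, 45, 50, 55, 60]
model = fit_line(years, salary)
print(predict(model, 6))
```

The labeled pairs play the role of the known outcome variable: the model sees both inputs and targets during fitting, then predicts a target for an unseen input.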

2. Unsupervised machine learning

Unsupervised learning algorithms—like Apriori, Gaussian Mixture Models (GMMs) and principal component analysis (PCA)—draw inferences from unlabeled datasets, facilitating exploratory data analysis and enabling pattern recognition and predictive modeling.

The most common unsupervised learning method is cluster analysis, which uses clustering algorithms to categorize data points according to value similarity (as in customer segmentation or anomaly detection). Association algorithms allow data scientists to identify associations between data objects inside large databases, facilitating data visualization and dimensionality reduction.

K-means clustering: assigns data points to K groups, where each point joins the cluster whose centroid is closest to it. K is the number of clusters, chosen in advance; a larger K produces smaller, more granular clusters. K-means clustering is commonly used for market segmentation, document clustering, image segmentation and image compression.
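The assign-then-update loop behind K-means can be sketched as follows. This is a simplified version of Lloyd's algorithm, not a production implementation: it seeds the centroids with the first K points so the run is reproducible, which is an assumption made for the example, and the sample points are invented.

```python
import math


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm: returns (centroids, assignments)."""
    # Assumption: seed centroids with the first k points for reproducibility;
    # practical implementations usually use random or k-means++ seeding.
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # no centroid moved, so the clustering is stable
        centroids = new_centroids
    return centroids, assignments


# Hypothetical 2-D points forming two visibly separate groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 1.5), (8.5, 9.0)]
centroids, assignments = kmeans(points, k=2)
print(assignments)
print(centroids)
```

Rerunning with a larger `k` splits the same points into more, smaller groups, which is the granularity trade-off that the choice of K controls.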

Unsupervised ML models are often behind the “customers who bought this also bought…” types of recommendation systems.
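One way such "also bought" suggestions can be derived is from the two measures that association algorithms like Apriori rely on: support (how often items appear together) and confidence (how often one item appears given another). The sketch below computes both directly; the helper names and the shopping baskets are hypothetical, and a real system would also prune candidate item sets rather than score them one by one.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated share of antecedent baskets that also hold the consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    # Assumes the antecedent occurs at least once; otherwise this divides by zero.
    return both / support(transactions, antecedent)


# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "jam"},
    {"milk"},
]
print(support(baskets, {"bread", "butter"}))
print(confidence(baskets, {"bread"}, {"butter"}))
```

A rule such as "bread implies butter" would typically be kept only if both numbers clear thresholds chosen by the analyst.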

3. Self-supervised machine learning

Self-supervised learning (SSL) enables models to train themselves on unlabeled data, instead of requiring massive labeled datasets. SSL algorithms, also called predictive or pretext learning algorithms, learn one part of the input from another part, automatically generating labels and transforming unsupervised problems into supervised ones. These algorithms are especially useful for tasks in computer vision and NLP, where the volume of labeled training data needed to train models can be exceptionally large (sometimes prohibitively so).
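The idea of learning one part of the input from another part can be illustrated with a masking pretext task. The sketch below only builds the training pairs and does not train a model; the function name and the `[MASK]` placeholder are assumptions made for the example.

```python
def make_masked_examples(tokens, mask="[MASK]"):
    """Turn one unlabeled token sequence into (masked_input, label) pairs."""
    examples = []
    for i, token in enumerate(tokens):
        # Hide one token; the surrounding tokens become the model input.
        masked = tokens[:i] + [mask] + tokens[i + 1:]
        examples.append((masked, token))  # the hidden token is the label
    return examples


# An unlabeled sentence yields one supervised example per token.
sentence = "the cat sat".split()
for masked, label in make_masked_examples(sentence):
    print(masked, "->", label)
```

No human annotation is involved: every label comes from the data itself, which is what turns the unsupervised problem into a supervised one.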

4. Reinforcement learning

Reinforcement learning, which builds on ideas from dynamic programming, trains an agent using a system of reward and punishment. Reinforcement learning from human feedback (RLHF) is one variant, in which the reward signal is derived from human judgments. In reinforcement learning, an agent takes actions in a specific environment to reach a predetermined goal. The agent is rewarded or penalized for its actions based on an established metric (typically points), encouraging the agent to continue good practices and discard bad ones. With repetition, the agent learns the best strategies.
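The reward-and-penalty loop can be sketched with tabular Q-learning, one common reinforcement learning algorithm chosen here purely for illustration. The environment is a hypothetical five-cell corridor in which the agent starts at the left end and scores a point for reaching the right end; the reward values and hyperparameters are assumptions, not recommendations.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)    # step left or step right


def step(state, action):
    """Environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    if next_state == N_STATES - 1:
        return next_state, 1.0, True      # reward for reaching the goal
    return next_state, -0.01, False       # small penalty for every other move


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table q[state][action_index] by repeated trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = max(range(2), key=lambda i: q[state][i])
            next_state, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward plus discounted best future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Greedy policy: the preferred action in each non-goal cell.
policy = [ACTIONS[max(range(2), key=lambda i: q_table[s][i])] for s in range(N_STATES - 1)]
print(policy)
```

After enough episodes, the table favors stepping right in every cell, since that path collects the goal reward with the fewest penalized moves.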

Reinforcement learning algorithms are common in video game development and are frequently used to teach robots how to replicate human tasks.

5. Semi-supervised learning

The fifth type of machine learning technique combines supervised and unsupervised learning.

Semi-supervised learning algorithms are trained on a small labeled dataset and a large unlabeled dataset, with the labeled data guiding the learning process for the larger body of unlabeled data. A semi-supervised learning model might use unsupervised learning to identify data clusters and then use supervised learning to label the clusters.
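The cluster-then-label approach described above can be sketched as follows. The example clusters one-dimensional values with a tiny k-means loop, then lets the few labeled points in each cluster vote on that cluster's label. The function names, data and initial centers are all hypothetical.

```python
from collections import Counter


def cluster_1d(values, centers, iterations=10):
    """Tiny 1-D k-means: returns the cluster index of each value."""
    centers = list(centers)
    assign = []
    for _ in range(iterations):
        # Assign each value to its nearest center, then recompute the centers.
        assign = [
            min(range(len(centers)), key=lambda c: abs(v - centers[c]))
            for v in values
        ]
        for c in range(len(centers)):
            members = [v for v, a in zip(values, assign) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
    return assign


def cluster_then_label(values, known_labels, initial_centers):
    """known_labels maps a few indices of `values` to their labels."""
    assign = cluster_1d(values, initial_centers)
    # Majority vote of the labeled members decides each cluster's label.
    votes = {}
    for idx, label in known_labels.items():
        votes.setdefault(assign[idx], Counter())[label] += 1
    cluster_label = {c: counts.most_common(1)[0][0] for c, counts in votes.items()}
    # Clusters with no labeled member stay unlabeled (None).
    return [cluster_label.get(a) for a in assign]


# Hypothetical data: eight measurements, only two of which are labeled.
values = [1.0, 1.2, 0.8, 1.1, 5.0, 5.3, 4.9, 5.1]
known = {0: "low", 4: "high"}
labels = cluster_then_label(values, known, initial_centers=[0.0, 6.0])
print(labels)
```

Two labeled points are enough to label all eight here because the clusters are well separated; with overlapping clusters, the propagated labels would be correspondingly less reliable.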

Generative adversarial networks (GANs), a deep learning tool that generates unlabeled data by training two competing neural networks, are one example of a technique used in semi-supervised machine learning.

Regardless of type, ML models can glean insights from enterprise data, but their vulnerability to human and data bias makes responsible AI practices an organizational imperative.