Supervised learning is a machine learning technique that uses labeled datasets to train AI models to identify the underlying patterns across data points. Labeled data consists of features (the inputs) and labels (the corresponding outputs), and the model uses these pairs to learn the relationship between the two.
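To make features and labels concrete, here is a minimal sketch that assumes a single numeric feature and a straight-line model fit by ordinary least squares. The house-size data, the units, and the names `fit_line` and `predict` are hypothetical and chosen only for illustration.

```python
# Hypothetical labeled dataset: each pair is (feature, label),
# here house size in square meters -> price in thousands.
labeled_data = [
    (50.0, 160.0),
    (80.0, 250.0),
    (100.0, 310.0),
    (120.0, 370.0),
]


def fit_line(data):
    """Learn slope and intercept from labeled pairs via ordinary least squares."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in data)
    variance = sum((x - mean_x) ** 2 for x, _ in data)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned relationship to a new, unlabeled feature value."""
    slope, intercept = model
    return slope * x + intercept


model = fit_line(labeled_data)
print(predict(model, 90.0))  # 280.0
```

The labels are what let the fitting step measure how wrong a candidate line is; without them there would be nothing to fit against.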
Many businesses hire large teams of human data annotators, who are sometimes assisted by machines. These annotators often need domain expertise to ensure that data is labeled correctly. For example, annotators labeling legal data might need a background in law. This practice of relying on human annotators to help ensure proper labeling is sometimes referred to as “human in the loop.”
A classic example of supervised learning is spam detection. To teach a model to identify spam, one could expose it to a dataset of thousands of emails, each labeled by humans as either “spam” or “not spam.” The model would look for patterns that distinguish the two groups. For example, it might learn that emails with the word “free” in the subject line are more likely to be spam, estimating from the labeled examples how strongly the word “free” in the subject line is associated with the label “spam.” Then, when given a new, unlabeled email, the model can combine that estimate with many others to decide whether the new email is spam.
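A minimal sketch of the spam example, assuming a multinomial naive Bayes classifier over subject-line words. This is one common choice for this task; the text does not name a specific algorithm. The six-email dataset, the add-one smoothing, and the function names are illustrative assumptions, not a production spam filter.

```python
import math
from collections import Counter

# Hypothetical toy dataset: subject lines labeled by human annotators.
training_emails = [
    ("free prize inside", "spam"),
    ("claim your free gift", "spam"),
    ("free vacation offer", "spam"),
    ("meeting agenda for monday", "not spam"),
    ("lunch on friday", "not spam"),
    ("free time this weekend", "not spam"),
]


def tokenize(subject):
    """Split a subject line into lowercase words."""
    return subject.lower().split()


def spam_rate_given_word(emails, word):
    """Fraction of labeled emails containing `word` that are labeled spam."""
    labels = [label for subject, label in emails if word in tokenize(subject)]
    return labels.count("spam") / len(labels)


def train(emails):
    """Count label frequencies and per-label word frequencies."""
    label_counts = Counter()
    word_counts = {}
    for subject, label in emails:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(subject))
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return label_counts, word_counts, vocabulary


def classify(model, subject):
    """Return the label with the highest naive Bayes log-probability."""
    label_counts, word_counts, vocabulary = model
    total_emails = sum(label_counts.values())
    scores = {}
    for label, count in label_counts.items():
        # Start from the prior: how common this label is in the training data.
        score = math.log(count / total_emails)
        total_words = sum(word_counts[label].values())
        for word in tokenize(subject):
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing (an assumption here) avoids log(0).
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)


model = train(training_emails)
print(spam_rate_given_word(training_emails, "free"))  # 0.75
print(classify(model, "free gift for you"))           # spam
print(classify(model, "monday lunch meeting"))        # not spam
```

`spam_rate_given_word` shows the single statistic about “free” on its own, while `classify` combines one such estimate per word with the label prior, which mirrors “that calculation, along with many others.”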
This type of machine learning is called “supervised” because it relies on human supervision in the form of labeled data: the labels tell the model the correct output for each training example.