Unlike supervised learning or fine-tuning, in which a classifier is trained on the exact tasks it will be used for and on the same classes it will be tested on, meta learning takes a broader, more indirect approach. Whereas approaches built upon transfer learning adapt pre-trained models, meta learning methods often train systems end-to-end from scratch.
According to Santoro et al., “meta learning” refers to scenarios in which multiple tasks are used to train a model at both a short-term and a long-term level. Within each task, the model learns rapidly to make predictions relevant to the limited domain of that specific task; across tasks, the model gradually accrues knowledge by capturing the way patterns and task structure vary across target domains. This two-tiered process is often described as the model “learning to learn.”2
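To make this two-tiered setup concrete, the sketch below shows one common way it is organized in practice: a labeled dataset is repeatedly carved into small “episodes,” each acting as its own task with a handful of classes and examples. The dataset structure, function name and default values here are illustrative assumptions, not a specific published recipe.

```python
# A minimal sketch of episodic ("N-way, K-shot") task sampling, assuming a
# labeled dataset stored as {class_label: list_of_examples}. The names and
# defaults are illustrative, not taken from any particular library.
import random

def sample_episode(dataset, n_way=5, k_shot=1, n_query=5):
    """Build one training episode: a small task with its own support/query split."""
    classes = random.sample(list(dataset.keys()), n_way)  # classes for this episode only
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        examples = random.sample(dataset[cls], k_shot + n_query)
        # Labels are re-indexed 0..N-1 within the episode, so the model must
        # learn each task from its support set rather than memorize global IDs.
        support += [(x, episode_label) for x in examples[:k_shot]]
        query += [(x, episode_label) for x in examples[k_shot:]]
    return support, query
```

In such a loop, fast within-task learning happens on each episode’s support set, while slow across-task learning happens as the model’s shared weights are updated to perform well on the query sets of many different episodes.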
For example, many prominent meta learning methods aim to train a model, across multiple training episodes, to predict the degree of similarity between data points from any classes, including classes the model has not yet seen, and then to use what was learned in that process to solve downstream tasks (such as specifically defined classification problems).
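As a rough illustration of this metric-based idea (not any one published architecture), the sketch below assumes a small embedding network and uses cosine similarity between embeddings to label each query example by its most similar support example. The class and function names, shapes, and dimensions are placeholder assumptions.

```python
# A sketch of the metric-based idea: a shared embedding network is trained,
# episode after episode, so that similarity in embedding space predicts whether
# two examples belong to the same class. All names here are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingNet(nn.Module):
    def __init__(self, in_dim=784, emb_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, emb_dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)  # unit-length embeddings

def classify_queries(model, support_x, support_y, query_x):
    """Label each query example with the episode label of its most similar support example."""
    sims = model(query_x) @ model(support_x).T  # cosine similarities (queries x supports)
    return support_y[sims.argmax(dim=1)]        # support_y: tensor of episode labels
```

Because the network outputs similarities rather than scores for a fixed set of classes, the same trained model can be applied to classes it never encountered during meta training.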
Some meta learning approaches work at a more abstract level, training models to be easy to train. In traditional supervised learning, a model’s parameters (like weights and biases) are what’s “learned,” while its hyperparameters (like the learning rate, or how parameters are initialized) are configured prior to training and are not part of the learning process. Meta learning can approximate the benefits of transfer learning by learning ideal starting points: parameter initializations or other hyperparameter choices that will generalize well to different datasets in a minimal number of training steps.
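The sketch below illustrates this optimization-level idea in the spirit of first-order methods such as Reptile (Nichol et al.): a copy of the model adapts to one sampled task with a few gradient steps, and the shared initialization is then nudged toward the adapted weights. The task-sampling function and hyperparameter values are illustrative assumptions, not a definitive implementation.

```python
# A minimal sketch of learning a good parameter initialization, in the spirit
# of first-order methods such as Reptile; names and values are illustrative.
import copy
import torch

def reptile_step(model, sample_task_batch, loss_fn, inner_lr=0.01, meta_lr=0.1, inner_steps=5):
    """One meta-update: adapt a copy to a sampled task, then nudge the shared init toward it."""
    x, y = sample_task_batch()                    # data for one randomly sampled task (assumed helper)
    adapted = copy.deepcopy(model)                # "fast" weights start from the shared initialization
    opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
    for _ in range(inner_steps):                  # inner loop: rapid within-task learning
        opt.zero_grad()
        loss_fn(adapted(x), y).backward()
        opt.step()
    with torch.no_grad():                         # outer loop: move the initialization toward
        for p, p_adapted in zip(model.parameters(), adapted.parameters()):
            p += meta_lr * (p_adapted - p)        # parameters that adapt quickly on this task
```

Repeating this step over many sampled tasks gradually shifts the shared parameters toward an initialization from which any single task can be learned in only a few gradient steps.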