The training process is the most critical step in the lifecycle of AI models, from forecasting systems built on basic linear regression algorithms to the complex neural networks that power generative AI.
Model training is the machine learning (ML) step where the “learning” occurs. In machine learning, learning involves adjusting the parameters of an ML model—the weights and biases in the mathematical functions that make up its algorithm. The goal of this adjustment is to produce more accurate outputs. The specific values for these weights and biases, which are the end result of model training, are the tangible manifestation of a model’s “knowledge.”
Mathematically, the goal of this learning is to minimize a loss function that quantifies the error of model outputs on training tasks. When the output of the loss function falls beneath some predetermined threshold—meaning the model’s error on training tasks is sufficiently small—the model is deemed “trained.” In reinforcement learning, the goal is reversed: instead of minimizing a loss function, the model’s parameters are optimized to maximize a reward function.
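The idea can be sketched in a few lines of code. The following uses mean squared error (MSE) as one common, illustrative choice of loss function; the predictions, targets and threshold are made-up example values, not standards.

```python
# Minimal sketch of a loss function: mean squared error (MSE), one common choice.
def mse_loss(predictions, targets):
    """Average of the squared differences between model outputs and true labels."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

predictions = [2.5, 0.0, 2.1, 7.8]   # hypothetical model outputs
targets = [3.0, -0.5, 2.0, 7.5]      # the correct answers

loss = mse_loss(predictions, targets)
threshold = 0.2                      # hypothetical "good enough" error level

print(loss)              # 0.15
print(loss < threshold)  # True: by this criterion, the model counts as "trained"
```

A lower loss means the model's outputs sit closer to the correct answers; training drives this number down.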
In practice, model training entails a cycle of collecting and curating data, running the model on that training data, measuring loss, optimizing parameters accordingly and testing model performance on validation datasets. This workflow proceeds iteratively until satisfactory results have been achieved. Adequate training might also require the adjustment of hyperparameters—structural choices that influence the learning process but are not themselves “learnable”—in a process called hyperparameter tuning.
Sometimes, an already-trained model can be fine-tuned for more specific tasks or domains through further learning on new training data. Though both the original from-scratch training and the subsequent fine-tuning are “training,” the former is typically called “pretraining” in this context (for the sake of disambiguation). Fine-tuning is one of several types of transfer learning, an umbrella term for machine learning techniques that adapt pretrained models for new uses.
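Schematically, fine-tuning means resuming the same optimization loop from pretrained parameter values rather than from scratch. The sketch below reuses the toy one-parameter model: all values are hypothetical, and real fine-tuning involves far larger models and datasets, but the principle is the same.

```python
# Schematic fine-tuning: continue training from a pretrained parameter value
# on new, task-specific data instead of starting from random initialization.
w = 2.0   # "pretrained" parameter, e.g. learned earlier on general data

new_x, new_y = [1.0, 2.0, 3.0], [2.5, 5.0, 7.5]   # new domain: true relation y = 2.5x
learning_rate = 0.005   # often smaller than in pretraining, so the model
                        # adapts without discarding what it already learned

for epoch in range(2000):
    grad = sum(2 * (w * x - y) * x for x, y in zip(new_x, new_y)) / len(new_x)
    w -= learning_rate * grad

print(w)   # shifts from 2.0 toward 2.5, adapting to the new task
```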