What is model tuning?

21 January 2025

Authors

Ivan Belcic

Staff writer

Cole Stryker

Editorial Lead, AI Models

What is model tuning?

Model tuning optimizes a machine learning model’s hyperparameters to obtain the best training performance. The process involves making adjustments until the optimal set of hyperparameter values is found, resulting in improved accuracy, generation quality and other performance metrics.

Because model tuning identifies a model’s optimal hyperparameters, it is also known as hyperparameter optimization, or alternatively, hyperparameter tuning.

What are hyperparameters?

Hyperparameters are model configuration variables that cannot be derived from training data. These variables determine the key features and behavior of a model. Some hyperparameters, such as learning rate, control the model’s behavior during training. Others determine the nature of the model itself, such as a hyperparameter that sets the number of layers in a neural network.

Data scientists must configure a machine learning (ML) model’s hyperparameter values before training begins. Choosing the correct combination of hyperparameters ahead of time is essential for successful ML model training.

Hyperparameters versus model parameters

Model parameters, or model weights, are variables that artificial intelligence (AI) models discover during training. AI algorithms learn the underlying relationships, patterns and distributions of their training datasets, then apply those findings to new data to make successful predictions.

As a machine learning algorithm undergoes training, it sets and updates its parameters. These parameters represent what a model learns from its training dataset and change over time with each iteration of its optimization algorithm.
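The distinction can be made concrete with a minimal sketch. The toy dataset, the constant names and the `train` function below are assumptions made for illustration, not part of any particular library. The learning rate and epoch count are fixed by hand before training, while the weight `w` is discovered by the training loop itself.

```python
# Hyperparameters: chosen by the practitioner before training starts.
LEARNING_RATE = 0.05
EPOCHS = 200

# Toy dataset that follows y = 2x exactly (invented for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

def train(learning_rate, epochs):
    """Fit y = w * x by gradient descent on mean squared error."""
    w = 0.0  # model parameter: learned from the data, not set by hand
    for _ in range(epochs):
        # Gradient of mean((w * x - y)^2) with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad
    return w

learned_w = train(LEARNING_RATE, EPOCHS)
print(f"learned weight: {learned_w:.4f}")
```

Changing `LEARNING_RATE` or `EPOCHS` changes how training proceeds, but the value of `w` is always an outcome of training rather than an input to it.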

Why is model tuning important?

Model tuning is important because hyperparameter values directly affect model performance. A good hyperparameter configuration helps a model learn more effectively during training.

Without good tuning, a model can become prone to overfitting—when it hews too closely to its training data and cannot adapt to new datasets. Other shortcomings can include excessive model bias or variance.

Each machine learning algorithm has its own optimal combination of hyperparameters, with some influencing performance more than others. Limiting model tuning to a core set of the most impactful hyperparameters can reduce time and computational resource demands.

      Overfitting

      Overfitting happens when a model is too complex for its training data. For example, its hyperparameters might create a neural network with too many layers or too many trainable parameters. With overfitting, the model adapts too tightly to its training dataset. An overfitted model cannot adapt to new data because it has failed to generalize from its training data.

      Imagine two students in a classroom. One student learns by memorizing facts, the other by understanding the underlying concepts being taught. So far, both have performed well on tests covering the course material. But what happens when they need to apply their learning to new topics?

      The student who can generalize will successfully transfer what they have learned, while the student who relies on memory might struggle to do the same. The memorizing student has “overfit” their understanding too closely to the specifics of the classroom content while failing to grasp the core principles.

      Bias

      Bias is the gap between a model’s predictions and actual real-world outcomes. While bias can stem from flawed training datasets, bias also results from suboptimal model tuning—the model isn’t able to learn well, even when its training data is viable.

      Models with high bias ignore subtleties in the training data and can fail to generate accurate predictions during training. Simpler algorithms, such as linear regression, are more prone to high bias because they cannot capture more complex relationships in their training data.

      Choosing the right algorithm for a specific task is the first step toward obtaining good performance, even before model tuning begins.

      Variance

      Variance is an inverse measure of the consistency of a model’s predictions. Greater variance means that a model makes less consistent predictions on unseen data, even though it often performs well on its training dataset. Models with high variance suffer from overfitting: they cannot transfer what they have learned from training data to new data.

      Regularization is a technique that reduces overfitting by accepting slightly higher bias in exchange for lower variance. Good model tuning manages the tradeoff between bias and variance for optimal real-world predictions.
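As a rough illustration of that tradeoff, the sketch below fits a one-weight model with an L2 penalty, one common form of regularization. The data values and the `l2_strength` name are invented for this example. The penalty strength is itself a hyperparameter: raising it shrinks the learned weight and raises training error (more bias) in return for a model that is less sensitive to noise in any one training set (less variance).

```python
# Toy training data: roughly y = 2x with noise (values are made up).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.3, 3.6, 6.4, 7.9]

def fit_ridge(xs, ys, l2_strength):
    """Closed-form minimizer of sum((w*x - y)^2) + l2_strength * w^2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + l2_strength)

def training_mse(w):
    """Mean squared error of y = w * x on the training data."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

for l2_strength in (0.0, 1.0, 10.0):
    w = fit_ridge(xs, ys, l2_strength)
    print(f"l2_strength={l2_strength:>4}: w={w:.3f}, training MSE={training_mse(w):.3f}")
```

Whether the extra bias pays off can only be judged on held-out data, which is why the penalty strength is tuned rather than guessed.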

      How does model tuning work?

      Model tuning works by discovering the configuration of hyperparameters that results in the best training outcome. Sometimes, such as when building smaller, simple models, data scientists can manually configure hyperparameters ahead of time. But transformers and other complex models can have thousands of possible hyperparameter combinations.

      With so many options, data scientists can limit the hyperparameter search space to cover the portion of potential combinations that is most likely to yield optimal results. They can also use automated methods to algorithmically discover the optimal hyperparameters for their intended use case.

        Model tuning methods

        The most common model tuning methods include:

        • Grid search

        • Random search

        • Bayesian optimization

        • Hyperband

        Grid search

        Grid search is the “brute force” model tuning method. Data scientists create a search space by listing a finite set of candidate values for each hyperparameter. Then, the grid search algorithm produces every combination of those values. The model is trained and validated for each hyperparameter combination, with the best-performing model selected for use.

        Because it tests every combination in the grid instead of a smaller subset, grid search is a comprehensive tuning method. The downside of this enlarged scope is that grid search is time-consuming and resource-intensive: the number of combinations multiplies with each added hyperparameter.
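A minimal sketch of the grid search loop follows. The `validation_score` function is a hypothetical stand-in for training a model and scoring it on validation data; its formula and the candidate values in `grid` are invented so that the example runs on its own.

```python
from itertools import product

def validation_score(learning_rate, num_layers):
    """Hypothetical stand-in for 'train a model, then score it on validation data'.

    The formula is invented so that the best score occurs at
    learning_rate=0.01 and num_layers=3.
    """
    return -abs(learning_rate - 0.01) * 100 - abs(num_layers - 3)

# The grid: a finite list of candidate values for each hyperparameter.
grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "num_layers": [1, 2, 3, 4],
}

names = list(grid)
results = []
# product() enumerates every combination of the candidate values.
for values in product(*(grid[name] for name in names)):
    config = dict(zip(names, values))
    results.append((validation_score(**config), config))

best_score, best_config = max(results, key=lambda item: item[0])
print(f"tried {len(results)} combinations; best: {best_config}")
```

With 3 learning rates and 4 layer counts the loop runs 12 trials. Adding a third hyperparameter with 5 candidate values would raise that to 60, which is where the cost of grid search comes from.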

          Random search

          Rather than test every possible hyperparameter configuration, random search algorithms sample hyperparameter values from a statistical distribution of potential options. Data scientists define the range of likely values for each hyperparameter, increasing the algorithm’s chances of selecting a viable option.

          Random search is usually faster than grid search because the number of trials is fixed in advance. But because not every combination is tested, there is no guarantee that the single best hyperparameter configuration will be found.
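The sketch below shows one way a random search might look. As before, `validation_score` is a hypothetical stand-in for real training and validation, and the sampling ranges are assumptions chosen for illustration. The learning rate is drawn on a logarithmic scale, a common choice when plausible values span several orders of magnitude.

```python
import random

def validation_score(learning_rate, num_layers):
    """Hypothetical stand-in for 'train a model, then score it on validation data'."""
    return -abs(learning_rate - 0.01) * 100 - abs(num_layers - 3)

def sample_config(rng):
    """Draw one configuration from the assumed search distributions."""
    return {
        # Log-uniform between 1e-4 and 1e-1.
        "learning_rate": 10 ** rng.uniform(-4, -1),
        # Uniform over the integers 1 through 6.
        "num_layers": rng.randint(1, 6),
    }

def random_search(num_trials, seed=0):
    """Evaluate num_trials random configurations and return the best one."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    best_score, best_config = float("-inf"), None
    for _ in range(num_trials):
        config = sample_config(rng)
        score = validation_score(**config)
        if score > best_score:
            best_score, best_config = score, config
    return best_score, best_config

best_score, best_config = random_search(num_trials=20)
print(best_config, round(best_score, 3))
```

The budget is controlled directly through `num_trials`, regardless of how many hyperparameters are being searched.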

          Bayesian optimization

          Unlike grid and random searches, Bayesian optimization selects hyperparameter values based on the results of earlier attempts. The algorithm uses the testing results of previous hyperparameter values to predict values that are likely to lead to better outcomes.

          Bayesian optimization works by constructing a probabilistic model of the objective function, known as a surrogate function. The surrogate’s predictions become more accurate as more trial results accumulate, so the search avoids allocating resources to lower-performing hyperparameter values while homing in on the optimal configuration.

          The technique of optimizing a model based on prior rounds of testing is known as sequential model-based optimization (SMBO).
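The shape of an SMBO loop can be sketched in a few lines. This is a simplified illustration, not a faithful Bayesian optimizer: the `surrogate` below is a deliberately crude stand-in (a kernel-weighted average, with distance to the nearest trial used as an uncertainty proxy) for the probabilistic model, such as a Gaussian process, that a real implementation would fit. The objective `validation_loss`, the candidate range and the `kappa` exploration weight are all assumptions made for this example.

```python
import math

def validation_loss(log_lr):
    """Hypothetical expensive objective; its minimum is at log_lr = -2."""
    return (log_lr + 2.0) ** 2

def surrogate(x, history, width=0.5):
    """Crude stand-in for a probabilistic model: (prediction, uncertainty) at x."""
    weights = [math.exp(-((x - xi) / width) ** 2) for xi, _ in history]
    total = sum(weights)
    if total < 1e-12:  # guard against numerical underflow far from all trials
        prediction = sum(y for _, y in history) / len(history)
    else:
        prediction = sum(w * y for w, (_, y) in zip(weights, history)) / total
    uncertainty = min(abs(x - xi) for xi, _ in history)
    return prediction, uncertainty

def acquisition(x, history, kappa):
    """Lower confidence bound: favors low predicted loss and unexplored regions."""
    prediction, uncertainty = surrogate(x, history)
    return prediction - kappa * uncertainty

def smbo(num_steps=15, kappa=2.0):
    # Candidate log10 learning rates between -5 and 0.
    candidates = [-5.0 + 0.05 * i for i in range(101)]
    # Two initial trials at the edges of the range.
    history = [(x, validation_loss(x)) for x in (-5.0, 0.0)]
    for _ in range(num_steps):
        tried = {x for x, _ in history}
        untried = [x for x in candidates if x not in tried]
        # Use all earlier results to decide which value to evaluate next.
        next_x = min(untried, key=lambda x: acquisition(x, history, kappa))
        history.append((next_x, validation_loss(next_x)))
    return min(history, key=lambda item: item[1])

best_x, best_loss = smbo()
print(f"best log10 learning rate: {best_x:.2f}, loss: {best_loss:.4f}")
```

The key difference from grid and random search is the line that picks `next_x`: each new trial depends on the full history of earlier trials.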

            Hyperband

            Hyperband improves the random search workflow by focusing on promising hyperparameter configurations while aborting less-viable searches. At each round of testing, the Hyperband algorithm discards the worst-performing configurations (the bottom half, in the simplest version) and gives the survivors a larger training budget.

            Hyperband’s “successive halving” approach maintains focus on the most promising configurations until the single best is discovered from the original pool of candidates.
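The successive halving step can be sketched as follows. This covers only the halving routine, not the full Hyperband algorithm, which runs several such routines with different starting budgets. The `partial_score` function is a hypothetical stand-in for training a configuration for a limited number of epochs; its formula is invented so the example runs on its own.

```python
import math
import random

def partial_score(config, budget):
    """Hypothetical stand-in for 'train config for `budget` epochs, then validate'.

    Invented formula: configurations closer to learning_rate=0.01 score higher,
    and every configuration scores better as its budget grows.
    """
    quality = -abs(math.log10(config["learning_rate"]) + 2.0)
    return quality * (1.0 + 1.0 / budget)

def successive_halving(configs, min_budget=1, keep_fraction=0.5):
    """Repeatedly drop the worst performers and double the budget of the rest."""
    budget = min_budget
    survivors = list(configs)
    rounds = []  # (number of configurations evaluated, budget each received)
    while len(survivors) > 1:
        rounds.append((len(survivors), budget))
        ranked = sorted(survivors, key=lambda c: partial_score(c, budget), reverse=True)
        keep = max(1, int(len(ranked) * keep_fraction))
        survivors = ranked[:keep]
        budget *= 2  # survivors earn a larger training budget next round
    return survivors[0], rounds

rng = random.Random(0)
pool = [{"learning_rate": 10 ** rng.uniform(-4, -1)} for _ in range(16)]
winner, rounds = successive_halving(pool)
print(winner, rounds)
```

Starting from 16 candidates, the routine evaluates 16, 8, 4 and then 2 configurations, so most of the compute goes to the candidates that looked best early on.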

            Model tuning versus model training

            While model tuning is the process of discovering the optimal hyperparameters, model training is when a machine learning algorithm is taught to identify patterns in its training dataset and make accurate predictions on new data.

            The training process uses an optimization algorithm to minimize a loss function, or objective function, which measures the gap between a model’s predictions and actual values. The goal is to identify the combination of model weights and bias terms that gives the lowest possible value of the objective function. The optimization algorithm updates a model’s weights periodically during training.

            The gradient descent family of optimization algorithms works by descending the gradient of the loss function to discover its minimum value: the point at which the model is most accurate. A local minimum is a minimum value in a specified region, but might not be the global minimum of the function—the absolute lowest value.

            It is not always necessary to identify the loss function’s global minimum. A model is said to have reached convergence when further training no longer meaningfully reduces its loss.
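A small sketch can show both ideas: descending the gradient until updates become negligible, and ending up in different minima depending on the starting point. The one-dimensional `loss` function here is invented for illustration; real loss functions depend on many weights at once.

```python
def loss(w):
    """Toy loss with one local minimum (near w = 1.13) and one global minimum (near w = -1.30)."""
    return w**4 - 3 * w**2 + w

def grad(w):
    """Derivative of the toy loss with respect to w."""
    return 4 * w**3 - 6 * w + 1

def gradient_descent(w, learning_rate=0.01, tolerance=1e-8, max_steps=10_000):
    """Step against the gradient until the updates become negligible."""
    for _ in range(max_steps):
        step = learning_rate * grad(w)
        w -= step
        if abs(step) < tolerance:  # treated here as convergence
            break
    return w

w_from_right = gradient_descent(2.0)   # settles in the local minimum
w_from_left = gradient_descent(-2.0)   # settles in the global minimum
print(f"start  2.0 -> w = {w_from_right:.3f}, loss = {loss(w_from_right):.3f}")
print(f"start -2.0 -> w = {w_from_left:.3f}, loss = {loss(w_from_left):.3f}")
```

Both runs converge in the sense that their updates shrink toward zero, but only one of them reaches the lowest loss.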

            Cross-validation, testing and retraining

            After training, models undergo cross-validation: checking the results of training against a held-out portion of the data that the model did not see while it was being fitted. The model’s predictions are compared to the actual values of the validation data. The highest-performing model then moves to the testing phase, where its predictions are again examined for accuracy before deployment. Cross-validation and testing are essential for large language model (LLM) evaluation.
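One common scheme is k-fold cross-validation, in which the data is divided into k folds and each fold takes a turn as the validation set. The sketch below shows only the splitting logic; the function name is an assumption for this example, and in practice the samples are usually shuffled before splitting.

```python
def k_fold_splits(num_samples, k):
    """Yield (train_indices, validation_indices) once for each of k folds."""
    indices = list(range(num_samples))
    # Spread any remainder across the first few folds.
    fold_sizes = [num_samples // k + (1 if i < num_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

for fold, (train_idx, val_idx) in enumerate(k_fold_splits(num_samples=10, k=3)):
    print(f"fold {fold}: train on {train_idx}, validate on {val_idx}")
```

Every sample is used for validation exactly once, and no sample appears in both the training and validation sets of the same fold.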

            Retraining is a portion of the MLOps (machine learning operations) AI lifecycle that continually and autonomously retrains a model over time to keep it performing at its best.

            Model tuning versus fine-tuning

            Model tuning identifies the best hyperparameter values for training, whereas fine-tuning is the process of tweaking a pretrained foundation model for specific downstream tasks. Fine-tuning is a type of transfer learning—when a model’s preexisting learning is adapted to new tasks.

            With fine-tuning, a pretrained model is trained again on a smaller, more specialized dataset that is relevant to the model’s intended use case. Training a model from scratch on a small dataset risks overfitting, but pretraining on a large, generalized dataset first helps mitigate that risk.

            Hyperparameter examples

            While every algorithm has its own set of hyperparameters, many are shared across similar algorithms. Common hyperparameters in the neural networks that power large language models (LLMs) include:

            • Learning rate

            • Learning rate decay

            • Epochs

            • Batch size

            • Momentum

            • Number of hidden layers

            • Nodes per layer

            • Activation function

            Learning rate

            Learning rate determines how quickly a model updates its weights during training. A higher learning rate means that a model learns faster but at the risk of overshooting a local minimum of its loss function. Meanwhile, a low learning rate can lead to excessive training times, increasing resource and cost demands.

            Learning rate decay

            Learning rate decay is a hyperparameter that slows an ML algorithm’s learning rate over time. The model updates its parameters more quickly at first, then with greater nuance as it approaches convergence, reducing the risk of overshooting.
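There are several decay schedules in use. The sketch below shows exponential decay as one example; the function name and the specific values are assumptions chosen for illustration.

```python
def exponential_decay(initial_lr, decay_rate, epoch):
    """Learning rate for a given epoch: shrinks by a constant factor each epoch."""
    return initial_lr * decay_rate ** epoch

# With decay_rate=0.5 the learning rate halves after every epoch.
schedule = [exponential_decay(initial_lr=0.1, decay_rate=0.5, epoch=e) for e in range(4)]
print(schedule)
```

Here both `initial_lr` and `decay_rate` are hyperparameters, so adding decay gives the tuning process one more value to search over.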

            Epochs

            Model training involves exposing a model to its training data multiple times so that it iteratively updates its weights. An epoch occurs each time the model processes its entire training dataset, and the epochs hyperparameter sets the number of epochs that compose the training process.

            Batch size

            Machine learning algorithms don’t process their entire training datasets in each iteration of the optimization algorithm. Instead, training data is separated into batches, with model weights updating after each batch. Batch size determines the number of data samples in each batch.
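Epochs and batch size together determine how many times the weights are updated. The sketch below counts those updates for a hypothetical dataset of 1,000 samples; the function names are assumptions, and the comment marks where a real training loop would compute gradients.

```python
import math

def iterate_minibatches(samples, batch_size):
    """Yield consecutive slices of the dataset; the last one may be smaller."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

def count_weight_updates(num_samples, batch_size, epochs):
    """Count optimizer steps for a full training run."""
    samples = list(range(num_samples))
    updates = 0
    for _ in range(epochs):  # one epoch = one full pass over the data
        for batch in iterate_minibatches(samples, batch_size):
            updates += 1  # a real loop would compute gradients on `batch` here
    return updates

updates = count_weight_updates(num_samples=1000, batch_size=32, epochs=3)
print(updates, 3 * math.ceil(1000 / 32))
```

A smaller batch size means more updates per epoch, each based on fewer samples, while a larger batch size means fewer but less noisy updates.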

            Momentum

            Momentum is an ML algorithm’s propensity to update its weights in the same direction as previous updates. Think of momentum as an algorithm’s conviction in its learning. High momentum leads an algorithm to quicker convergence at the risk of bypassing significant local minima. Meanwhile, low momentum can cause an algorithm to waffle back and forth with its updates, stalling its progress.
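One widely used formulation keeps a running "velocity" that blends the previous update with the current gradient. The sketch below applies it to a toy loss, `(w - 3)^2`, which is an assumption made for illustration, and compares a run without momentum against a run with a momentum coefficient of 0.9.

```python
def grad(w):
    """Gradient of the toy loss (w - 3)^2, whose minimum is at w = 3."""
    return 2 * (w - 3.0)

def sgd_momentum(momentum, learning_rate=0.01, steps=100, w=0.0):
    """Gradient descent where each update carries over part of the previous one."""
    velocity = 0.0
    for _ in range(steps):
        # Blend the previous update direction with the current gradient step.
        velocity = momentum * velocity - learning_rate * grad(w)
        w += velocity
    return w

without_momentum = sgd_momentum(momentum=0.0)
with_momentum = sgd_momentum(momentum=0.9)
print(f"no momentum:  w = {without_momentum:.3f}")
print(f"momentum 0.9: w = {with_momentum:.3f}")
```

With the same learning rate and number of steps, the run with momentum ends much closer to the minimum at 3, because consecutive updates in the same direction reinforce one another.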

            Number of hidden layers

            Neural networks are loosely modeled on the structure of the human brain and contain multiple layers of interconnected neurons, or nodes. This complexity is what allows advanced models, such as transformer models, to handle complex generative tasks. Fewer layers make for a leaner model, but more layers open the door to more complex tasks.

            Nodes per layer

            Each layer of a neural network has a predetermined number of nodes. As layers increase in width, so does the model’s ability to handle complex relationships between data points but at the cost of greater computational requirements.
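The computational cost of depth and width can be estimated by counting trainable parameters. The sketch below assumes a plain fully connected network, in which every node connects to every node in the next layer and has one bias term; the layer sizes are invented for illustration.

```python
def count_parameters(layer_widths):
    """Weights plus biases of a fully connected network.

    layer_widths lists the input size, each hidden layer's width, and the output size.
    """
    return sum(
        n_in * n_out + n_out  # one weight per connection, one bias per node
        for n_in, n_out in zip(layer_widths, layer_widths[1:])
    )

narrow = count_parameters([10, 16, 16, 1])         # two hidden layers, 16 nodes each
wide = count_parameters([10, 64, 64, 1])           # same depth, 64 nodes each
deep = count_parameters([10, 16, 16, 16, 16, 1])   # same width, four hidden layers
print(narrow, wide, deep)
```

In this example, quadrupling the width multiplies the parameter count by more than ten, while doubling the depth roughly doubles it.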

            Activation function

            The choice of activation function is a hyperparameter. Activation functions grant models the ability to create nonlinear boundaries between data groups. When it is impossible to accurately classify data points into groups separated by a straight line, activation provides the needed flexibility for more complex divisions.

            A neural network without an activation function is essentially a linear regression model.
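The reason is that stacking linear layers only produces another linear function. The sketch below uses a tiny two-layer network with a single input and hand-picked weights (all values are assumptions for illustration) and compares it with and without ReLU, one common activation function.

```python
def relu(z):
    """Rectified linear unit: passes positive values, zeroes out negative ones."""
    return max(0.0, z)

def two_layer(x, activation=None):
    """Two stacked layers with fixed, hand-picked weights."""
    h = 2.0 * x + 1.0            # first layer
    if activation is not None:
        h = activation(h)        # nonlinearity between the layers
    return -3.0 * h + 0.5        # second layer

def collapsed(x):
    """The single linear function equal to the two layers without activation:
    -3 * (2x + 1) + 0.5 = -6x - 2.5
    """
    return -6.0 * x - 2.5

for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(x, two_layer(x), collapsed(x), two_layer(x, activation=relu))
```

Without an activation the two layers match `collapsed` for every input, so the extra layer adds no expressive power. With ReLU the output bends where the hidden value crosses zero, which a single straight line cannot reproduce.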
