What is a pretrained model?

Author: Cole Stryker, Staff Editor, AI Models, IBM Think

A pretrained model is a machine learning model that has already been trained on a large dataset, usually for a general-purpose task, and can then be reused or fine-tuned for a different but related task. Pretrained models save development teams time, data and computational resources compared to training a model from scratch.

Because building them requires extensive resources, infrastructure and expertise, pretrained models are typically produced by large tech companies, academic institutions, nonprofits and open-source communities. In domains like deep learning, where models can have millions or even billions of parameters, pretrained models provide a starting point that lets practitioners avoid “reinventing the wheel” every time they build a machine learning application.

What is model training?

Model training “teaches” a machine learning model to optimize performance on a training dataset of sample tasks relevant to eventual use cases. This training data must resemble real-world problems that the model will be tasked with, so the model can learn the data’s patterns and relationships in order to make accurate predictions on new data.

This learning process involves adjusting a model’s parameters: the weights and biases in the mathematical functions that make up its underlying machine learning algorithm. Such adjustments are intended to yield more accurate outputs.

Mathematically speaking, the goal of this process is to minimize a loss function that quantifies the error of model outputs. When the loss falls beneath a certain threshold, the model is deemed “trained.” In reinforcement learning, the goal is reversed: the model’s parameters are optimized to maximize a reward function rather than minimize a loss function.
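
To make this concrete, here is a minimal sketch of loss minimization by gradient descent on a one-parameter linear model. The toy dataset, learning rate and step count are illustrative assumptions, not details from this article.

```python
# Toy model: y = w * x. Training adjusts the single parameter w to
# minimize a mean squared error loss over the (x, y) training pairs.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # underlying rule: y = 2x
w = 0.0              # initial parameter value
learning_rate = 0.1  # a hyperparameter: how far each adjustment moves w

for step in range(50):
    # Gradient of the mean squared error loss with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= learning_rate * grad  # adjust the parameter to reduce the loss

loss = sum((w * x - y) ** 2 for x, y in data) / len(data)
print(f"learned w = {w:.4f}, final loss = {loss:.6f}")  # w converges toward 2.0
```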

Model training entails a cycle of collecting and preprocessing data, feeding that training data to the model, measuring loss, optimizing parameters and testing performance on validation data. This workflow is repeated until satisfactory results are achieved. Training might also involve adjusting hyperparameters—structural choices that influence the learning process but are not themselves “learnable”—in a process called hyperparameter tuning.
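
Sketched in PyTorch (assuming the torch package is installed), that cycle might look like the following; the synthetic regression data, the linear model and the hyperparameter values are all illustrative assumptions.

```python
import torch
from torch import nn

# 1. Collect and preprocess data (synthetic regression data for illustration)
X = torch.randn(200, 4)
true_weights = torch.tensor([[1.0], [-2.0], [0.5], [3.0]])
y = X @ true_weights + 0.1 * torch.randn(200, 1)
X_train, y_train = X[:160], y[:160]  # training split
X_val, y_val = X[160:], y[160:]      # held-out validation split

model = nn.Linear(4, 1)  # parameters: weights and a bias
loss_fn = nn.MSELoss()   # quantifies the error of model outputs
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)  # lr is a hyperparameter

for epoch in range(100):
    # 2. Feed training data to the model and measure the loss
    optimizer.zero_grad()
    loss = loss_fn(model(X_train), y_train)
    # 3. Optimize the parameters to reduce the loss
    loss.backward()
    optimizer.step()

# 4. Test performance on validation data the model has not seen
with torch.no_grad():
    val_loss = loss_fn(model(X_val), y_val)
print(f"train loss {loss.item():.4f}, validation loss {val_loss.item():.4f}")
```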


The value of a pretrained model

The primary benefit of a pretrained model is that rather than starting from scratch, developers can use models that have already learned general features—such as language structure or visual shapes—and fine-tune them on smaller, domain-specific datasets. Fine-tuning is one of several types of transfer learning, an umbrella term for techniques that adapt pretrained models for new uses.
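
As a rough sketch of that pattern, assuming the torchvision library and a hypothetical 5-class downstream task: load a network pretrained on ImageNet, freeze the backbone so its general features are preserved, and train only a new task-specific head.

```python
import torch
from torch import nn
from torchvision import models

# Load a network pretrained on ImageNet (requires torchvision >= 0.13)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained backbone: its general visual features stay fixed
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head for the new, domain-specific task
# (5 output classes is a hypothetical choice for this sketch)
model.fc = nn.Linear(model.fc.in_features, 5)

# Fine-tuning then updates only the new head's parameters
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```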

Using a pretrained model accelerates development and allows smaller entities, like startups that may not have access to sufficient compute, data or infrastructure, to experiment with state-of-the-art models. It’s like buying an outfit off the rack and then having it tailored to the wearer’s individual frame.

Using pretrained models means practitioners have access to architectures that have already been validated, benchmarked and tested in real-world scenarios. This lowers risk and helps ensure reliability. Popular pretrained models come with extensive documentation, tutorials and code that can be used to adapt models for individual projects.

Pretrained large language models (LLMs) are being used at countless organizations to advance natural language processing (NLP) use cases like question answering, sentiment analysis and generative AI. Other pretrained models specialize in computer vision tasks such as object detection, image classification and semantic segmentation.
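
As a quick illustration, a pretrained sentiment model can be applied in a few lines with the Hugging Face transformers library (an assumed dependency); the first call downloads a default pretrained checkpoint from the hub.

```python
from transformers import pipeline

# Downloads a default pretrained sentiment-analysis model on first use
classifier = pipeline("sentiment-analysis")

result = classifier("Pretrained models save us months of training time.")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```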

One of the earliest and most influential resources for image-based models is ImageNet, a massive dataset that became the industry benchmark for computer vision. Architectures such as ResNet and Inception, trained on ImageNet, are foundational in computer vision workflows. These models excel at feature extraction, identifying the edges, textures and shapes that are useful for classifying new images.
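
A brief sketch of that feature-extraction pattern, again assuming torchvision: removing the final classification layer of an ImageNet-pretrained ResNet leaves a network that maps an image to a 512-dimensional feature vector, which a new classifier can then build on. The random tensor below stands in for a preprocessed image.

```python
import torch
from torch import nn
from torchvision import models

resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
extractor = nn.Sequential(*list(resnet.children())[:-1])  # drop the final fc layer
extractor.eval()

image_batch = torch.randn(1, 3, 224, 224)  # placeholder for a preprocessed image
with torch.no_grad():
    features = extractor(image_batch).flatten(1)
print(features.shape)  # torch.Size([1, 512]): one feature vector per image
```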


Where to find pretrained models

There are a number of model hubs and libraries where organizations host pretrained models. Here are a few of the most prominent ones:

  • PyTorch Hub is a pretrained model repository designed to facilitate research reproducibility and simplify the reuse of pretrained models within the PyTorch ecosystem (see the loading sketch after this list).

  • TensorFlow Hub is a repository of trained models ready for fine-tuning and deployable anywhere. Models such as BERT and Faster R-CNN, a convolutional neural network (CNN) for object detection, can be reused with just a few lines of code.

  • Hugging Face Models focuses on NLP and vision models, providing access to state-of-the-art models like BERT, GPT and more, along with tools and tutorials for inference and training. The IBM Granite family of pretrained models can all be found on Hugging Face. These models are open, performant and trusted, as well as optimized for business use cases. Granite includes models for language, vision, speech and time series, among other applications.

  • Kaggle is a platform for data science and machine learning, offering competitions, datasets, pretrained models and a community for collaboration and learning.

  • GitHub is a proprietary developer platform that allows developers to create, store, manage and share their code. Many researchers and companies release pretrained models in repositories here with code, weights and documentation.

  • NVIDIA NGC Catalog offers optimized pretrained models for GPU acceleration, including computer vision, medical imaging and speech AI.

  • OpenAI Models provides OpenAI’s generative pre-trained transformer (GPT) models, which power the ChatGPT chatbot, along with models such as Codex and DALL-E. Access is cloud-based rather than direct download, through platforms like the OpenAI API or Azure OpenAI.

  • KerasHub is a pretrained model library that aims to be simple, flexible and fast, providing Keras 3 implementations of popular architectures.
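
To show how little code a hub download takes, the sketch below pulls an ImageNet-pretrained ResNet-18 from PyTorch Hub; the repository tag and entrypoint follow the hub’s published ResNet example.

```python
import torch

# Download an ImageNet-pretrained ResNet-18 from PyTorch Hub
model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet18', pretrained=True)
model.eval()  # ready for inference, feature extraction or fine-tuning
```

From there, the model can be fine-tuned or used for feature extraction exactly as described above.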
