What is transfer learning?

Authors

Jacob Murel Ph.D.

Senior Technical Content Creator

Eda Kavlakoglu

Business Development + Partnerships

IBM Research

Transfer learning is a machine learning technique in which knowledge gained through one task or dataset is used to improve model performance on another related task or different dataset.1 In other words, transfer learning uses what has been learned in one setting to improve generalization in another setting.2

Transfer learning has many applications, from solving regression problems in data science to training deep learning models. Indeed, it is particularly appealing for the latter given the large amount of data needed to train deep neural networks.

Traditional learning processes build a new model for each new task from the available labeled data. This is because traditional machine learning algorithms assume that training and test data come from the same feature space; if the data distribution changes, or the trained model is applied to a new dataset, users must retrain a new model from scratch, even when attempting a task similar to the first model’s (e.g. a sentiment analysis classifier for movie reviews versus one for song reviews). Transfer learning algorithms, however, take already-trained models or networks as a starting point. They then apply the knowledge that model gained on an initial source task or dataset (e.g. classifying movie reviews) to a new, yet related, target task or dataset (e.g. classifying song reviews).3
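As a minimal sketch of the movie-to-song-review example above, and assuming scikit-learn as the toolkit, a fitted vectorizer and a warm-started classifier can carry knowledge from the source task into the target task. The tiny in-line datasets are placeholders, not real data:

```python
# A minimal sketch of the movie-review to song-review example, using
# scikit-learn. The tiny in-line datasets are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Source task: sentiment classification of movie reviews
movie_reviews = ["a gripping, beautifully shot film",
                 "dull plot and wooden acting",
                 "an instant classic with superb pacing",
                 "two hours I will never get back"]
movie_labels = [1, 0, 1, 0]

vectorizer = TfidfVectorizer()
X_source = vectorizer.fit_transform(movie_reviews)

# warm_start=True lets later fit() calls start from previously learned weights
clf = LogisticRegression(warm_start=True, max_iter=1000)
clf.fit(X_source, movie_labels)

# Target task: sentiment classification of song reviews. Reuse the fitted
# vectorizer (the learned feature space) and continue training the same
# classifier instead of starting from scratch.
song_reviews = ["catchy chorus and crisp production",
                "flat vocals and a forgettable melody"]
song_labels = [1, 0]

X_target = vectorizer.transform(song_reviews)  # transform only; no refitting
clf.fit(X_target, song_labels)                 # continues from source weights

print(clf.predict(vectorizer.transform(["a soaring, memorable hook"])))
```

Here the vectorizer defines the shared feature space, and warm_start=True lets the second fit() continue from the weights learned on movie reviews rather than reinitializing them.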


Advantages and disadvantages of transfer learning

Advantages

  • Computational costs. Transfer learning reduces the computational cost of building models for new problems. By repurposing pretrained models or pretrained networks to tackle a different task, users can reduce model training time, training data, processor units, and other computational resources; for instance, fewer epochs (that is, passes through a dataset) may be needed to reach a desired level of performance. In this way, transfer learning can accelerate and simplify model training, as the sketch after this list illustrates.

  • Dataset size. Transfer learning particularly helps alleviate the difficulties involved in acquiring large datasets. For instance, large language models (LLMs) require large amounts of training data to obtain optimal performance. Quality publicly available datasets can be limited, and producing sufficient manually labeled data can be time-consuming and expensive.

  • Generalizability. While transfer learning aids model optimization, it can also increase a model’s generalizability. Because transfer learning retrains an existing model on a new dataset, the retrained model incorporates knowledge from multiple datasets and may perform better on a wider variety of data than the initial base model, which was trained on only one type of dataset. Transfer learning can thus inhibit overfitting.4

Of course, the transfer of knowledge from one domain to another cannot offset the negative impact of poor-quality data. Preprocessing techniques and feature engineering, such as data augmentation and feature extraction, are still necessary when using transfer learning.
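To make the computational-cost advantage concrete, here is a small sketch in TensorFlow/Keras. The backbone choice (MobileNetV2) and the five-class head are illustrative assumptions; the point is that freezing a pretrained backbone leaves only a small new head to train:

```python
# Hypothetical sketch: freeze a pretrained backbone so that only a small
# new classification head needs training. Backbone and head are assumptions.
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(weights="imagenet",
                                         include_top=False,
                                         input_shape=(160, 160, 3),
                                         pooling="avg")
base.trainable = False  # reuse source-task knowledge; do not update these weights

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(5, activation="softmax"),  # new head for the target task
])

# summary() reports the backbone's ~2.2M parameters as non-trainable (frozen)
# and only the small Dense head's parameters as trainable.
model.summary()
```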

Disadvantages

Transfer learning has few disadvantages inherent to the technique itself; rather, potential negative consequences result from its misapplication. Transfer learning works best when three conditions are met:

  • both learning tasks are similar

  • the data distributions of the source and target datasets do not vary too greatly

  • a comparable model can be applied to both tasks

When these conditions are not met, transfer learning can negatively affect model performance. The literature refers to this as negative transfer. Ongoing research proposes a variety of tests for determining whether datasets and tasks meet the above conditions and so will not result in negative transfer.5 Distant transfer is one method developed to correct for negative transfer that results from too great a dissimilarity between the data distributions of source and target datasets.6

Note that there is no widespread, standard metric to determine similarity between tasks for transfer learning. A handful of studies, however, propose different evaluation methods to predict similarity between datasets and machine learning tasks, and thus their viability for transfer learning.7
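Absent a standard metric, one simple empirical check is to compare a model trained only on target data against one warm-started from the source task. The helper below is a hypothetical illustration (not one of the tests proposed in the cited literature), and it assumes the source and target data share a feature space:

```python
# Hypothetical empirical check for negative transfer: if the transferred
# model scores worse on held-out target data than a target-only baseline,
# the transfer was negative.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def shows_negative_transfer(X_src, y_src,
                            X_tgt_train, y_tgt_train,
                            X_tgt_val, y_tgt_val):
    # Baseline trained on target data only
    baseline = LogisticRegression(max_iter=1000)
    baseline.fit(X_tgt_train, y_tgt_train)
    baseline_acc = accuracy_score(y_tgt_val, baseline.predict(X_tgt_val))

    # Transfer model: learn the source task first, then adapt to the target
    transfer = LogisticRegression(warm_start=True, max_iter=1000)
    transfer.fit(X_src, y_src)
    transfer.fit(X_tgt_train, y_tgt_train)
    transfer_acc = accuracy_score(y_tgt_val, transfer.predict(X_tgt_val))

    return transfer_acc < baseline_acc  # True suggests negative transfer
```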


Types of transfer learning

There are three adjacent practices or sub-settings of transfer learning. Their distinction from one another, as well as from transfer learning more broadly, largely results from changes in the relationship between the source domain, the target domain, and the tasks to be completed.8

  • Inductive transfer. This occurs when the source and target tasks are different, regardless of any difference or similarity between the target and source domains (i.e. datasets). This can manifest in computer vision models when architectures pretrained for feature extraction on large datasets are then adopted for further training on a specific task, such as object detection. Multitask learning, in which a model simultaneously learns two different tasks (such as image classification and object detection) on the same dataset, can be considered a form of inductive transfer.9

  • Unsupervised transfer learning. This is similar to inductive transfer in that the target and source tasks are different. But whereas source and/or target data is often labeled in inductive transfer, unsupervised transfer learning, per its name, involves no manually labeled data.10 By comparison, inductive transfer can be considered supervised learning. One common application of unsupervised transfer learning is fraud detection: by identifying common patterns across an unlabeled dataset of transactions, a model can learn to flag deviating behavior as possible fraud.

  • Transductive transfer. This occurs when the source and target tasks are the same but the datasets (or domains) are different. More specifically, the source data is typically labeled while the target data is unlabeled. Domain adaptation is a form of transductive learning, as it applies knowledge gained from performing a task on one data distribution to the same task on another data distribution.11 An example of transductive transfer learning is applying a text classification model trained and tested on restaurant reviews to classify movie reviews.

Transfer learning versus finetuning

Transfer learning is distinct from finetuning. Both, admittedly, reuse preexisting machine learning models rather than training new ones, but the similarities largely end there. Finetuning refers to further training a model on a task-specific dataset to improve performance on the task for which the model was originally built. For instance, one may create a general-purpose object detection model using massive image datasets such as COCO or ImageNet, and then further train the resulting model on a smaller labeled dataset specific to car detection. In this way, a user finetunes an object detection model for car detection. By contrast, transfer learning adapts a model to a new, related problem rather than the same problem.
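The two-stage Keras sketch below illustrates the distinction under stated assumptions: the car dataset is a placeholder, and image classification stands in for full object detection to keep the code short. Stage 1 is transfer-style reuse of frozen ImageNet features; stage 2 is finetuning, where the top of the backbone is unfrozen and trained at a low learning rate:

```python
# Hypothetical two-stage sketch: transfer-style reuse, then finetuning.
# "car_dataset" is a placeholder; classification stands in for detection.
import tensorflow as tf

base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                      input_shape=(224, 224, 3), pooling="avg")
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(1, activation="sigmoid"),  # car vs. not-car head
])

# Stage 1 (transfer-style reuse): train only the new head on frozen features
base.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(car_dataset, epochs=5)  # placeholder dataset

# Stage 2 (finetuning): unfreeze the top of the backbone and keep a low
# learning rate so the pretrained weights shift only slightly
base.trainable = True
for layer in base.layers[:-20]:   # keep all but the last 20 layers frozen
    layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(car_dataset, epochs=5)
```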

Transfer learning use cases

There are many applications of transfer learning in real-world machine learning and artificial intelligence settings. Developers and data scientists can use transfer learning to aid in a myriad of tasks and combine it with other learning approaches, such as reinforcement learning.

Natural language processing

One salient issue affecting transfer learning in NLP is feature mismatch. Features in different domains can have different meanings and thus different connotations (e.g. light can signify weight or optics). This disparity in feature representations affects sentiment classification tasks, language models, and more. Deep learning-based models, in particular word embeddings, show promise in correcting for this, as they can adequately capture semantic relations and orientations for domain adaptation tasks.12
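One common pattern, sketched below under stated assumptions, is to seed an embedding layer with pretrained word vectors so that downstream layers adapt to the target domain on top of shared semantics. The gensim vector set name is a real public download, but the five-word vocabulary is a placeholder:

```python
# Hedged sketch: initialize a Keras embedding layer with pretrained GloVe
# vectors (via gensim's downloader) so the target model starts from
# general-purpose word semantics. The tiny vocabulary is a placeholder.
import numpy as np
import gensim.downloader as api
import tensorflow as tf

vectors = api.load("glove-wiki-gigaword-50")  # downloads on first use

vocab = ["light", "heavy", "bright", "dark", "portable"]
embedding_matrix = np.stack([vectors[word] for word in vocab])

embedding = tf.keras.layers.Embedding(
    input_dim=len(vocab),
    output_dim=50,  # dimensionality of the GloVe vectors above
    embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
    trainable=False,  # freeze: reuse source semantics, train only new layers
)
```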

Computer vision

Because of difficulties in acquiring sufficient manually labeled data for diverse computer vision tasks, a wealth of research examines transfer learning applications with convolutional neural networks (CNNs). One notable example is ResNet, a pretrained model architecture that demonstrates improved performance in image classification and object detection tasks.13 Recent research investigates the renowned ImageNet dataset for transfer learning, arguing that (contra computer vision folk wisdom) only small subsets of this dataset are needed to train reliably generalizable models.14 Many transfer learning tutorials for computer vision use ResNet, ImageNet, or both with TensorFlow’s Keras library.
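In that spirit, a minimal feature-extraction sketch with ResNet and Keras might look as follows; the ten-class head and the target dataset are placeholders:

```python
# Minimal feature-extraction sketch: a ResNet50 backbone pretrained on
# ImageNet, frozen and reused for a new ten-class task (a placeholder).
import tensorflow as tf

base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                      input_shape=(224, 224, 3))
base.trainable = False  # keep the ImageNet features fixed

inputs = tf.keras.Input(shape=(224, 224, 3))
x = tf.keras.applications.resnet50.preprocess_input(inputs)
x = base(x, training=False)  # inference mode keeps batch-norm statistics frozen
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(10, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(target_dataset, epochs=3)  # placeholder target data
```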
