Transfer learning is a machine learning technique in which knowledge gained through one task or dataset is used to improve model performance on another related task or different dataset.1 In other words, transfer learning uses what has been learned in one setting to improve generalization in another setting.2
Transfer learning has many applications, from solving regression problems in data science to training deep learning models. Indeed, it is particularly appealing for the latter, given the large amounts of data needed to train deep neural networks.
Traditional learning processes build a new model for each new task, based on the available labeled data. This is because traditional machine learning algorithms assume that training and test data come from the same feature space. If the data distribution changes, or if the trained model is applied to a new dataset, users must retrain a new model from scratch, even when attempting a task similar to the first (e.g. a sentiment analysis classifier for movie reviews versus one for song reviews). Transfer learning algorithms, however, take already-trained models or networks as a starting point, applying the knowledge gained on an initial source task or dataset (e.g. classifying movie reviews) to a new but related target task or dataset (e.g. classifying song reviews).3
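To make the movie-to-song example concrete, here is a minimal sketch in Python with tf.keras. The datasets are hypothetical stand-ins (random tensors) and the layer names are illustrative; the point is the pattern: train on the source task, keep the learned encoder, and attach a fresh head for the related target task.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Source model: embedding + encoder + head, trained on the source task
# (movie-review sentiment). The data here is a random stand-in.
source_model = keras.Sequential([
    keras.layers.Embedding(input_dim=10_000, output_dim=64),
    keras.layers.GlobalAveragePooling1D(name="encoder_out"),
    keras.layers.Dense(1, activation="sigmoid", name="movie_head"),
])
source_model.compile(optimizer="adam", loss="binary_crossentropy")
x_movies = np.random.randint(0, 10_000, size=(256, 100))
y_movies = np.random.randint(0, 2, size=(256, 1))
source_model.fit(x_movies, y_movies, epochs=1, verbose=0)

# Transfer: reuse everything up to the encoder output, freeze it, and
# train only a new head on the related target task (song reviews).
base = keras.Model(source_model.input,
                   source_model.get_layer("encoder_out").output)
base.trainable = False
target_model = keras.Sequential([
    base,
    keras.layers.Dense(1, activation="sigmoid", name="song_head"),
])
target_model.compile(optimizer="adam", loss="binary_crossentropy")
x_songs = np.random.randint(0, 10_000, size=(64, 100))
y_songs = np.random.randint(0, 2, size=(64, 1))
target_model.fit(x_songs, y_songs, epochs=1, verbose=0)
```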
Of course, the transfer of knowledge from one domain to another cannot offset the negative impact of poor-quality data. Preprocessing techniques and feature engineering, such as data augmentation and feature extraction, are still necessary when using transfer learning.
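For instance, a pretrained backbone still expects clean, normalized input, and augmentation still improves generalization. A brief sketch with tf.keras preprocessing layers (the image batch is a random stand-in):

```python
import tensorflow as tf
from tensorflow import keras

# Normalization plus light augmentation, applied before the data ever
# reaches a pretrained backbone.
augment = keras.Sequential([
    keras.layers.Rescaling(1.0 / 255),      # scale pixels to [0, 1]
    keras.layers.RandomFlip("horizontal"),  # augmentation: mirror images
    keras.layers.RandomRotation(0.1),       # augmentation: small rotations
])

images = tf.random.uniform((8, 224, 224, 3), maxval=255)  # stand-in batch
augmented = augment(images, training=True)  # training=True enables random ops
print(augmented.shape)  # (8, 224, 224, 3)
```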
It is less the case that there are disadvantages inherent to transfer learning than that there are potential negative consequences resulting from its misapplication. Transfer learning works best when three conditions are met:4

- The two tasks to be completed are similar.
- The data distributions of the source and target datasets do not differ too greatly.
- A comparable model can be applied to both tasks.
When these conditions are not met, transfer learning can negatively affect model performance; the literature refers to this as negative transfer. Ongoing research proposes a variety of tests for determining whether datasets and tasks meet the above conditions, and so will not result in negative transfer.5 Distant transfer is one method developed to correct for negative transfer that results from too great a dissimilarity between the data distributions of the source and target datasets.6
Note that there is no widespread, standard metric for determining similarity between tasks for transfer learning. A handful of studies, however, propose different evaluation methods to predict similarities between datasets and machine learning tasks, and thus their viability for transfer learning.7
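To show the flavor of such evaluation methods, the sketch below computes one simple, illustrative proxy in plain NumPy: a maximum mean discrepancy (MMD) estimate between source and target feature distributions. This is not a standard test from the cited studies, just a common distributional distance; the features are random stand-ins.

```python
import numpy as np

def rbf_mmd2(x, y, gamma=1.0):
    """Biased estimate of squared maximum mean discrepancy (RBF kernel)."""
    def k(a, b):
        d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d)
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

source = np.random.normal(0.0, 1.0, size=(200, 16))       # source features
target_near = np.random.normal(0.2, 1.0, size=(200, 16))  # similar target
target_far = np.random.normal(3.0, 1.0, size=(200, 16))   # distant target

print(rbf_mmd2(source, target_near))  # small: distributions overlap
print(rbf_mmd2(source, target_far))   # large: higher negative-transfer risk
```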
There are three adjacent practices or sub-settings of transfer learning. Their distinction from one another, as well as from transfer learning more broadly, largely results from changes in the relationship between the source domain, the target domain, and the tasks to be completed:8

- Inductive transfer learning: the source and target tasks differ (whether or not the domains do), and some labeled data is available in the target domain.9
- Transductive transfer learning: the source and target tasks are the same, but their domains differ, and labeled data is available only in the source domain.10
- Unsupervised transfer learning: the source and target tasks differ, and no labeled data is available in either domain.11
Transfer learning is distinct from finetuning. Both, admittedly, reuse preexisting machine learning models rather than training new ones, but the similarities largely end there. Finetuning refers to the process of further training a model on a task-specific dataset to improve performance on the initial, specific task for which the model was built. For instance, one may create a general-purpose object detection model using massive image datasets such as COCO or ImageNet, and then further train the resulting model on a smaller, labeled dataset specific to car detection. In this way, a user finetunes an object detection model for car detection. By contrast, transfer learning refers to adapting a model to a new, related problem rather than the same problem.
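Full object detection needs heavier tooling, so the sketch below shows the same finetuning pattern on image classification with tf.keras: an ImageNet-pretrained backbone gets a small car/not-car head, the head is trained first, and the pretrained weights are then gently updated. Here, car_ds is a hypothetical labeled dataset, not a real one.

```python
import tensorflow as tf
from tensorflow import keras

# ImageNet-pretrained backbone, minus its original 1,000-class head.
base = keras.applications.ResNet50(weights="imagenet", include_top=False,
                                   pooling="avg", input_shape=(224, 224, 3))
model = keras.Sequential([
    base,
    keras.layers.Dense(2, activation="softmax"),  # car / not-car head
])

# Phase 1: freeze the backbone and train only the new head.
base.trainable = False
model.compile(optimizer=keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy")
# model.fit(car_ds, epochs=3)  # car_ds: hypothetical labeled car images

# Phase 2, finetuning proper: unfreeze and continue training at a much
# lower learning rate so the pretrained weights shift only slightly.
base.trainable = True
model.compile(optimizer=keras.optimizers.Adam(1e-5),
              loss="sparse_categorical_crossentropy")
# model.fit(car_ds, epochs=3)
```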
There are many applications of transfer learning in real-world machine learning and artificial intelligence settings. Developers and data scientists can use transfer learning to aid in a myriad of tasks and combine it with other learning approaches, such as reinforcement learning.
One salient issue affecting transfer learning in NLP is feature mismatch. Features in different domains can have different meanings, and thus different connotations (e.g. light signifying weight in one domain and optics in another). This disparity in feature representations affects sentiment classification tasks, language models, and more. Deep learning-based models, in particular word embeddings, show promise in correcting for this, as they can adequately capture semantic relations and orientations for domain adaptation tasks.12
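As an illustration of the point about embeddings, a contextual model assigns the word light different vectors in different contexts, which is what lets embedding-based methods separate domain-specific senses. This sketch assumes the Hugging Face transformers package and PyTorch; the sentences are made up.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed_word(sentence, word):
    """Return the contextual vector of `word` within `sentence`."""
    ids = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        states = model(**ids).last_hidden_state[0]
    idx = tok.convert_ids_to_tokens(ids["input_ids"][0]).index(word)
    return states[idx]

v_weight = embed_word("this suitcase is light and easy to carry", "light")
v_optics = embed_word("the light from the lamp is too dim", "light")
# Similarity well below 1.0: the two senses get distinct representations.
print(torch.cosine_similarity(v_weight, v_optics, dim=0))
```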
Because of the difficulty of acquiring sufficient manually labeled data for diverse computer vision tasks, a wealth of research examines transfer learning applications with convolutional neural networks (CNNs). One notable example is ResNet, a pretrained model architecture that demonstrates improved performance in image classification and object detection tasks.13 Recent research investigates the renowned ImageNet dataset for transfer learning, arguing that (contra computer vision folk wisdom) only small subsets of this dataset are needed to train reliably generalizable models.14 Many transfer learning tutorials for computer vision use ResNet, ImageNet, or both with TensorFlow’s Keras library.
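The pattern those tutorials typically follow looks like this sketch: an ImageNet-pretrained ResNet from tf.keras acts as a frozen feature extractor, and any downstream model consumes its 2,048-dimensional features. The image batch is a random stand-in.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Frozen, ImageNet-pretrained ResNet50 used purely as a feature extractor.
extractor = keras.applications.ResNet50(weights="imagenet",
                                        include_top=False, pooling="avg")
extractor.trainable = False

images = np.random.uniform(0, 255, size=(4, 224, 224, 3)).astype("float32")
features = extractor(keras.applications.resnet50.preprocess_input(images))
print(features.shape)  # (4, 2048): ready for a new task-specific head
```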
1 Emilio Soria Olivas, Jose David Martin Guerrero, Marcelino Martinez Sober, Jose Rafael Magdalena Benedito, Antonio Jose Serrano Lopez. Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques. Information Science Reference. 2009.
2 Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press. 2016.
3 Jiawei Han, Micheline Kamber, Jian Pei. Data Mining: Concepts and Techniques. 3rd edition. Elsevier. 2012.
4 Jindong Wang and Yiqiang Chen. Introduction to Transfer Learning: Applications and Methods. Springer. 2023.
5 Wen Zhang, Lingfei Deng, Lei Zhang, Dongrui Wu. “A Survey on Negative Transfer.” IEEE/CAA Journal of Automatica Sinica. Vol. 10, No. 2, 2023, pp. 305–329. https://arxiv.org/abs/2009.00909 .
6 Ben Tan, Yangqiu Song, Erheng Zhong, Qiang Yang. “Transitive Transfer Learning.” Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2015, pp. 1155–1164. https://dl.acm.org/doi/10.1145/2783258.2783295 . Ben Tan, Yu Zhang, Sinno Jialin Pan, Qiang Yang. “Distant Domain Transfer Learning.” Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence. 2017, pp. 2604–2610. https://dl.acm.org/doi/10.5555/3298483.3298614 .
7 Changjian Shui, Mahdieh Abbasi, Louis-Émile Robitaille, Boyu Wang, Christian Gagné. “A Principled Approach for Learning Task Similarity in Multitask Learning.” Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence. 2019, pp. 3446–3452. https://www.ijcai.org/proceedings/2019/0478.pdf . Kshitij Dwivedi and Gemma Roig. “Representation Similarity Analysis for Efficient Task Taxonomy & Transfer Learning.” Proceedings of the Conference on Computer Vision and Pattern Recognition. 2019, pp. 12387–12396. https://openaccess.thecvf.com/.../CVPR_2019_paper.pdf . Javier García, Álvaro Visús, Fernando Fernández. “A taxonomy for similarity metrics between Markov decision processes.” Machine Learning, Vol. 111. 2022, pp. 4217–4247. https://link.springer.com/article/10.1007/s10994-022-06242-4 .
8 Asmaul Hosna, Ethel Merry, Jigmey Gyalmo, Zulfikar Alom, Zeyar Aung, Mohammad Abdul Azim. “Transfer learning: a friendly introduction.” Journal of Big Data, Vol. 9, 2022. https://journalofbigdata.springeropen.com/articles/10.1186/s40537-022-00652-w . Sinno Jialin Pan, Qiang Yang. “A Survey on Transfer Learning.” IEEE Transactions on Knowledge and Data Engineering, Vol. 22, No. 10, 2010, pp. 1345–1359. https://ieeexplore.ieee.org/document/5288526 .
9 Sinno Jialin Pan, Qiang Yang. “A Survey on Transfer Learning.” IEEE Transactions on Knowledge and Data Engineering, Vol. 22, No. 10, 2010, pp. 1345–1359. https://ieeexplore.ieee.org/document/5288526 . Ricardo Vilalta. “Inductive Transfer.” Encyclopedia of Machine Learning and Data Mining. Springer. 2017.
10 Sinno Jialin Pan, Qiang Yang. “A Survey on Transfer Learning.” IEEE Transactions on Knowledge and Data Engineering, Vol. 22, No. 10, 2010, pp. 1345–1359. https://ieeexplore.ieee.org/document/5288526 .
11 Sinno Jialin Pan, Qiang Yang. “A Survey on Transfer Learning.” IEEE Transactions on Knowledge and Data Engineering, Vol. 22, No. 10, 2010, pp. 1345–1359. https://ieeexplore.ieee.org/document/5288526 . Ian Goodfellow, Yoshua Bengio, Aaron Courville. Deep Learning. MIT Press. 2016.
12 Qiang Yang. Transfer Learning. Cambridge University Press. 2020. Eyal Ben-David, Carmel Rabinovitz, Roi Reichart. “PERL: Pivot-based Domain Adaptation for Pre-trained Deep Contextualized Embedding Models.” Transactions of the Association for Computational Linguistics, Vol. 8, 2020, pp. 504–521. https://aclanthology.org/2020.tacl-1.33.pdf .
13 Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. “Deep Residual Learning for Image Recognition.” IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016, pp. 770–778. https://ieeexplore.ieee.org/document/7780459 .
14 Minyoung Huh, Pulkit Agrawal, Alexei Efros. “What makes ImageNet good for transfer learning?” Berkeley Artificial Intelligence Research Laboratory (BAIR). 2017. https://people.csail.mit.edu/minhuh/papers/analysis/ .