SpotTune: Transfer Learning through Adaptive Fine-Tuning

Deep neural networks have shown remarkable success in many computer vision tasks, but current methods typically rely on massive amounts of labeled training data to achieve high performance. Collecting and annotating such large training datasets is costly and time-consuming, and for tasks with only a few or no available examples, it may be infeasible.

A common technique for visual learning with limited labeled data is transfer learning. Given an existing model or classifier trained on a "source task," a typical way to perform transfer learning is to fine-tune this model so that it adapts to a new "target task." Existing methods are mostly ad hoc in deciding where to fine-tune a deep neural network. A common strategy is to fine-tune the last few layers of the model while keeping the other layers frozen. However, deciding which layers to freeze or fine-tune remains a manual design choice, and searching over these choices by hand is inefficient, especially for networks with hundreds or thousands of layers.
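
The conventional strategy described above, fine-tuning only the last few layers and freezing the rest, can be sketched in plain Python. This is a toy illustration of ordinary fine-tuning, not of SpotTune: the model is assumed to be a list of per-layer weight vectors, `k` (the number of trailing layers to fine-tune) plays the role of the manual design choice, and the helper names `trainable_mask` and `sgd_step` are hypothetical.

```python
# Toy sketch of conventional fine-tuning: update the last `k` layers and keep
# every earlier layer frozen. Weights are plain lists of floats (hypothetical
# toy model), not tensors from a real framework.


def trainable_mask(num_layers, k):
    """Return one flag per layer: True = fine-tune, False = freeze."""
    if not 0 <= k <= num_layers:
        raise ValueError("k must be between 0 and num_layers")
    return [i >= num_layers - k for i in range(num_layers)]


def sgd_step(weights, grads, mask, lr=0.1):
    """Apply one SGD update only to layers whose mask entry is True."""
    updated = []
    for w, g, train in zip(weights, grads, mask):
        if train:
            updated.append([wi - lr * gi for wi, gi in zip(w, g)])
        else:
            updated.append(list(w))  # frozen layer: weights copied unchanged
    return updated


# Usage: a 4-layer toy model in which only the last 2 layers are fine-tuned.
weights = [[1.0, 2.0], [0.5, 0.5], [0.0, 1.0], [3.0, -1.0]]
grads = [[1.0, 1.0]] * 4
mask = trainable_mask(len(weights), k=2)
new_weights = sgd_step(weights, grads, mask, lr=0.1)
print(mask)  # [False, False, True, True]
```

Every value of `k` yields a different model, and the right value depends on how far the target domain is from the source; this per-dataset search is what an adaptive method aims to remove.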

The IBM Research team, in collaboration with the University of California, San Diego and the University of Texas at Austin, recently developed SpotTune, a novel adaptive fine-tuning method that automatically decides which layers of a model should be frozen or fine-tuned (see Figure 1). The method, published at the Conference on Computer Vision and Pattern Recognition (CVPR 2019), outperformed traditional fine-tuning on 12 of 14 standard datasets. Compared with other state-of-the-art methods, it also achieved the highest score on the Visual Decathlon challenge, a competitive benchmark of 10 datasets for evaluating multi-domain learning algorithms.

Figure 1. SpotTune decides, per training example, which layers of a pre-trained model should be fine-tuned or kept frozen to improve the model's accuracy in the target domain.

The method works as follows: given a training image from the target task, a lightweight policy network makes the freeze vs. fine-tune decision for each layer of a deep neural network. Because these decisions are discrete and therefore non-differentiable, standard backpropagation cannot train the policy network directly, so we adopted a training algorithm based on Gumbel-Softmax sampling, which provides a differentiable relaxation of the discrete decisions. We observed that for different datasets (different domains), a different set of layers is chosen to be fine-tuned or frozen. In effect, SpotTune automatically identifies a suitable fine-tuning policy for each dataset and, within each dataset, for each training example.
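
The per-example routing described above can be sketched without a deep learning framework. This is a toy illustration under stated assumptions, not the authors' implementation: each layer is assumed to keep a frozen pre-trained copy and a fine-tuned copy, fixed per-layer logits stand in for the policy network's output, and the names `gumbel_softmax`, `hard_decision`, and `spot_forward` are hypothetical. It shows the Gumbel-Softmax idea: adding Gumbel noise to the logits and taking a temperature-scaled softmax gives a soft sample whose argmax is distributed exactly as the categorical distribution over {freeze, fine-tune}.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Relaxed one-hot sample from Categorical(softmax(logits)).

    Adds Gumbel(0, 1) noise g = -log(-log(u)), u ~ Uniform(0, 1), to each logit
    and applies a temperature-scaled softmax. Lower tau gives samples closer to
    one-hot; the argmax of the noisy logits is an exact categorical sample.
    """
    noisy = []
    for logit in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logs finite
        noisy.append((logit - math.log(-math.log(u))) / tau)
    m = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def hard_decision(soft):
    """Index of the largest entry: 0 = use frozen layer, 1 = use fine-tuned layer.

    With a straight-through estimator, this hard value is used in the forward
    pass while gradients flow through `soft`. Plain Python has no autograd,
    so only the forward computation is shown.
    """
    return max(range(len(soft)), key=lambda i: soft[i])


def spot_forward(x, frozen_layers, tuned_layers, policy_logits, tau=1.0, rng=random):
    """Pass one input through a chain of layers, choosing per layer between
    the frozen pre-trained copy and the fine-tuned copy.

    policy_logits holds one [freeze, fine-tune] logit pair per layer and stands
    in for the lightweight policy network's output for this input (hypothetical).
    """
    decisions = []
    for frozen, tuned, logits in zip(frozen_layers, tuned_layers, policy_logits):
        d = hard_decision(gumbel_softmax(logits, tau, rng))
        decisions.append(d)
        # Here d is 0 or 1, so this selects one branch. In an autograd framework,
        # d would be a straight-through value (hard forward, soft backward), so
        # the same expression would let gradients reach the policy network.
        x = d * tuned(x) + (1 - d) * frozen(x)
    return x, decisions


# Usage with toy scalar "layers" (hypothetical stand-ins for network blocks).
frozen = [lambda v: v + 1.0, lambda v: 2.0 * v]  # pre-trained, never updated
tuned = [lambda v: v + 1.5, lambda v: 2.5 * v]   # copies updated on the target task
logits = [[2.0, -2.0], [-2.0, 2.0]]              # favors freezing layer 0, tuning layer 1
y, decisions = spot_forward(1.0, frozen, tuned, logits, tau=0.5, rng=random.Random(0))
print(decisions, y)
```

Because the decision is sampled per input, two images from the same dataset can take different paths through the network. Lowering `tau` makes the soft sample closer to the hard decision, at the cost of higher-variance gradients in a real training setup.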

As we move from narrow AI, where methods work on specific domains and require large amounts of labeled data, to broad AI, where systems exhibit intelligent behavior across a variety of tasks, adaptive fine-tuning policies such as SpotTune's are crucial for adapting models to domains where only a few labeled examples are available. This is the case for many enterprise applications, including visual recognition for damage assessment in the insurance industry, recognition of player actions in sports for media and entertainment, and diagnosis of diseases in the medical domain, among many others.

For more details about SpotTune, see our CVPR 2019 paper, authored by Yunhui Guo, Honghui Shi, Abhishek Kumar, Kristen Grauman, Tajana Rosing, and Rogerio Feris.

Principal RSM and Manager, Computer Vision and Multimedia Department, IBM Research