SpotTune: Transfer Learning through Adaptive Fine-Tuning

Deep neural networks have shown remarkable success in many computer vision tasks, but current methods typically rely on massive amounts of labeled training data to achieve high performance. Collecting and annotating such large training datasets is costly and time-consuming, and for tasks with only a few or no available examples it may be infeasible.

A common technique to address the problem of visual learning with limited labeled data is transfer learning. Given an existing model or classifier trained on a "source task," a typical way to conduct transfer learning is to fine-tune this model so that it adapts to a new "target task." Existing methods are mostly ad hoc in deciding where to fine-tune a deep neural network. A common strategy is to fine-tune the last few layers of the model while keeping the other layers frozen. However, which layers to freeze or fine-tune remains a manual design choice, and searching over that choice is inefficient, especially for networks with hundreds or thousands of layers.
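
To make the common "fine-tune the last few layers" heuristic concrete, here is a minimal Python sketch. It is not taken from the paper, and the layer names and the helpers last_k_trainable and num_freeze_configs are hypothetical. The first helper marks the last k layers as trainable and freezes the rest; in a framework such as PyTorch, the same flags would typically be applied by setting requires_grad on each layer's parameters. The second helper counts how many freeze/fine-tune assignments exist when every layer can be chosen independently, which illustrates why tuning this choice by hand scales poorly.

```python
from typing import Dict, List


def last_k_trainable(layer_names: List[str], k: int) -> Dict[str, bool]:
    """Return a trainable flag per layer: the last k layers are fine-tuned,
    all earlier layers stay frozen (the common manual heuristic)."""
    if not 0 <= k <= len(layer_names):
        raise ValueError("k must be between 0 and the number of layers")
    cutoff = len(layer_names) - k
    return {name: index >= cutoff for index, name in enumerate(layer_names)}


def num_freeze_configs(num_layers: int) -> int:
    """Number of distinct freeze/fine-tune assignments when every layer
    can independently be frozen or fine-tuned."""
    return 2 ** num_layers


if __name__ == "__main__":
    # Hypothetical five-stage network; the layer names are illustrative only.
    layers = ["conv1", "block1", "block2", "block3", "fc"]
    print(last_k_trainable(layers, k=2))
    # {'conv1': False, 'block1': False, 'block2': False, 'block3': True, 'fc': True}
    print(num_freeze_configs(len(layers)))  # 32
```

Even at the coarse granularity of ResNet-50's 16 residual blocks, num_freeze_configs(16) gives 65,536 candidate assignments, each of which would need its own fine-tuning run to evaluate by hand.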

The IBM Research team, in collaboration with the University of California, San Diego and the University of Texas at Austin, recently developed SpotTune, a novel adaptive fine-tuning method that automatically decides which layers of a model should be frozen or fine-tuned (see Figure 1). In our paper, published at the Conference on Computer Vision and Pattern Recognition (CVPR 2019), SpotTune outperformed the traditional fine-tuning approach on 12 out of 14 standard datasets. It also achieved the highest score among state-of-the-art methods on the Visual Decathlon challenge, a competitive benchmark of 10 datasets for evaluating multi-domain learning algorithms.

Figure 1. SpotTune decides, per training example, which layers of a pre-trained model should be fine-tuned or kept frozen to improve the accuracy of the model in the target domain.

The method works as follows: given a training image from the target task, a lightweight policy network makes the freeze-versus-fine-tune decision for each layer of a deep neural network. Because these decisions are discrete, and therefore non-differentiable, they cannot be learned with standard backpropagation alone, so we adopted a training algorithm based on Gumbel-Softmax sampling. We observed that different datasets (different domains) lead to different sets of layers being fine-tuned or frozen. In fact, SpotTune automatically identifies the right fine-tuning policy for each dataset and for each training example.
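
The per-example routing and the Gumbel-Softmax sampling can be sketched in plain Python under a few stated assumptions. Loosely following the residual-block formulation described in the paper, each block here has a frozen pre-trained copy and a fine-tuned copy, and a per-example decision picks one of them. The blocks are hypothetical scalar functions standing in for tensors, and policy_logits stands in for the output of the policy network, which is not modeled. The gumbel_softmax helper draws a relaxed one-hot sample by adding Gumbel(0, 1) noise to the logits and applying a temperature-scaled softmax; the forward pass then uses the hard (argmax) decision. This is a forward-pass illustration only, not the authors' implementation.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# A "block" maps an activation to a residual update; scalars stand in for tensors.
Block = Callable[[float], float]


def gumbel_softmax(logits: Sequence[float], tau: float, rng: random.Random) -> List[float]:
    """Relaxed one-hot sample: softmax((logits + Gumbel(0, 1) noise) / tau)."""
    if tau <= 0:
        raise ValueError("temperature tau must be positive")
    scores = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        scores.append((logit + gumbel_noise) / tau)
    peak = max(scores)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def spottune_style_forward(
    x: float,
    frozen_blocks: Sequence[Block],
    tuned_blocks: Sequence[Block],
    policy_logits: Sequence[Tuple[float, float]],
    tau: float,
    rng: random.Random,
) -> Tuple[float, List[str]]:
    """Route one example through each residual block.

    policy_logits[l] = (score for "frozen", score for "fine-tuned") for block l.
    In SpotTune these would come from the policy network; here they are given.
    """
    decisions = []
    for frozen, tuned, logits in zip(frozen_blocks, tuned_blocks, policy_logits):
        probs = gumbel_softmax(logits, tau, rng)
        use_frozen = 1.0 if probs[0] >= probs[1] else 0.0  # hard decision (argmax)
        decisions.append("frozen" if use_frozen else "fine-tuned")
        # Residual update from the chosen copy, plus the identity shortcut.
        x = use_frozen * frozen(x) + (1.0 - use_frozen) * tuned(x) + x
    return x, decisions


if __name__ == "__main__":
    rng = random.Random(0)
    frozen = [lambda v: 0.5 * v] * 3  # hypothetical pre-trained blocks
    tuned = [lambda v: 2.0 * v] * 3   # hypothetical fine-tuned copies
    logits = [(100.0, -100.0), (-100.0, 100.0), (100.0, -100.0)]
    out, choices = spottune_style_forward(1.0, frozen, tuned, logits, tau=1.0, rng=rng)
    print(choices)  # ['frozen', 'fine-tuned', 'frozen']
    print(out)      # 6.75
```

In an autodiff framework, the usual straight-through trick uses the hard decision in the forward pass while backpropagating through the soft probabilities, which is how a discrete per-block choice can still be trained end to end. Plain Python has no autograd, so only the forward pass is shown.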

As we move from narrow AI, where methods work in specific domains and require large amounts of labeled data, to broad AI, where systems exhibit intelligent behavior across a variety of tasks, the fine-tuning policy provided by SpotTune is crucial for adapting models to domains where only a few labeled examples are available. This is the case for many enterprise applications, including visual recognition for damage assessment in the insurance industry, recognition of player actions in sports for media and entertainment, and diagnosis of diseases in the medical domain, among many others.

For more details about SpotTune, see our CVPR 2019 paper by Yunhui Guo, Honghui Shi, Abhishek Kumar, Kristen Grauman, Tajana Rosing, and Rogerio Feris.

Principal RSM and Manager, Computer Vision and Multimedia Department, IBM Research