June 14, 2019 | Written by: Rogerio Feris
Share this post:
Deep neural networks have shown remarkable success in many computer vision tasks, but current methods typically rely on massive amounts of labeled training data to achieve high performance. Collecting and annotating such large training datasets is costly, time-consuming, and, for certain tasks that have only a few or no available examples, it may be infeasible.
A common technique to address the problem of visual learning with limited labeled data is transfer learning. Given an existing model or classifier trained on a “source task,” a typical way to conduct transfer learning is to fine-tune this model to adapt to a new “target task.” Existing methods are mostly ad-hoc in terms of deciding where to fine-tune in a deep neural network. A common strategy is to fine-tune the last few layers of the model, while keeping the other layers frozen. However, deciding which layers to freeze or fine-tune still remains a manual design choice, which can be inefficient to optimize for, especially for networks with hundreds or thousands of layers.
The IBM Research team, in collaboration with University of California, San Diego and University of Texas at Austin, recently created a novel adaptive fine-tuning method called SpotTune that automatically decides which layers of a model should be frozen or fine-tuned (see Figure 1). This method, published at the Conference in Computer Vision and Pattern Recognition (CVPR 2019), outperformed the traditional fine-tuning approach on 12 out of 14 standard datasets, and achieved the highest score on the Visual Decathlon challenge, a competitive benchmark for testing the performance of multi-domain learning algorithms with a total of 10 datasets, compared to other state-of-the-art methods.
Figure 1. SpotTune decides, per training example, which layers of a pre-trained model should be fine-tuned or kept frozen to improve the accuracy of the model in the target domain
The method works as follows: given a training image from the target task, a lightweight policy network is used to make the freeze vs. fine-tuning decisions for each layer of a deep neural network. As these decisions are discrete and non-differentiable, a different training algorithm based on Gumbel Softmax sampling had to be adopted. We observed that for different datasets (different domains), a different set of layers are chosen to be fine-tuned or frozen. In fact, SpotTune automatically identifies the right fine-tuning policy for each dataset, for each training example.
As we move from narrow AI, where methods work on specific domains and require large amounts of labeled data, to broad AI, where systems exhibit intelligent behavior across a variety of tasks, the fine-tuning policy provided by SpotTune is crucial to adapt models to domains where only a few labeled examples are available. This is the case for many enterprise applications, including visual recognition for damage assessment in the insurance industry, recognition of player actions in sports for media and entertainment, diagnosis of diseases in the medical domain, and many others.
For more details about SpotTune, check our CVPR 2019 paper, authored by Yunhui Guo, Honghui Shi, Abhishek Kumar, Kristen Grauman, Tajana Rosing, and Rogerio Feris.