Label Set Operations (LaSO) Networks for Multi-Label Few-Shot Learning


As we scale AI and machine learning to a broader set of tasks for enterprise and industry applications, it is imperative to learn more from less. Data augmentation, which improves learning by automatically synthesizing new training samples, is an important tool when there is not enough training data. This is especially true in few-shot learning, where only one or very few samples are available per category. Most prior work on few-shot image classification addresses the ‘single label’ scenario, in which every training image contains a single object and therefore has a single category label. A more challenging and realistic scenario, which prior work has not explored extensively, is multi-label few-shot image classification: the training data contains only a few samples, and each image can carry more than one label.

To advance this topic, we investigate multi-label few-shot image classification in our paper presented at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019) in June 2019. The paper, titled “LaSO: Label-Set Operations networks for multi-label few-shot learning,” proposes a method that trains deep neural networks to combine pairs of image samples, each with its own set of labels, into new samples with ‘merged’ labels. As an example, consider the two images in Figure 1, one depicting ‘a person walking a sheep and a dog’ and the other depicting ‘a person holding a dog and a cat’. The labels of the first image are ‘person,’ ‘sheep,’ and ‘dog,’ and those of the second are ‘person,’ ‘dog,’ and ‘cat.’ Given these two images, the LaSO networks synthesize novel training samples corresponding to the union, intersection, and subtraction of their label sets. The ‘union’ produces a sample labeled ‘person,’ ‘dog,’ ‘cat,’ and ‘sheep’; the ‘intersection’ produces a sample labeled ‘person’ and ‘dog’; and the ‘subtraction’ (first image minus second) produces a sample labeled ‘sheep’ alone. The LaSO networks operate directly in the feature space learned by a deep neural network, so the synthesized samples are feature vectors rather than images.


Figure 1: Examples of LaSO networks operating on two images.
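The LaSO networks themselves act on image features, but the targets they are trained to match are plain set operations on the two label sets. As a minimal sketch, assuming labels are encoded as multi-hot vectors over a small hypothetical vocabulary taken from the Figure 1 example, those target label sets can be computed as follows:

```python
import numpy as np

# Hypothetical label vocabulary covering the Figure 1 example.
VOCAB = ["person", "dog", "cat", "sheep"]

def to_multi_hot(labels):
    """Encode a set of label names as a 0/1 vector over VOCAB."""
    return np.array([1 if name in labels else 0 for name in VOCAB], dtype=np.int8)

def to_names(vec):
    """Decode a 0/1 vector back into the set of label names it marks."""
    return {name for name, bit in zip(VOCAB, vec) if bit}

def label_union(a, b):
    # Labels present in either image.
    return np.logical_or(a, b).astype(np.int8)

def label_intersection(a, b):
    # Labels present in both images.
    return np.logical_and(a, b).astype(np.int8)

def label_subtraction(a, b):
    # Labels present in the first image but not in the second.
    return np.logical_and(a, np.logical_not(b)).astype(np.int8)

x = to_multi_hot({"person", "sheep", "dog"})  # first image
y = to_multi_hot({"person", "dog", "cat"})    # second image

print(sorted(to_names(label_union(x, y))))         # ['cat', 'dog', 'person', 'sheep']
print(sorted(to_names(label_intersection(x, y))))  # ['dog', 'person']
print(sorted(to_names(label_subtraction(x, y))))   # ['sheep']
```

In LaSO, vectors like these serve only as training targets; the networks output feature vectors whose predicted labels should match them.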

The LaSO networks are trained jointly as a single multi-task network, using loss functions designed specifically for each of the corresponding label-set manipulation tasks (Figure 2).


Figure 2: LaSO network architecture supporting label set operations for intersection, union and subtraction.
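The post does not spell out the architecture or the loss functions. As a rough sketch only, assuming each LaSO network is a small MLP over a concatenated pair of feature vectors and that every operation is supervised by a shared multi-label classifier with a binary cross-entropy loss against its target label set, a joint multi-task objective could be assembled like this. All names and sizes are hypothetical, only the forward pass is shown, and the paper's actual losses may differ:

```python
import numpy as np

rng = np.random.default_rng(0)
D, C, H = 16, 4, 32  # feature dim, number of labels, hidden width (all hypothetical)

def init_mlp(in_dim, hidden, out_dim):
    """Random weights for a hypothetical two-layer MLP."""
    return {
        "W1": rng.normal(0, 0.1, (in_dim, hidden)), "b1": np.zeros(hidden),
        "W2": rng.normal(0, 0.1, (hidden, out_dim)), "b2": np.zeros(out_dim),
    }

def mlp(params, z):
    h = np.maximum(0.0, z @ params["W1"] + params["b1"])  # ReLU hidden layer
    return h @ params["W2"] + params["b2"]

# One hypothetical network per operation, mapping a concatenated feature pair to a new feature vector.
laso = {op: init_mlp(2 * D, H, D) for op in ("union", "intersection", "subtraction")}

# Hypothetical shared multi-label classifier over feature vectors (linear + sigmoid).
W_cls, b_cls = rng.normal(0, 0.1, (D, C)), np.zeros(C)

def classify(f):
    return 1.0 / (1.0 + np.exp(-(f @ W_cls + b_cls)))

def bce(p, t, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities p and 0/1 targets t."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(t * np.log(p) + (1 - t) * np.log(1 - p)))

def joint_loss(fx, fy, lx, ly):
    """Sum of per-operation multi-label losses, forming one multi-task objective."""
    targets = {
        "union": np.logical_or(lx, ly),
        "intersection": np.logical_and(lx, ly),
        "subtraction": np.logical_and(lx, np.logical_not(ly)),
    }
    pair = np.concatenate([fx, fy], axis=-1)
    total = 0.0
    for op, params in laso.items():
        synthesized = mlp(params, pair)  # synthesized feature vector for this operation
        total += bce(classify(synthesized), targets[op].astype(float))
    return total

fx, fy = rng.normal(size=D), rng.normal(size=D)  # stand-ins for two images' features
lx = np.array([1, 1, 0, 1])  # labels over [person, dog, cat, sheep]
ly = np.array([1, 1, 1, 0])
print(joint_loss(fx, fy, lx, ly))
```

Summing the per-operation terms into one objective is what makes this a single multi-task network: a gradient step on `joint_loss` would update all three operation networks together. In practice this would be written in a deep learning framework with automatic differentiation.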

The multi-task network is trained on a large-scale multi-label dataset in which each image carries multiple labels corresponding to the objects appearing in it. We test the resulting LaSO networks in two ways to evaluate their potential for manipulating multi-label content: by classifying the synthesized examples with a classifier pre-trained on real, held-out multi-label data, and by using feature vectors synthesized by the LaSO networks to retrieve images from the held-out test set (Figure 3).


Figure 3: Qualitative results of image retrieval using synthetic LaSO feature vectors.
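The post does not specify how retrieval is scored. As a minimal sketch, assuming cosine-similarity nearest-neighbor search over a gallery of held-out feature vectors, a synthesized LaSO vector can be used as a query like this (the toy 3-dimensional gallery and query are hypothetical):

```python
import numpy as np

def retrieve(query, gallery, k=3):
    """Return indices of the k gallery vectors most cosine-similar to the query."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    scores = g @ q                   # cosine similarity of every gallery vector to the query
    return np.argsort(-scores)[:k]   # highest similarity first

# Hypothetical gallery of held-out image features and a synthesized query vector.
gallery = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0],
                    [1.0, 1.0, 0.0]])
query = np.array([0.9, 1.0, 0.0])

print(retrieve(query, gallery, k=3).tolist())  # [3, 1, 0]
```

The ground-truth labels of the retrieved images could then be compared with the label set expected for the query, for example the union of the two source images' labels.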

The LaSO networks operate directly on the image representations and require no additional inputs to control the manipulation. In other words, no human intervention is needed to indicate which labels to manipulate. Hence, they can potentially generalize to images containing novel categories that were not seen during training, which makes them applicable to challenging multi-label few-shot classification tasks. In these tasks, the LaSO networks synthesize new training samples from random pairs of the provided training samples. In our paper, we apply this capability to a novel benchmark for multi-label few-shot classification, which we hope will inspire more work on this important problem. The results of using LaSO networks for data augmentation on the proposed benchmark indicate a strong potential to generalize to novel categories (Figure 4).


Figure 4: LaSO augmentation performance (bottom four rows) vs. baselines (top three rows).
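As a minimal sketch of this augmentation loop, assuming trained LaSO networks are available as callables that map a pair of feature vectors to a new one (a hypothetical interface), the random-pair synthesis could look like the following. The `toy_ops` are element-wise placeholders used only so the example runs; they are not LaSO networks:

```python
import numpy as np

def augment_with_pairs(features, labels, ops, n_pairs, rng):
    """Synthesize extra (feature, multi-hot label) samples from random pairs of the few-shot set.

    `ops` maps an operation name to a callable f(fx, fy) -> feature vector,
    standing in for a trained LaSO network (hypothetical interface).
    """
    label_rules = {
        "union": lambda a, b: np.logical_or(a, b),
        "intersection": lambda a, b: np.logical_and(a, b),
        "subtraction": lambda a, b: np.logical_and(a, np.logical_not(b)),
    }
    new_feats, new_labels = [], []
    for _ in range(n_pairs):
        i, j = rng.choice(len(features), size=2, replace=False)  # random distinct pair
        for name, op in ops.items():
            target = label_rules[name](labels[i], labels[j]).astype(np.int8)
            if not target.any():
                # Choice made for this sketch only: drop samples with an empty label set.
                continue
            new_feats.append(op(features[i], features[j]))
            new_labels.append(target)
    # Return the original samples followed by the synthesized ones.
    return (np.vstack([features, *new_feats]),
            np.vstack([labels.astype(np.int8), *new_labels]))

# Toy stand-ins so the sketch runs; a real setup would call trained LaSO networks here.
toy_ops = {
    "union": np.maximum,
    "intersection": np.minimum,
    "subtraction": lambda a, b: np.clip(a - b, 0.0, None),
}

rng = np.random.default_rng(0)
features = rng.random((4, 8))  # 4 few-shot samples with 8-dim features
labels = np.array([[1, 1, 0, 1],
                   [1, 1, 1, 0],
                   [0, 0, 1, 1],
                   [1, 0, 0, 0]])
aug_x, aug_y = augment_with_pairs(features, labels, toy_ops, n_pairs=5, rng=rng)
print(aug_x.shape, aug_y.shape)
```

The augmented features and labels would then be used, together with the original samples, to train the few-shot multi-label classifier.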

Multi-label few-shot classification is a new, challenging, and practical task, and we propose the first benchmark for it. Evaluating LaSO label-set manipulation with neural networks on the proposed benchmark shows that LaSO holds good potential for this task, and possibly for other interesting applications. We hope that this work will inspire more researchers to look into this problem.

LaSO: Label-Set Operations networks for multi-label few-shot learning, Amit Alfassy, Leonid Karlinsky, Amit Aides, Joseph Shtok, Sivan Harary, Rogerio Feris, Raja Giryes and Alex M. Bronstein

Research Staff Member, IBM Research
