Label Set Operations (LaSO) Networks for Multi-Label Few-Shot Learning

As we scale AI and machine learning to a broader set of tasks in enterprise and industry applications, it is imperative to learn more from less. Data augmentation, which improves learning by synthesizing new training samples automatically, is an important tool when training data is scarce. Such is the case for few-shot learning, where only one or very few samples are available per category. Most prior work on few-shot image classification investigates the ‘single-label’ scenario, where every training image contains only a single object and hence has a single category label. A more challenging and realistic scenario is multi-label few-shot image classification, where only a small number of training samples are available and each image can carry more than one label. This scenario has not been explored extensively in prior work.

To advance this topic, we investigate multi-label few-shot image classification in our paper presented at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019) in June 2019. The paper, titled “LaSO: Label-Set Operations networks for multi-label few-shot learning,” proposes a new method to train deep neural networks by combining pairs of image samples with certain sets of labels to synthesize new samples with ‘merged’ labels. As an example, consider the two images in Figure 1, one depicting ‘a person walking a sheep and a dog’ and another depicting ‘a person holding a dog and a cat.’ The labels of the first image are ‘person,’ ‘sheep,’ and ‘dog,’ and the labels of the second are ‘person,’ ‘dog,’ and ‘cat.’ Given these two images, the LaSO networks synthesize novel training samples corresponding to the union, intersection and subtraction of their label sets. The ‘union’ produces a sample labeled ‘person,’ ‘dog,’ ‘cat,’ and ‘sheep’; the ‘intersection’ produces a sample labeled ‘person’ and ‘dog’; and the ‘subtraction’ (first image minus second) produces a sample labeled ‘sheep’ alone. The LaSO networks operate directly in the feature space learned by a deep neural network.

Figure 1: Examples of LaSO networks operating on two images.
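The label arithmetic in the Figure 1 example can be written out with ordinary Python sets. This is only a sketch of the target labels that each operation should produce, not of the networks themselves; the `VOCAB` list and the helper names `target_labels` and `to_multi_hot` are hypothetical and chosen for illustration.

```python
# Hypothetical label vocabulary covering the Figure 1 example.
VOCAB = ["person", "dog", "cat", "sheep"]


def target_labels(labels_a, labels_b):
    """Return the label set each operation should produce for a pair of images."""
    a, b = set(labels_a), set(labels_b)
    return {
        "union": a | b,
        "intersection": a & b,
        "subtraction": a - b,  # labels of the first image that the second lacks
    }


def to_multi_hot(labels, vocab=VOCAB):
    """Encode a label set as a 0/1 vector ordered like the vocabulary."""
    return [1 if name in labels else 0 for name in vocab]


image_a = ["person", "sheep", "dog"]  # a person walking a sheep and a dog
image_b = ["person", "dog", "cat"]    # a person holding a dog and a cat

targets = target_labels(image_a, image_b)
for op, labels in targets.items():
    print(op, sorted(labels), to_multi_hot(labels))
```

Note that subtraction is not symmetric: swapping the two images would yield ‘cat’ instead of ‘sheep.’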

The LaSO networks are trained jointly, as a single multi-task network, using loss functions designed to adapt each network to its corresponding label-set manipulation task (Figure 2).

Figure 2: LaSO network architecture supporting label set operations for intersection, union and subtraction.
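One way to picture joint training is as a single objective that sums one term per operation. The sketch below assumes a multi-label binary cross-entropy term for each operation, computed between a classifier's predicted label probabilities on the synthesized sample and the expected label set. That choice of loss, and the names `bce` and `multi_task_loss`, are assumptions made for illustration; this post does not spell out the exact loss functions.

```python
import math


def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 targets."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


def multi_task_loss(predictions, targets):
    """Sum the per-operation losses so all operations are optimized together."""
    return sum(bce(predictions[op], targets[op]) for op in targets)


# Expected multi-hot labels over (person, dog, cat, sheep) for the Figure 1 pair.
targets = {
    "union": [1, 1, 1, 1],
    "intersection": [1, 1, 0, 0],
    "subtraction": [0, 0, 0, 1],
}

# Made-up classifier outputs on the three synthesized samples.
good = {
    "union": [0.9, 0.9, 0.8, 0.9],
    "intersection": [0.9, 0.8, 0.1, 0.2],
    "subtraction": [0.1, 0.2, 0.1, 0.9],
}
bad = {op: [0.5, 0.5, 0.5, 0.5] for op in targets}

print(multi_task_loss(good, targets) < multi_task_loss(bad, targets))
```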

The multi-task network is trained on a large-scale multi-label dataset, in which each image carries multiple labels corresponding to the objects appearing in it. The resulting LaSO networks are tested in different ways to evaluate their potential for manipulating multi-label content. The tests include classification of the synthesized examples using a classifier pre-trained on real, held-out multi-label data, and retrieval from the held-out test set using feature vectors synthesized by the LaSO networks (Figure 3).

Figure 3: Qualitative results of image retrieval done on synthetic LaSO vectors
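The retrieval test can be illustrated with a nearest-neighbor lookup: a synthesized feature vector serves as the query, and held-out images are ranked by their distance to it. The sketch below assumes Euclidean distance and uses tiny made-up three-dimensional features; the distance metric, the `retrieve` helper and the gallery entries are all hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def retrieve(query, gallery, k=2):
    """Return the names of the k gallery items closest to the query vector."""
    ranked = sorted(gallery, key=lambda name: euclidean(query, gallery[name]))
    return ranked[:k]


# Made-up features of held-out test images.
gallery = {
    "img_sheep": [0.9, 0.1, 0.0],
    "img_dog": [0.1, 0.9, 0.0],
    "img_cat": [0.0, 0.1, 0.9],
}

# Made-up feature vector standing in for a LaSO subtraction output.
query = [0.8, 0.2, 0.1]

print(retrieve(query, gallery, k=2))
```

If the synthesized vector really encodes ‘sheep’ alone, the top-ranked real images should be ones that contain a sheep.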

The LaSO networks are designed to operate directly on the image representations and do not require any additional inputs to control the manipulation. In other words, no human intervention is needed to indicate which labels to manipulate. Hence, they can potentially generalize to images containing novel categories that are unseen during training. In this respect, the LaSO networks can be used for challenging multi-label few-shot classification tasks. In these cases, the LaSO networks synthesize new training samples from random pairs of the provided training samples. In our paper, we apply this capability of the LaSO networks to a novel benchmark for multi-label few-shot classification, which we hope will inspire more work on this important problem. The results of using LaSO networks for data augmentation on the proposed benchmark indicate a strong potential to generalize to novel categories (Figure 4).

Figure 4: LaSO augmentation performance (bottom four rows) vs. baselines (top three rows)
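The augmentation step, synthesizing new samples from random pairs of the few provided ones, can be sketched as a simple loop. In the sketch below the learned LaSO networks are replaced by element-wise stand-ins (max, min and clipped difference) applied to toy multi-hot features, purely so that the example runs. The `augment` function, the stand-in operations and the choice to skip empty label sets are all assumptions for illustration.

```python
import random

VOCAB = ["person", "dog", "cat", "sheep"]

# Stand-ins for the learned networks: (feature operation, label operation).
LASO_OPS = {
    "union": (lambda a, b: [max(x, y) for x, y in zip(a, b)],
              lambda la, lb: la | lb),
    "intersection": (lambda a, b: [min(x, y) for x, y in zip(a, b)],
                     lambda la, lb: la & lb),
    "subtraction": (lambda a, b: [max(x - y, 0) for x, y in zip(a, b)],
                    lambda la, lb: la - lb),
}


def augment(support, laso_ops, num_pairs, seed=0):
    """Synthesize (feature, labels) samples from random pairs of support samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(num_pairs):
        (feat_a, labels_a), (feat_b, labels_b) = rng.sample(support, 2)
        for feature_op, label_op in laso_ops.values():
            labels = label_op(labels_a, labels_b)
            if labels:  # assumption: drop samples whose label set is empty
                synthetic.append((feature_op(feat_a, feat_b), labels))
    return synthetic


def toy_sample(labels):
    """Build a toy sample whose feature is just the multi-hot label vector."""
    return ([1 if name in labels else 0 for name in VOCAB], set(labels))


support = [
    toy_sample(["person", "sheep", "dog"]),
    toy_sample(["person", "dog", "cat"]),
    toy_sample(["cat", "sheep"]),
]

synthetic = augment(support, LASO_OPS, num_pairs=4)
print(len(synthetic), "synthetic samples")
```

The synthesized samples would then be added to the few-shot training set before fitting the classifier for the novel categories.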

Multi-label few-shot classification is a new, challenging and practical task, and we propose the first benchmark for it. The results of evaluating LaSO label-set manipulation with neural networks on the proposed benchmark demonstrate that LaSO holds good potential for this task and possibly for other interesting applications. We hope that this work will inspire more researchers to look into this problem.

LaSO: Label-Set Operations networks for multi-label few-shot learning, Amit Alfassy, Leonid Karlinsky, Amit Aides, Joseph Shtok, Sivan Harary, Rogerio Feris, Raja Giryes and Alex M. Bronstein
