
Learning Implicit Generative Models by Matching Perceptual Features


The computer vision community has had great success using features from deep convolutional neural networks (DCNNs) pre-trained on large datasets to achieve state-of-the-art performance on object detection, style transfer, video recognition, and super-resolution. These features, referred to as perceptual features (PFs), are exploited to solve other problems through fine-tuning or transfer learning. However, there is one problem where the richness of these PFs has not been thoroughly explored: implicit generative models.

Can we use PFs to learn implicit generative models? This is the question we tried to answer in our work, “Learning Implicit Generative Models by Matching Perceptual Features,” which is being presented as an oral at ICCV 2019 in Seoul, Korea, on October 31 at 9:18 AM (location: Oral 3.1A, Hall D1). The code is available on GitHub: https://github.com/IBM/gfmn.

In particular, we proposed a new moment-matching method that learns implicit generative models by matching statistics of PFs extracted from pre-trained convolutional neural networks. We call this framework Generative Feature Matching Networks (GFMN): it trains a generator by matching the mean and covariance statistics of features extracted from all convolutional layers of a pre-trained DCNN.

Maximum Mean Discrepancy (MMD)-based methods capture the difference between two distributions by embedding them into a (possibly infinite-dimensional) feature space. Defining a kernel (or similarity measure) that discriminates between real and machine-generated samples is challenging. One existing solution uses adversarial training to learn the kernel functions online. However, adversarial training involves min-max optimization, which can lead to instability. Our proposed method overcomes the weaknesses of existing min-max strategies in the following ways:

  1. Non-adversarial: avoids the optimization challenges of min-max training.
  2. Perceptual feature (PF) and fixed feature matching: does not require online learning of kernel functions; instead, it leverages the richness of perceptual features and their ability to discriminate between real and machine-generated data.
  3. Scalable: uses an ADAM-based moving average of the feature statistics, which accommodates small minibatch sizes.
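
To make the contrast concrete, recall the (squared) MMD between the data distribution p and the generator distribution q under a feature map φ induced by a kernel, written here as a reminder rather than in the paper's exact notation:

```latex
\mathrm{MMD}^2(p, q) \;=\;
\big\| \, \mathbb{E}_{x \sim p}[\phi(x)] \;-\; \mathbb{E}_{y \sim q}[\phi(y)] \, \big\|_{\mathcal{H}}^2
```

GFMN replaces the kernel-induced (and, in adversarial MMD methods, learned) feature map φ with fixed perceptual features and matches only their first- and second-order (diagonal) moments, so no kernel has to be learned adversarially.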

A Closer Look

E = pre-trained feature extractor (PF network)

zᵢ = noise vector

μⱼ^(p_data) = mean of the real-data features at layer j of E

x̃ᵢ = G(zᵢ; θ) = generated image
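
With this notation, the GFMN training objective can be written as the following sketch, where Eⱼ(·) denotes the features extracted at layer j of E, σⱼ denotes their per-dimension standard deviation, and the sum runs over the matched layers (see the paper for the precise formulation):

```latex
\min_{\theta} \;\; \sum_{j=1}^{L}
\Big\| \, \mu_j^{p_{\text{data}}} - \mathbb{E}_{z}\big[ E_j(G(z;\theta)) \big] \, \Big\|_2^2
\;+\;
\Big\| \, \sigma_j^{p_{\text{data}}} - \sigma_j\big(G(\cdot;\theta)\big) \, \Big\|_2^2
```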

To train GFMN, we sample noise vectors from a normal distribution and pass them through a neural network generator. We extract the PFs of the generated images and match their statistics (mean and diagonal covariance) with the statistics of the real training data. Because of GPU memory constraints, we match only the diagonal of the covariance rather than the full covariance matrix. The statistics of the training data can be pre-computed once before training starts.
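
As an illustration, here is a minimal PyTorch-style sketch of one GFMN update under these assumptions: a generator G, a frozen feature extractor exposed as extract_features (returning a list of per-layer feature tensors), and pre-computed real-data statistics real_means / real_vars. The helper names are ours for the example; this is not the exact loop from the repository.

```python
import torch

def gfmn_step(G, extract_features, real_means, real_vars, optimizer,
              batch_size=64, nz=100, device="cuda"):
    """One (simplified) GFMN update: match per-layer feature means/variances."""
    G.train()
    optimizer.zero_grad()

    # Sample noise and generate a batch of images.
    z = torch.randn(batch_size, nz, device=device)
    fake = G(z)

    # Per-layer perceptual features of the generated batch (extractor is frozen).
    feats = extract_features(fake)          # list of [batch, d_j] tensors

    loss = 0.0
    for f, mu_real, var_real in zip(feats, real_means, real_vars):
        f = f.view(f.size(0), -1)
        mu_fake = f.mean(dim=0)
        var_fake = f.var(dim=0, unbiased=False)   # diagonal covariance only
        loss = loss + torch.sum((mu_real - mu_fake) ** 2) \
                    + torch.sum((var_real - var_fake) ** 2)

    loss.backward()
    optimizer.step()
    return loss.item()
```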

ADAM Moving Average

To obtain good estimates of the statistics of the generated images, we need a large minibatch, which is difficult with limited GPU memory. To address this, we apply moving averages (MA) of the differences between the statistics of real and generated images.

vⱼ = moving average of the difference between real and generated feature statistics at layer j

During training, we can obtain even better MA estimates by applying the ADAM optimizer to a loss defined on the moving averages themselves.
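
The idea can be sketched as follows: each vⱼ is treated as a variable that is pulled toward the current minibatch's difference of statistics by one ADAM step per update. This is a simplified illustration, not the exact procedure from the paper; how vⱼ then enters the generator's gradient is detailed in the paper.

```python
import torch

class AdamMovingAverage:
    """ADAM-tracked moving average of a per-layer statistics difference (sketch)."""

    def __init__(self, dim, lr=1e-3, betas=(0.9, 0.999), device="cuda"):
        # v_j starts at zero and is updated like a learnable parameter.
        self.v = torch.zeros(dim, device=device, requires_grad=True)
        self.opt = torch.optim.Adam([self.v], lr=lr, betas=betas)

    def update(self, delta):
        """One ADAM step pulling v_j toward delta = (real stats - generated stats)."""
        self.opt.zero_grad()
        tracking_loss = torch.sum((self.v - delta.detach()) ** 2)
        tracking_loss.backward()
        self.opt.step()
        return self.v.detach()
```

Because ADAM adapts the step size per dimension, vⱼ gives a much less noisy estimate of the statistics difference than a single small minibatch would.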

Types of Feature Extractors

To study the impact of the richness of PFs on learning implicit generative models, we tried two main types of feature extractors:

  1. PFs from an autoencoder: We train an autoencoder whose decoder has an architecture similar to the generator (DCGAN-like). Once trained, we use the encoder as the feature extractor.

    i) Encoder: DCGAN discriminator / VGG19

    ii) Decoder: DCGAN / ResNet

  2. PFs from classifiers: We use DCNN models (VGG19, ResNet18) pre-trained in a supervised way on a large-scale dataset and use them as feature extractors. Because of the nature of the classification task, these features tend to be richer in information than those of autoencoders (see the feature-extraction sketch below).
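
For the classifier case, the per-layer PFs can be collected, for instance, with a frozen torchvision VGG19. The code below is a minimal sketch; the choice of layers is illustrative rather than the exact configuration used in the paper.

```python
import torch
import torch.nn as nn
from torchvision import models

def make_vgg19_extractor(device="cuda"):
    """Return a function mapping images to a list of per-layer feature vectors."""
    vgg = models.vgg19(pretrained=True).features.to(device).eval()
    for p in vgg.parameters():
        p.requires_grad_(False)        # the extractor stays frozen

    def extract_features(x):
        feats = []
        h = x
        for layer in vgg:
            h = layer(h)
            if isinstance(layer, nn.Conv2d):          # keep every conv layer's output
                feats.append(h.flatten(start_dim=1))  # [batch, C*H*W]
        return feats

    return extract_features
```

The returned extract_features function can be plugged directly into the training-step sketch above.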

Experiments

We benchmarked GFMN on the CIFAR10 dataset with either a pre-trained autoencoder or cross-domain classifiers as the feature extractor. The generated images are evaluated with two metrics: Inception Score (IS, higher is better) and Fréchet Inception Distance (FID, lower is better). We get the best performance when we use PFs from both a pre-trained VGG19 and ResNet18 together with a ResNet-like generator architecture.


Further, we benchmarked GFMN against various existing adversarial and non-adversarial generative models on CIFAR10 and STL10. GFMN performs comparably to or better than most methods, including the state-of-the-art spectral normalization GAN (SN-GAN).

[Results: STL10 (left) and CIFAR10 (right)]

Research Engineer, IBM Research

Youssef Mroueh

Research Staff Member, IBM Research
