Learning Implicit Generative Models by Matching Perceptual Features


The computer vision community has found great success in using deep convolutional neural networks (DCNNs) pretrained on large datasets to achieve state-of-the-art performance on object detection, style transfer, video recognition, and super-resolution. The features these networks learn, referred to as perceptual features (PFs), are exploited for solving other problems through either fine-tuning or transfer learning. However, there is one sub-problem where the richness of these PFs has not been extensively explored: implicit generative models.

Can we use PFs to learn implicit generative models? This is the question we tried to answer in our work, “Learning Implicit Generative Models by Matching Perceptual Features,” being presented as an oral presentation at ICCV 2019 in Seoul, Korea, on October 31 at 9:18AM (Location: Oral 3.1A, Hall D1). The code is available on GitHub.

In particular, we propose a new moment matching method that learns implicit generative models by matching statistics of PFs extracted from pretrained convolutional neural networks. We call this framework Generative Feature Matching Networks (GFMNs); it learns implicit generative models by matching the mean and covariance statistics extracted from all convolutional layers of a pretrained DCNN.
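The per-layer matching objective can be sketched as follows. This is an illustrative numpy toy, not the paper's implementation: feature arrays stand in for the activations of a pretrained DCNN, and only diagonal (per-feature) variances are matched, as the post describes below.

```python
import numpy as np

def gfmn_loss(real_feats, gen_feats):
    """Feature-matching loss: squared L2 distance between the mean and
    diagonal-covariance statistics of real vs. generated features, summed
    over layers. Each list entry is an (n_samples, n_features) array,
    standing in for one layer of a pretrained extractor."""
    loss = 0.0
    for fr, fg in zip(real_feats, gen_feats):
        mu_r, mu_g = fr.mean(axis=0), fg.mean(axis=0)
        var_r, var_g = fr.var(axis=0), fg.var(axis=0)  # diagonal covariance only
        loss += np.sum((mu_r - mu_g) ** 2) + np.sum((var_r - var_g) ** 2)
    return loss
```

When the generated features have the same statistics as the real ones, the loss is zero; any mismatch in means or variances at any layer increases it.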

Maximum Mean Discrepancy (MMD) based methods capture the difference between two distributions by embedding them into infinite-dimensional feature maps. Defining a kernel (or similarity measure) that discriminates between real and machine-generated samples is challenging. One existing solution uses adversarial training to learn the kernel function online. However, adversarial training involves min-max optimization, which can lead to instability. Our proposed method overcomes the weaknesses of existing min-max strategies in the following ways:

  1. Non-adversarial: avoids the challenges of min-max optimization
  2. Perceptual feature (PF) and fixed feature matching: does not require online learning of kernel functions, but instead leverages the richness of perceptual features and their ability to discriminate between real and machine-generated data
  3. Scalable: uses an ADAM-based moving average, which accommodates smaller minibatch sizes
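For contrast with the fixed-kernel MMD baseline mentioned above, here is a minimal sketch of the squared MMD between two sample sets under a fixed RBF kernel (the quantity that adversarial MMD methods instead try to sharpen by learning the kernel online; the kernel choice and bandwidth here are illustrative assumptions):

```python
import numpy as np

def mmd2_rbf(X, Y, gamma=1.0):
    """Biased estimate of squared MMD between sample sets X and Y under a
    fixed RBF kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()
```

With a poorly chosen fixed kernel this statistic can fail to separate real from generated samples, which is what motivates both adversarial kernel learning and, in our case, replacing the kernel with fixed perceptual features.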

A Closer Look

E = pretrained feature extractor

z_i = noise vector

μ_j^data = feature mean of the real data at layer j

x̃_i = generated image = G(z_i; θ)

For training GFMN, we sample noise vectors from a normal distribution and pass them through a neural network generator. We then extract the PFs of the generated images and try to match their statistics (mean/variance) to the real training data statistics. Due to GPU memory constraints, we match only the diagonal of the covariance instead of the full covariance. The statistics of the training data can be precomputed before training starts.
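The step above can be sketched end to end. Everything here is a toy stand-in: a random frozen linear map plays the role of the pretrained extractor E, and a linear map plays the generator G, whereas GFMN uses a pretrained DCNN and a DCGAN/ResNet generator. The key structural point it illustrates is that the real-data statistics are computed once, before training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a frozen pretrained feature extractor E.
W_E = rng.standard_normal((16, 32))
E = lambda x: np.tanh(x @ W_E)

# Precompute real-data feature statistics once, before training starts.
real_data = rng.standard_normal((1000, 16))
real_feats = E(real_data)
mu_real = real_feats.mean(axis=0)
var_real = real_feats.var(axis=0)            # diagonal covariance only

def matching_loss(theta, z):
    """Match generated-feature statistics to the precomputed real ones.
    The toy generator is G(z; theta) = z @ theta."""
    gen_feats = E(z @ theta)
    mu_g, var_g = gen_feats.mean(axis=0), gen_feats.var(axis=0)
    return np.sum((mu_real - mu_g) ** 2) + np.sum((var_real - var_g) ** 2)

z = rng.standard_normal((64, 8))             # noise minibatch from N(0, I)
theta = rng.standard_normal((8, 16))         # generator parameters to optimize
loss = matching_loss(theta, z)
```

In training, only theta would be updated to minimize this loss; E and the precomputed statistics stay fixed throughout.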

ADAM Moving Average

In order to obtain good estimates of the statistics of the generated images, we need a large minibatch, which is difficult with limited GPU memory. To address this, we apply moving averages (MAs) of the differences between the statistics of real and generated images.

v_j = moving average of the difference of statistics at layer j

During training, we can obtain even better MA estimates by applying the ADAM optimizer to the loss of the MAs.
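A minimal sketch of this idea: treat the running estimate v as a variable and nudge it toward each new per-minibatch statistic difference d with an ADAM step on the squared error 0.5*(v - d)^2. The learning rate and moment constants below are illustrative defaults, not the paper's settings.

```python
import numpy as np

def adam_ma(diffs, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """ADAM-based moving average of a stream of statistic differences.
    Each element of diffs is one minibatch's (real - generated) statistic
    difference; v tracks them via ADAM updates on 0.5*(v - d)^2."""
    v = np.zeros_like(diffs[0])
    m = np.zeros_like(v)   # first-moment (mean of gradients) estimate
    s = np.zeros_like(v)   # second-moment (mean of squared gradients) estimate
    for t, d in enumerate(diffs, start=1):
        g = v - d                          # gradient of 0.5*(v - d)^2 w.r.t. v
        m = b1 * m + (1 - b1) * g
        s = b2 * s + (1 - b2) * g ** 2
        m_hat = m / (1 - b1 ** t)          # bias-corrected moments
        s_hat = s / (1 - b2 ** t)
        v = v - lr * m_hat / (np.sqrt(s_hat) + eps)
    return v
```

Because ADAM normalizes the step by the gradient's running magnitude, the estimate moves smoothly toward the true difference even when individual small-minibatch differences are noisy.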

Types of Feature Extractors

In order to study the impact of the richness of PFs on learning implicit generative models, we tried two main kinds of feature extractors:

  1. PFs from an autoencoder: here, we train an autoencoder whose decoder has an architecture similar to the generator (DCGAN). Once trained, we use the encoder as the feature extractor.

    i) Encoder: DCGAN discriminator / VGG19

    ii) Decoder: DCGAN / ResNet

  2. PFs from classifiers: we use various DCNN models (VGG19, ResNet18) pretrained in a supervised way on large-scale datasets as feature extractors. Due to the nature of the classification task, these features appear to be richer in information than the autoencoder ones.
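Either way, the extractor exposes features from every layer, mirroring GFMN's use of all convolutional layers. The sketch below is a toy stand-in with random ReLU layers; in GFMN the weights come from a pretrained VGG19/ResNet18 or an autoencoder encoder and stay frozen.

```python
import numpy as np

rng = np.random.default_rng(1)

class TinyExtractor:
    """Toy multi-layer feature extractor: features are collected at every
    layer, the way GFMN matches statistics across all convolutional layers
    of a pretrained DCNN. Weights are random here and would be frozen
    pretrained weights in practice."""
    def __init__(self, sizes=(16, 32, 64)):
        dims = (8,) + sizes
        self.W = [rng.standard_normal((a, b)) for a, b in zip(dims, dims[1:])]

    def features(self, x):
        feats = []
        for W in self.W:
            x = np.maximum(x @ W, 0.0)      # ReLU layer
            feats.append(x)                 # keep the features of every layer
        return feats

E = TinyExtractor()
feats = E.features(rng.standard_normal((4, 8)))
```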


We benchmarked GFMN, with either a pretrained autoencoder or a cross-domain classifier as the feature extractor, on the CIFAR10 dataset. The generated images are evaluated against two metrics: Inception Score, IS (higher is better), and Fréchet Inception Distance, FID (lower is better). We get the best performance when we use PFs from both pretrained VGG19 and ResNet18 together with a ResNet-like generator architecture.


Further, we also benchmarked GFMN against various existing adversarial and non-adversarial generative models on CIFAR10 and STL10. GFMN performs comparably to or better than most methods, including the state-of-the-art Spectral Normalization GAN (SN-GAN).

[Benchmark results: STL10 (left) and CIFAR10 (right)]


Research Engineer, IBM Research

Youssef Mroueh

Research Staff Member, IBM Research
