Generative Adversarial Network technology: AI goes mainstream

In recent years, several fun and popular apps have let users manipulate photos of a human face with filters for age, gender, facial expression and the like. With just a few clicks, you can turn any selfie into a magazine-cover-quality image. These apps can make a face smile, look older or younger, or change gender, with surprisingly convincing results.

The developers of such apps don’t disclose the technology under the hood, but anyone familiar with machine learning can make an educated guess: these applications are most likely built on variations of Generative Adversarial Network (GAN) technology.

So, what is GAN technology? GANs were introduced in 2014 by Ian Goodfellow and colleagues as a way to generate fake images that look just like real ones. One area where they are especially useful is generating data for model training. Supervised training requires labeled images. For a simple task such as telling dogs from cats, there are ample pictures on the internet ready for download. Finding labeled images becomes much harder, however, for anomaly detection problems. Take the example of detecting defects in power line coupling capacitors: they are in good condition most of the time, so images of defective parts are scarce.

Ideally, a training set would contain equal numbers of images showing good condition (negative label) and defects (positive label). In practice, positive-label images make up 5 percent or less of the total. You have to wait until a part breaks before you can photograph it and add the image to the positive-label library, which can take a long time. Is there a better way to handle this situation? Yes: this is where GANs come to the rescue, by generating realistic synthetic images of defective parts to supplement the few real ones.
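To put the imbalance in numbers, here is a minimal sketch for a hypothetical set of 10,000 capacitor images of which 5 percent show defects. The dataset size and the helper name `synthetic_positives_needed` are illustrative assumptions, not figures from a real project. The function counts how many extra positive images, real or GAN-generated, would be needed for an even split.

```python
def synthetic_positives_needed(total_images: int, positive_percent: int) -> int:
    """Extra positive-label images needed for a 50/50 split (hypothetical helper)."""
    positives = total_images * positive_percent // 100
    negatives = total_images - positives
    # Balance is reached when positives (real + synthetic) equal negatives.
    return max(negatives - positives, 0)


# Hypothetical dataset: 10,000 images, 5 percent showing defects.
print(synthetic_positives_needed(10_000, 5))  # 9000
```

Under these assumptions, the 500 real defect images would need 9,000 synthetic companions, or 18 generated images for every real one, which is why a generator that produces convincing defects is so valuable.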

A GAN consists of two neural networks. The first, the generator, creates fake output: it takes random noise as input and tries to produce output that is as similar as possible to real data. In a counterfeit-money analogy, the generator is the counterfeiter, trying to print bills that look like real money.

The second network, the discriminator, acts as the cop. It is trained on images of real money so that it learns what real bills look like, and it is also fed the fake images from the generator. Early in training, it has no trouble telling real from fake. The discriminator’s verdicts also act as feedback to the generator, indicating how convincing its output is. Guided by this feedback, the generator adjusts its approach to produce more authentic output in the next iteration. The two networks are trained in alternation, and over time the generator’s output gets better and better until, ideally, training reaches an equilibrium in which the discriminator can no longer distinguish the generator’s fakes from real images.
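The adversarial loop can be illustrated with a deliberately tiny, one-dimensional toy in plain Python instead of deep networks. Everything in this sketch is an illustrative assumption rather than a production implementation: the "real" data are numbers drawn from a normal distribution with mean 2 and standard deviation 1; the generator has a single learnable parameter `b` and outputs `z + b` for standard normal noise `z`; and the discriminator is a logistic regression `D(x) = sigmoid(w * x + c)`. The players are updated in alternation: the discriminator takes a gradient step on the usual binary cross-entropy loss, and the generator takes a step on the commonly used non-saturating loss, -log D(G(z)). The learning rates and step count were picked only to make this toy settle.

```python
import math
import random
from typing import Tuple


def sigmoid(u: float) -> float:
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def train_toy_gan(steps: int = 3000, batch: int = 32, seed: int = 0) -> Tuple[float, float, float]:
    """Train a 1-D toy GAN and return the learned (b, w, c)."""
    rng = random.Random(seed)
    real_mean = 2.0           # "real" data ~ N(2, 1): a made-up toy distribution
    b = 0.0                   # generator G(z) = z + b, z ~ N(0, 1); starts far from the real data
    w, c = 0.0, 0.0           # discriminator D(x) = sigmoid(w * x + c)
    lr_d, lr_g = 0.1, 0.01    # assumed rates: the discriminator learns faster than the generator

    for _ in range(steps):
        # Discriminator step: minimize mean(-log D(real)) + mean(-log(1 - D(fake))).
        real = [rng.gauss(real_mean, 1.0) for _ in range(batch)]
        fake = [rng.gauss(0.0, 1.0) + b for _ in range(batch)]
        grad_w = grad_c = 0.0
        for x in real:
            d = sigmoid(w * x + c)
            grad_w -= (1.0 - d) * x
            grad_c -= 1.0 - d
        for x in fake:
            d = sigmoid(w * x + c)
            grad_w += d * x
            grad_c += d
        w -= lr_d * grad_w / batch
        c -= lr_d * grad_c / batch

        # Generator step: minimize the non-saturating loss mean(-log D(G(z))).
        grad_b = 0.0
        for _ in range(batch):
            d = sigmoid(w * (rng.gauss(0.0, 1.0) + b) + c)
            grad_b -= (1.0 - d) * w   # derivative of -log D(z + b) with respect to b
        b -= lr_g * grad_b / batch

    return b, w, c


if __name__ == "__main__":
    b, w, c = train_toy_gan()
    print(f"learned shift b = {b:.3f} (real data mean is 2.0)")
```

Because the generator’s output is unit noise shifted by `b`, matching the real distribution here means learning `b` close to 2, at which point the best the discriminator can do is output about 0.5 for every input. Real GANs replace both players with deep networks trained by backpropagation, and their training is far less well-behaved than this toy.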

Traditional data augmentation techniques rely on image-processing functions such as those in OpenCV: images can be rotated, stretched, flipped, blurred or otherwise transformed. This increases the volume of training data, with the goal of improving model accuracy. GANs take data augmentation to the next level. GAN-generated images look very similar to the original images, which makes them well suited for model training. More importantly, a GAN creates new patterns for a model to learn, whereas OpenCV-based augmentation simply applies mathematical transformations to the original pictures, leaving the underlying patterns unchanged.
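To make the contrast concrete, the sketch below implements two classic transformations, a horizontal flip and a 90-degree rotation, in plain Python on a tiny grayscale image stored as a list of rows. In OpenCV the equivalent calls would be `cv2.flip(img, 1)` and `cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE)`; the pure-Python versions and the helper names are illustrative assumptions chosen to keep the example dependency-free. The check at the end confirms that every augmented copy contains exactly the same pixel values as the original, only rearranged.

```python
from typing import List

Image = List[List[int]]  # grayscale image: a list of rows of pixel intensities


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right (analogous to cv2.flip(img, 1))."""
    return [row[::-1] for row in img]


def rotate_90_clockwise(img: Image) -> Image:
    """Rotate 90 degrees clockwise (analogous to cv2.ROTATE_90_CLOCKWISE)."""
    return [list(row) for row in zip(*img[::-1])]


def augment(img: Image) -> List[Image]:
    """Return the original plus flipped and rotated copies."""
    rotated = rotate_90_clockwise(img)
    return [img, flip_horizontal(img), rotated, flip_horizontal(rotated)]


sample = [[0, 10, 20],
          [30, 40, 50]]
for variant in augment(sample):
    # Same pixel values every time, just in new positions.
    assert sorted(v for row in variant for v in row) == [0, 10, 20, 30, 40, 50]
print(rotate_90_clockwise(sample))  # [[30, 0], [40, 10], [50, 20]]
```

One image becomes four training images, but none of them contains a pixel value, or a pattern, that was not already in the original. A GAN, by contrast, produces new images drawn from what it has learned about the whole training set.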

Another interesting use case is creating high-resolution images from low-resolution ones. Photo editing tools such as Photoshop have long supported upscaling, but conventional upscaling only fills the extra pixels by repeating or interpolating the existing ones. No new information is created in the process, so the enlarged photo looks blurry. By contrast, SRGAN (super-resolution GAN), a GAN variant, was developed to upscale photos while keeping them sharp. It adds plausible new detail to the high-resolution image based on patterns it learned during training, and that added detail is what makes the result sharper. So the TV-show scene of an FBI agent using a computer to make a fuzzy picture bigger and sharper is no longer pure science fiction: GANs are at play here.
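The conventional upscaling described above can be sketched as nearest-neighbor enlargement, which simply repeats each pixel. The function name is a hypothetical illustration, and real photo editors also offer smoother interpolation methods such as bicubic, which blend existing pixel values but likewise add no new detail. The sketch makes the "no new data" point visible: the enlarged image has four times as many pixels but no value that was not already in the original.

```python
from typing import List

Image = List[List[int]]  # grayscale image: a list of rows of pixel intensities


def upscale_nearest(img: Image, factor: int) -> Image:
    """Enlarge by repeating every pixel factor x factor times (nearest neighbor)."""
    out: Image = []
    for row in img:
        wide = [p for p in row for _ in range(factor)]  # repeat each pixel horizontally
        out.extend(list(wide) for _ in range(factor))   # repeat each row vertically
    return out


small = [[1, 2],
         [3, 4]]
big = upscale_nearest(small, 2)
print(big)  # [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
# The larger image holds exactly the same set of values as the smaller one.
print({v for row in big for v in row} == {v for row in small for v in row})  # True
```

A super-resolution GAN replaces the "copy the neighbor" rule with a learned generator that predicts what the missing fine detail probably looked like.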

Photo enthusiasts are not the only ones who benefit from this development. One IBM client uses this technique to create high-resolution images of live organisms. While they would prefer to take high-resolution photos of the organisms directly in their research facility, doing so would kill them. Instead, they use low-resolution imaging devices that keep the organisms alive, and then use SRGAN to create high-resolution versions of the photos for model training.

Currently, most use cases center on image manipulation, but there is active research into applying GANs to other types of data. We have only scratched the surface of what GANs can do, and more creative use cases will follow. By the way, amid all the buzz about deepfakes, it’s worth noting that GANs play a central role in their development. The following picture shows a face generated by a GAN. It’s not a real person!

Source: “Progressive Growing of GANs for Improved Quality, Stability, and Variation” by Tero Karras, Timo Aila, Samuli Laine and Jaakko Lehtinen. Image used by permission of NVIDIA

Training GANs requires a significant amount of computing resources, which plays to the strengths of IBM Power Systems servers. Equipped with GPUs, NVLink technology and large memory support, Power Systems servers are well suited for AI workloads and can reduce model training time from days to hours or even minutes.

Let one of our IBM Systems Client Experience Centers help by providing informational briefings, consulting sessions, workshops, demos, benchmarks, technical training and product documentation, including IBM Redbooks.

To learn more about the Client Experience Center offerings, visit our website or email us.