September 17, 2019 By Nin Lei 4 min read

In recent years, several fun and popular applications have let users manipulate images of a human face using filters for age, gender, facial expression and the like. With these apps, you can get magazine-cover-quality images from any selfie with just a few clicks. They can transform a face to make it smile, look older or younger, or change gender, with surprisingly convincing results.

The developers of such apps don’t disclose the technology under the hood, but anyone familiar with machine learning can make an educated guess. It’s likely that these applications are based on variations of Generative Adversarial Network (GAN) technology.

So, what is GAN technology? GAN was conceived by Ian Goodfellow and colleagues in 2014 to create fake images that look just like real images. It has vast applicability in model training. Supervised training requires labeled images. For a simple task such as telling a dog from a cat, there are ample pictures on the internet ready for download. But finding labeled images becomes much harder when we tackle anomaly detection problems. Take the example of detecting defects in power line coupling capacitors. They’re in good condition most of the time, so images of defective ones are scarce.

In an ideal world, there would be an equal number of images showing good conditions (negative label) and bad conditions (positive label). In practice, positive-label images often make up 5 percent or less of the data set. You have to wait until a part is broken before you can take a photo and add it to the positive-label library. This could take a long time. Is there a better way to handle this situation? Yes, this is where GAN comes to the rescue, by generating realistic synthetic images of the rare class.
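To get a feel for the size of that gap, here is a small Python sketch. The function name and the example counts are illustrative assumptions, not figures from a real project; the code only works out how many synthetic positive images would be needed to reach a chosen class balance.

```python
import math


def synthetic_needed(n_negative: int, n_positive: int,
                     target_positive_share: float = 0.5) -> int:
    """Return how many synthetic positive images are needed so that
    positives make up `target_positive_share` of the data set."""
    if not 0.0 < target_positive_share < 1.0:
        raise ValueError("target_positive_share must be between 0 and 1")
    total = n_negative + n_positive
    # Solve (n_positive + x) / (total + x) = share for x.
    needed = (target_positive_share * total - n_positive) / (1.0 - target_positive_share)
    return max(0, math.ceil(needed))


if __name__ == "__main__":
    # Hypothetical library: 950 "good" photos and 50 "defective" ones (5 percent).
    print(synthetic_needed(950, 50))  # 900 synthetic images for a 50/50 split
```

With a 5 percent positive rate, the synthetic images would outnumber the real positive examples many times over, which is why the quality of the generated images matters so much.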

GAN uses two neural networks in its architecture. The objective of the generator network is to create fake output. It takes random noise as input and produces output that is as similar as possible to real output. If we use counterfeit money as an example, the generator is the counterfeiter, trying to produce output that looks like real money.

The discriminator network acts as the cop. It is trained with real money images so that it has a good understanding of how they look. The fake images from the generator are also fed to the discriminator. Early in training, it has no trouble telling real from fake. It also gives the generator feedback on how well it is performing. Guided by this feedback, the generator adjusts its approach to produce more authentic output in the next iteration. Over time, its output gets better and better, and ideally training reaches an equilibrium where the discriminator can no longer distinguish the generator’s fake images from real ones.
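The back-and-forth between the two networks can be sketched in a few lines of plain Python. This is a deliberately tiny toy, not how a production image GAN is built: the "images" are single numbers drawn around 4.0, the generator is a linear function of noise, the discriminator is a logistic classifier, and all names, learning rates and distributions are assumptions made for illustration. The generator update uses the commonly used "non-saturating" loss, which pushes the discriminator's score on fake samples toward 1.

```python
import math
import random


def sigmoid(s: float) -> float:
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def discriminator_step(w, c, real, fake, lr):
    """One gradient step on -[log D(real) + log(1 - D(fake))],
    where D(x) = sigmoid(w * x + c)."""
    grad_w = grad_c = 0.0
    for x in real:
        err = sigmoid(w * x + c) - 1.0  # pushes D(real) toward 1
        grad_w += err * x
        grad_c += err
    for x in fake:
        err = sigmoid(w * x + c)        # pushes D(fake) toward 0
        grad_w += err * x
        grad_c += err
    n = len(real) + len(fake)
    return w - lr * grad_w / n, c - lr * grad_c / n


def generator_step(a, b, w, c, noise, lr):
    """One gradient step on -log D(G(z)), where G(z) = a * z + b.
    The discriminator's weights are the generator's only feedback."""
    grad_a = grad_b = 0.0
    for z in noise:
        x = a * z + b
        err = (sigmoid(w * x + c) - 1.0) * w  # d(-log D(x)) / dx
        grad_a += err * z
        grad_b += err
    n = len(noise)
    return a - lr * grad_a / n, b - lr * grad_b / n


def train(steps=2000, batch=32, lr=0.05, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters
    w, c = 0.0, 0.0  # discriminator parameters
    for _ in range(steps):
        real = [rng.gauss(4.0, 0.5) for _ in range(batch)]  # "real money"
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [a * z + b for z in noise]                   # "counterfeits"
        w, c = discriminator_step(w, c, real, fake, lr)
        a, b = generator_step(a, b, w, c, noise, lr)
    return a, b, w, c


if __name__ == "__main__":
    a, b, w, c = train()
    print(f"generator: G(z) = {a:.2f} * z + {b:.2f}")
    print(f"discriminator: D(x) = sigmoid({w:.2f} * x + {c:.2f})")
```

A toy this small may oscillate around the real data rather than settle neatly, so treat it as an illustration of the feedback loop only. Real GANs replace both functions with deep networks and rely on a framework's automatic differentiation instead of hand-written gradients.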

Traditional data augmentation techniques rely on image-processing libraries such as OpenCV. Images can be rotated, stretched, flipped, blurred and otherwise transformed. This increases the volume of training data with the goal of improving model accuracy. GAN takes data augmentation to the next level. GAN-generated images look very similar to the originals but are genuinely new samples, which makes them more suitable for model training. GAN creates new patterns for a model to learn, whereas OpenCV-based augmentation simply applies mathematical transformations to the original pictures and leaves the underlying patterns unchanged.
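To make the limitation of transformation-based augmentation concrete, here is a minimal sketch that uses plain Python lists as a stand-in for image arrays. In practice these operations would be done with OpenCV calls such as cv2.flip and cv2.rotate; the helper names below are hypothetical and chosen only for this illustration.

```python
from typing import List

Image = List[List[int]]  # a grayscale image as rows of pixel values


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img: Image) -> Image:
    """Mirror the image top to bottom."""
    return [list(row) for row in img[::-1]]


def rotate_90_clockwise(img: Image) -> Image:
    """Rotate the image a quarter turn clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def augment(img: Image) -> List[Image]:
    """Return the transformed variants of one image."""
    return [flip_horizontal(img), flip_vertical(img), rotate_90_clockwise(img)]


if __name__ == "__main__":
    original = [[1, 2],
                [3, 4]]
    for variant in augment(original):
        pixels = sorted(p for row in variant for p in row)
        # Every variant contains exactly the original pixel values, rearranged.
        print(variant, pixels == [1, 2, 3, 4])
```

Each variant is a rearrangement of the same pixels, so the model sees the same content from a different angle rather than a new example.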

Another interesting use case is the creation of high-resolution images from low-resolution ones. Photo editing tools such as Photoshop have long supported upscaling of photos. Conventional upscaling takes the pixels of a lower-resolution photo and fills in the extra pixels by repeating or interpolating them to form a higher-resolution image. No new data is created in this process, and the enlarged photo becomes blurry. By contrast, a variant of GAN called SRGAN (super-resolution GAN) has been developed to upscale photos while maintaining sharpness. This GAN adds new data to the high-resolution image based on the patterns it learned during training, and that added detail is what keeps the photo sharp. So the familiar scene of an FBI agent in a TV show using a computer to make a fuzzy picture sharper and bigger is no longer science fiction. GAN is at play here.
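The "no new data" point can be seen in a minimal sketch of the simplest conventional method, nearest-neighbor upscaling, written here with plain Python lists. This is an illustrative assumption about how basic upscaling works, not a description of any particular photo editor, which would typically use smoother interpolation.

```python
from typing import List

Image = List[List[int]]  # a grayscale image as rows of pixel values


def upscale_nearest(img: Image, factor: int) -> Image:
    """Enlarge an image by repeating each pixel `factor` times
    horizontally and vertically (nearest-neighbor upscaling)."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    out: Image = []
    for row in img:
        wide = [p for p in row for _ in range(factor)]   # stretch the row
        out.extend(list(wide) for _ in range(factor))    # repeat the row
    return out


if __name__ == "__main__":
    small = [[1, 2],
             [3, 4]]
    big = upscale_nearest(small, 2)
    for row in big:
        print(row)
    # The enlarged image contains only values that were already present.
    print({p for row in big for p in row} == {p for row in small for p in row})
```

The output has four times as many pixels but no detail that was not already in the input. A super-resolution model instead predicts plausible new detail from patterns it learned on training images.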

Photo enthusiasts are not the only group who benefit from this development. One IBM client uses this technique to create high-resolution images of live organisms. While they would prefer to take high-resolution photos of the organisms directly in their research facility, doing so would kill them. Instead, they use low-resolution imaging devices, which keep the organisms alive, and then use SRGAN to create high-resolution versions of the photos for model training.

Currently, most GAN use cases center on image manipulation. There’s active research to expand its applicability to other kinds of data. We have only scratched the surface of the true potential of GAN, and more creative use cases will come. GAN also plays a central role in deepfakes, which have generated so much buzz. The following picture shows a face generated by GAN. It’s not a real person!

Source: “Progressive Growing of GANs for Improved Quality, Stability, and Variation” by Tero Karras, Timo Aila, Samuli Laine and Jaakko Lehtinen. Image used by permission of NVIDIA

Training GANs requires a significant amount of computing resources, which is a sweet spot for IBM Power Systems servers. Equipped with GPUs, NVLink technology and large memory support, Power Systems are well suited for AI workloads. They can help reduce model training time from days to hours or minutes.

Let one of our IBM Systems Client Experience Centers help by providing informational briefings, consulting sessions, workshops, demos, benchmarks, technical training and product documentation, including IBM Redbooks.

To learn more about the Client Experience Center offerings, visit our website or email us.

