September 17, 2019 By Nin Lei 4 min read

Several fun and popular applications in recent years let users manipulate images of a human face with filters for age, gender, facial expression and the like. With these apps, you can turn any selfie into a magazine-cover-quality image in just a few clicks. They can transform a face to make it smile, look older or younger, or change gender, with surprisingly convincing results.

The developers of such apps don’t disclose the technology under the hood, but anyone familiar with machine learning can make a good guess. These applications are likely based on variations of Generative Adversarial Network (GAN) technology.

So, what is GAN technology? GAN was conceived by Ian Goodfellow to create fake images that look just like real images. This makes it widely applicable in model training, because supervised training requires labeled images. For a simple task such as telling a dog from a cat, there are ample pictures on the internet ready for download. But locating labeled images becomes much harder when we tackle anomaly detection problems. Take the example of detecting defects in power line coupling capacitors. They’re in good condition most of the time, so images of defective parts are scarce.

Ideally, there would be an equal number of images showing good conditions (negative label) and bad conditions (positive label). In practice, positive-label images make up 5 percent or less of the data. You have to wait until a part breaks before you can take a photo and add it to the positive-label library, and this could take a long time. Is there a better way to handle this situation? Yes, and this is where GAN comes to the rescue.
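To put the imbalance in numbers, here is a small sketch. The function name `synthetic_needed` and the image counts are hypothetical, chosen only to match the 5 percent figure above; they don't come from a real data set.

```python
def synthetic_needed(n_negative, n_positive):
    """Number of extra positive-label images required to match the negatives."""
    return max(0, n_negative - n_positive)


# Hypothetical library of 1,000 photos in which 5 percent show a defect
n_positive = 50
n_negative = 950
print(synthetic_needed(n_negative, n_positive))  # 900
```

In this illustrative case, the positive-label library would have to grow to 19 times its size before the two classes are balanced.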

GAN uses two neural networks in its architecture. The first, the generator network, creates fake output. It takes random noise as input and produces output that is as similar as possible to real output. In a counterfeit money analogy, the generator is the counterfeiter trying to produce bills that look like real money.

The discriminator network acts as the cop. It is trained with images of real money so that it has a good understanding of how they look. The fake images from the generator are also fed to the discriminator. Early in training, the discriminator has no trouble distinguishing real from fake. It also gives the generator feedback on how convincing its output is. Guided by this feedback, the generator adjusts its approach to produce more authentic output in the next iteration. Over time, the generator’s output gets better and better, and eventually training reaches an equilibrium where the discriminator can no longer tell the generator’s fake images from real ones.
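The back-and-forth between the two networks can be sketched in a few lines of plain Python. This is a deliberately tiny illustration, not a production GAN: the "images" are single numbers, the generator and the discriminator each have only two parameters, and every name and value here (`train`, the learning rate, the real data distribution) is an assumption made for the example.

```python
import math
import random


def sigmoid(t):
    # Numerically safe logistic function
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def generate(gen, z):
    # Generator: turns random noise z into a fake sample
    return gen["a"] * z + gen["b"]


def discriminate(disc, x):
    # Discriminator: estimated probability that x is real
    return sigmoid(disc["w"] * x + disc["c"])


def discriminator_step(disc, x_real, x_fake, lr):
    # One gradient descent step on -log D(x_real) - log(1 - D(x_fake))
    d_real = discriminate(disc, x_real)
    d_fake = discriminate(disc, x_fake)
    grad_w = -(1.0 - d_real) * x_real + d_fake * x_fake
    grad_c = -(1.0 - d_real) + d_fake
    disc["w"] -= lr * grad_w
    disc["c"] -= lr * grad_c


def generator_step(gen, disc, z, lr):
    # One gradient descent step on -log D(G(z)).
    # The gradient flows through the discriminator: this is its feedback.
    d_fake = discriminate(disc, generate(gen, z))
    grad_x = -(1.0 - d_fake) * disc["w"]
    gen["a"] -= lr * grad_x * z
    gen["b"] -= lr * grad_x


def train(steps=2000, lr=0.01, real_mean=4.0, real_std=0.5, seed=0):
    rng = random.Random(seed)
    gen = {"a": 1.0, "b": 0.0}    # starts out producing samples around 0
    disc = {"w": 0.0, "c": 0.0}   # starts out undecided (always 0.5)
    for _ in range(steps):
        x_real = rng.gauss(real_mean, real_std)
        x_fake = generate(gen, rng.gauss(0.0, 1.0))
        discriminator_step(disc, x_real, x_fake, lr)
        generator_step(gen, disc, rng.gauss(0.0, 1.0), lr)
    return gen, disc


if __name__ == "__main__":
    gen, disc = train()
    print("generator parameters:", gen)
    print("discriminator parameters:", disc)
```

Real GANs replace these two-parameter functions with deep neural networks and train on batches of images, but the alternating pattern is the same: one step teaches the discriminator to separate real from fake, and the next step uses the discriminator's gradient to nudge the generator.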

Traditional data augmentation techniques rely on image-processing functions such as those in OpenCV. Images can be rotated, stretched, flipped, blurred and otherwise transformed to increase the volume of training data, with the goal of improving model accuracy. GAN takes data augmentation to the next level. GAN-generated images look very similar to the original images yet contain new patterns for a model to learn, which makes them more suitable for model training. OpenCV-based augmentation, by contrast, simply applies mathematical transformations to the original pictures; the underlying patterns have not changed.
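As a rough illustration of why traditional augmentation adds no new patterns, the sketch below flips and rotates a tiny grayscale "image" stored as nested lists. It uses plain Python as a stand-in for the equivalent OpenCV operations, and the helper names are made up for this example.

```python
def flip_horizontal(img):
    # Mirror each row left to right
    return [row[::-1] for row in img]


def rotate_90_clockwise(img):
    # Reverse the row order, then transpose
    return [list(col) for col in zip(*img[::-1])]


def pixel_values(img):
    # Sorted list of every pixel value in the image
    return sorted(p for row in img for p in row)


image = [
    [1, 2, 3],
    [4, 5, 6],
]

flipped = flip_horizontal(image)      # [[3, 2, 1], [6, 5, 4]]
rotated = rotate_90_clockwise(image)  # [[4, 1], [5, 2], [6, 3]]

# Both augmented images contain exactly the same pixels as the original
print(pixel_values(flipped) == pixel_values(image))  # True
print(pixel_values(rotated) == pixel_values(image))  # True
```

The pixels are rearranged, but no pixel value appears that was not already in the original picture.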

Another interesting use case is the creation of high-resolution images from low-resolution ones. Photo editing tools such as Photoshop have long supported upscaling of photos. Conventional upscaling takes the pixels from a lower-resolution photo and adds more of the same pixels to form a higher-resolution image. No new data is created in this process, and the enlarged photo becomes blurry. In contrast, SRGAN (super-resolution GAN), a variant of GAN, has been developed to upscale photos while maintaining sharpness. It adds new data to the high-resolution image based on the patterns it learned during training, and the additional data leads to a sharper photo. So the familiar TV scene of an FBI agent using a computer to make a fuzzy picture look sharper and bigger is no longer science fiction. GAN is at play here.
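The "more of the same pixels" idea corresponds to nearest-neighbor upscaling, the simplest conventional method. The sketch below is an assumption-level illustration in plain Python (the function name `upscale_nearest` is invented here); photo editors also offer smoother interpolation modes, which blend neighboring pixels rather than learning new detail.

```python
def upscale_nearest(img, factor):
    """Enlarge a grayscale image by repeating each pixel factor x factor times."""
    out = []
    for row in img:
        wide_row = [p for p in row for _ in range(factor)]
        for _ in range(factor):
            out.append(list(wide_row))
    return out


small = [
    [10, 200],
    [60, 120],
]

large = upscale_nearest(small, 2)
for row in large:
    print(row)
# [10, 10, 200, 200]
# [10, 10, 200, 200]
# [60, 60, 120, 120]
# [60, 60, 120, 120]

# The enlarged image holds four times as many pixels but no new values
print({p for row in large for p in row} == {p for row in small for p in row})  # True
```

A super-resolution model differs precisely at this point: instead of copying existing values, it predicts plausible new pixel values from patterns learned during training.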

Photo enthusiasts are not the only group who benefit from this development. One IBM client uses this technique to create high resolution live organism images. While they would prefer to take high resolution photos of live organisms directly in their research facility, doing so would kill the organisms. Instead, they choose to use low resolution photo devices, keeping the organisms alive, and then use SRGAN to create high resolution versions of the photos for model training.

Currently, most GAN use cases center on image manipulation, and there is active research to extend the technique to other kinds of data. We have only scratched the surface of GAN’s true potential, and more creative use cases will come. GAN also plays a central role in the development of deepfakes, which have generated so much buzz. The following picture shows a face generated by GAN. It’s not a real person!

Source: “Progressive Growing of GANs for Improved Quality, Stability, and Variation” by Tero Karras, Timo Aila, Samuli Laine and Jaakko Lehtinen. Image used by permission of NVIDIA

Training GANs requires a significant amount of computing resources, which is a sweet spot for IBM Power Systems servers. Equipped with GPUs, NVLink technology and large memory support, Power Systems servers are well suited for AI workloads. They can reduce model training time from days to hours or minutes.

Let one of our IBM Systems Client Experience Centers help by providing informational briefings, consulting sessions, workshops, demos, benchmarks, technical training and product documentation, including IBM Redbooks.

To learn more about the Client Experience Center offerings, visit our website or email us.

 
