Cognitive Computing

Advancing Visual Recognition and Unlocking Data in Plain Sight

The past decade has been defined by an unprecedented amount of visual content humans have been able to generate – from social media, to entertainment and manufacturing, to even the satellites circling earth from high above the buzz of daily life. With recent advancements in cognitive technologies like large-scale deep learning and semantic-facet based visual modeling, we’ve started to accelerate our ability to discover insights from this data — but it’s still been a challenge to go beyond recognizing a baseline level of detail.

Today, IBM is taking an important step forward in advancing this ability by rolling out a significant update to the image classifier in Watson Visual Recognition, a service that allows users to understand the contents of an image or video frame. Its active vocabulary is more than 2.5 times the size of the previous model's, with a built-in set of tens of thousands of visual labels. This enhancement greatly improves the service's ability to recognize highly specific visual concepts.

These new, built-in labels cover a broad set of visual concepts that includes objects, people, places, activities, scenes, and many more categories, as well as fine-grained attributes such as specific colors. Each category has also been deepened to include more specific labels. The result is a built-in classifier that, for many typical photos, can produce labels that are both more specific and more accurate. It augments the description with more general tags based on a hierarchy, such as knowing that a "horse" is an "animal". The service also makes fine distinctions that produce highly specific labels. For example, given a photo of people having an enjoyable dining experience, the service can now recognize that the scene is not just a restaurant but specifically a beer garden, based on its visual appearance.
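To make the hierarchy idea concrete, here is a minimal sketch of hierarchy-based tag augmentation. The hand-built `LABEL_HIERARCHY` mapping below is a stand-in for the service's internal taxonomy, which is not public; the point is only how a specific tag like "horse" also yields the more general tag "animal".

```python
# Toy parent map standing in for the service's internal label taxonomy.
LABEL_HIERARCHY = {
    "horse": "animal",
    "beer garden": "restaurant",
    "restaurant": "building",
}

def augment_with_ancestors(labels):
    """Add every ancestor of each predicted label, so that a specific
    tag such as 'horse' also produces the general tag 'animal'."""
    tags = set(labels)
    for label in labels:
        current = label
        while current in LABEL_HIERARCHY:
            current = LABEL_HIERARCHY[current]
            tags.add(current)
    return sorted(tags)

print(augment_with_ancestors(["horse", "beer garden"]))
# -> ['animal', 'beer garden', 'building', 'horse', 'restaurant']
```

Walking the parent map until it runs out is what lets one specific prediction fan out into a small set of progressively more general tags.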

This level of specificity is possible because Visual Recognition now provides, on average, nine or more labels for each image, up from an average of two to three labels per image in the previous version. We achieved this major step forward by using a very large set of training images drawn from a broad variety of photographic scenes, together with a distributed network of Graphics Processing Units (GPUs). Watson absorbed all of that information into a convolutional neural network with tens of thousands of tags in its vocabulary. We also developed new inference methods that use semantic reasoning to optimize the specificity, saliency, and accuracy of the tags the service produces.
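The semantic-reasoning step itself is internal to the service, but the final stage of any such pipeline, turning raw classifier scores into a short list of confident tags, can be sketched simply. The threshold and cap values below are illustrative assumptions, not the service's actual parameters.

```python
def select_tags(scored_labels, threshold=0.5, max_tags=9):
    """Keep labels whose classifier score clears a confidence threshold,
    most confident first, capped at max_tags results."""
    kept = [(label, score) for label, score in scored_labels if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:max_tags]

scores = [("beer garden", 0.93), ("people", 0.76),
          ("restaurant", 0.88), ("table", 0.31)]
print(select_tags(scores))
# -> [('beer garden', 0.93), ('restaurant', 0.88), ('people', 0.76)]
```

Low-confidence candidates like "table" are dropped, which is how a classifier with tens of thousands of tags in its vocabulary can still return a focused handful of labels per image.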

Of course, many enterprises have custom data for which they want to create their own private classifiers, and Watson Visual Recognition also features custom training and classification. When there is a need to learn a new set of image labels for a specific domain, such as a company's product portfolio, the service allows developers to quickly train and "plug in" new custom models simply by providing example images. Applications can then use the custom models in conjunction with the base tagging service to obtain both domain-specific custom-learned labels and the broad set of built-in labels. Custom classifiers can also be improved over time by adding new training examples.
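Using custom models "in conjunction with" the base tagger amounts to running both and merging their results. The sketch below illustrates that merge with plain callables standing in for service calls; the model names and the max-score merge rule are assumptions for the example, not the service's documented behavior.

```python
def combined_classify(image_id, builtin_model, custom_models):
    """Run the base tagger plus any custom classifiers on one image
    and merge their (label, score) pairs, keeping the best score
    when the same label appears more than once."""
    results = dict(builtin_model(image_id))
    for model in custom_models:
        for label, score in model(image_id):
            results[label] = max(score, results.get(label, 0.0))
    return results

# Stub models standing in for calls to the service.
builtin = lambda img: [("dog", 0.90), ("animal", 0.95)]
product_model = lambda img: [("acme widget", 0.80), ("dog", 0.97)]

print(combined_classify("photo-1", builtin, [product_model]))
# -> {'dog': 0.97, 'animal': 0.95, 'acme widget': 0.8}
```

An application sees one result set containing both the broad built-in labels and the domain-specific ones the custom model learned from its example images.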

This development to Visual Recognition is an important step in our continuous journey of bringing the power of sight to Watson. It builds on a growing foundation of world-class research and development in Visual Comprehension that is breaking new ground on challenges ranging from using image analysis to improve the care of patients with skin cancer, to advancing technology for automatic image captioning, to pushing the boundaries of AI and creativity for making the world’s first cognitive film trailer.

Are you ready to bring the power of Watson’s vision to your images and data? You can learn more about our Visual Recognition service here.
