Cognitive Computing

Advancing Visual Recognition and Unlocking Data in Plain Sight

The past decade has been defined by the unprecedented amount of visual content humans generate – from social media, to entertainment and manufacturing, to even the satellites circling Earth high above the buzz of daily life. With recent advancements in cognitive technologies like large-scale deep learning and semantic-facet-based visual modeling, we've started to accelerate our ability to discover insights from this data – but it has still been a challenge to go beyond recognizing a baseline level of detail.

Today, IBM is taking an important step forward in advancing this ability by rolling out a significant update to the image classifier capability in Watson Visual Recognition, a service that allows users to understand the contents of an image or video frame. Its active vocabulary is more than 2.5 times the size of the previous model's, with a built-in set of tens of thousands of visual labels. This enhancement greatly improves the service's ability to recognize highly specific visual concepts.

These new built-in labels cover a broad set of visual concepts that includes objects, people, places, activities, scenes, and many more categories, as well as fine-grained attributes such as specific colors. Each category has also been deepened with more specific labels. The result is a built-in classifier that, for many typical photos, gives labels that are both more specific and more accurate. It augments the description with more general tags based on a hierarchy – such as knowing that a "horse" is an "animal." The service also makes fine distinctions that produce highly specific labels. For example, given a photo of people having an enjoyable dining experience, the service can now recognize that the scene is not just a restaurant but specifically a beer garden, based on its visual appearance.
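To make the idea of hierarchical tagging concrete, here is a minimal sketch of how an application might flatten such a response into specific labels plus their more general parents. The JSON snippet and its field names (`classes`, `class`, `score`, `type_hierarchy`) are illustrative assumptions modeled on this kind of classifier output, not the documented schema of the service.

```python
import json

# Hypothetical classify response for the "beer garden" photo described above.
# Field names are assumptions for illustration only.
sample_response = json.loads("""
{
  "images": [{
    "classifiers": [{
      "classes": [
        {"class": "beer garden", "score": 0.81,
         "type_hierarchy": "/restaurant/beer garden"},
        {"class": "restaurant", "score": 0.89},
        {"class": "people", "score": 0.93}
      ]
    }]
  }]
}
""")

def labels_with_parents(response):
    """Flatten each class into (label, score, list of more-general parents)."""
    out = []
    for image in response["images"]:
        for classifier in image["classifiers"]:
            for c in classifier["classes"]:
                hierarchy = [p for p in c.get("type_hierarchy", "").split("/") if p]
                out.append((c["class"], c["score"], hierarchy[:-1]))
    return out

for label, score, parents in labels_with_parents(sample_response):
    suffix = f" (more generally: {' > '.join(parents)})" if parents else ""
    print(f"{label}: {score:.2f}{suffix}")
```

An application could use the parent labels for broad filtering ("show all food-venue photos") while displaying the most specific label to the user.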

This level of specificity is possible because Visual Recognition now provides on average nine or more labels for each image – up from an average of two to three labels per image in our previous version. We achieved this major step forward by using a very large set of training images from a broad variety of photographic scenes and a distributed network of Graphics Processing Units (GPUs). Watson soaked up all that information into a convolutional neural network with tens of thousands of tags in its vocabulary. We also developed new inference methods that use semantic reasoning to optimize the specificity, saliency, and accuracy of the tags the service produces.

Of course, many enterprises have custom data for which they want to create their own private classifiers, and Watson Visual Recognition also features custom training and classification. When there is a need to learn a new set of image labels for a specific domain, like a company's product portfolio, the service allows developers to quickly train and "plug in" new custom models simply by providing example images. Applications can then use the custom models in conjunction with the base tagging service to get both domain-specific custom-learned labels and the broad set of built-in labels. Custom classifiers can also be improved over time by adding new training examples.
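The training workflow above can be sketched as follows: the service learns a new label from archives of positive and negative example images. This is an offline sketch of preparing that training data; the file names, the example label, and the final upload step are placeholders, not actual API calls.

```python
import io
import zipfile

def build_examples_zip(image_bytes_by_name):
    """Bundle example images into an in-memory zip archive of the kind a
    custom-training call would accept."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in image_bytes_by_name.items():
            zf.writestr(name, data)
    buf.seek(0)
    return buf

# Toy stand-ins for real JPEG bytes of a company's product photos.
positive_examples = build_examples_zip({
    "widget_001.jpg": b"<jpeg bytes>",
    "widget_002.jpg": b"<jpeg bytes>",
})
negative_examples = build_examples_zip({
    "not_widget_001.jpg": b"<jpeg bytes>",
})

# The archives would then be sent to the service's create-classifier
# endpoint together with a classifier name (e.g. "company_products").
# Retraining later with additional archives improves the model over time.
```

The key design point is that developers supply only labeled example images; the service handles model training, so no machine learning expertise is required on the application side.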

This development to Visual Recognition is an important step in our continuous journey of bringing the power of sight to Watson. It builds on a growing foundation of world-class research and development in Visual Comprehension that is breaking new ground on challenges ranging from using image analysis to improve the care of patients with skin cancer, to advancing technology for automatic image captioning, to pushing the boundaries of AI and creativity for making the world’s first cognitive film trailer.

Are you ready to bring the power of Watson’s vision to your images and data? You can learn more about our Visual Recognition service here.
