Best practices for custom classifiers in Watson Visual Recognition


Since the launch of the Visual Recognition API this past May, we’ve seen users help California save water, perform infrastructure inspections with drones, and even find Pokemon. Powering many of these use cases are custom classifiers, a feature within Visual Recognition that allows users to train Watson on almost any visual content.

Custom classifiers are powerful, but they require careful attention to training data and content to perform well. Based on our user conversations, we've assembled the best practices guide below to help you get the most out of your custom classifiers.

How training can increase Watson Visual Recognition’s quality

The accuracy you will see from your custom classifier depends directly on the quality of the training you perform. Clients in the past who closely controlled their training processes have observed greater than 98% accuracy for their use cases. Accuracy – different from confidence score – is based on a ground truth for a particular classification problem and particular data set.


As a best practice, clients often create a ground truth to benchmark against human classification. Note that humans themselves often make classification mistakes due to fatigue, distraction, carelessness, or other limits of the human condition.
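One way to benchmark against a ground truth is to compare the classifier's top label for each test image with the human-assigned label and compute the fraction that match. A minimal sketch (the image names and class labels below are illustrative, not from a real data set):

```python
def accuracy(predictions, ground_truth):
    """Fraction of test images whose predicted class matches the
    human-assigned ground-truth class."""
    if not ground_truth:
        raise ValueError("ground truth is empty")
    correct = sum(
        1 for image, label in ground_truth.items()
        if predictions.get(image) == label
    )
    return correct / len(ground_truth)

# Illustrative labels: two images classified correctly, one missed.
truth = {"img1.jpg": "irrigated", "img2.jpg": "dry", "img3.jpg": "irrigated"}
preds = {"img1.jpg": "irrigated", "img2.jpg": "dry", "img3.jpg": "dry"}
print(accuracy(preds, truth))  # 0.666...
```

Accuracy computed this way is specific to your classification problem and your test set; the same classifier can score very differently on a different ground truth.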

On a basic level, images in training and testing sets should resemble each other. Significant visual differences between training and testing groups will result in poor performance results.

There are a number of additional factors that will impact the quality of your training beyond the resolution of your images. Lighting, angle, focus, color, shape, distance from subject, and presence of other objects in the image will all impact your training. Please note that Watson takes a holistic approach when being trained on each image. While it will evaluate all of the elements listed above, it cannot be tasked to exclusively consider a specific element.

The API will accept as few as 10 images per class, but we strongly recommend using a significantly larger set, in the hundreds or thousands of images, to improve the performance and accuracy of your classifier.
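Before zipping up your training sets, it can help to verify that every class clears a comfortable margin above the 10-image minimum. A minimal local check (the class names and the 100-image recommended floor are illustrative assumptions):

```python
MIN_REQUIRED = 10    # hard API minimum per class
RECOMMENDED = 100    # a more realistic floor for good accuracy

def undertrained_classes(class_images, floor=RECOMMENDED):
    """Return {class_name: count} for classes whose training-image
    count falls below `floor`."""
    return {name: len(images)
            for name, images in class_images.items()
            if len(images) < floor}

# Illustrative training sets: one class is well below the floor.
classes = {"golden_retriever": ["dog_%03d.jpg" % i for i in range(150)],
           "husky": ["husky_%02d.jpg" % i for i in range(40)]}
print(undertrained_classes(classes))  # {'husky': 40}
```

Running this before training surfaces thin classes early, when it is still cheap to gather more images.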

What is the score that I see for each tag?

Each returned tag will include a confidence score between 0 and 1. This number does not represent a percentage of accuracy; it indicates Watson's confidence in the returned classification based on the training data for that classifier. The API returns scores for all classes in the classifier, but you can adjust the threshold so that only results above a certain confidence score are returned.
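If you retrieve all scores and want to apply your own cutoff client-side, the filtering amounts to the sketch below (the response structure here is a simplified stand-in for the API's JSON, not its exact shape):

```python
def filter_by_threshold(classes, threshold=0.5):
    """Keep only classes whose confidence score clears the threshold,
    ordered highest score first."""
    kept = [c for c in classes if c["score"] >= threshold]
    return sorted(kept, key=lambda c: c["score"], reverse=True)

# Simplified stand-in for one image's per-class results.
scores = [{"class": "apples", "score": 0.93},
          {"class": "oranges", "score": 0.41},
          {"class": "peaches", "score": 0.07}]
print(filter_by_threshold(scores, threshold=0.4))
```

Raising the threshold trades recall for precision: fewer tags come back, but each one carries higher confidence.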

Custom classifier scores can be compared to one another to gauge relative likelihood, but in practice a score should be weighed against the cost or benefit of acting on a right or wrong classification, with a threshold for action chosen accordingly. Be aware that the nature of these numbers may change as we make changes to our system; we will communicate such changes as they occur.
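One simple way to turn that cost/benefit weighing into a concrete threshold: act only when the expected benefit of a correct classification exceeds the expected cost of a wrong one. Under that model (the dollar figures below are made-up illustrations), the break-even confidence is cost / (cost + benefit):

```python
def action_threshold(benefit_if_right, cost_if_wrong):
    """Break-even confidence: acting pays off when
    score * benefit_if_right > (1 - score) * cost_if_wrong."""
    return cost_if_wrong / (cost_if_wrong + benefit_if_right)

def should_act(score, benefit_if_right, cost_if_wrong):
    """Decide whether a classification's score justifies acting on it."""
    return score > action_threshold(benefit_if_right, cost_if_wrong)

# Illustrative economics: a correct call is worth $30, a wrong one
# costs $10, so any score above 0.25 is worth acting on.
print(action_threshold(30, 10))   # 0.25
print(should_act(0.6, 30, 10))    # True
```

When being wrong is expensive relative to the upside, the threshold climbs toward 1, which matches the intuition that high-stakes actions demand high-confidence scores.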

Examples of difficult use cases

While Watson Visual Recognition is highly flexible, there are a number of recurring use cases where we've seen the API either struggle or require significant pre- or post-processing from the user.

  • Face Recognition: Visual Recognition is capable of face detection (detecting the presence of faces), not face recognition (identifying individuals).
  • Detecting details: Occasionally, users want to classify an image based on a small section of the image or details scattered within it. Because Watson analyzes the entire image when training, it may struggle with classifications that depend on small details. Some users have adopted the strategy of breaking the image into pieces or zooming into relevant parts of an image. See this hail classification use case as an example (video).
  • Emotion: Emotion classification (whether facial emotion or contextual emotion) is not a feature currently supported by Visual Recognition. Some users have attempted to do this through custom classifiers, but this is an edge case and we cannot estimate the accuracy of this type of training.
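The "break the image into pieces" strategy from the details bullet can start with nothing more than computing a grid of tile bounding boxes, cropping each tile with an imaging library, and classifying the tiles individually. A sketch of the tiling arithmetic (the grid size and image dimensions are illustrative):

```python
def tile_boxes(width, height, cols, rows):
    """Split a width x height image into a cols x rows grid and return
    each tile's (left, top, right, bottom) box in row-major order."""
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left = c * width // cols
            top = r * height // rows
            right = (c + 1) * width // cols
            bottom = (r + 1) * height // rows
            boxes.append((left, top, right, bottom))
    return boxes

# A 640x480 image cut into a 2x2 grid yields four 320x240 tiles.
print(tile_boxes(640, 480, 2, 2))
```

Each box can then be passed to an image library's crop function, and the resulting tiles classified one at a time so that small details occupy a much larger share of each submitted image.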

Examples of good and bad training images

GOOD: The following images were used for training and testing by our partner OmniEarth. This demonstrates good training because the images in the training and testing sets resemble each other with regard to angle, lighting, distance, subject size, etc. See the case study OmniEarth: Combating drought with IBM Watson cognitive capabilities for more details.

Training images:
Images courtesy of IBM Watson partner OmniEarth

Testing image:
Images courtesy of IBM Watson partner OmniEarth

BAD: The following images demonstrate bad training since the training image shows a close-up shot of a single apple while the testing image shows a large group of apples taken from a distance, with other visual items introduced (baskets, a sign, etc.). It's entirely possible that Watson may fail to classify the test image as 'apples', especially if another class in the classifier contains training images of large groups of round objects (such as peaches, oranges, etc.).

Training image:
Photo by adrianbartel / CC BY 2.0

Testing image:
Photo by Mike Mozart / CC BY 2.0

BAD: The following images demonstrate bad training since the training image shows a close-up shot of a single sofa in a well-lit, studio-like setting while the testing image shows a sofa that is partially cut off, farther away, and situated among many other objects in a real-world setting. Watson may not be able to properly classify the test image due to the number of other objects cluttering the scene.

Training image:
Photo by jingdianjiaju2 / CC BY-SA 2.0

Testing image:
Sofa in living room

Need help or have questions?

We’re excited to see what you build with Watson Visual Recognition, and we’re happy to help you along the way. Try the custom classifiers feature and share any questions or comments you have on our developerWorks forums.

