IBM advances Watson’s image captioning accuracy

Republished from Watson Blog

Visual and natural language comprehension are rapidly evolving areas of artificial intelligence (AI). A prime example is image captioning: the task of generating one or more natural language descriptions for an image, relying solely on the visual input. It demonstrates a machine’s comprehension of the visual content as well as its ability to describe that content in natural language. Image captioning remains a very active area of research in academic and industrial labs.

I am very proud to announce that IBM Watson recently submitted its first entry to the Microsoft COCO Image Captioning Challenge, a competition that has been running since 2015, and that the entry currently holds the top spot on the leaderboard!

The results obtained by the Watson entry on various evaluation metrics can be viewed on the CodaLab results page (row labeled “etiennem”) and on the MSCOCO results page (Watson Multimodal entry under Table-C5 or Table-C40).

Examples of captions generated by the Watson system include:

blue boat analyzed by Watson image captioning

Watson says: “A blue boat is sitting on the side of a building”


green bird in bowl analyzed by Watson image captioning

Watson says: “A green bird sitting on top of a bowl”


woman at table with giraffe analyzed by Watson image captioning

Watson says: “A woman sitting on a table with a giraffe”


We attribute our captioning accuracy to three core approaches:

  1. Careful design and optimization when building the captioning system and learning the parameters and hyper-parameters.
  2. Judicious use of an attention mechanism, where the system evaluates and seeks to describe individual components of an image (rather than just the image as a whole) to create an evolving description of the complete picture.
  3. An innovative twist on the reinforcement learning approach we used to optimize the captioning system.
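
To make the attention idea in point 2 more concrete, here is a minimal sketch of generic soft attention in plain Python. It is not the Watson implementation, which this post does not describe in detail. The function names, the dot-product scoring, and the toy feature vectors are all assumptions chosen for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into non-negative weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(region_features, decoder_state):
    """Generic soft attention (illustrative, not Watson's actual design).

    Scores each image region against the current decoder state with a
    dot product, normalizes the scores, and returns the weights together
    with the weighted average of the region features (the context vector).
    """
    scores = [sum(f * h for f, h in zip(region, decoder_state))
              for region in region_features]
    weights = softmax(scores)
    dim = len(region_features[0])
    context = [sum(w * region[i] for w, region in zip(weights, region_features))
               for i in range(dim)]
    return weights, context

# Three hypothetical image regions, each a 2-dimensional feature vector.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
# Hypothetical decoder state while generating the next word of the caption.
state = [2.0, 0.0]

weights, context = attend(regions, state)
```

In a captioning model of this general kind, the weights would be recomputed for every generated word, so the description can focus on different parts of the image as it evolves.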

Further details on some aspects of the work are available in the arXiv paper.

If you’re interested in learning more about IBM’s success and how Watson is leading the way in visual and language comprehension, please let me know and I’d be happy to help.
