December 12, 2016 | Written by: Vaibhava Goel
Republished from Watson Blog
Visual and natural language comprehension are rapidly evolving areas of artificial intelligence (AI). A prime example is image captioning – the task of generating one or more natural language descriptions for an image, relying solely on the visual input – which demonstrates a machine’s comprehension of the visual content as well as its ability to describe that content in natural language. The image captioning task continues to be a very active area of research in academic and industrial research labs.
I am very proud to announce that IBM Watson recently submitted its first entry to the Microsoft COCO Image Captioning Challenge, a competition that has been running since 2015, and that our entry currently holds the top spot on the leaderboard!
The results obtained by the Watson entry on the various evaluation metrics can be viewed on the CodaLab results page (row labeled “etiennem”) and on the MSCOCO results page (Watson Multimodal entry under Table-C5 or Table-C40).
Examples of captions generated by the Watson system include:
Watson says: “A blue boat is sitting on the side of a building”
Watson says: “A green bird sitting on top of a bowl”
Watson says: “A woman sitting on a table with a giraffe”
We attribute our captioning accuracy to three core approaches:
- Careful design and optimization when building the captioning system and learning the parameters and hyper-parameters.
- Judicious use of an attention mechanism, where the system evaluates and seeks to describe individual components of an image (rather than just the image as a whole) to create an evolving description of the complete picture.
- An innovative twist on the reinforcement learning approach we used to optimize the captioning system.
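For readers unfamiliar with attention, here is a minimal sketch of the general idea: score each image region against the current state of the caption, turn the scores into weights, and blend the region features accordingly. This is a generic dot-product soft attention written for illustration, not the Watson system's implementation. The names `attend`, `region_features`, and `query`, and the toy numbers, are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, query):
    """Generic soft attention (illustrative only).

    region_features: one feature vector per image region (hypothetical).
    query: a vector summarizing the caption generated so far (hypothetical).
    Returns the attention weights and the weighted blend of region features.
    """
    # Dot-product relevance of each region to the current query.
    scores = [sum(f * q for f, q in zip(region, query))
              for region in region_features]
    weights = softmax(scores)
    # Context vector: weighted average of the region features.
    dim = len(query)
    context = [sum(w * region[d] for w, region in zip(weights, region_features))
               for d in range(dim)]
    return weights, context


# Toy example: three regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [2.0, 0.0]
weights, context = attend(regions, query)
```

In this toy example the first region matches the query best, so it receives the largest weight and dominates the blended context vector. A captioning model would recompute these weights for each word it generates, which is how the description can shift focus from one part of the picture to another.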
Further details on some aspects of the work can be obtained from the arXiv paper here.
If you’re interested in learning more about IBM’s success and how Watson is leading the way in visual and language comprehension, please let me know and I’d be happy to help.