Training Watson to see what’s on your plate

Today, we’re introducing our latest AI research in the form of a new beta feature: the IBM Watson Visual Recognition food model. This feature provides a built-in capability for recognizing 2,000+ different foods within images, with greater specificity and accuracy in this content domain than Visual Recognition’s general tagging feature. Using the food model, restaurant diners can easily compare their meals to ones from previous visits to the establishment, while restaurants can better understand how often their food is being shared across social media. The food model is the first of many pre-built custom models that will shorten the time-to-value for developers building custom solutions for different domains with Watson Visual Recognition. Like free-refill French fries – we think the possibilities are bottomless!


Photo of a platter of oysters with results of the returned tags provided by the Watson Visual Recognition food model

Our efforts grew out of the observation that users of food- and nutrition-logging apps get frustrated by the manual process of tracking their meals.

What if we could train a system to automatically identify the foods at popular restaurant chains and simplify food logging? With frequent lunchtime trips to restaurants near the lab, we took photos of known foods and trained a first version of the food recognition model. This use case was an example of “food in context” – where the system recognizes foods from known menus. We could always refer back to the menu if we, or the system, were unsure of what was in the photo. We were never hungry, and often the results of our daily training experiments ended up as leftovers for dinner! But, like a good plate of brownies, we found food visual recognition to be addictive!

The larger challenge we took on was what we called “food in the wild,” where the system doesn’t know the restaurant menu or a user’s food history. We started by searching online for images of many different foods, which produced an initial noisy data set with weakly labeled images. We did a lot of work to clean up the data set by matching each image to its correct food label, and today we have the largest known collection of labeled food images: more than 1.5 million images corresponding to 2,000+ different foods. We further developed a taxonomy around the foods that allowed us to classify foods hierarchically. To improve the system’s accuracy, we came up with a novel way to exploit this food hierarchy in combination with deep learning methods for fine-grained recognition. This model forms the basis of the Visual Recognition food model.

Using the food model in the Visual Recognition API, Watson focuses specifically on the food shown in the photo. Thus, it is different from general visual tagging, which identifies other information in a photo, such as a plate, knife, blanket, strawberry, table, and people in a picture of food.


Photo of chocolate-covered strawberries and the returned tags provided by the Watson Visual Recognition food model

With the food model, the system homes in only on the food in the photo – in the example here, the strawberries. Accurate food identification is only one piece of our model. The system’s recognition goes deeper by performing fine-grained recognition of the foods. In the case of the strawberry dish, it might also tag the photo as “strawberry dipped in chocolate” when that label applies. Using the hierarchy, the service might also label the photo as a “fruit dish,” which gives a higher-level category for the food. Traditionally, deep learning gives you a flat list of classification scores, but by utilizing the hierarchy and fine-grained classification, we trained the deep learning model to make better mistakes (errors that stay semantically close to the true food) even when a food cannot be identified accurately [i].
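
To make the idea of “better mistakes” concrete, here is a minimal sketch in Python. It is not the training method from [i]. It only illustrates, at prediction time, how a food hierarchy can turn an uncertain fine-grained guess into a confident higher-level tag. The toy taxonomy, the scores, the 0.6 threshold, and the function names are all assumptions made up for this example.

```python
# Hypothetical toy taxonomy (an assumption for illustration):
# child label -> parent label, with None marking the root.
PARENT = {
    "food": None,
    "fruit dish": "food",
    "seafood dish": "food",
    "strawberry": "fruit dish",
    "strawberry dipped in chocolate": "fruit dish",
    "oyster": "seafood dish",
}


def roll_up(leaf_scores):
    """Add each label's score to the label itself and to all of its ancestors."""
    totals = {}
    for label, score in leaf_scores.items():
        node = label
        while node is not None:
            totals[node] = totals.get(node, 0.0) + score
            node = PARENT[node]
    return totals


def depth(label):
    """Number of steps from a label up to the root of the taxonomy."""
    steps = 0
    while PARENT[label] is not None:
        label = PARENT[label]
        steps += 1
    return steps


def most_specific_label(leaf_scores, threshold=0.6):
    """Return the deepest label whose rolled-up score reaches the threshold."""
    totals = roll_up(leaf_scores)
    confident = [label for label, score in totals.items() if score >= threshold]
    # Deeper labels are more specific; ties on depth go to the higher score.
    return max(confident, key=lambda lab: (depth(lab), totals[lab]), default=None)


# Made-up flat scores: the model cannot decide between two strawberry labels.
flat_scores = {
    "strawberry": 0.45,
    "strawberry dipped in chocolate": 0.40,
    "oyster": 0.15,
}
print(most_specific_label(flat_scores))
```

With these made-up scores, neither strawberry label is confident on its own, but their combined score is enough to back off to “fruit dish” rather than guessing a single leaf.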

Just as important as teaching the system “what is a plate of strawberries” was teaching it what is food and what is not food. To make the service as efficient as possible, the food/non-food classifier and the fine-grained food recognition classifier share most of the deep learning network and have separate branches only at its top level. To make a prediction on a test image, the system needs only a single, very fast forward pass through the food model to detect and categorize the foods.
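
The shared-network design can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the production architecture: the layer sizes, weights, labels, and function names below are all invented, and a real model would use a deep convolutional network rather than one small dense layer. What it shows is the structure: the shared features are computed once, and two small branches read from them.

```python
import math


def linear(weights, bias, x):
    """Dense layer: one output value per row of weights."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def softmax(values):
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# All weights and labels below are made up for illustration.
TRUNK_W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]   # shared layers (3 inputs -> 2 features)
TRUNK_B = [0.0, 0.1]
GATE_W = [[1.2, -0.7]]                           # food / non-food branch
GATE_B = [0.0]
FINE_W = [[0.9, 0.1], [-0.3, 0.6], [0.2, 0.2]]   # fine-grained food branch
FINE_B = [0.0, 0.0, 0.0]
FINE_LABELS = ["strawberry", "oyster", "brownie"]


def predict(image_features):
    """Single forward pass: shared trunk once, then both branches."""
    shared = relu(linear(TRUNK_W, TRUNK_B, image_features))
    is_food = sigmoid(linear(GATE_W, GATE_B, shared)[0])
    fine = softmax(linear(FINE_W, FINE_B, shared))
    return is_food, dict(zip(FINE_LABELS, fine))


is_food, fine_scores = predict([1.0, 0.5, 0.2])
print(round(is_food, 3), {k: round(v, 3) for k, v in fine_scores.items()})
```

Because both branches reuse the same `shared` features, adding the food/non-food check costs only one extra small layer instead of a second full pass over the image.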

Now that Watson has become an expert in recognizing what you’re eating, we’re excited to see the applications and interpretations developers and data scientists will build on our technology!

[i] Hui Wu, Michele Merler, Rosario Uceda-Sosa and John Smith. “Learning to make better mistakes: semantics-aware visual food recognition”. ACM Multimedia Conference, 2016.

Research Staff Member, IBM Research
