
Cognitive AR to give you superhero vision


Ethan Hadar giving a live demonstration of our cognitive operations guidance technology at the recent Tel Aviv Watson Summit

The other day my son asked me to explain what I do at work. I told him that our team builds technology that can answer your questions when you simply point your phone at what you’re trying to fix or understand. The augmented reality we’re developing will know what it is and what to do, and will tell you.

His answer? “Oh, I get it – like Jarvis helps Ironman!” I hadn’t thought of that, but yes!

Our team at IBM Research – Haifa is using augmented reality to provide visual guidance by superimposing digital information on real objects viewed through mobile devices, tablets, or smart glasses. What we refer to as ‘cognitive operations guidance’ can recognize what’s in the devices’ field of view, answer questions, and even understand and ask about what you’re gesturing toward – the leaky pipe under the sink on the left. This technology has the potential to simplify and clarify many types of interactions in daily life, for consumers and in industry.

What we can do with the help of visual guidance

We are using augmented reality and cognitive conversation to help field technicians find solutions for real-life industry problems in maintenance, support, and manufacturing. We can also help people learn how to use a device, or get instructions for home repairs.

Instead of overwhelming someone with detailed technical information, our cognitive AR gives simple step-by-step instructions. Cognitive operations guidance can help repair a kitchen appliance or assemble a DIY dinner table, with no more puzzling over origami-folded paper manuals. It can even help a novice technician solve a complex technical problem, using voice and gestures along the way.

Instructions superimposed on objects

Point your device’s camera at the object you want to fix, and digital instructions for the operation appear on the screen, anchored to the object’s elements even as you move the camera. Tapping to the next step advances the repair instructions one step at a time, or brings up links to external information and even IoT telemetry.
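To make the flow concrete, here is a minimal sketch of how such step-by-step guidance could be represented: each step carries its text, the object element it is anchored to, and optional links or telemetry keys. The class names, fields, and example values are illustrative assumptions, not our actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class InstructionStep:
    """One guidance step, anchored to a named element of the recognized object."""
    text: str                            # e.g. "Turn the shut-off valve clockwise"
    anchor_part: str                     # object element the overlay attaches to
    info_link: Optional[str] = None      # optional link to external documentation
    telemetry_key: Optional[str] = None  # optional IoT reading to display

@dataclass
class GuidedProcedure:
    """A repair procedure the user walks through by tapping 'next'."""
    steps: List[InstructionStep]
    current: int = 0

    def current_step(self) -> InstructionStep:
        return self.steps[self.current]

    def advance(self) -> Optional[InstructionStep]:
        """Move to the next step, or return None when the procedure is done."""
        if self.current + 1 < len(self.steps):
            self.current += 1
            return self.steps[self.current]
        return None

# Example: a two-step sink repair rendered as AR overlays
procedure = GuidedProcedure(steps=[
    InstructionStep("Close the shut-off valve", anchor_part="valve",
                    telemetry_key="water_flow_lpm"),
    InstructionStep("Loosen the trap nut", anchor_part="trap_nut",
                    info_link="https://example.com/trap-manual"),
])
print(procedure.current_step().text)   # "Close the shut-off valve"
procedure.advance()
print(procedure.current_step().text)   # "Loosen the trap nut"
```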

By selecting the type of object, the user can load any repair action and start to interact with it. Our object recognition system can also be configured to automatically understand what object the user is pointing the camera at, and offer several options for repair. In some cases, a human expert at a remote location can chat with the end user, understand the repair task, and give the user the correct set of instructions through AR-guided operation.
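As a rough sketch of that flow, assuming a hypothetical repair catalog, recognizer stub, and expert-escalation rule (none of these names reflect the actual system):

```python
from typing import Optional

# Hypothetical mapping from recognized objects to available repair procedures
REPAIR_CATALOG = {
    "kitchen_disposer": ["unjam_rotor", "replace_splash_guard"],
    "sink_trap": ["clear_blockage", "replace_washer"],
}

def recognize_object(frame) -> Optional[str]:
    """Stand-in for the object-recognition model run on a camera frame."""
    # A real system would run a vision model here; we return a fixed label.
    return "kitchen_disposer"

def offer_repair_options(frame) -> dict:
    """Either offer known procedures or escalate to a remote expert chat."""
    label = recognize_object(frame)
    if label in REPAIR_CATALOG:
        return {"object": label, "procedures": REPAIR_CATALOG[label]}
    return {"object": label, "procedures": [], "escalate_to_expert": True}

print(offer_repair_options(frame=None))
```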

Cognitive visual guidance also includes tracking technology that superimposes information on top of the field of view, such as where to find a valve that needs to be turned off. The presented information adjusts as the device’s visual perspective, location, and orientation relative to the object change (helpful when you’re under the sink repairing a disposer).
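At its core, this kind of anchoring amounts to re-projecting a known 3D anchor point into the current camera view as the tracker updates the pose. A minimal sketch, assuming a standard pinhole camera model (the pose and intrinsics values below are illustrative only):

```python
import numpy as np

def project_anchor(anchor_world, camera_pose, intrinsics):
    """Project a 3D anchor point (e.g. the valve position) into image pixels.

    anchor_world: (3,) point in the object's/world frame
    camera_pose:  4x4 world-to-camera transform from the tracker
    intrinsics:   3x3 camera intrinsic matrix
    """
    p = camera_pose @ np.append(anchor_world, 1.0)    # into the camera frame
    if p[2] <= 0:                                      # behind the camera
        return None
    uv = intrinsics @ (p[:3] / p[2])                   # pinhole projection
    return uv[:2]

# Example: identity pose, simple intrinsics; the overlay lands at image centre
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
pose = np.eye(4)
print(project_anchor(np.array([0.0, 0.0, 2.0]), pose, K))   # [320. 240.]
```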

Think of our cognitive AR assistance as DIY anywhere, for just about anything. No more hunting for online help. It can guide machine maintenance work, making it safer and more efficient. If a technician in the field is caught in a thunderstorm or high winds and can’t practically access a manual, our technology can offer guidance from an experienced technician or a cognitive computer assistant.

Another use case we’re exploring helps consumers with a variety of tasks via cognitive visual guidance on the screen of a mobile device, such as operating features on a car console, checking a vehicle’s water and oil levels, or simply finding the USB outlet in a new rental car.

Simplicity is the challenge

At IBM Research – Haifa, we’re working on a full-cycle technology. Cognitive AR for operations guidance can recognize language, objects, gestures, and intentions, and then put it all together to understand what the next steps should be. Our goal is to make life easier with meaningful, real-time interaction, or in my son’s words, to “turn us into superheroes!”
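A toy sketch of what “putting it all together” could look like, with stand-in functions for the language, vision, and gesture components (the function names and rule-based fusion are assumptions for illustration, not the actual pipeline):

```python
from typing import List, Optional

def understand_utterance(text: str) -> str:
    """Stand-in for speech/language understanding; returns a coarse intent."""
    return "how_to_fix" if "fix" in text.lower() else "unknown"

def resolve_gesture(pointed_at: str, recognized_parts: List[str]) -> Optional[str]:
    """Stand-in for gesture resolution: which recognized part is being indicated?"""
    return pointed_at if pointed_at in recognized_parts else None

def next_guidance_step(utterance: str, recognized_parts: List[str],
                       pointed_at: str) -> str:
    """Fuse language, vision, and gesture into a single guidance decision."""
    intent = understand_utterance(utterance)
    target = resolve_gesture(pointed_at, recognized_parts)
    if intent == "how_to_fix" and target is not None:
        return f"show_repair_steps_for:{target}"
    return "ask_clarifying_question"

# "How do I fix this?" while pointing at the trap under the sink
print(next_guidance_step("How do I fix this?", ["valve", "trap"], "trap"))
```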

We’ll be demoing this technology at CVPR 2017 in Honolulu from July 21 – 26.

Read more about what IBM is up to at CVPR 2017 here.

This technology became a reality because of the team: Benjamin Cohen, Leonid Karlinsky, Ran Nissim, Joseph Shtok, Uzi Shvadron, Yochay Tzur, and Boris Vigman.
