
How to find your grandmother in a wedding video


Or using cognitive computing to divide a video into scenes

Haifa researchers Dror Porat and Daniel Rotman


Searching through a video for a specific person, scene, or moment usually means a frustrating and painstaking ‘hunt and peck’ process. Wouldn’t it be great if you could just skip to where your 90-year-old grandmother boogies to Justin Timberlake at your wedding, without having to skim through the entire 3-hour video – yet again? Or what if you could zero in on certain numbers in a video of complex financial content? Our team at IBM Research Haifa is helping Watson’s cognitive technology transform the way we search inside video, whether for personal viewing or complex business needs.

Personalize your search

We’re developing a practical and efficient cognitive-based technology that can cut a video into sections based on characteristics that you define. For example, you could divide a video by the music being played, or by whether each scene is indoors or outdoors, so you can quickly find your groovy grandma. Think of it as a personalized way to divide a video into “chapters” containing similar content along the natural timeline of a video.
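To make the “chapters” idea concrete, here is a minimal sketch in Python. It assumes some classifier has already tagged each frame with a label such as "indoor" or "outdoor"; the function name `chapters_from_labels` and the label values are hypothetical, chosen only for illustration, and are not part of our technology's interface.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def chapters_from_labels(labels: Sequence[str]) -> List[Tuple[str, int, int]]:
    """Group consecutive frames that share a label into chapters.

    Returns (label, start, end) tuples, where start is inclusive and
    end is exclusive, in timeline order.
    """
    chapters = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        chapters.append((label, start, start + length))
        start += length
    return chapters


# Hypothetical per-frame labels from an indoor/outdoor classifier.
frame_labels = ["indoor", "indoor", "outdoor", "outdoor", "outdoor", "indoor"]
for label, start, end in chapters_from_labels(frame_labels):
    print(f"{label}: frames {start} to {end - 1}")
```

Because the grouping follows the frame order, the chapters always respect the natural timeline of the video.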

With our new video analytics technology, you can pull out an entire “chapter”, say a family hiking trip in the woods that is buried in hours of home video, or skip right to the women’s swimming competitions in a program on the Olympic Games. At the most basic level, the algorithm uses the color layout of the scene, for example identifying the blue of water, the green of trees, or the sand of a desert scene. Tracking how that layout changes over time reveals the basic flow of the video and where one section ends and the next begins.
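As a rough illustration of the color-layout idea, the sketch below averages the color in each cell of a coarse grid and flags a boundary wherever that layout changes sharply between consecutive frames. Everything here is an assumption made for the example: the frame representation (rows of RGB tuples), the 2x2 grid, the threshold of 40, and the function names. It is a simplified stand-in, not the descriptor our algorithm actually uses.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]
Frame = Sequence[Sequence[Pixel]]  # rows of (r, g, b) pixels


def color_layout(frame: Frame, grid: int = 2) -> List[float]:
    """Average R, G, B of each cell in a grid x grid partition of the frame.

    Assumes the frame is at least grid pixels wide and grid pixels tall.
    """
    height, width = len(frame), len(frame[0])
    layout: List[float] = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * height // grid, (gy + 1) * height // grid
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            count = (y1 - y0) * (x1 - x0)
            for channel in range(3):
                total = sum(frame[y][x][channel]
                            for y in range(y0, y1) for x in range(x0, x1))
                layout.append(total / count)
    return layout


def layout_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two color layouts."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def find_boundaries(frames: Sequence[Frame], threshold: float = 40.0,
                    grid: int = 2) -> List[int]:
    """Indices of frames whose layout differs sharply from the previous frame."""
    layouts = [color_layout(frame, grid) for frame in frames]
    return [i for i in range(1, len(layouts))
            if layout_distance(layouts[i - 1], layouts[i]) > threshold]


def solid(color: Pixel, size: int = 4) -> List[List[Pixel]]:
    """Build a size x size frame filled with a single color (toy data)."""
    return [[color] * size for _ in range(size)]


water, trees = (0, 0, 255), (0, 128, 0)
video = [solid(water), solid(water), solid(trees), solid(trees)]
print(find_boundaries(video))
```

In this toy video, the only sharp change is between the second "water" frame and the first "trees" frame, so the one reported boundary falls at frame index 2. Real footage would need a more robust descriptor and a threshold tuned to the content.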

The secret behind this technology is a new optimization process based on a scoring system for the division into scenes. The process finds the feature you’re looking for, and then uses a mathematical formula to predict how well the scene retrieved fits the defined characteristic. It’s intuitive and simple to use. Our technology can “watch” a video, then automatically identify its component elements and divide it into scenes based on predefined characteristics.
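This post does not spell out the scoring formula, so the following is only a sketch of what a score-driven division can look like. It assumes each frame has been reduced to a single feature value, scores a candidate scene by the sum of squared deviations from its mean (lower means more homogeneous), and uses dynamic programming to find the division into a chosen number of contiguous scenes with the lowest total score. The cost function and the names `scene_cost` and `best_division` are illustrative assumptions, not our actual formula.

```python
from typing import List, Sequence


def scene_cost(values: Sequence[float], start: int, end: int) -> float:
    """Assumed score: sum of squared deviations from the mean of values[start:end]."""
    chunk = values[start:end]
    mean = sum(chunk) / len(chunk)
    return sum((v - mean) ** 2 for v in chunk)


def best_division(values: Sequence[float], num_scenes: int) -> List[int]:
    """Return the start indices of scenes 2..num_scenes in the lowest-cost division."""
    n = len(values)
    if not 1 <= num_scenes <= n:
        raise ValueError("num_scenes must be between 1 and the number of frames")
    inf = float("inf")
    # best[k][i]: lowest total cost of dividing values[:i] into k scenes.
    best = [[inf] * (n + 1) for _ in range(num_scenes + 1)]
    # split[k][i]: where the last of those k scenes starts.
    split = [[0] * (n + 1) for _ in range(num_scenes + 1)]
    best[0][0] = 0.0
    for k in range(1, num_scenes + 1):
        for i in range(k, n + 1):
            for j in range(k - 1, i):
                cost = best[k - 1][j] + scene_cost(values, j, i)
                if cost < best[k][i]:
                    best[k][i] = cost
                    split[k][i] = j
    # Walk back from the end of the video to recover the boundaries.
    boundaries = []
    i = n
    for k in range(num_scenes, 1, -1):
        i = split[k][i]
        boundaries.append(i)
    return sorted(boundaries)


# Hypothetical per-frame feature values with three visibly distinct levels.
feature = [1.0, 1.1, 0.9, 5.0, 5.2, 4.9, 9.0, 9.1]
print(best_division(feature, num_scenes=3))
```

For the sample values, the three levels separate cleanly, so the scenes start at frames 0, 3, and 6. Unlike a frame-by-frame threshold, this kind of optimization scores the division as a whole, which is the general idea behind a scoring-based approach.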

Getting a deeper understanding of content

Our video scene detection technology opens up new possibilities for video analytics. Many business-oriented applications can use it to save time and resources through fast video indexing, summarization, and quick search by topic.

This is where cognitive abilities from Research come into play. The ability to automatically detect scenes can help businesses analyze viewer responses to different segments of an online product or service video, so they can deliver targeted content. A company might want to target specific audiences based on viewing preferences.

Our basic goal is to efficiently divide a video into sections that make sense to the user. Given the timeline of a video, we want a simple and efficient way to segment it according to general features or specific semantic elements, to target specific content. Today, we’re using mainly visual information as a defining feature, but for the future, we’re looking into identifying subtle shifts in a video’s sound, visuals, speed, and even emotions. Our video scene detection technology joins other Watson-powered cognitive services for IBM Cloud Video technology.

Using IBM’s cognitive and cloud capabilities to automatically segment videos into scenes, we can help people and companies unlock meaningful information that makes it easier to find and deliver content that matters.




