Cloud Computing

How to find your grandmother in a wedding video

Or using cognitive computing to divide a video into scenes

Haifa researchers Dror Porat and Daniel Rotman
Searching through a video for a specific person, scene, or moment usually means a frustrating and painstaking "hunt and peck" process. Wouldn't it be great if you could just skip to where your 90-year-old grandmother boogies to Justin Timberlake at your wedding, without having to skim through the entire 3-hour video yet again? Or what if you could zero in on certain numbers in a video of complex financial content? Our team at IBM Research Haifa is helping Watson's cognitive technology transform the way we search inside video, whether for personal viewing or complex business needs.

Personalize your search

We're developing a practical and efficient cognitive-based technology that can cut a video into sections based on characteristics that you define. For example, a division based on the music played, or on indoor/outdoor classification, so you can quickly find your groovy grandma. Think of it as a personalized way to divide a video into "chapters" containing similar content along its natural timeline.

With our new video analytics technology, you can pull out an entire "chapter", say a family hiking trip in the woods that is buried in hours of home video, or skip right to the women's swimming competitions in a program on the Olympic Games. At the most basic level, the algorithm uses the color layout of the scene, for example identifying the blue water, green trees, or sand of a desert scene. This can then be used to understand the basic flow and changes present, and to separate the sections of a video.
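The post names color layout as the most basic feature. As a rough illustration of the idea (not IBM's actual implementation), each frame can be summarized by a coarse color histogram, and large jumps between consecutive histograms hint at scene changes. A minimal sketch, using synthetic "water" and "forest" frames in place of real video:

```python
import numpy as np

def color_histogram(frame, bins=8):
    """Coarse RGB color-layout signature for one frame (H x W x 3, values 0-255)."""
    hist, _ = np.histogramdd(
        frame.reshape(-1, 3),
        bins=(bins, bins, bins),
        range=((0, 256), (0, 256), (0, 256)),
    )
    hist = hist.ravel()
    return hist / hist.sum()  # normalize so frames of any size are comparable

def boundary_scores(frames, bins=8):
    """L1 distance between consecutive frame histograms; peaks hint at cuts."""
    hists = [color_histogram(f, bins) for f in frames]
    return [float(np.abs(hists[i + 1] - hists[i]).sum()) for i in range(len(hists) - 1)]

# Synthetic demo: four bluish "water" frames followed by four greenish "forest" frames.
rng = np.random.default_rng(0)
water = [np.clip(rng.normal((30, 60, 200), 10, (48, 64, 3)), 0, 255) for _ in range(4)]
forest = [np.clip(rng.normal((30, 180, 40), 10, (48, 64, 3)), 0, 255) for _ in range(4)]
scores = boundary_scores(water + forest)
cut = int(np.argmax(scores)) + 1  # largest color shift: between frames 3 and 4, so cut = 4
```

In practice a production system would add temporal smoothing and combine many more cues, but the color-layout signal alone already separates the two synthetic "scenes" cleanly.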

The secret behind this technology is a new optimization process based on a scoring system for the division into scenes. The process finds the feature you're looking for, and then uses a mathematical formula to score how well each candidate scene fits the defined characteristic. It's intuitive and simple to use. Our technology can "watch" a video, then automatically identify its constituent elements and divide it into scenes based on predefined characteristics.

Getting a deeper understanding of content

Our video scene detection technology opens new vistas for video analytics. Many business-oriented applications can use this solution to save time and resources through fast video indexing, summarization, and quick search by topic.

This is where cognitive abilities from Research come into play. The ability to automatically detect scenes can help businesses analyze viewer responses to different segments of an online product or service video, so they can deliver targeted content. A company might want to target specific audiences based on viewing preferences.

Our basic goal is to efficiently divide a video into sections that make sense to the user. Given the timeline of a video, we want a simple and efficient way to segment it according to general features or specific semantic elements, to target specific content. Today, we're using mainly visual information as the defining feature, but for the future, we're looking into identifying subtle shifts in a video's sound, visuals, pacing, and even emotion. Our video scene detection technology joins other Watson-powered cognitive services for IBM Cloud Video technology.

Using IBM’s cognitive and cloud capabilities to automatically segment videos into scenes, we can help people and companies unlock meaningful information that makes it easier to find and deliver content that matters.
