November 29, 2018 | Written by: Gal Klein
Audioburst is using AI to make spoken-word audio content (radio programs, podcasts, news and other talk media) as easy to search as images and documents, for the first time ever. According to a survey by Edison Research and Triton Digital, 44% of Americans aged 12 and up have listened to at least one podcast. In 2018 alone, podcast listeners have increased their listening by 40%, and weekly podcast fans now listen to an average of seven shows. Because audio can be consumed while working out, traveling, driving, eating and in many other situations where looking at a screen is difficult or dangerous, eyes-free content is one of the fastest-growing content types on the internet.
While audio content is exploding, searching for and finding specific audio content is difficult. The internet allows users to search for virtually anything else, like images, videos and news articles, but if you hear something interesting on a podcast or over the radio, finding that exact clip or bit of information again is nearly impossible.
There is no set standard around the way people archive, search for and discover audio content. This is where the Audioburst experience comes in.
Audioburst listens to source feeds such as radio stations and podcasts, then segments and indexes millions of minutes of audio in real time. It uses Watson Natural Language Understanding (NLU) together with in-house natural language processing and segmentation algorithms to make the world’s largest library of audio searchable by content, context or theme. For example, when you want the latest news on “Kim Kardashian” or “the Federal Reserve,” you can ask our search engine to gather the most recent clips, or bursts.
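To make the “most recent clips” idea concrete, here is a minimal sketch of a keyword index over bursts. Everything in it is hypothetical: the `Burst` record, the `BurstIndex` class and the assumption that each burst already carries a set of keyword phrases are illustrations, not Audioburst’s actual data model. A query is looked up as a whole phrase and the newest matching bursts come back first.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Burst:
    """Hypothetical burst record: one topical clip cut from a longer source feed."""
    burst_id: str
    source: str
    aired_at: datetime
    keywords: set[str] = field(default_factory=set)


class BurstIndex:
    """Toy inverted index: lower-cased keyword phrase -> bursts that contain it."""

    def __init__(self) -> None:
        self._postings: dict[str, list[Burst]] = {}

    def add(self, burst: Burst) -> None:
        # Register the burst under every keyword phrase attached to it.
        for keyword in burst.keywords:
            self._postings.setdefault(keyword.lower(), []).append(burst)

    def latest(self, query: str, limit: int = 3) -> list[Burst]:
        # Look the query up as a whole phrase, then return the newest bursts first.
        hits = self._postings.get(query.lower(), [])
        return sorted(hits, key=lambda b: b.aired_at, reverse=True)[:limit]


index = BurstIndex()
index.add(Burst("b1", "radio-a", datetime(2018, 11, 27, 9), {"Federal Reserve", "interest rates"}))
index.add(Burst("b2", "podcast-b", datetime(2018, 11, 28, 17), {"Federal Reserve"}))
print([b.burst_id for b in index.latest("federal reserve")])  # ['b2', 'b1']
```

Exact phrase lookup keeps the sketch short; a production engine would also tokenize queries and rank by relevance, not just recency.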
AI enables Audioburst to capture trending topics in audio and make them searchable
How does Audioburst make millions of minutes of audio content searchable?
Audioburst listens to each audio clip, transcribes it for accurate segmentation, and enriches and indexes the content. Watson NLU allows Audioburst to understand and respond to a simple search in natural language. When a search is entered, Audioburst instantly analyzes all of the media it has already ingested, locates extracted insights such as keywords, entities and sentiment, and then builds a word cloud of related entities that form part of the larger story behind the search. For example, if a user wants the outcome of a recent cricket match, a simple search engine might return results about the chirping insect, but with Watson NLU, Audioburst can serve up exactly the clip the user is looking for by connecting the search with the discrete entities in the word cloud.
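The cricket example can be illustrated with a toy version of the word-cloud idea. This is our simplification, not Watson NLU’s or Audioburst’s actual ranking: it assumes each indexed burst already carries a set of extracted keywords, weights every keyword by how often it co-occurs with a query term across the corpus, and scores each candidate burst by the total weight of its keywords. The function names and sample data are hypothetical.

```python
from collections import Counter


def word_cloud(corpus: dict[str, set[str]], query_terms: list[str]) -> Counter:
    """Weight each non-query keyword by how often it co-occurs with a query term."""
    cloud: Counter = Counter()
    for keywords in corpus.values():
        for term in query_terms:
            if term in keywords:
                cloud.update(k for k in keywords if k not in query_terms)
    return cloud


def rank_bursts(corpus: dict[str, set[str]], query_terms: list[str]) -> list[tuple[str, int]]:
    """Score bursts that mention any query term by the cloud weight of their keywords."""
    cloud = word_cloud(corpus, query_terms)
    scored = [
        (burst_id, sum(cloud[k] for k in keywords))
        for burst_id, keywords in corpus.items()
        if any(term in keywords for term in query_terms)
    ]
    # Highest score first; the burst id breaks ties so the order is deterministic.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


corpus = {
    "sports-1": {"cricket", "wickets", "India", "England"},
    "nature-1": {"cricket", "insect", "chirping", "summer"},
    "sports-2": {"match", "wickets", "India", "England"},
    "sports-3": {"match", "final score", "England"},
}
print(rank_bursts(corpus, ["cricket", "match"]))
# [('sports-1', 7), ('sports-2', 7), ('sports-3', 4), ('nature-1', 3)]
```

The insect clip ranks last because its keywords co-occur with only one query term, while words like “wickets” and “England” reinforce the sports reading of “cricket.”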
For the segmentation part of our architecture, we transcribe millions of minutes of audio content a month. That content may feature speech, speaker changes, music, laughter, clapping, silence and everything else you would expect from an audio program or podcast. Audioburst detects all of these audio cues to help our segmentation algorithms determine exactly where a topic segment begins and ends. The audio data is then organized by topic and stored in our repository for search, which means content can often be found within seconds of airing.
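Our segmentation algorithm took far more than a few lines to build, but the basic role of audio cues can be shown with a deliberately simple heuristic. This sketch assumes an upstream detector has already labeled the timeline with cues such as “speech,” “music” and “silence”; the `Cue` type, the label names and the two-second `min_gap` threshold are all hypothetical. Long stretches of music or silence close a segment, shorter pauses are ignored, and everything else (speech, laughter, clapping) opens or extends one.

```python
from typing import NamedTuple, Optional


class Cue(NamedTuple):
    start: float  # seconds from the start of the broadcast
    end: float
    label: str    # hypothetical labels: "speech", "music", "silence", "laughter", "clapping"


# Cue types treated as potential topic boundaries (an assumption for this sketch).
BOUNDARY_LABELS = {"music", "silence"}


def segment(cues: list[Cue], min_gap: float = 2.0) -> list[tuple[float, float]]:
    """Split a labeled cue timeline into (start, end) segments at long gaps."""
    segments: list[tuple[float, float]] = []
    seg_start: Optional[float] = None
    seg_end = 0.0
    for cue in cues:
        if cue.label in BOUNDARY_LABELS:
            # A long stretch of music or silence closes the open segment.
            if cue.end - cue.start >= min_gap and seg_start is not None:
                segments.append((seg_start, seg_end))
                seg_start = None
            # Shorter pauses or jingles are ignored and do not split anything.
            continue
        # Speech, laughter, clapping, etc. open a segment or extend the open one.
        if seg_start is None:
            seg_start = cue.start
        seg_end = cue.end
    if seg_start is not None:
        segments.append((seg_start, seg_end))
    return segments


timeline = [
    Cue(0, 5, "music"), Cue(5, 40, "speech"), Cue(40, 41, "silence"),
    Cue(41, 60, "speech"), Cue(60, 62, "laughter"), Cue(62, 70, "music"),
    Cue(70, 100, "speech"),
]
print(segment(timeline))  # [(5, 62), (70, 100)]
```

A real system would combine these acoustic boundaries with the transcript’s topic shifts rather than rely on gaps alone.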
Bursts are created on demand around the topic, context, theme and genre the user requests, and NLU makes each clip searchable in natural language. Previously, this simply was not possible: a user who wanted to hear something interesting a second time had to track down the original broadcast and listen to the whole thing again.
Bringing the listening experience to new heights with Watson NLU
We found Watson to be the best service for keeping our data up to date so we can teach our AI about trending news and topics. Integrating the Watson NLU API into Audioburst’s infrastructure was perhaps the easiest part of building our audio search engine: it took two to three days, compared with the 1.5 years it took us to build our segmentation algorithm.
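For readers curious what such an integration involves, here is a minimal sketch that builds a request to the Watson NLU REST `/v1/analyze` endpoint asking for keywords, entities and sentiment. It is not Audioburst’s code: the helper name `build_analyze_request`, the default `version` date and the feature limits are our assumptions, so check the current NLU API reference before relying on them. It uses only the Python standard library and builds the request without sending it.

```python
import base64
import json
import urllib.request


def build_analyze_request(service_url: str, api_key: str, text: str,
                          version: str = "2022-04-07") -> urllib.request.Request:
    """Build (but do not send) a POST to Watson NLU's /v1/analyze endpoint."""
    body = {
        "text": text,
        "features": {
            "keywords": {"sentiment": True, "limit": 10},
            "entities": {"sentiment": True, "limit": 10},
            "sentiment": {},  # document-level sentiment
        },
    }
    # IBM Cloud services accept HTTP basic auth with the literal user name "apikey".
    token = base64.b64encode(f"apikey:{api_key}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{service_url.rstrip('/')}/v1/analyze?version={version}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )
```

To actually call the service, pass the request to `urllib.request.urlopen` with your own instance URL and API key; the JSON response contains `keywords`, `entities` and `sentiment` fields. IBM’s `ibm-watson` SDK wraps the same endpoint if you prefer not to handle HTTP directly.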
The product, in the form of Audioburst Search and the Audioburst API for developers, personalizes audio on demand and makes it easy to find highly specific terms, ideas and stories within audio. Watson technology enables Audioburst to process more than a billion spoken words a month.
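As a rough sanity check on that scale, the billion-words figure can be converted into audio time. The speaking rate of 150 words per minute and the 30-day month are our assumptions (real speech rates vary widely by speaker and format), not figures from Audioburst:

```python
WORDS_PER_MONTH = 1_000_000_000
WORDS_PER_MINUTE = 150.0  # assumed average speaking rate
DAYS_PER_MONTH = 30       # assumed month length


def minutes_of_speech(words: float, words_per_minute: float = WORDS_PER_MINUTE) -> float:
    """Convert a word count into minutes of speech at the assumed rate."""
    return words / words_per_minute


def concurrent_streams(minutes_per_month: float, days: int = DAYS_PER_MONTH) -> float:
    """How many feeds running 24/7 would produce that much audio in a month."""
    return minutes_per_month / (days * 24 * 60)


monthly = minutes_of_speech(WORDS_PER_MONTH)
print(f"{monthly:,.0f} minutes/month")                      # 6,666,667 minutes/month
print(f"{concurrent_streams(monthly):.0f} streams running 24/7")  # 154 streams running 24/7
```

Under those assumptions, a billion words works out to roughly 6.7 million minutes of speech a month, consistent with the millions of minutes we transcribe for segmentation, or about 154 feeds processed around the clock.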
Capturing voices around the globe
In listening to audio content from around the world, we realized that topics and stories from different regions require a deep understanding of local culture. We are excited to continue using Watson NLU as it adds even more languages to its broad insight portfolio.
We have plans to expand our product both in application and geography and are currently working with brands like Samsung, Bytedance, Nippon Broadcasting, Bose, Harman and Radioline.
We’re continuing to evolve the Audioburst experience for existing and new users through interfaces such as virtual assistants, web, mobile, IoT and in-car entertainment systems.
As we take Audioburst to new markets and embark on exciting new avenues, we are thrilled with our continued partnership with IBM Watson.