January 25, 2015 | Written by: IBM Research Editorial Staff
IBM develops analytics and classification technology to provide data for a new kind of multimedia-based search engine
Editor’s note: This posting was authored by Zvi Kons, researcher in the Speech Technologies group at IBM Research – Haifa
When you walk down a busy street, do you ever notice the sounds that surround you? People, traffic, music: city sounds often fade into background chatter, like the foreign language the couple next to you at the café is speaking. That city buzz, though, together with related visual images, has the potential to generate a continuous stream of information about the real-time dynamics of the city.
To gather, process, analyze and ultimately separate useful sound from background noise, my team at IBM Research – Haifa is working on new technology for searchable audio analysis as part of SMART (Search Engine for Multimedia Environment Generated Content), an EU-funded project.
We’re developing algorithms and an engine to analyze those city sounds, extracting information that can be cross-referenced with video images to generate real-time content. Our research on audio classification is an integral part of a new kind of internet search engine that could provide locally oriented, readily available and informative content with practical applications.
Capturing the sights and sounds of city streets to gain insight
Our team collected data from two locations in Santander, Spain. Because the municipality is a partner in the SMART project, it offered to support the technical infrastructure and is helping test the technology. Cameras and microphones set up in the town square and market area recorded continuous audio and video of normal daily activity for one month, yielding more than 1,000 hours of data. We analyzed the sounds to recognize various types of activity and to identify patterns and anomalies, such as peak hours for crowds in the market square, traffic, and special events.
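To give a feel for what an "activity level" can mean here, the toy sketch below estimates loudness as root-mean-square (RMS) energy over fixed-length windows of an audio signal. It is purely illustrative: the function name, window length, and sample format are our assumptions, not the actual SMART analysis pipeline.

```python
import numpy as np

def activity_levels(samples, sample_rate, window_s=60.0):
    """Estimate an activity level as RMS energy per time window.

    `samples` is a 1-D array of mono audio. This is an illustrative
    sketch of energy-based activity measurement, not IBM's classifier.
    """
    window = int(sample_rate * window_s)
    n = len(samples) // window            # number of complete windows
    frames = samples[: n * window].reshape(n, window)
    return np.sqrt((frames ** 2).mean(axis=1))

# Toy example: one quiet minute followed by one loud minute.
rate = 1000                               # hypothetical sample rate (Hz)
quiet = 0.01 * np.ones(rate * 60)
loud = 0.5 * np.ones(rate * 60)
levels = activity_levels(np.concatenate([quiet, loud]), rate)
```

In practice a real system would also classify *what* kind of sound it hears (traffic, music, applause), not just how loud it is.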
Santander city square
Visual representation of weekly audio from the city square
The audio from the video above, combined with other recordings, produced this diagram: a visual representation of the weekly crowd activity level, with blue indicating low activity and red indicating high activity.
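A diagram like this can be built by folding per-hour activity levels into a day-of-week by hour-of-day grid and averaging across weeks. The sketch below shows that aggregation step; the grid shape and simple averaging are our assumptions for illustration, not the project's actual method.

```python
import numpy as np

def weekly_profile(levels, start_hour=0):
    """Fold a sequence of hourly activity levels into a 7x24 grid
    (day-of-week x hour-of-day), averaging across weeks -- the kind
    of aggregation behind a blue-to-red weekly activity diagram."""
    grid = np.zeros((7, 24))
    counts = np.zeros((7, 24))
    for i, level in enumerate(levels):
        hour = (start_hour + i) % 24
        day = ((start_hour + i) // 24) % 7
        grid[day, hour] += level
        counts[day, hour] += 1
    return grid / np.maximum(counts, 1)   # avoid division by zero

# Toy example: two identical weeks of hourly levels.
hourly = list(range(24)) * 14
grid = weekly_profile(hourly)             # grid[d, h] averages to h
```

Rendering the grid as a heat map (blue for low values, red for high) then gives a picture of the city's weekly rhythm.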
Another sample detected a day with unusual crowd noise, music, and applause. By cross-referencing it with footage from nearby street cameras, we traced it to a protest rally on a nearby street. Information like that could be important for assessing an immediate security risk, or for deciding whether to send a news team to report on a developing story.
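Spotting a day like this amounts to anomaly detection: comparing the observed activity against the expected baseline for that time of week and flagging large deviations. Here is a deliberately simple sketch of that idea; the threshold and the standard-deviation test are illustrative assumptions, and the real SMART classifiers also label sound types such as music and applause before cross-referencing with video.

```python
import numpy as np

def flag_anomalies(levels, baseline, k=3.0):
    """Flag window indices where the observed activity level deviates
    from the expected baseline by more than k standard deviations.
    Illustrative only, not the SMART project's detection method."""
    residual = np.asarray(levels, dtype=float) - np.asarray(baseline, dtype=float)
    sigma = residual.std() or 1.0         # guard against all-zero residuals
    return np.flatnonzero(np.abs(residual) > k * sigma)

# Toy example: a quiet baseline with one unusually loud window.
baseline = np.zeros(100)
observed = np.zeros(100)
observed[5] = 10.0                        # e.g. crowd noise and applause
anomalies = flag_anomalies(observed, baseline)
```

Once a window is flagged, an operator (or a downstream system) can pull up the matching video footage to find out what actually happened.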
Listen to the mid-day rally as it passed through the top-right corner of the frame:
The sounds of privacy
To address potential privacy and legal issues, the SMART team used wide-angle, low-resolution video cameras. The microphones were placed at a distance so that they pick up crowd noise rather than intelligible speech or individual conversations.
The idea behind SMART’s new multimedia-based search engine is to incorporate information gleaned from the environment. We can use data from city sounds and video images, as well as social media such as tweets, to identify events and situations in real time and make that information available online. The sounds of the city can help identify a drunken brawl, a spontaneous demonstration, a musical event, or an accident during rush hour. This kind of readily available information could be valuable for security systems, for municipal and media use, and as helpful knowledge for city residents.
Our research highlights the enormous potential of easily accessible information in our physical surroundings. The technology to use that information has exciting and practical applications for smart cities, with innovative ways to interpret sounds and images.