Continuing the innovation that takes place around major sporting events, IBM Research and IBM iX are teaming up to provide “Cognitive Highlights” to The Championships, Wimbledon, the oldest tennis tournament in the world. The project aims to demonstrate how AI technology can scale and accelerate the video production process for any media, sports or entertainment company.
IBM Research and IBM iX first explored this idea in April, creating the first-ever multimodal system for analyzing golf video at the 2017 Masters Golf Tournament. The proof-of-concept brought together computer vision and other leading AI technologies to listen, watch and learn from a live video feed of the golf tournament and automatically identify and curate the most exciting moments and shots into segments that could be used in online highlight packages.
The solution for Wimbledon will go beyond selecting and curating individual segments for a video editor to choose from. It will automatically create a one- to two-minute highlights package for each match, available shortly after the match ends, for the Wimbledon editorial team to use across the Wimbledon Digital Platforms.
Moreover, instead of a four-day tournament at The Masters, IBM will contend with a 13-day championship at The All England Lawn Tennis & Croquet Club that starts next Monday, July 3rd. Video from the matches can quickly add up to hundreds of hours of footage to sift through. This sheer volume of data requires a mix of cognitive technologies and advanced engineering techniques that can integrate multiple data points with audio and visual components to search, discover and extract key scenes or moments.
A key advantage for Wimbledon is that the IBM system is scalable, enabling the tournament content team to automatically capture highlight packages for matches played outside the most popular courts, which traditionally may not have had their moments curated. Now, with Cognitive Highlights, producers have a tool that can quickly deliver highlight suggestions from six courts, expanding the number of matches that can be turned into timely highlight videos for fans to watch and share.
A fusion of metadata and multimedia
For this year’s Championships, the production of highlight reels will rely on a number of steps and technologies. Video of the matches will be collected by IBM soon after their completion. Initial highlight candidates are identified using information from the on-court statistician and other sensors that provide data on, for example, the speed of the ball, the number of aces and saved break points. The system then looks for further segments of a match that could signal an exciting moment, using a combination of audio and video AI tools that analyze crowd cheering and recognize player actions (visuals of player behavior), together with scoring data. Based on these different modalities, the video segments are rank-ordered and selected to produce the final highlight video for each match.
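The ranking step described above can be sketched in a few lines of Python. This is only an illustration: the post does not say how the modalities are combined, so the weighted sum, the weights and the field names below are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A candidate highlight; every field name here is hypothetical."""
    label: str
    crowd_cheer: float     # audio score in [0, 1]
    player_action: float   # visual action-recognition score in [0, 1]
    match_stats: float     # score derived from scoring data, in [0, 1]


# Illustrative weights only; the real system's fusion method is not described.
WEIGHTS = {"crowd_cheer": 0.4, "player_action": 0.3, "match_stats": 0.3}


def excitement(seg: Segment) -> float:
    """Weighted sum of the per-modality scores."""
    return (WEIGHTS["crowd_cheer"] * seg.crowd_cheer
            + WEIGHTS["player_action"] * seg.player_action
            + WEIGHTS["match_stats"] * seg.match_stats)


def rank_segments(segments: list[Segment]) -> list[Segment]:
    """Return segments ordered from most to least exciting."""
    return sorted(segments, key=excitement, reverse=True)


candidates = [
    Segment("ace at 30-40", crowd_cheer=0.5, player_action=0.2, match_stats=0.6),
    Segment("saved break point", crowd_cheer=0.9, player_action=0.8, match_stats=0.9),
    Segment("routine hold", crowd_cheer=0.1, player_action=0.1, match_stats=0.2),
]
ranked = rank_segments(candidates)
```

With these made-up scores, the saved break point ranks first because all three modalities agree that it was an exciting moment.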
The combination of this data and these modalities helps the system get the full picture of a match’s most exciting moments, and demonstrates the value of audio and video techniques in helping rank or discover moments that might ordinarily be passed over using pure metadata analysis. For example, a match may have a number of break points worthy of being considered a highlight, so in order to select the best ones, the system will choose the segments with the highest excitement score measured from the audio and video of the moment. Having more eyes and ears on a match that can distinguish and rank all of its exciting moments allows editorial teams to go straight to the ‘winning’ moments that fans can re-watch after the match.
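The break point example can be illustrated as a "best k per event type" selection. The tuple layout, the event names and the scores are hypothetical, and the excitement scores are assumed to have been computed already from the audio and video.

```python
import heapq
from collections import defaultdict


def best_per_event(candidates, k=1):
    """Keep the k highest-scoring candidates for each event type.

    `candidates` is a list of (event_type, clip_id, excitement) tuples,
    a layout assumed for this example.
    """
    groups = defaultdict(list)
    for event_type, clip_id, score in candidates:
        groups[event_type].append((score, clip_id))
    return {
        event_type: [clip_id for _, clip_id in heapq.nlargest(k, scored)]
        for event_type, scored in groups.items()
    }


candidates = [
    ("break_point", "bp_1", 0.55),
    ("break_point", "bp_2", 0.91),
    ("break_point", "bp_3", 0.62),
    ("ace", "ace_1", 0.40),
]
selected = best_per_event(candidates, k=2)
```

Here the metadata alone would treat all three break points as equal candidates; the excitement score is what separates `bp_2` and `bp_3` from `bp_1`.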
Once the highlights for the match are determined, the system uses metadata from the match to generate graphics that facilitate the production of the one- to two-minute highlight packages that fans can watch on any of Wimbledon’s digital channels.
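One way to picture how ranked clips might be fitted into a one- to two-minute package is a simple greedy fill under a time budget. The post does not describe the actual assembly logic, so the greedy strategy, the 120-second cap and the tuple layout below are assumptions for illustration only.

```python
def build_package(clips, max_seconds=120.0):
    """Greedily keep the highest-scoring clips that fit, then restore match order.

    `clips` is a list of (start_time, duration, score) tuples in seconds;
    this layout is an assumption made for the example.
    """
    chosen = []
    used = 0.0
    for clip in sorted(clips, key=lambda c: c[2], reverse=True):
        _, duration, _ = clip
        if used + duration <= max_seconds:
            chosen.append(clip)
            used += duration
    # Play the selected clips in the order they happened in the match.
    return sorted(chosen, key=lambda c: c[0])


clips = [
    (310.0, 45.0, 0.87),
    (95.0, 50.0, 0.44),
    (1200.0, 40.0, 0.70),
    (2000.0, 20.0, 0.30),
]
package = build_package(clips)
total = sum(duration for _, duration, _ in package)
```

In this example the 50-second clip is skipped because it would push the package past the cap, while the shorter, lower-scoring clip still fits.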
The IBM Research team trained the system to recognize crowd cheering and players’ reactions using videos of matches from previous tournaments, with associated contextual metadata from the on-court statistician to filter out specific content. The technology relies on state-of-the-art deep learning models, which provide effective methods for learning new classifiers from a few manually annotated training examples with the help of active learning techniques.
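Active learning keeps the annotation effort small by asking a human to label only the examples the current model is least sure about. The post does not say which strategy was used; the sketch below shows uncertainty sampling, one common choice, with hypothetical clip ids and probabilities.

```python
def most_uncertain(probabilities, k):
    """Return the ids of the k clips whose predicted probability is closest to 0.5.

    `probabilities` maps a clip id to the classifier's predicted probability
    that the clip contains cheering (a hypothetical input for this example).
    """
    by_uncertainty = sorted(
        probabilities, key=lambda clip_id: abs(probabilities[clip_id] - 0.5)
    )
    return by_uncertainty[:k]


predicted = {"clip_a": 0.97, "clip_b": 0.52, "clip_c": 0.08, "clip_d": 0.41}
to_label = most_uncertain(predicted, k=2)
```

The selected clips would be sent to an annotator, added to the training set, and the classifier retrained before the next round.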
The value of a technology solution such as Cognitive Highlights is that its utility can go beyond sports. Media and entertainment companies have amassed huge archives containing hundreds of thousands of hours of program material and other footage that is not easily searched, and this kind of solution could simplify their production process. We also believe this technology could be extended to provide summarization tools for consumer videos captured by mobile phones or wearable cameras, for example.
Starting Monday, July 3rd, you can view highlight videos produced by our technology on Wimbledon.com. Let us know what you think!