August 8, 2017 | Written by: Hillery Hunter
Categorized: Artificial Intelligence
“…Each blind man feels a different part of the elephant body, but only one part, such as the side or the tusk. They then describe the elephant based on their partial experience and their descriptions are in complete disagreement on what an elephant is.” – The Rigveda
This parable helps describe the problem we are solving, and gives context for the promising early results we have achieved in image recognition with deep learning. Despite their initial disagreement, if the blind men are given enough time, they can share enough information to piece together a fairly accurate collective picture of an elephant.
Deep learning is a widely used AI method that helps computers understand and extract meaning from the images and sounds through which humans experience much of the world. It holds promise to fuel breakthroughs in everything from consumer mobile app experiences to medical imaging diagnostics. But progress in accuracy, and the practicality of deploying deep learning at scale, is gated by technical challenges such as the need to train massive and complex AI models – a process whose training times are measured in days or weeks.
For our part, my team at IBM Research has focused on reducing these training times for large models with large data sets. Our objective is to cut the wait time associated with deep learning training from days or hours to minutes or seconds, while enabling improved accuracy of these AI models. To achieve this, we are tackling grand-challenge-scale issues in distributing deep learning across large numbers of servers and GPUs.
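The core pattern behind distributing training across many GPUs and servers is synchronous data-parallel SGD: each worker computes gradients on its own shard of the data, the gradients are averaged across workers (in practice via a network all-reduce), and every worker applies the same update. The sketch below simulates that pattern in plain Python with a toy one-parameter model; the function names and data are illustrative assumptions, not IBM's actual system.

```python
# Minimal sketch of synchronous data-parallel training.
# Each "worker" holds a shard of data; an all-reduce step averages the
# per-worker gradients so all workers stay in sync. Names and data are
# hypothetical, for illustration only.

def gradient(w, shard):
    """Gradient of mean squared error for a 1-D linear model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(grads):
    """Average per-worker gradients (stands in for a network all-reduce)."""
    return sum(grads) / len(grads)

def train(shards, w=0.0, lr=0.05, steps=100):
    for _ in range(steps):
        # In a real cluster these run in parallel, one per GPU/server.
        local_grads = [gradient(w, shard) for shard in shards]
        g = all_reduce_mean(local_grads)  # synchronization point
        w -= lr * g                       # identical update on every worker
    return w

# Data generated from y = 3x, split across two simulated workers.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
w = train(shards)
print(round(w, 3))  # converges toward 3.0
```

The synchronization step is where scaling gets hard: with hundreds of GPUs, the all-reduce communication can dominate compute time unless it is carefully overlapped and optimized, which is exactly the kind of systems challenge described above.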
As a result, when we scaled to a large cluster with hundreds of GPUs, our approach yielded a record image recognition accuracy of 33.8 percent on the 7.5 million images of the ImageNet-22k dataset, versus the previous best published result of 29.8 percent, from Microsoft.
Read more about our record image recognition accuracy in my article on the IBM Research blog.