July 31, 2017 | Written by: Sujatha Kashyap
As smart as the phones in our pockets are, they’re not built to interpret the written word. Low-literate adults, of whom there are 757 million worldwide, including 32 million in the United States, instead rely on visual cues for recognition and comprehension. So, as part of IBM Research’s Science for Social Good initiative, our team in Austin, Texas, reached out to the Literacy Coalition of Central Texas with an idea for a different approach: use artificial intelligence to turn text into pictures and simple verbal messages.
A student at the Literacy Coalition of Central Texas scans a grocery store item with Simpler Voice.
The mobile app project, called Simpler Voice, can parse complex text from sources such as product descriptions, instruction manuals, and even public signage. It then extracts and presents simplified images and short spoken messages. Today’s smartphone AI is good at straight text-to-speech and speech-to-text, but not so good at rendering a new way of explaining the world around us.
Simpler Voice weaves together IBM Watson natural language understanding and text-to-speech services with novel image generation code – or in AI research parlance, generative adversarial networks (GANs). These GANs provide alternative visualizations of what the smartphone is looking at.
What better place to pilot Simpler Voice than the grocery store?
Our team’s first test run with the Literacy Coalition’s students will examine a handful of common grocery store items, from shampoo bottles to canned goods. We’ll then have their students pilot the app at a local grocery store – with the ability to scan any product.
Low-literate adults often match images from television ads and newspaper coupons with what they see – and then buy – at the grocery store. This limitation excludes potentially healthier, more economical choices, or quite simply, something they might like better. Simpler Voice opens up the entire store to our students, and we hope soon, to the 19 percent of Texans who are low-literate, and beyond.
Simpler Voice output when scanning
Take dishwasher detergent, for example. An LCCT student sees a box of dishwashing tablets next to the liquid he recognizes and usually buys. Curious, but unsure of what the tablets do, he scans the barcode. Simpler Voice reads the company’s product description, and begins to interpret key words and phrases to verbalize “dishwasher detergent” and provide a visualization of a person using a dishwasher. Simpler Voice can also illustrate instructions on where to put the tablet in a dishwasher. And because these tablets look like candy, it may also communicate a safety warning: “Not candy. Do not let children eat this.”
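For readers who want to picture the flow in the dishwasher example, here is a minimal sketch in Python. It is not the Simpler Voice implementation: the rule tables, the `simplify` function, and the `SimplifiedOutput` type are all hypothetical stand-ins. Plain keyword matching takes the place of the natural language understanding step, and a text description takes the place of the generated image.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical rule tables, invented for illustration only.
# topic -> (keywords that signal the topic, description of the picture to show)
TOPICS = {
    "dishwasher detergent": (
        ("dishwasher", "dishwashing"),
        "a person using a dishwasher",
    ),
}
# phrase found in the product text -> short spoken safety warning
WARNINGS = {
    "keep out of reach of children": "Do not let children eat this.",
}


@dataclass
class SimplifiedOutput:
    spoken_messages: List[str] = field(default_factory=list)
    image_description: Optional[str] = None


def simplify(description: str) -> SimplifiedOutput:
    """Turn a product description into short messages and a picture idea."""
    text = description.lower()
    out = SimplifiedOutput()

    # Stand-in for language understanding: the first matching topic wins.
    for topic, (keywords, picture) in TOPICS.items():
        if any(keyword in text for keyword in keywords):
            out.spoken_messages.append(topic)
            out.image_description = picture
            break

    # Add a spoken warning for each safety phrase found in the text.
    for phrase, warning in WARNINGS.items():
        if phrase in text:
            out.spoken_messages.append(warning)

    return out
```

Calling `simplify` on a made-up description such as "Dishwashing tablets. Keep out of reach of children." would yield the spoken messages "dishwasher detergent" and "Do not let children eat this.", along with the picture description "a person using a dishwasher". In a real app, the messages would go to a text-to-speech service and the description to an image generator.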
Simpler Voice may offer the biggest benefit in visualizing and explaining the fine print on over-the-counter medications. Can anyone read that two-point type? With a quick scan, the app can explain how many pain reliever pills an adult or child should take, how often, as well as warn of potential allergic reactions.
After piloting Simpler Voice in grocery stores around Austin, we hope to work with LCCT on expanding the app’s capability to legal documents, such as apartment rental agreements, and medical documents, like the pages of paperwork required at the doctor’s office.
Simpler Voice is one of IBM’s 15 global Science for Social Good projects launched this summer. Our team, including intern Minh Nguyen, from the University of Southern California, will work alongside LCCT to further develop and deploy the app for their students. Watch this space for news about Simpler Voice’s availability in your favorite app store.