March 8, 2016 | Written by: Alyssa Simpson
Do you remember learning how to read? I have a distinct memory of being five years old, spread out on the floor of my kindergarten classroom working on complex shape and color matching workbooks. I was an eager kindergartener, and we raced through those workbooks as quickly as we could. I have to admit, though, that I couldn’t read very well; I was following the cues of my peers. I can remember being asked to read a sentence out loud and struggling to put the letters together. For me, and many other children, learning to read was difficult. With the advancement of technology, teachers (albeit a different kind) are now looking to teach computers to read as well, and it isn’t as easy as you might think.
Today, Watson Vision Services are learning to tell the difference between shapes, colors, and words. Like a child, we are teaching the AlchemyVision API how to read. While we are just getting started, we have a limited offering available in beta that can decipher English-language words in an image. First, the API detects a bounding box to determine where the words are located in the image. Once the text has been located, the API then “sounds out” each word, matching it against known English words in a dictionary.
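The dictionary-matching step can be sketched in a few lines of Python. This is a hypothetical illustration, not the AlchemyVision implementation: the bounding-box detection is assumed to have already produced a raw character string, and the “sounding out” is approximated with a simple closest-match lookup (via `difflib`) against a tiny word list.

```python
import difflib
from typing import Optional

# A tiny stand-in for a dictionary of known English words.
KNOWN_WORDS = {"stop", "open", "sale", "exit", "coffee"}

def match_word(raw_text: str, dictionary=KNOWN_WORDS) -> Optional[str]:
    """Match a raw OCR candidate against known words, tolerating small errors."""
    candidate = raw_text.lower()
    if candidate in dictionary:
        return candidate
    # Fall back to the closest dictionary entry (edit-distance-style matching).
    close = difflib.get_close_matches(candidate, dictionary, n=1, cutoff=0.6)
    return close[0] if close else None

# Example: a noisy read of a street sign.
print(match_word("ST0P"))   # a misread character still resolves to "stop"
print(match_word("xyzzy"))  # no plausible match, prints None
```

A real system would score many candidate readings per bounding box and pick the most probable word, but the core idea is the same: noisy character sequences are reconciled against a known vocabulary.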
In the computer vision world, this is called Optical Character Recognition (OCR). IBM DataCap has been doing this for a long time, optimized for document scanning and achieving impressive accuracy of over 90%. Watson Vision Services are focused on a different challenge for OCR technologies: recognizing text in natural scene images, or objects seen “in the wild”, such as a picture of a product on social media or a street sign. These kinds of words are much harder to pick out than text in scanned documents, but we are excited to take the first steps on this journey.