Extract table information from PDF files using OCR and analytics technology
From the developerWorks archives
Date archived: February 26, 2018 | First published: February 11, 2015
Learn how to build a REST application on IBM Bluemix that provides a web service for converting PDF documents to text. The service accepts a PDF file, converts it to text while capturing the tables identified in the document, and returns the result to the user as XML or HTML. The XML version is the output of the OCR engine, while the HTML version is the result of an error-correction process that fixes errors in the table structure identified by the OCR engine.
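The request flow described above can be illustrated with a minimal sketch. It is not the article's implementation: the function names (`ocr_to_xml`, `correct_to_html`, `convert`), the `fmt` parameter, and the status codes chosen are all assumptions made for illustration, and both processing stages are placeholders that return fixed markup rather than calling a real OCR engine.

```python
from typing import Tuple


def ocr_to_xml(pdf_bytes: bytes) -> str:
    """Hypothetical placeholder for the OCR stage (returns table markup as XML)."""
    return "<document><table/></document>"


def correct_to_html(xml_text: str) -> str:
    """Hypothetical placeholder for the error-correction stage (returns HTML)."""
    return "<html><body><table></table></body></html>"


def convert(pdf_bytes: bytes, fmt: str = "html") -> Tuple[int, str, str]:
    """Return (HTTP status, content type, body) for one conversion request."""
    # Simple sanity check: a PDF file begins with the "%PDF-" header.
    if not pdf_bytes.startswith(b"%PDF-"):
        return 415, "text/plain", "expected a PDF file"

    # The XML version is the direct output of the OCR stage.
    xml_text = ocr_to_xml(pdf_bytes)
    if fmt == "xml":
        return 200, "application/xml", xml_text

    # The HTML version is produced by error correction applied to the OCR output.
    if fmt == "html":
        return 200, "text/html", correct_to_html(xml_text)

    return 400, "text/plain", "fmt must be 'xml' or 'html'"
```

A REST handler in any web framework could call `convert` with the uploaded bytes and the requested format, then pass the returned status, content type, and body straight into the HTTP response.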
This content is no longer being updated or maintained. The full article is provided "as is" in a PDF file. Given the rapid evolution of technology, some content, steps, or illustrations may have changed.