Understanding information content with Apache Tika

From the developerWorks archives

Oleg Tikhonov and Chris Mattmann

Date archived: April 18, 2019 | First published: June 15, 2010

As computers become ever more widespread and the Internet more pervasive, huge amounts of information in many languages are becoming available. Automatic information processing and retrieval are urgently needed to understand content across cultures, languages, and continents. Apache Tika, a recent Apache software project, is becoming an important tool for achieving that understanding.

This content is no longer being updated or maintained. The full article is provided "as is" in a PDF file. Given the rapid evolution of technology, some content, steps, or illustrations may have changed.
