Data mining in a document world

From the developerWorks archives

Martin Brown

Date archived: January 13, 2017 | First published: February 12, 2013

Predictive analytics, business intelligence, and data mining in general all require the storage and processing of complex and often wildly different data structures as the information is processed, resolved, and summarized. Particularly for business and financial information, a significant amount of that data is likely to come from relational databases. These follow a strict structure and require significant preparation, because you must design your schema and data model beforehand. The new breed of NoSQL and document-based databases makes much of this processing simpler because you can create and dump information in a flexible format, and then work on methods to extract that data in the fixed format you require. In this article, I look at how to use document-based databases for data processing and analytics as part of your overall database solution.
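
As a rough illustration of that idea, the following minimal sketch keeps a few differently shaped records as plain Python dictionaries (standing in for documents in a document store) and then maps them onto one fixed row format for analysis. The record fields, the `to_row` and `totals_by_customer` helpers, and the sample values are all hypothetical and are not taken from the full article; no particular database product or API is assumed.

```python
# Hypothetical records with differing shapes, as a document store would accept them.
documents = [
    {"type": "sale", "customer": "acme", "total": 120.0, "items": ["widget", "bolt"]},
    {"type": "sale", "customer": "globex", "total": 75.5},
    {"type": "refund", "customer": "acme", "amount": 20.0, "reason": "damaged"},
]


def to_row(doc):
    """Map one flexible document onto a fixed (customer, kind, value) tuple.

    Returns None for document types this extraction does not understand.
    """
    kind = doc.get("type")
    if kind == "sale":
        value = doc.get("total", 0.0)
    elif kind == "refund":
        # Refunds reduce revenue, so they are recorded as negative values.
        value = -doc.get("amount", 0.0)
    else:
        return None
    return (doc["customer"], kind, value)


def extract_rows(docs):
    """Apply to_row to every document and drop the ones that do not map."""
    return [row for row in (to_row(d) for d in docs) if row is not None]


def totals_by_customer(rows):
    """Summarize the fixed-format rows into a net value per customer."""
    totals = {}
    for customer, _kind, value in rows:
        totals[customer] = totals.get(customer, 0.0) + value
    return totals


rows = extract_rows(documents)
totals = totals_by_customer(rows)
```

The point of the sketch is the ordering of the work: the documents are stored first without an agreed schema, and the fixed structure is imposed only at extraction time, by a function that can be changed later without rewriting the stored data.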

This content is no longer being updated or maintained. The full article is provided "as is" in a PDF file. Given the rapid evolution of technology, some steps and illustrations may have changed.
