Large Database Connector

The Large Database connector enables IBM® Watson Explorer Engine applications to crawl large database repositories and index the information that they contain. It is designed to be used with IBM InfoSphere Data Replication's Change Data Capture (CDC), a replication solution that captures database changes as they happen and delivers them to target databases and other applications. The connector is also designed to be installed in a CDC environment that provides a JMS service provider to handle database messaging between CDC implementations and Watson Explorer Engine instances.

Note: The Large Database connector is not appropriate for every data repository environment. It will not resolve issues where extremely large amounts of data are contained in databases that are not intended to handle large amounts of data. Additionally, if your database tables are smaller than 10 GB, you probably do not need to incur the infrastructure overhead of the Large Database connector, and can use one of the other database connectors that are included with Watson Explorer Engine.

Significant features of the Large Database connector include the following:

"Push" connector framework - The Large Database connector provides the ability to "push" to one or more Watson Explorer Engine collections from any server in your network where the Large Database connector is installed.
Service oriented architecture - The Large Database connector is installed as a standalone application intended to be left running.
Linear scalability - The Large Database connector is engineered to be fast and scalable, supporting the efficient crawling and indexing of terabytes of data, without having to redesign or change your system as your data needs increase.
Performance optimized - The Large Database connector supports incremental updates with a low performance load on your data repositories.
Lightweight administration framework - Engineered to be lightweight and optimized for performance, the Large Database connector does not provide a graphical administration tool. It can be administered and configured using commonly available tools such as a text editor and standard command-line utilities.
Data loss prevention engineering - The Large Database connector uses the Java Message Service (JMS) to preserve queued crawl data. Should a machine become unavailable in your environment, crawl data is not removed from the JMS provider's queue until it has been completely crawled and indexed by Watson Explorer Engine.
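
The data loss prevention behavior in the last item can be illustrated with a minimal Python sketch. It does not use the connector or a real JMS provider: AckQueue, drain, and index_fn are hypothetical names introduced only for illustration, and the in-memory queue merely stands in for a provider queue. The sketch shows the general idea of removing a message from the queue only after it has been indexed successfully.

```python
from collections import OrderedDict


class AckQueue:
    """Toy in-memory stand-in for a message queue with explicit acknowledgement.

    Hypothetical class for illustration only; it is not part of the connector
    or of any JMS provider.
    """

    def __init__(self):
        self._pending = OrderedDict()  # message id -> payload, kept until acked
        self._next_id = 0

    def send(self, payload):
        """Enqueue a payload and return its message id."""
        msg_id = self._next_id
        self._next_id += 1
        self._pending[msg_id] = payload
        return msg_id

    def peek(self):
        """Return the oldest unacknowledged (id, payload), or None if empty."""
        for msg_id, payload in self._pending.items():
            return msg_id, payload
        return None

    def ack(self, msg_id):
        """Remove a message once the consumer confirms it was processed."""
        del self._pending[msg_id]

    def __len__(self):
        return len(self._pending)


def drain(queue, index_fn):
    """Index messages in order; stop at the first failure, leaving it queued."""
    indexed = 0
    while (item := queue.peek()) is not None:
        msg_id, payload = item
        try:
            index_fn(payload)
        except ConnectionError:
            break  # target unavailable: the message stays queued for a retry
        queue.ack(msg_id)  # acknowledge only after indexing succeeded
        indexed += 1
    return indexed
```

If index_fn raises ConnectionError partway through, the failed message and everything behind it remain in the queue, so a later call to drain picks up where the earlier one stopped. A real deployment would rely on the acknowledgement and persistence features of the JMS provider rather than on application code like this.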

This document is intended for use by systems administrators tasked with installing and configuring the Large Database connector. Other key audience members include IT management and personnel generally responsible for the maintenance of enterprise databases, Watson Explorer Engine, and Watson Explorer Application Builder applications.

A working knowledge of Watson Explorer Engine and a basic understanding of Watson Explorer Engine administration and configuration are required to follow this guide. A similar background in enterprise database administration is assumed.