Index considerations

The most significant sizing factor for the FTS Server system is the disk space required for the full text index. The FTS Server also requires a fast disk subsystem. Because the textual representation of each indexed document is stored on disk, a considerable amount of disk space might be needed.

Index size calculation

Although disk space usage depends on the text in each document, it grows linearly with the original size of the indexed data. Typically, the size of the index on disk is 50% - 150% of the original text size, as illustrated in the following formulas:
minimum disk space required = Number of documents x document size x 50%
maximum disk space required = Number of documents x document size x 150%

The actual percentage within the 50% to 150% range depends on the data, so an exact number can be obtained only by testing with your own data.

For example, at an index ratio of 75%, 100,000 documents of 20 KB each require about 1500 MB (100,000 x 20 KB x 75%) of disk space.
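
The sizing formulas can be sketched as a small helper for planning purposes. This is only an illustration, not part of the FTS Server product: the function name and parameters are hypothetical, and it assumes 1 MB = 1000 KB, which matches the arithmetic in the preceding example.

```python
def index_size_mb(num_documents, doc_text_size_kb, ratio):
    """Estimate index disk space in MB for a given index-to-text ratio.

    ratio is the index size as a fraction of the original text size
    (0.50 to 1.50 is the typical range). Assumes 1 MB = 1000 KB.
    """
    total_text_kb = num_documents * doc_text_size_kb
    return total_text_kb * ratio / 1000


# Typical lower bound, the 75% example ratio, and typical upper bound
# for 100,000 documents with 20 KB of text each.
low = index_size_mb(100_000, 20, 0.50)
example = index_size_mb(100_000, 20, 0.75)
high = index_size_mb(100_000, 20, 1.50)
print(f"minimum: {low:.0f} MB, example: {example:.0f} MB, maximum: {high:.0f} MB")
```

The ratio that applies to a given system is data-dependent, so the result is a planning range rather than a prediction.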

Note: To determine the text size for AFP and Line Data documents, extract a sample document and use the ARSXAFP server program to determine the text size.

There is no fixed limit on the size of the index. However, when data is added to or removed from a text index, the text index structure is merged to improve query performance. The processing time required to complete the merge depends on several factors, such as the index size and the absolute throughput, which in turn depends on the data type and index format. These factors result in practical limits on the total text index size.

Query performance is affected most by the number of matching results, not by the size of the text index.

Temporary disk storage

During the indexing process, the server requires additional disk space for temporary storage. The maximum required disk space is approximately four times the total size of the text of the documents that are indexed.
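
The temporary storage estimate can be sketched in the same way. The helper below is hypothetical and simply applies the factor of four stated above to the total text size of the documents being indexed; it assumes 1 MB = 1000 KB.

```python
# Approximate upper bound stated for temporary storage: four times the text size.
TEMP_STORAGE_FACTOR = 4


def max_temp_storage_mb(num_documents, doc_text_size_kb):
    """Estimate the maximum temporary disk space, in MB, needed during indexing.

    Assumes 1 MB = 1000 KB and that every document has roughly the same text size.
    """
    total_text_mb = num_documents * doc_text_size_kb / 1000
    return TEMP_STORAGE_FACTOR * total_text_mb


# 100,000 documents with 20 KB of text each (2000 MB of text in total).
print(f"temporary storage: up to about {max_temp_storage_mb(100_000, 20):.0f} MB")
```

This space is needed in addition to the space for the index itself, so both estimates should be considered when sizing the disk subsystem.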

Index location

The full text index is stored within the installation directories of the FTS Server. If you need to change the location of your index information for reasons such as index size or I/O performance, refer to the technical document at www.ibm.com/support/pages/node/6442931