IBM Content Manager OnDemand, Version 9.5

OS/390 indexer

You can use the OS/390 indexer to extract index data from, and generate index data about, line data and AFP reports. In addition, you can capture other data types, such as TIFF images, by using the Anystore Exit.
Restriction: The OS/390 indexer is supported only on the z/OS and AIX platforms.
The OS/390® indexer extracts indexes and stores documents in a single pass of reading the input data. The OS/390 indexer indexes reports based on the organization of the data in the report. The OS/390 indexer processes two input sources:
  • Indexing parameters that specify how the data should be indexed. You can create the indexing parameters when you define a Content Manager OnDemand application. The parameters have the same form as those used by ACIF, along with some extensions that are unique to the OS/390 indexer.
  • The print data stream.
The OS/390 indexer indexes input data based on the organization of the data:
  • AFP reports. For AFP reports, the index values are already specified within the AFP data stream.
  • Document organization. For reports made up of logical items, such as statements, policies, and invoices, the OS/390 indexer can generate index data for each logical item in the report.
  • Report organization. For reports that contain line data with sorted values on each page, such as a transaction log or general ledger, the OS/390 indexer can divide the report into groups of pages and generate index data for each group of pages.
  • Anystore Exit. This exit determines the content and index values of each document.
  • Large Object. Large object support is designed to provide enhanced usability and better retrieval performance for reports that contain very large documents by segmenting the documents into groups of pages and downloading only the page groups that the users request to view.
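The page-group idea behind large object support can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Content Manager OnDemand code: the function names are hypothetical, pages are assumed to be numbered from 1, and every group except possibly the last is assumed to hold the same number of pages.

```python
from typing import Tuple


def page_group_for(page_number: int, pages_per_group: int) -> int:
    """Return the 0-based index of the page group that holds a 1-based page."""
    if page_number < 1 or pages_per_group < 1:
        raise ValueError("page_number and pages_per_group must be positive")
    return (page_number - 1) // pages_per_group


def group_page_range(group_index: int, pages_per_group: int,
                     total_pages: int) -> Tuple[int, int]:
    """Return the first and last 1-based page numbers stored in a group."""
    first = group_index * pages_per_group + 1
    last = min(first + pages_per_group - 1, total_pages)
    return first, last
```

Under these assumptions, a request for page 101 of a document segmented into groups of 100 pages touches only the second group (index 1), so the other groups never need to be downloaded.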
Before you can index a report with the OS/390 indexer, you must create a set of indexing parameters. The indexing parameters describe the physical characteristics of the input data, identify where in the data stream the OS/390 indexer can locate index data, and provide other directives to the OS/390 indexer. Collecting the information needed to develop the indexing parameters requires a few steps. For example:
  1. Examine the input data to determine how users use the report, including what information they need to retrieve a report from the system (indexing requirements).
  2. Create parameters for indexing.

You run the OS/390 indexer as part of the Content Manager OnDemand load process with the ARSLOAD program. The Content Manager OnDemand application retrieves the indexing parameters from the Content Manager OnDemand database and uses the parameters to process the input data.

The OS/390 indexer can logically divide reports into individual items, such as statements, policies, and bills. You can define up to 128 index fields for each item in a report.

The OS/390 indexer has been enhanced to allow for the storage of documents (or large object segments) that exceed 2 GB. A report might contain multiple documents (or large object segments) each of which exceeds 2 GB in size. This enhancement does not affect the limitations imposed by other indexers.

The limitations on the document size are based on the available hardware and any other limitations placed on the operating environment:
  1. If the document (or large object segment) size exceeds 20 MB, then the document data is temporarily stored in the OnDemand temporary HFS directory (described below). Therefore, if the largest document is 6 GB, then the temporary HFS directory must have at least 6 GB of available space.

    If the available HFS disk space is not sufficient to store the largest document in the report, the load fails.

    The temporary HFS directory is defined by one of these options:
    • The -c option in the ARSLOAD parameters. If this is not specified, then:
    • The environment variable ARS_TMP. If this is not specified, then:
    • The environment variable TEMP. If this is not specified, then:
    • The current working directory.
  2. In the final load stage, the complete document (or large object segment) needs to be loaded into memory. Therefore, if the document (or large object segment) is 6 GB in size, then the load program needs to be able to acquire 6 GB of memory to load the data. If the available memory is not sufficient to store the largest document in the report, the load fails.
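The directory precedence and the disk-space condition described in the two limitations above can be sketched in Python. This is a minimal illustration, not the logic that ARSLOAD actually runs: the function names are hypothetical, 20 MB is assumed to mean 20 × 1024 × 1024 bytes, and an empty value is treated the same as a value that is not specified. The memory requirement of the final load stage is not modeled.

```python
import shutil
from typing import Mapping, Optional

# Assumption: "20 MB" is interpreted as 20 * 1024 * 1024 bytes.
STAGING_THRESHOLD = 20 * 1024 * 1024


def resolve_temp_dir(c_option: Optional[str], env: Mapping[str, str],
                     cwd: str) -> str:
    """Pick the temporary directory: -c option, ARS_TMP, TEMP, then cwd."""
    if c_option:
        return c_option
    for name in ("ARS_TMP", "TEMP"):
        value = env.get(name)
        if value:
            return value
    return cwd


def needs_staging(document_size: int) -> bool:
    """Documents larger than the threshold are staged in the temp directory."""
    return document_size > STAGING_THRESHOLD


def has_room_for(largest_document: int, temp_dir: str) -> bool:
    """Check that the temp directory can hold the largest staged document."""
    if not needs_staging(largest_document):
        return True
    return shutil.disk_usage(temp_dir).free >= largest_document
```

A check such as `has_room_for(6 * 1024**3, resolve_temp_dir(None, os.environ, os.getcwd()))` (with `os` imported) could be run before a load to estimate whether a 6 GB document would fit in the directory selected by this precedence.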

Any data type can be captured using the OS/390 indexer. Native support exists for line data and AFP data. Other data types, such as PDF and TIFF images, can be captured by using the Anystore Exit. This provides a method to capture documents of any type and size (including those greater than 2 GB) into Content Manager OnDemand.

Indexing

Indexing parameters include information that allows the OS/390 indexer to identify key items in the input data stream so that they can be extracted from the report and stored in the Content Manager OnDemand database. Content Manager OnDemand uses these index values for efficient, structured search and retrieval.

The OS/390 indexer uses the following methods to determine the index values for each document within a report.
  • AFP Reports. The OS/390 indexer can capture fully resolved AFP data streams (AFPDS). The AFPDS must contain the index values in the form of either TLE or NOP records. For details on these record types, see INDEXSTYLE.
    You can capture AFP resources in either of the following ways:
    • The resources are in-stream at the beginning of the AFPDS. In this case, the Begin Resource Group (BRG) record and End Resource Group (ERG) record must occur prior to the Begin Document (BDT) record.
    • The resources are in a separate input file and specified in the ARSLOAD JCL via a RESOURCE ddname.
    In either case, only resource records beginning with the BRG record and ending with the ERG record are captured and stored in the Content Manager OnDemand database.
  • Line Print Reports. Line Print Reports consist of text-formatted print streams. Column one of each record contains a carriage control character.

    You specify the index information that allows the OS/390 indexer to segment the print stream into individual items called groups. A group is a collection of one or more pages. You define the bounds of the collection, for example, a bank statement, insurance policy, phone bill, or other logical segment of a report file. A group can also represent a specific number of pages in a report. For example, you might decide to segment a 10,000-page report into groups of 100 pages. The OS/390 indexer creates indexes for each group. A new group begins when the value of an index changes (for example, account number) or when the maximum number of pages for a group is reached.

    An indexing parameter is made up of an attribute name (for example, Customer Name) and an attribute value (for example, Earl Hawkins). The parameters include pointers that tell the OS/390 indexer where to locate the attribute information in the data stream. For example, the tag Account Number with the pointer 1,21,16 means that the OS/390 indexer can expect to find Account Number values starting in column 21 of specific input records. The OS/390 indexer collects 16 bytes of information starting at column 21 and adds it to a list of attribute values found in the input. For each group that the OS/390 indexer identifies, the Content Manager OnDemand load process stores the set of index values associated with the group in the Content Manager OnDemand database.

  • Anystore Exits. An Anystore Exit allows the capture of any type of data. The exit is responsible for reading the data to be captured, breaking it into documents, and determining the index values. A sample Anystore Exit is provided that captures TIFF images by using a pre-generated set of indexing instructions read from a separate file.
  • Large Object. Provides enhanced usability and better retrieval performance for reports that contain very large logical items (for example, statements that exceed 500 pages) and files that contain many images, graphics, fonts, and bar codes. Content Manager OnDemand segments data into groups of pages, compressed inside a large object. You determine the number of pages in a group. When the user retrieves an item, Content Manager OnDemand retrieves and uncompresses the first group of pages. As the user navigates pages of the item, Content Manager OnDemand automatically retrieves and uncompresses the appropriate groups of pages. To enable large object support, you must specify INDEXOBJ=ALL in the indexing parameters.

    The INDEXOBJ=ALL parameter is supported for AFP reports as well as Line Print reports.

  • The OS/390 indexer also provides support for line print reports with global and/or local Xerox DJDE records. These documents can be loaded in the same manner as the standard line print reports described earlier with the addition of DJDE record handling logic. The global DJDE records are stored separately from the individual documents and retrieved at print time as required.
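The Line Print Reports method above (collect a fixed-width value such as Account Number from column 21 for 16 bytes, then start a new group when the value changes or the page limit is reached) can be sketched in Python. This is an illustration under stated assumptions, not the OS/390 indexer's implementation: the function names and the `record_index` argument are hypothetical, columns are 1-based with the carriage control character in column 1, the data is assumed to be in a single-byte encoding so that characters equal bytes, and the first value of the 1,21,16 pointer is not modeled.

```python
from typing import Iterable, List, Optional, Tuple

Page = List[str]  # one page is a list of print records


def extract_field(record: str, column: int, length: int) -> str:
    """Return `length` characters of a record, starting at a 1-based column."""
    start = column - 1
    return record[start:start + length]


def group_pages(pages: Iterable[Page], record_index: int, column: int,
                length: int, max_pages: int) -> List[Tuple[str, List[Page]]]:
    """Split pages into groups on an index value change or a page limit.

    Assumption: the index value is read from the record at `record_index`
    on every page, and trailing blanks are not part of the value.
    """
    groups: List[Tuple[str, List[Page]]] = []
    current_value: Optional[str] = None
    current_pages: List[Page] = []
    for page in pages:
        value = extract_field(page[record_index], column, length).strip()
        full = len(current_pages) >= max_pages
        if current_pages and (value != current_value or full):
            groups.append((current_value or "", current_pages))
            current_pages = []
        current_value = value
        current_pages.append(page)
    if current_pages:
        groups.append((current_value or "", current_pages))
    return groups
```

Each returned tuple pairs one index value with the pages of its group, which mirrors the idea that a set of index values is stored for every group the indexer identifies.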