You can use the OS/390 indexer to extract index data from, and
generate index data about, line data and AFP reports. In
addition, other data types, such as TIFF images, can be captured by using
the Anystore Exit.
Restriction: The OS/390 indexer is
supported only on the z/OS and AIX platforms.
The OS/390® indexer extracts indexes and stores
documents in a single pass of reading the input data. The OS/390 indexer
indexes reports based on the organization of the data in the report.
The OS/390 indexer processes two input sources:
- Indexing parameters that specify how the data should be indexed.
You can create the indexing parameters when you define a Content Manager OnDemand application. The parameters
take the same form as those used by ACIF, with some extensions that
are unique to the OS/390 indexer.
- The print data stream.
The OS/390 indexer indexes input data based on
the organization of the data:
- AFP reports. For AFP reports, the index values are
already specified within the AFP data stream.
- Document organization. For reports made up of logical items, such
as statements, policies, and invoices, the OS/390 indexer
can generate index data for each logical item in the report.
- Report organization. For reports that contain line data with sorted
values on each page, such as a transaction log or general ledger,
the OS/390 indexer can divide the report into
groups of pages and generate index data for each group of pages.
- Anystore Exit. This exit determines the content and
index values of each document.
- Large Object. Large object support is designed to provide enhanced
usability and better retrieval performance for reports that contain
very large documents by segmenting the documents into groups of pages
and downloading only the page groups that the users request to view.
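The large object behavior described above, where only the requested page groups are downloaded, can be illustrated with a small calculation. This is a minimal sketch and not the Content Manager OnDemand implementation: the function names are hypothetical, and it assumes fixed-size page groups with 1-based page numbers.

```python
def page_group_index(page_number: int, pages_per_group: int) -> int:
    """Return the 0-based index of the page group that holds a 1-based page."""
    if page_number < 1 or pages_per_group < 1:
        raise ValueError("page_number and pages_per_group must be at least 1")
    return (page_number - 1) // pages_per_group


def page_group_range(group_index: int, pages_per_group: int, total_pages: int) -> range:
    """Return the 1-based page numbers stored in one page group."""
    first = group_index * pages_per_group + 1
    # The last group may be shorter than pages_per_group.
    last = min(first + pages_per_group - 1, total_pages)
    return range(first, last + 1)
```

Under these assumptions, a request for page 250 of a document segmented into groups of 100 pages maps to group index 2, so only pages 201 through 300 would need to be retrieved and uncompressed.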
Before you can index a report with the OS/390 indexer,
you must create a set of indexing parameters. The indexing
parameters describe the physical characteristics of the input data,
identify where in the data stream the OS/390 indexer
can locate index data, and provide other directives to the OS/390 indexer.
Collecting the information needed to develop the indexing parameters
requires a few steps. For example:
- Examine the input data to determine how users use the report,
including what information they need to retrieve a report from the
system (indexing requirements).
- Create parameters for indexing.
You run the OS/390 indexer as part of the Content Manager OnDemand load process with the ARSLOAD
program. The Content Manager OnDemand application
retrieves the indexing parameters from the Content Manager OnDemand database and uses the parameters
to process the input data.
The OS/390 indexer can logically divide reports
into individual items, such as statements, policies, and bills. You
can define up to 128 index fields for each item in a report.
The OS/390 indexer can store documents (or large
object segments) that exceed 2 GB. A report might contain multiple
documents (or large object segments), each of which exceeds 2 GB in
size. This capability does not affect the limitations imposed by
other indexers.
The limitations on the document size are based
on the available hardware and any other limitations placed on the
operating environment:
- If the document (or large object segment) size exceeds 20 MB,
then the document data is temporarily stored in the OnDemand temporary
HFS directory (described below). Therefore, if the largest document
is 6 GB, then the temporary HFS directory must have at least 6 GB
of available space.
If the available HFS disk space is not sufficient
to store the largest document in the report, the load fails.
The temporary HFS directory is determined by the first of these
options that is specified:
- The -c option in the ARSLOAD parameters.
- The environment variable ARS_TMP.
- The environment variable TEMP.
- The current working directory, if none of the above is specified.
- In the final load stage, the complete document (or large object
segment) needs to be loaded into memory. Therefore, if the document
(or large object segment) is 6 GB in size, then the
load program needs to be able to acquire 6 GB of
memory to load the data. If the available memory is not sufficient
to store the largest document in the report, the load fails.
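The lookup order for the temporary HFS directory can be sketched as follows. This is an illustration under stated assumptions, not ARSLOAD code: the function names are hypothetical, and an empty value is treated here as "not specified". The helper checks disk space only; it does not check whether enough memory is available for the final load stage.

```python
import shutil
from typing import Mapping, Optional


def resolve_temp_dir(c_option: Optional[str], env: Mapping[str, str], cwd: str) -> str:
    """Pick the temporary directory: -c option, then ARS_TMP, then TEMP, then cwd."""
    if c_option:
        return c_option
    for name in ("ARS_TMP", "TEMP"):
        value = env.get(name)
        if value:  # assumption: an empty value counts as not specified
            return value
    return cwd


def has_room_for(directory: str, largest_document_bytes: int) -> bool:
    """Report whether the directory's file system has enough free space."""
    return shutil.disk_usage(directory).free >= largest_document_bytes
```

For example, `resolve_temp_dir(None, os.environ, os.getcwd())` (after `import os`) would follow the same fallback chain for the current process, and `has_room_for(path, 6 * 1024**3)` could be used to check for 6 GB of free space before a load.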
Any data type can be captured using the OS/390 indexer.
Native support exists for line data and AFP data. Other data types,
such as PDF and TIFF images, can be captured by using the Anystore
Exit. This provides a method to capture documents of
any type and size (including those greater than 2 GB) into Content Manager OnDemand.
Indexing
Indexing parameters
include information that allows the OS/390 indexer
to identify key items in the input data stream so that they can be extracted
from the report and stored in the Content Manager OnDemand database. Content Manager OnDemand uses these index values
for efficient, structured search and retrieval.
The OS/390 indexer
uses the following methods to determine the index values for each
document within a report:
- AFP reports. The OS/390 indexer can capture fully
resolved AFP data streams (AFPDS). The AFPDS must contain the index
values in the form of either TLE or NOP records. For details on these
record types, see INDEXSTYLE.
You can capture AFP resources in either of the following ways:
- The resources are in-stream at the beginning of the AFPDS. In
this case, the Begin Resource Group (BRG) record and End Resource
Group (ERG) record must occur prior to the Begin Document (BDT) record.
- The resources are in a separate input file and specified in the
ARSLOAD JCL via a RESOURCE ddname.
In either case, only resource records beginning with the BRG
record and ending with the ERG record are captured and stored in the
Content Manager OnDemand database.
- Line print reports. Line print reports consist of text-formatted
print streams. Column one of each record contains a carriage control
character.
You specify the index information that allows the OS/390 indexer
to segment the print stream into individual items called groups.
A group is a collection of one or more pages. You define the bounds
of the collection, for example, a bank statement, insurance policy,
phone bill, or other logical segment of a report file. A group can
also represent a specific number of pages in a report. For example,
you might decide to segment a 10,000 page report into groups of 100
pages. The OS/390 indexer creates indexes for each group.
A new group begins when the value of an index (for example,
account number) changes or when the maximum number of pages for a group is
reached.
An indexing parameter is made up of an attribute
name (for example, Customer Name) and an attribute value (for
example, Earl Hawkins). The parameters include pointers that tell
the OS/390 indexer where to locate the attribute
information in the data stream. For example, the tag Account
Number with the pointer 1,21,16 means that
the OS/390 indexer can expect to find Account
Number values starting in column 21 of specific input records. The OS/390 indexer
collects 16 bytes of information starting at column 21 and adds it
to a list of attribute values found in the input. For each group that
the OS/390 indexer identifies, the Content Manager OnDemand load process
stores the set of index values associated with the group in the Content Manager OnDemand database.
- Anystore Exits. An Anystore Exit allows the capture
of any type of data. The exit is responsible for reading the data
to be captured, breaking it into documents, and determining the index
values. A sample Anystore Exit is provided that captures TIFF images
by using a pre-generated set of indexing instructions read from a separate
file.
- Large Object. Provides enhanced usability and better retrieval
performance for reports that contain very large logical items (for
example, statements that exceed 500 pages) and files that contain
many images, graphics, fonts, and bar codes. Content Manager OnDemand segments data into groups
of pages, compressed inside a large object. You determine the number
of pages in a group. When the user retrieves an item, Content Manager OnDemand retrieves and uncompresses
the first group of pages. As the user navigates pages of the item, Content Manager OnDemand automatically retrieves
and uncompresses the appropriate groups of pages. To enable large
object support, you must specify INDEXOBJ=ALL in the indexing parameters.
The INDEXOBJ=ALL parameter
is supported for both AFP reports and line print reports.
- Line print reports with Xerox DJDE records. The OS/390 indexer also
supports line print reports that contain global DJDE records, local
DJDE records, or both. These
documents can be loaded in the same manner as the standard line print
reports described earlier, with the addition of DJDE record handling
logic. The global DJDE records are stored separately from the individual
documents and retrieved at print time as required.
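The line print method above, in which a pointer such as 1,21,16 locates an index value and a new group starts when that value changes or a page limit is reached, can be sketched as follows. This is a minimal illustration, not the indexer's implementation. It assumes that the first pointer value selects a record within each page (as in ACIF-style field definitions), that columns are 1-based with the carriage control character in column one, and that a page is represented as a list of record strings. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Page = Sequence[str]  # one page = a sequence of records (lines)


def extract_field(record: str, column: int, length: int) -> str:
    """Return `length` characters starting at the 1-based `column`."""
    start = column - 1
    return record[start:start + length]


def group_pages(pages: Sequence[Page], record_no: int, column: int,
                length: int, max_pages: int) -> List[Tuple[str, List[Page]]]:
    """Split pages into groups; a group ends when the index value changes
    or when it already holds `max_pages` pages."""
    groups: List[Tuple[str, List[Page]]] = []
    current: List[Page] = []
    current_value = ""
    for page in pages:
        # Assumption: record_no is the 1-based record within the page.
        value = extract_field(page[record_no - 1], column, length).strip()
        if current and (value != current_value or len(current) >= max_pages):
            groups.append((current_value, current))
            current = []
        current_value = value
        current.append(page)
    if current:
        groups.append((current_value, current))
    return groups
```

Calling `group_pages(pages, 1, 21, 16, 100)` would apply the Account Number pointer 1,21,16 to each page and return one `(index value, pages)` pair per group, which corresponds to the set of index values stored for each group.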