Indexing with metadata indexes

An Adobe PDF document can contain metadata, which is general information such as title and author that applies to the entire document.

Metadata is typically set when the document is created and can be modified at any time. For more information on metadata, see the most current PDF Reference, section 10.2.

When INDEXMODE=METADATA is specified, the PDF indexer extracts fields from the Document Information Dictionary that correspond to the following metadata keywords, if they exist, and places their values into the .ind file:
  • Title
  • Author
  • Subject
  • Creator
  • Producer
  • CreationDate
  • ModDate
  • Trapped

The metadata keywords are the group field names within the .ind file and can be mapped to the application group fields in the application. You can opt not to map any group field names.

Because the metadata keywords apply to the entire document, you can index the document only as one group. Metadata indexing cannot be combined with trigger-based indexing: if TRIGGER, FIELD, or INDEX parameters are specified, they are ignored.

If the document contains none of these metadata fields, the PDF indexer issues the following error message and stops processing:
ARS4940 Index not found by page page number
where "page number" stands for the number specified in the INDEXSTARTBY parameter.

The PDF indexer converts dates that are specified in the PDF format of (D:YYYYMMDDHHmmSSOHH'mm) to a format of YYYYMMDDHHmmSS. The index values CreationDate and ModDate contain the date formatted with the local time. If time zone information is specified in the PDF date (the OHH'mm section), the PDF indexer creates another index value, named CreationDateTZ or ModDateTZ, which contains the date with the time adjusted to Universal Time. For more information on Adobe date formats, see the most current PDF Reference, section 3.8.3.
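
The date conversion described above can be illustrated with a short Python sketch. This is not the PDF indexer's own code: the function name convert_pdf_date is hypothetical, and the sketch assumes the date is given at full length (all 14 digits present), optionally followed by Z or by a signed HH'mm offset. Treating Z as "already Universal Time" is also an assumption. The sample value uses an offset of -06'00', which is consistent with the CreationDate and CreationDateTZ values in the example index file in this topic.

```python
import re
from datetime import datetime, timedelta

# Full-length PDF date: D:YYYYMMDDHHmmSS, then an optional time zone part.
# The time zone part is either Z, or +/- followed by HH'mm (a trailing
# apostrophe, used by older PDF versions, is tolerated).
_PDF_DATE = re.compile(r"^D:(\d{14})(?:(Z)|([+-])(\d{2})'(\d{2})'?)?$")


def convert_pdf_date(raw):
    """Hypothetical helper: return (local, universal) as YYYYMMDDHHmmSS.

    universal is None when the PDF date carries no time zone part.
    """
    match = _PDF_DATE.match(raw.strip())
    if match is None:
        raise ValueError(f"unsupported PDF date: {raw!r}")
    digits, zulu, sign, hours, minutes = match.groups()
    if zulu:
        # Assumption: Z means the local time already is Universal Time.
        return digits, digits
    if sign is None:
        return digits, None
    local = datetime.strptime(digits, "%Y%m%d%H%M%S")
    offset = timedelta(hours=int(hours), minutes=int(minutes))
    # A local time ahead of UT (+) is moved back; one behind UT (-) is moved forward.
    universal = local - offset if sign == "+" else local + offset
    return digits, universal.strftime("%Y%m%d%H%M%S")


print(convert_pdf_date("D:20090408173745-06'00'"))
# prints ('20090408173745', '20090408233745')
```

The first value corresponds to what the text calls CreationDate or ModDate, and the second to CreationDateTZ or ModDateTZ.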

The only parameter required for metadata indexing is:
indexmode=metadata
Here is an example of an index file created by metadata indexing:
COMMENT:
COMMENT: Generic Index File Format
COMMENT:
COMMENT:
COMMENT:Code Page of the Index Data
CODEPAGE:1208
COMMENT:Index Field(s)
GROUP_FIELD_NAME:Title
GROUP_FIELD_VALUE:Administrator's Guide
GROUP_FIELD_NAME:Author
GROUP_FIELD_VALUE:IBM
GROUP_FIELD_NAME:Creator
GROUP_FIELD_VALUE:XPP
GROUP_FIELD_NAME:Producer
GROUP_FIELD_VALUE:IBM
GROUP_FIELD_NAME:CreationDate
GROUP_FIELD_VALUE:20090408173745
GROUP_FIELD_NAME:CreationDateTZ
GROUP_FIELD_VALUE:20090408233745
GROUP_FIELD_NAME:ModDate
GROUP_FIELD_VALUE:20090408173745
GROUP_FIELD_NAME:ModDateTZ
GROUP_FIELD_VALUE:20090408233745
COMMENT:Index Offsets and Length
GROUP_OFFSET:0
GROUP_LENGTH:748641
GROUP_PAGES:387
GROUP_FILENAME:\pdf\pdfoutput\admin.pdf
COMMENT:
COMMENT:
COMMENT:
COMMENT:End Generic Indexing File
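
To check the contents of an index file like the one above, the field names and values can be read back with a few lines of Python. The following is a minimal sketch, not an OnDemand utility: the function name parse_generic_index is hypothetical, and the sketch assumes that every line has the form KEYWORD:value, that each GROUP_FIELD_VALUE follows its GROUP_FIELD_NAME, and that GROUP_FILENAME is the last line of a group, as in the example.

```python
def parse_generic_index(text):
    """Hypothetical helper: return (codepage, groups) from generic index text.

    Each group is a dict with a "fields" mapping plus the GROUP_* entries.
    """
    codepage = None
    groups = []
    fields, info = {}, {}
    pending_name = None
    for line in text.splitlines():
        # Split at the first colon only, so values may contain colons.
        keyword, sep, value = line.partition(":")
        if not sep or keyword == "COMMENT":
            continue
        if keyword == "CODEPAGE":
            codepage = value
        elif keyword == "GROUP_FIELD_NAME":
            pending_name = value
        elif keyword == "GROUP_FIELD_VALUE":
            if pending_name is not None:
                fields[pending_name] = value
                pending_name = None
        elif keyword in ("GROUP_OFFSET", "GROUP_LENGTH", "GROUP_PAGES"):
            info[keyword] = int(value)
        elif keyword == "GROUP_FILENAME":
            # Assumption: the file name closes the group.
            info[keyword] = value
            groups.append({"fields": fields, **info})
            fields, info = {}, {}
    return codepage, groups


# A shortened copy of the example index file (raw string keeps backslashes).
SAMPLE = r"""COMMENT:Code Page of the Index Data
CODEPAGE:1208
GROUP_FIELD_NAME:Title
GROUP_FIELD_VALUE:Administrator's Guide
GROUP_FIELD_NAME:CreationDate
GROUP_FIELD_VALUE:20090408173745
GROUP_OFFSET:0
GROUP_LENGTH:748641
GROUP_PAGES:387
GROUP_FILENAME:\pdf\pdfoutput\admin.pdf
"""

codepage, groups = parse_generic_index(SAMPLE)
print(codepage, groups[0]["fields"]["Title"], groups[0]["GROUP_PAGES"])
```

Because metadata indexing always produces one group per document, the returned list is expected to hold a single entry for a file created this way.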