Processing PDF input files with the graphical indexer

This section describes how to use the graphical indexer to create indexing information for a PDF input file.

Restriction: The graphical indexer supports PDF input files only when you use the 32-bit OnDemand Administrator client. PDF input files are not supported in the graphical indexer when you use the 64-bit OnDemand Administrator client.
If you plan to use the Report Wizard or the graphical indexer to process PDF input files, then you must first install Adobe® Acrobat® on the workstation from which you plan to run the OnDemand Administrator client.
Important: Content Manager OnDemand provides the ARSPDF32.API file to enable PDF viewing from the client. If you install the client after you install Adobe Acrobat, the installation program copies the API file to the Acrobat plug-in directory. If you install the client before you install Adobe Acrobat, you must copy the API file to the Acrobat plug-in directory yourself. Likewise, if you upgrade to a new version of Acrobat, you must copy the API file to the new Acrobat plug-in directory. The default location of the API file is C:\Program Files (x86)\IBM\OnDemand Clients\V10.1\PDF. The default Acrobat plug-in directory is C:\Program Files (x86)\Adobe\Acrobat x.y\Acrobat\plug_ins, where x.y is the version of Acrobat, such as 10.0, 11.0, or 2016.
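When you have to copy the API file manually (client installed first, or Acrobat upgraded), the copy can be scripted. The following is a minimal Python sketch, assuming the default locations given above; the helper names `copy_api_file` and `acrobat_plugin_dir` are hypothetical, and you must supply the installed Acrobat version yourself, because the x.y part of the path depends on your installation.

```python
import shutil
from pathlib import Path

# Default location of the API file, as documented for the V10.1 client.
DEFAULT_API_DIR = r"C:\Program Files (x86)\IBM\OnDemand Clients\V10.1\PDF"


def acrobat_plugin_dir(version: str) -> str:
    """Return the default Acrobat plug-in directory for a version such as "11.0"."""
    return rf"C:\Program Files (x86)\Adobe\Acrobat {version}\Acrobat\plug_ins"


def copy_api_file(api_dir, plugin_dir, filename: str = "ARSPDF32.API") -> Path:
    """Copy the OnDemand PDF API file into an Acrobat plug-in directory.

    Raises FileNotFoundError if the API file or the plug-in directory is missing,
    so a mistyped Acrobat version is reported instead of writing to an unintended path.
    """
    source = Path(api_dir) / filename
    target_dir = Path(plugin_dir)
    if not source.is_file():
        raise FileNotFoundError(f"API file not found: {source}")
    if not target_dir.is_dir():
        raise FileNotFoundError(f"Acrobat plug-in directory not found: {target_dir}")
    # copy2 also preserves the file's timestamps; an existing copy is overwritten.
    return Path(shutil.copy2(source, target_dir))
```

For example, `copy_api_file(DEFAULT_API_DIR, acrobat_plugin_dir("11.0"))` copies the file for Acrobat 11.0. Writing under C:\Program Files (x86) typically requires administrator rights, so you may need to run the script from an elevated prompt.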

You can define indexing information in a visual environment. You begin by opening a sample input file with the graphical indexer.

You can run the graphical indexer from the Report Wizard or by choosing the sample data option on the Indexer Information tab of the application. After you open an input file in the graphical indexer, you define triggers, fields, and indexes. The PDF indexer uses the triggers, fields, and indexes to locate the beginning of each document in the input data and to extract index values from that data. After you define the triggers, fields, and indexes, you save them in the application so that Content Manager OnDemand can use them to process the input files that you later load into the system.

You define a trigger, field, or index by drawing a box around a text string with the mouse and then specifying its properties. For example, to define a trigger that identifies the beginning of a document, you could draw a box around the text string Account Number on the first page of a statement in the input file. Then, in the Add a Trigger dialog box, you would accept the default values provided, such as the location of the text string on the page. When processing an input file, the PDF indexer looks for the specified string in the specified location. When it finds a match, the PDF indexer has found the beginning of a document. The fields and indexes are based on the location of the trigger.
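The matching rule described above can be modeled in a few lines. This is a simplified Python sketch, not the PDF indexer's implementation or its parameter syntax: it assumes that each page has already been reduced to text items with upper-left coordinates in inches, and the names `TextItem`, `Box`, `Trigger`, and `split_into_documents` are hypothetical. In this model a page starts a new document only when the trigger string appears inside the trigger's box, so the same string printed elsewhere on a page does not count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextItem:
    """A run of text on a page and the position of its upper-left corner, in inches."""
    text: str
    x: float
    y: float


@dataclass(frozen=True)
class Box:
    """The rectangle drawn with the mouse: upper-left (x1, y1) to lower-right (x2, y2)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def contains(self, item: TextItem) -> bool:
        return self.x1 <= item.x <= self.x2 and self.y1 <= item.y <= self.y2


@dataclass(frozen=True)
class Trigger:
    value: str  # the text string to look for, for example "Account Number"
    box: Box    # where on the page the string must appear


def matches(trigger: Trigger, page: list[TextItem]) -> bool:
    """True if the trigger string appears inside the trigger box on this page."""
    return any(item.text == trigger.value and trigger.box.contains(item) for item in page)


def split_into_documents(pages: list[list[TextItem]], trigger: Trigger) -> list[list[int]]:
    """Group 1-based page numbers into documents; each trigger match starts a new document."""
    documents: list[list[int]] = []
    for number, page in enumerate(pages, start=1):
        if matches(trigger, page):
            documents.append([number])
        elif documents:
            documents[-1].append(number)
        # Pages before the first match belong to no document in this sketch.
    return documents


PAGES = [
    [TextItem("Account Number", 1.0, 1.0), TextItem("000123", 2.5, 1.0)],
    [TextItem("Transactions continued", 1.0, 1.0)],
    [TextItem("Account Number", 1.0, 1.0), TextItem("000456", 2.5, 1.0)],
    [TextItem("Account Number", 1.0, 9.0)],  # right string, wrong place: not a new document
]
TRIGGER = Trigger("Account Number", Box(0.5, 0.5, 2.0, 1.5))

documents = split_into_documents(PAGES, TRIGGER)
print(documents)  # [[1, 2], [3, 4]]
```

Because location is part of the match in this model, page 4 is treated as a continuation of the second document even though it contains the string Account Number.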

The PDF file that you open with the graphical indexer should contain a representative sample of the input data that you plan to load into the system. At a minimum, the sample input file must contain one document; a good sample contains several documents, so that you can verify the location of the triggers, fields, and indexes on more than one document. The sample input file must contain the information that you need to identify the beginning of a document in the input file, and it should also contain the information that you need to define the indexes. When you load an input file into the system, the PDF indexer uses the indexing information that you create to locate each document in the input file and extract its index values.

The following example describes how to use the graphical indexer from the Report Wizard to create indexing information for an input file. The indexing information consists of a trigger that uniquely identifies the beginning of a document in the input file, and the fields and indexes for each document.

  1. To begin, start the OnDemand Administrator client.
  2. Log on to a server.
  3. Start the Report Wizard by clicking the Report Wizard button on the toolbar. The Report Wizard opens the Sample Data dialog box.
  4. Click Select Sample Data to open the Open dialog box.
    Restriction: For IBM® i users: The PDF Indexer can process only stream files when running on IBM i. PDF spooled files are not supported.
  5. Type the name or full path name of a file in the space provided or use the Look in or Browse commands to locate a file.
  6. Click Open. The graphical indexer opens the input file in the report window.
  7. Press F1 at any time for assistance with using the graphical indexer.
  8. Define a trigger.
    • Find a text string that uniquely identifies the beginning of a document. For example, Account Number, Invoice Number, Customer Name, and so forth.
    • Using the mouse, draw a box around the text string. Start just outside the upper-left corner of the string. Click and hold mouse button one, and drag the mouse toward the lower-right corner of the string. As you drag the mouse, the graphical indexer draws a box with a dotted line. When the text string is completely inside the box, release the mouse button. The graphical indexer highlights the text string inside the box.
    • Click the Define a Trigger button on the toolbar to open the Add a Trigger dialog box. Verify the attributes of the trigger. For example, the text string that you selected in the report window should be displayed under Value; for Trigger1, the Pages to Search should be set to Every Page. Click Help for assistance with the other options and values that you can specify.
    • Click OK to define the trigger.
    • To verify that the trigger uniquely identifies the beginning of a document, first put the report window in display mode. Then click the Select tool to open the Select dialog box. Under Triggers, select the trigger. The graphical indexer highlights the text string in the current document. Select the trigger again. The graphical indexer should highlight the text string on the first page of the next document. Use the Select dialog box to move forward to the first page of each document and return to the first document in the input file.
    • Put the report window in add mode.
  9. Define a field and an index.
    • Find a text string that can be used to identify the location of the field. The text string should contain a sample index value. For example, if you want to extract account number values from the input file, then find where the account number is printed on the page.
    • Using the mouse, draw a box around the text string. Start just outside the upper-left corner of the string. Click and hold mouse button one, and drag the mouse toward the lower-right corner of the string. As you drag the mouse, the graphical indexer draws a box with a dotted line. When the text string is completely inside the box, release the mouse button. The graphical indexer highlights the text string inside the box.
    • Click the Define a Field button on the toolbar to open the Add a Field dialog box.
    • On the Field Information page, verify the attributes of the index field. For example, the text string that you selected in the report window should be displayed under Reference String; the Trigger should identify the trigger on which the field is based. Click Help for assistance with the options and values that you can specify.
    • On the Database Field Attributes page, verify the attributes of the database field. In the Database Field Name space, enter the name of the application group field into which you want Content Manager OnDemand to store the index value. In the Folder Field Name space, enter the name of the folder field that will appear on the client search screen. Click Help for assistance with the other options and values that you can specify.
    • Click OK to define the field and index.
    • To verify the locations of the fields, first put the report window in display mode. The fields should have a blue box drawn around them. Next, click the Select tool to open the Select dialog box. Under Fields, click Field 1. The graphical indexer highlights the text string in the current document. Select Field 1 again. The graphical indexer should move to the next document and highlight the text string. Use the Select dialog box to move forward to each document and display the field, and then return to the first document in the input file.
    • Put the report window in add mode.
  10. Click the Create Indexer Parameters and Fields Summary toolbar button. Use the Create Indexer Parameters and Fields Summary dialog box to create and view a summary of the indexing parameters and field values.
  11. When you have finished defining all of the triggers, fields, and indexes, close the report window.
  12. Click Yes to save the changes to the indexer parameters.
  13. In the Sample Data dialog box, click Next to continue with the Report Wizard.
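The verification in steps 8 and 9 comes down to checking that every document in the sample yields a trigger and a value for every field. The following Python sketch models that check under simplifying assumptions: the first page of each document is already available as text items with coordinates in inches, and each field is a box positioned relative to the trigger. `TextItem`, `Field`, `find_trigger`, `extract_field`, and `index_documents` are hypothetical names, not part of Content Manager OnDemand, and the sketch does not reproduce how the PDF indexer stores or applies its parameters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TextItem:
    """A run of text on a page and the position of its upper-left corner, in inches."""
    text: str
    x: float
    y: float


@dataclass(frozen=True)
class Field:
    """A field box given as offsets (in inches) from the trigger's position."""
    name: str  # the application group field that receives the value
    dx1: float
    dy1: float
    dx2: float
    dy2: float


def find_trigger(page: list[TextItem], value: str) -> Optional[TextItem]:
    """Return the first item whose text equals the trigger value (location is not checked here)."""
    return next((item for item in page if item.text == value), None)


def extract_field(page: list[TextItem], anchor: TextItem, field: Field) -> Optional[str]:
    """Return the text of the first item inside the field box, positioned relative to the anchor."""
    for item in page:
        if (anchor.x + field.dx1 <= item.x <= anchor.x + field.dx2
                and anchor.y + field.dy1 <= item.y <= anchor.y + field.dy2):
            return item.text
    return None


def index_documents(first_pages, trigger_value, fields):
    """Build one index row per document and record every document that is missing a value."""
    rows, problems = [], []
    for doc_number, page in enumerate(first_pages, start=1):
        anchor = find_trigger(page, trigger_value)
        if anchor is None:
            problems.append(f"document {doc_number}: trigger not found")
            continue
        row = {}
        for field in fields:
            value = extract_field(page, anchor, field)
            if value is None:
                problems.append(f"document {doc_number}: no value for {field.name}")
            row[field.name] = value
        rows.append(row)
    return rows, problems


FIRST_PAGES = [
    [TextItem("Account Number", 1.0, 1.0), TextItem("000123", 2.5, 1.0)],
    [TextItem("Account Number", 1.0, 1.2), TextItem("000456", 2.5, 1.2)],
    [TextItem("Account Number", 1.0, 1.0)],  # no account number value printed on this page
]
# Hypothetical field box to the right of the trigger string.
ACCOUNT = Field("account", 1.0, -0.25, 3.0, 0.25)

rows, problems = index_documents(FIRST_PAGES, "Account Number", [ACCOUNT])
print(rows)
print(problems)
```

Only the third document is reported as missing a value, which a sample containing just the first document would not have revealed. This mirrors the advice to use a sample input file with several documents.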