An example of a document model
A document model consists of text field definitions and attribute definitions.
You must define one document model for each document format that you intend to index. Here is a simple document model for plain-text structured documents. Note that GPP in the example stands for General Purpose Parser.
<?xml version="1.0"?>
<GPPModel> - the GPP document model begins here
<GPPFieldDefinition - a field definition begins here
name="Head" - the name you assign to this field
start="[head]" - the boundary string at the beginning of the field
end="[/head]" - the boundary string at the end of the field
exclude="YES" />
<GPPFieldDefinition - the next field definition begins here
name="Abstract"
start="[abstract]"
end="[/abstract]"
exclude="NO" />
:
:
</GPPModel>
Document models are specified in XML, using the tags that are
defined in Document model reference.
The previous example illustrates only text field definitions, which
are defined in GPPFieldDefinition elements. In a similar way,
you can use GPPAttributeDefinition elements to define document
attributes.
The first line of the example, <?xml version="1.0"?>, is the XML
declaration. It specifies that the document model is written in XML.
Each text field definition specifies boundary strings that identify
the start and end of the field
in the source document. So, whenever a document contains the sequence
of characters [head] followed by some text and the
sequence of characters [/head], the text between
those boundary strings is taken to be the content of the text field
that is identified by the name Head.
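The boundary-string matching described above can be illustrated with a short Python sketch. This is not the General Purpose Parser itself, only an approximation of the behavior for the two fields in the example. The names FIELD_DEFINITIONS and extract_fields, and the sample text, are hypothetical and chosen for illustration. The exclude attribute is left out because its effect is not described here.

```python
import re

# Hypothetical stand-in for the example document model:
# one (name, start boundary, end boundary) tuple per field definition.
FIELD_DEFINITIONS = [
    ("Head", "[head]", "[/head]"),
    ("Abstract", "[abstract]", "[/abstract]"),
]


def extract_fields(text, definitions=FIELD_DEFINITIONS):
    """Return {field name: [content, ...]} for each start/end pair found."""
    fields = {}
    for name, start, end in definitions:
        # Escape the boundary strings so that "[" and "]" match literally,
        # then capture the shortest run of text between them.
        pattern = re.escape(start) + r"(.*?)" + re.escape(end)
        matches = re.findall(pattern, text, flags=re.DOTALL)
        if matches:
            fields[name] = [m.strip() for m in matches]
    return fields


sample = "[head]Annual report[/head]\n[abstract]Sales rose in 2003.[/abstract]"
result = extract_fields(sample)
print(result)
```

In this sketch, the text between [head] and [/head] is stored under the field name Head, which mirrors how the field name ties the extracted content to the definition in the model.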
You assign a field name to each field definition. A query uses this field name to restrict a search to the content of a text field, by means of a SECTION clause in the CONTAINS function. The name of the field can either be fixed or be derived by a rule from the content of the structural unit. Such a name could be, for example, the tag name of an XML element, or the name of an XML attribute.