The default Data Explorer Engine application (the query-meta application script)
processes a query as follows:
User Query: The user enters a query in an HTML form. Submission of this form
executes a Data Explorer Engine CGI script with a list of CGI parameters.
These parameters are passed to the Data Explorer object.
Global Options: Some CGI parameters are used to set global variables and
options, such as the number of documents requested (query.num-total), the
list of sources (query.sources), or the timeout (fetch.timeout).
Input Query Conversion: Using the input form description (see the
render.form variable) of the input query syntax, Data Explorer Engine converts
this list of parameters into a structured, normalized query representation,
consisting of logical operators and fields.
Source Specifications: Each source specified in the input query must be
defined somewhere (this is almost always in the System Configuration). Also
required are the forms and parsers associated with those sources. Each search
engine collection that you create is initially associated with a source of the same
name.
Per-Source Query Conversion: Data Explorer Engine converts the structured
query representation back into a list of CGI parameters for each source, trying
each of the forms specified in the source until it finds one that supports the
syntax of the query. If no such form is found for a particular source, that source
cannot be queried and the error is reported to the user. For each successfully converted
query, a series of corresponding URLs (the result pages) is enqueued for retrieval.
Fetching: The enqueued result page URLs are fetched simultaneously.
Parsing: Each result page is parsed according to the parser associated with
its source. Documents are extracted and passed to the Data Explorer object,
and include contents (title, snippet, etc.) and attributes (URL, score, etc.).
Clustering: The extracted documents are indexed and clustered. The resulting tree
is saved in a temporary file (i.e., the
Data Explorer object is serialized into XML).
Display: For each user interaction with the clustered output, Data Explorer Engine
reloads the temporary file (indicated by a CGI parameter) and extracts from it a
browsing state, that is, XML containing only the nodes and documents that need to be
presented to the end user. That XML can be viewed by adding CGI parameters to the
URL.
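
As a rough illustration of the Global Options and Input Query Conversion steps, the Python sketch below splits a CGI query string into global options and form fields, then builds a very small structured query. The parameter names query.num-total, query.sources, and fetch.timeout come from the description above. The default values, the "query" field name, and the tuple-based query tree are assumptions made for this example and do not reflect the product's internal representation.

```python
from urllib.parse import parse_qs

# Option names are taken from the text; the default values are hypothetical.
GLOBAL_DEFAULTS = {
    "query.num-total": "200",
    "query.sources": "",
    "fetch.timeout": "10000",
}


def split_parameters(query_string):
    """Separate global options from the remaining form parameters."""
    # parse_qs returns a list per name; keep only the first value of each.
    params = {name: values[0] for name, values in parse_qs(query_string).items()}
    options = dict(GLOBAL_DEFAULTS)
    fields = {}
    for name, value in params.items():
        if name in GLOBAL_DEFAULTS:
            options[name] = value
        else:
            fields[name] = value
    return options, fields


def to_structured_query(fields):
    """Build an AND node holding one ("term", field, word) tuple per word."""
    terms = []
    for field, text in sorted(fields.items()):
        for word in text.split():
            terms.append(("term", field, word))
    return ("and", terms)


options, fields = split_parameters(
    "query=data+explorer&query.num-total=50&query.sources=web+news"
)
structured = to_structured_query(fields)
```

Here options holds the global settings (with the hypothetical default kept for fetch.timeout), while structured is the normalized form that the later steps would convert for each source.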
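
The form fallback described under Per-Source Query Conversion can be sketched as follows. Each source lists its forms in order, the first form that supports the query's operator is used, and a source with no suitable form is reported as an error. The source names, form URLs, the "operators" capability set, and the "q" parameter are all hypothetical, and the sketch enqueues a single URL per source rather than a series of result pages.

```python
from urllib.parse import urlencode

# Hypothetical source definitions: each form declares the operators it supports.
SOURCES = {
    "web": [
        {"url": "http://example.com/simple", "operators": {"and"}},
        {"url": "http://example.com/advanced", "operators": {"and", "or"}},
    ],
    "news": [
        {"url": "http://example.org/search", "operators": {"and"}},
    ],
}


def convert_for_source(forms, operator, words):
    """Return a result-page URL from the first form supporting the operator."""
    for form in forms:
        if operator in form["operators"]:
            joiner = " OR " if operator == "or" else " "
            return form["url"] + "?" + urlencode({"q": joiner.join(words)})
    return None  # no form of this source supports the query syntax


def enqueue(source_names, operator, words):
    """Convert the query for each source; collect URLs and failing sources."""
    queue, errors = [], []
    for name in source_names:
        url = convert_for_source(SOURCES[name], operator, words)
        if url is None:
            errors.append(name)
        else:
            queue.append(url)
    return queue, errors


queue, errors = enqueue(["web", "news"], "or", ["data", "explorer"])
```

With an OR query, the "web" source falls through to its second form, while "news" has no form that supports OR and ends up in errors.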
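
For the Fetching and Parsing steps, one possible sketch uses a thread pool to retrieve the enqueued URLs concurrently and a small parser that turns a result page into documents with contents and attributes. The fetch function is passed in so the example runs without network access. The pipe-separated page format is invented for this illustration, since real parsers are defined per source.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=8):
    """Fetch every URL concurrently; map each URL to its page or its exception."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {url: pool.submit(fetch, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:  # keep the failure so it can be reported
                results[url] = exc
    return results


def parse_page(page):
    """Hypothetical parser: one 'url|score|title|snippet' record per line."""
    documents = []
    for line in page.splitlines():
        if not line.strip():
            continue
        url, score, title, snippet = line.split("|", 3)
        documents.append({
            "contents": {"title": title, "snippet": snippet},
            "attributes": {"url": url, "score": float(score)},
        })
    return documents


# Stand-in for real HTTP retrieval, so the sketch is self-contained.
FAKE_PAGES = {
    "u1": "http://a|0.9|Title A|Snippet A\n",
    "u2": "http://b|0.5|Title B|Snippet B",
}


def fake_fetch(url):
    return FAKE_PAGES[url]


pages = fetch_all(["u1", "u2"], fake_fetch)
documents = [doc for url in ("u1", "u2") for doc in parse_page(pages[url])]
```

A production fetcher would also honor the fetch.timeout option, which this sketch leaves out.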
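
The Clustering and Display steps can be pictured with the sketch below, which serializes a small cluster tree to XML and later extracts a browsing state containing only what needs to be shown. The element and attribute names (tree, node, document, state, id, label, url) are assumptions for illustration and are not the product's actual XML schema. An in-memory string stands in for the saved file.

```python
import xml.etree.ElementTree as ET


def build_tree():
    """Build a tiny hypothetical cluster tree with two nodes."""
    root = ET.Element("tree")
    first = ET.SubElement(root, "node", id="1", label="databases")
    ET.SubElement(first, "document", url="http://a")
    ET.SubElement(first, "document", url="http://b")
    second = ET.SubElement(root, "node", id="2", label="search")
    ET.SubElement(second, "document", url="http://c")
    return root


def browsing_state(xml_text, open_node_id):
    """Keep every top-level node, but only the documents of the open node."""
    root = ET.fromstring(xml_text)
    state = ET.Element("state")
    for node in root.findall("node"):
        copy = ET.SubElement(state, "node", dict(node.attrib))
        if node.get("id") == open_node_id:
            for doc in node.findall("document"):
                ET.SubElement(copy, "document", dict(doc.attrib))
    return state


# Serialize once after clustering; reload on each user interaction.
saved = ET.tostring(build_tree(), encoding="unicode")
state = browsing_state(saved, "1")
```

Each interaction re-reads the serialized tree and returns a much smaller document, which is the idea behind presenting only the nodes and documents the end user currently needs.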