Normalize Query Input

After loading the required project information and extracting the CGI input parameters, the first processing step is to use an input form to normalize the user's query into a single logical structure. An input form defines what kind of search specification a search application is willing to accept and the syntax that the specification must conform to. Normalizing the input gives Watson™ Explorer Engine a standard base from which to build the possibly very different search specifications required by each of the sources it is going to meta-search. It also supports multiple ways for the user to enter a search, for instance via a single search box (with possibly complex syntax) or via an advanced search form with multiple input fields.

The end goal of normalization is to create a single logical structure that represents the values of a set of Watson Explorer Engine fields. The target set of fields depends on the search application, but typically includes a field named query. A Watson Explorer Engine field (represented by a field XML element) should not be confused with the HTML form fields (input as CGI parameters) through which the user has specified the search. The two kinds of field are logically distinct, and input field values may be combined or otherwise transformed in the process of filling Watson Explorer Engine fields.

The mapping from input fields to Watson Explorer Engine fields is specified by an input form (represented by a form XML element). An input form represents a set of input fields, each of which can accept either free-form text or one of a predefined set of strings. Free-form text fields are defined using an input element, and fields with predefined values are defined using a select element. For both input and select elements, the input field is specified by the name attribute of the element and the target Watson Explorer Engine field is specified by the field attribute.

If the value of the target Watson Explorer Engine field has a logical structure (commonly the query field, with a Boolean structure of ANDs, ORs, and NOTs), this logical structure is made explicit as a tree of operator and term XML elements, with the term elements representing the strings that are the leaves of the tree. Multiple input fields are ANDed together by default. For instance, two input fields called q and title, containing the string values big OR large and travel respectively, and represented by

<param name="q" value="big OR large"/>
<param name="title" value="travel"/>

would be mapped using an input form containing two input elements

<form name="sample-form">
  <input name="q" field="query"/>
  <input name="title" field="title"/>
</form>

into

<query form="sample-form">
  <operator name="AND" middle-string="AND" logic="and">
    <term field="title" str="travel" position="1" />
    <operator name="OR" middle-string="OR" logic="or">
      <term field="query" str="big" position="2" />
      <term field="query" str="large" position="3" />
    </operator>
  </operator>
</query>

Note that the query element in this XML fragment has nothing to do with the query field. Watson Explorer Engine uses a query element to contain the entire normalized input form, regardless of the names of the Watson Explorer Engine fields concerned.
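
The mapping described above can be illustrated with a minimal Python sketch. This is not how Watson Explorer Engine implements normalization; it is a simplified model under stated assumptions. The function names parse_value and normalize are hypothetical, the only syntax recognized is the literal keyword OR, and the position attribute and the ordering of child elements are not reproduced.

```python
import xml.etree.ElementTree as ET


def parse_value(field, value):
    """Turn one input string into a term, or an OR operator over terms.

    Assumption: only the literal keyword OR is recognized. Real query
    syntax (AND, NOT, grouping, quoting) is not handled in this sketch.
    """
    parts = [p.strip() for p in value.split(" OR ")]
    terms = [ET.Element("term", {"field": field, "str": p}) for p in parts if p]
    if len(terms) == 1:
        return terms[0]
    op = ET.Element("operator", {"name": "OR", "middle-string": "OR", "logic": "or"})
    op.extend(terms)
    return op


def normalize(form_name, form_inputs, params):
    """Build a query element from CGI-style parameters.

    form_inputs maps an input field name to its target field (the name and
    field attributes of an input element); params maps an input field name
    to the string the user entered.
    """
    query = ET.Element("query", {"form": form_name})
    subtrees = [
        parse_value(target, params[name])
        for name, target in form_inputs.items()
        if params.get(name)  # skip input fields the user left empty
    ]
    if len(subtrees) == 1:
        query.append(subtrees[0])
    elif subtrees:
        # Multiple input fields are ANDed together by default.
        and_op = ET.Element(
            "operator", {"name": "AND", "middle-string": "AND", "logic": "and"}
        )
        and_op.extend(subtrees)
        query.append(and_op)
    return query


form_inputs = {"q": "query", "title": "title"}
params = {"q": "big OR large", "title": "travel"}
tree = normalize("sample-form", form_inputs, params)
print(ET.tostring(tree, encoding="unicode"))
```

Running the sketch on the sample parameters yields a query element whose single child is an AND operator containing an OR operator over the two query terms and a term for the title field, which is the same logical structure as the fragment shown above.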

For the next processing step, see Instantiate Source Forms.