Exploring the Sample Thesaurus

The sample XML thesaurus that you defined as the seed for the ontolection-francis collection uses a generic XML format that associates alternate terms and phrases with a primary term or phrase. The element names used in the file identify the type of relationship that exists between those terms.

The following listing shows the francis-thesaurus.xml file that you defined as the seed in the previous section:

<?xml version="1.0" encoding="utf-8" ?>
<thesaurus name="automotive" language="english" domain="example">
  <word name="car">
    <synonym>auto</synonym>
    <synonym>truck</synonym>
    <synonym>van</synonym>
  </word>
  <word name="racing">
    <related>races</related>
    <narrower>speedway</narrower>
  </word>
  <word name="horse">
    <related>racing</related>
    <related>track</related>
  </word>
  <word name="detective">
    <synonym>investigator</synonym>
    <related>Tommy Flat</related>
    <related>Ty Roberts</related>
  </word>
</thesaurus>

The file consists of a <thesaurus> element that contains multiple <word> elements. The name attribute of each <word> element identifies the primary term with which other terms are associated. Each associated term or phrase is enclosed in a child element of that <word> element, and the name of the child element identifies the relationship between the enclosed term or phrase and the primary term.
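
The structure described above can be read with any XML parser. The following is a minimal Python sketch, not part of the product, that uses the standard library to list each primary term, relation, and associated term. The function name read_relations and the abbreviated SAMPLE string are illustrative assumptions:

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the sample thesaurus (illustrative only).
SAMPLE = """<thesaurus name="automotive" language="english" domain="example">
  <word name="car">
    <synonym>auto</synonym>
    <synonym>truck</synonym>
    <synonym>van</synonym>
  </word>
  <word name="racing">
    <related>races</related>
    <narrower>speedway</narrower>
  </word>
</thesaurus>"""


def read_relations(xml_text):
    """Return (primary term, relation, associated term) triples."""
    root = ET.fromstring(xml_text)
    triples = []
    for word in root.findall("word"):
        for child in word:
            # The child element's name is the relation type; its text is
            # the associated term or phrase.
            associated = (child.text or "").strip()
            triples.append((word.get("name"), child.tag, associated))
    return triples


for term, relation, associated in read_relations(SAMPLE):
    print(f"{term} --{relation}--> {associated}")
```

Because the relation is carried by the element name rather than by an attribute, the sketch never needs a fixed list of relation types.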

In this XML thesaurus format, the elements that can appear within a <word> element include the following:

  • <acronym-synonym>: Encloses a term or phrase that is a possible expansion of an acronym identified in the <word> element's name attribute. The acronym-synonym element is used because each expansion of an acronym is effectively a synonym for that acronym. The following is a simple example:
    <word name="DNS">
      <acronym-synonym>Domain Name Service</acronym-synonym>
      <acronym-synonym>Domain Name System</acronym-synonym>
      <acronym-synonym>Dragon Naturally Speaking</acronym-synonym>
      <acronym-synonym>Distributed Name Service</acronym-synonym>
    </word>
  • <broader>: Encloses a term or phrase that has broader implications in the same subject matter domain as the term identified in the <word> element's name attribute. The following is a simple example:
    <word name="convertible">
      <broader>car</broader>
    </word>
  • <language>: Specifies an equivalent term for the parent word in another language, where language is a placeholder for the name of any language, such as french, german, or spanish. The following is a simple example:
    <word name="car">
      <french>voiture</french>
      <german>Auto</german>
      <spanish>carro</spanish>
      <spanish>coche</spanish>
      <spanish>automóvil</spanish>
    </word>

    The <language> element is a more specific form of the <translation> element that enables you to activate automatic suggestions for alternate terms in a specific language.

  • <narrower>: Encloses a term or phrase that has more specific implications in the same subject matter domain as the term identified in the <word> element's name attribute. The following is a simple example:
    <word name="car">
      <narrower>convertible</narrower>
      <narrower>dragster</narrower>
      <narrower>station wagon</narrower>
      <narrower>SUV</narrower>
    </word>
  • <related>: Encloses a term or phrase that is related to, but not interchangeable with, the term identified in the <word> element's name attribute. The following is a simple example:
    <word name="car">
      <related>mileage</related>
      <related>speed</related>
      <related>traffic</related>
    </word>
  • <rewrite>: Encloses a term or phrase that is essentially a correction for the term identified in the <word> element's name attribute. This element is often used to identify the correct version of a term or phrase that is commonly misspelled. The following is a simple example:
    <word name="velcoity">
      <rewrite>velocity</rewrite>
    </word>
  • <spelling>: Encloses a term or phrase that is an alternate spelling for the term identified in the <word> element's name attribute. This element is commonly used for language variants, such as the US/UK spelling variations file that was used in Adding Conceptual Search to a Search Application. The following is a simple example:
    <word name="flavor">
      <spelling>flavour</spelling>
    </word>

    Support for <spelling> keywords in conceptual search is not related to the Watson™ Explorer Engine spelling corrector.

  • <synonym>: Specifies an equivalent or alternate term that has the same meaning as, and is interchangeable with, the term identified in the <word> element's name attribute. The following is a simple example:
    <word name="car">
      <synonym>auto</synonym>
      <synonym>automobile</synonym>
      <synonym>motorcar</synonym>
    </word>
  • <translation>: Specifies an equivalent term for the parent word in another language. Like the <language> element, the <translation> element offers alternative terms in other languages, but it is more generic: you do not have to specify multiple <language> elements in your automatic or suggested query expansion preferences. (See Modifying Default Expansions and Suggestions for more information about specifying default query expansions for conceptual search.) The following is a simple example of using the <translation> element:
    <word name="car">
      <translation>voiture</translation>
      <translation>Auto</translation>
      <translation>carro</translation>
      <translation>coche</translation>
      <translation>automóvil</translation>
    </word>
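
The trade-off between language-named elements and the generic <translation> element can be sketched in a few lines of Python. This is only an illustration of selecting terms by element name, not how the product implements query expansion. The function name alternates and the two abbreviated entries are assumptions made for this sketch:

```python
import xml.etree.ElementTree as ET

# The same term tagged per language (illustrative, abbreviated).
PER_LANGUAGE = """<word name="car">
  <french>voiture</french>
  <german>Auto</german>
  <spanish>coche</spanish>
</word>"""

# The same term tagged with the generic <translation> element.
GENERIC = """<word name="car">
  <translation>voiture</translation>
  <translation>Auto</translation>
  <translation>coche</translation>
</word>"""


def alternates(word_xml, relations):
    """Return the terms whose element name is one of the given relations."""
    word = ET.fromstring(word_xml)
    return [child.text for child in word if child.tag in relations]


# One language can be selected on its own when terms are tagged per language.
print(alternates(PER_LANGUAGE, {"french"}))

# Covering every language requires naming each language element...
print(alternates(PER_LANGUAGE, {"french", "german", "spanish"}))

# ...whereas a single relation name covers all of them with <translation>.
print(alternates(GENERIC, {"translation"}))
```

Per-language elements allow suggestions to be limited to one language, while <translation> keeps the list of relations to configure short.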

Multiple elements of different types can appear within a single <word> element, as in the following example:

<word name="car">
  <french>voiture</french>
  <german>Auto</german>
  <narrower>convertible</narrower>
  <narrower>dragster</narrower>
  <narrower>station wagon</narrower>
  <narrower>SUV</narrower>
  <related>mileage</related>
  <related>speed</related>
  <related>traffic</related>
  <spanish>carro</spanish>
  <spanish>coche</spanish>
  <spanish>automóvil</spanish>
  <synonym>auto</synonym>
  <synonym>automobile</synonym>
  <synonym>motorcar</synonym>
</word>
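
To check which relation types a mixed entry such as this one contains, the child elements can be grouped by element name. The following Python sketch assumes an abbreviated version of the entry above; the function name group_by_relation is hypothetical:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Abbreviated version of the mixed entry (illustrative only).
ENTRY = """<word name="car">
  <french>voiture</french>
  <narrower>convertible</narrower>
  <narrower>SUV</narrower>
  <related>mileage</related>
  <synonym>auto</synonym>
  <synonym>automobile</synonym>
</word>"""


def group_by_relation(word_xml):
    """Map each relation (child element name) to its list of terms."""
    word = ET.fromstring(word_xml)
    groups = defaultdict(list)
    for child in word:
        groups[child.tag].append(child.text)
    return dict(groups)


for relation, terms in group_by_relation(ENTRY).items():
    print(f"{relation}: {', '.join(terms)}")
```

A summary like this can help you decide which relations are worth configuring for a collection.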

To finish configuring the ontolection-francis collection, you must configure some of these relations, as explained in the next section. To proceed, click Configuring XPaths for Semantic Relations.