Expanded Discussion on Metadata and Use

This section discusses metadata in more depth than the learning material presented in the tutorial. It covers technical details, a broader scope, and optional methods.

Traditional examples of metadata include file permissions and user and group ownership. For documents, web pages, and other online information, metadata more commonly refers to information such as the intended audience for a document, its author, and its creation and modification dates. HTML web pages support several types of fixed metadata through the <META> element, whose name attribute identifies values such as keywords, author, copyright, and rating. These types of traditional metadata are either associated with a specific online resource (file permissions, file ownership, and so on) or are manually inserted into online information when it is created (using <META> tags).
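As an illustration of the fixed metadata described above, the following is a minimal sketch in Python (not part of Watson Explorer Engine) that reads name/content pairs from <META> tags using the standard library html.parser module. The class name MetaTagParser, the function extract_meta, and the sample page are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect name/content pairs from <META> elements (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.metadata[name.lower()] = content


def extract_meta(html_text):
    """Return a dict of the metadata declared in <META> tags."""
    parser = MetaTagParser()
    parser.feed(html_text)
    parser.close()
    return parser.metadata


# Hypothetical sample page used only for this sketch.
SAMPLE_PAGE = """<html><head>
<META NAME="author" CONTENT="Jane Doe">
<META NAME="keywords" CONTENT="metadata, search, indexing">
<title>Sample</title>
</head><body><p>Hello</p></body></html>"""

page_metadata = extract_meta(SAMPLE_PAGE)
print(page_metadata)
```

The sketch only sees metadata that an author inserted by hand, which is the limitation of fixed metadata that the rest of this section addresses.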

Search engines, such as the one included in Watson™ Explorer Engine, preprocess the information in various online resources so that they can return results for your queries more quickly. They do this by identifying documents and other online resources, exploring those documents (known as crawling), and returning their content for indexing. When you submit a query, the search engine can provide faster results by consulting the index rather than retrieving the information again from the online resource that you are searching.
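The reason an index speeds up queries can be seen in a simplified sketch. The Python example below builds a basic inverted index (a mapping from each term to the documents that contain it) and answers a query by consulting that mapping instead of rereading the documents. This is a generic illustration, not the indexing method used by Watson Explorer Engine; the function names and the CRAWLED sample data are assumptions made for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical crawled content, keyed by document ID.
CRAWLED = {
    "doc1": "Metadata describes a document.",
    "doc2": "Search engines crawl and index documents.",
    "doc3": "An index makes search faster.",
}

index = build_index(CRAWLED)
print(sorted(search(index, "search index")))
```

The documents are read once, when the index is built. Each later query touches only the index entries for its terms.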

Metadata in online information such as web pages is designed to be extracted when that information is crawled and indexed by a search engine. Beyond simply using the fixed metadata provided by <META> tags, a Watson Explorer Engine application can extract, create, and add metadata during the crawling process and use it during indexing without requiring changes to the original document. This can provide a substantially richer pool of information against which queries can be submitted. In Watson Explorer Engine applications, this metadata can also be used to refine queries. You can sort search results in various ways, select subsets of those results (known as filtering), and use familiar links and graphical selectors to view the search results that match certain criteria or ranges of metadata values (known as binning).
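The three refinements named above (sorting, filtering, and binning) can be sketched generically in Python. The example below operates on a small list of search results that already carry metadata. It does not use any Watson Explorer Engine API; the field names (author, year), the helper functions, and the RESULTS data are hypothetical.

```python
from collections import defaultdict

# Hypothetical search results, each carrying extracted metadata.
RESULTS = [
    {"title": "A", "author": "Smith", "year": 1998},
    {"title": "B", "author": "Jones", "year": 2004},
    {"title": "C", "author": "Smith", "year": 2011},
    {"title": "D", "author": "Lee", "year": 2007},
]


def sort_results(results, field, descending=False):
    """Sorting: order results by the value of one metadata field."""
    return sorted(results, key=lambda r: r[field], reverse=descending)


def filter_results(results, field, value):
    """Filtering: keep only results whose metadata field equals value."""
    return [r for r in results if r.get(field) == value]


def bin_by_range(results, field, width):
    """Binning: group result titles into fixed-width ranges of a numeric field."""
    bins = defaultdict(list)
    for r in results:
        lower = (r[field] // width) * width
        bins[(lower, lower + width - 1)].append(r["title"])
    return dict(bins)


newest_first = [r["title"] for r in sort_results(RESULTS, "year", descending=True)]
by_smith = [r["title"] for r in filter_results(RESULTS, "author", "Smith")]
by_decade = bin_by_range(RESULTS, "year", 10)
print(newest_first, by_smith, by_decade)
```

Each operation depends only on metadata values, so metadata created during crawling is as usable for refinement as metadata that the author supplied.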

This tutorial showed you how to identify and extract data, and how to create metadata, from the material that you index when crawling a set of files. The same methodology applies if the content comes from another source. For example, the files in this tutorial can easily be converted into a website (see Exporting Sample Files From Your Web Server for details), and the same results can be obtained.

Note: Some converters can be configured to automatically extract metadata content if the incoming data is of an expected type. See the tooltips on the specific converter to identify which content can be automatically extracted. However, the method of metadata extraction described in this tutorial (a custom XSL converter) can be used to extract metadata from any content.