Date fields and custom date formats

The document date is critical for text analytics, especially for exploring how data changes over time and observing deviations and trends.

To preserve how dates are calculated, Watson Explorer Content Analytics includes a parametric index field named Date that you cannot edit or remove. When you configure the parser for a collection, however, you can specify custom date formats to ensure that date data added to the collection is mapped to this index field and indexed correctly.

Date data can be configured for a collection in several ways:
  • If you configure a crawler, you can map data source fields and metadata fields to the Date index field.
  • If you import CSV files to a collection, you can specify the format of date values.
  • If you add documents to a collection by using the REST administration API, the API can identify date fields.
  • If you map HTML and XML elements to index fields, you can map elements to the Date index field.
  • If you configure facets for a collection, you can map the Date index field to facets.
  • If you associate a UIMA annotator with a collection, the annotator can produce date values for the date facet.

Date facet

The Date index field is used as the document date in the query results. The value of this field is converted into a date facet in content analytics collections so that it can be used to compare time lines, deviations, and trends. In search collections, users can use the date facet to narrow results.

The date facet consists of the following path components: date, year, month, day, hour. The levels of the path components cannot be changed.

When the parser detects a date value, it converts the value into epoch time (the number of milliseconds since January 1, 1970, 00:00:00 GMT), such as 1235487600000. The characters in the string are handled as the number of milliseconds since the epoch date.
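As an illustration of epoch time, the following minimal Java sketch interprets the sample value 1235487600000 as milliseconds since the epoch and splits it into year, month, day, and hour, mirroring the path components of the date facet. The class and method names are hypothetical, and evaluating the components in UTC is an assumption; the time zone that the product applies is not stated here.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

final class EpochTimeSketch {

    /** Interprets a numeric string as milliseconds since 1970-01-01T00:00:00 GMT. */
    static Instant fromEpochString(String value) {
        return Instant.ofEpochMilli(Long.parseLong(value.trim()));
    }

    /** Returns {year, month, day, hour}, evaluated in UTC (an assumption of this sketch). */
    static int[] components(Instant instant) {
        ZonedDateTime utc = instant.atZone(ZoneOffset.UTC);
        return new int[] {utc.getYear(), utc.getMonthValue(), utc.getDayOfMonth(), utc.getHour()};
    }

    public static void main(String[] args) {
        Instant instant = fromEpochString("1235487600000");
        System.out.println(instant); // prints 2009-02-24T15:00:00Z
        int[] c = components(instant);
        System.out.printf("year=%d month=%d day=%d hour=%d%n", c[0], c[1], c[2], c[3]);
    }
}
```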

In addition to the predefined Date index field, you can configure other fields to be used as date fields. In this case, you must specify that the data source field or metadata field is a parametric index field and you must specify that the field contains date data. When the parser detects parametric date fields, the field value is converted to epoch time.

Date formats detected by default

The parser can automatically detect the following date and time formats, in the order specified in this table. In addition to these formats, you can configure the parser to recognize custom date formats for the content that you include in a collection.

Table 1. Automatically detected date formats
Date format | Sample value
RFC 1123 | Sun, 06 Nov 1994 08:49:37 GMT
RFC 850 | Sunday, 06-Nov-94 08:49:37 GMT
asctime | Sun Nov 6 08:49:37 1994
ISO 8601 (calendar date only) | 2004-02-05
RFC 1123 without time zone | Sun, 06 Nov 1994 08:49:37
RFC 850 without time zone | Sunday, 06-Nov-94 08:49:37
Date and time formats for the collection's default locale | (varies by locale)

For ISO 8601, only the calendar date is supported. Unsupported representations are:
  • Representations with reduced precision
  • Truncated representations
  • Expanded representations
  • Representations of decimal fractions
  • Representations with a zone designator

The date and time formats for the collection's default locale are obtained through the following methods of the Java DateFormat class:
  • DateFormat.getDateTimeInstance()
  • DateFormat.getDateInstance(DateFormat.FULL)
  • DateFormat.getDateInstance(DateFormat.LONG)
  • DateFormat.getDateInstance(DateFormat.MEDIUM)
  • DateFormat.getDateInstance(DateFormat.SHORT)
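
To make the first three formats concrete, the following sketch parses their sample values with the Java SimpleDateFormat class. The pattern strings are assumed equivalents of the named formats, not patterns taken from the product, and the class and method names are hypothetical. All three samples describe the same instant, so they yield the same epoch time.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

final class DefaultFormatSketch {

    // Assumed SimpleDateFormat equivalents of the table rows (not taken from the product).
    static final String RFC_1123 = "EEE, dd MMM yyyy HH:mm:ss zzz";
    static final String RFC_850 = "EEEE, dd-MMM-yy HH:mm:ss zzz";
    static final String ASCTIME = "EEE MMM d HH:mm:ss yyyy";

    /** Parses text with English day and month names; GMT applies when the text names no zone. */
    static long toEpochMillis(String pattern, String text) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
        format.setTimeZone(TimeZone.getTimeZone("GMT"));
        try {
            return format.parse(text).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + text, e);
        }
    }

    public static void main(String[] args) {
        // Each call prints 784111777000.
        System.out.println(toEpochMillis(RFC_1123, "Sun, 06 Nov 1994 08:49:37 GMT"));
        // The two-digit year is resolved relative to the current date (SimpleDateFormat default).
        System.out.println(toEpochMillis(RFC_850, "Sunday, 06-Nov-94 08:49:37 GMT"));
        System.out.println(toEpochMillis(ASCTIME, "Sun Nov 6 08:49:37 1994"));
    }
}
```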
 

Custom date formats

When you configure parse and index options for a collection, you can specify custom date formats to ensure that date data that you include in the collection is indexed correctly. To parse a date value, the parser tries your custom date formats in the order that you specify and then tries the default date formats. The result of the first format that parses the value successfully is used as the date.
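
The effect of the order can be sketched in Java as follows. This is an illustration of first-match-wins parsing, not the parser's actual implementation: the class and method names are hypothetical, and strict (non-lenient) parsing that must consume the whole value is an assumption of the sketch.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.OptionalLong;
import java.util.TimeZone;

final class FormatOrderSketch {

    /** Tries the custom patterns in order, then the default patterns; the first match wins. */
    static OptionalLong parse(List<String> custom, List<String> defaults, String text) {
        List<String> ordered = new ArrayList<>(custom);
        ordered.addAll(defaults);
        for (String pattern : ordered) {
            SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
            format.setTimeZone(TimeZone.getTimeZone("GMT"));
            format.setLenient(false); // reject impossible values such as month 25
            ParsePosition position = new ParsePosition(0);
            Date date = format.parse(text, position);
            // Accept the result only if the pattern consumed the whole value.
            if (date != null && position.getIndex() == text.length()) {
                return OptionalLong.of(date.getTime());
            }
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) {
        List<String> defaults = Arrays.asList("yyyy-MM-dd");
        List<String> dayFirst = Arrays.asList("dd/MM/yyyy", "MM/dd/yyyy");
        List<String> monthFirst = Arrays.asList("MM/dd/yyyy", "dd/MM/yyyy");
        // The same value maps to different dates depending on the order of the custom formats.
        System.out.println(new Date(parse(dayFirst, defaults, "05/02/2004").getAsLong()));
        System.out.println(new Date(parse(monthFirst, defaults, "05/02/2004").getAsLong()));
        // No custom format matches, so the default format is used.
        System.out.println(new Date(parse(dayFirst, defaults, "2004-02-05").getAsLong()));
    }
}
```

With the day-first list, 05/02/2004 is read as 5 February 2004; with the month-first list, the same value is read as 2 May 2004.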

When you configure custom date formats, you specify:
  • The format string, such as EEE, d MMM yyyy HH:mm:ss Z (for example, Wed, 4 Jul 2001 12:08:56 -0700). The string can be in any format supported by the Java SimpleDateFormat class.
  • The locale and time zone for the date. The collection locale and time zone are selected by default.
  • The order in which your custom date formats are to be applied. After you add a new custom format, you can move it to first, last, or any position in the list.
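
The interaction of the format string, locale, and time zone can be tried outside the product with the Java SimpleDateFormat class. In the following sketch, the class and method names are hypothetical and the second format string and the time zone IDs are chosen only for illustration. It shows that an offset in the value (the Z pattern letter) takes precedence over the configured time zone, whereas a value without a zone is interpreted in the configured time zone.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.util.Locale;
import java.util.TimeZone;

final class CustomFormatSketch {

    /** Parses text by using a format string, a locale, and a time zone ID. */
    static Instant parse(String pattern, Locale locale, String zoneId, String text) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, locale);
        format.setTimeZone(TimeZone.getTimeZone(zoneId));
        try {
            return format.parse(text).toInstant();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + text, e);
        }
    }

    public static void main(String[] args) {
        // The -0700 offset in the value determines the instant, regardless of the GMT setting.
        System.out.println(parse("EEE, d MMM yyyy HH:mm:ss Z", Locale.US, "GMT",
                "Wed, 4 Jul 2001 12:08:56 -0700")); // prints 2001-07-04T19:08:56Z

        // Without a zone in the value, the configured time zone decides the instant.
        System.out.println(parse("yyyy-MM-dd HH:mm", Locale.US, "America/Los_Angeles",
                "2001-07-04 12:08")); // prints 2001-07-04T19:08:00Z
        System.out.println(parse("yyyy-MM-dd HH:mm", Locale.US, "GMT",
                "2001-07-04 12:08")); // prints 2001-07-04T12:08:00Z
    }
}
```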

Your custom date formats apply to all date content that is added or configured for a collection. For your changes to become effective, you must restart the parser. To apply the changes to documents in the index, either rebuild the index or, if the collection uses a document cache, rebuild the index from the cache.

Displaying dates in the query results

You can use any of the following methods to control how date fields are displayed in the query results:
  • Edit the properties file for the application, such as the config.properties file for the enterprise search application. In the date.fields property, specify a space-separated list of the fields that are to be formatted like date data in the query results. The format of the displayed date matches the locale settings in the Web browser.
  • Run the application customizer, expand the Results tab, and include the names of fields that are to be formatted as dates in the Date fields field. The format of the displayed date matches the locale settings in the Web browser.
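
As a rough illustration of the date.fields property, the following Java sketch reads a properties entry and splits its space-separated value into field names. The field names modifieddate and createddate are hypothetical placeholders, as are the class and method names; how the application itself reads the property is not described here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

final class DateFieldsSketch {

    /** Returns the field names listed in the date.fields property, or an empty list. */
    static List<String> dateFields(String propertiesText) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = properties.getProperty("date.fields", "").trim();
        if (value.isEmpty()) {
            return Collections.emptyList();
        }
        // The property holds a space-separated list of field names.
        return Arrays.asList(value.split("\\s+"));
    }

    public static void main(String[] args) {
        // Hypothetical excerpt from config.properties.
        String excerpt = "date.fields=date modifieddate createddate\n";
        System.out.println(dateFields(excerpt)); // prints [date, modifieddate, createddate]
    }
}
```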