viv:parse-date

Parses a string and extracts a date from it.

Synopsis

string viv:parse-date(date-str, format, timezone, range);
string date-str;
string format;
string timezone;
boolean range;

Arguments

  • date-str: the date string to parse.
  • format: a format string identifying the format of the date supplied as the first argument. If not supplied, the function tries each of the date formats listed below.
  • timezone: a string identifying the time zone to use. This can be either a numeric time zone offset or a time zone abbreviation, as listed below. If not specified, the default value for timezone is 'localtime'.
  • range: a Boolean value that, when true, restricts valid dates to the 32-bit range. The default value is false(), which does not restrict dates to 32-bit values. When the 32-bit check is enabled, a date outside the 32-bit range, like any other invalid date, returns a value of '0'.
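
The effect of the range argument can be sketched in Python. This illustrates the documented behavior and is not the engine's implementation. The helper name apply_range_check is hypothetical, and the sketch assumes that "32-bit" means a signed 32-bit count of seconds, which this page does not state explicitly.

```python
# Hypothetical helper; assumes "32-bit" means a signed 32-bit seconds value.
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1

def apply_range_check(seconds, range_check=False):
    """Return seconds unchanged, or 0 when the 32-bit check is on and fails."""
    if range_check and not (INT32_MIN <= seconds <= INT32_MAX):
        return 0
    return seconds
```

Under that assumption, the largest value that passes the check is 2147483647, which corresponds to 03:14:07 UTC on January 19, 2038.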

Description

Attempts to parse a string and extract a date from it. The value returned by this function is the extracted date, represented as the number of seconds since 00:00:00 on January 1, 1970, Coordinated Universal Time (UTC).
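
To make the return value concrete, the following Python sketch computes the same kind of number for a calendar time that is already expressed in UTC. The helper name utc_epoch_seconds is hypothetical, and the sketch performs no string parsing or time zone handling.

```python
import calendar

def utc_epoch_seconds(year, month, day, hour=0, minute=0, second=0):
    """Seconds since 1970-01-01 00:00:00 UTC for a calendar time given in UTC."""
    return calendar.timegm((year, month, day, hour, minute, second))

# Midnight UTC on November 6, 1994
print(utc_epoch_seconds(1994, 11, 6))  # 784080000
```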

The optional format string specifies how the date should be parsed. The format specifiers are intended to be compatible with PHP or C strftime() format strings. This function supports the following codes:

  • %a or %A - The day-of-the-week name, either abbreviated (e.g. "Mon") or full (e.g. "Monday").
  • %b, %B, or %h - The month name, either abbreviated (e.g. "Jan") or full (e.g. "January").
  • %c - A shortcut for the time and date in %a %b %e %H:%M:%S %Y format, e.g. Sat Jul 7 15:30:45 2007.
  • %C - The century (00-99).
  • %d or %e - The day of the month (1-31), with or without leading zeros.
  • %D - A shortcut for the date in %m/%d/%y form, e.g. 03/19/06.
  • %F - A shortcut for the date in %Y-%m-%d form, e.g. 2004-09-25.
  • %H - The hour in 24-hour format (00-23), as a 2-digit number with leading zeros.
  • %I - The hour in 12-hour format (01-12), as a 2-digit number with leading zeros. Requires %p, the "AM/PM" specifier, to be included in the format string.
  • %j - The day number in the year (1-366), with or without leading zeros.
  • %k - The hour in 24-hour format (0-23), as a 1- or 2-digit number without leading zeros.
  • %l - The hour in 12-hour format (1-12), as a 1- or 2-digit number without leading zeros. Requires %p, the "AM/PM" specifier, to be included in the format string.
  • %m - The month number (1-12), with or without leading zeros.
  • %M - The minute (0-59), with or without leading zeros.
  • %n - The newline character.
  • %p - "AM" or "PM", or "am" or "pm". Required when using 12-hour format.
  • %r - A shortcut for the time in 12-hour format: %I:%M:%S %p, e.g. 10:40:22 PM.
  • %R - A shortcut for the 24-hour time without seconds: %H:%M.
  • %s - The number of seconds since the Unix Epoch (00:00:00 UTC on 1 January 1970).
  • %S - The seconds (0-61), with or without leading zeros.
  • %t - The tab character.
  • %T - A shortcut for the 24-hour time with seconds: %H:%M:%S, e.g. 23:59:59.
  • %u - The weekday number (1-7), with or without leading zeros, where 1 is Monday.
  • %U - The week number in the year (0-53), with or without leading zeros, where Sunday is the first day of the week.
  • %w - The weekday number (0-6), with or without leading zeros, where 0 is Sunday.
  • %W - The week number in the year (0-53), with or without leading zeros, where Monday is the first day of the week.
  • %y - The 2-digit year, with or without leading zeros, where 69-99 refer to 1969-1999 and 00-68 refer to 2000-2068.
  • %Y - The 4-digit year.
  • %Z - The time zone. This may be specified either as a standard abbreviation (listed below), or as a signed offset in hours (and optionally minutes).
  • %% - The % character.
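
The shortcut codes in this list are abbreviations for longer format strings. The Python sketch below expands the fixed shortcuts (%D, %F, %r, %R, %T) as documented above and then parses with Python's own datetime.strptime. It is an approximation for illustration only: the helper names are hypothetical, the input is treated as UTC (whereas viv:parse-date defaults to 'localtime'), and Python's parser is not guaranteed to match the engine in every detail.

```python
import re
from datetime import datetime, timezone

# Shortcut expansions taken from the list of format codes above.
SHORTCUTS = {
    "%D": "%m/%d/%y",
    "%F": "%Y-%m-%d",
    "%r": "%I:%M:%S %p",
    "%R": "%H:%M",
    "%T": "%H:%M:%S",
}

def expand_shortcuts(fmt):
    """Replace shortcut codes; scanning pair by pair leaves '%%' untouched."""
    return re.sub(r"%(.)", lambda m: SHORTCUTS.get(m.group(0), m.group(0)), fmt)

def parse_utc(date_str, fmt):
    """Parse date_str with fmt and return epoch seconds, assuming UTC input."""
    parsed = datetime.strptime(date_str, expand_shortcuts(fmt))
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())

print(expand_shortcuts("%F %T"))          # %Y-%m-%d %H:%M:%S
print(parse_utc("2004-09-25", "%F"))      # 1096070400
print(parse_utc("03/19/06", "%D"))        # 1142726400
```

The last call also exercises the %y rule described above: the two-digit year 06 is read as 2006.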

In the format string, the space character matches any number of whitespace characters.

Numeric time zone offsets may be in formats such as +0200, -03:30, +12, or -5. Supported time zone abbreviations are: GMT, UT, UTC, Z, WET, WEST, BST, ART, BRT, BRST, NST, NDT, AST, ADT, CLT, CLST, EST, EDT, CST, CDT, MST, MDT, PST, PDT, AKST, AKDT, HST, HAST, HADT, SST, WAT, CET, CEST, MET, MEZ, MEST, MESZ, EET, EEST, CAT, SAST, EAT, MSK, MSD, IST, SGT, KST, JST, GST, NZST, and NZDT.
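
As an illustration of the numeric offset formats, the following Python sketch converts each of the example strings to a number of seconds east of UTC. The function name and the regular expression are assumptions made for this sketch and do not describe how the engine parses offsets.

```python
import re

# Sign, one or two hour digits, then optional minutes with or without a colon.
_OFFSET = re.compile(r"^([+-])(\d{1,2})(?::?(\d{2}))?$")

def offset_seconds(text):
    """Convert '+0200', '-03:30', '+12', or '-5' to seconds east of UTC."""
    match = _OFFSET.match(text)
    if match is None:
        raise ValueError(f"unrecognized offset: {text!r}")
    sign, hours, minutes = match.groups()
    total = int(hours) * 3600 + int(minutes or 0) * 60
    return -total if sign == "-" else total

for example in ("+0200", "-03:30", "+12", "-5"):
    print(example, offset_seconds(example))
```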

The default value for the format parameter is the empty string (""). If no format string is specified, the following date formats are recognized automatically:

  • The formats in RFC 2616, section 3.3.1 (HTTP protocol):
    • Sun, 06 Nov 1994 08:49:37 GMT
    • Sunday, 06-Nov-94 08:49:37 GMT
    • Sun Nov 6 08:49:37 1994
  • Dates without week day name:
    • 06 Nov 1994 08:49:37 GMT
    • 06-Nov-94 08:49:37 GMT
    • Nov 6 08:49:37 1994
  • Dates without the time zone:
    • 06 Nov 1994 08:49:37
    • 06-Nov-94 08:49:37
    • 1994-11-06 08:49:37 PM
    • 1994-11-06 20:49:37
    • 1994/11/06 08:49:37 PM
    • 1994/11/06 20:49:37
    • 11-06-1994 08:49:37 PM
    • 11-06-1994 20:49:37
    • 11/06/1994 08:49:37 PM
    • 11/06/1994 20:49:37
    • 19941106204937
    • 1994-11-06T20:49:37
  • Dates in an unusual order:
    • 1994 Nov 6 08:49:37
    • GMT 08:49:37 06-Nov-94 Sunday
    • 94 6 Nov 08:49:37
  • Dates without times:
    • 1994 Nov 6
    • 06-Nov-94
    • Sun Nov 6 94
    • 1994/11/06
    • 1994-11-06
    • 19941106
    • 11/06/1994
    • 11-06-1994
  • Dates with unusual separators:
    • 1994.Nov.6
    • Sun/Nov/6/94/GMT
  • Time zones specified using RFC 822 style:
    • Sun, 12 Sep 2004 15:05:58 -0700
    • Sat, 11 Sep 2004 21:32:11 +0200
  • Compact numerical date strings:
    • 20040912 15:05:58 -0700
    • 20040911 +0200
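
The general idea of automatic recognition, trying known formats until one matches and returning 0 otherwise, can be sketched in Python as follows. The candidate list is a small, hypothetical subset of the formats above, the ordering is an assumption, and every input is treated as UTC. The engine's actual detection logic is not documented on this page.

```python
from datetime import datetime, timezone

# A small, hypothetical subset of candidate formats, tried in order.
CANDIDATES = [
    "%a, %d %b %Y %H:%M:%S GMT",  # Sun, 06 Nov 1994 08:49:37 GMT
    "%Y-%m-%dT%H:%M:%S",          # 1994-11-06T20:49:37
    "%Y-%m-%d %H:%M:%S",          # 1994-11-06 20:49:37
    "%Y%m%d%H%M%S",               # 19941106204937
    "%Y-%m-%d",                   # 1994-11-06
    "%Y%m%d",                     # 19941106
]

def guess_date(date_str):
    """Return epoch seconds for the first matching format, or 0 if none match."""
    for fmt in CANDIDATES:
        try:
            parsed = datetime.strptime(date_str, fmt)
        except ValueError:
            continue  # this format did not match; try the next one
        return int(parsed.replace(tzinfo=timezone.utc).timestamp())
    return 0

print(guess_date("Sun, 06 Nov 1994 08:49:37 GMT"))  # 784111777
print(guess_date("not a date"))                     # 0
```

With a 0-on-failure convention like this one, a caller cannot tell a parse failure apart from a date that falls exactly on the epoch.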

Returns

The number of seconds since 00:00:00 on January 1, 1970, Coordinated Universal Time (UTC), or 0 if the input argument could not be parsed.

Example

Input Example:

    <process-xsl>
      <xsl:variable name="datesec" select="viv:parse-date('06-Nov-94')" />
      <datesec><xsl:value-of select="$datesec" /></datesec>
      <date><xsl:value-of select="viv:seconds-to-local-date-time($datesec)" /></date>
      <timezone><xsl:value-of select="viv:time-zone-name()" /></timezone>
    </process-xsl>

Output Example:

    <datesec>784080000</datesec>
    <date>1994-11-05T20:00:00-04:00</date>
    <timezone>EDT</timezone>
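
The relationship between the datesec and date values in this output can be checked with a short Python sketch. It uses a fixed -04:00 offset taken from the sample output rather than a real time zone database, so it only confirms the arithmetic.

```python
from datetime import datetime, timedelta, timezone

datesec = 784080000
# Fixed offset copied from the sample output; not a full EDT/EST rule set.
edt = timezone(timedelta(hours=-4), "EDT")

as_utc = datetime.fromtimestamp(datesec, timezone.utc).isoformat()
as_edt = datetime.fromtimestamp(datesec, edt).isoformat()

print(as_utc)  # 1994-11-06T00:00:00+00:00
print(as_edt)  # 1994-11-05T20:00:00-04:00
```

The value 784080000 is midnight UTC on November 6, 1994, which is 20:00 on November 5 when displayed with a -04:00 offset.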

Known Issues

On 32-bit Linux systems, the viv:parse-date() function supports only dates within the 32-bit range.

Notes

Fast-Indexing and the Date Type

When fast-indexing a field that contains a date, Watson™ Explorer Engine provides a date type to automate conversion from time values expressed as strings. Fields in a fast index that are declared as being of the date type automatically process their content using viv:parse-date. For this reason, you should not attempt to use viv:parse-date to process fast-indexed variables of type date.

As an example, assume that you had the following content:

    <document xmlns:xi="http://www.w3.org/2003/XInclude">
      <content name="last-update">Thu, 29 Mar 2007 15:20:01 +0100</content>
    </document>

If you fast-index the last-update field as a date (last-update|date), the indexer automatically fast-indexes the result of viv:parse-date(content[@name="last-update"]), so the fast-indexed value is an integer (1175178001). By default, when used within fast-indexing, viv:parse-date restricts date values to 32 bits. To fast-index dates outside this range, you can call viv:parse-date manually and fast-index the resulting value as an integer. Any searches on this field must then be expressed as integer values (or converted into integer values before performing the search) so that the comparison works correctly.
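
As a rough illustration of converting a date into the integer form used for such comparisons, the following Python sketch reproduces the value 1175178001 from the sample content. It relies on Python's standard RFC 2822 date parser, not on viv:parse-date, and the helper name to_index_value is hypothetical.

```python
from email.utils import parsedate_to_datetime

def to_index_value(date_text):
    """Convert an RFC 822 style date string to integer epoch seconds."""
    return int(parsedate_to_datetime(date_text).timestamp())

last_update = to_index_value("Thu, 29 Mar 2007 15:20:01 +0100")
print(last_update)  # 1175178001

# A search bound converted the same way can be compared as an integer.
lower_bound = to_index_value("Thu, 29 Mar 2007 00:00:00 +0000")
print(last_update >= lower_bound)  # True
```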