Expression syntax for advanced triggers and actions

You can write and edit simple or complex trigger expressions on the Trigger page by using the trigger syntax. This same syntax can be entered in some advanced actions to define a data source.

Trigger expression structure

Trigger expressions can contain an unlimited number of Boolean conditions. Each condition must be enclosed in parentheses to prevent ambiguity, for example:
(Boolean_condition) and not (Boolean_condition)
(Boolean_condition) or (Boolean_condition) or (Boolean_condition)
((Boolean_condition) and (Boolean_condition)) or not (Boolean_condition)

The following Boolean conditions are available: exists, true, numeric, and string.

Important:
  • Content field names are case-sensitive.
  • You must add a backslash ( \ ) as an escape character before occurrences of spaces, quotation marks, and backslashes in content field names.
  • Knowledge base names and category names must be enclosed in single quotation marks.

exists condition

The exists condition returns true when a content field exists in a content item. If a field is defined in the project field definitions but is not in the content item, the exists condition returns false. For multivalue fields, the exists condition can check for the existence of specific values, indicated by the part number (value index).

Usage
$content_field_name exists
exists $content_field_name
exists $content_field_name[part_number]
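
The behavior described above can be illustrated with a small Python model. This is a sketch only, not the product's implementation: the dictionary-based content item, the exists helper, and the assumption that part numbers start at 1 are all hypothetical.

```python
from typing import Optional


def exists(item: dict, field: str, part: Optional[int] = None) -> bool:
    """Model of `exists $field` and `exists $field[part]`."""
    if field not in item:
        # Defined in the project but absent from this item: false.
        return False
    if part is None:
        return True
    # Assumption: part numbers (value indexes) start at 1.
    return 1 <= part <= len(item[field])


# Hypothetical content item: field name -> list of values.
item = {"author": ["John Smith"], "keywords": ["tax", "audit"]}
```

In this model, exists(item, "keywords", 2) is true, and exists(item, "title") is false because the item carries no title field.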

true condition

The true condition returns true for the specified percentage of time. This condition is useful for building rules that are triggered randomly. The true condition without qualification returns true all of the time.

Usage
true/nn
nn is the percentage of time that the condition returns true.
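
As a rough illustration of the random behavior, the following Python sketch models true/nn with a uniform random draw. The true_condition helper and the use of a uniform distribution are assumptions made for illustration; the product's actual sampling method is not documented here.

```python
import random


def true_condition(nn: float = 100.0, rng=None) -> bool:
    """Model of `true/nn`: true roughly nn percent of the time."""
    rng = rng or random
    # random() is in [0, 1), so nn=100 is always true and nn=0 never is.
    return rng.random() * 100 < nn


# Estimate how often true/25 fires over many evaluations.
seeded = random.Random(0)
hits = sum(true_condition(25, seeded) for _ in range(10_000))
fraction = hits / 10_000
```

With enough evaluations, fraction settles near 0.25, which is the behavior that a rule such as (true/25) relies on to sample about a quarter of the documents.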

Numeric conditions

Numeric conditions return true based on the following numeric comparisons:
Equals
numeric_value = numeric_value
Does not equal
numeric_value <> numeric_value
Greater than
numeric_value > numeric_value
Greater than or equal to
numeric_value >= numeric_value
Less than
numeric_value < numeric_value
Less than or equal to
numeric_value <= numeric_value

You can enter numeric values (for example, 3, 0.5, -1.2) or derive numeric values by using operations on other numeric values. The following operations are available: * (multiplication), / (division), + (addition), and - (subtraction).

You can also extract numeric values as follows:
$content_field_name
Extracts a numeric value from a content field of data type: number.
var[temporary_variable_name]
Extracts a value from a predefined temporary variable. The variable is deleted after the document is evaluated.
var[NULL] is predefined as an empty string.
counter[counter_name]
Extracts a numeric value from the specified counter in the decision plan environment.
size($content_field_name)
Extracts the number of values in a multivalue content field.
score('knowledge_base_name','category_name')
Extracts the category score from a knowledge base match operation.
Important: Category scores are extracted as values between 0 and 1. For example, a category score of 80% is rendered as 0.80 in the trigger expression. A trigger that returns true every time a category score exceeds 70% is written as follows:

score('knowledge_base_name' , 'category_name') > 0.70

score('knowledge_base_name',n)
Extracts the score of a category from a knowledge base match operation. n represents the order of relevancy; for example, n=3 returns the third highest score.
score(threshold_file_name,category_name)
Extracts the threshold of a category from the specified threshold file. Threshold files are created by using the Threshold Options window in the Knowledge Base Editor; click Tools > Threshold Options.
strlen($content_field_name)
Returns the number of characters in a content field. The function returns 0 when the specified content field is empty or does not exist.

String comparisons

String conditions return true when two equivalent strings are found. You compare two string values by using the is keyword as follows:

string is string

string is one of the following values:
  • A simple string.
  • A string that is extracted from a single value or multivalue content field:
    Single value content field
    $content_field_name (for example, $author extracts John Smith)
    Multivalue content field
    $content_field_name[n]
    n is the part number (value index) of a multivalue content field.
  • A string that is extracted from a predefined variable: var[variable_name].
  • A string that is extracted from the specified counter: counter[counter_name].

The is keyword is used for case-sensitive string comparisons. For string comparisons that are not case-sensitive, use the is_lowcase keyword instead of is.

String searches

String searches return true when a string is found within a content field. Use the contains keyword in the trigger expression as follows:

$content_field_name contains string_constant

Use single quotation marks for case-sensitive searches ('string_constant'). Use the tilde symbol (~string_constant~) for searches that are not case-sensitive.

For multivalue content fields, specify the part number (value index) n as follows:

$content_field_name[n] contains string_constant

Use the contains_field keyword to find instances where the value of a content field or temporary variable is a substring of another content field. For example, the following expression returns true when the value of content_field_2 is found within content_field_1:

$content_field_1 contains_field $content_field_2

The contains_field keyword performs case-sensitive searches. Use the contains_lowcase_field keyword to perform searches that are not case-sensitive.

Tip: Searching with the contains_field keyword and the contains_lowcase_field keyword is slower than searching with the contains keyword. Use the contains keyword when you know the string that you are searching for. Use the contains_field keyword and the contains_lowcase_field keyword only when you do not know the search string in advance.

External string lists

You can search for strings that are contained in a text file by using the stringlist[string_list_name] keyword. Each string must be on a separate line. The text file must meet the following criteria:

Tip: You can create, edit, and access string lists by double-clicking Word and string list files in the Project Explorer panel.

The list of strings is not case-sensitive by default. If you want the strings to match exactly, enclose them in single quotation marks. For example, if you want to find a stand-alone acronym such as FBI, include spaces before and after the acronym and enclose the string in single quotation marks as follows: ' FBI '. Enclosing the string in single quotation marks (or the tilde symbol for searches that are not case-sensitive) ensures that the surrounding spaces are included in the search.

You can define a trigger expression that returns true when strings that are stored in a text file called stringlist_MyStrings.txt are found in a content field as follows:

$content_field_name contains stringlist[MyStrings]

Restriction: You cannot use an asterisk ( * ) as a wildcard character in strings.
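
The matching rules for string list entries can be modeled in Python as follows. This is a sketch under stated assumptions, not the product's algorithm: the contains_stringlist helper is hypothetical, entries are passed as a list rather than read from a file, and unquoted entries are assumed to be trimmed of surrounding spaces before a search that is not case-sensitive.

```python
def contains_stringlist(field_text: str, entries: list) -> bool:
    """Model of `$field contains stringlist[name]`."""
    for entry in entries:
        if not entry.strip():
            continue  # skip blank lines
        quoted = len(entry) >= 2 and entry[0] == entry[-1] and entry[0] in "'~"
        if quoted and entry[0] == "'":
            # Single quotation marks: exact, case-sensitive, spaces kept.
            if entry[1:-1] in field_text:
                return True
        else:
            # Tilde keeps surrounding spaces; unquoted entries are trimmed
            # (assumption). Both are compared without regard to case.
            needle = entry[1:-1] if quoted else entry.strip()
            if needle.lower() in field_text.lower():
                return True
    return False
```

For example, the entry ' FBI ' matches "The FBI report" but not "TheFBIreport", because the surrounding spaces are part of the search string.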

Word search

A word search finds words, phrases, or a combination of words and phrases within one or all content fields by using a colon ( : ) in a trigger expression. The search is performed on text in content fields that IBM® Content Classification has tokenized, that is, text that is divided into words or units known as tokens. A token is a string of characters that does not contain spaces. A word search on tokenized text differs from string comparisons (which use the is keyword) and string searches (which use the contains keyword), both of which are performed on the original (raw) text.

You can use word search to find special characters, but string search is usually the better choice in these cases. Tokenization adds spaces between punctuation marks and words. In general, if you want to find words with punctuation, use the contains keyword to find the string. For example, if you are looking for two adjacent spaces, use a string search instead of a word search.

By default, only complete words are found in word searches. For example, searching for the word ring in a content field does not return true if the content field contains the word spring. To search for a word prefix or suffix, you can use a wildcard character (*).

Usage
$content_field_name : text_conditions

Search for one or more words in all content fields that are defined in the project's field definitions as follows. Fields that might be created by rules in your decision plan are not included.

$__all__ : text_conditions

text_conditions consist of one or more Boolean conditions, for example:

(text) and not (text)

(text or text or text) and not (text)

(text or (text and text)) and (text)

text can be a collection of words, a distance (proximity) operation, a pattern, a word list, a word, or a phrase.

Use single quotation marks for case-sensitive searches (for example, 'word'). Use the tilde symbol for searches that are not case-sensitive (for example, ~word~).

External word lists

You can search for words or phrases that are contained in a text file by using the wordlist[word_list_name] keyword. Each word or phrase must be on a separate line. The text file must meet the following criteria:

Tip: You can create, edit, and access word lists by double-clicking Word and string list files in the Project Explorer panel.

For example, you can define a trigger expression that returns true when words that are stored in a text file called wordlist_MyWords.txt are found in a content field as follows:

$content_field_name : wordlist[MyWords]

Tip: If you want to find phrases with exact spacing, or strings that contain symbols or punctuation marks (for example, M.Sc.), use the contains stringlist keywords rather than : wordlist.

Wildcard characters

You can use an asterisk ( * ) as a wildcard character to find prefixes and suffixes of words in a phrase. For example, 'Uni* *ation*' finds 'United Nations' and 'Unix installation'. Note that an asterisk within a word is not processed as a wildcard character. A search for ho*se does not find house because the internal asterisk is treated as a literal character. Follow these guidelines for using wildcard characters:
  • To find words with the prefix aaa, type: aaa*.
  • To find words with a suffix of aaa, type: *aaa.
  • To find words that include a substring aaa, type: *aaa*.
Restriction: In a word list file, you cannot use an asterisk ( * ) as a wildcard character within a phrase. You can use a wildcard character only at the beginning or end of words and phrases.
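
The wildcard guidelines above can be expressed as a short Python model. It is a sketch only: the matches_word and matches_phrase helpers are hypothetical, tokenization is approximated by splitting on whitespace, and matching is treated as case-sensitive.

```python
def matches_word(pattern: str, word: str) -> bool:
    """Apply a leading and/or trailing asterisk to a single word."""
    leading = pattern.startswith("*")
    trailing = pattern.endswith("*") and len(pattern) > 1
    core = pattern[1 if leading else 0 : len(pattern) - 1 if trailing else len(pattern)]
    if leading and trailing:
        return core in word            # *aaa* : substring
    if leading:
        return word.endswith(core)     # *aaa  : suffix
    if trailing:
        return word.startswith(core)   # aaa*  : prefix
    # No wildcard at either end; an internal asterisk stays literal.
    return word == core


def matches_phrase(pattern: str, text: str) -> bool:
    """True when consecutive tokens in text match each pattern word."""
    parts = pattern.split()
    tokens = text.split()  # rough stand-in for the product's tokenizer
    for start in range(len(tokens) - len(parts) + 1):
        if all(matches_word(p, t) for p, t in zip(parts, tokens[start:])):
            return True
    return False
```

In this model, matches_phrase("Uni* *ation*", "United Nations") is true, while matches_word("ho*se", "house") is false because the internal asterisk is compared literally.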

Word search based on distance (proximity)

The distance operator d/ measures the proximity of words or phrases in tokenized text. The distance operator returns true when words that are listed before the operator are found within the specified distance n from words that are listed after the operator:

word_condition d/n word_condition

To find words that are in the same sentence: word_condition d/s word_condition

The sequence operator s/ is similar to the distance operator. Use the sequence operator to specify that words that are listed before the operator must precede words that are listed after the operator:

word_condition s/n word_condition

To find words that are in the same sentence: word_condition s/s word_condition
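
The difference between the two operators can be sketched in Python. The helpers below are hypothetical, and the model makes two assumptions that the product may not share: distance is measured as the difference between token positions, and tokens are produced by splitting on whitespace. The same-sentence forms (d/s and s/s) are not modeled.

```python
def positions(tokens: list, word: str) -> list:
    """Indexes at which word occurs in the token list."""
    return [i for i, t in enumerate(tokens) if t == word]


def within_distance(tokens: list, left: str, right: str, n: int) -> bool:
    """Model of `left d/n right`: order does not matter."""
    return any(
        i != j and abs(i - j) <= n
        for i in positions(tokens, left)
        for j in positions(tokens, right)
    )


def in_sequence(tokens: list, left: str, right: str, n: int) -> bool:
    """Model of `left s/n right`: left must come before right."""
    return any(
        0 < j - i <= n
        for i in positions(tokens, left)
        for j in positions(tokens, right)
    )


tokens = "the quick brown fox".split()
```

With these tokens, within_distance(tokens, "fox", "quick", 2) is true, but in_sequence(tokens, "fox", "quick", 2) is false because fox follows quick in the text.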

Pattern search

You can search for patterns in documents such as phone numbers or social security numbers by defining a pattern search. Pattern searches can be included in both triggers and actions.

Pattern searches have limited syntax compared to regular expressions. However, patterns can be included in word searches. For example, you can define a trigger that returns true based on the distance between two patterns.

To define a pattern search in a trigger, use the pattern keyword as follows:

$content_field_name : pattern 'pattern'

Use the following symbols to define your pattern search:

\d
Match a digit.
\a
Match an alphabetic character (not alphanumeric).
\s
Match a space.
\z
Match both alphabetic and numeric characters.
+
One or more of the previous character.
*
Zero or more of the previous character.
\+
The plus symbol.
\*
The asterisk symbol.
Important: Add a backslash ( \ ) as an escape character before every backslash in the expression.

For example, to find words in content field X that start with AA and end with three digits, enter:

$X : pattern 'AA\\d\\d\\d'
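
To make the symbol table concrete, the following Python sketch translates a pattern into an equivalent regular expression. It is an illustration, not the product's matcher: the pattern_to_regex helper is hypothetical, it takes the pattern after the doubled backslashes have been unescaped (AA\d\d\d rather than AA\\d\\d\\d), and it assumes that alphabetic means the ASCII letters A to Z in either case.

```python
import re

# Pattern symbol (the character after the backslash) -> regex fragment.
SYMBOLS = {
    "d": "[0-9]",         # a digit
    "a": "[A-Za-z]",      # an alphabetic character (ASCII assumed)
    "s": " ",             # a space
    "z": "[A-Za-z0-9]",   # an alphabetic or numeric character
    "+": r"\+",           # the plus symbol
    "*": r"\*",           # the asterisk symbol
}


def pattern_to_regex(pattern: str) -> str:
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            out.append(SYMBOLS.get(nxt, re.escape(nxt)))
            i += 2
        elif ch in "+*":
            out.append(ch)  # quantifier on the previous character
            i += 1
        else:
            out.append(re.escape(ch))  # any other character is literal
            i += 1
    return "".join(out)


regex = pattern_to_regex(r"AA\d\d\d")
```

Here re.fullmatch(regex, "AA123") finds a match, while "AA12" and "AB123" do not match.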

Category function

Use the cat keyword to extract the name of a category in a knowledge base according to its rank in the match results. The cat keyword is used as follows:

cat('knowledge_base_name',n)

n represents the order of relevancy; for example, n=3 returns the name of the third highest scoring category.
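
The ranking that cat('knowledge_base_name',n) and score('knowledge_base_name',n) rely on can be pictured with a small Python model. The nth_category and nth_score helpers and the sample scores are hypothetical; the sketch only assumes that categories are ordered from the highest score to the lowest and that n starts at 1.

```python
def ranked(scores: dict) -> list:
    """Categories ordered from highest to lowest score."""
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


def nth_category(scores: dict, n: int) -> str:
    """Stand-in for cat('knowledge_base_name', n)."""
    return ranked(scores)[n - 1][0]


def nth_score(scores: dict, n: int) -> float:
    """Stand-in for score('knowledge_base_name', n)."""
    return ranked(scores)[n - 1][1]


# Hypothetical match results: category name -> score between 0 and 1.
scores = {"Sports": 0.20, "Finance": 0.80, "Travel": 0.55}
```

In this model, nth_category(scores, 1) is Finance, and nth_score(scores, 3) is 0.20.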

if-then-else construct

The trigger expression syntax supports if-then-else constructs: if (A) then (B) else (C).

This construct evaluates the first expression A. If the first expression is true, the second expression B is returned; otherwise, the third expression C is returned.

Restriction: The second and third expressions B and C must be the same type.

When used in advanced actions, this construct is useful for extracting a value while ensuring that the decision plan does not fail and continues processing. For example, in the following sample expression, the action succeeds even if $Categories does not exist or is empty:

if (size($Categories) > 0) then ($Categories[1]) else ('<None>')
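
The logic of the sample expression corresponds to the following Python sketch. The dictionary-based content item and the first_category helper are hypothetical, and the model assumes that part number 1 refers to the first value of the field.

```python
def first_category(item: dict) -> str:
    """Model of: if (size($Categories) > 0) then ($Categories[1]) else ('<None>')."""
    categories = item.get("Categories", [])  # a missing field has size 0
    # Both branches return a string, which mirrors the restriction that
    # the second and third expressions must be the same type.
    return categories[0] if len(categories) > 0 else "<None>"
```

For an item without a Categories field, first_category returns the placeholder string instead of raising an error.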

Concatenating strings and content fields

You can concatenate two strings or two content fields that contain strings by using the concat keyword as follows:

'string1' concat $content_field_2

$content_field_1 concat $content_field_2

The concat keyword works with single and multivalue strings.

Restriction: The concat keyword is used with case-sensitive strings only.

When you concatenate strings in content fields, the resulting string can be set into another content field. This is different from the Combine content fields advanced action, which does not create a new string. Combine content fields is used only for optimizing word searches across multiple fields.

Extracting substrings

You can extract a substring by using the substring keyword as follows:

substring(string,position,length)

string
A string or content field ($content_field_name)
position
The starting position in the string (indexed from 1).
length
The length of the substring.
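
Because positions are indexed from 1, the equivalent slice in a zero-based language is shifted by one. The following Python sketch shows the mapping; the substring helper is a hypothetical stand-in, and its handling of a length that runs past the end of the string is an assumption.

```python
def substring(string: str, position: int, length: int) -> str:
    """Model of substring(string,position,length) with 1-based position."""
    if position < 1:
        raise ValueError("position is indexed from 1")
    start = position - 1  # convert to Python's zero-based index
    # Assumption: a length past the end returns the rest of the string.
    return string[start:start + length]
```

For example, substring("Content", 1, 3) returns Con, and substring("Content", 4, 4) returns tent.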

Date calculations and comparisons

You can perform calculations and comparisons on dates and times. This can be useful, for example, to set the expiration date of documents for archiving purposes. Use the date and period keywords in expressions as follows:

date('yyyy-MM-dd HH:mm:ss.S')
Converts a string in canonical date format into a structure that can be used for arithmetic calculations and comparisons.
date('now')
The date and time are taken from the system at run time.
date($content_field)
Extracts the date in canonical format from a content field.
export_date(date_structure,'date_format')
Exports the date in any ICU format. Use this function when you want to extract the date according to a specific format. The date is obtained from a date_structure in the format that is specified by the date_format parameter, for example:
export_date(date('now') + period('1'),'MM-dd-yyyy z HH:mm:ss')
import_date($content_field,'date_format')
Extracts the date from a content field in the format that is specified by the date_format parameter. Use this function when the date is supplied in non-canonical formats.
import_date replaces fdate (still available for backward compatibility).
period('time_period') or period($content_field)
Converts the specified time period into seconds. The time period must be in this format: days hours:minutes:seconds. For example, 2 6:3:35 represents two days, six hours, three minutes, and 35 seconds. The time period can be a string or the value of a content field in the required format.
Tip: Specify the time period in as many time units as necessary for your required level of granularity. For example, to convert 50 hours into seconds, specify: period('0 50'). In this example, it is not necessary to specify placeholders for the minutes and seconds time units.
Examples of date calculations
(date('1980-11-11 00:00:00.000') + period('50')) < import_date('1990-11-11 00:00:00.000', 'yyyy-MM-dd HH:mm:ss.S')
(date($content_field_1) - date($content_field_2)) > period('30')
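
The period format can be checked with a short Python sketch that converts a days hours:minutes:seconds string into seconds and then evaluates the first example above. The period_to_seconds helper is hypothetical and models only the format rules stated in this section.

```python
from datetime import datetime, timedelta


def period_to_seconds(text: str) -> int:
    """Model of period('days hours:minutes:seconds'); trailing units are optional."""
    parts = text.split()
    days = int(parts[0]) if parts else 0
    hms = parts[1].split(":") if len(parts) > 1 else []
    # Pad missing minutes and seconds (or all three units) with zeros.
    hours, minutes, seconds = (int(v) for v in (hms + ["0", "0", "0"])[:3])
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


# (date('1980-11-11 00:00:00.000') + period('50')) < the 1990-11-11 date
start = datetime(1980, 11, 11)
limit = datetime(1990, 11, 11)
first_example = start + timedelta(seconds=period_to_seconds("50")) < limit
```

In this model, period_to_seconds("2 6:3:35") is 194615, period_to_seconds("0 50") is 180000, and first_example is true because 50 days after 11 November 1980 is still earlier than 11 November 1990.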

Splitting content fields

Use the split keyword to split a content field into several parts based on a specified delimiter. This function is typically used in advanced actions.

split($content_field_name,'delimiter')

The delimiter string can include the following markers:
<or>
Searches for more than one delimiter string to split the content field.
<sp>
Searches for any space character, such as \t, \n, and \r.
<n>
Searches for a new line character: \n
<t>
Searches for a tab character: \t
<r>
Searches for a carriage return character: \r
The following examples show how you can use the split keyword with delimiter markers:
split($content_field_name,'<n>')
Splits the specified content field into separate lines.
split($content_field_name,'ZZZ')
Splits the specified content field using ZZZ as the delimiter.
split($content_field_name,'a<or>b<or>c')
Splits the specified content field using a, b, or c as a delimiter.
split($content_field_name,'.<sp>')
Splits the specified content field into sentences by searching for a period that is followed by any space.
split($content_field_name,'.<or><sp>')
Splits the specified content field into words by using any space character or period as delimiters.
For example, you can use the split function in the Set content field advanced action as follows:
  1. Specify a content field that will contain the result of the split function.
  2. In the Copy data from area, select Other data sources.
  3. In the Type box, select Expression to evaluate.
  4. In the Value box, enter the split function, for example: split($ICM_Title,'<sp>')
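
The delimiter markers in this section can be modeled in Python by translating the delimiter string into a regular expression. This is a sketch, not the product's implementation: the delimiter_to_regex and split_field helpers are hypothetical, <sp> is assumed to match any single whitespace character, and empty parts are kept rather than dropped.

```python
import re

# Delimiter marker -> regex fragment.
MARKERS = {"<sp>": r"\s", "<n>": r"\n", "<t>": r"\t", "<r>": r"\r"}


def delimiter_to_regex(spec: str) -> str:
    alternatives = []
    for alt in spec.split("<or>"):  # <or> separates alternative delimiters
        pieces = re.split(r"(<sp>|<n>|<t>|<r>)", alt)
        alternatives.append(
            "".join(MARKERS[p] if p in MARKERS else re.escape(p) for p in pieces)
        )
    return "|".join(a for a in alternatives if a)


def split_field(value: str, spec: str) -> list:
    """Model of split($content_field_name,'delimiter')."""
    return re.split(delimiter_to_regex(spec), value)
```

For example, split_field("a.b c", ".<or><sp>") returns the three parts a, b, and c, and split_field("One. Two.\nThree", ".<sp>") splits the text at each period that is followed by a space or a new line.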

Joining parts of a multivalue content field

Use the join keyword to join parts of a multivalue content field into a single string as follows:

join($content_field, <joiner_string>)

The second parameter specifies the separator between the parts.
