Example queries

View example queries that use search patterns and regular expressions to search for entries.

Search patterns/regular expressions

The search patterns and regular expressions that you can use on a field depend on the Data Type of the fieldName. Check the IndexConfiguration to determine the Data Type of the fieldName that you want to query. The following table explains the search patterns that can be used for each Data Type.

IndexConfiguration Data Type | IndexConfiguration settings | Internal representation | Remarks | Search Pattern / Solr RegEx supported

TEXT Searchable and (sortable and/or filterable) string String stores a word or sentence as an exact string, without tokenization or other processing.
Commonly used for storing exact matches, for example, for faceting.

<fieldName>:/<Apache solr regEx>/

OR

<fieldName>:<search string>

See Example 1 below.

TEXT Searchable text_general Text typically performs tokenization, and secondary processing (such as lower-casing etc.).
Used for all scenarios to match part of a sentence.

<fieldName>:<search string>

See Example 2 below.

DOUBLE Searchable tdouble

Square brackets indicate an inclusive range query where matching values include the upper and lower bound.

Curly brackets indicate an exclusive range query where matching values are between the upper and lower bounds. This excludes the upper and lower bounds themselves.

Square brackets and curly brackets can be combined so that one end of the range is inclusive and the other is exclusive.

<fieldName>:[ * TO *]

<fieldName>:{ * TO * }

<fieldName>:{ * TO * ]

See Example 3 below.

LONG Searchable tlong As described in DOUBLE Data type in the row above.

<fieldName>:[ * TO *]

<fieldName>:{ * TO * }

<fieldName>:{ * TO * ]

DATE Searchable tdate As described in DOUBLE Data type in the row above.

<fieldName>:[ * TO *]

<fieldName>:[ * TO * }

Example:

OrderDate: [2015-11-23T00:00:00Z TO 2016-11-24T00:00:00Z}

Specifying a fieldName

The logRecord field holds the actual indexed log record. It cannot be used in Apache Solr RegEx expressions. logRecord is always defined as text_general, and its values are tokenized. If no fieldName is specified in the query, logRecord is used as the field name by default. For example, the query
+"Transaction id 1234" +"error code 456"
is equivalent to the query
+logRecord:"Transaction id 1234" +logRecord:"error code 456"

It is therefore important that you specify a fieldName in your query.
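
As a minimal sketch of building clauses that always name their field, the following Python helper produces the second form of the query shown above. The function name require_phrase is hypothetical and is not part of any product API; it only assembles the query string.

```python
def require_phrase(field: str, phrase: str) -> str:
    """Return a mandatory (+) phrase clause bound to an explicit field.

    Hypothetical helper for illustration only: it builds a query string
    and does not contact any server.
    """
    # Escape backslashes and double quotes so the phrase stays one quoted unit.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return f'+{field}:"{escaped}"'


# Naming logRecord explicitly makes the default field visible in the query.
query = " ".join([
    require_phrase("logRecord", "Transaction id 1234"),
    require_phrase("logRecord", "error code 456"),
])
print(query)
```

Passing a different field name, such as SUMMARY, targets that field instead of the default.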

Example 1 (TEXT DataType, sortable and/or filterable)

Case 1: Using the Solr RegEx {fieldName}:/{Apache Solr RegEx Expr}/.

In this example, you want to find instances of the error code 6789 in the SUMMARY field that have a response time of more than 5 seconds, as in this entry:
fieldName:SUMMARY
fieldContents: "Transaction 12345 has failed with response time of 10 seconds and error code of 6789."

The syntax for querying using regular expressions is: {fieldName}:/{Apache Solr RegEx Expr}/

The query for this example is:
SUMMARY:/.* ([6-9]|[1-9][0-9]) seconds.*6789\./

This query searches the SUMMARY field for values that match the Solr RegEx. The RegEx requires any series of characters followed by a space (.* ), then a single-digit integer in the range 6-9 OR a two-digit integer in the range 10-99, immediately followed by the character sequence ' seconds', then any series of characters (.*), ending with the character sequence '6789.' (the dot is escaped with \).

Alternatively, if the optional interval feature is enabled, the numeric range operator (<>) can be used:
SUMMARY:/.* <6-99> seconds.*6789\./
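
The first form of the query (the one that uses alternation rather than the <> operator) can be tried out locally. This is a sketch under one assumption: Python's re module is not the Solr RegEx engine, but the operators used in this pattern behave the same way in both, and re.fullmatch stands in for the fact that a Solr RegEx must match the whole field value. The sample strings other than the entry above are made up for illustration.

```python
import re

# Pattern copied from the query above (without the surrounding slashes).
PATTERN = re.compile(r".* ([6-9]|[1-9][0-9]) seconds.*6789\.")


def summary_matches(value: str) -> bool:
    """Return True if the whole value matches, as a Solr RegEx would require."""
    return PATTERN.fullmatch(value) is not None


slow = "Transaction 12345 has failed with response time of 10 seconds and error code of 6789."
# Hypothetical variations of the entry, used only to show non-matching cases.
fast = "Transaction 12345 has failed with response time of 5 seconds and error code of 6789."
other_code = "Transaction 12345 has failed with response time of 10 seconds and error code of 1111."

for text in (slow, fast, other_code):
    print(summary_matches(text), "-", text)
```

Only the first string matches: the second has a response time below 6 seconds, and the third does not end with 6789 followed by a period.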

Case 2: Using a regular Solr search pattern {fieldName}:{search string}.

In this example, you want to find instances where:
  • the SUMMARY field contains error code 6789.
  • the HOSTNAME field contains myhost.ibm.com.
  • the USER field does not contain sysadmin.
as in this entry:
fieldName:SUMMARY
fieldContents: "Transaction 12345 has failed with response time of 10 seconds and error code 6789."

fieldName:HOSTNAME
fieldContents: myhost.ibm.com
fieldContents: myhost.ibm.com
fieldContents: remotehost.ibm.com

fieldName:USER
fieldContents:sysadmin
fieldContents:user1
fieldContents:user2
…
The query for this example is:
+SUMMARY: "*error code 6789*" +HOSTNAME:myhost.ibm.com -USER:sysadmin
or
+SUMMARY: "*error code 6789*" AND HOSTNAME:myhost.ibm.com NOT USER:sysadmin
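
The effect of the required (+ or AND) and prohibited (- or NOT) clauses can be modeled on a handful of records. This is only a sketch of the boolean logic, not of how Solr evaluates a query; the records below are invented sample data that combine the field values listed above.

```python
SUMMARY_TEXT = (
    "Transaction 12345 has failed with response time of 10 seconds "
    "and error code 6789."
)

# Hypothetical sample records for illustration only.
records = [
    {"SUMMARY": SUMMARY_TEXT, "HOSTNAME": "myhost.ibm.com", "USER": "user1"},
    {"SUMMARY": SUMMARY_TEXT, "HOSTNAME": "myhost.ibm.com", "USER": "sysadmin"},
    {"SUMMARY": SUMMARY_TEXT, "HOSTNAME": "remotehost.ibm.com", "USER": "user2"},
    {"SUMMARY": "Transaction 222 completed.", "HOSTNAME": "myhost.ibm.com", "USER": "user2"},
]


def matches(record: dict) -> bool:
    """Apply the three clauses of the query to one record."""
    return (
        "error code 6789" in record["SUMMARY"]          # +SUMMARY:"*error code 6789*"
        and record["HOSTNAME"] == "myhost.ibm.com"      # +HOSTNAME:myhost.ibm.com
        and record["USER"] != "sysadmin"                # -USER:sysadmin
    )


hits = [record for record in records if matches(record)]
print(hits)
```

Only the first record satisfies all three clauses; each of the others fails exactly one of them.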

Example 2 (TEXT DataType, without sortable and/or filterable)

In this example, you want to find instances where:
  • the SUMMARY field contains 6789.
  • the SUMMARY field contains Transaction 12345.
  • the USER field does not contain sysadmin.
such as in this entry:
fieldName:SUMMARY
fieldContents: "Transaction 12345 has failed with response time of 10 seconds and error code of 6789."

fieldName:User
fieldContents:sysadmin
fieldContents:user1
fieldContents:user2
...
The query for this example is:
+SUMMARY:"error code 6789" +SUMMARY:"Transaction 12345" -User:sysadmin
Because the text is tokenized, use multiple AND clauses for the SUMMARY field.

Note: Similar queries can be performed on fields with a TEXT DataType that are sortable and/or filterable.

Example 3 (Numeric DataTypes)

This is an example of range searches for a numeric (double or long) data type on the following entry.

fieldName:SerialNum
fieldContents:1
fieldContents:2
fieldContents:3
...
fieldContents:11
fieldContents:12

The query +SerialNum:[3 TO 10] returns records with a value from 3 to 10.

The query +SerialNum:{3 TO 10] returns records with a value from 4 to 10.

The query +SerialNum:{3 TO 10} returns records with a value from 4 to 9.
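
The bracket semantics of the three range queries can be checked with a short Python sketch. The functions parse_range and in_range are hypothetical helpers written for this illustration; they mimic the inclusive and exclusive bounds and do not call Solr.

```python
def parse_range(expr: str):
    """Split a range such as '{3 TO 10]' into (low, high, low_incl, high_incl)."""
    expr = expr.strip()
    if expr[0] not in "[{" or expr[-1] not in "]}":
        raise ValueError(f"not a range expression: {expr!r}")
    low_text, high_text = (part.strip() for part in expr[1:-1].split(" TO "))
    # '*' means the bound is open-ended.
    low = None if low_text == "*" else float(low_text)
    high = None if high_text == "*" else float(high_text)
    return low, high, expr[0] == "[", expr[-1] == "]"


def in_range(value: float, expr: str) -> bool:
    """Return True if value falls inside the range described by expr."""
    low, high, low_incl, high_incl = parse_range(expr)
    if low is not None and (value < low or (value == low and not low_incl)):
        return False
    if high is not None and (value > high or (value == high and not high_incl)):
        return False
    return True


serial_numbers = list(range(1, 13))  # the values 1..12 from the entry above
for expr in ("[3 TO 10]", "{3 TO 10]", "{3 TO 10}"):
    print(expr, [n for n in serial_numbers if in_range(n, expr)])
```

A square bracket keeps the bound itself in the result, and a curly bracket drops it, which is why the second and third queries start at 4.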

About Apache Solr regular expressions

Allowed characters

Any Unicode character can be used in a Solr RegEx, but certain characters are reserved. The reserved characters are: . ? + * | { } [ ] ( ) " \

If you enable optional features (see below), these characters might also be reserved: # @ & < > ~

Any reserved characters must be escaped with a backslash (\), including backslash (\\).
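
When literal text is placed inside a Solr RegEx, the escaping rule can be applied mechanically. The following is a minimal sketch; escape_regex_literal is a hypothetical helper, and the two character sets are taken from the lists in this section.

```python
# Characters that are always reserved, per the list above.
RESERVED = set('.?+*|{}[]()"\\')
# Characters that are reserved only when optional features are enabled.
OPTIONAL_RESERVED = set("#@&<>~")


def escape_regex_literal(text: str, optional_features: bool = False) -> str:
    """Prefix every reserved character in text with a backslash."""
    reserved = (RESERVED | OPTIONAL_RESERVED) if optional_features else RESERVED
    return "".join("\\" + ch if ch in reserved else ch for ch in text)


print(escape_regex_literal("6789."))
print(escape_regex_literal("user@host", optional_features=True))
```

The first call escapes the trailing period, and the second escapes the @ sign because optional features are assumed to be enabled.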

Characters are interpreted literally when they are surrounded by double quotes (except the double quote itself). For example, loganalysis"@developer.com".

The following regular expressions are not supported by Solr RegEx:

  • \w word character
  • \b word boundary
  • \d digit
  • \s whitespace
  • ^ start of string
  • $ end of string
  • \t tab
  • \n newline
  • \r carriage return

The following table provides information on the operators that can be used in Solr RegEx.

Operators for Solr RegEx

Examples

Match any character.
Use period (.) to represent any character.

For the string "loganalysis":
  • logana..... # match
  • .og...l.sis # match
Match one or more.
Use the plus sign (+) to match the preceding shortest pattern one or more times.
For the string "sssooolllrrr":
  • s+o+l+r+ # match
  • ss+oo+ll+rr+ # match
  • s+.+ # match
  • ss+oooo+ # no match
Match zero or more.
Use the asterisk (*) to match the preceding shortest pattern zero or more times.

For the string "mmmnnn":

  • m*n* # match
  • m*n*o* # match
  • .*nnn.* # match
  • mmm*nnn* # match
Match zero-or-one.
Use the question mark (?) to match the preceding shortest pattern zero or one time.

For the string "yyyzzz":

  • yyy?zzz? # match
  • yyy?zzzz? # match
  • .....?.? # match
  • yy?zz? # no match
Specify min to max.
Use curly brackets ({}) to specify a minimum and (optionally) a maximum number of times that the preceding shortest pattern can repeat.
The allowed forms are:
  1. {4} # repeat exactly 4 times
  2. {3,6} # repeat at least three times, and at most 6 times
  3. {2,} # repeat at least twice

For the string "aaabbb":

  • a{3}b{3} # match
  • a{2,4}b{2,4} # match
  • a{2,}b{2,} # match
  • .{3}.{3} # match
  • a{4}b{4} # no match
  • a{4,6}b{4,6} # no match
  • a{4,}b{4,} # no match

Grouping.
Use parentheses (()) to form sub-patterns.

The quantity operators ({}) listed above operate on the shortest previous pattern, which can be a group.

For the string "ababab":
  • (ab)+ # match
  • ab(ab)+ # match
  • (..)+ # match
  • (...)+ # match
  • (ab)* # match
  • abab(ab)? # match
  • ab(ab)? # no match
  • (ab){3} # match
  • (ab){1,2} # no match

Alternation.
Use the pipe symbol (|) as an OR operator.

The match will succeed if the pattern on either the left-hand side OR the right-hand side matches.

The alternation applies to the longest pattern, not the shortest.

For the string "aabb":
  • aabb|bbaa # match
  • aacc|bb # no match
  • aa(cc|bb) # match
  • a+|b+ # no match
  • a+b+|b+a+ # match
  • a+(b|c)+ # match

Character classes.
Ranges of potential characters can be represented as character classes by enclosing them in square brackets ([]).

A leading ^ negates the character class. The allowed forms are:

  • [abc] # 'a' or 'b' or 'c'
  • [a-c] # 'a' or 'b' or 'c'
  • [-abc] # '-' or 'a' or 'b' or 'c'
  • [abc\-] # '-' or 'a' or 'b' or 'c'
  • [^abc] # any character except 'a' or 'b' or 'c'
  • [^a-c] # any character except 'a' or 'b' or 'c'
  • [^-abc] # any character except '-' or 'a' or 'b' or 'c'
  • [^abc\-] # any character except 'a' or 'b' or 'c' or '-'
For the string "abcd":
  • ab[cd]+ # match
  • [a-d]+ # match
  • [^a-d]+ # no match
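
A selection of the match and no-match results above can be reproduced with a short Python sketch. It rests on an assumption: Python's re module is a different engine from Solr RegEx, but the standard operators in these rows behave the same way in both. re.fullmatch is used because the examples only make sense when the pattern has to match the entire string, which is why (ab){1,2} does not match "ababab".

```python
import re

# (string, pattern, expected result) rows copied from the examples above.
CASES = [
    ("sssooolllrrr", "s+o+l+r+", True),
    ("sssooolllrrr", "ss+oooo+", False),
    ("mmmnnn", "m*n*o*", True),
    ("yyyzzz", "yy?zz?", False),
    ("aaabbb", "a{2,4}b{2,4}", True),
    ("aaabbb", "a{4,}b{4,}", False),
    ("ababab", "(ab){3}", True),
    ("ababab", "(ab){1,2}", False),
    ("aabb", "aa(cc|bb)", True),
    ("aabb", "a+|b+", False),
    ("abcd", "ab[cd]+", True),
    ("abcd", "[^a-d]+", False),
]


def whole_string_match(pattern: str, value: str) -> bool:
    """Return True only if the pattern matches the entire value."""
    return re.fullmatch(pattern, value) is not None


# Collect any row where the Python result differs from the documented one.
mismatches = [
    (value, pattern)
    for value, pattern, expected in CASES
    if whole_string_match(pattern, value) != expected
]
print("mismatches:", mismatches)
```
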
Optional Feature for Solr RegEx

Example

For complement: Use tilde(~) to negate the shortest pattern next to it.

For instance, "ab~cd" means:

Starts with a

Followed by b

Followed by a string of any length that is anything but c

Ends with d

For the string "abcdef":
  • ab~df # match
  • ab~cf # match
  • ab~cdef # no match
  • a~(cb)def # match
  • a~(bc)def # no match

For interval: Use angle brackets (<>) to specify a numeric range.

For the string "solr90":
  • solr<1-100> # match
  • solr<01-100> # match
  • solr<001-100> # no match

For any string: Use @ to match any string in its entirety. This can be combined with the intersection operator (&) and the complement operator (~) to express "everything except".

For example: @&~(solr.+) # anything except string beginning with "solr"
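
Python's re module has no @, & or ~ operators, so the meaning of this expression can only be approximated. The sketch below assumes that "anything except a string matching solr.+" is equivalent to negating a whole-string match; the function name is hypothetical.

```python
import re

# solr.+ means "solr" followed by at least one more character.
SOLR_PREFIX = re.compile(r"solr.+", flags=re.DOTALL)


def anything_except_solr_prefix(value: str) -> bool:
    """Approximate @&~(solr.+): accept every string the inner pattern rejects."""
    return SOLR_PREFIX.fullmatch(value) is None


for text in ("solr90", "apache", "solr"):
    print(repr(text), anything_except_solr_prefix(text))
```

Under this reading, the bare string "solr" is accepted, because solr.+ requires at least one character after the prefix.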

 

Apache Solr is an open source product, and the following notes are for guidance only. For the full set of RegEx features that are compatible with Apache Solr, and for additional help and support, please see: