Regex for well-structured logs
Well-structured logs are a style of event formatting in which each event is composed of a set of properties that are presented in the following way:
<name_of_property_1><assignment_character><value_of_property_1><delimiter_character>
<name_of_property_2><assignment_character><value_of_property_2><delimiter_character>
<name_of_property_3><assignment_character><value_of_property_3><delimiter_character>...
Use the following general guidelines:
- The <assignment_character> is either '=' or ':' or a multi-character sequence such as '->'.
- The <delimiter_character> is either a white space character (space or tab) or a list delimiter, such as a comma or semicolon.
- The <value_of_property> and sometimes <name_of_property> are encapsulated in quotation marks or other wrapping characters.
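As an illustration of these guidelines, the following Python sketch parses name/value pairs generically. It is a minimal example under stated assumptions, not a product-supplied parser: the pattern name `PAIR`, the function `parse_pairs`, and the sample string are hypothetical, and only double quotation marks are handled as wrapping characters.

```python
import re

# Hypothetical pattern built from the guidelines above (not from any product).
PAIR = re.compile(
    r'''
    (?P<name>"[^"]*"|[A-Za-z_]\w*)      # property name, optionally in double quotes
    \s*(?:->|=|:)\s*                    # assignment: '->', '=' or ':'
    (?P<value>"[^"]*"|[^\s,;]*)         # quoted value, or text up to a delimiter
    ''',
    re.VERBOSE,
)


def strip_quotes(text):
    """Remove one pair of surrounding double quotation marks, if present."""
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        return text[1:-1]
    return text


def parse_pairs(event_body):
    """Return a dict of property names to values found in event_body."""
    return {
        strip_quotes(m.group("name")): strip_quotes(m.group("value"))
        for m in PAIR.finditer(event_body)
    }


# Assumed sample that mixes assignment and delimiter characters.
sample = 'action=login, user:"John Doe"; src->192.0.2.24'
pairs = parse_pairs(sample)
```

A generic pattern like this is convenient for exploration, but unquoted values that contain spaces or colons (such as timestamps) need a property-specific regex.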
For example, consider a simple login event that is generated by a device or an application. The
device might report on the account of a user who logged in, the time the login occurred, and the IP
address of the computer from which the user logged in. A name/value pair-style event might look like
this snippet:
<13>Sep 09 22:40:40 192.0.2.12 action=login accountname=JohnDoe clientIP=192.0.2.24 timestamp=01/09/2016 22:40:39 UTC
Note: The string "<13>Sep 09 22:40:40 192.0.2.12" is a syslog header. The string is not part
of the event body.
The following table shows how the properties of the well-structured log example above can be captured:
Property | Regex |
---|---|
action | action=(.*?)\t |
accountname | accountname=(.*?)\t |
clientIP | clientIP=(.*?)\t |
timestamp | timestamp=(.*?)\t |
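The pattern that is enclosed in parentheses in each regex is the capture group. Each regex in the table captures everything after the equal sign (=) and before the next tab character, so these patterns assume that the properties in the event are separated by tabs. A property at the very end of the event, with no tab after it, is not matched by these patterns.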
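The following Python sketch applies regexes of this shape to the example event body. It assumes that the properties are tab-delimited and that the syslog header has already been removed. The names `EVENT_BODY`, `PROPERTY_NAMES`, and `extract_properties` are hypothetical. Because the timestamp is the last property and has no tab after it, the sketch extends each pattern with an end-of-string alternative, which is an adaptation rather than the exact regex from the table.

```python
import re

# Assumption: the event body uses tab characters between properties.
EVENT_BODY = (
    "action=login\taccountname=JohnDoe\tclientIP=192.0.2.24\t"
    "timestamp=01/09/2016 22:40:39 UTC"
)

PROPERTY_NAMES = ["action", "accountname", "clientIP", "timestamp"]


def extract_properties(event_body, names=PROPERTY_NAMES):
    """Return a dict with the captured value of each property that is found."""
    extracted = {}
    for name in names:
        # Same lazy capture group as in the table, ended by a tab
        # or, for the last property, by the end of the string.
        pattern = re.escape(name) + r"=(.*?)(?:\t|$)"
        match = re.search(pattern, event_body)
        if match:
            extracted[name] = match.group(1)
    return extracted


properties = extract_properties(EVENT_BODY)
for name, value in properties.items():
    print(f"{name}: {value}")
```

The lazy quantifier `.*?` stops at the first tab, so a value that contains spaces, such as the timestamp, is still captured whole.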