Regex for well-structured logs

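Well-structured logs are a style of event formatting that is composed of a set of properties, presented in the following way:

<name_of_property><assignment_character><value_of_property><delimiter_character><name_of_property><assignment_character><value_of_property><delimiter_character>...
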
Use the following general guidelines:

  • The <assignment_character> is either '=' or ':', or a multi-character sequence such as '->'.
  • The <delimiter_character> is either a whitespace character (space or tab) or a list delimiter, such as a comma or semicolon.
  • The <value_of_property>, and sometimes the <name_of_property>, is enclosed in quotation marks or other wrapping characters.
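
The template and guidelines above can be combined into one pattern that finds every name/value pair in an event body. The following is a minimal Python sketch, not a product feature: the function name parse_pairs and its parameters are hypothetical, property names are assumed to consist of word characters (letters, digits, underscore), and only double quotes are treated as wrapping characters.

```python
import re
from typing import Dict


def parse_pairs(event: str, assign: str = "=", delim: str = r"[ \t]+") -> Dict[str, str]:
    """Return the name/value pairs of a well-structured event body.

    assign -- the literal assignment sequence, for example "=", ":" or "->".
    delim  -- a regex for the delimiter between pairs (default: spaces or tabs).
    A value wrapped in double quotes may contain the delimiter; the quotes
    are not included in the returned value.
    """
    pattern = re.compile(
        r"(?P<name>\w+)"               # property name: word characters only (assumption)
        + re.escape(assign)            # assignment sequence, matched literally
        + r'(?:"(?P<quoted>[^"]*)"'    # either a double-quoted value ...
        + r"|(?P<bare>.*?))"           # ... or the shortest unquoted run of characters
        + r"(?=(?:" + delim + r")|$)"  # that is followed by a delimiter or the end
    )
    pairs: Dict[str, str] = {}
    for m in pattern.finditer(event):
        value = m.group("quoted") if m.group("quoted") is not None else m.group("bare")
        pairs[m.group("name")] = value
    return pairs


# Hypothetical inputs covering the three assignment/delimiter styles above.
print(parse_pairs("action=login\taccountname=JohnDoe", delim=r"\t"))
print(parse_pairs('user="John Doe" action=login'))
print(parse_pairs("src->192.0.2.1; dst->192.0.2.2", assign="->", delim=r";\s*"))
```

The lazy `.*?` makes an unquoted value stop at the first delimiter, so when values can contain spaces (such as a timestamp), a space-only delimiter pattern is too broad and a tab or list delimiter should be passed instead.
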

For example, consider a simple login event that is generated by a device or an application. The device might report the account of the user who logged in, the time of the login, and the IP address of the computer from which the user logged in. A name/value pair-style event might look like this snippet:
<13>Sep 09 22:40:40 action=login  accountname=JohnDoe  clientIP=  timestamp=01/09/2016 22:40:39 UTC
Note: The string "<13>Sep 09 22:40:40" is a syslog header and is not part of the event body.
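
Because the syslog header is not part of the event body, it is usually removed before the properties are matched. The following minimal Python sketch assumes that the header has exactly the shape shown in the example (priority in angle brackets, month, day, and time, with no hostname field); SYSLOG_HEADER and event_body are hypothetical names.

```python
import re

# Assumption: the header looks like the one in the example above:
# "<PRI>Mmm dd hh:mm:ss " (priority, month, day, time) with no hostname.
# "[ \d]\d" also accepts a space-padded day such as "Sep  9".
SYSLOG_HEADER = re.compile(r"^<\d{1,3}>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} ")


def event_body(raw: str) -> str:
    """Return the event with a leading syslog header removed, if one is present."""
    return SYSLOG_HEADER.sub("", raw, count=1)


raw = "<13>Sep 09 22:40:40 action=login\taccountname=JohnDoe"
print(event_body(raw))
```

Removing the header first also keeps a pattern that uses ':' as the assignment character from matching the hour and minute fields of the header time.
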

The following table shows how the properties of the well-structured log example above can be captured:

Table 1. Regex for capturing properties of a well-structured log
Property      Regex
action        action=(.*?)\t
accountname   accountname=(.*?)\t
clientIP      clientIP=(.*?)\t
timestamp     timestamp=(.*?)\t

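The pattern enclosed in parentheses, (.*?), is the capture group. Each regex in the table captures everything after the equal sign (=) and before the next tab character. The lazy quantifier *? makes the match stop at the first tab that follows, rather than at the last tab in the event.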
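
The Table 1 patterns can be tried out with Python's re module as a stand-in for the log source's own regex engine, which may differ in detail. In this minimal sketch the event body is hypothetical: it reuses the example's properties, separates them with tabs, and fills the clientIP value with 192.0.2.10, a placeholder from the RFC 5737 documentation range. The END_TOLERANT variant is an adjustment that is not part of the table.

```python
import re
from typing import Dict, Optional

# Patterns copied from Table 1: each captures everything after "<name>="
# up to (but not including) the next tab character.
TABLE_1 = {
    "action": r"action=(.*?)\t",
    "accountname": r"accountname=(.*?)\t",
    "clientIP": r"clientIP=(.*?)\t",
    "timestamp": r"timestamp=(.*?)\t",
}

# Variant (an assumption, not from Table 1): end each capture at a tab OR at
# the end of the body, so the last property in the event can also match.
END_TOLERANT = {prop: p[: -len(r"\t")] + r"(?:\t|$)" for prop, p in TABLE_1.items()}


def extract(body: str, patterns: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Apply each pattern to the event body; None means the pattern did not match."""
    result: Dict[str, Optional[str]] = {}
    for prop, pattern in patterns.items():
        match = re.search(pattern, body)
        result[prop] = match.group(1) if match else None
    return result


# Hypothetical tab-delimited event body (syslog header already removed).
# 192.0.2.10 is a placeholder address, not a value from the example above.
body = "action=login\taccountname=JohnDoe\tclientIP=192.0.2.10\ttimestamp=01/09/2016 22:40:39 UTC"

print(extract(body, TABLE_1))
print(extract(body, END_TOLERANT))
```

With the Table 1 patterns, timestamp is None for this body because the last property is not followed by a tab; the variant that also accepts the end of the body captures "01/09/2016 22:40:39 UTC". If the events that a log source sends always end with a delimiter, the table's patterns work unchanged.
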