Example script for email parsing

The SOAR Platform contains a Python 3 script called Sample script: process inbound email (v49.0). The script uses the following process to parse email message objects:

If an incident already exists whose title reflects the email message that was received, the script associates the email message with that incident.
If the script does not find an existing incident, it takes the following actions.
- Creates an incident with a suitable title.
- Associates the email message with the new incident.
- Adds the email message's subject as an artifact to the new incident.
- Sets the incident's reporter field to be the email address that sent the message.
Parses the email body text for URLs, IP addresses, and file hashes. After the script filters out invalid and allowlist values, it adds the remaining data to the incident as artifacts.
Adds non-inline email message attachments to the incident.
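
The body-parsing step can be pictured with a minimal standalone sketch. It is not the sample script's code: the patterns, the is_valid_ipv4 helper, and the extract_indicators function are hypothetical simplifications that cover only URLs, IPv4 addresses, and MD5, SHA-1, and SHA-256 style hashes, and allowlist filtering is omitted.

```python
import re

# Hypothetical, simplified patterns for illustration only.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
# 64, 40, or 32 hexadecimal characters (SHA-256, SHA-1, MD5 lengths).
HASH_PATTERN = re.compile(r"\b(?:[A-Fa-f0-9]{64}|[A-Fa-f0-9]{40}|[A-Fa-f0-9]{32})\b")


def is_valid_ipv4(candidate):
    """Reject dotted quads that contain an octet above 255."""
    return all(int(octet) <= 255 for octet in candidate.split("."))


def extract_indicators(body_text):
    """Return de-duplicated URLs, IPv4 addresses, and hashes found in body_text."""
    return {
        "urls": sorted(set(URL_PATTERN.findall(body_text))),
        "ips": sorted(ip for ip in set(IPV4_PATTERN.findall(body_text))
                      if is_valid_ipv4(ip)),
        "hashes": sorted(set(HASH_PATTERN.findall(body_text))),
    }
```

In this sketch, each returned value would correspond to one candidate artifact; the invalid address 999.1.1.1, for example, is dropped before anything is added.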

To run the script, you must have a mailbox connection to retrieve email messages from an email server, and the rule that calls it must have Email Message as the Object Type.

Configuring the script

To configure the SOAR Platform Sample script: process inbound email (v49.0) script, you need to add an incident owner. You can also define one or more allowlists.

New incidents need an owner, which is either an individual identified by email address or a group identified by name. In the provided script, the value is left blank. To edit the script to add a user as the owner, locate and edit line 8 of the script. For example, add L1@businessname.com as follows:

# The new incident owner - email address of a user or name of a group and cannot be blank.
# Change this value to reflect who will be the owner of the incident before running the script.
newIncidentOwner = "L1@businessname.com"

An allowlist is a list of trustworthy data items that would not normally become suspicious artifacts; for example, your own email server's IP address. The two categories of allowlist that are used in the script are IP address and URL domain, as shown in the following table. These allowlists are configured by altering data in the script.

Variable Name	Line Number	Purpose
`ipV4AllowList`	12	IP v4 allowlist
`ipV6AllowList`	31	IP v6 allowlist
`domainAllowList`	52	URL domain allowlist

Initially, the allowlists are composed of commented-out entries that serve as examples of the data you might choose to exclude from consideration. The allowlists have no effect unless you uncomment the entries so that they form a syntactically valid Python list, or add entries of your own.

The IP address allowlists are divided into separate IPv4 and IPv6 lists. These lists apply to the IP addresses retrieved by pattern matching in the body of the email message. If an IP address appears on an allowlist, then it is not added as an artifact to the incident.

The two categories of IP allowlist entry are CIDR (Classless Inter-Domain Routing) and IPRange. For example, in IPv4, IBM® owns the 9.0.0.0/8 class A network. You can also allowlist an IP range, such as 12.0.0.1 - 12.5.5.5. To add these criteria to the allowlist, add the following to ipV4AllowList.

  "9.0.0.0/8",
  "12.0.0.1-12.5.5.5"

If you choose to allowlist an explicit IP address, such as 13.13.13.13, specify it as follows.

"13.13.13.13"

IPv6 allowlists operate the same way. For example, you can add "aaaa::/16" to allowlist an IPv6 CIDR. The following example shows how to add these changes to the IPv4 and IPv6 allowlists.

 # Allowlist for IP V4 addresses 
 ipV4AllowList = AllowList([
   "9.0.0.0/8",
   "12.0.0.1-12.5.5.5",
   "13.13.13.13"
 ])

 # Allowlist for IP V6 addresses
 ipV6AllowList = AllowList([
   "aaaa::/16"
 ])
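
To show how the three entry forms (CIDR, range, and explicit address) might be evaluated, here is a minimal standalone sketch that uses the Python standard library ipaddress module. The ip_is_allowlisted function is hypothetical and is not the AllowList class from the sample script; whether the ipaddress module is available inside the platform's script environment is not confirmed here.

```python
import ipaddress


def ip_is_allowlisted(candidate, entries):
    """Return True if candidate matches a CIDR, range, or explicit entry."""
    address = ipaddress.ip_address(candidate)
    for entry in entries:
        if "/" in entry:
            # CIDR entry such as "9.0.0.0/8" or "aaaa::/16".
            if address in ipaddress.ip_network(entry, strict=False):
                return True
        elif "-" in entry:
            # Range entry such as "12.0.0.1-12.5.5.5".
            low, high = (ipaddress.ip_address(part.strip())
                         for part in entry.split("-"))
            # Only compare addresses of the same IP version.
            if low.version == address.version and low <= address <= high:
                return True
        elif address == ipaddress.ip_address(entry):
            # Explicit address such as "13.13.13.13".
            return True
    return False


ipv4_entries = ["9.0.0.0/8", "12.0.0.1-12.5.5.5", "13.13.13.13"]
ipv6_entries = ["aaaa::/16"]
```

With these entries, the sketch treats 9.1.2.3, 12.3.0.0, and 13.13.13.13 as allowlisted, and 12.6.0.0 and 8.8.8.8 as candidates for artifacts.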

The domain allowlist applies to URLs found in the body of the email. If an allowlisted domain is discovered in a potential URL artifact, it is not added to the incident. Domains can be added explicitly, such as mail.businessname.com, or by using a wildcard, such as *.otherbusinessname.com. First, locate the following lines.

# Domain allowlist
domainAllowList = AllowList([
  #"*.ibm.com"
])

Change the lines as follows.

# Domain allowlist
domainAllowList = AllowList([
  "mail.businessname.com",
  "*.otherbusinessname.com"
])
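
The effect of explicit and wildcard domain entries can be sketched as follows. This is an illustration under assumptions, not the sample script's matching logic: url_is_allowlisted is a hypothetical function, and it assumes that a wildcard entry such as *.otherbusinessname.com matches subdomains only, not the bare domain.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def url_is_allowlisted(url, domain_entries):
    """Return True if the URL's host name matches any allowlist entry."""
    hostname = (urlparse(url).hostname or "").lower()
    # fnmatchcase treats "*" as "any run of characters".
    return any(fnmatchcase(hostname, entry.lower()) for entry in domain_entries)


domain_entries = ["mail.businessname.com", "*.otherbusinessname.com"]
```

Under these assumptions, https://mail.businessname.com/inbox and http://www.otherbusinessname.com/ would be skipped, while http://evil-otherbusinessname.com/ would still be considered for an artifact.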

Extension and customization of the sample email processing script

To extend and customize the email processing script, you can choose to run multiple scripts for the same email or to modify the sample script that is provided with the SOAR Platform. Adding more scripts is generally a better idea than adding more complexity to one script.

The SOAR Platform might be expected to receive multiple categories of email messages from different apps. Some of the processing of the email messages might be common, and some processing might be category or app specific. Keeping the common processing in one script and the specialized processing in others allows a cleaner and more maintainable implementation.

Each script execution runs within defined computational quota limits: 5 seconds of execution time or 50,000 lines of Python executed. Regular expression processing is performed by the Python "re" module, and its execution counts toward the quota. It is possible to create a complex regular expression whose execution requires a great many lines of Python to be interpreted on a particular email message. Running many such complex regular expressions might exceed the 50,000-line limit.
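
One way to keep regular expression work predictable is to bound the input and the number of matches. The following is a minimal sketch under assumptions: the cap values, SENDER_PATTERN, and bounded_findall are hypothetical, and how the platform counts regular expression work toward the quota is not described beyond the limits stated above.

```python
import re
from itertools import islice

MAX_BODY_CHARS = 20000  # hypothetical cap on how much text is scanned
MAX_MATCHES = 50        # hypothetical cap on how many matches are kept

# A bounded, non-nested quantifier avoids patterns such as (a+)+ that can
# backtrack heavily on some inputs.
SENDER_PATTERN = re.compile(r"^From: ([^\n]{1,320})$", re.MULTILINE)


def bounded_findall(pattern, text, max_chars=MAX_BODY_CHARS, max_matches=MAX_MATCHES):
    """Return at most max_matches captures from the first max_chars of text."""
    window = text[:max_chars]
    return [match.group(1) for match in islice(pattern.finditer(window), max_matches)]
```

Truncating the text can cut a match short at the boundary, so a cap like this trades completeness for predictable cost.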

Example: Dealing with phishing reports

Scenario: Emails arriving in a particular mailbox reflect individuals who are forwarding suspected phishing messages. In addition to the common processing in the sample script, have the scripts for these email messages record the reporter's email address as possibly the target of a phishing attack. Also, record the sender of the forwarded phishing email as suspicious.

A solution: Add the following Python 3 script to the SOAR Platform.

import re

def addArtifact(regex, artifactType, description):
    """Add artifacts to the incident for each match of the regular expression
    within the email body contents.
    Parameter "regex" - a regular expression to match against the email body contents.
    Parameter "artifactType" - the type of the artifact(s).
    Parameter "description" - the description of the artifact(s).
    """
    # Using a set to enforce uniqueness
    dataList = set(re.findall(regex, emailmessage.body.content))
    # Use an explicit loop: in Python 3, map() is lazy and would never call addArtifact
    for theArtifact in dataList:
        incident.addArtifact(artifactType, theArtifact, description)

###
# Mainline starts here
###
# Add "Phishing" as an incident type for the associated incident
incident.incident_type_ids.append("Phishing")
# Add the email sender information to the incident as the recipient of the phishing attempt
reportingUserInfo = emailmessage.sender.address
if emailmessage.sender.name is not None:
    reportingUserInfo = "{0} <{1}>".format(emailmessage.sender.name, emailmessage.sender.address)
# Add the artifact whether or not the sender has a display name
incident.addArtifact("Email Recipient", reportingUserInfo,
                     "Recipient of suspicious email")
# Extract email sender information on the assumption that a phishing email is being forwarded
if emailmessage.body.content is not None:
    addArtifact(r"From: (.*)\n", "Email Sender", "Suspicious email sender")
    addArtifact(r"Reply-To: (.*)\n", "Email Sender",
                "Suspicious email sender (Reply-To)")

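To see what the From: and Reply-To: patterns capture, the following standalone sketch applies them to a made-up forwarded message. SAMPLE_BODY and unique_matches are hypothetical and exist only for this illustration; no platform objects are involved.

```python
import re

# A made-up forwarded message body, for illustration only.
SAMPLE_BODY = (
    "Please check this message.\n"
    "---------- Forwarded message ----------\n"
    "From: Attacker <attacker@example.net>\n"
    "Reply-To: collector@example.org\n"
    "Subject: Your account is locked\n"
)


def unique_matches(regex, text):
    """Return the sorted, de-duplicated captures of regex in text."""
    return sorted(set(re.findall(regex, text)))


senders = unique_matches(r"From: (.*)\n", SAMPLE_BODY)
reply_to = unique_matches(r"Reply-To: (.*)\n", SAMPLE_BODY)
# Because "." also matches "\r", a body with Windows line endings leaves a
# trailing carriage return in the capture.
with_crlf = unique_matches(r"From: (.*)\n", "From: x@example.com\r\n")
```

Here senders holds "Attacker <attacker@example.net>" and reply_to holds "collector@example.org". The with_crlf result ends in a carriage return, so stripping captured values before adding them as artifacts might be worthwhile.
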
It is important to run the phishing-specific script after the common script because the common script sets the incident variable, and the phishing-specific script expects it to be set. You can use one of the following methods.

The phishing-specific script runs as the second script of a multi-script rule that first runs the standard script.
The phishing-specific script runs in a separate rule that runs afterward.

In either case, the phishing-specific script runs only on the condition that the email message that was created indicates that it is a phishing report.

If you implement the solution in one script, add the Phishing incident type to the incident at a point in the script when the incident object exists. For example, the following command occurs after the incident is created or found.

incident.incident_type_ids.append("Phishing")

Example: Campaign identifier

Scenario: The email message subject alone might not be enough to collect related emails into one incident.

The email message subject might not be specific or reliable enough to use as the way to collect related emails. In particular, a single campaign that employs multiple attack vectors might cause the system to receive different kinds of email messages for that campaign.

One solution to the problem is to create a new field in an incident to contain a campaign identifier. This identifier might be derived from the content of the email messages or chosen from a hardcoded list when the campaign is recognized by the parsing script.

A solution:

Create an incident custom field for the campaign identifier, which must be of type Text.
Copy the sample parsing script into a new script.
Modify the new script to create a value for the campaign identifier. You can do this by selecting some text from the content of the email messages or by selecting from a hardcoded list of campaign identifiers if certain criteria are met.
To associate the email message with an existing incident, search for incidents whose campaign identifier field is the same as the campaign identifier value for the email message. This search replaces the search based on email message subject.
If no suitable incident is found, create a new incident and set its campaign identifier field to be the campaign identifier value.
Modify the rules so that the new script runs instead of the sample script.
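
The derivation and lookup steps above can be sketched in standalone Python. Everything named here is an assumption made for illustration: the [CID:...] tag format, the KNOWN_CAMPAIGNS list, and both functions are hypothetical, and the in-memory dictionary stands in for the platform's incident search and creation, which this sketch does not call.

```python
import re

# Hypothetical hardcoded campaigns: identifier -> keywords that indicate it.
KNOWN_CAMPAIGNS = {
    "CAMPAIGN-INVOICE": ("overdue invoice", "payment reminder"),
    "CAMPAIGN-PAYROLL": ("payroll update",),
}
# Hypothetical tag embedded in message text, for example "[CID:ABC-123]".
TAG_PATTERN = re.compile(r"\[CID:([A-Z0-9-]{1,32})\]")


def derive_campaign_id(subject, body):
    """Return a campaign identifier from message text, or None if unknown."""
    text = "{0}\n{1}".format(subject or "", body or "")
    tag = TAG_PATTERN.search(text)
    if tag:
        return tag.group(1)
    lowered = text.lower()
    for campaign_id, keywords in KNOWN_CAMPAIGNS.items():
        if any(keyword in lowered for keyword in keywords):
            return campaign_id
    return None


def find_or_create_incident(incidents_by_campaign, campaign_id):
    """Stand-in for searching incidents by campaign field, then creating one."""
    if campaign_id not in incidents_by_campaign:
        incidents_by_campaign[campaign_id] = {"campaign_id": campaign_id, "emails": 0}
    incident = incidents_by_campaign[campaign_id]
    incident["emails"] += 1
    return incident
```

In a real script, the dictionary lookup would be replaced by a query on the custom campaign identifier field, and messages for which no identifier can be derived would need a fallback, such as the original subject-based search.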