Amazon AWS S3 REST API protocol configuration options

The Amazon AWS S3 REST API protocol is an active outbound protocol that collects AWS CloudTrail logs from Amazon S3 buckets.

Note: When you collect logs from Amazon S3 for use with a custom DSM or another unsupported integration, make sure that no data is missed. Because of the way that the S3 APIs return data, the full paths of all files must sort in alphabetically increasing order. Make sure that the full path name includes a full date and time in ISO 8601 format, with leading zeros in all fields and a YYYY-MM-DD date format.

Consider the file path in the Marker element of the following S3 list response fragment:

<Name>test-bucket</Name><Prefix>MyLogs/</Prefix><Marker>MyLogs/2018-8-9/2018-08-09T23-59-25.955097.log.gz</Marker><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated></ListBucketResult>

The full name of the file in the marker is MyLogs/2018-8-9/2018-08-09T23-59-25.955097.log.gz, and the folder name is written as 2018-8-9 instead of 2018-08-09. This date format causes an issue when data for 10 August 2018 arrives. The folder for that date is named 2018-8-10, which sorts before 2018-8-9, so the files are not sorted chronologically:

2018-10-1

2018-11-1

2018-12-31

2018-8-10

2018-8-9

2018-9-1

After data for 9 August 2018 comes in to IBM® QRadar®, you won't see data again until 1 September 2018 because leading zeros were not used in the date format. After September, you won't see data again until 2019. Use leading zeros in the date (ISO 8601) so that this issue does not occur.
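The ordering problem can be reproduced with plain string sorting. The following Python sketch is illustrative only. It assumes that keys are listed in ascending string order and that the next poll returns only names that sort after the marker. The helper name pad_date_folder is hypothetical and is not part of QRadar or of any AWS SDK.

```python
def pad_date_folder(name: str) -> str:
    """Rewrite a YYYY-M-D folder name with leading zeros (YYYY-MM-DD)."""
    year, month, day = (int(part) for part in name.split("-"))
    return f"{year:04d}-{month:02d}-{day:02d}"


# Folder names without leading zeros, in the order the data was written.
unpadded = ["2018-8-9", "2018-8-10", "2018-9-1", "2018-10-1", "2018-11-1", "2018-12-31"]

# Plain string order, which reproduces the unsorted listing shown above.
unpadded_sorted = sorted(unpadded)

# Assumption: only names that sort after the marker are returned on the next poll.
marker = "2018-8-9"
after_marker = [name for name in unpadded_sorted if name > marker]

# With leading zeros, string order and chronological order agree.
padded_sorted = sorted(pad_date_folder(name) for name in unpadded)

print(unpadded_sorted)
print(after_marker)
print(padded_sorted)
```

Under these assumptions, after_marker contains only 2018-9-1, which corresponds to the gap in collected data that is described above.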

By using leading zeros, files and folders are sorted chronologically:

2018-08-09

2018-08-10

2018-09-01

2018-10-01

2018-11-01

2018-12-01

2018-12-31

Restriction:

A log source can retrieve data from only one region, so use a different log source for each region. Include the region folder name in the file path for the Directory Prefix value when you use the Directory Prefix event collection method to configure the log source.

The following table describes the common parameter values to collect audit events by using the Directory Prefix collection method or the SQS event collection method. These collection methods use the Amazon AWS S3 REST API protocol.
Table 1. Amazon AWS S3 REST API protocol common log source parameters with the Directory Prefix method or the SQS method
Parameter Description
Protocol Configuration

Amazon AWS S3 REST API

Log Source Identifier

Type a unique name for the log source.

The Log Source Identifier can be any valid value and does not need to reference a specific server. The Log Source Identifier can be the same value as the Log Source Name. If you have more than one configured log source per DSM, ensure that you give each one a unique name.

Authentication Method
Access Key ID / Secret Key
Standard authentication that can be used from anywhere.
For more information about configuring security credentials, see Configuring security credentials for your AWS user account.
EC2 Instance IAM Role
If your managed host is running on an AWS EC2 instance, choosing this option uses the IAM Role from the instance metadata that is assigned to the instance for authentication. No keys are required. This method works only for managed hosts that are running within an AWS EC2 container.
Access Key

The Access Key ID that was generated when you configured the security credentials for your AWS user account.

If you selected Access Key ID / Secret Key or Assume IAM Role, the Access Key parameter is displayed.

Secret Key

The Secret Key that was generated when you configured the security credentials for your AWS user account.

If you selected Access Key ID / Secret Key or Assume IAM Role, the Secret Key parameter is displayed.

Assume an IAM Role

Enable this option by authenticating with an Access Key or EC2 instance IAM Role. Then, you can temporarily assume an IAM Role for access.

Assume Role ARN

The full ARN of the role to assume. It must begin with arn: and can't contain any leading or trailing spaces, or spaces within the ARN.

If you enabled Assume an IAM Role, the Assume Role ARN parameter is displayed.

Assume Role Session Name

The session name of the role to assume. The default is QRadarAWSSession. Leave the default unless you need to change it. This parameter can contain only uppercase and lowercase alphanumeric characters, underscores, or any of the following characters: =,.@-

If you enabled Assume an IAM Role, the Assume Role Session Name parameter is displayed.

Assume Role External ID

An identifier that you might need to assume a role in another account. You get this value from the administrator of the account that the role belongs to.

This value can be any string, such as a passphrase, GUID, or account number.

For more information, see How to use an external ID when granting access to your AWS resources to a third party (https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html).

If you enabled Assume an IAM Role, the Assume Role External ID parameter is displayed.
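The format rules for Assume Role ARN and Assume Role Session Name can be checked before you type the values. The following Python sketch covers only the rules that are stated above. The function names and the sample ARN are hypothetical, and the checks that QRadar and AWS actually apply might be stricter (for example, length limits are not covered).

```python
import re

# Characters that the description allows in a session name:
# letters, digits, underscores, and = , . @ -
SESSION_NAME_RE = re.compile(r"[A-Za-z0-9_=,.@-]+")


def is_valid_role_arn(arn: str) -> bool:
    """True if the value begins with 'arn:' and contains no whitespace."""
    return arn.startswith("arn:") and not any(ch.isspace() for ch in arn)


def is_valid_session_name(name: str) -> bool:
    """True if every character is in the allowed set."""
    return SESSION_NAME_RE.fullmatch(name) is not None


# Hypothetical values, for illustration only.
print(is_valid_role_arn("arn:aws:iam::123456789012:role/ExampleRole"))
print(is_valid_role_arn(" arn:aws:iam::123456789012:role/ExampleRole"))
print(is_valid_session_name("QRadarAWSSession"))
print(is_valid_session_name("QRadar AWS Session"))
```

A value that is pasted with a leading space fails the first check because it no longer begins with arn:.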

S3 Collection Method
SQS Event Notifications
Poll an SQS Queue that contains ObjectCreated notifications that are configured on the S3 folders of your choice. Then, download and process the files that are referenced in the notification from the S3 bucket. You can use either a single queue or separate queues to cover multiple buckets and accounts. This decision depends on your configuration.

For more information, see Table 3.

Use a Specific Prefix
Logs within the specified folder are processed. For CloudTrail log sources, the prefix must be for a single account or region only. For other log sources, the prefix must point to a single directory that contains files with ISO 8601 timestamps at the beginning of each file name. The timestamp in each file name ensures that the protocol collects new events with this method.

For more information, see Table 2.

Region Name

The region that the SQS Queue or the AWS S3 bucket is in.

Example: us-east-1, eu-west-1, ap-northeast-3

Event Format

Choose the format of the events that are contained in the files.
AWS CloudTrail JSON
Files that contain JSON-formatted events for AWS CloudTrail (.json.gz files only).
LINEBYLINE
Raw log files that contain one record per line. Compression with gzip (.gz or .gzip) and zip (.zip) is supported.
AWS VPC Flow Logs
Files that contain AWS native/OCSF VPC Flow logs (.txt.gz or .gz.parquet files only). This option sends flows to the Network Activity tab in QRadar. The configured log source doesn't show a Last Event Seen Time because the output is flow data.
AWS Network Firewall Logs
Files that contain AWS Network Firewall Alert or Flow logs. This option sends flow logs to the Network Activity tab and sends alert logs as events to the Log Activity tab in QRadar. The Amazon AWS Network Firewall DSM parses the logs.
Tip: If your system is not licensed for flows, use the LINEBYLINE processor so that the DSM can parse the AWS Network Firewall logs.
W3C
For use with the Cisco Cloud Web Services DSM (.gz files only).
Cisco Umbrella CSV
For use with the Cisco Umbrella DSM (.gz files only).
Apache Parquet
Choose this option to convert Apache Parquet files into JSON events (.gz.parquet and .parquet files only).
Flow Destination Hostname

The flow processor hostname where the VPC Flow or AWS Network Firewall flow logs are sent.

If you select AWS VPC Flow Logs or AWS Network Firewall Logs in the Event Format parameter, you can configure this parameter.

Flow Destination Port

The flow processor port where the VPC Flow or AWS Network Firewall flow logs are sent.

If you select AWS VPC Flow Logs or AWS Network Firewall Logs in the Event Format parameter, you can configure this parameter.

Use As A Gateway Log Source

Select this option for the collected events to flow through the QRadar Traffic Analysis engine and for QRadar to automatically detect one or more log sources.

When you select this option, the Log Source Identifier Pattern can optionally be used to define a custom Log Source Identifier for events that are being processed.

Log Source Identifier Pattern

If you selected Use As A Gateway Log Source, you can define a custom log source identifier for events that are being processed and for log sources to be automatically discovered when applicable. If you don't configure the Log Source Identifier Pattern, QRadar receives events as unknown generic log sources.

Use key-value pairs to define the custom Log Source Identifier. The key is the Identifier Format String, which is the resulting source or origin value. The value is the associated regex pattern that is used to evaluate the current payload. This value also supports capture groups that can be used to further customize the key.

Define multiple key-value pairs by typing each pattern on a new line. Multiple patterns are evaluated in the order that they are listed. When a match is found, a custom Log Source Identifier is displayed.

The following example shows how multiple key-value pairs are evaluated against an event.
Patterns
VPC=\sREJECT\sFAILURE
$1=\s(REJECT)\sOK
VPC-$1-$2=\s(ACCEPT)\s(OK)
Events
{LogStreamName: LogStreamTest,Timestamp: 0,Message: ACCEPT OK,IngestionTime: 0,EventId: 0}
Resulting custom log source identifier
VPC-ACCEPT-OK
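The evaluation order in this example can be traced with a short Python sketch. It is an approximation of the behavior that is described above, not QRadar's implementation: it assumes that each line is split at the first equal sign, that the regex can match anywhere in the payload, and that $1, $2, and so on in the key are replaced by the matching capture groups. The function name resolve_identifier is hypothetical, and Python regex syntax can differ in detail from the regex engine that QRadar uses.

```python
import re
from typing import Optional

PATTERNS = r"""
VPC=\sREJECT\sFAILURE
$1=\s(REJECT)\sOK
VPC-$1-$2=\s(ACCEPT)\s(OK)
"""

EVENT = ("{LogStreamName: LogStreamTest,Timestamp: 0,"
         "Message: ACCEPT OK,IngestionTime: 0,EventId: 0}")


def resolve_identifier(patterns: str, payload: str) -> Optional[str]:
    """Return the identifier from the first pattern line whose regex matches."""
    for line in patterns.splitlines():
        line = line.strip()
        if not line:
            continue
        # Assumption: the key ends at the first '=' on the line.
        key, _, regex = line.partition("=")
        match = re.search(regex, payload)
        if match is None:
            continue  # Try the next pattern in the listed order.
        # Replace $1, $2, ... in the key with the captured text.
        return re.sub(r"\$(\d+)",
                      lambda ref: match.group(int(ref.group(1))) or "",
                      key)
    return None  # No pattern matched.


print(resolve_identifier(PATTERNS, EVENT))
```

The first two patterns do not match the sample event, so the third pattern supplies the identifier. An event whose message is REJECT OK would instead stop at the second pattern.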
Use Predictive Parsing

If you enable this parameter, an algorithm extracts log source identifier patterns from events without running the regex for every event, which increases the parsing speed.
Tip: In rare circumstances, the algorithm can make incorrect predictions. Enable predictive parsing only for log source types that you expect to receive high event rates and require faster parsing.

Show Advanced Options

Select this option if you want to customize the event data.

File Pattern

This option is available when you set Show Advanced Options to Yes.

Type a regex for the file pattern that matches the files that you want to pull; for example, .*?\.json\.gz.

Local Directory

This option is available when you set Show Advanced Options to Yes.

The local directory on the Target Event Collector. The directory must exist before the AWS S3 REST API protocol attempts to retrieve events.

S3 Endpoint URL

This option is available when you set Show Advanced Options to Yes.

The endpoint URL that is used to query the AWS S3 REST API.

If your endpoint URL is different from the default, type your endpoint URL. The default is https://s3.amazonaws.com.

Use S3 Path-Style Access

Forces S3 requests to use path-style access.

AWS has deprecated path-style access, but it might be required when you use other S3-compatible APIs. Path-style URLs have the form https://s3.region.amazonaws.com/bucket-name/key-name. Path-style access is used automatically when a bucket name contains a period (.), so this option is not required in that case, but it can still be enabled.
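The two addressing styles differ only in where the bucket name appears in the URL. The following Python sketch builds both forms for comparison. The path-style form follows the URL shown above. The virtual-hosted form (https://bucket-name.s3.region.amazonaws.com/key-name) is the standard AWS alternative and is not taken from this document. The function names and sample values are hypothetical.

```python
from urllib.parse import quote


def path_style_url(bucket: str, key: str, region: str) -> str:
    """Bucket name appears as the first path segment."""
    return f"https://s3.{region}.amazonaws.com/{bucket}/{quote(key)}"


def virtual_hosted_url(bucket: str, key: str, region: str) -> str:
    """Bucket name appears as part of the host name."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


def uses_path_style_automatically(bucket: str) -> bool:
    """Per the description above: bucket names that contain a period."""
    return "." in bucket


# Hypothetical bucket and key, for illustration only.
print(path_style_url("my.bucket", "MyLogs/file.log.gz", "us-east-1"))
print(virtual_hosted_url("test-bucket", "MyLogs/file.log.gz", "us-east-1"))
print(uses_path_style_automatically("my.bucket"))
```

A period in the bucket name becomes part of the host name in the virtual-hosted form, which is why such buckets are addressed with the path-style form.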

Use Proxy

If QRadar accesses the Amazon Web Service by using a proxy, enable Use Proxy.

If the proxy requires authentication, configure the Proxy Server, Proxy Port, Proxy Username, and Proxy Password fields.

If the proxy does not require authentication, configure the Proxy IP or Hostname field.

Recurrence

How often a poll is made to scan for new data.

If you are using the SQS event collection method (SQS Event Notifications), the minimum value is 10 (seconds). SQS Queue polling can occur more often, so this method accepts a lower value than the Directory Prefix method.

If you are using the Directory Prefix event collection method (Use a Specific Prefix), the minimum value is 60 (seconds) or 1M. Because every listBucket request to an AWS S3 bucket incurs a cost to the account that owns the bucket, a smaller recurrence value increases the cost.

Type a time interval to determine how frequently the poll is made for new data. The time interval can include values in hours (H), minutes (M), or days (D). A value without a unit is in seconds. For example, 2H = 2 hours, 15M = 15 minutes, 30 = 30 seconds.
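The interval notation and the per-method minimums can be expressed as a small Python sketch. It assumes that a value with no unit letter is in seconds, as in the example 30, and that unit letters are uppercase as shown. The names recurrence_to_seconds and meets_minimum are hypothetical, and the validation that QRadar performs is not documented here.

```python
import re

# Seconds per unit; a value with no unit letter is treated as seconds.
UNIT_SECONDS = {"": 1, "M": 60, "H": 3600, "D": 86400}

# Minimum recurrence for each collection method, in seconds, as stated above.
MINIMUM_SECONDS = {"SQS Event Notifications": 10, "Use a Specific Prefix": 60}


def recurrence_to_seconds(value: str) -> int:
    """Convert a recurrence string such as '2H', '15M', or '30' to seconds."""
    match = re.fullmatch(r"(\d+)([HMD]?)", value.strip())
    if match is None:
        raise ValueError(f"Unrecognized recurrence value: {value!r}")
    amount, unit = match.groups()
    return int(amount) * UNIT_SECONDS[unit]


def meets_minimum(value: str, collection_method: str) -> bool:
    """True if the interval is at least the minimum for the method."""
    return recurrence_to_seconds(value) >= MINIMUM_SECONDS[collection_method]


print(recurrence_to_seconds("2H"))
print(recurrence_to_seconds("15M"))
print(meets_minimum("30", "Use a Specific Prefix"))
```

With these rules, a value of 30 satisfies the SQS minimum but not the Directory Prefix minimum.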

EPS Throttle

The maximum number of events per second that QRadar ingests.

If your data source exceeds the EPS throttle, data collection is delayed. Data is still collected and then it is ingested when the data source stops exceeding the EPS throttle.

The default is 5000.

The following table describes the specific parameter values to collect audit events by using the Directory Prefix event collection method:

Table 2. Amazon AWS S3 REST API protocol log source-specific parameters with the Directory Prefix method
Parameter Description
S3 Collection Method

Select Use a Specific Prefix.

Bucket Name

The name of the AWS S3 bucket where the log files are stored.

Directory Prefix

The root directory location on the AWS S3 bucket from where the CloudTrail logs are retrieved; for example, AWSLogs/<AccountNumber>/CloudTrail/<RegionName>/.

To pull files from the root directory of a bucket, you must use a forward slash (/) in the Directory Prefix file path.

Note:
  • Changing the Directory Prefix value clears the persisted file marker. All files that match the new prefix are downloaded in the next pull.
  • The Directory Prefix file path cannot begin with a forward slash (/) unless only the forward slash is used to collect data from the root of the bucket.
  • If the Directory Prefix file path is used to specify folders, you must not begin the file path with a forward slash (for example, use folder1/folder2 instead of /folder1/folder2).
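The forward slash rules in these notes can be summarized as a simple check. The following Python sketch is an illustration under stated assumptions, not QRadar's validation logic. The function name is hypothetical, and treating an empty value as invalid is an assumption based on the requirement to type a forward slash for the bucket root.

```python
def is_valid_directory_prefix(prefix: str) -> bool:
    """Check a Directory Prefix value against the forward slash rules."""
    # A single forward slash selects the root of the bucket.
    if prefix == "/":
        return True
    # Assumption: an empty value is rejected, because the root is written as "/".
    if not prefix:
        return False
    # A folder path must not begin with a forward slash.
    return not prefix.startswith("/")


print(is_valid_directory_prefix("/"))
print(is_valid_directory_prefix("folder1/folder2"))
print(is_valid_directory_prefix("/folder1/folder2"))
```

Only the last value is rejected, because it names folders but begins with a forward slash.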

The following table describes the parameters that require specific values to collect audit events by using the SQS event collection method:

Table 3. Amazon AWS S3 REST API protocol log source-specific parameters with the SQS method
Parameter Description
S3 Collection Method

Select SQS Event Notifications.

SQS Queue URL

The full URL, beginning with https://, of the SQS Queue that is set up to receive notifications for ObjectCreated events from S3.