Connect the Amazon Athena data source to the
platform to enable your applications and dashboards to collect and analyze Amazon Athena security data. Universal Data Insights connectors enable federated search across
your security products.
Before you begin
Collaborate with an AWS administrator to
obtain a user account with access to query the Amazon Athena data source.
Configure VPC Flow Logs in Amazon Athena
- Enable VPC flow logs in the AWS Management Console.
- Configure the VPC flow log service to save logs in an Amazon S3 bucket. For more information, see Publishing flow logs to Amazon S3 (https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs-s3.html).
- Create a table for VPC flow logs in the Amazon Athena service. For more information, see Querying Amazon VPC Flow Logs (https://docs.aws.amazon.com/athena/latest/ug/vpc-flow-logs.html).
Configure Amazon GuardDuty in Amazon Athena
- Enable the GuardDuty features in the AWS Management Console.
- Configure the GuardDuty feature to export findings to an Amazon S3 bucket. For more information, see Export Findings
(https://docs.aws.amazon.com/guardduty/latest/ug/guardduty_exportfindings.html).
- Create a table for GuardDuty findings
in Amazon Athena. For more information, see Querying Amazon GuardDuty
Findings (https://docs.aws.amazon.com/athena/latest/ug/querying-guardduty.html).
Configure Amazon Security Lake in Amazon Athena
- Enable and start Amazon Security Lake in the AWS Management Console. For more information, see Getting Started with Amazon Security Lake
(https://docs.aws.amazon.com/security-lake/latest/userguide/getting-started.html).
- Ensure that Amazon Security Lake stores logs in Open Cybersecurity Schema Framework (OCSF) format. For more information, see Open Cybersecurity Schema Framework (OCSF) format (https://schema.ocsf.io/).
- Ensure that the client program has query access to the AWS Lake Formation tables as a subscriber. The following list contains the
minimum IAM permissions for the Amazon Athena connector:
- "athena:GetQueryExecution"
- "athena:GetQueryResults"
- "athena:ListWorkGroups"
- "athena:StartQueryExecution"
- "athena:StopQueryExecution"
- "glue:GetDatabases"
- "glue:GetTable"
- "s3:AbortMultipartUpload"
- "s3:DeleteObject"
- "s3:GetBucketLocation"
- "s3:GetObject"
- "s3:ListBucket"
- "s3:ListBucketMultipartUploads"
- "s3:ListMultipartUploadParts"
- "s3:PutObject"
- "sts:AssumeRole"
For more information, see
Subscriber management in Amazon Security Lake
(https://docs.aws.amazon.com/security-lake/latest/userguide/subscriber-management.html).
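The minimum permissions listed above can be collected into an IAM policy document. The following Python sketch is one way to generate that JSON. The function name build_policy and the wildcard resource are assumptions made for illustration only; your AWS administrator will normally restrict Resource to the specific workgroups, databases, buckets, and roles that the connector uses.

```python
import json

# Minimum IAM actions listed in this topic for the Amazon Athena connector.
ATHENA_CONNECTOR_ACTIONS = [
    "athena:GetQueryExecution",
    "athena:GetQueryResults",
    "athena:ListWorkGroups",
    "athena:StartQueryExecution",
    "athena:StopQueryExecution",
    "glue:GetDatabases",
    "glue:GetTable",
    "s3:AbortMultipartUpload",
    "s3:DeleteObject",
    "s3:GetBucketLocation",
    "s3:GetObject",
    "s3:ListBucket",
    "s3:ListBucketMultipartUploads",
    "s3:ListMultipartUploadParts",
    "s3:PutObject",
    "sts:AssumeRole",
]


def build_policy(actions, resource="*"):
    """Return an IAM policy document that allows the given actions.

    resource="*" is a placeholder assumption, not a recommendation.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(actions),
                "Resource": resource,
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy so that it can be reviewed before it is attached.
    print(json.dumps(build_policy(ATHENA_CONNECTOR_ACTIONS), indent=2))
```

Review the generated document with your AWS administrator before you attach it to the user account or role.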
If you have a firewall between your cluster and the data source target, use the
IBM® Security Edge Gateway to host the containers. The Edge Gateway must be V1.6 or later. For more information,
see Setting up Edge
Gateway.
About this task
Amazon Athena uses standard SQL to analyze data in
Amazon S3. Data source connections for Amazon GuardDuty logs and VPC Flow logs are supported.
Structured Threat Information eXpression (STIX) is a language and
serialization format that organizations use to exchange cyberthreat intelligence. The Amazon Athena connector uses STIX patterning to query Amazon Athena data and returns results as STIX objects. For more information about how the Amazon Athena data schema maps to STIX, see Amazon Athena
STIX Mapping
(https://github.com/opencybersecurityalliance/stix-shifter/blob/develop/adapter-guide/connectors/aws_athena_supported_stix.md).
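As an illustration of STIX patterning, the following Python sketch builds a pattern that matches an IPv4 address observed within the last X minutes. The helper names stix_timestamp and ipv4_pattern are hypothetical, 192.0.2.10 is a documentation-only address, and the sketch assumes that ipv4-addr:value is among the properties that the connector maps; check the mapping document before you rely on it.

```python
from datetime import datetime, timedelta, timezone


def stix_timestamp(moment):
    """Format an aware datetime as a UTC STIX timestamp with milliseconds."""
    moment = moment.astimezone(timezone.utc)
    millis = moment.microsecond // 1000
    return moment.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"


def ipv4_pattern(address, minutes, now=None):
    """Build a STIX pattern for an IPv4 address seen in the last `minutes`."""
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(minutes=minutes)
    return (
        f"[ipv4-addr:value = '{address}'] "
        f"START t'{stix_timestamp(start)}' STOP t'{stix_timestamp(now)}'"
    )


if __name__ == "__main__":
    # Example: look back five minutes from the current time.
    print(ipv4_pattern("192.0.2.10", 5))
```

The START and STOP qualifiers are standard STIX patterning syntax for bounding the observation window.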
Procedure
- Go to
.
- On the Data Sources tab, click
Connect a data source.
- Click Amazon Athena, then click
Next.
- Configure the connection to the data source.
- In the Data source name field, assign a
name to uniquely identify the data source connection.
You can create multiple connection
instances to a data source, so give each instance a clearly distinguishable name. Only alphanumeric
characters and the following special characters are allowed: - . _
- In the Data source description field,
write a description to indicate the purpose of the data source connection.
You can create
multiple connection instances to a data source, so it is useful to clearly indicate the purpose of
each connection by description. Only alphanumeric characters and the following special characters
are allowed: - . _
- If you have a firewall between your cluster and the data source target,
use the Edge Gateway to host the containers. In
the Edge gateway (optional) field, select the Edge Gateway that hosts the connector. It can take up to five
minutes for the status of newly deployed data source connections on the Edge Gateway to show as being connected.
- In the Region field, set the Amazon Athena region for the data source. Select your region code
from the Region column of the Service Endpoints table in the Amazon Athena endpoints and quotas
(https://docs.aws.amazon.com/general/latest/gr/athena.html).
- In the Amazon S3 Bucket Location field, set the location of the
S3 bucket where query results will be stored.
- If you are using Amazon Athena with VPC
flow logs, specify the name of the database that contains the VPC flow logs in the VPC
Flow Logs database name (optional) field.
- If you are using Amazon Athena with VPC
flow logs, specify the name of the table that contains the VPC flow logs in the VPC Flow
Logs table name (optional) field.
- If you are using Amazon Athena with
Amazon GuardDuty, specify the name of the database
that contains the Amazon GuardDuty logs in the
Amazon GuardDuty database name (optional) field.
- If you are using Amazon Athena with
Amazon GuardDuty, specify the name of the table
that contains the Amazon GuardDuty logs in the
Amazon GuardDuty table name (optional) field.
- If you are using Amazon Athena to query
Amazon Security Lake logs, specify the name of the
AWS Lake Formation database that contains the security logs in the
OCSF logs database name (optional) field.
- If you are using Amazon Athena to query
Amazon Security Lake logs, specify the name of the
AWS Lake Formation table that contains the security logs in the OCSF
logs table name (optional) field.
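The character rule stated for the Data source name and Data source description fields in this step can be checked before you enter values in the form. The following Python sketch is a minimal check. The function name is_valid_name is hypothetical, and the sketch assumes that "alphanumeric" means ASCII letters and digits, which this topic does not state explicitly.

```python
import re

# Letters, digits, hyphen, period, and underscore only (ASCII is an assumption).
NAME_RULE = re.compile(r"[A-Za-z0-9._-]+")


def is_valid_name(value):
    """Return True when every character of value is allowed by the rule."""
    return NAME_RULE.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("athena-prod_01.vpc", "athena prod"):
        print(candidate, is_valid_name(candidate))
```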
- Set the query parameters to control the behavior of the search query on the data source.
- In the Concurrent search limit field, set the
number of simultaneous connections that can be
made to the
data source. The default limit for the number of connections is 4. The value must not be less than 1
and must not be greater than 100.
- In the Query search timeout limit field,
set the time limit in minutes for how long the query is run on the data source. The default time
limit is 30. When the value is set to zero, there is no timeout. The value must not be less than 1
and must not be greater than 120.
- In the Result size limit field, set the maximum
number of entries or objects that are returned by the search query. The default result size limit is
10,000. The value must not be less than 1 and must not be greater than 500,000.
- In the Query time range field, set the time
range in minutes for the search, represented as the last X minutes. The default
is 5 minutes. The value must not be less than 1 and must not be greater than 10,000.
Important: If
you increase the concurrent search limit and the result size limit, a greater amount of data can be
sent to the data source, which increases the strain on the data source. Increasing the query time
range also increases the amount of data.
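The defaults and ranges given for the four query parameters can be summarized in a small validation sketch. The class and attribute names in the following Python example are hypothetical and do not correspond to any product API. The sketch follows the stated range of 1 to 120 minutes for the timeout and does not model a value of zero.

```python
from dataclasses import dataclass

# Allowed ranges as stated in this step: (minimum, maximum), both inclusive.
LIMITS = {
    "concurrent_search_limit": (1, 100),
    "query_search_timeout_minutes": (1, 120),
    "result_size_limit": (1, 500_000),
    "query_time_range_minutes": (1, 10_000),
}


@dataclass
class QueryParameters:
    # Defaults as stated in this step.
    concurrent_search_limit: int = 4
    query_search_timeout_minutes: int = 30
    result_size_limit: int = 10_000
    query_time_range_minutes: int = 5

    def problems(self):
        """Return one message for each value that is outside its range."""
        found = []
        for name, (low, high) in LIMITS.items():
            value = getattr(self, name)
            if not low <= value <= high:
                found.append(f"{name}={value} is outside {low}..{high}")
        return found


if __name__ == "__main__":
    print(QueryParameters().problems())
    print(QueryParameters(result_size_limit=600_000).problems())
```

Keeping the ranges in one table makes it easier to see how far a planned change to the concurrent search limit, result size limit, or time range moves away from the defaults.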
- Optional: If you need to customize the STIX attributes mapping, click
Customize attribute mapping and edit the JSON blob to map new or existing
properties to their associated target data source fields.
- Configure identity and access.
- Click Add a Configuration.
- In the Configuration name field, enter a
unique name to describe the access configuration and distinguish it from the other access
configurations for this data source connection that you might set up. Only alphanumeric characters
and the following special characters are allowed: - . _
- In the Configuration description field, enter
a unique description to describe the access configuration and distinguish it from the other access
configurations for this data source connection that you might set up. Only alphanumeric characters
and the following special characters are allowed: - . _
- Click Edit access and choose which users can
connect to the data source and the type of access.
- Establish AWS authentication to
enable access to the AWS search API.
- To establish an AWS key-based authentication,
enter values for the AWS Access key id and AWS secret access
key parameters.
- To establish an AWS role-based
authentication, enter values for the AWS Access key id, AWS secret
access key, and AWS IAM Role parameters.
- To grant access to your AWS resources and
establish an Assume Role authentication, enter a value for the External ID for AWS Assume
Role parameter. For more information, see Using an external ID for third-party access
(https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html).
For more information about AWS authentication, see Configuring AWS authentication.
- Click Add.
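The authentication options in this step differ only in which parameters you supply. The following Python sketch shows one way to reason about the combinations. The function name and the returned labels are hypothetical, and the rule that an external ID is meaningful only together with an IAM role is an assumption drawn from how AWS Assume Role works, not a statement about how the connector validates its input.

```python
def authentication_mode(access_key_id, secret_access_key,
                        iam_role=None, external_id=None):
    """Name the authentication option that a set of supplied values selects."""
    if not access_key_id or not secret_access_key:
        raise ValueError("The access key ID and secret access key are required")
    if external_id and not iam_role:
        # Assumption: an external ID is used only when a role is assumed.
        raise ValueError("An external ID requires an IAM role")
    if iam_role:
        return "role-based with external ID" if external_id else "role-based"
    return "key-based"


if __name__ == "__main__":
    # Placeholder values only; never hardcode real credentials.
    print(authentication_mode("EXAMPLE_KEY_ID", "EXAMPLE_SECRET"))
```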
- To save your configuration and establish the connection, click
Done.
You can see the data source connection configuration that
you added under Connections on the data source settings page. A message on the card indicates
the status of the connection with the data source.
When you add a data source, it might take
a few minutes before the data source shows as being connected.
Tip: After you connect a
data source, it might take up to 30 seconds to retrieve the data. Before the full data set is
returned, the data source might display as unavailable. After the data is returned, the data source
shows as being connected, and a polling mechanism occurs to validate the connection status. The
connection status is valid for 60 seconds after every poll.
You can add other
connection configurations for this data source that have different users and different data access
permissions.
- To edit your configurations, complete the following steps:
- On the Data Sources tab, select the data source connection that
you want to edit.
- In the Configurations section, click Edit Configuration.
- Edit the identity and access parameters and click
Save.
What to do next
Test the connection by running a query with IBM Security Data Explorer. To use Data Explorer, you must have data sources that are
connected so that the application can run queries and retrieve results across a unified set of data
sources. The search results vary depending on the data that is contained in your configured data
sources. For more information about how to build a query in Data Explorer, see Build a query.