Amazon S3 Integration Requirements

The following are the prerequisites for Automatic Data Lineage to connect to this third-party system, which you may choose to do at your sole discretion. Note that while these are usually sufficient to connect to this third-party system, we cannot guarantee that the connection or integration will be successful, since we have no control over, and no liability or responsibility for, third-party products or services, including their performance.

Network Access

The Manta Amazon S3 scanner uses the AWS API to connect to Amazon S3.

The Manta deployment running the Amazon S3 scanner must have network access to the AWS API (hosted by Amazon).
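One way to confirm this from the Manta host is a plain TCP check against a regional S3 endpoint. The following Python sketch is illustrative only and is not part of Manta. It assumes the standard commercial AWS partition (endpoint names of the form s3.<region>.amazonaws.com) and direct outbound access on port 443; the helper names are hypothetical.

```python
import socket


def s3_regional_endpoint(region: str) -> str:
    """Return the regional S3 endpoint hostname (hypothetical helper).

    Assumption: the commercial "aws" partition; China and GovCloud
    regions use different endpoint domains.
    """
    return f"s3.{region}.amazonaws.com"


def can_reach(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout`.

    DNS failures, refusals, and timeouts are all OSError subclasses,
    so any of them is reported as "not reachable".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, can_reach(s3_regional_endpoint("us-east-2")) should return True from a host with direct outbound access. A successful TCP connection only shows that the endpoint is reachable; it does not verify TLS, credentials, or permissions, and it does not account for deployments that reach AWS through an HTTP proxy.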

Security Principal

To access the AWS API, the scanner needs an AWS IAM principal (a user) with a generated access key.

The access key ID and secret access key are used to authenticate the Manta Amazon S3 scanner to the scanned AWS account.

Policies

The security principal used by the scanner needs sufficient permissions to download S3 bucket metadata through the AWS API. In practice, the permission policies attached to the principal must allow the s3:ListAllMyBuckets action, plus the s3:GetBucketLocation and s3:ListBucket actions on the extracted buckets.

A simple policy allowing Automatic Data Lineage to scan all of the account’s buckets might look like the following:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:ListAllMyBuckets",
                "s3:GetBucketLocation",
                "s3:ListBucket"
            ],
            "Resource": "*"
        }
    ]
}
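If the scanner only needs to extract some of the account’s buckets, the bucket-level actions can be scoped to those buckets instead of "*". The following Python sketch builds such a policy document. The helper name and bucket names are hypothetical, and it assumes the scanner is configured to extract only the listed buckets. s3:ListAllMyBuckets is an account-level action, so it remains granted on "*".

```python
import json


def scanner_policy(bucket_names: list[str]) -> dict:
    """Build an IAM policy limited to the given buckets (hypothetical helper)."""
    bucket_arns = [f"arn:aws:s3:::{name}" for name in bucket_names]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing the account's buckets is not tied to a bucket ARN.
                "Sid": "ListBuckets",
                "Effect": "Allow",
                "Action": ["s3:ListAllMyBuckets"],
                "Resource": "*",
            },
            {
                # Location lookup and object listing are bucket-level actions.
                "Sid": "ReadBucketMetadata",
                "Effect": "Allow",
                "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
                "Resource": bucket_arns,
            },
        ],
    }


# Hypothetical bucket names, for illustration only.
print(json.dumps(scanner_policy(["sales-raw", "sales-curated"]), indent=4))
```

The printed JSON can be pasted into the IAM policy editor in the same way as the policy above.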

Note on Cross-Region Bucket Visibility

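The Manta Amazon S3 scanner first connects to the initial region (specified in the connection configuration) to list the account’s buckets (s3:ListAllMyBuckets) and to determine each bucket’s region (s3:GetBucketLocation).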

Region-local connections are then used to scan the contents of each extracted bucket.
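To scan each bucket over a region-local connection, the raw s3:GetBucketLocation results have to be mapped to region names and grouped. The following Python sketch illustrates that step; it is not the scanner’s implementation, the helper names are hypothetical, and it assumes the LocationConstraint values have already been retrieved. It relies on two documented S3 conventions: buckets in us-east-1 report a null LocationConstraint, and some older eu-west-1 buckets report the legacy value EU.

```python
from collections import defaultdict
from typing import Optional


def normalize_location(constraint: Optional[str]) -> str:
    """Map a GetBucketLocation LocationConstraint value to a region name."""
    # Buckets in us-east-1 report an empty (null) LocationConstraint.
    if not constraint:
        return "us-east-1"
    # Legacy value returned for some older buckets in eu-west-1.
    if constraint == "EU":
        return "eu-west-1"
    return constraint


def group_by_region(locations: dict[str, Optional[str]]) -> dict[str, list[str]]:
    """Group bucket names by region so each region needs one connection."""
    groups: dict[str, list[str]] = defaultdict(list)
    for bucket, constraint in sorted(locations.items()):
        groups[normalize_location(constraint)].append(bucket)
    return dict(groups)


# Hypothetical LocationConstraint values for three buckets.
print(group_by_region({"logs": None, "archive": "EU", "sales": "us-east-2"}))
```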

In theory, s3:GetBucketLocation should succeed for any accessible bucket regardless of the initial region. However, during testing, we encountered intermittent issues accessing bucket metadata when the connection region differed from the bucket’s region (especially for connections initiated in us-east-1). This behavior is not described in the Amazon S3 documentation.
We have added a verification step to connection validation that detects and reports this behavior. If it occurs, we recommend switching the initial region to another region (for example, from us-east-1 to us-east-2).
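A similar check can be approximated outside Manta. The following Python sketch is hypothetical and does not reproduce the scanner’s actual verification step: it probes the location lookup for each bucket from the configured initial region, repeats the probe a few times because the failures are intermittent, and suggests us-east-2 when the initial region is us-east-1. The lookup is passed in as a function, so the sketch stays independent of any AWS SDK.

```python
from typing import Callable, Iterable, Optional

# Hypothetical mapping based on the recommendation: us-east-1 -> us-east-2.
SUGGESTED_REGION = {"us-east-1": "us-east-2"}


def validate_initial_region(
    initial_region: str,
    buckets: Iterable[str],
    get_location: Callable[[str, str], Optional[str]],
    attempts: int = 3,
) -> list[str]:
    """Probe each bucket's location lookup and collect warning messages.

    get_location(region, bucket) is a caller-supplied function that performs
    the lookup over a connection to `region` and raises on failure. Each
    bucket is probed up to `attempts` times; the first failure is reported.
    """
    warnings: list[str] = []
    suggestion = SUGGESTED_REGION.get(initial_region, "another region")
    for bucket in buckets:
        for _ in range(attempts):
            try:
                get_location(initial_region, bucket)
            except Exception as exc:  # any failure counts as a detection
                warnings.append(
                    f"{bucket}: location lookup from {initial_region} failed "
                    f"({exc}); consider switching the initial region to {suggestion}"
                )
                break
    return warnings
```

With boto3, get_location could be, for example, lambda region, bucket: boto3.client("s3", region_name=region).get_bucket_location(Bucket=bucket)["LocationConstraint"]. Because the behavior is intermittent, a clean result from a few probes does not rule out later failures.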