Amazon Security
When you use an Amazon S3 stage or an Amazon EMR or EMR Serverless pipeline, you can configure Transformer to use one of the following authentication methods to connect securely to Amazon Web Services (AWS):
- Instance profile
- When Transformer runs on an Amazon EC2 instance that has an associated instance profile, Transformer uses the instance profile credentials to automatically authenticate with AWS.
- AWS access keys
- When Transformer does not run on an Amazon EC2 instance or when the EC2 instance doesn’t have an instance profile, you can authenticate using an AWS access key pair. When using an AWS access key pair, you specify the access key ID and secret access key to use.
- None
- When accessing a public bucket, you can connect anonymously using no authentication.
Assume Another Role
When using instance profile or AWS access keys authentication, you can configure an Amazon S3 stage or an Amazon EMR or EMR Serverless pipeline to assume another IAM role.
When Transformer assumes a role, it temporarily gives up the instance profile or IAM user permissions and uses the permissions assigned to the assumed role. To assume a role, Transformer calls the AWS STS AssumeRole API operation and passes the role to use. The operation creates a new session with the temporary credentials, as long as the following conditions are true:
- The IAM policy attached to the current principal - the IAM role or user - grants permission to assume the specified role.
- The IAM trust policy attached to the role to be assumed permits the current principal to assume it.
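As an illustration of the first condition, the following is a minimal sketch of an identity-based policy that could be attached to the current principal. The helper name `build_assume_role_permission`, the account ID, and the role name in the ARN are hypothetical placeholders, not values defined by Transformer. The policy elements (`Version`, `Statement`, `sts:AssumeRole`) follow the standard IAM JSON policy grammar.

```python
import json


def build_assume_role_permission(role_arn: str) -> dict:
    """Return an identity policy allowing the attached principal to assume role_arn."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Permission to call the AWS STS AssumeRole API operation.
                "Action": "sts:AssumeRole",
                # Limit the permission to the single role to be assumed.
                "Resource": role_arn,
            }
        ],
    }


# Placeholder ARN, for illustration only.
policy = build_assume_role_permission("arn:aws:iam::123456789012:role/finance")
print(json.dumps(policy, indent=2))
```

The second condition is met separately, by the trust policy attached to the role itself.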
Stage Library and Cluster Type Requirements
To assume another role, Transformer must use an Amazon Web Services library for Apache Hadoop 2.8.0 or later.
- Google Dataproc cluster
- Databricks cluster
- Hadoop YARN cluster using Cloudera CDH version 5.x.x
- AWS cluster-provided libraries when the cluster has the Amazon Web Services library for Apache Hadoop 2.8.0 or later installed.
- AWS Transformer-provided libraries for Hadoop 3.2.0
Assume Role Methods
- Assume a role with no restrictions
- When configured to assume a role with no restrictions, any StreamSets user account that starts the pipeline can assume the role specified in the stage or pipeline cluster properties, as long as the IAM policies attached to the current principal and to the role to be assumed allow it.
For example, any StreamSets user who starts the job for the pipeline can assume the finance role when the IAM trust policy attached to the finance role allows the role to be assumed by the IAM role or user identified by the selected authentication method.
- Assume a role using an external ID condition
- You can use an external ID condition when you configure an Amazon S3 stage or an Amazon EMR or EMR Serverless pipeline.
- Assume a role using session tags to restrict role access
- For increased security, you can configure an Amazon S3 stage or an Amazon EMR or EMR Serverless pipeline to assume a role and set session tags to restrict the user accounts allowed to assume the role. When configured to set session tags, the stage or pipeline passes the following session tag to the AWS STS AssumeRole API operation:
streamsets/principal=<user>
Where <user> is the name of the currently logged in StreamSets user that starts the pipeline or job for the pipeline.
AWS IAM verifies that the user account set in the session tag can assume the specified role. The IAM trust policy attached to the role to be assumed must grant the current principal permission to assume the role and must use IAM condition keys to limit the AssumeRole action based on the requested session tags.
For example, when the StreamSets user Joe starts the job for the pipeline, he can assume the finance role when the IAM trust policy attached to the finance role allows the user joe to assume the role. The StreamSets user Emily cannot assume the finance role because the trust policy attached to the finance role does not allow the user emily to assume the role.
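To make the session tag concrete, the following minimal sketch builds request parameters in the shape that the AWS STS AssumeRole API operation accepts. It is not Transformer's implementation. The helper name `build_assume_role_request`, the role ARN, and the session name are hypothetical. Only the tag key `streamsets/principal` comes from the description above.

```python
def build_assume_role_request(role_arn: str, session_name: str, streamsets_user: str) -> dict:
    """Build AssumeRole parameters that carry the StreamSets user as a session tag."""
    return {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        # Session tag passed with the request: streamsets/principal=<user>
        "Tags": [{"Key": "streamsets/principal", "Value": streamsets_user}],
    }


# Placeholder values, for illustration only.
request = build_assume_role_request(
    role_arn="arn:aws:iam::123456789012:role/finance",
    session_name="example-session",
    streamsets_user="joe",
)
print(request)
```

A dictionary of this shape could be passed to an STS client, for example as keyword arguments to `assume_role` in boto3. Whether the call succeeds for a given user then depends on the conditions in the trust policy attached to the role.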
To enable Transformer to assume a role, you first must create the trust policy in AWS that allows the role to be assumed. Then, you configure the required cluster or stage properties in Control Hub.
Create the Trust Policy
In AWS, you must create and attach a trust policy to the role to be assumed. The policy must allow other principals - IAM roles or users - to assume the role.
- Trust policy to assume the role with no restrictions
- Create and attach a trust policy to the role to be assumed that allows another IAM role or user to assume the role.
- Trust policy to assume a role with an external ID condition
- Create and attach a trust policy to the role to be assumed that allows the IAM role or user to assume the role and that includes an external ID condition.
- Trust policy to assume a role using session tags to restrict role access
- Create and attach a trust policy to the role to be assumed that allows the IAM role or user to assume the role, uses session tags, and restricts the session tag values to specific StreamSets user accounts.
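The three trust policy variants above can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact policies from the product documentation: the helper name `build_trust_policy`, the principal ARN, the external ID value, and the user names are hypothetical. The condition keys `sts:ExternalId` and `aws:RequestTag/<tag-key>` and the `sts:TagSession` action are standard IAM elements. The choice of the `StringEquals` operator is an assumption.

```python
import json


def build_trust_policy(principal_arn, external_id=None, allowed_users=None):
    """Return a trust policy for the role to be assumed.

    With no optional arguments the policy is unrestricted. external_id adds an
    external ID condition. allowed_users restricts access by session tag value.
    """
    statement = {
        "Effect": "Allow",
        "Principal": {"AWS": principal_arn},
        "Action": "sts:AssumeRole",
    }
    conditions = {}
    if external_id is not None:
        conditions["sts:ExternalId"] = external_id
    if allowed_users is not None:
        # Passing session tags also requires the sts:TagSession action.
        statement["Action"] = ["sts:AssumeRole", "sts:TagSession"]
        conditions["aws:RequestTag/streamsets/principal"] = list(allowed_users)
    if conditions:
        statement["Condition"] = {"StringEquals": conditions}
    return {"Version": "2012-10-17", "Statement": [statement]}


# Placeholder principal, for illustration only.
PRINCIPAL = "arn:aws:iam::123456789012:user/transformer"

unrestricted = build_trust_policy(PRINCIPAL)
with_external_id = build_trust_policy(PRINCIPAL, external_id="example-external-id")
with_session_tags = build_trust_policy(PRINCIPAL, allowed_users=["joe"])
print(json.dumps(with_session_tags, indent=2))
```

In the session tag variant, only requests tagged with a listed StreamSets user name, such as joe, satisfy the condition.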
For more information about creating an IAM trust policy, see the AWS IAM documentation.
Configure Transformer to Assume a Role
After you create and attach a trust policy to the role to be assumed, you can configure an Amazon S3 stage or an Amazon EMR or EMR Serverless pipeline to assume the role.