IBM Support

StreamSets Transformer for Spark - Amazon Elastic MapReduce - Job fails with InvalidRequestException from AWS

Troubleshooting


Problem

A permissions issue may be preventing the EMR application container that handles the Transformer job from deleting files from Transformer’s staging directory on S3.

To confirm this, download the application container logs (which Transformer for Spark calls the driver logs) and check the stderr file for exceptions that identify the root cause of the job failure.

The logs are archived to S3 and can be accessed through the AWS console. To access the logs, follow these steps:

  1. Log in to the AWS console and open the EMR service

  2. Navigate to the EMR cluster being used by Transformer

  3. Under Cluster Management, click the link to go to the log destination on S3

  4. Click the containers link

  5. The first container in the list should be the application master container (typically with the suffix 0001). Click the link for that container.

  6. The stderr and stdout files inside the subfolders contain details about what happened during the job run. The last error encountered appears toward the end of the stderr file. In this example, S3 returned an AccessDenied error:

    24/04/11 04:33:08 ERROR S3AFileSystem: Partial failure of delete, 1 errors
    com.amazonaws.services.s3.model.MultiObjectDeleteException: One or more objects could not be deleted (Service: null; Status Code: 200; Error Code: null; Request ID: 7VFNDDY8914DJQ06; S3 Extended Request ID: asqFiNBWVOYF4rTjbAkXn4GELpoTrojyPAEAPaxxgD27z0+SUtIIoV9snCZ6bG0Ej9827JyCSnSqRLcr9+9JAA==; Proxy: null), S3 Extended Request ID: asqFiNBWVOYF4rTjbAkXn4GELpoTrojyPAEAPaxxgD27z0+SUtIIoV9snCZ6bG0Ej9827JyCSnSqRLcr9+9JAA==
    ...
    24/04/11 04:33:08 ERROR S3AFileSystem: das/: "AccessDenied" - Access Denied

    This should give you the information needed to identify and resolve the root cause of the job failure. The Request ID and Extended Request ID might help as well.
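Scanning a long stderr file by eye can be slow, so the check in step 6 can be sketched as a small Python helper. This is a minimal sketch, not part of Transformer or EMR tooling: the function names `last_errors` and `has_access_denied` are hypothetical, and the regular expression assumes the log lines follow the `yy/MM/dd HH:mm:ss LEVEL Logger: message` layout seen in the excerpt above.

```python
import re

# Assumed layout, taken from the excerpt above:
#   24/04/11 04:33:08 ERROR S3AFileSystem: <message>
ERROR_LINE = re.compile(
    r"^\s*(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) ERROR (\S+?): (.*)$"
)


def last_errors(stderr_text, limit=5):
    """Return the most recent ERROR lines as (timestamp, logger, message) tuples.

    Hypothetical helper: `limit` should be a positive integer.
    """
    matches = []
    for line in stderr_text.splitlines():
        m = ERROR_LINE.match(line)
        if m:
            matches.append(m.groups())
    return matches[-limit:]


def has_access_denied(stderr_text):
    """Report whether any ERROR line mentions an S3 AccessDenied response."""
    return any(
        "AccessDenied" in message
        for _, _, message in last_errors(stderr_text, limit=1000)
    )
```

To use it, read the downloaded (and, if necessary, decompressed) stderr file into a string and pass it to `last_errors` to list the final errors, or to `has_access_denied` to check for the permissions failure described here.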

Symptom

When running StreamSets Transformer for Spark jobs on Amazon Elastic MapReduce (EMR), jobs sometimes fail with an AWS InvalidRequestException.

If you log in to the AWS console, open the S3 service, manually clear out Transformer’s staging directory, and then restart the job, the job runs successfully.

Resolving The Problem

Using the information from the application container logs, address the root cause of the issue and then restart the Transformer job. For example, in the case noted above, S3 returned an AccessDenied error, indicating that the EMR cluster had insufficient permissions to delete the old files from the staging folder on S3 when trying to run the job.
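For the AccessDenied case, one possible fix is to grant the role used by the EMR cluster permission to delete objects under the staging folder. The following is a minimal sketch that builds such an IAM policy document in Python. The function name, the `Sid`, the bucket name `example-bucket`, and the prefix `transformer/staging` are all hypothetical placeholders. The sketch also assumes the denial comes from a missing `s3:DeleteObject` permission on the role; a bucket policy or other control could equally be the cause, and the role may need other S3 actions that are not shown here.

```python
import json


def staging_delete_policy(bucket, staging_prefix):
    """Build an IAM policy document allowing object deletes under a prefix.

    Hypothetical helper: `bucket` and `staging_prefix` must be replaced with
    the bucket and staging folder that the Transformer job actually uses.
    """
    prefix = staging_prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowTransformerStagingCleanup",  # illustrative name
                "Effect": "Allow",
                "Action": ["s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            }
        ],
    }


# Placeholder values for illustration only.
policy = staging_delete_policy("example-bucket", "transformer/staging")
print(json.dumps(policy, indent=2))
```

The printed JSON can be reviewed and, if appropriate, attached to the relevant role through the IAM console. Scoping the `Resource` to the staging prefix rather than the whole bucket keeps the grant narrow.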

Document Location

Worldwide


Document Information

Modified date:
15 March 2025

UID

ibm17186267