File ingestion prefix structure

Sterling Intelligent Promising uses a structured prefix system to organize files during the ingestion process. Each prefix has a distinct role in the lifecycle of an ingested file, so you can trace issues, monitor progress, handle errors, and keep storage usage under control through routine cleanup.

You must understand the following prefixes that are created automatically by Sterling Intelligent Promising:

/summary

The summary JSON file contains statistics about the file ingestion process. The file is uploaded to the summary prefix location in the source bucket.

Review the following example of a summary file.

{
	  "date" : "2024-09-27",
	  "tenantId" : "us-f226da3c",
	  "fileType" : "item.ingest",
	  "producer" : "external-application",
	  "ingestRuleId" : "sip-premium-item.ingest-rule",
	  "ingestSchemaId" : "item.ingest-202404",
	  "bucketName" : "sip-integration-test-automation",
	  "fileKey" : "us-f226da3c/UPLOAD/ITEM/items.csv",
	  "consumer" : "sip-catalog",
	  "fileTransformStartTime" : "2024-09-27T15:02:28.512Z",
	  "rowsFailedSchemaVal" : "1",
	  "failedRowsFileKey" : "us-f226da3c/failed/sip-catalog/UPLOAD/ITEM/240927T150228570_items_failed_rows.csv",
	  "failedReasonsFileKey" : "us-f226da3c/failed/sip-catalog/UPLOAD/ITEM/240927T150228600_items_failure_reasons.txt",
	  "rowsTransformed" : "4",
	  "fileTransformEndTime" : "2024-09-27T15:02:28.818Z",
	  "fileExchangeStartTime" : "2024-09-27T15:02:28.830Z",
	  "eventPayloadBatchSize" : "1",
	  "fileExchangeEndTime" : "2024-09-27T15:02:28.881Z",
	  "completedFileKey" : "us-f226da3c/completed/UPLOAD/ITEM/240927T150228881_items.csv"
	}

In this example, the fileKey represents the original file key.

When a failure occurs, information is provided about the following:

failedRowsFileKey: The file key for the failed rows file.
failedReasonsFileKey: The file key for the failure reasons file.
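
The following is a minimal sketch of how a summary file might be inspected after download. It uses only the field names from the example above. The function name `summarize_ingestion` is hypothetical, and the sketch assumes that the failure-related fields are simply absent when no rows fail, which the example does not confirm.

```python
import json


def summarize_ingestion(summary_text: str) -> dict:
    """Extract row counts and failure file keys from a summary document."""
    summary = json.loads(summary_text)
    # Counts appear as strings in the example, so convert them explicitly.
    rows_failed = int(summary.get("rowsFailedSchemaVal", "0"))
    return {
        "source_key": summary["fileKey"],
        "rows_transformed": int(summary.get("rowsTransformed", "0")),
        "rows_failed": rows_failed,
        # Assumption: these keys are missing when nothing failed.
        "failed_rows_key": summary.get("failedRowsFileKey"),
        "failed_reasons_key": summary.get("failedReasonsFileKey"),
        "has_failures": rows_failed > 0 or "failedRowsFileKey" in summary,
    }


# Trimmed copy of the example summary shown above.
example = """
{
  "fileKey" : "us-f226da3c/UPLOAD/ITEM/items.csv",
  "rowsFailedSchemaVal" : "1",
  "failedRowsFileKey" : "us-f226da3c/failed/sip-catalog/UPLOAD/ITEM/240927T150228570_items_failed_rows.csv",
  "failedReasonsFileKey" : "us-f226da3c/failed/sip-catalog/UPLOAD/ITEM/240927T150228600_items_failure_reasons.txt",
  "rowsTransformed" : "4"
}
"""

report = summarize_ingestion(example)
print(report)
```

When `has_failures` is true, the two returned keys point to the files under the failed prefix that need review.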

/completed

Contains all files that completed processing. Clean up the completed files routinely to minimize the storage requirements on IBM Cloud® Object Storage.
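
As a rough sketch of routine cleanup, the following code selects completed file keys that are older than a retention period. It assumes that the leading part of the file name (for example `240927T150228881`) is a UTC timestamp in the form yyMMdd'T'HHmmssSSS, which is inferred from the `completedFileKey` and `fileExchangeEndTime` values in the summary example and is not stated explicitly. The names `completed_timestamp` and `keys_to_delete` and the 30-day default are hypothetical. The actual listing and deletion depend on the Object Storage client or lifecycle rule that you use.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed file name layout: <yyMMdd>T<HHmmss><SSS>_<original name>
_STAMP = re.compile(r"(?:^|/)(\d{6}T\d{6})(\d{3})_[^/]+$")


def completed_timestamp(key: str):
    """Return the timestamp embedded in a completed file key, or None."""
    match = _STAMP.search(key)
    if match is None:
        return None
    base = datetime.strptime(match.group(1), "%y%m%dT%H%M%S")
    millis = int(match.group(2))
    return base.replace(microsecond=millis * 1000, tzinfo=timezone.utc)


def keys_to_delete(keys, now, retention_days=30):
    """Select keys whose embedded timestamp is older than the retention period."""
    cutoff = now - timedelta(days=retention_days)
    stale = []
    for key in keys:
        stamp = completed_timestamp(key)
        # Keys without a recognizable timestamp are left alone.
        if stamp is not None and stamp < cutoff:
            stale.append(key)
    return stale


listing = [
    "us-f226da3c/completed/UPLOAD/ITEM/240927T150228881_items.csv",
    "us-f226da3c/completed/UPLOAD/ITEM/241101T000000000_items.csv",
    "us-f226da3c/completed/UPLOAD/ITEM/notes.txt",
]
now = datetime(2024, 11, 15, tzinfo=timezone.utc)
stale_keys = keys_to_delete(listing, now)
print(stale_keys)
```

Passing `now` as a parameter keeps the selection logic deterministic, so it can be checked before any object is deleted.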

/failed

When rows in an input CSV file fail to process, they are written to a separate CSV file and uploaded to the failed prefix location within the source bucket. Also, a corresponding failure reasons file is generated and uploaded to the same location. This file lists the reason for each failure.

Review the following example for a failed file ingestion process:
For the prefix1/file1.csv source file, the failed files are stored as follows:
- failed rows file: failed/<consuming_application>/prefix1/<timestamp>_file1_failed_rows.csv
- failure reasons file: failed/<consuming_application>/prefix1/<timestamp>_file1_failure_reasons.txt
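
The naming convention above can be taken apart with a regular expression, for example to find out which consuming application and source file a failed key belongs to. This is only a sketch: the function name `parse_failed_key` is hypothetical, the timestamp is assumed to be six digits, a `T`, and nine digits (as in the summary example), and any leading path before `failed/` (such as a tenant ID) is ignored.

```python
import re

# Assumed layout: [<anything>/]failed/<consumer>/<source prefix>/<timestamp>_<name>_<suffix>
_FAILED_KEY = re.compile(
    r"(?:^|/)failed/(?P<consumer>[^/]+)/(?P<prefix>.*/)?"
    r"(?P<stamp>\d{6}T\d{9})_(?P<stem>.+)_"
    r"(?P<kind>failed_rows\.csv|failure_reasons\.txt)$"
)


def parse_failed_key(key: str):
    """Split a failed-prefix key into its parts, or return None if it does not match."""
    match = _FAILED_KEY.search(key)
    if match is None:
        return None
    return {
        "consumer": match.group("consumer"),
        "source_prefix": match.group("prefix") or "",
        "timestamp": match.group("stamp"),
        # The source is described as a CSV file, so the extension is restored here.
        "source_name": match.group("stem") + ".csv",
        "is_reasons_file": match.group("kind") == "failure_reasons.txt",
    }


parts = parse_failed_key(
    "us-f226da3c/failed/sip-catalog/UPLOAD/ITEM/240927T150228570_items_failed_rows.csv"
)
print(parts)
```

Note that the failed rows file and the failure reasons file in the summary example carry slightly different timestamps, so the timestamp alone is not a reliable way to pair them. The summary file lists both keys together.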

Review the following example of a failure reasons text file:
```
1 |5 |Expected 95 fields, was 94
```
The failure reasons file includes:
- The row number from the failed rows CSV file.
- The corresponding row number from the original CSV source file.
- The data fields that are responsible and the reason for the failure.
Note: Row numbers in the failure reasons file count data rows only; the header row is excluded. Use these numbers to locate the problematic entries. In the preceding example, row 1 of the failed rows file corresponds to data row 5 of the original source file.
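
A line of the failure reasons file can be split into its three parts as sketched below. The `FailureReason` type and `parse_failure_line` function are hypothetical names, and the sketch assumes that every line follows the pipe-separated layout of the single example shown above.

```python
from typing import NamedTuple


class FailureReason(NamedTuple):
    failed_row: int   # row number in the failed rows CSV file
    source_row: int   # row number in the original CSV source file
    reason: str       # responsible fields and reason for the failure


def parse_failure_line(line: str) -> FailureReason:
    """Parse one '<failed row> |<source row> |<reason>' line."""
    # Split on the first two pipes only, so a reason that contains "|" stays whole.
    failed_row, source_row, reason = (part.strip() for part in line.split("|", 2))
    return FailureReason(int(failed_row), int(source_row), reason)


entry = parse_failure_line("1 |5 |Expected 95 fields, was 94")
print(entry.failed_row, entry.source_row, entry.reason)
```

Both row numbers exclude the header row, so account for it when you look up the entry in a text editor or spreadsheet.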

/reprocessed

Contains files that are queued for reprocessing because of connectivity issues.

No action is required for this prefix. Files are stored under this prefix only temporarily, and Sterling Intelligent Promising retries them automatically.