Data payload corruption

Data that is transferred to complete an HTTP request can be modified unintentionally and silently. Some applications can tolerate corruption, but for sensitive data it is unacceptable. AS4 Microservice includes features that minimize undetected corruption and prevent the acceptance of data that is known to be corrupted.

As data is transferred through multiple components across the Internet, it can be corrupted in various ways. Proxy servers can introduce corruption, and corruption can occur because of errors on the disks that hold the data. Both downloaded and uploaded data can be affected.

The MD5 algorithm converts any sequence of bytes (zero or more) into a single 16-byte value that is called a hash. The algorithm is specified in IETF RFC 1321. An implementation in any programming language, on any operating system or hardware, computes the same value for the same input sequence. MD5 was designed as a cryptographic hash, and it is extremely unlikely that an accidental change to the input leaves the 16-byte output unchanged.
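The MD5 properties described above can be illustrated with a minimal Python sketch that uses the standard hashlib module. The helper name md5_digest is hypothetical and is not part of AS4 Microservice.

```python
import hashlib


def md5_digest(data: bytes) -> bytes:
    """Return the raw 16-byte MD5 digest of data (hypothetical helper)."""
    return hashlib.md5(data).digest()


# A zero-length input is valid and still yields a 16-byte hash.
empty = md5_digest(b"")
print(len(empty))   # 16
print(empty.hex())  # d41d8cd98f00b204e9800998ecf8427e

# Changing a single byte of the input changes the hash.
print(md5_digest(b"payload") != md5_digest(b"pazload"))  # True
```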

POST request

When a POST is initiated, a hash can optionally be included in the header. If the client supplies the Content-MD5 header, the server independently computes the MD5 value from the bytes of the body. If any bytes changed in transit, the value that the server computes differs from the value that the client specified in the header. Calculate the MD5 hash of the request body, for example with a command-line tool, and provide it in the following format:
Content-MD5: 5MTASjcWUgmtLbAi8AZ0jQ==
where Content-MD5 is the header name and 5MTASjcWUgmtLbAi8AZ0jQ== is the value.
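As a sketch of how a client might produce the header value, the following Python function Base64-encodes the raw 16-byte digest, which is the encoding that RFC 1864 describes. The function name content_md5_value and the sample body are assumptions for illustration only.

```python
import base64
import hashlib


def content_md5_value(body: bytes) -> str:
    """Return the Base64-encoded MD5 digest of body for a Content-MD5 header."""
    digest = hashlib.md5(body).digest()  # raw 16 bytes, not the hex string
    return base64.b64encode(digest).decode("ascii")


# Hypothetical request body; any byte sequence works the same way.
headers = {"Content-MD5": content_md5_value(b"example payload")}

# The value for an empty body is a fixed, well-known string.
print(content_md5_value(b""))  # 1B2M2Y8AsgTpgAmY7PhCfg==
```

The digest must be encoded from its raw bytes. Encoding the 32-character hexadecimal string instead yields a different, longer value that the server would not accept as a match.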
When the request arrives, the server compares the hash in the header with the hash that it calculates from the data payload. If the two hashes match, the payload is approved for storage. If they do not match, the server rejects the payload with a 400 Bad Request status. If a Content-MD5 header is not included, the storage server can generate a hash for each blob based on the following property:
StoreWithMD5Always="true"
If StoreWithMD5Always="true", the hash is stored in the metadata that is associated with the payload. An example of the hash value that is stored in the metadata is:
<entry key="md5Digest">5MTASjcWUgmtLbAi8AZ0jQ==</entry>
where 5MTASjcWUgmtLbAi8AZ0jQ== is the MD5 hash value, encoded as a Base64 string as described in IETF RFC 1864.
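The POST handling described in this section can be sketched as follows. This is not the product's implementation: the function handle_post, the 200 success status, and the choice to store the digest when the client supplies a matching header are assumptions. Only the 400 status, the StoreWithMD5Always property, and the md5Digest metadata key come from the text above.

```python
import base64
import hashlib
from typing import Optional, Tuple


def handle_post(
    body: bytes,
    content_md5: Optional[str],
    store_with_md5_always: bool,
) -> Tuple[int, Optional[dict]]:
    """Return (status, metadata) for a POST; metadata is None on rejection."""
    computed = base64.b64encode(hashlib.md5(body).digest()).decode("ascii")
    metadata = {}
    if content_md5 is not None:
        if content_md5 != computed:
            return 400, None  # body does not match the client's hash
        metadata["md5Digest"] = computed  # assumption: verified hash is kept
    elif store_with_md5_always:
        metadata["md5Digest"] = computed  # server-generated hash
    return 200, metadata  # assumption: 200 stands in for the success status


good = base64.b64encode(hashlib.md5(b"data").digest()).decode("ascii")
print(handle_post(b"data", good, False)[0])     # 200
print(handle_post(b"dat4", good, False)[0])     # 400
print(handle_post(b"data", None, False)[1])     # {}
```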

GET request

If the blob metadata does not include a hash, the GET response does not contain a Content-MD5 header. When the blob metadata does include the MD5 digest, the server returns that value in a Content-MD5 header in the response.

If a component retrieves a blob from storage and finds the Content-MD5 header in the response, the component can calculate the MD5 value as the data is being received. After the last byte of the GET response is read, the calculated digest value is compared with the value in the header. If there is a mismatch, the component rejects the response. In this scenario, the client computes and verifies the hash.
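A client-side check of this kind might look like the following sketch, which updates the hash incrementally as chunks arrive and compares it only after the last chunk. The function read_verified and the use of ValueError to signal rejection are assumptions, and the chunks iterable stands in for whatever HTTP client delivers the response body.

```python
import base64
import hashlib
from typing import Iterable, Optional


def read_verified(chunks: Iterable[bytes], content_md5: Optional[str]) -> bytes:
    """Read a response body and verify it against Content-MD5 when present."""
    hasher = hashlib.md5()
    received = bytearray()
    for chunk in chunks:
        hasher.update(chunk)  # hash is computed while data is received
        received.extend(chunk)
    if content_md5 is not None:
        computed = base64.b64encode(hasher.digest()).decode("ascii")
        if computed != content_md5:
            raise ValueError("Content-MD5 mismatch, response rejected")
    return bytes(received)


expected = base64.b64encode(hashlib.md5(b"hello").digest()).decode("ascii")
print(read_verified([b"he", b"llo"], expected))  # b'hello'
```

When the response carries no Content-MD5 header, the sketch returns the body without verification, which mirrors the case where the blob metadata does not include a hash.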

Additionally, the storage component computes the hash on the data that it reads from the blob and compares it with the hash in the blob metadata. If there is a mismatch, the storage component withholds the last byte from the client. Because the response is then incomplete, the client is forced to reject it.
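One way to picture the withheld last byte is a generator that always keeps one byte in reserve until the whole blob has been hashed. This is only a sketch under assumptions: the name stream_blob is hypothetical, and raising IOError stands in for however the storage component actually ends the response.

```python
import base64
import hashlib
from typing import Iterable, Iterator, Optional


def stream_blob(chunks: Iterable[bytes], stored_md5: Optional[str]) -> Iterator[bytes]:
    """Yield blob data, releasing the final byte only if the hash matches."""
    hasher = hashlib.md5()
    held = b""  # the most recent byte, not yet sent to the client
    for chunk in chunks:
        if not chunk:
            continue
        hasher.update(chunk)
        data = held + chunk
        if len(data) > 1:
            yield data[:-1]
        held = data[-1:]
    computed = base64.b64encode(hasher.digest()).decode("ascii")
    if stored_md5 is not None and computed != stored_md5:
        # Assumption: the exception models ending the response early.
        raise IOError("blob hash does not match metadata, last byte withheld")
    if held:
        yield held


stored = base64.b64encode(hashlib.md5(b"hello").digest()).decode("ascii")
print(b"".join(stream_blob([b"he", b"llo"], stored)))  # b'hello'
```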

DELETE request

A DELETE request with a Content-MD5 header succeeds only if the hash in the blob metadata matches the hash in the header. A request whose Content-MD5 header does not match fails. If the request is repeated without the Content-MD5 header, the request succeeds, even though the blob metadata includes the MD5 digest.
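The DELETE rules above reduce to a small decision function, sketched here. The name may_delete is hypothetical. The source does not say what happens when the header is present but the blob metadata has no hash, so treating that case as a failure is an assumption.

```python
from typing import Optional


def may_delete(header_md5: Optional[str], metadata_md5: Optional[str]) -> bool:
    """Decide whether a DELETE request is allowed to proceed."""
    if header_md5 is None:
        return True  # no Content-MD5 header: the hash is not checked
    # Assumption: a header with no stored hash to compare against fails.
    return header_md5 == metadata_md5


stored = "5MTASjcWUgmtLbAi8AZ0jQ=="
print(may_delete(stored, stored))                      # True
print(may_delete("1B2M2Y8AsgTpgAmY7PhCfg==", stored))  # False
print(may_delete(None, stored))                        # True
```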