Large content upload streaming

You can upload large content from the GraphQL API to the Content Platform Engine server.

By default, the GraphQL API attempts to stream the content that is uploaded from the GraphQL server to the Content Engine server. Before V5.6.0, the uploaded content was loaded into memory before streaming to the Content Platform Engine, leading to high memory use or out-of-memory conditions for large content.

The current default behavior, which supports the streaming of content, differs from the previous behavior in a few ways. Most requests are not affected by these differences, and those that are can be easily remedied. You can also revert to the previous behavior of loading content in memory, either for a single request or as the server-wide default.

A multi-part form POST for content upload consists of the following parts:

One or more pre-defined parts, and
The accompanying content upload part or parts.

The following pre-defined parts are recognized by the GraphQL server:

graphql or operations: A part named either "graphql" or "operations" contains JSON. The JSON holds the main graphql text in the "query" field and the variable values in the "variables" field. This is one of the two ways in which the main graphql text can be sent in a multi-part form payload.
query: The "query" part contains the main graphql text, not JSON. This is the other way that the main graphql text can be specified. It is an alternative to the graphql or operations part and contains only the graphql text. Other data such as "variables" would be in a separate part.
variables: The "variables" part is used when the main graphql text is in the query part. It contains the values that are assigned to the variables referenced by the main graphql text.
operationName: The "operationName" part is used when the main graphql text is in the query part. The graphql text can hold multiple named operations, but only one of them can be executed in a single request. The operationName value names the operation to execute.
map: Defines a map of file upload parts to locations in the graphql that reference those parts through variables. The Content Services GraphQL API does not use this map but it is recognized and parsed by the GraphQL server. The API assumes that all variables representing uploaded content map to the corresponding part by name alone.
extensions: The "extensions" part is used when the main graphql text is in the query part. The Content Services GraphQL API does not use these extensions, but it is a part recognized and parsed by the GraphQL server.
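
As an illustration of the parts described above, the following minimal sketch assembles a multi-part form body with an "operations" part followed by one content upload part. The helper names (encode_part, build_form) are hypothetical, the graphql text is a placeholder rather than a real mutation, and setting the variable to null in the JSON is an assumption. The upload part is named after the variable, since the API maps variables to parts by name alone.

```python
import json
import uuid


def encode_part(name, body, filename=None, content_type=None):
    """Encode one multipart/form-data part (headers plus body) as bytes."""
    disposition = f'form-data; name="{name}"'
    if filename is not None:
        disposition += f'; filename="{filename}"'
    headers = [f"Content-Disposition: {disposition}"]
    if content_type is not None:
        headers.append(f"Content-Type: {content_type}")
    head = "\r\n".join(headers).encode("utf-8")
    return head + b"\r\n\r\n" + body


def build_form(parts, boundary):
    """Join already-encoded parts, in the order given, into one form body."""
    delimiter = b"--" + boundary.encode("ascii")
    chunks = [delimiter + b"\r\n" + part + b"\r\n" for part in parts]
    chunks.append(delimiter + b"--\r\n")  # closing delimiter
    return b"".join(chunks)


# "operations" carries JSON with the graphql text and the variables.
# The query string is a placeholder; the null variable value is an assumption.
operations = json.dumps({
    "query": "<main graphql text goes here>",
    "variables": {"contvar": None},
}).encode("utf-8")

boundary = uuid.uuid4().hex
body = build_form(
    [
        # Pre-defined part first ...
        encode_part("operations", operations, content_type="application/json"),
        # ... then the content upload part, named after the variable.
        encode_part("contvar", b"<file bytes>", filename="Large_file.pdf",
                    content_type="application/pdf"),
    ],
    boundary,
)
```

The body would be sent with a Content-Type header of multipart/form-data that carries the same boundary value.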

The parts in the multi-part form POST need to follow this order for proper large content upload streaming:

Pre-defined parts must come before any of the content upload parts so that they are processed first.
The parts for the content elements being uploaded should follow the pre-defined parts. If there is more than one content element upload part, those parts should follow the order in which the content elements are referenced in the GraphQL text.
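
The ordering rule above can be sketched as a small client-side helper that arranges part names before a request is built. This is only an illustration: the function name order_parts and the reference_order argument (the order in which upload variables appear in the GraphQL text) are hypothetical and are not part of the API.

```python
# Names of the pre-defined parts recognized by the GraphQL server.
PREDEFINED_PARTS = (
    "graphql", "operations", "query", "variables",
    "operationName", "map", "extensions",
)


def order_parts(part_names, reference_order):
    """Return part names with pre-defined parts first, then upload parts
    sorted by the order in which the GraphQL text references them."""
    predefined = [n for n in part_names if n in PREDEFINED_PARTS]
    uploads = [n for n in part_names if n not in PREDEFINED_PARTS]

    rank = {name: i for i, name in enumerate(reference_order)}
    unknown = [n for n in uploads if n not in rank]
    if unknown:
        raise ValueError(f"upload parts not referenced in the GraphQL text: {unknown}")

    uploads.sort(key=rank.__getitem__)
    return predefined + uploads


ordered = order_parts(["doc2", "variables", "doc1", "query"], ["doc1", "doc2"])
```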

Processing a pre-defined part

When parsing the multi-part content, the server looks for one of the graphql, operations, or query parts first. If a part that is not pre-defined is found before the part containing the graphql text, that part is assumed to represent an uploaded content element. The part is set aside, and parsing of the multi-part content continues until the graphql text is located.

When processing pre-defined parts, the GraphQL server sets aside a part that contains the uploaded content as follows:

The part is first set aside in memory, up to a maximum in-memory size. If the maximum in-memory size is reached, the part spills over to a file.
Parts that spill over to a file are set aside until the maximum file size is reached, at which point an error is returned. The file is saved to temp file storage. Usually, the temp file is deleted when the content is transferred to the Content Engine server. If it is not deleted as part of the transfer, it is deleted automatically at a later point.
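
The memory-then-file behavior described above resembles a spooled temporary file. The following sketch illustrates the idea with Python's tempfile.SpooledTemporaryFile; it is not the server's implementation, and the names set_aside, SetAsideLimitError, max_in_memory, and max_file_size are hypothetical.

```python
import tempfile


class SetAsideLimitError(Exception):
    """Raised when a set-aside part exceeds the maximum file size."""


def set_aside(chunks, max_in_memory, max_file_size):
    """Buffer a part in memory, spill to a temp file past max_in_memory,
    and fail once the total size exceeds max_file_size."""
    # SpooledTemporaryFile keeps data in memory until max_size is exceeded,
    # then transparently rolls over to a temporary file on disk.
    buffer = tempfile.SpooledTemporaryFile(max_size=max_in_memory)
    total = 0
    for chunk in chunks:
        total += len(chunk)
        if total > max_file_size:
            buffer.close()  # closing also removes any temp file
            raise SetAsideLimitError(
                f"set-aside part exceeds {max_file_size} bytes")
        buffer.write(chunk)
    buffer.seek(0)  # rewind so the content can be read for transfer
    return buffer


part = set_aside([b"abc", b"defgh"], max_in_memory=4, max_file_size=100)
data = part.read()
part.close()
```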

Processing a content upload part

Once a part that contains the main graphql text is located and a part that represents an uploaded content element is encountered, the multi-part content is no longer parsed for pre-defined parts. The remaining parts representing uploaded content elements are streamed directly to the Content Engine server. The exception is any part that had to be set aside: its set-aside contents are transferred to the Content Engine server instead.

If a pre-defined part is out of order and is encountered after processing of the uploaded content element parts has started, an error is returned. For example, this occurs when the main graphql text is in the query part but another needed part, such as variables, comes after some of the content element upload parts. To resolve this, ensure that all pre-defined parts come before the content element upload parts.
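
The parsing rules in this section can be sketched as a check over the sequence of part names. This is one reading of the rules above, not the server's code, and the function name check_part_order is hypothetical. Upload parts seen before the graphql text are reported as set aside, and a pre-defined part seen after streaming has started raises an error.

```python
PREDEFINED_PARTS = {
    "graphql", "operations", "query", "variables",
    "operationName", "map", "extensions",
}
# Parts that can carry the main graphql text.
GRAPHQL_TEXT_PARTS = {"graphql", "operations", "query"}


def check_part_order(part_names):
    """Return the upload parts that would need to be set aside.
    Raise ValueError if a pre-defined part follows a streamed upload part."""
    seen_text = False   # graphql text has been located
    streaming = False   # an upload part was met after the graphql text
    set_aside = []
    for name in part_names:
        if name in PREDEFINED_PARTS:
            if streaming:
                raise ValueError(f"pre-defined part '{name}' is out of order")
            if name in GRAPHQL_TEXT_PARTS:
                seen_text = True
        elif seen_text:
            streaming = True          # this part can be streamed directly
        else:
            set_aside.append(name)    # arrived before the graphql text
    return set_aside


well_ordered = check_part_order(["query", "variables", "doc1", "doc2"])
early_upload = check_part_order(["doc1", "operations", "doc2"])
```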

When you upload multiple content elements, make sure that the parts representing the content elements are in order. Additionally, you should specify the file content type and the file name as the values of the contentType and retrievalName fields. A sample fragment of the graphql text looks like this:

contentElements:{
  replace: [
    {
      type: CONTENT_TRANSFER
      contentType: "application/pdf"
      subContentTransfer: {
        content:$contvar
        retrievalName: "Large_file.pdf"
      }
      . . .

If these field values are not specified when multiple content elements are uploaded, one or more parts must be set aside in memory or in a file so that the server can parse ahead to the next part and use the file part headers. If the set-aside parts in the file reach the maximum file size, an error is returned. When you specify the values for the contentType and retrievalName fields, the content is always properly streamed from the GraphQL server to the Content Engine server.
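
To make sure both fields are always filled in, a client could derive them from the local file name when it builds the graphql text. The sketch below is a hypothetical helper (content_element_fragment is not part of the API). It emits only the fields shown in the sample above, falls back to "application/octet-stream" as an assumption when the type cannot be guessed, and does not escape quotes in file names.

```python
import mimetypes
import os


def content_element_fragment(path, variable):
    """Build one content element entry with contentType and retrievalName
    derived from the file name. Only fields from the sample are emitted."""
    content_type, _ = mimetypes.guess_type(path)
    if content_type is None:
        content_type = "application/octet-stream"  # assumed fallback
    name = os.path.basename(path)
    return (
        "{ type: CONTENT_TRANSFER "
        f'contentType: "{content_type}" '
        f"subContentTransfer: {{ content: ${variable} "
        f'retrievalName: "{name}" }} }}'
    )


fragment = content_element_fragment("uploads/Large_file.pdf", "contvar")
```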