Using batch processing in IBM App Connect

Batch processes are optimized for handling larger volumes of data than the standard retrieve action in IBM® App Connect Enterprise as a Service. A flow starts a batch process, which runs as a separate asynchronous process with different operational limits to the flow.

You can use a batch process to copy records from one external system to another, or to complete an action that needs to be repeated many times. You might also want to use a batch process in your flow if you want to retrieve more records than the retrieve action allows. The standard retrieve action has a limit of 1000 records, whereas the batch retrieve node has no limit, other than any that you choose to set. A batch process also differs from the standard retrieve action because it's an asynchronous process that the flow initiates, but it doesn't run as part of the flow. A batch process completes at a different time to your main flow.

A batch process also has different operational limits to the main flow. Therefore, you're less likely to exceed memory constraints when you process many records if you use a batch process, rather than a For each node. You can use batch processes in trial and enterprise subscriptions. If you're on the flow runs price plan, one invocation of a batch node counts as one flow run, regardless of the number of records that are processed. For more information, see Pricing plans.

You can use a Batch process node with a range of applications, including Amazon S3, Confluence, IBM Db2®, Microsoft applications, Oracle, Salesforce, and SAP.

The following example uses a batch process to retrieve lead records from one Salesforce instance and then update or create records in another Salesforce instance.

What to consider first

When you choose the actions to include in your batch process, consider what happens if a request to the external system fails and App Connect retries it. Some application actions are idempotent and others are not. An idempotent action leaves the system in the same state whether it completes once or many times. If a nonidempotent action completes more than once, the system might end up in a different state than if the action completed only once. For example, if you use the Salesforce Create lead action in your batch process, App Connect might repeat the action for a particular record, perhaps because the Salesforce system took too long to respond. Because Create lead is not an idempotent action, you might find duplicate leads when the batch process is complete. Instead, you can choose an idempotent action such as Update or create lead, and configure the action so that it creates a lead only if an equivalent lead doesn't exist in your Salesforce account. It then doesn't matter whether the action is tried once or five times: you get exactly the number of leads that you retrieved from your source system.
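
The difference between the two kinds of action can be illustrated with a small sketch. It does not call Salesforce or App Connect. The in-memory lists, the function names, and the use of an email field as the matching key are all assumptions made for illustration.

```python
# Hypothetical in-memory stand-ins for a target system; not a Salesforce API.

def create_lead(store, lead):
    """Non-idempotent: every call adds another record."""
    store.append(dict(lead))


def update_or_create_lead(store, lead, key="email"):
    """Idempotent: update the matching record, or create it if none exists."""
    for existing in store:
        if existing[key] == lead[key]:
            existing.update(lead)
            return
    store.append(dict(lead))


lead = {"email": "ada@example.com", "last_name": "Lovelace"}

created = []
upserted = []
for _ in range(3):  # simulate one attempt followed by two retries
    create_lead(created, lead)
    update_or_create_lead(upserted, lead)

print(len(created), len(upserted))  # prints "3 1"
```

After three attempts, the create-only store holds three copies of the lead, whereas the update-or-create store holds one.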

Find or create what you need

Note: If you already have an App Connect instance, and App Connect is connected to the appropriate applications for your batch process, you can skip this step.

For this example, you need a Salesforce account with some data that you can retrieve. If you create a free Salesforce account to try out App Connect, make sure that you create a Developer account rather than a Trial account. If you connect to App Connect with a Trial account, the Salesforce events don't work.
You need the authentication details for any other applications that you want to access in your flow (if App Connect isn't already connected to your accounts). You can connect to your accounts on the App Connect Designer Catalog page, or you can connect as you add each application to your flow.
Some applications need some extra information to be able to connect to App Connect. If you need help with finding this information, see "How to" guides for apps.

Create your flow

(App Connect automatically saves your changes as you go. If you move away from the flow at any point, the flow is saved as a draft flow that you can come back to later.)

Create or edit the flow that you want to add a batch process to. For example, on the Designer home page, click Create an event-driven flow.
Enter a name that identifies the purpose of your flow.
Select your first application (source) and an event to trigger the actions in the rest of your flow. This example copies leads from one Salesforce account to another, and runs the flow on a specific date. In this case, the Scheduler Schedule flow event from the Toolbox starts the flow.
To add a batch process action to your flow, click the plus icon, click Toolbox, then select Batch process. A branch off the main flow contains a Batch process box to contain all the actions in your batch process.
Select the application that you want to extract data from, then expand the object that you want to retrieve, and select the retrieve link for that object. For this example, to retrieve Salesforce leads, select Retrieve leads.
To retrieve records that meet certain criteria, add one or more conditions. Alternatively, to retrieve all records, delete the condition by clicking Remove condition (x).
To set a maximum number of records to retrieve, select Specify maximum number of items to retrieve. You can either type in a number or click the icon to set the limit to the maximum number of records.
In Specify a log ID for each record of the batch, define an ID that identifies each record in the batch in the log.
Messages are written to the log if you include a Log node in your flow or if an error occurs during the batch process. To identify specific records that aren't processed successfully, you can define the ID that appears in the log. Make sure that the ID is unique for each record in the batch, and is a maximum of 256 characters long. (If the ID is longer than 256 characters, it is truncated in the log.) The ID can consist of fields that are mapped from the source application, text, and JSONata expressions. For more information, see JSONata.
For example, the text (Main batch) is added to identify the main batch process in a flow. The ID also contains mappings to the lead ID and the first and last name fields.
Figure 1. A Batch record ID set to (Main batch) Lead ID First Name Last Name

If you add a Log node to your flow, or errors occur during batch processing, the ID that you defined appears in the batch-record-id_str column of the log.
To process the records that are retrieved, you can add more actions or logic after the retrieve action. In the Batch process box, click the plus icon, expand a target application, then select an action. All actions that you add in the Batch process box are completed for each record that is retrieved at the beginning of the batch process. For example, you might select the Salesforce Update or create lead action to update or create a lead in a second Salesforce account.
Figure 2. An update or create node is used as an idempotent action for data copy

Remember: The Update or create lead action is a better choice than the Create lead action for this kind of data copy process. This action is idempotent, which reduces the chances of getting duplicate leads. You can configure Where condition fields to identify whether a record exists to be updated, or does not exist so must be created.

Figure 3. Condition fields that you can map for the Salesforce Update or create lead action
Map appropriate information between your retrieved records and your action. In the available inputs for mapping, applications that are part of the batch process are prefixed with "Batch process".
Figure 4. Batch process Salesforce lead fields available for mapping
You can add one or more batch completion actions at the end of the batch process, which can complete actions based on the status of the batch process. Make sure that you add any batch completion actions after the batch completion icon in the batch process flow.
In the following example, an If node is added after the batch completion icon. The If node contains different actions that are completed depending on whether the batch process completed successfully, failed, was stopped, or timed out.
You can also add actions to the main flow, outside the batch process. These actions happen after the batch process starts, but they run independently of the batch process. Therefore, these actions can finish before or after the batch actions complete. For this reason, you can't map data from the batch process branch into actions in the main flow. For example, you can add a Slack Send message action that posts a message on your chosen Slack channel to indicate that a batch process has started.
Figure 5. A Slack "Send message" action is added to the main flow after the Batch process branch
If you're on the VPC hours plan, when you finish defining your flow, you can start it to test the batch process. Click Test flow, then choose how to test your flow. For more information, see Testing flows during development.
If you're on the flow runs plan, deploy your flow by following the instructions in Deploying integrations on the flow runs plan.
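
The log ID that is described in the preceding steps is composed in the Designer from text, mapped fields, and JSONata expressions. The following sketch only mimics the resulting behavior in Python, to show how an ID that is longer than 256 characters appears in the log. The field names (Id, FirstName, LastName) and both function names are hypothetical.

```python
MAX_LOG_ID_LENGTH = 256  # limit stated in the steps above


def build_log_id(lead, prefix="(Main batch)"):
    """Join fixed text and record fields; the field names are hypothetical."""
    return f"{prefix} {lead['Id']} {lead['FirstName']} {lead['LastName']}"


def as_logged(log_id):
    """Mimic the log, which truncates IDs longer than 256 characters."""
    return log_id[:MAX_LOG_ID_LENGTH]


lead = {"Id": "00Q000000000001", "FirstName": "Ada", "LastName": "Lovelace"}
print(as_logged(build_log_id(lead)))
```

Because truncation removes the end of the ID, placing the unique field near the start helps to keep truncated IDs distinguishable.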

Examining a running batch process

If you're on the VPC hours price plan, you can examine your running batch process when you test it on the Designer dashboard. After you deploy a flow that contains a batch process, you can use the public API to view the status and to pause, resume, or stop the file extraction part of a batch process. For more information, see Administering batch processes with the API. On the App Connect Designer dashboard, a tile represents your flow and shows whether a batch is running.

Figure 6. A tile on the dashboard that shows a batch in progress

To see a summary of completed and running batches for a flow, open the tile's options menu [⋮] and click View batches. You can get information about a batch process for up to 6 hours after the batch operation completes.

You can use options in the Actions column to stop a running batch, or to view the logs for a batch process in the built-in log viewer. For more information, see Viewing log messages in the log viewer.

Administering batch processes with the API

You can use the public API for App Connect Enterprise as a Service to get the status of your batch processes and to pause, resume, and stop the file extraction part of a batch process. You can get information about a batch process for up to 6 hours after the batch operation completes. To find out how to get an access token to access the App Connect Enterprise as a Service API, see Accessing the API.

You can use the following routes in the API to administer your batch processes.

To get details of all batch processes that are running or that completed within the last 6 hours on a particular integration runtime, use a get method with the following route. Replace integrationRuntimeName with the name of the integration runtime where your batch process is running.
```
/api/v1/integration-runtimes/integrationRuntimeName/batches
```
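As a minimal sketch, the request for this route might be built as follows with the Python standard library. The base URL, the Authorization header format, and the function name are assumptions; take the real endpoint and required headers from Accessing the API. The sketch builds the request without sending it.

```python
import urllib.request
from urllib.parse import quote

# Assumption: placeholder host; use the API endpoint for your own instance.
BASE_URL = "https://api.example.com"


def list_batches_request(runtime_name, token):
    """Build (but do not send) a GET request for all batches on a runtime."""
    url = (
        f"{BASE_URL}/api/v1/integration-runtimes/"
        f"{quote(runtime_name, safe='')}/batches"
    )
    return urllib.request.Request(
        url,
        method="GET",
        # Assumption: bearer-token header; other headers might be required.
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )


req = list_batches_request("my-runtime", "<access-token>")
print(req.get_method(), req.full_url)
```

When the placeholders are replaced with real values, the request can be sent with urllib.request.urlopen(req).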
To get the details of a specific running or completed batch process, use a get method with the following route and specify your integration runtime, batch ID, and flow ID. If the batch process completed more than 6 hours ago, a 404 not found error is returned.
```
/api/v1/integration-runtimes/integrationRuntimeName/batches/batchID?flow_id=flowID
```
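The 404 behavior can be handled explicitly in a client. The following sketch assumes a placeholder base URL, a bearer-token header, and a JSON response body, none of which are confirmed here. The function names and the opener parameter are hypothetical; the opener exists only so that the function can be exercised without a network call.

```python
import json
import urllib.error
import urllib.request
from urllib.parse import quote, urlencode

BASE_URL = "https://api.example.com"  # assumption: placeholder host


def batch_url(runtime_name, batch_id, flow_id):
    """Build the URL for one batch, with the flow ID as a query parameter."""
    path = (
        f"/api/v1/integration-runtimes/{quote(runtime_name, safe='')}"
        f"/batches/{quote(batch_id, safe='')}"
    )
    return f"{BASE_URL}{path}?{urlencode({'flow_id': flow_id})}"


def get_batch(runtime_name, batch_id, flow_id, token,
              opener=urllib.request.urlopen):
    """Return the parsed batch details, or None if the API returns 404."""
    req = urllib.request.Request(
        batch_url(runtime_name, batch_id, flow_id),
        method="GET",
        headers={"Authorization": f"Bearer {token}"},  # assumed header format
    )
    try:
        with opener(req) as resp:
            return json.load(resp)  # assumes a JSON response body
    except urllib.error.HTTPError as err:
        if err.code == 404:
            # For example, the batch completed more than 6 hours ago.
            return None
        raise
```

Returning None for a 404 lets the caller distinguish a batch whose details are no longer available from other failures, which are raised as exceptions.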
To pause a running batch process, use a post method with the following route and specify your integration runtime and the batch ID. The pause action applies to extraction (retrieving records) from the source system only. You can't pause the processing of those retrieved records.
```
/api/v1/integration-runtimes/integrationRuntimeName/batches/batchID/pause
```
To resume a paused batch process, use a post method with the following route and specify your integration runtime and the batch ID. The resume action applies to extraction (retrieving records) from the source system only. When you resume a paused batch, the batch process continues extracting records from the point at which you paused it.
```
/api/v1/integration-runtimes/integrationRuntimeName/batches/batchID/resume
```
To stop a running or paused batch process, use a post method with the following route and specify your integration runtime and the batch ID:
```
/api/v1/integration-runtimes/integrationRuntimeName/batches/batchID/stop
```
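The pause, resume, and stop routes differ only in their final path segment, so one helper can build all three requests. This is a sketch under assumptions: the base URL is a placeholder, the bearer-token header and the empty request body are not confirmed by this topic, and the function name is hypothetical. The sketch builds the request without sending it.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.example.com"  # assumption: placeholder host
ACTIONS = ("pause", "resume", "stop")  # final path segments of the routes above


def batch_action_request(runtime_name, batch_id, action, token):
    """Build (but do not send) a POST request that pauses, resumes, or stops a batch."""
    if action not in ACTIONS:
        raise ValueError(f"action must be one of {ACTIONS}, got {action!r}")
    url = (
        f"{BASE_URL}/api/v1/integration-runtimes/{quote(runtime_name, safe='')}"
        f"/batches/{quote(batch_id, safe='')}/{action}"
    )
    return urllib.request.Request(
        url,
        data=b"",  # assumption: the routes accept an empty body
        method="POST",
        headers={"Authorization": f"Bearer {token}"},  # assumed header format
    )


req = batch_action_request("my-runtime", "my-batch-id", "pause", "<access-token>")
print(req.get_method(), req.full_url)
```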

To view the API documentation, see OpenAPI document.