Using batch processing in IBM App Connect

Batch processes are optimized for handling much larger volumes of data than the standard retrieve action in IBM® App Connect on IBM Cloud. A batch process is initiated by a flow and runs as a separate asynchronous process with different operational limits to the flow.

You can use a batch process to copy records from one external system to another, or to carry out an action that needs to be repeated many times. You might also want to use a batch process in your flow if you want to retrieve more records than the retrieve action will allow. The standard retrieve action has a limit of 1000 records, whereas the batch retrieve node has no limit, other than any that you choose to set. A batch process also differs from the standard retrieve action because it's an asynchronous process that's initiated by your flow, but doesn't run as part of it. A batch process completes at a different time to your main flow.
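
The relationship between the main flow and the batch process can be pictured with a small Python sketch. This is only an analogy, not App Connect code: the names `main_flow`, `batch_process`, and `batch_may_finish` are hypothetical, and the `threading.Event` exists only to make the demonstration deterministic. In practice, the batch process can finish before or after the main flow.

```python
import threading

events = []                           # records the order in which things happen
batch_may_finish = threading.Event()  # demo-only gate so the output is repeatable


def batch_process(records):
    # Runs separately from the main flow (hypothetical stand-in for a batch).
    batch_may_finish.wait()
    for record in records:
        events.append(f"batch processed {record}")
    events.append("batch complete")


def main_flow():
    # The flow initiates the batch process, then carries on without waiting.
    worker = threading.Thread(target=batch_process, args=(["lead-1", "lead-2"],))
    worker.start()
    events.append("main flow: batch started")
    events.append("main flow: finished")
    return worker


worker = main_flow()
batch_may_finish.set()  # let the batch run to completion
worker.join()
print(events)
```

The main flow never receives the batch results, which mirrors the rule that data from the batch process cannot be mapped into actions in the main flow.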

A batch process also has different operational limits to the main flow, so you're less likely to exceed memory constraints when processing many records if you use a batch process, rather than a For each node. For more details about the operational limits of App Connect on IBM Cloud, see What are the operational limits for App Connect?.

You can use a Batch process node with a range of applications, including Amazon S3, Confluence, IBM Db2®, Microsoft applications, Oracle, Salesforce, and SAP.

The following example uses a batch process to retrieve lead records from one Salesforce instance and then update or create records in another Salesforce instance.

What should I consider first?

Consider the following behavior and restrictions when you use the Batch process node.
  • The Batch process node is available in App Connect on IBM Cloud. This means that you can take advantage of the logs that are available with IBM Cloud®.
  • The number of batch processes that you can run depends on the plan that you've purchased. If you try to run more batch processes concurrently than your plan will allow, you'll see a message in the IBM Cloud logs that says something like "The batch process could not be started because the maximum number of concurrent running batch processes for the instance has been reached".
  • You'll be billed for the flow invocation, plus any outbound data that your flow sends to external applications.
  • When you're choosing which actions to include in your batch process, consider what would happen if the request to the external system failed for some reason, and App Connect retried the request.

    Some application actions are idempotent (they can be carried out many times without different outcomes) and others are not (if they were carried out more than once, the system would be in a different state than if the action was carried out only once). For example, if you use the Salesforce Create lead action in your batch process, this action could be repeated by App Connect for a particular record, say because the Salesforce system took too long to respond. Because Create lead is not an idempotent action, you could potentially find that you have some duplicate leads when the batch process is complete. Instead, we recommend that you choose an idempotent action such as Update or create lead, and configure the action so that it will only create a lead if an equivalent lead doesn't already exist in your Salesforce account. This way, it doesn't matter whether the action is tried only once, or five times – you will have the exact number of leads that you retrieved from your source system.
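
The difference between the two kinds of action can be shown with a minimal Python sketch. It uses a hypothetical in-memory `TargetSystem` class rather than any Salesforce or App Connect API, and it assumes that leads are matched on an `email` field. Each action is deliberately applied three times to simulate retries.

```python
from typing import Callable, Dict, List

Lead = Dict[str, str]


class TargetSystem:
    """Hypothetical in-memory stand-in for the target application."""

    def __init__(self) -> None:
        self.leads: List[Lead] = []

    def create_lead(self, lead: Lead) -> None:
        # Not idempotent: every call adds another record.
        self.leads.append(dict(lead))

    def update_or_create_lead(self, lead: Lead, match_field: str = "email") -> None:
        # Idempotent: update the matching lead if it exists, otherwise create it.
        for existing in self.leads:
            if existing[match_field] == lead[match_field]:
                existing.update(lead)
                return
        self.leads.append(dict(lead))


def run_with_retries(action: Callable[[Lead], None], lead: Lead, attempts: int) -> None:
    # Simulates a request that is retried: the action is applied on every attempt.
    for _ in range(attempts):
        action(lead)


source_leads = [
    {"email": "ada@example.com", "name": "Ada"},
    {"email": "alan@example.com", "name": "Alan"},
]

non_idempotent = TargetSystem()
idempotent = TargetSystem()
for lead in source_leads:
    run_with_retries(non_idempotent.create_lead, lead, attempts=3)
    run_with_retries(idempotent.update_or_create_lead, lead, attempts=3)

print(len(non_idempotent.leads))  # 6: each retry created a duplicate
print(len(idempotent.leads))      # 2: same as the number of source leads
```

The match field plays the role of the condition that you configure on the Update or create lead action: it decides whether a record already exists.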

Find or create everything you need

Note: If you already have an App Connect instance, and App Connect is connected to the appropriate applications for your batch process, you can skip this step.
  • Obtain an App Connect on IBM Cloud service. You can use an App Connect service that you already have, or you can sign up for a new one.

    In this example, we use a new free App Connect Lite plan service.

  • For this example, you need a Salesforce account with some data that you can retrieve. If you create a free Salesforce account to try out App Connect, make sure that you create a Developer account rather than a Trial account. If you connect to App Connect with a Trial account, the Salesforce events don't work.
  • You need the authentication details for any other applications that you want to access in your flow (if you haven't already connected App Connect to your accounts). You can connect to your accounts on the App Connect Designer Catalog page, or you can connect as you add each application to your flow.

    Some applications need some extra information to be able to connect to App Connect. If you need help finding this information, see "How to" guides for apps.

Next, create your flow

(App Connect automatically saves your changes as you go. If you move away from the flow at any point, the flow is saved as a draft flow that you can come back to later.)

  1. Create or edit the flow that you want to add a batch process to. For example, on the Designer home page, click Create an event-driven flow.
  2. Enter a name that identifies the purpose of your flow.
  3. Select your first application (source), then select an event to trigger the actions in the rest of your flow. For this example, we want to copy leads from one Salesforce account to another, and to run the flow on a specific date. In this case, we choose the Scheduler Schedule flow event from the Toolbox to start the flow.
  4. To add a batch process action to your flow, click the plus icon, click Toolbox, then select Batch process. This creates a branch off the main flow with a Batch process box, to contain all the actions that make up your batch process.
  5. Select the application that you want to extract data from, then expand the object that you want to retrieve, and select the retrieve link for that object. For this example, to retrieve Salesforce leads, select Retrieve leads.
  6. If you want to retrieve records that meet certain criteria, add one or more conditions. Alternatively, if you want to retrieve all records, delete the condition by clicking the cross to the right of the condition field.
  7. To set a maximum number of records to retrieve, select Specify maximum number of items to retrieve. You can either type in a number or click the icon to set the limit to the maximum possible number of records.
  8. To define an ID that identifies each record in the batch in the log, select Specify a log ID for each record of the batch and define the ID.

    Messages are written to the log if you include a Log node in your flow or if an error occurs during the batch process. To identify specific records that aren't processed successfully, you can define the ID that appears in the log. Make sure that the ID is unique for each record in the batch, and is a maximum of 256 characters long. (If the ID is longer than 256 characters, it is truncated in the log.) The ID can consist of fields that are mapped from the source application, free text, and JSONata expressions. For more information, see JSONata.

    For example, in the following figure, the free text (Main batch) is added to identify the main batch process in a flow. The ID also contains mappings to the Lead ID and first and last name fields.
    Figure 1. A Batch record ID set to (Main batch) Lead ID First Name Last Name

    If you add a Log node to your flow, or errors occur during batch processing, the ID that you defined appears in the batch-record-id_str column of the log.

  9. To process the records that are retrieved, you can add more actions or logic after the retrieve action. In the Batch process box, click the plus icon, expand a target application, then select an action. All actions that you add in the Batch process box are completed for each record that has been retrieved at the beginning of the batch process. For example, you might select the Salesforce Update or create lead action to update or create a lead in a second Salesforce account.
    Figure 2. An update or create node used as an idempotent action for data copy
    The graphic shows a screenshot of the two Salesforce actions within the Batch process box.

    Remember: The Update or create lead action is a better choice than the Create lead action for this kind of data copy process because it is idempotent and therefore reduces the chances of duplicate leads being created. You can configure Where condition fields to identify whether a record already exists and must be updated, or does not exist and must be created.
    Figure 3. Condition fields that you can map for the Salesforce / Update or create lead action

  10. Map appropriate information between your retrieved records and your action. In the available inputs for mapping, applications that are part of the batch process are prefixed with "Batch process".
    Figure 4. Batch process Salesforce lead fields available for mapping
    The graphic shows a screenshot of the fields that you can map from the Salesforce lead.

  11. You can add one or more batch completion actions at the end of the batch process, which can complete actions based on the status of the batch process. Make sure that you add any batch completion actions after the batch completion icon in the batch process flow.
    In the following example, an If node is added after the batch completion icon, with different actions that are completed depending on whether the batch process completed successfully, failed, was stopped, or timed out.
    An If node added after the batch completion icon. If the status of the batch is complete, an email is sent; if the batch failed, a different email is sent; if the batch was stopped, a spreadsheet is updated; and if the batch process timed out, a message is posted to Slack

  12. You can also add actions to the main flow, outside the batch process. These actions happen after the batch process has started, but run independently of the batch process. Therefore, these actions can finish before or after the batch actions complete. For this reason, you can't map data from the batch process branch into actions in the main flow. For example, you can add a Slack Send message action that posts a message on your chosen Slack channel to tell you that a batch process has started.
    Figure 5. A Slack "Send message" action is added to the main flow after the Batch process branch

  13. When you finish defining your flow, click Start flow, then click Dashboard to exit the flow and return to the dashboard.
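
The rules for the log ID in step 8 (unique for each record, at most 256 characters, truncated if longer) can be sketched in Python. In App Connect you define the ID in the node by using mappings, free text, and JSONata, so this is only an illustration of the rules. The function names and the `Id`, `FirstName`, and `LastName` keys are assumptions made for the example.

```python
from typing import Dict, Iterable, Set

MAX_LOG_ID_LENGTH = 256  # longer IDs are truncated in the log


def build_batch_record_id(lead: Dict[str, str], prefix: str = "(Main batch)") -> str:
    """Combine free text with mapped fields, then truncate as the log would."""
    parts = [prefix, lead["Id"], lead["FirstName"], lead["LastName"]]
    return " ".join(parts)[:MAX_LOG_ID_LENGTH]


def find_duplicate_ids(ids: Iterable[str]) -> Set[str]:
    """Return IDs that occur more than once; a good log ID scheme yields none."""
    seen: Set[str] = set()
    duplicates: Set[str] = set()
    for record_id in ids:
        if record_id in seen:
            duplicates.add(record_id)
        seen.add(record_id)
    return duplicates


lead = {"Id": "00Q000000000001", "FirstName": "Ada", "LastName": "Lovelace"}
print(build_batch_record_id(lead))  # (Main batch) 00Q000000000001 Ada Lovelace

# A very long field value is cut off at 256 characters.
long_lead = dict(lead, LastName="x" * 300)
print(len(build_batch_record_id(long_lead)))  # 256
```

Placing the record's own identifier near the start of the ID keeps it unique even when the ID is truncated.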

Examining a running batch process

On the dashboard, your flow is represented by a tile that shows whether a batch is currently running.
Figure 6. A tile on the dashboard that shows a batch in progress
To see a summary of completed and running batches for a flow, open the tile's options menu [⋮] and click View batches.
Figure 7. The Batch status dialog box

You can use options in the Actions column to stop a running batch, or to view the logs for a batch process in the built-in log viewer or within an IBM Log Analysis with LogDNA service instance that's configured to receive service logs. For more information, see Viewing App Connect logs in the log viewer and Monitoring and managing App Connect logs in IBM Log Analysis with LogDNA.

Note: Batch status information is cleared when you stop the flow. Stopping a flow also stops any running batch processes. If errors occur in a batch process, and you've defined an ID for each record in the batch process, you can find your defined ID in the batch-record-id_str column of the Kibana log, and therefore identify any specific records that weren't processed successfully.

You can also use options in the Actions column to pause, then later resume, extraction activities of a running batch process. The pause and resume actions apply to extraction (retrieving records) from the source system (you cannot pause and resume the processing of those retrieved records). When you resume, the batch process continues extracting records from the point at which it was paused. If a batch process fails to extract a record, the extraction process is paused automatically. After a set time, the batch extraction automatically resumes from the point at which it was paused. This is useful if the extraction failed because of network issues, or if you hit rate limits on the application from which you extract, for example. The extraction process can be paused and resumed continuously until the process times out. You can also manually resume an automatically paused batch extraction by using the menu on the dashboard tile.
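
The automatic pause and resume behavior can be pictured as a loop that keeps a checkpoint and retries from it. The following Python sketch is an illustration under stated assumptions, not App Connect's implementation: `FlakySource`, `extract_all`, and the `max_pauses` cutoff (a stand-in for the batch process timing out) are all hypothetical, and the `wait` callback replaces the real waiting period so that the example runs instantly.

```python
from typing import Callable, List


class FlakySource:
    """Hypothetical source that fails a fixed number of times at one offset."""

    def __init__(self, records: List[str], fail_at: int, failures: int) -> None:
        self.records = records
        self.fail_at = fail_at
        self.failures_left = failures

    def fetch(self, offset: int) -> str:
        if offset == self.fail_at and self.failures_left > 0:
            self.failures_left -= 1
            raise ConnectionError("simulated network issue or rate limit")
        return self.records[offset]


def extract_all(source: FlakySource, total: int, max_pauses: int,
                wait: Callable[[], None]) -> List[str]:
    extracted: List[str] = []
    offset = 0  # checkpoint: the next record to extract
    pauses = 0
    while offset < total:
        try:
            extracted.append(source.fetch(offset))
            offset += 1  # advance the checkpoint only after a success
        except ConnectionError:
            pauses += 1
            if pauses > max_pauses:  # stand-in for the process timing out
                raise TimeoutError(f"gave up at record {offset}")
            wait()  # extraction is paused, then resumes from the same offset
    return extracted


pause_log: List[str] = []
source = FlakySource(["r0", "r1", "r2", "r3"], fail_at=2, failures=2)
result = extract_all(source, total=4, max_pauses=5,
                     wait=lambda: pause_log.append("paused"))
print(result)          # ['r0', 'r1', 'r2', 'r3']
print(len(pause_log))  # 2
```

Because the checkpoint moves forward only after a successful fetch, no record is skipped or extracted twice when extraction resumes.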