Using the events scroll operation to export analytics data

Use the /events/scroll operation to export 10,000 or more analytics event records with the REST API.

The standard method to retrieve analytics event data with the API Connect Analytics REST API is the /events API call. The /events API call is optimized for fast retrieval of the most recent event data, and is limited to returning a maximum of 10,000 API event records. If you want to retrieve more than 10,000 event records, use the /events/scroll API, which is optimized for retrieving large sets of analytics event data.

When you query your analytics data with /events/scroll, OpenSearch finds all the event records that match your query and creates a pointer that is called the scroll context. You then use the scroll context to retrieve the event data in batches.

Important: Creating the results set and scroll context consumes significant OpenSearch resources. Avoid running multiple different /events/scroll queries simultaneously, and delete the scroll context after you retrieve your event data.

Exporting analytics events with /events/scroll

The /events/scroll operation returns event data in batches. You define the batch size in the first call to /events/scroll, and then retrieve each batch in subsequent calls. You must make each subsequent call to /events/scroll within a certain time, called the scroll context keep-alive time. The procedure is as follows:

  1. Make the first /events/scroll call. Specify the batch size and the keep-alive time for the scroll operation. For example, to return event data in batches of 1000, and to keep the scroll context alive for 10 minutes, then POST the following JSON to /events/scroll:
    {
      "size": 1000, # Return the first 1000 event records.
      "scroll": "10m" # Keep the scroll context alive for 10 minutes.
    }
    • size defines the batch size.
    • scroll defines the keep-alive time and uses the format <number><units>, for example: 30s, 5m, 3h, 1d. The maximum value for scroll is 1d.
    Example response:
    {
        "total": 5926, # The total events found that match the query.
        "scroll_id": "FGl....", # Scroll context id to be used to get the next batch.
        "events": [...] # First batch of event data.
    }
  2. Make the next /events/scroll call, specifying the scroll_id from the previous response:
    {
      "size": 1000,
      "scroll_id": "<as returned from previous call>",
      "scroll": "10m"
    }
    The API returns the next batch of 1000 event records, and a new scroll_id for the next call.
    Note: If your scroll_id has expired, the following response is returned:
    {
      "status": 404,
      "message": [
        {
          "trace": "7c7b3b11ee0e95b61452e8a78086d8e2",
          "errors": [
            {
              "code": "search_context_missing_exception",
              "message": "No search context found for id [48420]",
              "more_info": ""
            }
          ]
        }
      ]
    }

    If your scroll_id expires, you must start again from the first /events/scroll call. Set a higher keep-alive time for the scroll context to allow more time for each batch of event records to be returned and for the next /events/scroll call to be made.

  3. Continue making /events/scroll calls, updating the scroll_id each time with the response from the previous call.
  4. After the last call to retrieve your event data, delete the scroll context with a POST to /events/scroll/delete:
    {"scroll_id": "<as returned from previous call>"}
    Example response:
    {
        "succeeded": true,
        "num_freed": 15
    }
    Note: The scroll context is automatically deleted when it expires, but if you have a long keep-alive time, then it is recommended to delete it explicitly to release resources.
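Steps 1 - 4 can be scripted. The following Python sketch shows one way to drive the loop; the `post(path, body)` callable is hypothetical and stands in for whatever HTTP client you use to POST JSON to the analytics API with your bearer token:

```python
def scroll_all_events(post, size=1000, keep_alive="10m"):
    """Retrieve all matching analytics events via /events/scroll.

    `post(path, body)` is any callable that POSTs the JSON body to the
    analytics API and returns the parsed JSON response. It is a
    placeholder for your own HTTP client wrapper, not part of the
    API Connect REST API.
    """
    body = {"size": size, "scroll": keep_alive}  # first call: no scroll_id
    events = []
    scroll_id = None
    try:
        while True:
            resp = post("/events/scroll", body)
            scroll_id = resp.get("scroll_id", scroll_id)
            batch = resp.get("events", [])
            if not batch:
                break  # an empty batch means every matching record was returned
            events.extend(batch)
            # Each subsequent call passes the scroll_id from the previous response.
            body = {"size": size, "scroll": keep_alive, "scroll_id": scroll_id}
    finally:
        if scroll_id:
            # Delete the scroll context to release OpenSearch resources.
            post("/events/scroll/delete", {"scroll_id": scroll_id})
    return events
```

If any call in the loop returns the 404 search_context_missing_exception error, the scroll context expired between calls; restart the export with a longer keep-alive time.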

Worked example

The following example shows how you can use the curl command to manually retrieve your API event data by using the /events/scroll API.
Note: For large amounts of analytics data, manual retrieval with curl is slow and prone to human error. A better approach is to write a script that makes the /events/scroll calls and updates the scroll_id on each call. See https://ibm.biz/apic-analytics-events-scroll for an example Python script.
  1. Get a count of total event records to decide the optimal batch size and number of calls to make:
    curl -k -X GET --url 'https://example.api.connect.com/analytics/analytics/cloud/events/count' -H 'Authorization: Bearer <bearer_token>'
    {
        "total": 12453
    }

    For 12,453 events, a batch size of 1250 and a total of 10 calls might be a good balance between the size of each response and the number of calls.

  2. Make the first request, specifying the batch size of 1250, and a scroll context keep-alive time of 5 minutes:
    curl -k -X POST -d '{"size": 1250, "scroll": "5m"}' --url 'https://example.api.connect.com/analytics/analytics/cloud/events/scroll' -H 'Content-Type: application/json' -H 'Authorization: Bearer <bearer_token>'
    Returns:
    {
        "total": 12453, # The total events found that match the query.
        "scroll_id": "<scroll_id>",
        "events": [...] # First batch of 1250 events.
    }
  3. Make the second request, specifying the scroll_id returned from the previous request:
    curl -k -X POST -d '{"size": 1250, "scroll": "5m", "scroll_id": "<scroll_id from previous response>"}' --url 'https://example.api.connect.com/analytics/analytics/cloud/events/scroll' -H 'Content-Type: application/json' -H 'Authorization: Bearer <bearer_token>'
  4. Repeat the request 8 more times, updating the scroll_id with the output from the previous request each time.

    Because the total is 12,453 events and the batch size is 1250, the last request returns just 1203 events.

    Any subsequent requests that you make with the scroll_id return an empty events array.

  5. Delete the scroll context:
    curl -k -X POST -d '{"scroll_id": "<scroll_id from previous response>"}' --url 'https://example.api.connect.com/analytics/analytics/cloud/events/scroll/delete' -H 'Content-Type: application/json' -H 'Authorization: Bearer <bearer_token>'
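The batch-size arithmetic in step 1 can be checked quickly (values taken from the worked example above):

```python
import math

total = 12453       # from the /events/count response
batch_size = 1250   # chosen batch size

calls = math.ceil(total / batch_size)          # 10 calls in total
last_batch = total - (calls - 1) * batch_size  # 1203 events in the final batch
```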

Performance comparison of /events/scroll and /events

The following tables show performance comparisons for querying analytics event data by using scripted calls. Times are the total time, in seconds, to retrieve all event records. The /events API cannot retrieve more than 10,000 events in total, and cannot use a batch size of 1000 or more, so those cells are marked n/a.
Note: Response times on your analytics subsystem might vary according to the size of your API event records, total number of records, and your available hardware resources.
Table 1. Fetching 1000 event records (5.3 MB)
Batch size   Number of calls   /events time (s)   /events/scroll time (s)
100          10                7                  9
500          2                 3                  4
1000         1                 n/a                3
2000         1                 n/a                5
Table 2. Fetching 10,000 event records (26 MB)
Batch size   Number of calls   /events time (s)   /events/scroll time (s)
100          100               67                 75
500          20                27                 32
1000         10                n/a                24
2000         5                 n/a                20
Table 3. Fetching 100,000 event records (265 MB)
Batch size   Number of calls   /events time (s)   /events/scroll time (s)
100          1000              n/a                732
500          200               n/a                299
1000         100               n/a                258
2000         50                n/a                196
Table 4. Fetching 760,265 event records (2 GB)
Batch size   Number of calls   /events time (s)   /events/scroll time (s)
100          7603              n/a                5647
500          1521              n/a                2423
1000         761               n/a                2158
2000         381               n/a                1919