Using the events scroll operation to export analytics data

Use the /events/scroll operation to export 10,000 or more analytics event records with the REST API.
The standard method for retrieving analytics event data with the API Connect Analytics REST API is the /events API call. The /events call is optimized for fast retrieval of the most recent event data, and is limited to returning a maximum of 10,000 API event records. To retrieve more than 10,000 event records, use the /events/scroll API, which is optimized for retrieval of large sets of analytics event data.
When you query your analytics data with /events/scroll, OpenSearch finds all the event records that match your query and creates a pointer that is called the scroll context. You then use the scroll context to retrieve the event data in batches. It is recommended that you avoid running multiple /events/scroll queries simultaneously, and that you delete the scroll context after your event data is retrieved.

Exporting analytics events with /events/scroll
The /events/scroll operation returns event data in batches. You define the batch size in the first call to /events/scroll, and then retrieve each batch in subsequent calls. You must make each subsequent call to /events/scroll within a certain time, called the scroll context keep-alive time. The procedure is as follows:
- Make the first /events/scroll call. Specify the batch size and the keep-alive time for the scroll operation. For example, to return event data in batches of 1000 and keep the scroll context alive for 10 minutes, POST the following JSON to /events/scroll:

  ```
  {
    "size": 1000,     # Return the first 1000 event records.
    "scroll": "10m"   # Keep the scroll context alive for 10 minutes.
  }
  ```

  size defines the batch size. scroll defines the keep-alive time and uses the format <number><units>, for example: 30s, 5m, 3h, 1d. The maximum value for scroll is 1d.

  Example response:

  ```
  {
    "total": 5926,           # The total events found that match the query.
    "scroll_id": "FGl....",  # Scroll context ID to be used to get the next batch.
    "events": [...]          # First batch of event data.
  }
  ```
- Make the next /events/scroll call, specifying the scroll_id from the previous response:

  ```
  {
    "size": 1000,
    "scroll_id": "<as returned from previous call>",
    "scroll": "10m"
  }
  ```

  The API returns the next batch of 1000 event records, and a new scroll_id for the next call.

  Note: If your scroll_id has expired, the following response is returned:

  ```
  {
    "status": 404,
    "message": [
      {
        "trace": "7c7b3b11ee0e95b61452e8a78086d8e2",
        "errors": [
          {
            "code": "search_context_missing_exception",
            "message": "No search context found for id [48420]",
            "more_info": ""
          }
        ]
      }
    ]
  }
  ```

  If your scroll_id expires, then you must start again. Set a higher keep-alive time for the scroll context to allow more time for event record batches to be returned and the next /events/scroll call to be made.
- Continue making /events/scroll calls, updating the scroll_id each time with the response from the previous call.
- After the last call to retrieve your event data, delete the scroll context with a POST to /events/scroll/delete:

  ```
  {"scroll_id": "<as returned from previous call>"}
  ```

  Example response:

  ```
  {
    "succeeded": true,
    "num_freed": 15
  }
  ```
Note: The scroll context is automatically deleted when it expires, but if you have a long keep-alive time, then it is recommended to delete it explicitly to release resources.
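You can also drive the whole retrieval loop from a script. The following Python sketch outlines the flow described above, assuming the requests library is available; the host name, analytics service path, and bearer token are placeholders for your own deployment, and the error handling is deliberately minimal.

```python
import requests

ANALYTICS_BASE = "https://example.api.connect.com/analytics/analytics/cloud"  # placeholder host and service path
HEADERS = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <bearer_token>",  # replace with a valid bearer token
}


def export_all_events(batch_size=1000, keep_alive="10m", verify_tls=True):
    """Retrieve every matching event record in batches, then delete the scroll context."""
    all_events = []

    # First call: set the batch size and the scroll context keep-alive time.
    resp = requests.post(f"{ANALYTICS_BASE}/events/scroll",
                         json={"size": batch_size, "scroll": keep_alive},
                         headers=HEADERS, verify=verify_tls)
    resp.raise_for_status()
    data = resp.json()
    scroll_id = data["scroll_id"]
    all_events.extend(data["events"])

    # Subsequent calls: pass the scroll_id from the previous response.
    # Stop when a call returns an empty events array.
    while data["events"]:
        resp = requests.post(f"{ANALYTICS_BASE}/events/scroll",
                             json={"size": batch_size, "scroll": keep_alive,
                                   "scroll_id": scroll_id},
                             headers=HEADERS, verify=verify_tls)
        if resp.status_code == 404:
            # search_context_missing_exception: the scroll context expired.
            # You must start again, ideally with a longer keep-alive time.
            raise RuntimeError("Scroll context expired; restart with a longer keep-alive time")
        resp.raise_for_status()
        data = resp.json()
        scroll_id = data.get("scroll_id", scroll_id)
        all_events.extend(data["events"])

    # Delete the scroll context to release resources as soon as retrieval is complete.
    requests.post(f"{ANALYTICS_BASE}/events/scroll/delete",
                  json={"scroll_id": scroll_id},
                  headers=HEADERS, verify=verify_tls)
    return all_events
```

For example, export_all_events(batch_size=1000, keep_alive="10m") returns the accumulated list of event records for the default query.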
Worked example
The following example exports analytics event data with the /events/scroll API. Rather than issuing each request by hand, you can write a script that makes the repeated /events/scroll calls and updates the scroll_id on each call. See https://ibm.biz/apic-analytics-events-scroll for an example Python script.

- Get a count of total event records to decide the optimal batch size and number of calls to make:
  ```
  curl -k -X GET --url 'https://example.api.connect.com/analytics/analytics/cloud/events/count' -H 'Authorization: Bearer <bearer_token>'
  ```

  Returns:

  ```
  { "total": 12453 }
  ```

  For 12453 events, a batch size of 1250 and a total of 10 calls might be a good balance between the size of each response and the number of calls (the arithmetic is checked in the sketch after this example).
- Make the first request, specifying the batch size of 1250, and a scroll context keep-alive time of 5 minutes:

  ```
  curl -k -X POST -d '{"size": "1250", "scroll": "5m"}' --url 'https://example.api.connect.com/analytics/analytics/cloud/events/scroll' -H 'Content-Type: application/json' -H 'Authorization: Bearer <bearer_token>'
  ```

  Returns:

  ```
  {
    "total": 12453,             # The total events found that match the query.
    "scroll_id": "<scroll_id>",
    "events": [...]             # First batch of 1250 events.
  }
  ```
- Make the second request, specifying the scroll_id returned from the previous request:

  ```
  curl -k -X POST -d '{"size": "1250", "scroll": "5m", "scroll_id": "<scroll_id from previous response>"}' --url 'https://example.api.connect.com/analytics/analytics/cloud/events/scroll' -H 'Content-Type: application/json' -H 'Authorization: Bearer <bearer_token>'
  ```
- Repeat the request 8 more times, updating the scroll_id with the output from the previous request each time. Because the total number of events is 12453 and the batch size is 1250, the last request returns only 1203 events. Any subsequent requests that you make with the scroll_id return an empty events array.
- Delete the scroll context:

  ```
  curl -k -X POST -d '{"scroll_id": "<scroll_id from previous response>"}' --url 'https://example.api.connect.com/analytics/analytics/cloud/events/scroll/delete' -H 'Content-Type: application/json' -H 'Authorization: Bearer <bearer_token>'
  ```
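As a quick check of the batch plan, the number of calls and the size of the final batch follow from the event total and the chosen batch size. The snippet below simply reproduces the arithmetic for the numbers used in this example.

```python
import math

total_events = 12453  # from the /events/count response
batch_size = 1250     # chosen batch size

calls_needed = math.ceil(total_events / batch_size)          # 10 calls in total
last_batch = total_events - (calls_needed - 1) * batch_size  # 1203 events in the final batch

print(f"{calls_needed} calls; final batch contains {last_batch} events")
```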
Performance comparison of /events/scroll and /events

The following tables compare the time taken to retrieve the same set of event records with the /events and /events/scroll operations, for different batch sizes and numbers of calls. Each table shows the results for a different total number of event records. The /events API cannot retrieve more than 10,000 events, nor batches of 1000 or more, so these fields are marked as n/a.

Total of 1000 events:

| Batch size | Number of calls | /events time | /events/scroll time |
|---|---|---|---|
| 100 | 10 | 7 | 9 |
| 500 | 2 | 3 | 4 |
| 1000 | 1 | n/a | 3 |
| 2000 | 1 | n/a | 5 |
Total of 10,000 events:

| Batch size | Number of calls | /events time | /events/scroll time |
|---|---|---|---|
| 100 | 100 | 67 | 75 |
| 500 | 20 | 27 | 32 |
| 1000 | 10 | n/a | 24 |
| 2000 | 5 | n/a | 20 |
Total of 100,000 events:

| Batch size | Number of calls | /events time | /events/scroll time |
|---|---|---|---|
| 100 | 1000 | n/a | 732 |
| 500 | 200 | n/a | 299 |
| 1000 | 100 | n/a | 258 |
| 2000 | 50 | n/a | 196 |
Total of approximately 760,000 events:

| Batch size | Number of calls | /events time | /events/scroll time |
|---|---|---|---|
| 100 | 7603 | n/a | 5647 |
| 500 | 1521 | n/a | 2423 |
| 1000 | 761 | n/a | 2158 |
| 2000 | 381 | n/a | 1919 |