Premigrating Business Automation Insights data to OpenSearch

Premigration is an optional step that is recommended when you migrate a high volume of data. It reduces Business Automation Insights downtime because you can run it before you upgrade a CP4BA deployment, while the deployment is still running.

Before you begin

Make sure that you completed steps 1 to 6 in Installing OpenSearch and migrating Elasticsearch data.

About this task

It is recommended to first test the migration on a batch of 100,000 to 500,000 documents. Use the throughput of that test to estimate how long it takes to migrate all of your data. Based on the estimate, decide whether to split the premigration into multiple batches with the premigration procedures. The time that it takes to complete the migration depends on the volume of data and on the performance of the resources (CPU, memory, network, disk).
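
The extrapolation from the test run can be sketched as follows. This is a minimal sketch that assumes throughput stays roughly constant as the volume grows. The function name estimate_migration_seconds and the sample figures are hypothetical and are not part of the migration script.

```shell
# Hypothetical helper (not part of es-os-migration-script.sh):
# estimate the total migration time from a timed test run.
#   $1 = number of documents migrated in the test
#   $2 = seconds that the test took
#   $3 = total number of documents to migrate
estimate_migration_seconds() {
  test_docs=$1
  test_seconds=$2
  total_docs=$3
  # Integer arithmetic: total_docs * test_seconds / test_docs, rounded up.
  echo $(( (total_docs * test_seconds + test_docs - 1) / test_docs ))
}

# Example with assumed figures: a 250,000-document test that took 400 seconds
# (625 documents per second), extrapolated to 50,000,000 documents.
estimate_migration_seconds 250000 400 50000000
```

With these assumed figures, the estimate is 80000 seconds, or a little over 22 hours. Compare that value with the downtime that is acceptable for Business Automation Insights.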

You have three strategies to choose from. The choice depends on the estimated migration time and on whether that duration exceeds the acceptable downtime for Business Automation Insights.

Strategy 1: Migration without premigration
If document volumes are low, you can skip the premigration procedures and migrate all indexes after Business Automation Insights is shut down.

Strategy 2: Migration with premigration in one batch
If data volumes allow the container to process the data in one batch, you can migrate your immutable documents in one go by running the es-os-migration-script.sh script on all premigration indexes at once. Shut down Business Automation Insights, and then migrate the remaining documents.

Strategy 3: Migration with premigration in multiple batches
If data volumes are large, you must split the migration into multiple runs by using the premigration procedures. You can migrate your immutable documents in chunks by running multiple batches of the es-os-migration-script.sh script on the premigration indexes. Shut down Business Automation Insights, and then migrate the remaining documents.
Note: Running the premigration procedures in multiple batches gives you the chance to back up the migrated data and to take maintenance actions on the cluster between cycles.
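
The choice between the strategies can be sketched as a simple comparison. This is an illustration only: the function name suggest_strategy is hypothetical, and the third argument, the longest single premigration run that you are willing to schedule, is an assumed threshold that does not come from the product.

```shell
# Hypothetical helper: suggest a strategy from an estimated migration time.
#   $1 = estimated seconds to migrate all documents
#   $2 = acceptable Business Automation Insights downtime, in seconds
#   $3 = longest single premigration run you accept, in seconds (assumed threshold)
suggest_strategy() {
  if [ "$1" -le "$2" ]; then
    # The whole migration fits into the downtime window.
    echo "Strategy 1: migration without premigration"
  elif [ "$1" -le "$3" ]; then
    # Too long for the downtime window, but manageable as one premigration run.
    echo "Strategy 2: premigration in one batch"
  else
    echo "Strategy 3: premigration in multiple batches"
  fi
}

# Example with assumed figures: 80,000 seconds estimated, 2 hours of
# acceptable downtime, and 8 hours as the longest single run.
suggest_strategy 80000 7200 28800
```

With these assumed figures, the sketch suggests strategy 3.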

The following diagram shows the three strategies that you can choose. It highlights in green the volume of Elasticsearch (ES) immutable documents that can be transferred to OpenSearch (OS) while event processing remains online (premigration). All three strategies require Business Automation Insights to be shut down so that the mutable documents can be migrated offline, as indicated by the red arrow. Strategy 1 migrates all the documents while Business Automation Insights is offline.

Migration strategies 1 to 3
The following table describes the options of the es-os-migration-script.sh script.

Option                              Description
-dryrun                             Lists the Elasticsearch indexes and displays the dry-run steps.
-doc_count                          Lists all Elasticsearch and OpenSearch indexes with their document counts, and exits.
-include=<comma separated indices>  List of indexes to include.
-exclude=<comma separated indices>  List of indexes to exclude.
-include_regex=<regex pattern>      List of regex patterns for the indexes to include.
-exclude_regex=<regex pattern>      List of regex patterns for the indexes to exclude.
-startdate=<start date>             Start date for data migration (format: 'YYYY-MM-DDTHH:MM:SS').
-enddate=<end date>                 End date for data migration (format: 'YYYY-MM-DDTHH:MM:SS').
-timestamp_key=<name>               Key for date values (default: 'timestamp').
-delete                             Deletes OpenSearch indexes.
logfile                             Optional: Log file to save the migration summary.
--help                              Displays usage details.

Procedure

  1. Run the following command to display the list of indexes that are available for premigration. If you plan to run the premigration, you must migrate the documents of all premigration indexes up to the premigration end time.
    curl -X GET -u ${ELASTIC_USERNAME}:${ELASTIC_PASSWORD} --insecure "${ELASTICSEARCH_URL}/_cat/indices?v&s=docs.count:desc,index" | grep -v -e "store" -e "pfs" -e "active"
  2. Set the premigration end time to a time before the start of the premigration. You might want to pick 00:00 on the prior day to avoid any time zone mistakes.
    export PREMIGRATION_END_DATE=2024-06-30T00:00:00
  3. Test the migration on one or a few indexes whose combined document count is between 100,000 and 500,000.
    1. Change the directory to the migration folder under cert-kubernetes/scripts.
      cd cpfs/migration
    2. Run the following command.
      ./es-os-migration-script.sh -enddate=$PREMIGRATION_END_DATE -include=<index1>,<index2>

      Where <index1>,<index2> is the list of indexes from step 1 whose combined document count is between 100,000 and 500,000.

      While the script runs, record the time that it takes. Also, monitor the resource usage of your elasticsearch-es-data and opensearch-ib pods along with the associated PVs.

      If you see any bottlenecks, you might need to increase the resources. Depending on the infrastructure, migration throughput can be in the range of 500 to 1,000 documents per second.

      Important: If the migration of one or more indexes fails to complete, you can use the following command to remove the indexes from OpenSearch. You can also use this command if you want to run the same test again with a minor change.
      ./es-os-migration-script.sh -delete -include=<index1>,<index2>

      Where <index1>,<index2> are the indexes that you want to remove from OpenSearch.

  4. If you decide to run the premigration, use one of these options depending on whether you need to migrate data in multiple batches.
    • Option 1: To migrate all premigration index data in one batch, run the following command.
      ./es-os-migration-script.sh -enddate=$PREMIGRATION_END_DATE -exclude_regex=active,icp4ba-bai-store,icp4ba-pfs@
      Note: You can use the -dryrun option to see which indexes the migration command would touch.
    • Option 2: Use the following method if you need to break up the migration into multiple batches. It helps to know how the data is distributed in time.
      1. Select an end date in the past that sufficiently reduces the number of documents that are migrated, and export the variable.
        The following command is an example.
        export end_date=2023-06-30T00:00:00
      2. Run the migration script with that end date.
        ./es-os-migration-script.sh -enddate=$end_date -exclude_regex=active,icp4ba-bai-store,icp4ba-pfs@
      3. Set a new start and end date based on how you are partitioning the documents for migration.
        The following command is an example.
        export start_date=$end_date
        export end_date=2024-01-01T00:00:00
      4. Run the migration script for the new time block.
        ./es-os-migration-script.sh -startdate=$start_date -enddate=$end_date -exclude_regex=active,icp4ba-bai-store,icp4ba-pfs@
      5. Repeat steps 3 and 4, according to how you are partitioning the data for migration, until you reach the $PREMIGRATION_END_DATE.
      6. Run the following command to complete the premigration.
        ./es-os-migration-script.sh -startdate=$end_date -enddate=$PREMIGRATION_END_DATE -exclude_regex=active,icp4ba-bai-store,icp4ba-pfs@
  5. Run the following command to view the indexes that are migrated into OpenSearch.
    curl -X GET -u ${OPENSEARCH_USERNAME}:${OPENSEARCH_PASSWORD} --insecure "${OPENSEARCH_URL}/_cat/indices?v&s=docs.count:desc,index"

    Ensure that all premigration indexes with documents before $PREMIGRATION_END_DATE are migrated to OpenSearch.
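
The batch sequence of option 2 in step 4 can be sketched as a loop over the boundary dates. This is a minimal sketch: the function name print_batch_commands is hypothetical, and it only prints the commands so that you can review them before you run anything. The script options and the exclusion pattern are the ones that are used in the procedure.

```shell
# Hypothetical helper: print one migration command per time window.
# Arguments: the batch boundary dates in ascending order.
# The last argument is the premigration end date.
print_batch_commands() {
  exclude='-exclude_regex=active,icp4ba-bai-store,icp4ba-pfs@'
  prev=''
  for boundary in "$@"; do
    if [ -z "$prev" ]; then
      # First batch: everything up to the first boundary, no start date.
      echo "./es-os-migration-script.sh -enddate=$boundary $exclude"
    else
      # Later batches: from the previous boundary to the current one.
      echo "./es-os-migration-script.sh -startdate=$prev -enddate=$boundary $exclude"
    fi
    prev=$boundary
  done
}

# Example with the dates from the procedure; the last date stands in for
# $PREMIGRATION_END_DATE.
print_batch_commands 2023-06-30T00:00:00 2024-01-01T00:00:00 2024-06-30T00:00:00
```

The example prints three commands, one for each time window. Each window starts where the previous one ended.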

What to do next

After you verify that all the premigration indexes with documents before $PREMIGRATION_END_DATE are migrated to OpenSearch, continue with the steps in Installing OpenSearch and migrating Elasticsearch data.