Taking snapshots by using the _snapshot API

You can take snapshots of your OpenSearch data by using the _snapshot API.

About this task

Taking snapshots is described on the Take and restore snapshots page of the OpenSearch documentation. Snapshot management (SM) lets you automate taking snapshots, as described on the Snapshot management page of the OpenSearch documentation.

Procedure

  1. Ensure that the Flink jobs are canceled.
    1. Make sure that the management service is up and running.
      kubectl wait --for=condition=Ready --timeout=-1s pod -l component=bai-insights-engine-management -n ${NAMESPACE}
    2. Retrieve the management service URL and credentials.
    3. Retrieve the list of identifiers (the values of the <jid> fields) for the Flink jobs that are currently running.
      curl -X GET -k -u ${MANAGEMENT_USERNAME}:${MANAGEMENT_PASSWORD} "${MANAGEMENT_URL}/api/v1/processing/jobs/list"
    4. For each job identifier, create a savepoint and cancel the corresponding Flink job. Take note of the location of each savepoint.

      To create savepoints and cancel jobs, follow steps 3 and 4 of Restarting from a checkpoint or savepoint.

    5. Remove the Flink job submitters.
      kubectl get jobs -o custom-columns=NAME:.metadata.name -n ${NAMESPACE} | grep bai- | grep -v bai-setup | xargs kubectl delete job -n ${NAMESPACE}
  2. Retrieve the OpenSearch URL and credentials.
    OPENSEARCH_URL="https://$(kubectl get routes opensearch-route -o jsonpath="{.spec.host}" -n "$NAMESPACE")"
    OPENSEARCH_USERNAME=$(kubectl get secret/opensearch-admin-user -o json -n "$NAMESPACE" | jq -r '.data|keys[0]')
    OPENSEARCH_PASSWORD=$(oc extract secret/opensearch-admin-user --keys="$OPENSEARCH_USERNAME" --to=- -n "$NAMESPACE" 2>/dev/null)
  3. Declare the location of the snapshot repository by using the PUT /_snapshot/<your_repository_name> request of the OpenSearch API.
    The following code sample shows an example of the expected payload.
    PUT /_snapshot/my_backup
    {
      "type": "fs",
      "settings": {
        "location": "/workdir/snapshot_storage",
        "compress": true
      }
    }
    You can call the API through a tool such as curl or Postman.
    The following code sample shows the same request as a curl command:
    curl -kL -u "$OPENSEARCH_USERNAME:$OPENSEARCH_PASSWORD" -XPUT "$OPENSEARCH_URL/_snapshot/my_backup" -H "Content-Type: application/json" -d '{"type":"fs","settings":{"location":"/workdir/snapshot_storage","compress":true}}'
  4. Create the OpenSearch snapshot.
    curl -kL -u "${OPENSEARCH_USERNAME}:${OPENSEARCH_PASSWORD}" -XPUT "${OPENSEARCH_URL}/_snapshot/my_backup/backup1?wait_for_completion=true&pretty=true"
  5. Resume the Flink jobs from the savepoints that were generated in substep 4 of step 1.
    1. Edit your custom resource YAML file to set the recovery path of each job to the savepoint that was generated for that job in substep 4 of step 1.
      bai_configuration:
         bpmn:
            recovery_path: /mnt/pv/savepoints/dba/bai-bpmn/savepoint-<savepoint-id>
         icm:
            recovery_path: /mnt/pv/savepoints/dba/bai-icm/savepoint-<savepoint-id>
      Important:
      Do not prefix the savepoint path with file://.
      Do not specify the savepoint path for the ODM Flink job.
      Note: If you are upgrading, stop here and continue with your upgrade. See Upgrading. Otherwise, continue with this procedure.
    2. Apply the updates to the operator.
      kubectl apply -f /my_icp4a_cr.yaml --overwrite=true
  6. Scale the operator deployment back up to redeploy the Flink jobs.
    kubectl scale --replicas=<initialReplicas> deployment ibm-insights-engine-operator
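
Steps 3 and 4 depend on the repository being registered and reachable from the OpenSearch nodes. The following is a minimal sketch, not part of the documented procedure, that wraps the step 3 request in shell functions and adds a call to the OpenSearch verify repository endpoint (POST /_snapshot/<repository>/_verify). It assumes that the OPENSEARCH_URL, OPENSEARCH_USERNAME, and OPENSEARCH_PASSWORD variables from step 2 are set. The function names are hypothetical.

```shell
# Hypothetical helper: build the JSON payload for a shared file system ("fs") repository.
# $1 is the repository location, for example /workdir/snapshot_storage.
# The location is inserted as-is, so it must not contain double quotes or backslashes.
fs_repository_payload() {
  printf '{"type":"fs","settings":{"location":"%s","compress":true}}' "$1"
}

# Hypothetical helper: register the repository named $1 at location $2.
register_fs_repository() {
  curl -kL -u "${OPENSEARCH_USERNAME}:${OPENSEARCH_PASSWORD}" \
    -XPUT "${OPENSEARCH_URL}/_snapshot/$1" \
    -H "Content-Type: application/json" \
    -d "$(fs_repository_payload "$2")"
}

# Hypothetical helper: ask OpenSearch to verify that its nodes can access repository $1.
verify_repository() {
  curl -kL -u "${OPENSEARCH_USERNAME}:${OPENSEARCH_PASSWORD}" \
    -XPOST "${OPENSEARCH_URL}/_snapshot/$1/_verify?pretty=true"
}

# Example usage (requires a reachable cluster):
#   register_fs_repository my_backup /workdir/snapshot_storage
#   verify_repository my_backup
```

If the verification request returns an error, the repository location is probably not accessible to every node, and a snapshot taken in step 4 is likely to fail for the same reason.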

Results

A snapshot named backup1 is created in the my_backup snapshot repository.
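
To confirm the result, you can query the snapshot and inspect its state. The following is a minimal sketch that assumes jq is installed (it is already used in step 2) and that the OPENSEARCH_* variables from step 2 are set. The function names are hypothetical. The request is the OpenSearch GET /_snapshot/<repository>/<snapshot> call, whose response lists matching snapshots in a snapshots array.

```shell
# Hypothetical helper: read a GET /_snapshot/<repository>/<snapshot> response
# on stdin and print the "state" field of the first snapshot in it.
parse_snapshot_state() {
  jq -r '.snapshots[0].state'
}

# Hypothetical helper: print the state of snapshot $2 in repository $1.
get_snapshot_state() {
  curl -skL -u "${OPENSEARCH_USERNAME}:${OPENSEARCH_PASSWORD}" \
    "${OPENSEARCH_URL}/_snapshot/$1/$2" | parse_snapshot_state
}

# Example usage (requires a reachable cluster):
#   get_snapshot_state my_backup backup1
```

A state of SUCCESS indicates that all shards were stored. Any other value, such as PARTIAL or FAILED, suggests that the snapshot should not be relied on for a restore.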

What to do next

If you plan to restore this snapshot in another environment, you must copy the entire snapshot repository directory, not only the snap-*.dat and meta-*.dat files. A complete snapshot includes:

  • snap-*.dat — snapshot manifest
  • meta-*.dat — snapshot metadata
  • index.latest — pointer to the latest snapshot
  • index-* — global index metadata
  • indices/ — shard and segment data for all indices

All of these files and directories are required for a successful restore. If the index-* files or the indices/ directory are missing, the restore operation fails because the snapshot contains metadata but not the underlying index data.
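
Before you restore from a copied repository, it can help to check that each item in the list above is present. The following is a minimal sketch of such a check. The function name check_snapshot_repo is hypothetical, and the check only looks for the presence of files and directories; it does not validate their content.

```shell
# Hypothetical helper: report which required snapshot repository entries are
# missing from the directory in $1. Returns 0 when everything is present.
check_snapshot_repo() {
  repo_dir=$1
  missing=0

  # Each pattern must match at least one regular file in the repository root.
  for pattern in 'snap-*.dat' 'meta-*.dat' 'index.latest' 'index-*'; do
    found=no
    # $pattern is left unquoted on purpose so that the shell expands the glob.
    for candidate in "$repo_dir"/$pattern; do
      if [ -f "$candidate" ]; then
        found=yes
        break
      fi
    done
    if [ "$found" = no ]; then
      echo "missing: $pattern"
      missing=1
    fi
  done

  # The indices/ directory holds the shard and segment data.
  if [ ! -d "$repo_dir/indices" ]; then
    echo "missing: indices/"
    missing=1
  fi

  return "$missing"
}

# Example usage:
#   check_snapshot_repo /workdir/snapshot_storage && echo "repository looks complete"
```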

To restore a snapshot in another environment:

  1. Copy the entire snapshot repository directory from the source environment.
  2. Configure the target environment to use this directory as its snapshot repository.
  3. Register the repository in OpenSearch.
  4. Run the restore operation by using the Snapshot API.
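
The restore request in the last step can be sketched as follows. This is an illustration under assumptions, not the documented method: it presumes that the repository is already registered in the target environment under the same name (for example, with the request shown in step 3 of the procedure), and that the OPENSEARCH_* variables point to the target cluster. The function names and the <index_pattern> placeholder are hypothetical. The request is the OpenSearch POST /_snapshot/<repository>/<snapshot>/_restore call.

```shell
# Hypothetical helper: build a restore payload for the index pattern in $1.
# Cluster-wide state is excluded so that only index data is restored.
restore_payload() {
  printf '{"indices":"%s","include_global_state":false}' "$1"
}

# Hypothetical helper: restore the indices matching $3 from snapshot $2
# in repository $1 and wait for the operation to finish.
restore_snapshot() {
  curl -kL -u "${OPENSEARCH_USERNAME}:${OPENSEARCH_PASSWORD}" \
    -XPOST "${OPENSEARCH_URL}/_snapshot/$1/$2/_restore?wait_for_completion=true&pretty=true" \
    -H "Content-Type: application/json" \
    -d "$(restore_payload "$3")"
}

# Example usage (requires a reachable cluster):
#   restore_snapshot my_backup backup1 '<index_pattern>'
```

Note that OpenSearch rejects a restore when an open index with the same name already exists in the target cluster, so such indexes might need to be closed, deleted, or renamed first.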