Troubleshooting
Problem
In some on‑premises QRadar EDR environments, backups may fail or stall during execution.
Symptom
While the PostgreSQL and QRadar EDR datastore backups complete successfully, the process can stop during the Elasticsearch and Cassandra datastore backup.
Environment
QRadar EDR On-premise
Diagnosing The Problem
- If the following error log entry is displayed, the Elasticsearch backup might have failed.
Example output:
TASK [backup : backup elasticsearch indices for elasticsearch instance reaqta-hive-opensearch] ***
- If logs such as those shown below are output, it is possible that the Cassandra backup has failed.
Example output:
Operation UNLOAD_20251205-100928-468036 completed with 41 errors
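To spot these signatures quickly in captured logs, a small filter helps. This is a sketch: the deployment name in the comment is an assumption, and the patterns are the error strings shown above.

```shell
# Sketch: filter saved backup logs for the failure signatures shown above.
# Capture the logs first, e.g. (deployment name assumed):
#   oc logs deploy/cp4s-backup-restore > backup.log
match_backup_errors() {
  # Cassandra UNLOAD errors and Elasticsearch search-context failures
  grep -E 'completed with [0-9]+ errors|No search context found' "$1"
}
```

Running `match_backup_errors backup.log` prints only the lines that indicate a failed datastore backup.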
Resolving The Problem
Workaround 1:
During backup operations, temporary spikes in resource consumption can exceed the configured limits and cause processes to be forcibly terminated. In such cases, administrators should consider the following actions.
• Increase the CPU and memory resource limits to at least double the current values
• Allocate a minimum of 800Gi of backup storage for Cassandra
Steps:
■ Increasing CPU and memory limits
Use the following command to increase resource limits, then rerun the backup to confirm whether it completes successfully.
oc exec deploy/cp-serviceability -- /opt/bin/modify_deployment \
  -k default \
  -a backup-restore \
  -r requests.cpu:250m \
  -r requests.memory:500Mi \
  -r limits.cpu:2000m \
  -r limits.memory:8Gi \
  --token=$(oc whoami -t)
This method is effective when "OOMKilled" issues are occurring.
■ Modifying the backup PVC size
To modify the size of the backup PVC, refer to the following documentation: Persistent volume storage sizing
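For illustration only (the authoritative procedure is in the linked sizing document): growing a PVC is a merge patch on `spec.resources.requests.storage`, and it only works if the StorageClass allows volume expansion. The helper below just builds the patch payload; the PVC name and namespace in the comment are assumptions.

```shell
# Build the JSON merge patch used to grow a PVC, e.g.:
#   oc patch pvc <backup-pvc> -n edr --type merge -p "$(pvc_expand_patch 800Gi)"
# (PVC name and namespace are assumptions; the StorageClass must support expansion.)
pvc_expand_patch() {
  printf '{"spec":{"resources":{"requests":{"storage":"%s"}}}}' "$1"
}
```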
For reference, details on modifying resource specifications can be found here: Modifying QRadar Suite Software deployment resource specifications
■ Verifying backup integrity
After applying the above settings, perform the following steps to confirm that the backup completes successfully.
Run a standard Cassandra backup
oc exec cp4s-backup-restore-668b9c6486-zp27z -- /opt/bin/backup-cp4s --generate-aes-key -b cassandra
This allows you to verify that the full Cassandra backup completes successfully.
Check the data size in the backup folder
oc exec deployment/cp4s-backup-restore -- sh -c 'du --summarize --human-readable /opt/data/backup/cassandra/*'
This allows you to verify the actual data stored and determine whether the backup is incomplete.
If all required resources have been secured and the backup procedure has been executed correctly, but the backup still fails with any of the following behaviors within the OpenShift cluster, please proceed to Workaround 2.
• Process exits with error code 137.
• Some Cassandra nodes fail on specific rows and the process stops with an error.
• "No search context found" errors appear.
• The pod fails the liveness/readiness checks and restarts (visible in pod events via oc describe pod).
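The exit code 137 / OOMKilled case can be confirmed from the pod's last termination state. A minimal sketch, assuming the standard Kubernetes status field paths; the pod name and `edr` namespace are placeholders for your environment.

```shell
# With a live cluster, fetch the last termination state of the backup pod:
#   oc get pod <backup-pod> -n edr -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason} {.status.containerStatuses[0].lastState.terminated.exitCode}'
# Helper: decide whether that output indicates an out-of-memory kill.
was_oom_killed() {
  case "$1" in
    *OOMKilled*|*137*) return 0 ;;  # explicit OOMKilled reason, or SIGKILL exit code 137
    *) return 1 ;;
  esac
}
```

If `was_oom_killed` returns success, the resource-limit increase in Workaround 1 is the first thing to try.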
Workaround 2:
■ Adjust livenessProbe failureThreshold
By patching the backup-restore pod’s liveness probe and increasing the failureThreshold to 30, you can reduce the likelihood of OpenShift forcibly terminating the pod during backup operations.
Steps:
Run the following command:
oc patch --namespace edr deployment cp4s-backup-restore --type='json' --patch='[{"op": "replace", "path": "/spec/template/spec/containers/0/livenessProbe/failureThreshold", "value": 30}]'
After running the backup, revert the change:
oc patch --namespace edr deployment cp4s-backup-restore --type='json' --patch='[{"op": "replace", "path": "/spec/template/spec/containers/0/livenessProbe/failureThreshold", "value": 3}]'
This method is effective when the backup fails due to liveness probe failures.
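Before rerunning the backup, it is worth confirming that the patched value took effect. On a cluster you can read it via jsonpath; the helper below extracts the same field from deployment JSON saved with `oc get ... -o json` (a sketch, names assumed).

```shell
# On a cluster:
#   oc get deploy -n edr cp4s-backup-restore \
#     -o jsonpath='{.spec.template.spec.containers[0].livenessProbe.failureThreshold}'
# Plain-shell equivalent against saved JSON:
get_failure_threshold() {
  sed -n 's/.*"failureThreshold"[: ]*\([0-9][0-9]*\).*/\1/p' "$1" | head -n1
}
```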
If the backup still fails after applying the above steps, proceed to Workaround 3.
Workaround 3:
■ Snapshot Backup and Restore Procedures (Cassandra / OpenSearch)
Pre-check steps:
This method requires StorageClass snapshot support. Please use the following steps to verify that your StorageClass supports snapshots.
The simplest method is to check via the OpenShift console (UI).
Steps:
- From the left menu, navigate to "Storage" > "PersistentVolumeClaims"
- Select the PVC you intend to back up (e.g., reaqta-hive-opensearch-all-data-000)
- From the Actions dropdown, select "Create snapshot"
On the Create VolumeSnapshot page, check whether a Snapshot Class (e.g., csi-vsphere-vsc) appears in the dropdown list
If a Snapshot Class is available, the script can create VolumeSnapshots for the following PVCs:
• reaqta-hive-opensearch-all-data
• data-cassandra-reaqta-hive-cassandra
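To confirm those PVCs exist before running the script, you can list and filter them. The `edr` namespace is an assumption; the pure-shell filter below is what you would pipe `oc get pvc -o name` through.

```shell
# On a cluster:
#   oc get pvc -n edr -o name | filter_target_pvcs
# Filter for the two PVC name prefixes this workaround targets:
filter_target_pvcs() {
  grep -E 'reaqta-hive-opensearch-all-data|data-cassandra-reaqta-hive-cassandra'
}
```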
This workaround provides the PVC backup and restore procedure using the snapshot-manager.sh script. This script must be executed in a Linux environment using bash, and it performs several validations at startup. You must know the StorageClass used by each PVC and the SnapshotClass used to create snapshots.
Usage examples and step-by-step instructions are provided below.
Prerequisites:
• The administrator must be logged in to the cluster from which you intend to take the backup.
• Use a Linux (bash) environment with the OpenShift CLI (oc) installed, and ensure you have the required permissions.
Steps:
Click the snapshot-manager.sh link below, copy the code, and create the script manually:
a. Create an empty text file named "snapshot-manager.sh".
b. Click the snapshot-manager.sh link and copy the code.
c. Paste the code into the empty file you created and save it.
d. Use the chmod command to make the snapshot-manager.sh script executable.
Check the StorageClass of the PVC
Identify the StorageClass used by each PVC you want to back up.
Command:
oc get pvc <PVC name> -o jsonpath='{.spec.storageClassName}'
Example:
oc get pvc default-cluster-1 -o jsonpath='{.spec.storageClassName}'
Example output:
rook-ceph-block
Use this value as the STORAGE_CLASS environment variable.
Check the VolumeSnapshotClass
Identify which VolumeSnapshotClass is available.
Command:
oc get volumesnapshotclass -o jsonpath='{.items[*].metadata.name}'
Example output:
csi-rbdplugin-snapclass
Use this value as the SNAPSHOT_CLASS environment variable.
Prepare to run the script
a) Set the required storage-related environment variables
b) Execute the script
Command:
STORAGE_CLASS=rook-ceph-block SNAPSHOT_CLASS=csi-rbdplugin-snapclass ./scripts/snapshot-manager.sh help
Displays the help menu and available subcommands.
Available commands
Usage:
  ./scripts/snapshot-manager.sh backup                         # Create offline snapshots of all PVCs
  ./scripts/snapshot-manager.sh restore <timestamp>            # Restore the PVCs using the specified snapshots
  ./scripts/snapshot-manager.sh export <timestamp> <directory> # Export snapshots to local tar.gz files
  ./scripts/snapshot-manager.sh import <directory>             # Import snapshots from local tar.gz files
  ./scripts/snapshot-manager.sh list                           # List available snapshots
  ./scripts/snapshot-manager.sh delete-single                  # Delete a specific backup set
  ./scripts/snapshot-manager.sh delete-old <keep>              # Delete old snapshots, keeping N most recent (default: 3)
  ./scripts/snapshot-manager.sh help                           # Show this help message
Examples:
  ./scripts/snapshot-manager.sh backup
  ./scripts/snapshot-manager.sh restore 20260216-153000
  ./scripts/snapshot-manager.sh export 20260216-153000                   # Export to ./snapshots-20260216-153000
  ./scripts/snapshot-manager.sh export 20260216-153000 /backup/snapshots # Export to custom directory
  ./scripts/snapshot-manager.sh import ./snapshots-20260216-153000       # Import from directory
  ./scripts/snapshot-manager.sh list
  ./scripts/snapshot-manager.sh delete-single 20260216-153000
  ./scripts/snapshot-manager.sh delete-old                               # Keep 3 most recent backups
  ./scripts/snapshot-manager.sh delete-old 5                             # Keep 5 most recent backups
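Putting the subcommands together, a typical backup-then-export run can be sketched as follows. The StorageClass/SnapshotClass values and the timestamp are the illustrative ones from the examples above; setting `RUN=echo` gives a dry run that only prints the commands.

```shell
# Dry-run wrapper around the documented subcommands.
# Set RUN=echo to preview, RUN= (empty) to execute for real.
snapshot_workflow() {
  export STORAGE_CLASS=rook-ceph-block SNAPSHOT_CLASS=csi-rbdplugin-snapclass
  $RUN ./scripts/snapshot-manager.sh backup &&
  $RUN ./scripts/snapshot-manager.sh list &&
  $RUN ./scripts/snapshot-manager.sh export 20260216-153000 /backup/snapshots
}
```

Example: `RUN=echo snapshot_workflow` prints the three commands without touching the cluster.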
Document Location
Worldwide
Product Synonym
ReaQta; QRadar EDR
Document Information
Modified date:
26 March 2026
UID
ibm17263053