IBM Support

QRadar: Disk replication falling behind alerts on High Availability (HA) appliances

Troubleshooting


Problem

On QRadar High Availability (HA) clusters, the administrator receives repeated system notifications about disk replication falling behind or the /store partition being unavailable. A common cause is an overburdened management interface. When the management interface is saturated with sync requests or event collection, the following system notifications might be displayed repeatedly to the administrator:
  1. "DRBD Sentinel: Disk replication is falling behind"
  2. "Disk Sentry has detected that one or more storage partitions are not accessible"
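
To confirm on the host whether replication is actually behind, one option is to look at the DRBD out-of-sync counter. The following is a minimal sketch, assuming the appliance exposes a DRBD 8.x style /proc/drbd status file in which each resource reports an oos: field in KiB. The function name and the sample text are illustrative and are not taken from QRadar.

```python
import re


def out_of_sync_kib(proc_drbd_text):
    """Return the oos (out-of-sync) values in KiB, one per DRBD resource.

    Assumes DRBD 8.x /proc/drbd formatting, where each resource prints
    a counters line that contains a field such as 'oos:204800'.
    """
    return [int(value) for value in re.findall(r"\boos:(\d+)", proc_drbd_text)]


# Illustrative sample only; real output differs per host and DRBD version.
SAMPLE = """\
 0: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----
    ns:0 nr:1024 dw:1024 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:204800
"""

if __name__ == "__main__":
    for index, kib in enumerate(out_of_sync_kib(SAMPLE)):
        print(f"resource {index}: {kib} KiB out of sync")
```

On a live host, the text from /proc/drbd would be passed in place of SAMPLE. An oos value that stays high or keeps growing over repeated checks suggests that the standby is not catching up.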

Cause

The management interface carries management and event traffic. When an HA pair is created in QRadar, the management interface is also used to communicate and sync data between the active and standby appliances. The combined volume of event collection and data sync can exceed the bandwidth of the management interface. Network congestion and increased latency can prevent status queries between the active and standby hosts from being answered within the expected timeframe. When the management interface is overburdened, the system can fall behind on replication or fail to complete disk status checks, which triggers unexpected system notifications.
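
The capacity reasoning can be sketched as simple arithmetic: event traffic and replication traffic share one link, so their sum is compared to the link capacity. The function names, the 80% warning threshold, and the traffic figures below are assumptions for illustration, not values published by IBM.

```python
def link_utilization(event_mbps, replication_mbps, link_capacity_mbps):
    """Return the fraction of link capacity used by both traffic types."""
    if link_capacity_mbps <= 0:
        raise ValueError("link capacity must be positive")
    return (event_mbps + replication_mbps) / link_capacity_mbps


def is_saturated(event_mbps, replication_mbps, link_capacity_mbps, threshold=0.8):
    """Flag the link when combined demand reaches the threshold.

    The 0.8 default is a hypothetical planning margin.
    """
    utilization = link_utilization(event_mbps, replication_mbps, link_capacity_mbps)
    return utilization >= threshold


if __name__ == "__main__":
    # Hypothetical figures: 600 Mbps of events plus 500 Mbps of sync on a 1 Gbps link.
    print(f"utilization: {link_utilization(600, 500, 1000):.0%}")
    print("saturated:", is_saturated(600, 500, 1000))
```

With these hypothetical figures, the demand exceeds what a 1 Gbps management interface can carry, which is the situation in which replication falls behind.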

Environment

QRadar High Availability deployments where the management interface has insufficient bandwidth or where no crossover interface is configured.

Diagnosing The Problem

Administrators are advised to ask their network team to determine whether the port that the server is connected to is overutilized.
A less accurate approach from within QRadar is to review the bandwidth between the primary and secondary peers. To run this test, load the saved search MGMT: Bandwidth Manager from the Log Activity tab. This search displays bandwidth usage between the console and hosts.
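
As a rough host-side complement to the checks above, interface throughput can be estimated from the standard Linux byte counters under /sys/class/net. The following is a minimal sketch, not an IBM-provided tool. The function names are hypothetical, and the management interface name on a given appliance must be supplied by the administrator.

```python
import time
from pathlib import Path


def read_bytes(iface, direction):
    """Read the cumulative rx or tx byte counter for a Linux interface."""
    if direction not in ("rx", "tx"):
        raise ValueError("direction must be 'rx' or 'tx'")
    path = Path("/sys/class/net") / iface / "statistics" / f"{direction}_bytes"
    return int(path.read_text().strip())


def throughput_mbps(bytes_before, bytes_after, seconds):
    """Convert two byte-counter readings into megabits per second."""
    if seconds <= 0:
        raise ValueError("interval must be positive")
    if bytes_after < bytes_before:
        raise ValueError("counter went backwards (reset or wrap)")
    return (bytes_after - bytes_before) * 8 / seconds / 1_000_000


def sample_interface(iface, interval=5.0):
    """Return (rx_mbps, tx_mbps) averaged over the sampling interval."""
    rx0, tx0 = read_bytes(iface, "rx"), read_bytes(iface, "tx")
    start = time.monotonic()
    time.sleep(interval)
    rx1, tx1 = read_bytes(iface, "rx"), read_bytes(iface, "tx")
    elapsed = time.monotonic() - start
    return throughput_mbps(rx0, rx1, elapsed), throughput_mbps(tx0, tx1, elapsed)
```

For example, sample_interface("eth0", 10) returns the average receive and transmit rates over 10 seconds, where eth0 is only a placeholder name. Sustained values close to the link speed point to a saturated interface, but a measurement from the network team remains the more reliable source.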

Resolving The Problem

Administrators are advised to read the QRadar HA documentation to familiarize themselves with these deployments before they run the steps in this technical note.
Administrators can select one of the following options to resolve the network congestion:
 
  1. Configure a crossover interface to offload the Distributed Replicated Block Device (DRBD) traffic that syncs data between HA peers to another interface.
    Note: Adding a High Availability (HA) crossover is intended for QRadar appliances configured as HA pairs, not for appliances installed on virtual machines.
  2. Reduce the traffic ingested by the management interface by configuring other interface roles for event or flow ingestion. This option is best suited for individual appliances to alleviate interface congestion. For more information, see Configuring network interfaces.
  3. Increase the management interface capacity by configuring link aggregation (bonding) or by migrating the management interface to a 10 Gbps interface. This option is best suited for individual appliances to alleviate interface congestion. For more information, see Configuring bonded management interfaces.

    Result
    The management interface has enough available bandwidth to evaluate the HA status successfully without triggering system notifications.

Document Location

Worldwide


Document Information

Modified date:
10 March 2022

UID

ibm16515876