Data rebalancing after a data node is added

When you add a data node, IBM® QRadar® rebalances the data to improve search and overall system performance.

Data rebalancing includes decompressing older data and moving data off the original storage device, with the goal of distributing it evenly across all connected devices.

For example, your deployment has an event processor that receives 20,000 events per second (EPS). When you add data nodes, QRadar automatically distributes the events across the event processor and all data nodes that are available to it. If you add three data nodes, the event processor stores 5,000 EPS and sends 5,000 EPS to each of the attached data nodes. The event processor is still processing all of the events, but the data nodes provide more storage, indexing, and search capabilities to improve the overall performance.
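The arithmetic in this example can be sketched as follows. This is only an illustration of the even split that the example describes; the function and member names are hypothetical and are not part of QRadar.

```python
def even_split(total_eps: int, data_nodes: int) -> dict[str, float]:
    """Split an event rate evenly across one event processor and its data nodes."""
    # Hypothetical member names, used only for this illustration.
    members = ["event_processor"] + [f"data_node_{i}" for i in range(1, data_nodes + 1)]
    share = total_eps / len(members)
    return {member: share for member in members}


shares = even_split(20_000, 3)
for member, eps in shares.items():
    print(f"{member}: {eps:.0f} EPS")
```

With 20,000 EPS and three data nodes, each of the four members is assigned 5,000 EPS.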

How does rebalancing work?

Cluster members consist of one event processor and one or more data nodes. Data can move between any members of the cluster in any direction. Data moves between members of the cluster transactionally by hourly folders. One hour of data is the smallest block of data that moves. If any file from an hourly folder is not copied, the entire transaction is rolled back.

Rebalancing does not merge hourly folders. For example, if an hourly folder exists on the destination, rebalancing does not move data from the same hourly folder from other members of the cluster. Before rebalancing starts, the cluster determines its target. The target is the percentage of free space that rebalancing tries to achieve on all members of the cluster. The target doesn't account for absolute free space in gigabytes; it accounts only for the percentage.
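The hourly-folder rules above (all-or-nothing moves, no merging into an existing folder) can be sketched with ordinary file operations. This is a minimal illustration under assumed conditions, not how QRadar implements the transfer: the function name, the folder layout, and the use of local paths instead of network transfers are all hypothetical.

```python
import shutil
from pathlib import Path


def move_hourly_folder(src: Path, dest_root: Path) -> bool:
    """Move one hourly folder as a single transaction.

    Returns True if every file was copied and the source was removed,
    False if the move was skipped or rolled back.
    """
    dest = dest_root / src.name
    if dest.exists():
        # No merging: an hourly folder that already exists on the
        # destination is left alone, and the source keeps its data.
        return False
    try:
        dest.mkdir(parents=True)
        for item in sorted(src.iterdir()):
            shutil.copy2(item, dest / item.name)
    except OSError:
        # Any failed copy rolls back the whole hourly folder.
        shutil.rmtree(dest, ignore_errors=True)
        return False
    # Only after all files are copied is the source copy removed.
    shutil.rmtree(src)
    return True
```

The source folder is deleted only after every file is copied, so a failure at any point leaves the original hour of data intact on the source.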

When you add a data node, if you do not associate it with a processor, hourly folders are created each hour that the services are running. The hourly folders that are created on the data node do not accept data from a processor or other data nodes. Only the folders that are created between the time that you add the data node and connect it to a processor are affected. All other hourly folders move data as expected. For example, if you add a data node and connect it to a processor 48 hours later, none of the 48 hourly directories on the processor or other data nodes are accepted by the new data node. The data does not move from the source and is stored locally.

After the cluster determines its target, the members that have a smaller percentage of free space than the target become sources, and the members that have a higher percentage of free space become destinations. Each source connects, and pushes data, to each destination. Some components in your QRadar deployment might restart and cause the rebalancing process to fail. Rebalancing restarts itself, resumes from the point of failure, and continues to completion. When rebalancing restarts, it does so with a progressively increasing timeout period (5 minutes, 10 minutes, 30 minutes, and so on) to avoid too many failed attempts during full deployment or maintenance. The entire rebalancing process takes place between the Ariel processes on the members of the cluster.
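The division of cluster members into sources and destinations can be sketched as a comparison against the target. The member names, the free-space figures, and the target value of 80% are hypothetical values chosen for illustration; the treatment of a member that sits exactly on the target is also an assumption.

```python
def classify(free_pct: dict[str, float], target: float) -> tuple[list[str], list[str]]:
    """Split members into sources (below target) and destinations (above target)."""
    sources = [m for m, free in free_pct.items() if free < target]
    destinations = [m for m, free in free_pct.items() if free > target]
    # Assumption: a member exactly at the target neither sends nor receives.
    return sources, destinations


# Hypothetical free-space percentages for a three-member cluster.
free = {"event_processor": 55.0, "data_node_1": 95.0, "data_node_2": 90.0}
sources, destinations = classify(free, target=80.0)

# Each source pushes data to each destination.
pairs = [(s, d) for s in sources for d in destinations]
print(pairs)
```

In this example the event processor is the only source, and it is paired with both data nodes.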

How does scattering work?

Scattering distributes incoming data from the event processor among all members of the cluster. Scattering works with events and flows and is not bound to the smallest hourly block. For example, one hour of events is scattered across all members of the cluster into the same hourly folder.

Scattering distributes events and flows in proportion to the percentage of free space on each member of the cluster. It moves data to the cluster hosts sequentially, in round-robin fashion, weighted by the free space percentage.

If any errors or connectivity issues occur, scattering tries to move the data to the next member of the cluster. If it is unsuccessful, it stores data locally on the event processor so that no data is lost. Data is scattered between the ecs-ep process (source) and multiple data node processes (destinations) on the data node.
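The fallback behavior described above can be sketched as a loop over candidate members that ends with local storage. This is an illustration only: the function names and callbacks are hypothetical, and trying every remaining member in turn before storing locally is an assumption about how "the next member" is chosen.

```python
from typing import Callable, Sequence


def deliver(
    record: bytes,
    members: Sequence[str],
    send: Callable[[str, bytes], None],
    store_locally: Callable[[bytes], None],
) -> str:
    """Try each cluster member in order; keep the record locally if all fail."""
    for member in members:
        try:
            send(member, record)
            return member
        except ConnectionError:
            # Error or connectivity issue: move on to the next member.
            continue
    # No member accepted the record, so it stays on the event processor.
    store_locally(record)
    return "local"
```

Because the local store is the final fallback, a record is always written somewhere, which matches the guarantee that no data is lost.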

How is existing data moved between the event processor (source) and the data node (target)?

When you add a data node, QRadar calculates a target space. The target space is the percentage of free space on the event processor, plus the percentage of free space on each of the data nodes, divided by the total number of event processors and data nodes. For example, you have one event processor and two data nodes. If the event processor has 60% free space and both data nodes have 100% free space, the target space is approximately 86.7% ((60 + 100 + 100) / 3). When the target is defined, the data is moved one hourly block at a time until the target space is reached (approximately 86.7% in this example) on any cluster hosts.
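The target space calculation is a plain average of the free-space percentages. A minimal sketch, with a hypothetical function name, using the figures from the example:

```python
def target_space(free_pct: list[float]) -> float:
    """Average free-space percentage across all cluster members."""
    return sum(free_pct) / len(free_pct)


# One event processor at 60% free and two data nodes at 100% free.
target = target_space([60, 100, 100])
print(f"{target:.1f}%")  # prints "86.7%"
```

Note that the sum is divided by the member count as a whole: (60 + 100 + 100) / 3, which is about 86.7% when rounded to one decimal place.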

How is new data moved between the event processor (source) and the data node (target)?

When the initial balancing is complete, QRadar scatters new data across the event processors and data nodes, according to the amount of free space available. For example, if an event processor has 25% free space and a data node has 40% free space, the data node receives 40 events for every 25 events that the event processor receives, until both appliances have approximately the same amount of free space.
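One way to picture this proportional distribution is a weighted round-robin over the cluster members. The sketch below uses a generic smooth weighted round-robin scheme, which is a stand-in chosen for illustration and not necessarily the algorithm that QRadar uses. The function and member names are hypothetical, and the free-space weights are held constant, whereas in a real deployment they change as data is written.

```python
def scatter(free_pct: dict[str, float], n_records: int) -> dict[str, int]:
    """Assign records to members in proportion to their free-space percentage."""
    total = sum(free_pct.values())
    current = {member: 0.0 for member in free_pct}
    counts = {member: 0 for member in free_pct}
    for _ in range(n_records):
        # Every member gains credit equal to its weight.
        for member, weight in free_pct.items():
            current[member] += weight
        # The member with the most credit receives the next record.
        chosen = max(current, key=current.get)
        current[chosen] -= total
        counts[chosen] += 1
    return counts


counts = scatter({"event_processor": 25, "data_node": 40}, 65)
print(counts)
```

Over 65 records, the data node receives 40 and the event processor receives 25, which matches the ratio in the example above.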

When is balancing complete?

The balancing process is complete when all source data is processed, or when the target space constraints are reached.