IBM Support

QRadar: HA synchronization progress resets to 0%

Troubleshooting


Problem

During a full Distributed Replicated Block Device (DRBD) synchronization between high-availability (HA) hosts in QRadar, the synchronization progress can reset to 0%. This does not mean that the synchronization has started over. The 0% value is a temporary indicator that is shown until the percentage is recalculated, and it does not indicate an actual problem.

Symptom

When you monitor the overall Distributed Replicated Block Device (DRBD) synchronization progress, the percentage climbs to some higher value, then at some point resets to 0% and appears to start again.

Cause

Several events can cause the progress percentage to reset, such as a full deployment, a lost link, or an unexpected network outage. The full sync does not start again from the beginning. It resumes from where it left off.

Resolving The Problem

A progress reset to 0% does not mean that the full sync started over. The sync should resume from the point where the previous sync stopped. To verify this, run cat /proc/drbd. The output should look similar to this:
  Sun Mar 24 11:14:32 +00 2019
   0: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----
      ns:0 nr:1089876000 dw:1089866364 dr:0 al:0 bm:0 lo:41 pe:42 ua:41 ap:0 ep:1 wo:d oos:18121311576
      [=======>............] sync'ed: 43.3% (17696592/31165372)M
      finish: 104:46:52 speed: 48,024 (57,796) want: 102,400 K/sec
In the above example, the sync is 43.3% complete. The key field is oos, which indicates how many kilobytes are "out of sync". The value 18121311576 KB is approximately 17696592 MB, which is the first number after the "sync'ed" percentage. The second number, 31165372 MB, is the total amount that had to be transferred when the sync began.
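The relationship between the oos field and the numbers after the percentage can be checked with a short script. The following is a minimal sketch, not an IBM-provided tool: the function name parse_drbd_status and the regular expressions are illustrative assumptions based only on the output format shown above.

```python
import re

# Status text copied from the example above (joined onto one line).
SAMPLE = (
    "0: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r----- "
    "ns:0 nr:1089876000 dw:1089866364 dr:0 al:0 bm:0 lo:41 pe:42 ua:41 "
    "ap:0 ep:1 wo:d oos:18121311576 "
    "[=======>............] sync'ed: 43.3% (17696592/31165372)M"
)


def parse_drbd_status(text):
    """Extract oos (KB) and the sync'ed counters (MB) from /proc/drbd text.

    Hypothetical helper: it assumes the field layout shown in the example
    and returns None when either field is missing.
    """
    oos = re.search(r"oos:(\d+)", text)
    sync = re.search(r"sync'ed:\s*([\d.]+)%\s*\((\d+)/(\d+)\)M", text)
    if not oos or not sync:
        return None
    return {
        "oos_kb": int(oos.group(1)),
        "percent": float(sync.group(1)),
        "left_mb": int(sync.group(2)),
        "total_mb": int(sync.group(3)),
    }


status = parse_drbd_status(SAMPLE)
# Convert kilobytes to megabytes to compare oos with the first number.
oos_mb = status["oos_kb"] // 1024
print("oos in MB:", oos_mb)
print("left (MB):", status["left_mb"], "total (MB):", status["total_mb"])
```

The converted oos value and the first number after the percentage should agree to within a megabyte or two, because the counters are rounded and keep changing while the sync runs.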
In the scenario where the progress drops to 0%, the /proc/drbd output may look similar to this when the sync resumes:
  Sun Mar 24 11:21:03 +00 2019
   0: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----
      ns:0 nr:153775024 dw:153770416 dr:0 al:0 bm:0 lo:21 pe:43 ua:20 ap:0 ep:1 wo:d oos:17966538772
      [>....................] sync'ed: 0.9% (17545448/17694024)M
      finish: 85:12:25 speed: 58,568 (55,504) want: 16,040 K/sec
There are a few key things to note in this example. The oos field picked up where the previous sync left off. If the sync had started over from the beginning, oos would be set to 31913340928 (31165372 MB expressed in kilobytes). Also, the two numbers after the sync percentage have changed. This latest sync started with 17694024 MB to synchronize, and 17545448 MB is currently left, so 0.9% has completed. The progress percentage dropped back to 0%, but the sync resumed from the previous state, and the new percentage is calculated from what was left to synchronize.
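The comparison described above can be expressed as a small check. This is a sketch under stated assumptions: the function classify_reset and its three-way result are hypothetical, and the "unclear" case is an added guess for totals that fall between the two expected values (for example, if new writes arrived between snapshots).

```python
KB_PER_MB = 1024


def classify_reset(original_total_mb, prev_oos_kb, new_total_mb):
    """Decide whether a sync that shows 0% resumed or truly restarted.

    original_total_mb: second number after the percentage before the reset
    prev_oos_kb:       oos value from the last snapshot before the reset
    new_total_mb:      second number after the percentage after the reset
    """
    prev_left_mb = prev_oos_kb // KB_PER_MB
    if new_total_mb <= prev_left_mb:
        # The new sync only covers what was still out of sync.
        return "resumed"
    if new_total_mb >= original_total_mb:
        # The new sync covers the whole device again.
        return "restarted"
    return "unclear"


# Values taken from the two example snapshots above.
print(classify_reset(31165372, 18121311576, 17694024))

# The oos value a full restart would show, in kilobytes.
print(31165372 * KB_PER_MB)
```

With the example values, the new total of 17694024 MB is slightly below the roughly 17696592 MB that was left before the reset, which is what a resumed sync looks like.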
To see this Distributed Replicated Block Device (DRBD) information from an earlier time, review the systemStabMon logs, which are located under /var/log/systemStabMon/YYYY/MM/DD/drbd.log. Logs for all days other than the current day are compressed as .gz files, which can be viewed with the zless command. The drbd.log file contains snapshots of this output taken throughout the day, so you can track down the time when the synchronization progress was reset. From there, you can verify that the oos field did not return to its original value and that it resumed from its state before the progress was reset.
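Reviewing a full day of snapshots by eye can be slow, so the oos values can be pulled out of a log file with a script. The sketch below assumes only that each snapshot in drbd.log contains an oos:<number> field like the examples above. The helper names read_oos_history and largest_increase are hypothetical, and the exact snapshot layout in the log is not confirmed here.

```python
import gzip
import re
from pathlib import Path

OOS_RE = re.compile(r"oos:(\d+)")


def read_oos_history(path):
    """Return every oos value (KB) found in a drbd.log file, in file order.

    Files whose name ends in .gz are read through gzip. Anything else is
    read as plain text.
    """
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    values = []
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            values.extend(int(match) for match in OOS_RE.findall(line))
    return values


def largest_increase(values):
    """Return the biggest upward jump between consecutive oos values.

    Returns 0 when oos never increases. A jump back toward the original
    total would suggest a real restart rather than a resumed sync.
    """
    jumps = [later - earlier for earlier, later in zip(values, values[1:])]
    return max([0] + jumps)
```

For example, passing the path of one day's drbd.log (or its .gz copy) to read_oos_history returns the oos values in order, and largest_increase on that list shows whether oos ever climbed back up during the day.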

Document Location

Worldwide

Document Information

Modified date:
06 April 2021

UID

ibm10878206