Troubleshooting
Problem
You can use the replicationVerify.pl script to verify that the QRadar configuration database is synchronized across the environment. The tool checks that the replication process is working and that the databases are the same on all managed hosts.
Cause
During incremental replication, changes are replicated from the Console to the managed hosts every minute, and a full replication happens every 2 hours. Because data can accumulate quickly on all managed hosts, administrators commonly find that tables are not fully replicated even after a full deploy.
Diagnosing The Problem
To check whether a managed host has replication issues, search its qradar.error log for the message "Database is out-of-sync with the console. We will attempt to begin with a full dump next interval".
- SSH into the QRadar console.
- From the console, SSH into the affected managed host.
- Run the following command to search the error log for the out-of-sync message:
grep "Database is out-of-sync" /var/log/qradar.error
Result
If the command returns matches, the managed host very likely has replication problems. Running the replicationVerify.pl script can help you diagnose the issue.
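The check above can also be wrapped in two small helper functions, sketched here as an illustration only. The function names are hypothetical and are not part of QRadar; the log path and message text are the ones given in this article. The second function assumes that the log is appended to in chronological order, so the last match is the most recent one.

```shell
# Hypothetical helpers (not shipped with QRadar) around the grep shown above.

# Print how many lines in the log contain the out-of-sync message.
# The log path defaults to the one used in this article.
count_out_of_sync() {
    log="${1:-/var/log/qradar.error}"
    # grep -c prints the number of matching lines (0 when none match).
    grep -c "Database is out-of-sync" "$log"
}

# Print only the last matching line, assuming the log is written
# in chronological order.
latest_out_of_sync() {
    log="${1:-/var/log/qradar.error}"
    grep "Database is out-of-sync" "$log" | tail -n 1
}
```

Run count_out_of_sync on the managed host to see how often the message occurred, and latest_out_of_sync to see the most recent occurrence. A count that keeps growing between runs suggests that the problem is ongoing and not a one-time event.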
Resolving The Problem
The replicationVerify.pl script displays a list of tests and their results, which gives a general picture of the state of replication in the deployment. For more detail, run the script with the details flag "-d" or the debug option "-d -d". If the script returns errors, see the Understanding the detailed output section for steps to address them.
- SSH into the QRadar console.
- Run the following command:
/opt/qradar/support/replicationVerify.pl
Result
The following is an example of a system with no errors:
connecting to console DB
Collecting list of managedhosts
Gathering Console's table definitions for replicated tables
Gathering Console's replication stored procedures
Gathering Table Sizes of replicated tables on console
checking console for Bloat [OK]
comparing MH to console's replication setup
x.x.x.x tests:
comparing schema [OK]
comparing counts [OK]
comparing output of 'hostname -i' [OK]
comparing Stored Procedures [OK]
comparing table sizes [OK]
checking for bloat [OK]
The following is an example of a system with replication issues:
connecting to console DB
Collecting list of managedhosts
Gathering Console's table definitions for replicated tables
Gathering Console's replication stored procedures
Gathering Table Sizes of replicated tables on console
checking console for Bloat [OK]
comparing MH to console's replication setup
x.x.x.x tests:
comparing schema [ERROR] 1 tables with different column config between console and MH
comparing counts [WARN] 1 tables with different counts between Console and MH
comparing output of 'hostname -i' [ERROR] hostname not proper in /etc/hosts for 192.168.12.41
comparing Stored Procedures [ERROR] 1 differences in stored procedures
comparing table sizes [WARN] 6 tables where the sizes are different
checking for bloat [WARN] 1 tables with potential bloated
1 of 1 managed hosts had at least one problem with replication
rerun the script with the -d option for more details on the problems
use the --ip option to target the host(s) that had problems
Note: If nothing is returned, try using the option "-d -d". If you see the warning "[WARN] No managed hosts. No need to test replication", your system might not have any managed hosts set up.
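On a deployment with many managed hosts, the summary output can be long. As a rough sketch, the following hypothetical helpers (not part of QRadar) filter it down to the lines that need attention. They assume that the script writes its results to standard output and that each test line carries an [OK], [WARN], or [ERROR] tag, as in the examples above.

```shell
# Hypothetical helpers (not shipped with QRadar); both read the
# replicationVerify.pl output from standard input.

# Print only the lines that reported a warning or an error.
filter_problems() {
    grep -E '\[(ERROR|WARN)\]'
}

# Print a one-line tally of lines per status tag.
summarize_status() {
    awk '
        /\[OK\]/    { ok++ }
        /\[WARN\]/  { warn++ }
        /\[ERROR\]/ { err++ }
        END { printf "OK=%d WARN=%d ERROR=%d\n", ok, warn, err }
    '
}
```

For example, /opt/qradar/support/replicationVerify.pl | filter_problems would show only the failing tests. If the script writes some of its results to standard error, redirect that stream into the pipe as well.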
Understanding the detailed output
1 | Connecting to console DB |
Connecting to the console's database. |
2 | Collecting list of managedhosts |
Getting the list of the managed hosts to test against. |
3 | Gathering Console's table definitions for replicated tables |
Collecting the list of tables involved in the replication. |
4 | Gathering Console's replication stored procedures |
Collecting the stored procedures on the console. |
5 | Gathering Table Sizes of replicated tables on console |
Collecting the table sizes for the console. |
6 | Checking the console for Bloat [OK] |
Checking the console for bloat. |
7 | Comparing MH to console's replication setup |
Most of the work starts here. |
8 | <IP address> tests |
The host being tested. |
9 | Comparing schema |
The schema test reports tables whose column definitions differ between the console and the managed host, then prints the summary of columns for both sides. Compare the two summaries line by line to find the difference. In this example, the test flags the public.managedhost table: the id column is a bigint on the console and an integer on the managed host.
This is likely caused by a patch that failed on one of the systems, or by a system that was not patched. Verify that all systems in the deployment are at the same patch level. If any systems are not at the same QRadar version, rerun the patch on those managed hosts.
|
10 | Comparing counts |
The count test runs a select count(*) on the table on both the console and the managed host, and displays the table together with the counts from each side. A difference can occur when a recent Console update has not yet been pushed to the managed hosts; the update goes out in the next replication bundle. To force the deployment to replicate, go to the Admin tab and click Advanced > Deploy Full Configuration. Note: A Deploy Full Configuration restarts services and might cause an interruption in collecting events. Schedule a maintenance period before you run a Full Deploy. |
11 | [ERROR] Managed hosts state did not sync in the with the console’s TX - can not test table counts since MH never synched with console’s transaction number |
This error message means that the console and the managed host were on different transactions. The script waits up to 60 seconds for them to synchronize. If they do not synchronize within 60 seconds, the script times out and moves on, because it cannot run a COUNT comparison when the transaction IDs differ.
Rerun the script for only the affected host by using --ip <IP address> to see whether it can synchronize the transactions. If it still cannot synchronize, the host might be too far behind to catch up. To force the deployment to replicate, go to the Admin tab and click Advanced > Deploy Full Configuration. Note: A Deploy Full Configuration restarts services and might cause an interruption in collecting events. Schedule a maintenance period before you run a Full Deploy. |
12 | Comparing the output of 'hostname -i' |
The 'hostname -i' comparison indicates a problem with the /etc/hosts file on the managed host. Correct the entries in this file to resolve the error. |
13 | Comparing stored procedures |
The stored procedures test checks whether the stored procedures that are used for replication on the console are the same as the ones on the managed host. If the procedures differ, replication can stop, because the console might format the data one way while the managed host expects it in another. To force the deployment to replicate, go to the Admin tab and click Advanced > Deploy Full Configuration. Note: A Deploy Full Configuration restarts services and might cause an interruption in collecting events. Schedule a maintenance period before you run a Full Deploy. |
14 | Comparing table sizes |
The table size test compares the values in q_table_size to see whether the sizes on the console and the managed host are close. It does a percent error calculation to determine whether the difference is too great. By default, the script alerts on any difference over 100%. |
15 | Checking for bloat |
The bloat test determines whether autovacuum is failing to run on certain tables. It first checks whether a table is bloated and, if so, checks when the last autovacuum ran. If the last autovacuum ran more than 600 seconds ago, the test alerts. |
16 | 1 of 1 managed hosts had at least one problem with replication |
Summary of all the tests. |
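To give a feel for the percent error threshold used by the table size test, the following sketch computes a percent error for two table sizes. The exact formula that replicationVerify.pl uses is not documented in this article; the sketch assumes the conventional definition, |managed host size - console size| / console size x 100. The function names are hypothetical and are not part of QRadar.

```shell
# Hypothetical helpers (not shipped with QRadar).
# Assumed formula: |mh - console| / console * 100.

# Print the percent error for a console size ($1) and a managed host size ($2).
# Fails without output when the console size is zero (result is undefined).
pct_error() {
    awk -v c="$1" -v m="$2" 'BEGIN {
        if (c + 0 == 0) { exit 1 }
        d = m - c; if (d < 0) d = -d
        printf "%.1f\n", d / c * 100
    }'
}

# Succeed (exit status 0) when the percent error is over the threshold.
# The threshold ($3) defaults to 100, matching the default described above.
exceeds_threshold() {
    p=$(pct_error "$1" "$2") || return 2
    awk -v p="$p" -v t="${3:-100}" 'BEGIN { exit !(p + 0 > t + 0) }'
}
```

Under this assumption, a table of 8192 bytes on the console and 24576 bytes on the managed host gives pct_error 8192 24576 = 200.0, which is over the default threshold, while 8192 and 12288 give 50.0, which is not. To change the threshold in the script itself, use the --pctErr option listed under Script Options.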
Script Options
/opt/qradar/support/replicationVerify.pl
---------------
Usage:
TEST OPTIONS:
-a | --all Run all tests (default, if no options are passed).
-b | --bloat Check all replicated tables to see if the last autovacuum was too long ago.
-c | --count Compare table counts between console and managed hosts.
-n | --hostname Check for valid IP address from "hostname -i" test.
-p | --proc Comparison of the replication stored procedures between console and managed hosts.
-s | --schema Comparison of the schema between console and managed hosts.
-z | --size Compare table sizes between console and managed hosts.
EXTRA OPTIONS:
--ip "<list>" Quoted and comma separated list of IP addresses (i.e. "10.0.0.1,192.168.10,172.16.3.4").
--pctErr <#> Percent Error. Used in conjunction with size test. (default = 100%)
--vacuumTime <#> Time in seconds. Used in conjunction with bloat test. (default = 600 seconds)
-d | --details Provides more details. Can specify -d multiple times for more information. 3 levels (details, debug, devel)
-h | --help Displays this dialog.
More details available for each test if you pass the test flag with the help flag (i.e. -h -b or -h -a)
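The options above can be combined to rerun selected tests against only the hosts that reported problems. The following is a minimal sketch of a hypothetical helper (not part of QRadar) that assembles such a command line and prints it for review without running it. It assumes that test flags, --ip, and -d can be passed together, which the usage text suggests but does not state explicitly.

```shell
# Hypothetical helper (not shipped with QRadar).
# Usage: build_verify_cmd "<test flags>" <ip> [<ip> ...]
# Prints a replicationVerify.pl command line; it does not run the script.
build_verify_cmd() {
    tests="$1"; shift
    # Join the remaining arguments with commas, as --ip expects.
    ips=$(IFS=,; printf '%s' "$*")
    printf '/opt/qradar/support/replicationVerify.pl %s --ip "%s" -d\n' "$tests" "$ips"
}
```

For example, build_verify_cmd "-c" 10.0.0.1 172.16.3.4 prints /opt/qradar/support/replicationVerify.pl -c --ip "10.0.0.1,172.16.3.4" -d. Printing the command first lets you check the host list before you run it on the console.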
Document Location
Worldwide
Document Information
Modified date:
10 July 2023
UID
ibm11086555