IBM Support

QRadar: Validate the configuration database is synchronized with replicationVerify.pl

Troubleshooting


Problem

How can you validate the QRadar configuration database is synchronized across the environment? The replicationVerify.pl script verifies the replication process is working, and verifies the databases are the same on all managed hosts.
Before you begin
Incremental replication happens from the Console to the Managed Hosts every minute as changes occur. A full replication happens every 2 hours. Since data can accumulate quickly on all managed hosts, it is not uncommon for tables to not fully replicate before you use replicationVerify.pl, even after Deploy Full Configuration completes. This script is intended for use as a guide to your replication process.
We also recommend searching the qradar.error logs on the managed host in question for the message "Database is out-of-sync with the console. We will attempt to begin with a full dump next interval". This message typically accompanies replication issues on managed hosts.
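The log check described above can be sketched in Python. This is a minimal illustration, not part of the QRadar tooling: the function name find_out_of_sync is hypothetical, and the message text is taken from the paragraph above.

```python
import re

# Message text quoted from the article; matched case-insensitively in case
# the log formats it slightly differently.
OUT_OF_SYNC = re.compile(r"Database is out-of-sync with the console", re.IGNORECASE)


def find_out_of_sync(lines):
    """Return the log lines that contain the out-of-sync message."""
    return [line.rstrip("\n") for line in lines if OUT_OF_SYNC.search(line)]


# Small in-memory sample standing in for real log content.
sample_log = [
    "host1 replication: Database is out-of-sync with the console. "
    "We will attempt to begin with a full dump next interval\n",
    "host1 replication: download complete\n",
]
for hit in find_out_of_sync(sample_log):
    print(hit)
```

On a managed host, the same function can be given an open file handle for the error log. The usual location is assumed to be /var/log/qradar.error; confirm the path on your system before relying on it.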

Environment

The replicationVerify.pl script displays a list of tests and their results, which gives a general idea of how replication is behaving in the deployment. To get more information, run the script with the details flag. You can repeat the -d flag, depending on whether you want details or debug output.
Example:
-d  = detail
-d -d = debug
Note: The full list of flags is shown under Script Options in Resolving The Problem.
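As an illustration of how the repeated -d flag combines with the other options, the following Python sketch assembles an argument list for the script. The helper name build_command and its parameters are hypothetical; only the script path and the -d and --ip flags come from this article.

```python
SCRIPT = "/opt/qradar/support/replicationVerify.pl"


def build_command(detail_level=0, ips=None, test_flags=()):
    """Build the argument list for replicationVerify.pl.

    detail_level: 0 = summary only, 1 = details (-d), 2 = debug (-d -d),
                  3 = devel (-d -d -d), per the script's help text.
    ips:          optional list of managed host IP addresses for --ip.
    test_flags:   optional test flags such as "-c" or "-s".
    """
    if detail_level not in (0, 1, 2, 3):
        raise ValueError("detail_level must be between 0 and 3")
    command = [SCRIPT, *test_flags]
    command += ["-d"] * detail_level
    if ips:
        # The script expects one comma-separated list of addresses.
        command += ["--ip", ",".join(ips)]
    return command


print(" ".join(build_command(detail_level=2, ips=["192.168.12.41"])))
```

The resulting list can be passed to subprocess.run on the Console. Building a list rather than a single string avoids shell quoting problems with the comma-separated IP list.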

Resolving The Problem

  1. Log in to the Console by using the root account via SSH connection.
  2. Run the following command:
    /opt/qradar/support/replicationVerify.pl

Example:
Good system
/opt/qradar/support/replicationVerify.pl
connecting to console DB
Collecting list of managedhosts
Gathering Console's table definitions for replicated tables
Gathering Console's replication stored procedures
Gathering Table Sizes of replicated tables on console
checking console for Bloat [OK]
comparing MH to console's replication setup
192.168.12.41 tests:
    comparing schema [OK]
    comparing counts [OK]
    comparing output of 'hostname -i' [OK]
    comparing Stored Procedures [OK]
    comparing table sizes [OK]
    checking for bloat [OK]
Bad system
/opt/qradar/support/replicationVerify.pl
connecting to console DB
Collecting list of managedhosts
Gathering Console's table definitions for replicated tables
Gathering Console's replication stored procedures
Gathering Table Sizes of replicated tables on console
checking console for Bloat [OK]
comparing MH to console's replication setup
192.168.12.41 tests:
    comparing schema [ERROR] 1 tables with different column config between console and MH
    comparing counts [WARN] 1 tables with different counts between Console and MH
    comparing output of 'hostname -i' [ERROR] hostname not proper in /etc/hosts for 192.168.12.41
    comparing Stored Procedures [ERROR] 1 differences in stored procedures
    comparing table sizes [WARN] 6 tables where the sizes are different
    checking for bloat [WARN] 1 tables with potential bloated
1 of 1 managed hosts had at least one problem with replication
rerun the script with the -d option for more details on the problems
use the --ip option to target the host(s) that had problems
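When the script is run against many managed hosts, it can help to reduce the summary output to the tests that did not pass. The Python sketch below assumes the layout shown in the examples above: an "<IP address> tests:" line followed by indented lines that each carry one [OK], [WARN], or [ERROR] status. The function names are hypothetical, and the real output might differ between QRadar versions.

```python
import re

HOST_LINE = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3}) tests:\s*$")
TEST_LINE = re.compile(r"^\s+(.+?) \[(OK|WARN|ERROR)\]\s*(.*)$")


def parse_results(output):
    """Map each managed host IP to a list of (test, status, message) tuples."""
    results = {}
    current = None
    for line in output.splitlines():
        host = HOST_LINE.match(line)
        if host:
            current = host.group(1)
            results[current] = []
            continue
        test = TEST_LINE.match(line)
        if test and current is not None:
            results[current].append(test.groups())
    return results


def problems(results):
    """Keep only the tests whose status is not OK."""
    return {host: [t for t in tests if t[1] != "OK"]
            for host, tests in results.items()}


sample_output = """checking console for Bloat [OK]
comparing MH to console's replication setup
192.168.12.41 tests:
    comparing schema [ERROR] 1 tables with different column config between console and MH
    comparing counts [OK]
    comparing table sizes [WARN] 6 tables where the sizes are different
1 of 1 managed hosts had at least one problem with replication
"""
for host, failed in problems(parse_results(sample_output)).items():
    for test, status, message in failed:
        print(host, status, test, message)
```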

Understanding the detailed output

1 Connecting to console DB: Connecting to the console's database.
2 Collecting list of managedhosts: Getting the list of the managed hosts to test against.
3 Gathering Console's table definitions for replicated tables: Collecting the list of tables involved in the replication.
4 Gathering Console's replication stored procedures: Collecting the stored procedures on the console.
5 Gathering Table Sizes of replicated tables on console: Collecting the table sizes for the console.
6 Checking the console for Bloat [OK]: Checking the console for bloat.
7 Comparing MH to console's replication setup: Most of the work starts here.
8 <IP address> tests: The host being tested.
9 Comparing schema
[ERROR] columns for public.managedhost  do not match between MH and Console
[ERROR] console columns:
public managedhost
id NO bigint
ip NO character varying
hostname YES character varying
status NO character varying
isconsole NO boolean
appliancetype YES character varying
creationdate YES timestamp without time zone
updatedate YES timestamp without time zone
qradar_version NO character varying
primary_host YES bigint
secondary_host YES bigint
haoptions YES character varying
[ERROR] mh columns:
public managedhost
id NO integer
ip NO character varying
hostname YES character varying
status NO character varying
isconsole NO boolean
appliancetype YES character varying
creationdate YES timestamp without time zone
updatedate YES timestamp without time zone
qradar_version NO character varying
primary_host YES bigint
secondary_host YES bigint
haoptions YES character varying
[ERROR] 1 table with different column config between console and MH
The comparing schema test is complaining about the public.managedhost table. Next, it prints the summary of columns for both the console and the managed host. It requires a line by line comparison to see where the problem is. In this case, the id column is a bigint on the console, and it is an integer on the managed host.
This is likely caused by either a patch failing on one of the systems, or a system not patched. Verify all systems are at the same patch level.

For information on verifying that systems in your deployment are properly patched to the same version,
If any systems in your deployment are not at the same QRadar version, rerun the patch on those managed hosts.
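The line-by-line comparison of the two column listings can be automated. The Python sketch below assumes each column line has the form "name nullable type", as in the detailed output above. The helper names parse_columns and diff_columns are hypothetical.

```python
def parse_columns(block):
    """Parse 'name nullable type' lines into {name: (nullable, type)}."""
    columns = {}
    for line in block.strip().splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3:
            # Skip header lines such as "public managedhost".
            continue
        name, nullable, col_type = parts
        columns[name] = (nullable, col_type)
    return columns


def diff_columns(console, mh):
    """Return {column: (console_def, mh_def)} for columns that differ.

    A definition of None means the column is missing on that side.
    """
    diffs = {}
    for name in sorted(set(console) | set(mh)):
        if console.get(name) != mh.get(name):
            diffs[name] = (console.get(name), mh.get(name))
    return diffs


console_block = """public managedhost
id NO bigint
ip NO character varying
creationdate YES timestamp without time zone
"""
mh_block = """public managedhost
id NO integer
ip NO character varying
creationdate YES timestamp without time zone
"""
for column, (console_def, mh_def) in diff_columns(
        parse_columns(console_block), parse_columns(mh_block)).items():
    print(column, "console:", console_def, "mh:", mh_def)
```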
10 Comparing counts
[ERROR] asset.assetproperty Count is different console=60000 mh=60303
[WARN] 1 table with different counts between Console and MH
The comparing counts test runs a select count(*) on each replicated table on both the console and the managed host, and it displays the table name together with the counts for both the console and the managed host.
This could be because a recent Console update has not been pushed to the managed hosts. The update will go in the next replication bundle.

To force the deployment to replicate, go to the Admin tab and click Advanced > Deploy Full Configuration.

Note: A Deploy Full Configuration restarts services and might cause an interruption in collecting events. Schedule a maintenance period before you run a Full Deploy.
11 [ERROR] Managed hosts state did not sync in the with the console’s TX - can not test table counts since MH never synched with console’s transaction number
This error message means that the console and the managed host were on different transactions. The script waits up to 60 seconds for them to synchronize. If they do not synchronize within 60 seconds, the script times out and moves on, because it cannot run the count comparison when the transaction IDs are not the same.
Try rerunning the script for just the troublesome host by using --ip <IP address> to see whether it can synchronize the transactions. If it still cannot get in sync, the host might be too far behind to catch up.
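The wait-then-give-up behavior described above follows a common polling pattern, sketched here in Python. This is not the script's actual implementation. The function wait_for_sync and the two getter callables are hypothetical stand-ins for however the transaction IDs are read, and only the 60-second limit comes from this article.

```python
import time


def wait_for_sync(get_console_tx, get_mh_tx, timeout=60.0, interval=1.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll until both transaction IDs match or the timeout expires.

    Returns True if the IDs matched in time, False otherwise.
    """
    deadline = clock() + timeout
    while True:
        if get_console_tx() == get_mh_tx():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Simulated managed host that catches up after three polls.
mh_state = {"tx": 5}


def fake_mh_tx():
    mh_state["tx"] += 1
    return mh_state["tx"]


print(wait_for_sync(lambda: 8, fake_mh_tx, timeout=60, interval=0))
```

Passing clock and sleep as parameters is a design choice that makes the loop testable without real delays.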

To force the deployment to replicate, go to the Admin tab and click Advanced > Deploy Full Configuration.

Note: A Deploy Full Configuration restarts services and might cause an interruption in collecting events. Schedule a maintenance period before you run a Full Deploy.
12 Comparing the output of 'hostname -i'
[ERROR] 'hostname -i' doesn't return the proper value. returned 192.168.12.41 192.168.12.41, expecting 192.168.12.41
[ERROR] hostname not proper in /etc/hosts for 192.168.12.41
The 'hostname -i' comparison indicates that something is wrong in the /etc/hosts file on the managed host. In this example, the command returned the IP address twice instead of once. Clean up this file to resolve this error.
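The sample error suggests that the test expects 'hostname -i' to print exactly one address that equals the managed host's IP. The Python sketch below reproduces that check and lists the /etc/hosts lines for an address, on the assumption that duplicate entries are one possible cause. Both function names are hypothetical, and the script's real logic might differ.

```python
def check_hostname_output(output, expected_ip):
    """Return True only if 'hostname -i' printed exactly the expected IP."""
    return output.split() == [expected_ip]


def hosts_lines_for(ip, hosts_text):
    """Return the non-comment /etc/hosts lines whose address field is ip."""
    matches = []
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if fields and fields[0] == ip:
            matches.append(line.strip())
    return matches


# Illustrative file content with a duplicated entry for the managed host.
sample_hosts = """127.0.0.1 localhost
192.168.12.41 mh1.example.com mh1
# 192.168.12.41 old-entry
192.168.12.41 mh1
"""
print(check_hostname_output("192.168.12.41 192.168.12.41", "192.168.12.41"))
print(hosts_lines_for("192.168.12.41", sample_hosts))
```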
13 Comparing stored procedures
[ERROR] Stored Procedure replicate_fake_proc, p_relname,p_schemaname,p_threshold,p_build_triggers, 25 25 1186 16 are different between console and mh
[ERROR] 1 difference in stored procedures
The comparing stored procedures test checks whether the stored procedures used for replication on the console are the same as the ones on the managed host. If these procedures are different, replication can stop, because the console might format the data one way while the managed host expects it in another.

To force the deployment to replicate, go to the Admin tab and click Advanced > Deploy Full Configuration.

Note: A Deploy Full Configuration restarts services and might cause an interruption in collecting events. Schedule a maintenance period before you run a Full Deploy.
 
14 Comparing table sizes
[WARN] asset.asset Size is different (Console=4.00 MB| MH=11.01 MB) percent Error = 175.20%
[WARN] asset.assetproperty Size is different (Console=18.26 MB| MH=39.53 MB) percent Error = 116.52%
[WARN] asset.vulninstancestatistics Size is different (Console=81.39 MB| MH=164.74 MB) percent Error = 102.40%
[WARN] public.dsmevent Size is different (Console=51.99 MB| MH=104.00 MB) percent Error = 100.03%
[WARN] public.vuln Size is different (Console=26.09 MB| MH=52.22 MB) percent Error = 100.15%
[WARN] q_catalog.productversionvariant Size is different (Console=4.90 MB| MH=9.82 MB) percent Error = 100.16%
[WARN] 6 tables where the sizes are different
The comparing table sizes test looks at the values in q_table_size to see whether the sizes on the console and the managed host are close. It does a percent error calculation to determine whether the difference is too great. By default, the script alerts on any difference over 100%.
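The percent error figures in the sample output are consistent with the difference between the two sizes divided by the console size. The Python sketch below uses that formula as an assumption, because the script's exact calculation is not shown here. The function names are hypothetical.

```python
def percent_error(console_mb, mh_mb):
    """Percent difference of the managed host size relative to the console size."""
    if console_mb <= 0:
        raise ValueError("console size must be positive")
    return abs(mh_mb - console_mb) / console_mb * 100.0


def size_alert(console_mb, mh_mb, pct_err=100.0):
    """True when the difference exceeds the threshold (the --pctErr default is 100)."""
    return percent_error(console_mb, mh_mb) > pct_err


# Sizes taken from the asset.asset line in the sample output.
print(f"{percent_error(4.00, 11.01):.2f}%", size_alert(4.00, 11.01))
```

For the asset.asset sizes this gives a value close to, but not exactly, the 175.20% in the sample output. The small gap is presumably because the script works from unrounded sizes rather than the displayed MB values.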
15 Checking for bloat
[WARN] asset needs autovacuum (last autovacuum: 2017-06-08 08:36:46.452545-04)
[WARN] 1 table with potential bloated
The bloat test determines whether autovacuum is failing to run on certain tables. It first tests whether the table is bloated. If the table is bloated, it then checks when the last autovacuum ran. If the last autovacuum ran more than 600 seconds ago (the default), the test raises a warning.
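The two-step decision described above can be expressed as a small Python function. This is a sketch of the logic only: how the script decides that a table is bloated is not covered here, so is_bloated is supplied by the caller. The function name is hypothetical, and treating a table that has never been autovacuumed as an alert is an assumption.

```python
from datetime import datetime, timedelta, timezone


def needs_autovacuum_alert(is_bloated, last_autovacuum, now, vacuum_time=600):
    """Alert only when the table is bloated and the last autovacuum is older
    than vacuum_time seconds (600 is the --vacuumTime default).

    last_autovacuum may be None if no autovacuum has been recorded; this
    sketch assumes that case should also alert.
    """
    if not is_bloated:
        return False
    if last_autovacuum is None:
        return True
    return (now - last_autovacuum) > timedelta(seconds=vacuum_time)


now = datetime(2017, 6, 8, 12, 50, 0, tzinfo=timezone.utc)
last_run = datetime(2017, 6, 8, 12, 36, 46, tzinfo=timezone.utc)
print(needs_autovacuum_alert(True, last_run, now))
```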
16 1 of 1 managed hosts had at least one problem with replication: Summary of all the tests.
Script Options
# /opt/qradar/support/replicationVerify.pl -h
/opt/qradar/support/replicationVerify.pl
---------------
Usage:
        TEST OPTIONS:
        -a | --all              Run all tests (default, if no options are passed).
        -b | --bloat            Check all replicated tables to see if the last autovacuum was too long ago.
        -c | --count            Compare table counts between console and managed hosts.
        -n | --hostname         Check for valid IP address from "hostname -i" test.
        -p | --proc             Comparison of the replication stored procedures between console and managed hosts.
        -s | --schema           Comparison of the schema between console and managed hosts.
        -z | --size             Compare table sizes between console and managed hosts.
        EXTRA OPTIONS:
             --ip "<list>"      Quoted and comma separated list of IP addresses (i.e. "10.0.0.1,192.168.10,172.16.3.4").
             --pctErr <#>       Percent Error.  Used in conjunction with size test. (default = 100%)
             --vacuumTime <#>   Time in seconds.  Used in conjunction with bloat test.  (default = 600 seconds)
        -d | --details          Provides more details. Can specify -d multiple times for more information. 3 levels (details, debug, devel)
        -h | --help             Displays this dialog.
        More details available for each test if you pass the test flag with the help flag (i.e. -h -b or -h -a)

Document Location

Worldwide


Document Information

Modified date:
15 November 2022

UID

ibm11086555