IBM Support

QRadar: Reference Set Management takes a very long time to load when opened, often leading to Tomcat restarting

Troubleshooting


Problem

In QRadar® 7.4.0 to 7.4.3, Reference Set Management or the Reference Data Management app takes a long time to load. Loading can take as little as 2 minutes or as long as 30 minutes to complete, and can be accompanied by Tomcat crashing.

Symptom

The Reference Set Management window opens, but remains blank for many minutes. The Reference Data Management app opens, but reference sets are not viewable and loading indicators are shown for many minutes.

Environment

IBM QRadar SIEM 7.4.0 to 7.4.2
IBM QRadar SIEM 7.4.3 (affected, but far less likely to show the issue)

Diagnosing The Problem

On the console, the file /var/log/qradar.error might contain warnings from the TxSentry process, similar to the following:
Jan 27 09:29:28 <IP address> [hostcontext.hostcontext] [<GUID>/SequentialEventDispatcher] com.q1labs.hostcontext.tx.TxSentry: [WARN] [NOT:0000004000][<IP address>/- -] [-/- -]Found a process on host <IP address>: tomcat, pid=29335, TX age=657 secs
Jan 27 09:29:28 <IP address> [hostcontext.hostcontext] [<GUID>/SequentialEventDispatcher] com.q1labs.hostcontext.tx.TxSentry: [WARN] [NOT:0000004000][<IP address>/- -] [-/- -]    TX on host <IP address>: pid=29335 age=657 IP=127.0.0.1 port=36360 locks=8 query='SELECT a1.key1, a1.key2, a1.data, a1.source, a1.first_seen, a1.last_seen, a1.domain_info FROM ( SELECT  rdk.key1, rdk.key2, rde.data, rde.source, rde.first_seen, rde.last_seen, rdk.domain_info   FROM reference_data_key rdk, reference_data_element rde  WHERE rdk.id = rde.rdk_id AND rdk.rd_id = $1   ORDER BY rde.first_seen, rdk.key1, rdk.key2, rde.data ) AS a1 INNER JOIN ( SELECT DISTINCT rdk.key1  FROM reference_data_key rdk, reference_data_element rde  WHERE rdk.id = rde.rdk_id AND rdk.rd_id = $2  ) AS b1 ON b1.key1 = a1.key1 ORDER BY a1.first_seen, a1.key1, a1.key2, a1.data'
Or

Jan 27 09:29:28 <IP address> [hostcontext.hostcontext] [<GUID>/SequentialEventDispatcher] com.q1labs.hostcontext.tx.TxSentry: [WARN] [NOT:0000004000][<IP address>/- -] [-/- -]    Lock acquired on host <IP address>: rel=reference_data_element age=657 granted=t mode=AccessShareLock query='SELECT a1.key1, a1.key2, a1.data, a1.source, a1.fi'
Jan 27 09:29:28 <IP address> [hostcontext.hostcontext] [<GUID>/SequentialEventDispatcher] com.q1labs.hostcontext.tx.TxSentry: [WARN] [NOT:0000004000][<IP address>/- -] [-/- -]    Lock acquired on host <IP address>: rel=reference_data_key_pkey age=657 granted=t mode=AccessShareLock query='SELECT a1.key1, a1.key2, a1.data, a1.source, a1.fi'
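
To check quickly whether warnings of this kind are present, the log file can be filtered for TxSentry entries that name tomcat. The following is a minimal sketch, not an IBM-provided tool: the function name tomcat_tx_ages is hypothetical, and the pattern assumes that the warning text matches the sample lines above.

```shell
# Hypothetical helper (not part of QRadar): print "pid age_in_seconds" for
# each TxSentry warning that reports a long-running tomcat transaction.
# Assumes the message format shown in the sample warnings above.
tomcat_tx_ages() {
    logfile="$1"
    grep 'TxSentry' "$logfile" \
        | sed -n 's/.*Found a process on host .*: tomcat, pid=\([0-9][0-9]*\), TX age=\([0-9][0-9]*\) secs.*/\1 \2/p'
}
```

Example use on the console: tomcat_tx_ages /var/log/qradar.error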

Resolving The Problem

As the problem is a combination of multiple smaller issues, there are multiple workarounds that can help improve performance and stability of Reference Sets or Tomcat.
1. Update to 7.4.3
This issue is far less pronounced in QRadar 7.4.3 with any Fix Pack installed. Often, this version alone is enough to resolve the issue without tuning changes to the configuration, so updating to QRadar 7.4.3 is recommended if possible.
2. Apply Time-To-Live on larger Reference Sets
Large Reference Sets contribute to the issue. When possible, reducing the number of elements in larger sets is advised. One way to reduce element counts is to set a Time-To-Live (TTL) on the set so that elements that have not been used for a specified time period are automatically cleared from the set. The appropriate setting depends on your use case, as some elements are needed for longer periods of time.
  1. Log in to QRadar as an administrator.
  2. Click the Admin tab.
  3. Click Reference Set Management.
  4. Review any large reference sets (greater than 100,000 elements).
  5. If these sets do not have a TTL set, consider setting one for a number of days based on your needs. By default, TTL is not used and "Lives Forever" is selected.
  6. After the TTL is set, old elements are removed in the background and system performance might gradually improve.
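
The review of large sets can also be done from the command line. The following is a minimal sketch under stated assumptions: the function name large_sets is hypothetical, and the input is assumed to be pipe-separated rows of id, name, time_to_live, and current_count from the reference_data table, as produced by psql in unaligned, tuples-only mode (-A -t).

```shell
# Hypothetical helper (not part of QRadar): read rows of the form
# "id|name|time_to_live|current_count" on stdin and print the rows whose
# element count (the last field) exceeds the given minimum.
# The default minimum of 100000 mirrors the review step above.
large_sets() {
    min="${1:-100000}"
    awk -F'|' -v min="$min" '$NF + 0 > min + 0 { print }'
}
```

Example use on the console: psql -U qradar -At -c "select id,name,time_to_live,current_count from reference_data order by current_count desc;" | large_sets 100000
The third field of each row shows the stored TTL value, which can be reviewed to decide whether the set needs a TTL.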
3. For remaining Large Reference Sets, apply Tomcat memory overrides
If setting TTL does not improve the situation or is not possible for a particular use case, the remaining larger sets can be assigned a larger cache override to help improve performance. This override works by assigning more memory to the reference sets on Tomcat startup, which reduces the time spent reading from and writing to disk.
  1. Connect to the QRadar Console by using SSH as an administrator
  2. Find the Reference Data Spillover Threshold value by using the following command: grep "ReferenceData.spillover.threshold" /opt/qradar/conf/spillovercache.properties
  3. Find the largest sets, which are the most likely cause, by using this command: psql -Uqradar -c "select id,name,time_to_live,current_count from reference_data order by current_count desc limit 20;"
    This command returns the ID, name, TTL value, and element count of the top 20 sets by element count. Note the names, IDs, and counts of sets with large element counts. Typically, sets whose counts are about 10 times the Reference Data Spillover Threshold value are good candidates for tuning.
  4. Open the file /opt/qradar/conf/spillovercache.properties by using the text editor of your choice.
  5. Add an override for the reference sets that far exceed your Reference Data Spillover Threshold, in the following format:
    RefData_<ID goes here>.spillover.threshold=<Count, rounded up>

    For example, for a reference set named MillionsSet, the following line is added to the file:
    RefData_30_domain_2147483647.spillover.threshold=400000
  6. This change does not have to be made for all sets, only for the ones that exceed the Reference Data Spillover Threshold the most. Sets with counts lower than the Reference Data Spillover Threshold value can be ignored.
  7. After the overrides are applied, restart Tomcat by using:
    systemctl restart tomcat
  8. See whether performance or stability is better.
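
The comparison between set sizes and the spillover threshold in the steps above can be sketched as follows. This is an illustration only: the function names spillover_threshold and override_candidates are hypothetical, the cut-off of 10 times the threshold follows the rule of thumb in step 3, and reference set names are assumed not to contain the | character. The sketch only lists candidates; it does not edit spillovercache.properties.

```shell
# Hypothetical helper (not part of QRadar): print the numeric value of
# ReferenceData.spillover.threshold from a properties file.
spillover_threshold() {
    sed -n 's/^[[:space:]]*ReferenceData\.spillover\.threshold[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" \
        | head -n 1
}

# Hypothetical helper: read "id|name|time_to_live|current_count" rows on
# stdin and print "id|name|current_count" for sets whose count is at least
# 10 times the given threshold (assumed cut-off, per the rule of thumb).
override_candidates() {
    threshold="$1"
    awk -F'|' -v t="$threshold" '$NF + 0 >= 10 * t { print $1 "|" $2 "|" $NF }'
}
```

Example use on the console: t=$(spillover_threshold /opt/qradar/conf/spillovercache.properties); psql -U qradar -At -c "select id,name,time_to_live,current_count from reference_data order by current_count desc limit 20;" | override_candidates "$t"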
4. Increase Transaction Sentry Timeout settings
As a temporary workaround to Tomcat instability, administrators can increase the Transaction Sentry Timeout setting in QRadar in order to allow more time for the
  1. Log in to QRadar as an administrator.
  2. Go to the Admin tab.
  3. Click System Settings.
  4. Select Advanced.
  5. Under Transaction Sentry Settings, increase Transaction Max Time Limit to the maximum value of 30 minutes.
  6. Run a Deploy Full Configuration at your next convenience.

If the provided steps do not significantly help with the issue, open a case with IBM Support. To open a case, click the link: Open a case
To speed up resolution, provide the following information in the case:
1. The QRadar log files. Information on collecting the QRadar log files can be found here
2. The threadTop, pg_stat, and qlocks.out outputs. To collect these outputs, run the following command from the console:
mkdir -p /store/ibmsupport/refdump; for i in {1..40}; do (date; /opt/qradar/support/threadTop.sh -p 7779 --full)>> /store/ibmsupport/refdump/tomcat.out; (date; psql -U qradar -c "select * from q_locks" )>> /store/ibmsupport/refdump/qlocks.out; (date; psql -U qradar -c "select * from pg_stat_activity where state='active'")>> /store/ibmsupport/refdump/pg_stat.out; sleep 5 ; done
This command takes about 5 minutes to run and writes three files to /store/ibmsupport/refdump/:
tomcat.out, qlocks.out, and pg_stat.out
Attach these files to the new case.
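
Before the files are uploaded, it can help to confirm that all three were written and are not empty. The following is a minimal sketch, not an IBM-provided tool; the function name check_refdump is hypothetical, and the default directory is the one used by the collection command.

```shell
# Hypothetical helper (not part of QRadar): confirm that tomcat.out,
# qlocks.out, and pg_stat.out exist and are non-empty in the given
# directory. Returns non-zero if any file is missing or empty.
check_refdump() {
    dir="${1:-/store/ibmsupport/refdump}"
    status=0
    for f in tomcat.out qlocks.out pg_stat.out; do
        if [ -s "$dir/$f" ]; then
            echo "ok: $dir/$f"
        else
            echo "missing or empty: $dir/$f" >&2
            status=1
        fi
    done
    return "$status"
}
```

Example use on the console: check_refdump /store/ibmsupport/refdump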

Document Location

Worldwide


Document Information

Modified date:
31 January 2022

UID

ibm16412231