IBM Support

QRadar: Troubleshooting Deploy Changes from the command line

Troubleshooting


Problem

This article is intended to help customers monitor and troubleshoot their deployment issues.

Symptom

Deploys can report "Timed Out" but continue in the background and finish successfully.
Networking bandwidth and disk space issues can also affect deployments.

Cause

  • Timeout
  • Performance
  • Disk space issues
  • Service issues 
  • Bandwidth
  • Tunnels or connection issues

Resolving The Problem

There are two types of deployments administrators can complete in the user interface:

  • Admin tab > "Deploy Changes"
    "Deploy Changes" is an incremental deployment that sends administrative changes to the managed hosts in the QRadar deployment and does not impact core services.
  • Admin tab > Advanced > "Deploy Full Configuration"
    "Deploy Full Configuration" rebuilds the full configuration and restarts services on each managed host.
    NOTE: Some businesses require a change request, or have policies and procedures such as notifying users, before you run a "Deploy Full Configuration".

The process for monitoring the Deployment from the command line is the same for both.

Monitoring the logs

After you start the deployment from the web UI, monitor the logs:
tail -f /var/log/qradar.log | grep -i deploy
The logs can help determine where the "deploys" are failing or why they are timing out.
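When the log is busy, buffered output can delay matches in a pipeline. The following is a minimal sketch, assuming the same /var/log/qradar.log path as above. The function name deploy_log_filter is a hypothetical helper, not part of QRadar.

```shell
# Hypothetical helper: keep only deployment-related lines from a log stream.
# --line-buffered makes grep flush each match immediately when it is fed by tail.
deploy_log_filter() {
    grep --line-buffered -i 'deploy'
}

# Usage on a QRadar host (log path taken from the command above):
#   tail -F /var/log/qradar.log | deploy_log_filter
```

Using tail -F instead of tail -f keeps following the file by name if the log is rotated during a long deployment.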

Files Generated During Deployment

The log files show the deployment "Initiating" and say which .zip files are being created.  Sample messages are shown.
Deploy: Global Set Builder is creating Zip file zipfile_GEN.full.zip, fullDeploy:true, firstTime:false
Deploy: Global Set Builder is creating Zip file zipfile_QVM.full.zip, fullDeploy:true, firstTime:false, qvmFile:false
The deployment files are created in /store/configservices/configurationsets/:
ls -tail /store/configservices/configurationsets/
total 509256
    585562 drwxr-xr-x  2 nobody nobody      4096 Oct 28 07:21 .
    585548 -rw-r--r--  1 nobody nobody        69 Oct 28 07:21 x.xxx.xxx.x.deploymentToken.txt
   1631905 -rw-r--r--  1 root   root          64 Oct 28 07:21 x.xxx.xxx.x_zipfile_GEN.full.zip.chk
    696725 -rw-r--r--  1 root   root          64 Oct 28 07:21 x.xxx.xxx.x_zipfile_QVM.full.zip.chk
   1631906 -rw-r--r--  1 root   root          64 Oct 28 07:21 x.xxx.xxx.x_zipfile_QVM.zip.chk
   1631904 -rw-r--r--  1 root   root          64 Oct 28 07:21 x.xxx.xxx.x_zipfile_GEN.zip.chk
    260428 -rw-r--r--  1 nobody nobody      1682 Oct 28 07:17 globalset_list.xml
    260427 -rw-r--r--  1 nobody nobody        64 Oct 28 07:17 zipfile_QVM.full.zip.chk
    260426 -rw-r--r--  1 nobody nobody        22 Oct 28 07:17 zipfile_QVM.full.zip
    260425 -rw-r--r--  1 nobody nobody        64 Oct 28 07:17 zipfile_GEN.full.zip.chk
    260423 -rw-r--r--  1 nobody nobody 260705034 Oct 28 07:17 zipfile_GEN.full.zip
    714988 -rw-r--r--  1 nobody nobody        64 Oct 28 07:17 zipfile_QVM.zip.chk
    993314 -rw-r--r--  1 nobody nobody        22 Oct 28 07:17 zipfile_QVM.zip
    993313 -rw-r--r--  1 nobody nobody        64 Oct 28 07:17 zipfile_GEN.zip.chk
    993312 -rw-r--r--  1 nobody nobody 260705034 Oct 28 07:17 zipfile_GEN.zip
    

Monitor the files on the managed host. The files increase in size until they match the size on the Console.
  • globalset_list.xml: Contains the deployment token and an entry for each of the hosts that require a deployment.
  • zipfile_*: Contains the files to be deployed. The relevant files are copied out to each of the managed hosts.
  • zipfile*.chk: Contains the sha256sum of the generated .zip file.
  • IP*.chk: Created for each managed host. Used to ensure integrity during transfer.
  • *.deploymentToken.txt: Contains the deployment token from the globalset_list.xml.
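Because the .chk files hold a SHA-256 digest, a copied archive can be checked by hand. The following is a minimal sketch, assuming that the .chk file stores the hexadecimal digest as its first whitespace-separated field (the 64-byte size in the listing is consistent with this, but it is an assumption). The function name verify_chk is hypothetical.

```shell
# Hypothetical helper: compare a .zip file's SHA-256 digest with its .chk file.
# Assumption: the .chk file holds the hex digest as its first field.
verify_chk() {
    zip_file=$1
    chk_file=${2:-$1.chk}          # default: same name with .chk appended
    expected=$(awk '{ print $1; exit }' "$chk_file")
    actual=$(sha256sum "$zip_file" | awk '{ print $1 }')
    if [ "$expected" = "$actual" ]; then
        echo "OK $zip_file"
    else
        echo "MISMATCH $zip_file"
        return 1
    fi
}

# Usage (path from the listing above):
#   verify_chk /store/configservices/configurationsets/zipfile_GEN.full.zip
```

A mismatch while the file is still growing is expected. The check is only meaningful after the transfer stops.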

Watching the Progress of the Deployment

During the deployment, a status file is created for the Console and each of the Managed Hosts.
ls -tail /store/tmp/status/deploy* && watch -n 2 "more /storetmp/status/deploy* | cat | sed 's/:::::::::::::://' | sed '/^$/d'"
Ensure that the date and time stamp is current. The size of this file, in bytes, indicates the deployment status for each host. Each file size corresponds to a status message:
  • 21 = Initiating Deployment
  • 11 = In Progress
  •  7 = Success
  •  9 = Timed Out
  •  5 = Error
The size of the file is the number of characters that the file contains. These files can also be read with:
cat /store/tmp/status/deploy*
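The size-to-status mapping above can be turned into a readable report. The following is a minimal sketch. The function names deploy_status_from_size and deploy_status_report are hypothetical helpers, and the mapping is copied from the list above.

```shell
# Map a status file size (in bytes) to the status message listed above.
deploy_status_from_size() {
    case "$1" in
        21) echo "Initiating Deployment" ;;
        11) echo "In Progress" ;;
        7)  echo "Success" ;;
        9)  echo "Timed Out" ;;
        5)  echo "Error" ;;
        *)  echo "Unknown (size $1)" ;;
    esac
}

# Print "<file>: <status>" for each regular file passed as an argument.
deploy_status_report() {
    for f in "$@"; do
        [ -f "$f" ] || continue
        size=$(wc -c < "$f" | tr -d ' ')
        printf '%s: %s\n' "$f" "$(deploy_status_from_size "$size")"
    done
}

# Usage on the Console:
#   deploy_status_report /store/tmp/status/deploy*
```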

Check the Deployment on the Managed Host

The Managed Host checks the Console for a deployment request every 10 seconds.  Once it finds the request, it does the following.
  • Downloads the globalset_list.xml from the Console to /store/configservices/configurationsets directory.
  • Checks if the Deployment Token from the globalset_list.xml file matches /store/configservices/configurationsets/<IP>.deploymentToken.txt where <IP> is the private IP address of the host.
  • Downloads either the incremental or full Global Set archive (.zip files) depending on the deployment type.
  • If the host contains the QVM processor component, QVM files are downloaded.

To check the configuration and database change downloads:
  1. On the Console, note the date, time, and size of the deployment files.
    ls -tail /store/configservices/configurationsets/
  2. Open a session to a Managed Host:
    ssh <MH_IP>
    Where <MH_IP> is your host's IP address. Ensure that you are not prompted for a password and do not get an ECDSA message at the login prompt. On the Managed Host, monitor the files that are copied across.
  3. Monitor file growth:
    ls -tail /store/configservices/configurationsets/
  4. Compare with the size of the files on the Console. If the files stop growing, the cause is likely a networking issue.
  5. Once the files are the same size as on the Console, the deployment completes and changes to Success status.
  6. If the Managed Host was previously reporting timeout, it now displays "Success". On the Console, check the status of the progress file in /store/tmp/status. A size of 7 indicates success.
    ls -tail /store/tmp/status/deployment*
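The size comparison in the steps above can be scripted. The following is a minimal sketch, assuming two text listings with one "<size> <name>" pair per line, one taken on the Console and one on the Managed Host. The function name compare_sizes and the listing file names are hypothetical.

```shell
# Hypothetical helper: report files whose size on the Managed Host differs
# from the Console, or that are missing on the Managed Host.
# $1 = Console listing, $2 = Managed Host listing (each line: "<size> <name>")
compare_sizes() {
    awk '
        NR == FNR { console[$2] = $1; next }   # first file: Console listing
        { mh[$2] = $1 }                        # second file: Managed Host listing
        END {
            for (name in console) {
                if (!(name in mh)) {
                    print name ": missing on Managed Host"
                } else if (mh[name] != console[name]) {
                    print name ": " mh[name] " of " console[name] " bytes"
                }
            }
        }
    ' "$1" "$2" | sort
}

# Usage (the listing file names are illustrative):
#   (cd /store/configservices/configurationsets && stat -c '%s %n' zipfile_*) > console.txt
#   ssh <MH_IP> "cd /store/configservices/configurationsets && stat -c '%s %n' zipfile_*" > mh.txt
#   compare_sizes console.txt mh.txt
```

No output means that every file in the Console listing has the same size on the Managed Host.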

Hostcontext

The hostcontext service shows various scripts that are run while the deployment is in progress. If hostcontext is not stable or replication is failing, the deployment can fail. Check how long hostcontext has been active. If it shows only a few minutes, run the systemctl command a few times to confirm that the time keeps increasing. If it shows failed, there is an issue.
    systemctl status hostcontext
● hostcontext.service - hostcontext daemon
    Loaded: loaded (/usr/lib/systemd/system/hostcontext.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/hostcontext.service.d
            └─timeout.conf, ulimit.conf
    Active: active (running) since Sun 2022-10-30 01:01:15 GMT; 1 weeks 5 days ago     <<<<<========
  Main PID: 37360 (java)
    Tasks: 229
    Memory: 17.5G
    CGroup: /system.slice/hostcontext.service
            ├─19560 /bin/sh /opt/qradar/bin/check_sar.sh 5 /store/tmp/sar_report.1668163030432
            ├─19564 sar -S -d -p -r -u -q -I SUM -n DEV -n EDEV 5 1
            ├─19565 grep -v drbd
            ├─19566 grep -E -v ^([0-9]{2}:[0-9]{2}:[0-9]{2})\s+(AM|PM)\s+(rhel|rootrhel|storerhel|docker)
            ├─19567 iostat -p -m -x -y 5 1
            ├─19568 grep -v -E ^drbd
            ├─19569 grep -v -E ^dm-
            ├─19571 sadc 5 2 -z -S 768
            └─37360 /bin/java -Dapplication.name=hostcontext -Dapp_id=hostcontext -Djava.library.path=/opt/qradar/lib -Dapplication.baseURL=file:///opt/qradar/...

Preparing incremental database dump as transaction 0000000000000043026
Replication incremental transaction for 3 relations, 0 JMS messages: Duration: 1169 ms
Preparing incremental database dump as transaction 0000000000000043027
Replication incremental transaction for 2 relations, 0 JMS messages: Duration: 1177 ms
Preparing incremental database dump as transaction 0000000000000043028
Replication incremental transaction for 2 relations, 0 JMS messages: Duration: 1201 ms
Preparing incremental database dump as transaction 0000000000000043029
Replication incremental transaction for 2 relations, 0 JMS messages: Duration: 1251 ms
Preparing incremental database dump as transaction 0000000000000043030    <<<<<========      
Replication incremental transaction for 2 relations, 0 JMS messages: Duration: 1187 ms
    
Look at the "Preparing incremental database dump" and "Replication incremental transaction" lines. These dumps are downloaded and applied every minute. Also check hostcontext on the Console and compare the transaction numbers.
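To compare transaction numbers between hosts, the latest one can be extracted from the status output. The following is a minimal sketch that relies only on the message text shown in the sample output above. The function name latest_replication_txn is hypothetical.

```shell
# Hypothetical helper: print the most recent replication transaction number
# found in hostcontext status or log text read from standard input.
latest_replication_txn() {
    sed -n 's/.*Preparing incremental database dump as transaction \([0-9][0-9]*\).*/\1/p' | tail -n 1
}

# Usage, run on the Console and on the Managed Host, then compare the values:
#   systemctl status hostcontext --no-pager -l | latest_replication_txn
```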

Bandwidth Test

Deployments can fail when there is insufficient bandwidth between the Console and the Managed Host.

To test the bandwidth, create a 1GB file on the Console.
fallocate -l 1G /store/1gbfile
Copy it to the Managed Host and wait for it to complete:
scp /store/1gbfile <MH_IP>:/store/
1gbfile                                                      100% 1024MB  93.1MB/s   00:11
The bandwidth in the example is 93.1 MB/s. Refer to "Bandwidth for managed hosts" for the supported bandwidth.
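The rate that scp reports can be cross-checked from the file size and the elapsed time. The following is a minimal sketch. The function name throughput_mib_s is hypothetical, and the timing approach with date is one possible choice.

```shell
# Convert a byte count and a duration in seconds to MiB per second.
throughput_mib_s() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", bytes / 1048576 / secs }'
}

# Usage on the Console (1 GiB = 1073741824 bytes):
#   start=$(date +%s)
#   scp /store/1gbfile <MH_IP>:/store/
#   end=$(date +%s)
#   throughput_mib_s 1073741824 $((end - start))
#   rm -f /store/1gbfile    # remove the test file here and on the Managed Host
```

For the transfer in the example, 1 GiB in 11 seconds works out to 93.1, which matches the rate that scp printed.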

Tomcat connection

Each Managed Host must be able to connect to Tomcat on the Console.
To check this connection:
/opt/qradar/bin/test_tomcat_connection.sh
Starting up...
Connected to tomcat
If test_tomcat_connection.sh is unable to connect, check hostcontext and host tokens.

Disk Space

If there are space issues, deployments fail.
In the logs, messages that relate to "critical disk space" are visible.
[hostcontext.hostcontext] [ConfigChangeObserver Timer[1]] com.q1labs.configservices.util.ConfigServicesUtil: [ERROR] [NOT:0000003000][/- -] [-/- -]Deployment is blocked due to critical disk space issue
[hostcontext.hostcontext] [ConfigChangeObserver Timer[1]] com.q1labs.hostcontext.configuration.ConfigChangeObserver: [INFO] [NOT:0000006000][/- -] [-/- -]Setting deployment status to Error
Check the disk space on all the servers:
/opt/qradar/support/all_servers.sh -Ck "df -Th"
In addition, the Disk Space 101 page on the QRadar 101 community site has more information.
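To pick out the nearly full file systems from a long listing, the df output can be filtered. The following is a minimal sketch that reads POSIX-format df -P output. The function name df_over_threshold is hypothetical, and the default threshold of 90 is an illustrative assumption, not a documented QRadar limit.

```shell
# Hypothetical helper: read `df -P` output on standard input and print the
# mount points whose usage is at or above the given percentage.
# The default of 90 is an illustrative assumption, not a QRadar limit.
df_over_threshold() {
    awk -v limit="${1:-90}" '
        NR > 1 {
            pct = $5
            sub(/%/, "", pct)                  # strip the percent sign
            if (pct + 0 >= limit + 0) print $6 " " $5
        }
    '
}

# Usage on a single host:
#   df -P | df_over_threshold 90
```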

Performance

During the deployment, it can be useful to monitor the performance of the system and identify any bottlenecks. Some useful commands are top, iotop, and sar. The sar command with the -d option reports block device I/O activity. The -p option prints the output in "pretty" format with device names. Without -p, the block devices are identified by their major and minor numbers.
sar -pd 1 5
Linux 3.10.0-1160.71.1.el7.x86_64 (q1csdesx-250.uk.ibm.com)     11/11/2022      _x86_64_        (40 CPU)
09:15:49 AM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
09:15:50 AM       sda      3.00      0.00     56.00     18.67      0.00      0.00      0.00      0.00
09:15:50 AM rootrhel-root      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
09:15:50 AM rootrhel-storetmp      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
09:15:50 AM rootrhel-tmp      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
09:15:50 AM rootrhel-home      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
09:15:50 AM rootrhel-opt      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
09:15:50 AM rootrhel-varlogaudit      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
09:15:50 AM rootrhel-varlog      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
09:15:50 AM rootrhel-var      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
09:15:50 AM storerhel-transient      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
09:15:50 AM storerhel-store      3.00      0.00     80.00     26.67      0.00      0.00      0.00      0.00
The top command is also useful.
top - 09:21:33 up 13 days, 19:43,  1 user,  load average: 1.15, 1.12, 1.25
Tasks: 867 total,   1 running, 865 sleeping,   0 stopped,   1 zombie
%Cpu(s):   0.8/0.4     1[                                                                                                    ]
KiB Mem : 35.1/13182942+[|||||||||||||||||||||||||||||||||||                                                                 ]
KiB Swap:  0.0/25165820 [                                                                                                    ]

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  3866 root      20   0  199524  85744   1452 S   8.9  0.1 827:54.35 /bin/bash --login /opt/qradar/perf/systemStabMon -interval 23
22285 root      20   0       0      0      0 Z   5.6  0.0   0:00.17 [date] <defunct>
  8419 root      10 -10   37.5g   8.7g  16616 S   3.0  6.9  80:09.82 /bin/java -Dapplication.name=ecs-ep -Dapp_id=ecs-ep -Djava.library.path=/opt/qradar/lib -Da+
32196 root       0 -20   30.5g   5.2g  16308 S   3.0  4.2  39:55.04 /bin/java -Dapplication.name=ecs-ec -Dapp_id=ecs-ec -Djava.library.path=/opt/qradar/lib -Da+
21826 root      20   0  163212   3668   1964 R   1.3  0.0   0:00.22 top
33294 root       0 -20   24.9g   2.4g  17164 S   1.3  1.9 284:46.84 /bin/java -Dapplication.name=ecs-ec-ingress -Dapp_id=ecs-ec-ingress -Djava.library.path=/op+
37360 root      20   0   17.7g 521116  16672 S   1.3  0.4   1339:24 /bin/java -Dapplication.name=hostcontext -Dapp_id=hostcontext -Djava.library.path=/opt/qrad+
22292 root      20   0  162856   3148   1820 S   1.0  0.0   0:00.03 top -b -n 1
  3998 root      20   0  111908   8308   4480 S   0.7  0.0  29:36.27 /usr/sbin/syslog-ng -F -p /var/run/syslogd.pid
  6394 postgres  20   0  256184   3568    668 S   0.7  0.0  44:29.15 postgres: stats collector
19427 qvmuser   39  19   11.4g 610648  16328 S   0.7  0.5  40:12.90 /bin/java -classpath .:/opt/qradar/conf:/opt/qvm/console/meta:/opt/qvm/console/conf:/opt/qr+
    9 root      20   0       0      0      0 S   0.3  0.0  91:36.24 [rcu_sched]
  1153 root      20   0       0      0      0 S   0.3  0.0   6:19.85 [xfsaild/dm-1]
  1283 root      20   0       0      0      0 S   0.3  0.0   4:48.29 [xfsaild/dm-5]
19754 postgres  20   0  561672   3344   1452 S   0.3  0.0  14:07.47 postgres: autovacuum launcher
22578 root      20   0 3538720  77376  26324 S   0.3  0.1  25:57.97 /usr/bin/dockerd
28930 nobody    20   0  218428  51840   6684 S   0.3  0.0  26:05.46 /usr/bin/python3.6 /usr/bin/celery worker -A app.celery_worker.config -Q celery --loglevel=+
    1 root      20   0  192240   5308   2644 S   0.0  0.0  65:57.50 /usr/lib/systemd/systemd --switched-root --system --deserialize 22
    2 root      20   0       0      0      0 S   0.0  0.0   0:01.55 [kthreadd]
    4 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 [kworker/0:0H]
    6 root      20   0       0      0      0 S   0.0  0.0   1:03.19 [ksoftirqd/0]
    7 root      rt   0       0      0      0 S   0.0  0.0   0:05.82 [migration/0]
The following key sequence can be used to display CPU and memory activity in a status bar format:
ctmt
  • c - toggle command name/line
  • t - toggle task/CPU stats
  • m - toggle memory info
  • t - toggle task/CPU stats again
For online help use h.
Help for Interactive Commands - procps-ng version 3.3.10
Window 1:Def: Cumulative mode Off.  System: Delay 3.0 secs; Secure mode Off.

  Z,B,E,e   Global: 'Z' colors; 'B' bold; 'E'/'e' summary/task memory scale
  l,t,m     Toggle Summary: 'l' load avg; 't' task/cpu stats; 'm' memory info
  0,1,2,3,I Toggle: '0' zeros; '1/2/3' cpus or numa node views; 'I' Irix mode
  f,F,X     Fields: 'f'/'F' add/remove/order/sort; 'X' increase fixed-width

  L,&,<,> . Locate: 'L'/'&' find/again; Move sort column: '<'/'>' left/right
  R,H,V,J . Toggle: 'R' Sort; 'H' Threads; 'V' Forest view; 'J' Num justify
  c,i,S,j . Toggle: 'c' Cmd name/line; 'i' Idle; 'S' Time; 'j' Str justify
  x,y     . Toggle highlights: 'x' sort field; 'y' running tasks
  z,b     . Toggle: 'z' color/mono; 'b' bold/reverse (only if 'x' or 'y')
  u,U,o,O . Filter by: 'u'/'U' effective/any user; 'o'/'O' other criteria
  n,#,^O  . Set: 'n'/'#' max tasks displayed; Show: Ctrl+'O' other filter(s)
  C,...   . Toggle scroll coordinates msg for: up,down,left,right,home,end

  k,r       Manipulate tasks: 'k' kill; 'r' renice
  d or s    Set update interval
  W,Y       Write configuration file 'W'; Inspect other output 'Y'
  q         Quit
          ( commands shown with '.' require a visible task display window )
Press 'h' or '?' for help with Windows,
Type 'q' or <Esc> to continue
On VM systems the "st" parameter (end of 3rd line) can indicate issues with underlying VM resources. The Steal Time (st) indicates the amount of CPU 'stolen' from the virtual machine by the hypervisor for other tasks.
top - 09:55:31 up 1 day, 17:17,  2 users,  load average: 0.73, 0.92, 1.11
Tasks: 758 total,   4 running, 754 sleeping,   0 stopped,   0 zombie
%Cpu(s): 13.0 us,  3.8 sy,  0.1 ni, 83.1 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem : 65806276 total, 16518864 free, 25496820 used, 23790592 buff/cache
KiB Swap: 25165820 total, 25165820 free,        0 used. 36488324 avail Mem
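The steal time can also be read non-interactively, for example to log it during a deployment. The following is a minimal sketch that parses a %Cpu(s) summary line in the format shown above. The function name steal_pct is hypothetical.

```shell
# Hypothetical helper: extract the steal-time percentage from the last
# %Cpu(s) summary line of top output read from standard input.
steal_pct() {
    awk '
        /^%Cpu/ {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^st,?$/) last = $(i - 1)   # the value precedes the "st" label
            }
        }
        END { if (last != "") print last }
    '
}

# Usage: batch mode, two iterations one second apart; the last sample is reported.
#   top -b -n 2 -d 1 | steal_pct
```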
The iotop command is useful to see read/write activity. The online help provides more information.
iotop -h

Usage: /usr/sbin/iotop [OPTIONS]

DISK READ and DISK WRITE are the block I/O bandwidth used during the sampling
period. SWAPIN and IO are the percentages of time the thread spent respectively
while swapping in and waiting on I/O more generally. PRIO is the I/O priority at
which the thread is running (set using the ionice command).

Controls: left and right arrows to change the sorting column, r to invert the
sorting order, o to toggle the --only option, p to toggle the --processes
option, a to toggle the --accumulated option, i to change I/O priority, q to
quit, any other key to force a refresh.

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -o, --only            only show processes or threads actually doing I/O
  -b, --batch           non-interactive mode
  -n NUM, --iter=NUM    number of iterations before ending [infinite]
  -d SEC, --delay=SEC   delay between iterations [1 second]
  -p PID, --pid=PID     processes/threads to monitor [all]
  -u USER, --user=USER  users to monitor [all]
  -P, --processes       only show processes, not all threads
  -a, --accumulated     show accumulated I/O instead of bandwidth
  -k, --kilobytes       use kilobytes instead of a human friendly unit
  -t, --time            add a timestamp on each line (implies --batch)
  -q, --quiet           suppress some lines of header (implies --batch)
The -o option shows processes that are currently performing IO.
iotop -o
Total DISK READ :       0.00 B/s | Total DISK WRITE :     336.89 K/s
Actual DISK READ:       0.00 B/s | Actual DISK WRITE:    1200.93 K/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
24327 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.08 % [kworker/2:3]
33527 be/4 postgres    0.00 B/s   12.48 K/s  0.00 %  0.01 % postgres: fusionvm fusionvm 127.0.0.1(55058) idle
19753 be/4 postgres    0.00 B/s    6.24 K/s  0.00 %  0.01 % postgres: walwriter
15905 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % java -Dapplication.name=hostcontext -Dapp_id=hostc~.jar:/opt/qradar/jars/guice-jmx- [pool-19-thread-]
15760 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % java -Dapplication.name=hostcontext -Dapp_id=hostc~.jar:/opt/qradar/jars/guice-jmx- [pool-9-thread-2]
19755 be/4 postgres    0.00 B/s    0.00 B/s  0.00 %  0.00 % postgres: stats collector
  3998 be/4 root        0.00 B/s    3.12 K/s  0.00 %  0.00 % syslog-ng -F -p /var/run/syslogd.pid
13368 rt/2 root        0.00 B/s   15.60 K/s  0.00 %  0.00 % java -Dapplication.name=ecs-ep -Dapp_id=ecs-ep -Dj~tgnosis ecs-ep.ecs 220 noconsole [Ariel Writer#ev]
  6394 be/4 postgres    0.00 B/s    0.00 B/s  0.00 %  0.00 % postgres: stats collector
  1361 be/3 root        0.00 B/s    3.12 K/s  0.00 %  0.00 % auditd
Many more performance commands and utilities are available.

Gathering Log Files

When you open a case for failed deployments, include a date and time stamp for the time frame of the deployment.
Support can then use the time frame to focus on the relevant section of the logs.
  1. Obtain the system time and date:
    date
  2. In the UI, start a deployment.
    To perform a partial deployment: "Admin" > "Deploy Changes"
    To perform a full deployment: "Admin" > "Advanced" > "Deploy Full Configuration"
    Note: "Deploy Changes" is incremental, whereas "Deploy Full Configuration" rebuilds the full configuration and restarts services on each managed host.
  3. Once the deployment finishes, take another date and time stamp:
    date
  4. Open a support case, include the start and end time for the deployment, and include fresh logs from the Console as well as any Managed Hosts with deployment issues.
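The start and end times from the steps above can be recorded in a file so that they are easy to paste into the case. The following is a minimal sketch. The function name mark_time and the notes file path are hypothetical.

```shell
# Hypothetical helper: append a labelled timestamp to a notes file.
# $1 = label, $2 = notes file
mark_time() {
    printf '%s %s\n' "$1" "$(date '+%Y-%m-%d %H:%M:%S %Z')" >> "$2"
}

# Usage (the notes file path is illustrative):
#   mark_time deploy_start /root/deploy_times.txt
#   ... start the deployment in the UI and wait for it to finish ...
#   mark_time deploy_end /root/deploy_times.txt
```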

Document Location

Worldwide

Document Information

Modified date:
03 June 2024

UID

ibm16832804