Client responsibilities
IBM does not warrant that its products are defect free; however, IBM does endeavor to fix its products to work as designed. It is important to note that clients play a key role in this effort.
Providing information
IBM Support is available to provide assistance and guidance. As part of this relationship, clients need to provide IBM Support with information about their systems and details about the failing components so that IBM can quickly and accurately resolve the problem.
This information includes (but is not limited to):
- Capturing documentation at the time of a failure
- Applying a trap or trace code to a system
- Formatting the output from the trap or trace (if needed)
- Sending documentation or trace information (in hardcopy or digital copy) to the remote support center.
For more information, see IBM Support General Guidelines and Limitations.
What is the log_scrubber.py script?
The log_scrubber.py script is a support tool designed to scrub many different kinds of personally identifiable information (PII) from logs. The latest version of log_scrubber.py is designed to help users with strict data handling requirements.
Without logs, QRadar Support might be limited in their ability to raise issues to development or duplicate issues experienced by users. If your security policy strictly prohibits sharing logs with IBM Support, you might need to request a WebEx session with QRadar Support so we can investigate the issue. If you do not allow log uploads to your case and do not allow WebEx sessions, you might need to engage IBM Expert Labs for an onsite visit or contact your IBM Account Manager to discuss options based on your security policies.
Note: The log_scrubber.py utility replaces scrub.pl to help users sanitize their logs. Administrators with automatic updates enabled have the latest version of log_scrubber.py.
How do I run the script?
The log_scrubber utility is located in the /opt/qradar/support directory. The usage is visible by using either the -h option flag or the long argument --help. The usage gives a synopsis of all the arguments and has examples in the epilogue, which can prove useful when first running the script.
- Help output example
usage: /opt/qradar/support/log_scrubber.py [-h] [-l] [-p PROCESS_NUMBER] [--flat] [--dryrun] [--ID ID] [--push_token] [INPUT_FILE [INPUT_FILE ...]]

The tool takes a get logs tar.gz and scrubs sensitive data from it.
The scrubbed file will be saved as ..scrubbed.tar.gz
The tool can also be used to scrub plain text files when "--flat" flag is provided.

positional arguments:
  INPUT_FILE            file(s) to scrub, can be exactly one get logs tar.gz
                        or a list of text files (when --flat is specified)

optional arguments:
  -h, --help            show this help message and exit
  -l                    list available PII finders in JSON (-ll for more info)
  -p PROCESS_NUMBER, --process PROCESS_NUMBER
                        number of concurrent processes (1-8)
  --flat                flag to handle text file(s)
  --dryrun              generate map files only
  --ID ID               pass in a comma seperated list of finder IDs to use
                        (visible from -l)
  --push_token          push scrubber token to MHs

Examples:
log_scrubber.py /store/LOGS/logs_myhostname_20221124_5af06248.tar.gz
log_scrubber.py --flat example/example.txt example/example2.txt
log_scrubber.py --dryrun -p 3 --flat /root/example/* --ID 0,1,2
Procedure
As the log_scrubber tool is run from the command line, you must have root access to the QRadar appliance.
- Use SSH to log in to the QRadar Console as the root user.
- Optional. If you need logs from a non-Console appliance, open an SSH session to the QRadar host.
- Type the following command: /opt/qradar/support/get_logs.sh
The script informs you that the log was created and provides the name and the location, which is always the /store/LOGS/ directory.
INFO: Gathering install information...
INFO: Collecting DrQ output...
INFO: Collecting system files...
INFO: Collecting old files...
INFO: Collecting Cert metadata...
INFO: Collecting accumulator information with collectGvStats.sh v1.8...
INFO: Collecting deployment info with deployment_info.sh v0.7...
INFO: Collecting thread dumps from running java processes...
INFO: Collecting database information...
INFO: Collecting rpm version information...
INFO: Collecting QVM files...
INFO: Fetching Salesforce information...
INFO: Collecting additional qflow information...
INFO: Extracting rule information...
INFO: Compressing collected files...
The file /store/LOGS/logs_qradarconsole1_e579fe7e.tar.gz (53M) has been created to send to support
- To scrub the log bundle, type:
/opt/qradar/support/log_scrubber.py /store/LOGS/{logs_filename.tar.gz}
For example,
/opt/qradar/support/log_scrubber.py /store/LOGS/logs_qradarconsole1_e579fe7e.tar.gz
Summary of [/store/LOGS/logs_qradarconsole1_e579fe7e.tar.gz]
Total: 1010 files
Scrubbed: 993/1010
Skipped: 17/1010
Mapping files are under [/store/ibm_support/scrub/logs_qradarconsole1_e579fe7e.tar.gz_1670856374_map]
Scrubbed get_logs is saved as [/store/LOGS/logs_qradarconsole1_e579fe7e.scrubbed.tar.gz]
- A new file is created in /store/LOGS that indicates the file is scrubbed.
- Download the scrubbed log from the /store/LOGS directory.
Note: Administrators need to download and attach the scrubbed file to the case. The mapping file in /store/ibm_support is not required and is only used to troubleshoot PII scrubbing issues in the tool.
- Create a case with IBM Support (sign-in required).
- Complete the required fields.
- Attach the scrubbed log file to your case.
Results
The case status updates to "IBM is working" as we review the issue and the attached logs. The scrubbed log file bundle remains in the /store/LOGS directory until the files are deleted by an administrator.
What PII can this script scrub?
The `-l` argument shows this information on any deployment. More PII types might be added to the script in the future. Updates to log_scrubber are delivered through QRadar automatic updates as part of the supportability tools RPM file. If you are not running the latest version of the log_scrubber tool, you might have different PII options available.
Listing the data types
The list option flag -l displays the PII types currently known to the tool. The following output shows the PII for an initial release of the tool:
# /opt/qradar/support/log_scrubber.py -l
{"ID": 0, "Name": "IPv6"}
{"ID": 1, "Name": "IPv4"}
{"ID": 2, "Name": "Hostname"}
{"ID": 3, "Name": "Username"}
{"ID": 4, "Name": "Domain"}
Listing verbose data types
The script also accepts -ll as an argument. This argument gives verbose output on how each type of PII is found by the script, allowing any user to investigate whether it works for their use case. As with the previous example, the output is in JSON format. Because the output is verbose, we recommend using a tool like `jq` to beautify it. For example,
# /opt/qradar/support/log_scrubber.py --ID 1 -ll | jq .
{
"ID": 1,
"Name": "IPv4",
"Type": "regex",
"Patterns": [
{
"Name": "IPv4",
"Pattern": "\\b(25[0-5]|2[0-4]\\d|1\\d{1,2}|[1-9]?\\d)\\.(25[0-5]|2[0-4]\\d|1\\d{1,2}|[1-9]?\\d)\\.(25[0-5]|2[0-4]\\d|1\\d{1,2}|[1-9]?\\d)\\.(25[0-5]|2[0-4]\\d|1\\d{1,2}|[1-9]?\\d)\\b"
}
]
}
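If you want to confirm what a finder's pattern matches before you scrub, you can exercise the regex directly. The following is a minimal Python sketch using the IPv4 pattern from the -ll output above (the doubled backslashes in the JSON become single backslashes in the actual regex); the sample values are illustrative:

```python
import re

# Octet pattern taken from the "log_scrubber.py --ID 1 -ll" output above,
# with the JSON escaping (\\) reduced to single backslashes.
octet = r"(25[0-5]|2[0-4]\d|1\d{1,2}|[1-9]?\d)"
ipv4 = re.compile(rf"\b{octet}\.{octet}\.{octet}\.{octet}\b")

# Illustrative samples: valid addresses match, invalid ones do not.
for sample in ["192.168.0.1", "255.255.255.255", "999.1.1.1", "exampleEP.lab"]:
    print(sample, "->", "match" if ipv4.search(sample) else "no match")
```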
Why am I seeing a license when first running the script?
The log_scrubber utility is provided as a convenience to help users with strict PII requirements. The license states that the responsibility to uphold your data requirements is still your own.
What logs can I scrub with this tool?
By default, log_scrubber.py expects a "get_logs" archive. However, the tool can also accept a text file or a list of files passed with the "--flat" parameter. For example,
/opt/qradar/support/log_scrubber.py --flat /var/log/qradar.log
Scrubbing /var/log/qradar.log.
File is scrubbed. Output is saved as /storetmp/scrub/qradar.log_1670935514
Mapping file is saved as /store/ibm_support/scrub/qradar.log_1670935514.map
Running the tool against a get_logs archive looks like this:
/opt/qradar/support/log_scrubber.py /store/LOGS/logs_ip-128-58_20221212_e579fe7e.tar.gz
Scrubbing /store/LOGS/logs_ip-128-58_20221212_e579fe7e.tar.gz
No need to scrub PERFDATA_LOGS.tar.gz, does not contain PII.
INFO: The following files were found to be empty: ['datanode.properties', 'krb5.conf', 'message-mapping.AhnLabPolicyCenterJdbc.properties', 'message-mapping.ObserveITJdbc.properties', 'manager.log', 'host-manager.log', 'iem-cron.log', 'access.log', 'ssl_error_log', 'qflow.debug', 'xforce_scaserver_updates.20221212.txt']
WARN: Scrubbing failed for the following files, they will be omitted from the output: ['var/log/httpd/ssl_request_log.1.gz', 'var/log/messages.1.gz', 'var/log/messages.2.gz', 'var/log/qradar.old/qradar.log.1.gz', 'var/log/qradar.old/qradar.log.2.gz']
Summary of [/store/LOGS/logs_ip-128-58_20221212_e579fe7e.tar.gz]
Total: 1010 files
Scrubbed: 993/1010
Skipped: 17/1010
Mapping files are under [/store/ibm_support/scrub/logs_ip-128-58_20221212_e579fe7e.tar.gz_1670856374_map]
Scrubbed get_logs is saved as [/store/LOGS/logs_ip-128-58_20221212_e579fe7e.1670856374.scrubbed.tar.gz]
How do I view the output?
This section is divided into two parts: "Scrubbed Files" and "Mapping Files".
The scrubbed files are the sanitized log files themselves, which are uploaded to support.
The mapping files contain a dictionary of the scrubbed PII and its replacements in the log file.
The mapping files must be kept by the user.
Scrubbed Files
In the previous "--flat" example, the output was stored in /storetmp/scrub/qradar.log_1670935514. Notice that the output file was stored in /storetmp/, which means the file is deleted within 24 hours by disk maintenance unless it is moved. If you need to copy log files off the deployment for support, you can do so and then forget about the original because it is deleted automatically. Viewing the file looks something like this:
# head -5 /storetmp/scrub/qradar.log_1670935514
Hostname_ID_<unique ID> OutOfMemoryMonitor[30169]: Starting out-of-memory monitoring (enabled: yes)...
Hostname_ID_<unique ID> abrtd_lockfile_watcher[30879]: /usr/bin/find: ‘/store/jheap’: No such file or directory
Hostname_ID_<unique ID> .symlinkPythonTools.sh[6714]: Running .symlinkPythonTools.sh correctly setting up the symlinks
Hostname_ID_<unique ID> .symlinkPythonTools.sh[6715]: Removing all the symlinks within the /opt/qradar/support/ directory
Hostname_ID_<unique ID> .symlinkPythonTools.sh[6718]: Cleaning up support directory
To make the output easier to read, you can strip the unique ID suffixes from the scrubbed tokens:
# perl -pe 's/_ID_\w+//g' /storetmp/scrub/qradar.log_1670935514 | head -5
Hostname OutOfMemoryMonitor[30169]: Starting out-of-memory monitoring (enabled: yes)...
Hostname abrtd_lockfile_watcher[30879]: /usr/bin/find: ‘/store/jheap’: No such file or directory
Hostname .symlinkPythonTools.sh[6714]: Running .symlinkPythonTools.sh correctly setting up the symlinks
Hostname .symlinkPythonTools.sh[6715]: Removing all the symlinks within the /opt/qradar/support/ directory
Hostname .symlinkPythonTools.sh[6718]: Cleaning up support directory
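If you prefer Python to perl, the following is a minimal equivalent sketch; the file path is taken from the earlier example and the regex mirrors the perl substitution above:

```python
import re

# Strip the "_ID_<unique ID>" suffixes from scrubbed tokens, mirroring
# the perl substitution s/_ID_\w+//g, and print the first five lines
# (the same output as "head -5").
with open("/storetmp/scrub/qradar.log_1670935514") as scrubbed:
    for line_number, line in enumerate(scrubbed):
        if line_number == 5:
            break
        print(re.sub(r"_ID_\w+", "", line), end="")
```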
Mapping Files
The mapping file "/store/ibm_support/scrub/qradar.log_1670935514.map" looks like this:
# cat /store/ibm_support/scrub/qradar.log_1670935514.map
{'::ffff:127.0.0.1': 'IPv6_ID_<unique ID>', '::': 'IPv6_ID_<unique ID>', '192.168.0.1': 'IPv4_ID_<unique ID>', '127.0.0.1': 'IPv4_ID_<unique ID>', '192.168.0.1': 'IPv4_ID_<unique ID>', '255.255.255.255': 'IPv4_ID_<unique ID>', '0.0.0.0': 'IPv4_ID_<unique ID>', 'exampleAIO.lab': 'Hostname_ID_<unique ID>', 'exampleEP.lab': 'Hostname_ID_<unique ID>', 'exampleEP': 'Hostname_ID_<unique ID>'}
A line in the corresponding scrubbed log then looks like this:
IPv6_ID_<unique ID> [ecs-ec.ecs-ec] [ecs-ec/EC/TCP_TO_EP:TakeFromQueue] com.ibm.si.ec.destinations.StoreForwardDestination(ecs-ec/EC/TCP_TO_EP): [WARN] [NOT:0000004000][IPv4_ID_<unique ID>/- -] [-/- -]IO Error
If support provides you with a PII hash from a scrubbed log, you can look up the original value in the mapping file with a command of the form:
grep -wPo "\'[^']*\': \'<PII hash from support>\'" <filename from support>
# grep -wPo "\'[^']*\': \'IPv4_ID_<unique ID>\'" qradar.log_1670935514.map
'192.168.0.1': 'IPv4_ID_<unique ID>'
You would then know that your issue is on the system with IP "192.168.0.1".
Navigating the mapping files is similar for the output from a get_logs archive. The main caveat is that each output file has its own mapping file, meaning that there are ~1000 mapping files for each set of logs. Separate mapping files were chosen instead of one central "dictionary" file as a design decision to improve performance. If support provides you with an ID but no associated file, a recursive grep can be used in the mapping directory.
For example,
# grep -rwPo "\'[^']*\': \'Hostname_ID_<unique ID>\'"
DB_Dumps/serverhost.20221213.sql_1670936381.map:'exampleEP.lab': 'Hostname_ID_<unique ID>'
DB_Text/serverhost.20221213.txt_1670936381.map:'exampleEP.lab': 'Hostname_ID_<unique ID>'
etc/httpd/conf/httpd.conf_1670936381.map:'exampleEP.lab': 'Hostname_ID_<unique ID>'
etc/sysconfig/network_1670936381.map:'exampleEP.lab': 'Hostname_ID_<unique ID>'
journalctlDump/ip6tables.service.log_1670936381.map:'exampleEP.lab': 'Hostname_ID_<unique ID>'
...
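Because the mapping files are written as Python dictionary literals, they can also be searched programmatically instead of with grep. The following is a minimal sketch, assuming each .map file contains exactly one dict of {original value: scrubbed token}; the token placeholder and map directory name are taken from the examples above:

```python
import ast
from pathlib import Path

# Scrubbed token provided by support (placeholder from the examples above).
token = "Hostname_ID_<unique ID>"

# Map directory name from the earlier get_logs summary output.
map_dir = Path("/store/ibm_support/scrub/"
               "logs_ip-128-58_20221212_e579fe7e.tar.gz_1670856374_map")

for map_file in map_dir.rglob("*.map"):
    # Each map file holds a single dict literal: {original: scrubbed_token}.
    mapping = ast.literal_eval(map_file.read_text())
    for original, scrubbed in mapping.items():
        if scrubbed == token:
            print(f"{map_file}: '{original}': '{scrubbed}'")
```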
Why is the script scrubbing some text that is not PII?
This situation is more noticeable in some deployments than others. As seen earlier, PII such as "Domain" and "Username" are scrubbed. If, by chance, you created a user or domain called "windows", then the script scrubs every instance of "windows" in the provided logs. This situation can be troublesome for support. In this scenario, you would want to rename that user or domain. The script does provide an "--ID" argument that can be useful in this case. The IDs for each PII type are shown with "-l" as seen earlier. The "--ID" parameter can then be used to scrub only certain types of PII.
For example, if "Username" was an issue in any of the previous examples, you can run:
/opt/qradar/support/log_scrubber.py --ID 0,1,2,4 /store/LOGS/logs_ip-128-58_20221212_e579fe7e.tar.gz
New! How to add customized values in log_scrubber.py
Scenarios exist where the default PII scrubbed by the tool does not meet all of the security use cases for administrators. To address these scenarios, an update to the log_scrubber.py utility adds support for custom strings so that administrators can scrub more values from the logs. For example, the following table describes the custom substitutions made in the QRadar logs after the log_scrubber.py tool runs.
| Value to scrub from logs | Replaced in logs with value |
|---|---|
| user-lastname | custom_scrub_1 |
| email@address.com | custom_scrub_2 |
| server.example.com | custom_scrub_3 |
| databasename | custom_scrub_4 |
Procedure
Each time you update your custom_scrub.conf file on the Console, you can use the all_servers command to ensure the file is copied to all managed hosts in the deployment.
- Log in to the QRadar Console as the root user.
- Navigate to the /opt/qradar/support/data/log_scrubber/ directory.
Tip: To ensure the file is created on all managed hosts where you might collect logs, run the following command from the QRadar Console:
/opt/qradar/support/all_servers.sh -k "/opt/qradar/support/log_scrubber.py -h"
- In a text editor, edit the custom_scrub.conf file.
- Add the values that you want to scrub from the logs to the file, one per line. For example,
user-lastname
email@address.com
server.example.com
databasename
- Save your changes to custom_scrub.conf.
- To clone your updated custom_scrub.conf file to all hosts in the deployment, type:
/opt/qradar/support/all_servers.sh -p /opt/qradar/support/data/log_scrubber/custom_scrub.conf -r /opt/qradar/support/data/log_scrubber/
- To scrub the log bundle, type:
/opt/qradar/support/log_scrubber.py /store/LOGS/{logs_filename.tar.gz}
For example,
/opt/qradar/support/log_scrubber.py /store/LOGS/logs_qradarconsole1_e579fe7e.tar.gz
Results
The logs are scrubbed and the administrator-defined values are replaced with "custom_scrub_#" in the logs, where the number corresponds to the line number of the value in the custom_scrub.conf file.
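The following is a conceptual Python model of this behavior, not the tool's actual implementation; it shows how the value on line N of custom_scrub.conf maps to the replacement custom_scrub_N:

```python
# Conceptual model only: the documented behavior is that the value on
# line N of custom_scrub.conf is replaced with "custom_scrub_N".
def load_custom_scrub(conf_path):
    with open(conf_path) as conf:
        return {value.strip(): f"custom_scrub_{n}"
                for n, value in enumerate(conf, start=1)
                if value.strip()}

def scrub_line(line, substitutions):
    for value, replacement in substitutions.items():
        line = line.replace(value, replacement)
    return line

# Using the example values from the table above ("email@address.com" is line 2):
subs = {"user-lastname": "custom_scrub_1", "email@address.com": "custom_scrub_2",
        "server.example.com": "custom_scrub_3", "databasename": "custom_scrub_4"}
print(scrub_line("login failure for email@address.com on server.example.com", subs))
# -> login failure for custom_scrub_2 on custom_scrub_3
```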
Document Information
Modified date:
14 June 2023
UID
swg21676850