Using Data Privacy for Diagnostics

Data Privacy for Diagnostics Analyzer can post-process the following memory dump types when they are taken on a z15® or later processor:
  • SVC
  • Stand-alone
  • SLIP
  • SYSMDUMP (from V2.5)
  • Transaction (from V2.5)

Post-processing redacts pages that the applications that created them have tagged as sensitive. It also redacts untagged pages that the Data Privacy for Diagnostics Analyzer scans and detects as containing sensitive data. The Analyzer requires a minimum of IBM® 64-bit SDK for z/OS® Java™ Technology Edition version 8.0. The redacted version of the original memory dump is written to a new memory dump data set; the original memory dump data set is not modified. Retain both memory dumps for as long as it takes to diagnose the reported problem.

Append Dump Directory records (BLSADDIR) are removed when generating a redacted stand-alone memory dump. Additional processing is required for stand-alone memory dumps that contain captured memory dumps. If the captured memory dumps are required by vendors, the memory dumps must first be extracted (IPCS COPYCAPD) from the original stand-alone memory dump, then processed separately. Do not depend on captured memory dumps being available within a redacted stand-alone memory dump.

Note: A stand-alone memory dump can contain one or more SVC memory dumps that are captured in memory, but were not written to a data set. It is recommended that you extract these SVC dumps by using IPCS COPYCAPD, if captured on z15 or later processors, and post-process them before sending them to IBM for further analysis to ensure that sensitive data is properly protected.

The following functions are provided:

REDACT
You can redact any data that is tagged as sensitive=yes without further analysis.
Note: You cannot perform the ANALYZE function on a memory dump that has already been redacted by this process.
You can request this processing by using either of the following:
  • IPCS option 5.6, specifying the ANALYZE function and BYPASS DP ANALYSIS=Y
  • Sample job SYS1.SAMPLIB(BLSJDPFD)
ANALYZE
You can redact any pages that are tagged as sensitive by the applications that own the data, as well as any untagged pages that are detected as containing sensitive data.
Note: You cannot perform the ANALYZE function on a memory dump that has already been redacted by this process.
You can request this processing by using either of the following:
  • IPCS option 5.6, specifying the ANALYZE function and BYPASS DP ANALYSIS=N
  • Sample job SYS1.SAMPLIB(BLSJDPA)
REPORT
You can create human-readable reports for a memory dump that has been processed by the Data Privacy for Diagnostics Analyzer. After they are created, these reports are in the directory/reports/dump-name/run-number directory in the file system that is used for DPA processing. You can request this processing by using either of the following:
  • IPCS option 5.6, specifying the REPORT function
  • Sample job SYS1.SAMPLIB(BLSJDPR)
FEEDBACK
You may provide feedback for a memory dump that has been processed by the Data Privacy for Diagnostics Analyzer. After looking through the reports and understanding which pages have or have not been flagged as sensitive, you can provide feedback to help the Data Privacy for Diagnostics Analyzer improve its sensitive data detection. Providing feedback is covered in more detail later in this chapter. After updating configuration files and indicating what tagging can be improved, you can request this processing by using either of the following:
  • IPCS option 5.6, specifying the FEEDBACK function
  • Sample job SYS1.SAMPLIB(BLSJDPF)
INGEST
You may ingest data to help the Data Privacy for Diagnostics Analyzer determine what sensitive data exists in your environment. Data can be ingested from dictionaries, databases, or other sources. This data is added to the knowledge base information and is used in future analysis runs. Providing ingested data is covered in more detail later in this chapter. After updating configuration files to identify the data to ingest, you can request this processing by using either of the following:
  • IPCS option 5.6, specifying the INGEST function
  • Sample job SYS1.SAMPLIB(BLSJDPI)
EXTRACT
You can extract any built-in or custom identifiers from the Analyzer to a file so that you can see the exact criteria that the ANALYZE function uses to determine the sensitivity of the data. Depending on the type of identifier, the output file contains either the pattern or the entire dictionary, which helps you verify that the Data Privacy for Diagnostics Analyzer is correctly marking data as sensitive or nonsensitive. Extracting identifiers is covered in more detail later in this chapter. After updating configuration files and indicating which identifiers are to be written to a file, you can request this processing by using either of the following:
  • IPCS option 5.6, specifying the EXTRACT function
  • Sample job SYS1.SAMPLIB(BLSJDPX)

Generally, you start by performing the ANALYZE function on a memory dump. This function works only on memory dumps that are captured on a z15 or later processor. After creating the redacted version of the memory dump, check the memory dump to understand what has been redacted. Reports are available to help you understand why pages have been redacted and whether the data has been properly identified as sensitive. Some reports are written in concise form and must be formatted by using the REPORT function. After running the REPORT function, you may want to give feedback to the Data Privacy for Diagnostics Analyzer about data that it flagged as sensitive but is not sensitive, or about data that is sensitive but was not detected. The FEEDBACK function allows you to perform this task. The cycle of ANALYZE / REPORT / FEEDBACK provides a way to train the Data Privacy for Diagnostics processing to produce memory dumps with the right level of redaction for your environment.

Another function that can be used is the INGEST function. This function allows you to import data from databases and files, and lets you create custom information that the Data Privacy for Diagnostics Analyzer processing can use to help identify sensitive data. Creating custom identifiers that are tailored to an installation's data privacy requirements is essential for the most accurate redaction of SPI (or other sensitive information), and far surpasses redaction that uses only the generic built-in identifiers provided with the Analyzer.

To display the exact criteria that the ANALYZE function uses to determine data sensitivity, you can use the EXTRACT function to write any built-in or custom identifiers to a file. When a particular identifier is then requested in the ANALYZE configuration, you know exactly which tokens or which pattern will be used to mark data as sensitive or nonsensitive.

Figure 1. Data Privacy for Diagnostics Usage Cycle

Using the Data Privacy for Diagnostics Analyzer Dialog within IPCS

When IPCS is used, panels are presented to allow you to specify the parameters required for processing. The dialog generates appropriate JCL based on the parameters provided. If any data sets are required but not preallocated, the dialog attempts to dynamically allocate them. If dynamic allocation fails for any reason, you can preallocate the data sets by using other mechanisms (such as ISPF option 3.2).

Note: Not all parameters are present on all IPCS panels for each function.

The parameters that are specified on the IPCS Data Privacy for Diagnostics Analyzer panels are:

DATA SET NAME
The input memory dump data set name. This option is equivalent to the input_dataset parameter in the JCL submitted to perform the requested function.
NEW DATA SET NAME
The output (redacted) memory dump data set name. This option is equivalent to the output-dump-dataset field in the JCL submitted to perform the ANALYZE function.
TEMP DATA SET/PAT
Temporary data set names can either be a specific name or a data set name pattern. For more information on patterns, see the help pages. This option is equivalent to the output_dataset or output_dataset_prefix parameters in the JCL submitted to perform the requested function.
BYPASS DP ANALYSIS
Allows you to submit a job that either performs analysis (N) or skips analysis (Y). If N is specified, the Data Privacy for Diagnostics Analyzer step scans the input data set for sensitive data beyond the data that is already identified by the applications that allocated the storage marked as sensitive. If such data is found, either token-level or page-level redaction is performed based on the ALLOW PAGE LEVEL specification. If Y is specified, this step is bypassed, and only the data that is identified by the applications that allocated the storage marked as sensitive is removed from the output data set identified by the NEW DATA SET NAME field.
REDACTION STRING
If you are not allowing page level redaction, this redaction string is used to overlay data that is determined to be sensitive in the output memory dump. You may leave this field blank to overlay the token with X or specify a string. When longer strings are detected in the pages, the string is used in a repeated fashion. If shorter strings are found, only a portion of the redaction string may be used. This option is equivalent to the redaction_string parameter in the JCL submitted to perform the requested function.
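The overlay rule described for the redaction string can be pictured with a minimal Python sketch. It is an illustration only, not the Analyzer's implementation: the function name overlay is hypothetical, ordinary Python strings stand in for storage in the memory dump, and it assumes that repetition starts at the beginning of the redaction string and that a shorter token receives the leading portion of it.

```python
def overlay(token: str, redaction_string: str = "") -> str:
    """Return a replacement of the same length as the sensitive token."""
    # A blank redaction string falls back to the character X.
    pattern = redaction_string or "X"
    # Repeat the pattern often enough to cover the token (ceiling division),
    # then cut the result to the token's exact length.
    repeats = -(-len(token) // len(pattern))
    return (pattern * repeats)[: len(token)]


print(overlay("ALICE SMITH", "REDACTED"))  # longer token: string is repeated
print(overlay("BOB", "REDACTED"))          # shorter token: only a portion is used
print(overlay("SECRET"))                   # blank field: token is overlaid with X
```

Because the replacement always has the same length as the token, the surrounding data keeps its original offsets in this model.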
NUMBER OF THREADS
For ANALYZE requests, large memory dumps may be processed faster by using multi-threading. You may specify 1 to 8 for the number of threads. Each thread requested processes a portion of the input memory dump, which reduces the elapsed time that it takes to process the entire memory dump. However, it may also increase the amount of resources that are required simultaneously to process the request. This option is equivalent to the thread_count parameter in the JCL submitted to perform the requested function.
ALLOW PAGE LEVEL
If Y is specified, known as fast-analysis mode or page-level redaction, the entire page of storage is redacted when any sensitive data is detected. Page-level redaction may allow the analysis processing to run faster because processing of a page stops when the first sensitive string in that page is found. However, page-level redaction may cause diagnostic data to be lost. If you find this to be true, set the value to N, known as detailed analysis mode or token-level redaction, so that only the data that is determined to be sensitive is overlaid by the redaction string. The default value is N (token-level redaction).
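The difference between the two modes can be sketched in Python as follows. This is a simplified model under stated assumptions, not the Analyzer's logic: the SENSITIVE pattern is a hypothetical stand-in for a real identifier, a page is modeled as a plain string, and a page-level redacted page is represented by None.

```python
import re
from typing import Optional

# Hypothetical identifier: any run of 16 digits. Real identifiers are
# defined by the Analyzer and its configuration, not by this pattern.
SENSITIVE = re.compile(r"\d{16}")


def redact_page(page: str, allow_page_level: bool,
                redaction_string: str = "X") -> Optional[str]:
    """Model ALLOW PAGE LEVEL=Y (True) and ALLOW PAGE LEVEL=N (False)."""
    if allow_page_level:
        # Fast-analysis mode: the first hit is enough to give up the whole
        # page, so scanning of the page can stop there.
        return None if SENSITIVE.search(page) else page

    # Detailed analysis mode: overlay each hit and keep the rest of the page.
    def mask(match: "re.Match[str]") -> str:
        length = len(match.group())
        return (redaction_string * length)[:length]

    return SENSITIVE.sub(mask, page)


page = "RC=8 CARD=1234567890123456 MODULE=ABC123"
print(redact_page(page, allow_page_level=True))   # whole page is lost
print(redact_page(page, allow_page_level=False))  # RC and MODULE survive
```

In the token-level result the return code and module name remain available for diagnosis, which is the kind of data that page-level redaction can lose.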
SENSITIVE REPORT
If Y is specified, reports are generated in directory/reports/dump-name/run-number/sensitive_token_log_n where n is the thread number. One file is written for each thread requested. For each string detected, data is written to these files to help you understand what has been redacted and why. Based on this information, you may decide to include or exclude types of data. When the REPORT function is requested, it consolidates these sensitive_token_log_n files into a human-readable file named sensitive_tokens.
DPfD HOME DIR
Specify the path where the Data Privacy for Diagnostics Analyzer home directory is configured. This is the path referred to as directory, as previously described. Do not include the trailing '/' when specifying this path.
JAVA HOME DIR
Specify the path where Java is installed. This is used in the batch job's STDENV set up file to create the proper environment for the Java processing to run in. Do not include the trailing '/' when specifying this path. Data Privacy for Diagnostics requires a minimum of IBM 64-bit SDK for z/OS Java Technology Edition version 8.0.
JAVA OPTIONS
You may provide any Java options that you need. For example, you may need to specify a minimum and maximum heap size for the JVM to successfully run a multi-threaded DPfD ANALYZE request. With the default setup and only built-in identifiers, each thread requires approximately 512 MB to successfully load data for the run. Requesting additional threads or including additional identifiers increases the heap size that the JVM needs, so use the -Xms and -Xmx options to adjust the minimum and maximum heap size. For more information about JVM command-line options, see the topic OpenJ9 command-line options in IBM SDK, Java Technology Edition 8.0.0.

If you are using IBM Semeru Runtime Certified Edition for z/OS 21 or later, the file encoding for Data Privacy for Diagnostics files must be specified explicitly via the following parameter due to changes made to the default file encoding: -Dfile.encoding=IBM-1047. For more information, see IBM Semeru Runtime Certified Edition for z/OS 21.
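As a rough aid for choosing JAVA OPTIONS values, the following Python sketch turns the guidance above into an option string. It is an estimate under stated assumptions, not a sizing formula from the product: the function name java_options is hypothetical, heap demand is assumed to scale linearly at about 512 MB per thread with only built-in identifiers, and setting -Xms to one thread's worth is an arbitrary choice. Custom identifiers need more, so pass a larger mb_per_thread in that case.

```python
from typing import List


def java_options(threads: int, mb_per_thread: int = 512,
                 semeru_21_or_later: bool = False) -> List[str]:
    """Build a starting set of JVM options for an ANALYZE request."""
    if not 1 <= threads <= 8:
        raise ValueError("NUMBER OF THREADS must be between 1 and 8")
    # Assumption: each thread needs about mb_per_thread MB of heap.
    max_heap_mb = threads * mb_per_thread
    options = [f"-Xms{mb_per_thread}m", f"-Xmx{max_heap_mb}m"]
    if semeru_21_or_later:
        # Required because the default file encoding changed.
        options.append("-Dfile.encoding=IBM-1047")
    return options


print(" ".join(java_options(4)))
print(" ".join(java_options(8, mb_per_thread=768, semeru_21_or_later=True)))
```

Treat the result as a first guess and raise -Xmx if the job output shows that the JVM ran out of heap.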

JZOS LOAD MODULE
The dialog uses the JZOS Batch Launcher in the JCL that is submitted. Determine the correct level of JZOS installed on your system and provide the name of the appropriate load module in this parameter. Data Privacy for Diagnostics requires a minimum of IBM 64-bit SDK for z/OS Java Technology Edition version 8.0, thus the 64-bit version 8 load module for JZOS Batch Launcher is JVMLDM86. For more information, see the JZOS Batch Launcher and Toolkit Installation and Users Guide.
MIGLIB DATASET
A sort E35 exit is used to remove pages that are flagged as sensitive. This function is provided in module BLSRTE35, which is included in SYS1.MIGLIB. Should you need to override where this exit can be loaded from, provide the name of the MIGLIB that contains the load module you want to run.
TEMP ALLOC PARMS
If your environment requires specific allocation parameters for memory dump data sets, you may supply any allocation parameters that ensure that the data set is properly allocated. For example, supplying DATACLAS and STORCLAS keywords may be necessary to locate the correct storage pool and attributes.

Do not specify RECFM, DSORG, LRECL, BLKSIZE, SPACE, and TRACK because the dialog uses them to create some of the interim data sets. If you need to use one of those allocation parameters, request the ANALYZE function through JCL instead of through IPCS.
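Before entering a TEMP ALLOC PARMS value, you could check it for the keywords listed above with a small script such as this Python sketch. It is not part of the product: the function name reserved_keywords is hypothetical, the parsing is simplified (parenthesized values are dropped and the remaining words are compared exactly, so abbreviations and longer spellings are not detected), and the class names in the example are made up.

```python
import re
from typing import List

# Keywords that the dialog reserves for the interim data sets.
RESERVED = {"RECFM", "DSORG", "LRECL", "BLKSIZE", "SPACE", "TRACK"}


def reserved_keywords(alloc_parms: str) -> List[str]:
    """Return the reserved keywords found in an allocation parameter string."""
    # Drop parenthesized values so that a value such as DATACLAS(SPACE)
    # is not mistaken for the SPACE keyword.
    top_level = re.sub(r"\([^()]*\)", " ", alloc_parms.upper())
    words = re.split(r"[\s,]+", top_level.strip())
    return sorted(word for word in words if word in RESERVED)


print(reserved_keywords("DATACLAS(DUMPS) STORCLAS(FAST)"))
print(reserved_keywords("DATACLAS(DUMPS) LRECL(4160) BLKSIZE(29120)"))
```

An empty result means that none of the listed keywords were found; a non-empty result names the keywords that require the JCL route instead of the IPCS dialog.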

EDIT CONFIG FILE?

If Y is specified, you can edit the configuration file for the requested function (analysis_config.json for ANALYZE, ingestion_config.json for INGEST, or extract_config.json for EXTRACT) before the JCL is submitted to perform the requested function. The default is N. For more information, see the analysis_config.json, extract_config.json, and ingestion_config.json sections.

RUN NUMBER
The ANALYZE step generates a run number, which can be found in the job output. Specify that run number for this parameter when the function requested is REPORT or FEEDBACK. If a run number is not specified, the most recent ANALYZE run for the input memory dump is used.
DB2® JDBC PATH
For the INGEST function, if using a Db2® connection source in the ingestion_config.json file, this field is needed to specify the path for the Db2 JDBC Driver and License JAR files. Do not include the trailing '/' when specifying this path.
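Several of the parameters above carry simple constraints: 1 to 8 threads, Y or N values, and no trailing '/' on paths. The following Python sketch collects those checks in one place. It is a hypothetical helper, not something the IPCS dialog provides: the function name check_panel_values is invented, the dictionary keys simply reuse the panel field names (with DB2 written without the ® symbol), and the paths in the example are illustrative.

```python
from typing import Dict, List

YES_NO_FIELDS = ("BYPASS DP ANALYSIS", "ALLOW PAGE LEVEL",
                 "SENSITIVE REPORT", "EDIT CONFIG FILE?")
PATH_FIELDS = ("DPfD HOME DIR", "JAVA HOME DIR", "DB2 JDBC PATH")


def check_panel_values(values: Dict[str, str]) -> List[str]:
    """Return a list of problems found in the supplied panel values."""
    problems = []
    threads = values.get("NUMBER OF THREADS")
    if threads is not None:
        if not threads.isdigit() or not 1 <= int(threads) <= 8:
            problems.append("NUMBER OF THREADS must be 1 to 8")
    for field in YES_NO_FIELDS:
        if field in values and values[field] not in ("Y", "N"):
            problems.append(f"{field} must be Y or N")
    for field in PATH_FIELDS:
        if field in values and values[field].endswith("/"):
            problems.append(f"{field} must not end with '/'")
    return problems


print(check_panel_values({
    "NUMBER OF THREADS": "12",
    "ALLOW PAGE LEVEL": "N",
    "JAVA HOME DIR": "/usr/lpp/java/J8.0_64/",
}))
```

Fields that are absent from the dictionary are skipped, which mirrors the note that not all parameters are present on all panels.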