Filtering a System Data Engine data stream by using a data stream definition

For a System Data Engine data stream, you can filter the records to be processed by using a WHERE clause in the custom update definition, and you can filter the fields to be streamed by Z Common Data Provider by using the custom template definition that is associated with the update definition.

About this task

After you create custom definitions for System Data Engine data streams, create a custom System Data Engine data stream and then update your analytics platform so that it can process the new data stream. After that, create or update the policy in the Configuration Tool to include the new data stream. If you are adding filters to an existing data stream, you can base the custom definitions, the custom data stream, and the analytics platform updates on the settings for that existing data stream.

Procedure

  1. If you do not already have one, create a partitioned data set (PDS) that is used as the user concatenation library for the custom definitions.
    For more information about how to create the data set, see step 1a in Creating a System Data Engine data stream definition.
  2. Create a custom update definition in a new or an existing data set member of the user concatenation library. If you want to filter the records to be processed, add a WHERE clause to the definition. The name of the member for the custom update definition cannot be the same as any existing member in the SHBODEFS data set.
    • You can create a new custom update definition by using the DEFINE UPDATE statement. For the language reference of the DEFINE UPDATE statement, see DEFINE UPDATE statement.
    • To filter an existing data stream, copy the update definition that is used by the data stream to a new or an existing data set member of the user concatenation library and update the definition based on your requirements.
      1. Locate the update definition for the existing data stream by using one of the following methods. The data set members for update definitions are named HBOUxxxx.
        • Because the Z Common Data Provider names the data stream with the name of the associated update definition, in your z/OS® environment, use ISPF option 3.14 or the SRCHFOR ISPF command to search for the data stream name.
        • Review the data stream definition in the sde.streams.json file or the ims.streams.json file in the Configuration Tool directory /usr/lpp/IBM/zcdp/v5r1m0/UI/LIB/. Check the hboin parameter that specifies all data set members for required definitions. Usually the last one is for the update definition.
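        The following Python sketch illustrates the second method. It is illustrative only: the real layout of sde.streams.json may differ, and the sample data below is fabricated for demonstration.

```python
import json

# Illustrative sketch only: the real layout of sde.streams.json may differ.
# The fabricated sample below assumes each stream entry carries an "hboin"
# list of data set members, with the update definition member last.
sample = json.loads("""
{
  "SMF_101_1_PACKAGE": {
    "hboin": ["HBOCCSV", "HBOLLSMF", "HBORS101", "HBOUS101"]
  }
}
""")

def update_member_for(streams, stream_name):
    # The last hboin member is usually the update definition (HBOUxxxx).
    return streams[stream_name]["hboin"][-1]

print(update_member_for(sample, "SMF_101_1_PACKAGE"))  # HBOUS101
```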
      2. Copy the update definition to a new or an existing data set member of the user concatenation library.
        The following code sample shows the update definition SMF_101_1_PACKAGE in the member HBOUS101 of the data set SHBODEFS.
        SET IBM_FILE = 'SMF1011K';           
        
        DEFINE UPDATE SMF_101_1_PACKAGE      
          VERSION 'CDP.510'       
          FROM SMF_101_1 SECTION PACKAGE     
          TO &IBM_UPDATE_TARGET              
          &IBM_CORRELATION                   
          AS &IBM_FILE_FORMAT SET(ALL);       
        Copy the update definition SMF_101_1_PACKAGE to the data set member USRUS101 in the user concatenation library USERID.LOCAL.DEFS with the following changes.
        SET
        The SET statement is needed only when the target of the update definition is a file, which means the variable IBM_UPDATE_TARGET is set to FILE &IBM_FILE. You can change the value to USR1011K.
        SET IBM_FILE = 'USR1011K';
        DEFINE UPDATE
        The data streams must have unique names, so you must rename the update definition to avoid conflict with the existing data stream. You can change it to USR_101_1_PACKAGE.
        The updated USERID.LOCAL.DEFS(USRUS101) member has the following content:
        SET IBM_FILE = 'USR1011K';
        
        DEFINE UPDATE USR_101_1_PACKAGE                        
          VERSION 'CDP.510'                         
          FROM SMF_101_1 SECTION PACKAGE                       
          TO &IBM_UPDATE_TARGET                                
          &IBM_CORRELATION                                     
          AS &IBM_FILE_FORMAT SET(ALL);
      3. If you want to filter the records to be processed, add a WHERE clause to the custom update definition. For example, if you want to collect only Db2® package accounting records whose transaction name starts with MG or whose authorization ID is U@MUPJ2, add the following WHERE clause:
        WHERE (SUBSTR(QWHCEUTX,1,2) = 'MG')                  
         OR (QWHCAID = 'U@MUPJ2')                          
        The updated USERID.LOCAL.DEFS(USRUS101) member has the following content:
        SET IBM_FILE = 'USR1011K';
                                                           
        DEFINE UPDATE USR_101_1_PACKAGE                        
          VERSION 'CDP.510'                         
          FROM SMF_101_1 SECTION PACKAGE                       
          WHERE (SUBSTR(QWHCEUTX,1,2) = 'MG')                  
            OR (QWHCAID = 'U@MUPJ2 ')                          
          TO &IBM_UPDATE_TARGET                                
          &IBM_CORRELATION                                     
          AS &IBM_FILE_FORMAT SET(ALL);
        For more information about the WHERE clause, see WHERE.
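        The logic of the WHERE clause above can be mimicked in ordinary Python to check your filter conditions before you code them. This is an illustrative sketch only, not System Data Engine syntax; QWHCEUTX and QWHCAID are the transaction-name and authorization-ID fields from the example, represented here as plain strings.

```python
# Illustrative Python equivalent of the example WHERE clause; the field
# values below are fabricated sample data.
def keep_record(qwhceutx: str, qwhcaid: str) -> bool:
    # SUBSTR(QWHCEUTX,1,2) = 'MG' -> the first two characters are 'MG'
    # OR QWHCAID = 'U@MUPJ2'      -> the authorization ID matches, ignoring
    #                                trailing blanks as a fixed-length
    #                                character comparison would
    return qwhceutx[:2] == "MG" or qwhcaid.rstrip() == "U@MUPJ2"

print(keep_record("MGTRAN01", "OTHERID"))  # True  (transaction name match)
print(keep_record("ABTRAN01", "U@MUPJ2"))  # True  (authorization ID match)
print(keep_record("ABTRAN01", "OTHERID"))  # False (neither condition holds)
```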
  3. If you want to filter the fields to be streamed, add a DEFINE TEMPLATE statement for the update definition in the same data set member of that update definition.
    Verify that the template definition is placed after the update definition. The following example shows a template definition in the member USERID.LOCAL.DEFS(USRUS101) for the update definition USR_101_1_PACKAGE to stream only a few fields in the PACKAGE section of the SMF_101_1 record.
    SET IBM_FILE = 'USR1011K';
                                                       
    DEFINE UPDATE USR_101_1_PACKAGE                        
      VERSION 'CDP.510'                         
      FROM SMF_101_1 SECTION PACKAGE                       
      WHERE (SUBSTR(QWHCEUTX,1,2) = 'MG')                  
        OR (QWHCAID = 'U@MUPJ2 ')                          
      TO &IBM_UPDATE_TARGET                                
      &IBM_CORRELATION                                     
      AS &IBM_FILE_FORMAT SET(ALL);                    
    
    DEFINE TEMPLATE USR_101_1_PACKAGE FOR USR_101_1_PACKAGE    
      ORDER                                                    
      (SM101TME,                                               
       SM101DTE,                                                                                             
       QPACLOCN,                                               
       QPACCOLN,                                               
       QPACPKID,                                               
       QPACSQLC,                                               
       QPACSCB,                                                
       QPACSCE,                                                
       QPACBJST,                                               
       QPACEJST)                                               
      AS &IBM_FILE_FORMAT;
    DEFINE TEMPLATE
    The template definition name must be the same as the update definition name to replace the default template definition that streams all fields for the update definition. In the template definition, you must include the date and time fields from the SMF record header for an SMF record, or the timestamp field in the record suffix for an IMS log record. These fields are required for timestamp resolution when you ingest data to your analytics platform. In this example, the fields are SM101DTE and SM101TME.
    For the language reference of the DEFINE TEMPLATE statement, see DEFINE TEMPLATE statement.
  4. Validate the syntax of the custom update and template definitions.
    Use the following example job to verify the members for the custom update and template definitions.
    //HBOBCOL  JOB (),'DUMMY',MSGCLASS=X,MSGLEVEL=(,0),
    //         CLASS=A,NOTIFY=&SYSUID                  
    //*                                                            
    //HBOSMFCB EXEC PGM=HBOPDE,REGION=0M,PARM='SHOWINPUT=YES'      
    //STEPLIB  DD   DISP=SHR,DSN=hlq.SHBOLOAD                  
    //HBOOUT   DD   SYSOUT=*                                       
    //HBODUMP  DD   SYSOUT=*                                       
    //HBOIN    DD   DISP=SHR,DSN=hlq.SHBODEFS(HBOCCSV)          
    //         DD   DISP=SHR,DSN=hlq.SHBODEFS(HBOCCORY)        
    //         DD   DISP=SHR,DSN=hlq.SHBODEFS(HBOLLSMF)         
    //         DD   DISP=SHR,DSN=hlq.SHBODEFS(HBORS101)         
    //         DD   DISP=SHR,DSN=USERID.LOCAL.DEFS(USRUS101)        
    //         DD   *                                              
    COLLECT SMF                                                    
    WITH STATISTICS                                                                                               
    //*                                                            
    //HBOLOG   DD   DUMMY                             
    hlq
    Change hlq to the high-level qualifier for the Z Common Data Provider SMP/E target data set.
    
    //STEPLIB  DD   DISP=SHR,DSN=hlq.SHBOLOAD                  
    //HBOOUT   DD   SYSOUT=*                                       
    //HBODUMP  DD   SYSOUT=*                                       
    //HBOIN    DD   DISP=SHR,DSN=hlq.SHBODEFS(HBOCCSV)          
    //         DD   DISP=SHR,DSN=hlq.SHBODEFS(HBOCCORY)        
    //         DD   DISP=SHR,DSN=hlq.SHBODEFS(HBOLLSMF)         
    //         DD   DISP=SHR,DSN=hlq.SHBODEFS(HBORS101)     
    HBORS101
    HBORS101 contains the record definition SMF_101_1. This member must be included before the member that contains your custom update and template definitions.
    //         DD   DISP=SHR,DSN=hlq.SHBODEFS(HBORS101)        
    //         DD   DISP=SHR,DSN=USERID.LOCAL.DEFS(USRUS101)
    This statement specifies the data set member that contains the custom update and template definitions.
    Important: Verify that the definitions are error-free by running the validation job before you create the custom data stream.
    If there is no syntax error, you see the following messages.
    HBO0201I Update USR_101_1_PACKAGE was successfully defined. 
    
    HBO0500I Template USR_101_1_PACKAGE was successfully defined. 
    
    If there are syntax errors, correct the errors according to the messages in the output file that is defined by HBOOUT.
  5. Create a custom System Data Engine data stream in the Configuration Tool based on the update definition and template definition that are created in previous steps.
    For more information, see Creating a System Data Engine data stream definition. Verify that the data stream name, the custom update definition name, and the custom template definition name are the same, and that you specify the member for the record definition before the member for the custom update and template definitions in the SHBODEFS data set members field.
  6. Update your analytics platform so that it can process the new data stream.
    • If you are ingesting data to the Elastic Stack, for each data stream, create a field name annotation configuration file, and a timestamp resolution configuration file in the Logstash configuration directory.

      If your new data stream is created based on an existing one, you can create the two files by copying and editing the files for the old data stream. In previous examples, the new data stream USR_101_1_PACKAGE is created based on the existing data stream SMF_101_1_PACKAGE, and the two configuration files are H_SMF_101_1_PACKAGE.conf and N_SMF_101_1_PACKAGE.conf in the Logstash configuration directory. Copy these two files and change the file names to H_USR_101_1_PACKAGE.conf and N_USR_101_1_PACKAGE.conf, then edit the files according to the following instructions.

      Field name annotation configuration file
      The file is named H_data_stream_name.conf, for example, H_USR_101_1_PACKAGE.conf. See the following example of the file:
      # CDPz ELK Ingestion
      #
      # Field Annotation for stream zOS-USR_101_1_PACKAGE
      #
      
      filter {
         if [sourceType] == "zOS-USR_101_1_PACKAGE" {
      
            csv{ columns => [  "Correlator", "SM101TME", 
      "SM101DTE", "QPACLOCN", "QPACCOLN", "QPACPKID", 
      "QPACSQLC", "QPACSCB", "QPACSCE", "QPACBJST", 
      "QPACEJST"]
               separator => "," }
         }
      }
      sourceType
      The value of sourceType must match the data source type of the data stream. The naming convention is zOS-data_stream_name.
      if [sourceType] == "zOS-USR_101_1_PACKAGE" 
      csv{ columns => []
      If you have a custom template definition, change the column list to match the fields and order in the template definition.
      Timestamp resolution configuration file
      The file is named N_data_stream_name.conf, for example, N_USR_101_1_PACKAGE.conf. See the following example of the file:
      # CDPz ELK Ingestion
      #
      # Timestamp Extraction for stream zOS-USR_101_1_PACKAGE
      #
      
      filter {
         if [sourceType] == "zOS-USR_101_1_PACKAGE" {
            mutate{ add_field => {
               "[@metadata][timestamp]" => "%{SM101DTE} %{SM101TME}"
              }}
      
            date{ match => [
                   "[@metadata][timestamp]", "yyyy-MM-dd HH:mm:ss:SS"
              ]}
         }
      }
      sourceType
      The value of sourceType must match the data source type of the data stream. The naming convention is zOS-data_stream_name.
      if [sourceType] == "zOS-USR_101_1_PACKAGE"
      add_field =>
      For an SMF record, you must specify the date and time fields in the SMF record header. In this example, the fields are SM101DTE and SM101TME.
      "[@metadata][timestamp]" => "%{SM101DTE} %{SM101TME}"
      For an IMS log record, you must specify the timestamp field in the record suffix. For example, the timestamp field in the IMS_07 record suffix is DLRSTCK.
      "[@metadata][timestamp]" => "%{DLRSTCK}"
      match =>
      For an SMF record, use the following time format.
      "[@metadata][timestamp]", "yyyy-MM-dd HH:mm:ss:SS"
      For an IMS log record, use the following time format.
      "[@metadata][timestamp]", "yyyy-MM-dd HH:mm:ss.SSSSSS"
      Restart Logstash after you create the files for the new data stream. Refer to Logstash documentation for more information about the configuration files.
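      The combined effect of the two Logstash filter files can be sketched in plain Python: the H_*.conf file names the CSV columns, and the N_*.conf file builds a timestamp from SM101DTE and SM101TME. This is an illustrative sketch only, not Logstash itself, and the sample record values are fabricated.

```python
from datetime import datetime

# Column list from the H_USR_101_1_PACKAGE.conf example; the sample CSV line
# is fabricated for demonstration.
columns = ["Correlator", "SM101TME", "SM101DTE", "QPACLOCN", "QPACCOLN",
           "QPACPKID", "QPACSQLC", "QPACSCB", "QPACSCE", "QPACBJST",
           "QPACEJST"]
line = "corr1,10:15:30:25,2024-06-01,LOC1,COL1,PKG1,5,0,0,100,200"

# Field name annotation: map each CSV value to its column name.
event = dict(zip(columns, line.split(",")))

# Timestamp resolution: combine the date and time fields, then parse them.
# The Logstash pattern "yyyy-MM-dd HH:mm:ss:SS" corresponds roughly to this
# strptime format (Python's %f right-pads the two hundredths digits to
# microseconds).
ts = datetime.strptime(f"{event['SM101DTE']} {event['SM101TME']}",
                       "%Y-%m-%d %H:%M:%S:%f")
print(event["QPACPKID"])  # PKG1
print(ts.isoformat())     # 2024-06-01T10:15:30.250000
```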
    • If you are ingesting data to Splunk, define the layout of the data stream to the Splunk server by creating the props.conf file in the Splunk_Home/etc/apps/ibm_cdpz_buffer/local directory on the Splunk server.
      If your new data stream is created based on an existing one, you can create the file by copying and editing the content for the old data stream. Based on previous examples, open the props.conf file in the Splunk_Home/etc/apps/ibm_cdpz_buffer/default directory and copy the section for SMF_101_1_PACKAGE. Paste the content to the props.conf file in Splunk_Home/etc/apps/ibm_cdpz_buffer/local and edit it according to the following example. If the props.conf file exists, append the content to the file.
      #
      # USR_101_1_PACKAGE (zOS-USR_101_1_PACKAGE)
      #
      
      [zOS-USR_101_1_PACKAGE]
       TIMESTAMP_FIELDS = SM101DTE, SM101TME, timezone
       TIME_FORMAT = %F %H:%M:%S:%2Q %z
       FIELD_NAMES = "sysplex","system","hostname","","","sourcename",
      "timezone", "Correlator", "SM101TME", "SM101DTE", "QPACLOCN", 
      "QPACCOLN", "QPACPKID", "QPACSQLC", "QPACSCB", "QPACSCE", 
      "QPACBJST", "QPACEJST"
       INDEXED_EXTRACTIONS = csv
       KV_MODE = none
       NO_BINARY_CHECK = true
       SHOULD_LINEMERGE = false
       category = Structured
       disabled = false
       pulldown_type = true
       TRUNCATE = 20000
      [zOS-USR_101_1_PACKAGE]
      You must specify the data source name of the data stream. The naming convention is zOS-data_stream_name.
      TIMESTAMP_FIELDS
      For an SMF record, you must specify the date and time fields in the SMF record header. In this example, the fields are SM101DTE and SM101TME.
      TIMESTAMP_FIELDS = SM101DTE, SM101TME, timezone
      For an IMS log record, you must specify the timestamp field in the record suffix. For example, the timestamp field in the IMS_07 record suffix is DLRSTCK.
      TIMESTAMP_FIELDS = DLRSTCK, timezone
      TIME_FORMAT
      For an SMF record, use the following time format.
      TIME_FORMAT = %F %H:%M:%S:%2Q %z
      For an IMS log record, use the following time format.
      TIME_FORMAT = %F %H:%M:%S.%6Q %z
      FIELD_NAMES
      If you have a custom template definition, change the column list to match the fields and order in the template definition. If the column Correlator exists, do not remove it.
      In the Splunk user interface, you must also configure the file to data source type mapping for the new data stream. The file that the Data Receiver saves is named CDP-zOS-data_stream_name-*.cdp. For example, the data stream USR_101_1_PACKAGE has the file that is named CDP-zOS-USR_101_1_PACKAGE-*.cdp.

      Restart the Splunk server after you make the changes.

      Refer to Splunk documentation for more information.

  7. Create or update the policy to add the new System Data Engine data stream.
    1. In the Configuration Tool primary window, select the policy that you want to update.
    2. Click the Add Data Stream icon in the Policy Profile Edit window.
    3. Find and select the new data stream from the list in the select data stream window.
    4. Assign a subscriber for each new data stream.
    5. In the Policy Profile Edit window, click SYSTEM DATA ENGINE to ensure that values are provided for the USER Concatenation and CDP Concatenation fields, and click OK. In the USER Concatenation field, specify the data set name of your user concatenation library. Based on previous examples, specify USERID.LOCAL.DEFS.
    6. Click Save to save the policy.
    Important: Each time that the associated update definition or template definition is changed, you must edit and save the policy in the Configuration Tool so that the changes are reflected in the policy.
  8. Restart the Data Streamer and the System Data Engine.

Results

The records and fields are filtered according to your configuration.