Interactive performance reports with Monitor III

The interactive Monitor III reporter runs in a TSO/E session under ISPF and provides system or sysplex performance reports in the following ways:

Displays your current system status in real-time mode
Shows previously collected data that is still available, either in in-memory buffers or in preallocated VSAM data sets

You can use Monitor III to quickly identify storage and processor delays for your active Spark workload. To start an interactive Monitor III session, enter the TSO/E command RMF and select Monitor III from the RMF - Performance Management panel. From the RMF III Primary Menu, you can select the specific performance metrics that you want to see. To further filter the report by job class and service class, you can issue the following command:

report_name job_class,service_class

where:

report_name

The short name of the report

job_class

One of the following job class names:

ALL (or A)
ASCH (or AS)
BATCH (or B)
OMVS (or O)
STC (or S)
TSO (or T)

service_class

The service class name

Example: To get the Storage Delays report for OMVS jobs in the ODASSC1 service class, enter this command:

STOR O,ODASSC1
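If you script or document these commands, the syntax above can be captured in a small helper. The following is only a sketch in Python: the function name build_report_command is hypothetical, and the job class table simply restates the list above. It does not talk to RMF; it only assembles and checks the command string.

```python
# Job class names and their abbreviations, as listed in the text above.
JOB_CLASSES = {
    "ALL": "A",
    "ASCH": "AS",
    "BATCH": "B",
    "OMVS": "O",
    "STC": "S",
    "TSO": "T",
}


def build_report_command(report_name, job_class, service_class):
    """Return a command string of the form: report_name job_class,service_class."""
    jc = job_class.upper()
    valid = set(JOB_CLASSES) | set(JOB_CLASSES.values())
    if jc not in valid:
        raise ValueError(f"unknown job class: {job_class!r}")
    return f"{report_name.upper()} {jc},{service_class.upper()}"


print(build_report_command("STOR", "O", "ODASSC1"))  # STOR O,ODASSC1
```

Passing the full name (OMVS) instead of the abbreviation (O) produces an equivalent command under the syntax described above.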

The following Monitor III reports are of particular interest for monitoring Spark workloads:

Storage Delays report
Common Storage report
Storage Frames report
Storage Memory Objects report
Processor Delays report
Processor Usage report
zFS File System report

Based on the performance measurements that you observe from these reports, you can fine-tune the resource assignments for your Spark workload. For instance, you can modify the number of cores and amount of memory for your executors in the spark-defaults.conf configuration file. (For more information, see Configuring memory and CPU options.) Or, if you use WLM to manage your Spark workload, you can adjust the importance and performance goals of your Spark workload. (For more information, see Configuring z/OS workload management for Apache Spark.)
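As an illustration of the executor settings mentioned above, the following Python sketch renders two standard Spark properties, spark.executor.cores and spark.executor.memory, as spark-defaults.conf lines. The helper name render_spark_defaults and the values 2 and 4g are hypothetical placeholders, not recommendations; pick values based on what the Monitor III reports show for your workload.

```python
def render_spark_defaults(settings):
    """Render property/value pairs as whitespace-separated spark-defaults.conf lines."""
    width = max(len(key) for key in settings)
    return "\n".join(
        f"{key.ljust(width)}  {value}" for key, value in settings.items()
    )


# Hypothetical values for illustration only.
executor_settings = {
    "spark.executor.cores": "2",
    "spark.executor.memory": "4g",
}

print(render_spark_defaults(executor_settings))
```

The printed lines can be merged into an existing spark-defaults.conf file by hand after you review them.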

Storage Delays report

The Storage Delays report (STOR) displays storage delay information for all jobs. Here you can find out if your Spark jobs suffer any delays due to memory constraints. A non-zero value in the DLY % column indicates that there is a delay due to memory constraints. Figure 1 shows an example of this report.

Figure 1. Example of the RMF Storage Delays report

                      RMF V2R2   Storage Delays                    Line 1 of 7
                                                                               
Samples: 60      System: SYS1  Date: 03/13/17  Time: 15.55.00  Range: 60    Sec
                                                                               
             Service  DLY    ------- % Delayed for ------   -- Working Set --  
Jobname   C  Class     %     COMM  LOCL  SWAP  OUTR  OTHR   Central  Expanded  
                                                                               
ODASW1A   O  ODASSC1   0       0     0     0     0     0     36823            
ODASM1A   O  ODASSC1   0       0     0     0     0     0     38759            
ODASX1A   O  ODASSC1   0       0     0     0     0     0      183K
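If you capture the report rows as text, a check for non-zero DLY % values can be sketched in Python as follows. This assumes whitespace-separated rows in the Figure 1 layout with the first four columns populated (Jobname, C, Service Class, DLY %); the function name delayed_jobs is hypothetical.

```python
def delayed_jobs(report_rows):
    """Return (jobname, dly_percent) pairs for rows whose DLY % is non-zero.

    Assumes the Figure 1 column order: Jobname, C, Service Class, DLY %, ...
    """
    flagged = []
    for row in report_rows:
        fields = row.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        jobname, dly = fields[0], fields[3]
        if dly.isdigit() and int(dly) > 0:
            flagged.append((jobname, int(dly)))
    return flagged


rows = [
    "ODASW1A   O  ODASSC1   0       0     0     0     0     0     36823",
    "ODASM1A   O  ODASSC1   0       0     0     0     0     0     38759",
    "ODASX1A   O  ODASSC1   0       0     0     0     0     0      183K",
]
print(delayed_jobs(rows))  # [] because every DLY % in Figure 1 is 0
```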

Common Storage report

The Common Storage report (STORC) provides information about the use of common storage (CSA, ECSA, SQA, and ESQA) within a system. You can use this report to identify whether Spark is using an excessive amount of common storage (such as for memory-mapped files). Figure 2 shows an example of this report.

Figure 2. Example of the RMF Common Storage report

                        RMF V2R2   Common Storage              Line 323 of 340
                                                                               
Samples: 60      System: SYS1  Date: 03/13/17  Time: 22.42.00  Range: 60    Sec
                                                                               
                                    ---- Percent ----   ------- Amount --------
System Information                  CSA ECSA SQA ESQA     CSA  ECSA   SQA  ESQA
 IPL Definitions                                        1856K  501M 4384K   63M
 Peak Allocation Values              21   25  28  133    393K  124M 1248K   84M
 Average CSA to SQA Conversion        0    7                0   34M            
 Average Use Summary                 21   25  14  131    385K  124M  618K   83M
 Available at End of Range           79   75  86   23   1471K  377M 3766K   15M
                                                                               
 Unalloc Common Area: 5028K                                                    
                                                                               
               Service        ELAP  -- Percent Used -   ----- Amount Used -----
Jobname  Act C Class    ASID  Time  CSA ECSA SQA ESQA     CSA  ECSA   SQA  ESQA
ODASX1A      O ODASSC1 0329  12.1M   0    0   0    0       0     0     0   160
ODASW1A      O ODASSC1 0326   8.3H   0    0   0    0       0     0     0   160
ODASM1A      O ODASSC1 0326   8.3H   0    0   0    0       0     0     0   160

Storage Frames report

The Storage Frames report (STORF) displays detailed frame counts, auxiliary slot counts, and page-in rates for each address space. For instance, it shows the average number of frames used by each Spark process (ACTV column) and the paging rate (PGIN RATE column). Keeping the paging rate as close to zero as possible helps improve performance. If the paging rate is high, increasing the memory limit for the resource group with which the Spark address spaces are associated might help lower it. Figure 3 shows an example of this report.

Figure 3. Example of the RMF Storage Frames report

                      RMF V2R2   Storage Frames                    Line 1 of 7
                                                                               
Samples: 60      System: SYS1  Date: 03/13/17  Time: 15.55.00  Range: 60    Sec
                                                                               
           Service    -- Frame Occup.-- - Active Frames - AUX   PGIN           
Jobname  C Class   Cr TOTAL  ACTV  IDLE  WSET FIXED   DIV SLOTS RATE           
                                                                               
ODASX1A  O ODASSC1    183K  183K     0  183K   716     0     0    0           
ODASM1A  O ODASSC1   38759 38759     0 38759   669     0     0    0           
ODASW1A  O ODASSC1   36823 36823     0 36823   614     0     0    0
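To relate the frame counts to memory sizes, you can convert them to megabytes. The Python sketch below rests on two assumptions that you should verify for your system: that the counts are in 4 KB frames, and that a trailing K in the report means units of 1024. The function name frames_to_mib is hypothetical.

```python
FRAME_BYTES = 4096  # assumption: counts are in 4 KB frames


def frames_to_mib(count_text):
    """Convert a frame count such as '38759' or '183K' to MiB.

    Assumption: a trailing 'K' means the count is shown in units of 1024.
    """
    text = count_text.strip().upper()
    if text.endswith("K"):
        frames = int(text[:-1]) * 1024
    else:
        frames = int(text)
    return frames * FRAME_BYTES / (1024 * 1024)


# ACTV values taken from Figure 3.
for job, actv in [("ODASX1A", "183K"), ("ODASM1A", "38759"), ("ODASW1A", "36823")]:
    print(f"{job}: about {frames_to_mib(actv):.0f} MiB active")
```

Under these assumptions, the 183K active frames for ODASX1A correspond to roughly 732 MiB.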

Storage Memory Objects report

The Storage Memory Objects report (STORM) displays information about the use of memory objects for each active address space and within the system. A memory object is a contiguous range of virtual addresses that is allocated by jobs in units of megabytes on a megabyte boundary. This report can help you assess the total amount of memory that Spark is using. It also shows the fixed and pageable 1M frames used by Spark address spaces. Spark generally does not require fixed large frames, and tuning Spark JVMs to use them might negatively affect overall system health. Figure 4 shows an example of this report.

Figure 4. Example of the RMF Storage Memory Objects report

                        RMF V2R2   Storage Memory Objects          Line 1 of 7
                                                                               
Samples: 60      System: SYS1  Date: 03/13/17  Time: 15.55.00  Range: 60    Sec
                                                                               
----MemObj----   ---Frames---   --1MB Fixed--   --2GB Fixed--   -1MB Pageable- 
Fixed 1M    10   Shared  424K   Total   111K    Total     12    Initial 22056  
Fixed 2G     1   Common  280K   Common    13    %Used    8.3    Dynamic  4109  
Shared      28   %Used   39.2   %Used    3.9                    %Used     100  
Common     631                                                                 
-------------------------------------------------------------------------------
           Service       --Memory Objects- --1M Frames- 2G-Fr ------Bytes------
Jobname  C Class    ASID Total  Comm   Shr Fixed Pgable Fixed Total  Comm   Shr
                                                                               
ODASM1A  O ODASSC1 0618   351     0     0     0    69      0 13.7G     0     0
ODASX1A  O ODASSC1 0634   347     0     0     0   541      0 45.8G     0     0
ODASW1A  O ODASSC1 0345   334     0     0     0    65      0 13.1G     0     0

Processor Delays report

The Processor Delays report (PROC) displays all jobs that were waiting for or using the processor during the reporting interval. Here you can see if your Spark jobs suffer any delays due to processor constraints. Figure 5 shows an example of this report.

Figure 5. Example of the RMF Processor Delays report

                      RMF V2R2   Processor Delays                 Line 1 of 10
                                                                               
Samples: 60      System: SYS1  Date: 03/13/17  Time: 15.55.00  Range: 60    Sec
                                                                               
            Service  CPU  DLY USG EAppl  ----------- Holding Job(s) -----------
Jobname  CX Class    Type  %   %    %     %  Name      %  Name      %  Name    
                                                                               
ODASX1A  O  ODASSC1 CP     3   2 0.360    
                     IIP    0  23 24.30                                        
ODASM1A  O  ODASSC1 CP     2   0 0.000    
                     IIP    2   0 0.012    
ODASW1A  O  ODASSC1 CP     2   0 0.000    
                     IIP    0   0 0.023
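Each job in this report spans one line per processor type, which makes captured text slightly harder to scan. The following Python sketch groups the rows by job, assuming the Figure 5 layout: a job line starts in the first column with Jobname, CX, and Service Class, and a continuation line is indented and starts with the CPU type. The function name parse_processor_delays is hypothetical.

```python
def parse_processor_delays(report_rows):
    """Map jobname -> {cpu_type: {"dly": int, "usg": int, "eappl": float}}."""
    result = {}
    current = None
    for row in report_rows:
        fields = row.split()
        if not fields:
            continue
        if not row[0].isspace():
            # Job line: Jobname, CX, Service Class, then CPU Type, DLY, USG, EAppl.
            if len(fields) < 7:
                continue
            current = fields[0]
            cpu = fields[3:7]
        else:
            # Indented continuation line: CPU Type, DLY, USG, EAppl for the same job.
            if current is None or len(fields) < 4:
                continue
            cpu = fields[:4]
        cpu_type, dly, usg, eappl = cpu
        result.setdefault(current, {})[cpu_type] = {
            "dly": int(dly),
            "usg": int(usg),
            "eappl": float(eappl),
        }
    return result


rows = [
    "ODASX1A  O  ODASSC1 CP     3   2 0.360",
    "                     IIP    0  23 24.30",
    "ODASM1A  O  ODASSC1 CP     2   0 0.000",
    "                     IIP    2   0 0.012",
]
for job, cpus in parse_processor_delays(rows).items():
    delayed = [t for t, v in cpus.items() if v["dly"] > 0]
    print(job, "delayed on:", delayed)
```

In this sample, ODASX1A shows a delay only on CP, while ODASM1A shows a small delay on both CP and IIP.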

Processor Usage report

The Processor Usage report (PROCU) displays all jobs that were using a general-purpose or special-purpose processor during the reporting interval. You can use this report to understand the CPU usage of your Spark jobs. By combining it with the Processor Delays report, you can assess whether you need to change the performance goals or importance of your Spark workload. Figure 6 shows an example of this report.

Figure 6. Example of the RMF Processor Usage report

                      RMF V2R2   Processor Usage                   Line 1 of 5
                                                                               
Samples: 60      System: SYS1  Date: 03/13/17  Time: 15.55.00  Range: 60    Sec
                                                                               
            Service    --- Time on CP % ---   ----- EAppl % -----              
Jobname  CX Class      Total    AAP    IIP       CP    AAP    IIP              
                                                                               
ODASX1A  O  ODASSC1   0.360  0.000  0.218    0.360         24.30              
ODASW1A  O  ODASSC1   0.000  0.000  0.000    0.000         0.023              
ODASM1A  O  ODASSC1   0.000  0.000  0.000    0.000         0.012

zFS File System report

The zFS File System report (ZFSFS) measures zFS activity for individual file systems. With this report, you can monitor the I/O rates and response times associated with the file systems that Spark uses. Figure 7 shows an example of this report.

Figure 7. Example of the RMF zFS File System report

                     RMF V2R2   zFS File System  - SVPLEX3         Line 23 of 46
                                                                                 
  Samples: 120     Systems: 4    Date: 03/09/17  Time: 15.07.00  Range: 120   Sec
                                                                                 
  ------ File System Name --------------------              I/O  Resp Read  XCF  
                   System    Owner     Mode    Size Usg%   Rate  Time  %    Rate 
                                                                                 
  OMVSSPA.SPARK.PLX3.ZFS                                                         
                   *ALL      D0        RO     1440M 60.5  0.000 0.000  0.0 0.000