by Steven LaFalce, Nicholas Marion, and Neil Shah
In our first blog, we created a Spark application on z/OS that identified the top 10 producers of SMF type 30 records. As a quick refresher, System Management Facilities (SMF) is a base z/OS component that collects and records many kinds of events, each written as a numbered SMF record type. For example, SMF type 30 records provide accounting information, and SMF type 80 records are produced during Resource Access Control Facility (RACF) and Public Key Infrastructure (PKI) processing.
Back to our first blog: we developed an application using the IBM z/OS Platform for Apache Spark to help the z/OS systems programmer easily identify potential offenders in SMF flooding situations. By SMF flooding, we mean that the SYS1.MANx SMF data sets are switching much faster than normal, or that the offloaded SMF datasets are consuming more disk space than before. The z/OS systems programmer can invoke this application from z/OS JCL (i.e. batch), and it identifies the top 10 producers (jobnames or user IDs) of SMF type 30 records. Keep in mind that the underlying assumption was that we knew exactly which SMF record type flooded the system: type 30 records. But what if it was a different SMF record type? Now what‽
For this new proof of concept (PoC), our goal was to generalize the scenario and develop an application that determines which SMF record type caused the flood, and then identifies the offending jobnames and user IDs. We developed this application using IBM Open Data Analytics for z/OS, or IzODA for short (the follow-on product to IBM z/OS Platform for Apache Spark). This product brings in an additional analytics stack, Python and Anaconda, along with updated versions of z/OS Spark and the Optimized Data Layer (ODL), formerly known as MDS.
Illustration 1: IzODA components
We did our initial development in Jupyter Notebook, using the Python scripting language and ODL, all running on z/OS! Our Jupyter Notebook session was launched from our z/OS system and then accessed via a web browser. For a quick overview on Jupyter Notebooks, see: https://jupyter-notebook.readthedocs.io/en/stable/
Since Jupyter Notebook supports interpreted scripting languages, blocks of Python code were run within the cells of the Jupyter Notebook, with no need to compile. This allowed us to quickly develop and test our code along the way. After we completed our development in Jupyter Notebook, we deployed the code to run on z/OS from a batch environment.
You can download our code here: https://github.com/IzODA/examples/tree/master/SMF/SnakesOnZOS
If you have trouble opening the Jupyter Notebook code, use this link to see the raw code: https://github.com/IzODA/examples/blob/master/SMF/SnakesOnZOS/SMF-python-functions.latest.ipynb?short_path=a27360b
Now, we'll describe some of the sections of code.
Anaconda Packages
First, we imported the Anaconda packages we needed. Importing makes a package's functions available to our code. We import pandas as pd, where pd is simply an alias for the pandas module (so we don't have to type 'pandas' every time):
#importing necessary packages
import pandas as pd
import dsdbc
Establish Connection
Next we want to establish a connection with our ODL server so that we can obtain our SMF records from our z/OS datasets. In order to establish our connection, we need to provide the dsdbc connection with our credentials. There are several ways to do this:
We chose the second option – reading a USS file which contains the user ID and password.
#Retrieve credentials to access ODL server
def get_credentials():
    with open('/u/userid/python/user_info.txt') as f:
        user = f.readline().rstrip()
        password = f.readline().rstrip()
    return user, password
user, password = get_credentials()
#Setup ODL
conn = dsdbc.connect(SSID="AZKS", user=user, password=password)
cursor = conn.cursor()
Build and Execute SQL Statement
Now that we have a connection established with our ODL server, we want to obtain the data. The interaction between ODL and the dsdbc driver is accomplished through SQL statements. These SQL statements provide the server with the data fields you're looking for as well as the source of the data.
Thanks to the available SMF_RULE (see hlq.SAZKXVTB(AZKSMFT1)) we can easily specify the SMF dataset inside of our SQL statement instead of hard coding the SMF dataset name in the ODL server VTB rule as we had to do in the past. In order for ODL to correctly parse the dataset name, we just need to replace the periods “.” with underscores “_”.
In our application, the SQL statement is 'SELECT SMF_RTY FROM SMF_FILE__' + dataset_name, where SMF_RTY is the data field (the SMF record type, e.g. 30), SMF_FILE is the associated virtual table mapping, and dataset_name is the SMF dataset name. Note the double underscore “__” between the mapping and the dataset name; it allows ODL to parse the SQL statement and discover both. We execute the query with pd.read_sql, the pandas function that runs a SQL query over a connection and returns the result as a pandas DataFrame.
# User to enter full input dataset name
dataset_name = input("Enter the input dataset name: ")
# Replace '.' in dataset name with '_'
dataset_name = dataset_name.replace('.', '_')
#Load input dataset
def get_input():
    i = pd.read_sql('SELECT SMF_RTY FROM SMF_FILE__' + dataset_name, conn)
    return i
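The naming convention described above can be wrapped in a small helper so that the query text is built in one place. This is only a minimal sketch: the function names to_table_suffix and build_record_type_query are our own illustration and are not part of dsdbc or ODL, and the dataset name is the one from our sample output.

```python
def to_table_suffix(dataset_name):
    """Turn a z/OS dataset name into the form ODL expects: periods become underscores."""
    return dataset_name.strip().replace('.', '_')


def build_record_type_query(dataset_name, mapping='SMF_FILE'):
    """Build the SQL text that selects the record type column for one SMF dataset.

    The virtual table mapping and the dataset name are joined by a double underscore.
    """
    return 'SELECT SMF_RTY FROM ' + mapping + '__' + to_table_suffix(dataset_name)


query = build_record_type_query('MVSSPT.LPAR1.SMF.D18171')
print(query)  # SELECT SMF_RTY FROM SMF_FILE__MVSSPT_LPAR1_SMF_D18171
```

The resulting string could then be passed to pd.read_sql together with the open connection, exactly as in get_input().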
Count and Sort
Next, we want to count the SMF record types and sort the results (by volume) in decreasing order. The groupby method provides an easy way to aggregate identical SMF record types. Combining the groupby method with the size method provides the total number of SMF records for each type. Also, since we want to create a new column containing the total number of SMF records for each type, we used the reset_index method with the name 'COUNT' to provide the column with a meaningful heading. Finally, to sort the DataFrame, we used the sort_values method.
#Count and sort the input dataset by record type
def sort_input():
    si = get_input().groupby('SMF_RTY').size().reset_index(name='COUNT').sort_values('COUNT', ascending=False)
    return si
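To see what each step of that method chain does without an ODL connection, here is a small stand-alone sketch. The six sample rows are made up for illustration; in our application the DataFrame comes from get_input().

```python
import pandas as pd

# Made-up sample: one row per SMF record, as the query would return them
records = pd.DataFrame({'SMF_RTY': [30, 80, 80, 42, 80, 30]})

counts = (
    records.groupby('SMF_RTY')              # one group per record type
    .size()                                 # number of rows in each group
    .reset_index(name='COUNT')              # back to a DataFrame with a COUNT column
    .sort_values('COUNT', ascending=False)  # largest producer first
)
print(counts.to_string(index=False))
```

In this sample, type 80 ends up in the first row with a count of 3, followed by type 30 with 2 and type 42 with 1.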
Analyze and Display Results
The count and sort analysis enabled us to identify the SMF record type with the highest volume; next, we had to determine the jobnames and user IDs that produced the most records of that type. After converting the top (highest) entry in the DataFrame to a string, we used conditional (if/elif) statements to select the handling for that record type. We then displayed the jobnames and user IDs for the record type. If the record type did not contain a jobname or user ID field, for now, we simply printed out the SMF record type name.
We combined the SMF record type with the highest volume, the appropriate virtual table mapping, and the input dataset name into a table name, 'm', that is specific to that record type. We then issued another SQL command to load the jobname and user ID data from that table, and counted and sorted the jobnames and user IDs the same way we analyzed the SMF record types. As you can see in the code, we didn't specify every possible SMF record type; we picked the SMF record types encountered most frequently in our environment. As an added metric, we calculated what percentage of the total record count the top jobname and the top user ID each account for.
#Identify the record type with the highest count
#def get_top_record():
t = sort_input().iloc[0]
t = str(t[0])
print("The largest SMF record type =",t,"with " + str(sort_input().iloc[0]['COUNT']) + " records")
print()
if t == '30':
    m = 'SMF_0' + t + '00_SMF' + t + 'ID__' + dataset_name
    s = pd.read_sql('SELECT SMF' + t + 'JBN, SMF' + t + 'RUD ' 'FROM ' + m, conn)
    ji = s.groupby('SMF' + t + 'JBN').size().reset_index(name='COUNT').sort_values('COUNT',ascending=False)
    print("Sorted by jobname:")
    print(ji.head().to_string(index=False))
    print()
    ui = s.groupby('SMF' + t + 'RUD').size().reset_index(name='COUNT').sort_values('COUNT',ascending=False)
    print("Sorted by user ID:")
    print(ui.head().to_string(index=False))
    print()
    percent_job = round(100*ji.iloc[0]['COUNT']/(len(get_input())-2))
    print("The number of " 'SMF' + t + 'JBN' " entries is " + str(percent_job) + " percent of the total.")
    percent_user = round(100*ui.iloc[0]['COUNT']/(len(get_input())-2))
    print("The number of " 'SMF' + t + 'RUD' " entries is " + str(percent_user) + " percent of the total.")
elif t == '80':
.
.
.
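As more record types are handled, the if/elif chain could be replaced by a lookup table keyed on the record type. The following is a minimal sketch of that alternative, not the code we actually ran: RECORD_FIELDS and build_detail_query are hypothetical names, and only the type 30 entry is taken from the code above. Entries for other record types would have to come from your own ODL virtual table definitions.

```python
# Record type -> (virtual table mapping, jobname field, user ID field).
# Only the type 30 entry comes from the code above; other record types
# would be added from your own ODL virtual table definitions.
RECORD_FIELDS = {
    '30': ('SMF_03000_SMF30ID', 'SMF30JBN', 'SMF30RUD'),
}


def build_detail_query(record_type, dataset_name):
    """Return the jobname/user ID query for a record type, or None if unmapped."""
    entry = RECORD_FIELDS.get(record_type)
    if entry is None:
        return None
    mapping, job_field, user_field = entry
    return ('SELECT ' + job_field + ', ' + user_field +
            ' FROM ' + mapping + '__' + dataset_name)


# dataset_name already has its periods replaced by underscores
print(build_detail_query('30', 'MVSSPT_LPAR1_SMF_D18171'))
print(build_detail_query('99', 'MVSSPT_LPAR1_SMF_D18171'))  # None: not mapped
```

A return value of None corresponds to the case where we simply print the SMF record type name.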
Running on z/OS
Once our development was completed with Jupyter Notebook, we wanted to run this 'program' directly from z/OS via batch job/JCL. We had to make a couple of changes to accomplish that.
Jupyter Notebook is interactive, so our code prompted for the SMF input dataset name. When running from z/OS batch we don't have that ability, so we modified the code to read the dataset name from a USS file:
#Load full input dataset name file
def get_file():
    with open('/u/userid/python/data_file.txt') as f:
        dataset_name = f.readline().rstrip()
    return dataset_name
dataset_name = get_file()
After we uploaded (ftp) our code to z/OS, we attempted to run it and got an error:
SyntaxError: Non-UTF-8 code starting with '\x8e' in file ...
We realized that the Python code was in ASCII, so we converted it to EBCDIC using the iconv command:
iconv -f ISO8859-1 -t IBM-1047 file1.ASCII > file1.EBCDIC
That got us further. Next, we had to remember that Python on z/OS requires its input files to be in ASCII (i.e. the USS files we used for the user ID and password and for the SMF dataset name). To change those files, we had to convert them to EBCDIC, edit them, and then convert them back to ASCII (or just use ISPF 3.17 option EA to edit in ASCII, which was much easier).
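For readers who want to see what such a conversion does to the bytes, the same kind of re-encoding can be sketched in Python itself. This is an illustration only, with one loud assumption: stock CPython does not ship an IBM-1047 codec, so the sketch uses cp037, a closely related EBCDIC code page that agrees with IBM-1047 on letters and digits but differs for a few punctuation characters. On z/OS we used iconv as shown above.

```python
# Assumption: cp037 stands in for IBM-1047, which stock CPython does not include.
ASCII_SIDE = 'iso8859-1'
EBCDIC_SIDE = 'cp037'


def ascii_to_ebcdic(data):
    """Re-encode bytes from ISO8859-1 to EBCDIC (cp037 stand-in)."""
    return data.decode(ASCII_SIDE).encode(EBCDIC_SIDE)


def ebcdic_to_ascii(data):
    """Re-encode bytes from EBCDIC (cp037 stand-in) back to ISO8859-1."""
    return data.decode(EBCDIC_SIDE).encode(ASCII_SIDE)


source = b'import pandas as pd\n'
converted = ascii_to_ebcdic(source)

print(converted.hex())                       # same text, entirely different byte values
print(ebcdic_to_ascii(converted) == source)  # True: the round trip is lossless
```

The text is the same on both sides; only the byte values differ, which is why a file in the wrong encoding looks like garbage to whatever reads it.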
We added a few print statements, mainly for diagnostic purposes, but that's all we needed to do!
Now the z/OS systems programmer can run this batch job against SMF data, and the application will identify the major producers of SMF records without any manual post-processing of the SMF data, i.e. executing the SMF dump program (PGM=IFASMFDP), eyeballing the output to find the SMF record type with the largest count, and then running the appropriate program to analyze that specific SMF record type. That saves hours of time!
Here is our sample output (and notice this time it was type 80 records that flooded our system!):
+----------------------------------+
| z/OS SMF record analysis program |
| Version 6.0 - June 21, 2018 |
+----------------------------------+
Input SMF dataset name = MVSSPT.LPAR1.SMF.D18171
Top 5 SMF record types:
REC_TYPE COUNT
80 6058440
30 5282013
42 2195441
14 1949155
20 1217461
The largest SMF record type = 80 with 6058440 records
Sorted by jobname:
SMF80JBN COUNT
FTPD 2041802
OMVS 593107
FTPD1 172387
USER11 165618
BUILD4 147435
Sorted by user ID:
SMF80USR COUNT
TCP 1371174
JOHNDOE 916430
MARYANN 812077
HANK 733466
OMVS 702587
The number of SMF80JBN entries is 11.0 percent of the total.
The number of SMF80USR entries is 7.0 percent of the total.
About the Authors:
Steven LaFalce is a Data Scientist in the IzODA Ecosystem for IBM Z
Nicholas Marion is the IBM Open Data Analytics for z/OS Service Team Lead for IBM Z
Neil Shah is a z/OS systems programmer for IBM Services