by Connor Hayes, Erin Farr and Neil Shah
Act I - The Crime
A typical day as a systems programmer - instant messages, phone calls, meetings - but this time it was a little different. Our z/OS SMF datasets (i.e., SYS1.MANx) were filling up as fast as our dump process could unload them. Who or what was causing the spike? Was it a runaway batch job, a runaway TSO user ID, a runaway z/OS UNIX process, or perhaps something more nefarious? These are the things (or at least one of the things) that keep sysprogs up at night! So, now what? I looked for runaway batch jobs, runaway TSO user IDs, any clues in SYSLOG, and anomalies in our monitoring tools, but I couldn't find a smoking gun. If I had Tivoli Decision Support for z/OS, perhaps it would have helped me identify the offending user or jobname.
And by the way, this wasn't the first time this SMF flood had happened. In prior instances, to try to find the cause, I ran the RACF SMF data unload utility IRRADU00 to see if it would provide any insights. I also opened a PMR with Level 2 Service to see if they could help, which of course they did: with some homegrown programs, they were able to format the records nicely and help me identify the offenders. But this takes time, and in the meantime my SMF datasets keep filling up and the DASD holding the SMF data is filling rapidly. I need to know who is causing the spike right now (I sound like my kids)!
Act II - Calling Spark
What can help when you have more data than a human can process and the clock is ticking? You know the answer to who and what is causing the issue is INSIDE the SMF data, but it's that SMF data which is becoming intractable and growing exponentially!
What if you had the ability to analyze large amounts of data in parallel and in memory? What if that included z/OS data, like SMF, and it could be accessed using common interfaces like Java and JDBC calls? That is what Spark on z/OS provides. Think of Spark as a run-time environment for analytics and parallel processing of large amounts of data. Spark on z/OS includes an optimized data layer (MDS), which gives a Spark programmer the ability to use JDBC calls to access SMF records, VSAM datasets, IMS data, and more, just as they would any other data source on a distributed platform.
Let's see if I can apply Spark to the issue at hand.
The SMF 30 data contains common address space work, such as the job name of our offender. I can use Spark to count the number of occurrences of each jobname, and list the top 10 jobnames.
Act III - The Investigation
The optimized data layer, also called "MDS" (Mainframe Data Service), is designed to take all of the hard work out of reading and formatting z/OS-specific data like SMF. It includes mappings for a multitude of SMF records like the one I needed: SMF30. MDS provides a separate virtual table for each section inside a record and I would need to use the table for the Identification section. I was interested in two fields in particular from this section: SMF30JBN, which lists the name of the job for which that record was written, and SMF30RUD which gives the RACF user ID associated with that job.
Once Spark with MDS was installed on our z/OS system, I had all of the tools I needed to map the SMF data to a relational database format and access it with simple JDBC calls. I had to extract the jobname and user fields from each of the Identification sections and find the most commonly occurring jobs or users. But how do you actually accomplish this? To start processing data with Spark, you first have to build a Spark application and then submit it to Spark via the spark-submit command.
I chose to build the application on my local workstation using the Scala IDE for Eclipse and then FTP the completed jar to the remote system, because this allows you to take advantage of plugins that can make building a spark-submit application a lot easier. Building one of these applications requires you to use one of two specific build tools: sbt (Simple Build Tool) or Apache Maven, both of which are open source. As part of the build process, these tools produce a "fat" jar file that includes any code dependencies the project has. The IDE also formats the project's source files into the required directory structure. Ultimately, which tool you use is a matter of preference. I chose Maven as a matter of convenience: the Eclipse/Scala IDE supports a Maven plugin that formats your project for you and provides an interface that greatly simplifies adding dependencies.
Here's a list of useful references:
Scala IDE documentation
SMF30 Identification Section Mapping
Maven Install Documentation
Maven: Introduction to the Standard Directory Format
Maven: POM Reference
Maven: Guide To Installing 3rd Party Jars
IBM z/OS Platform for Apache Spark Manuals
Here's how I built my Spark Scala application.
Create a new Maven project in Eclipse:
In the menu, select File->New->Other->Maven->Maven Project, then right click on the project and select Scala->add Scala nature.
All Maven projects contain a file called pom.xml that holds the configuration information Maven needs to build your project (see Figure 1). You can think of it as a Maven-specific Makefile. You will need to identify Spark Core and Spark SQL as dependencies for your project in this pom.xml file so that Maven can pull these binaries from its central repository onto your workstation and build them into your project. You will also need to add the MDS driver, a Type 4 JDBC driver used to read z/OS data sources into your application, to your dependencies. Because the driver jar is shipped with Spark on z/OS, it has to be added manually, both as a dependency and to your local Maven repository. Remember that you will have to do a Maven Update after you make any changes to the pom.xml file.
Figure 1 - Portion of pom.xml file
Building The Application:
Once your project is set up and your pom.xml contains the necessary dependencies, it is time to start building a Scala object. This Scala object contains the code for connecting to the Spark cluster, as well as the code for performing whatever analysis you want. To build the application you can right click on the project and select "Run as" -> "Maven install". This will run the Scala compiler as well as include any dependencies listed in your pom.xml. You can then submit the compiled jar file to the Spark cluster via the spark-submit command on z/OS.
To start building this class, create a new Scala object in your src/main/scala folder.
Note: The application must be a Scala object and not a Scala class. A Scala object's main method compiles to the static entry point that spark-submit invokes, whereas a class would first have to be instantiated.
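To make the note concrete, here is a minimal sketch of the shape such an application takes. The object name SMF30JobsSkeleton and the describe helper are hypothetical, chosen only for illustration. The point is simply that a Scala object gives spark-submit a main method it can call without creating an instance first.

```scala
// Hypothetical skeleton: the names here are illustrative and are not
// taken from the original application.
object SMF30JobsSkeleton {

  // Small pure helper so the skeleton does something observable.
  def describe(args: Array[String]): String =
    s"SMF30JobsSkeleton started with ${args.length} argument(s)"

  // Entry point. Because it lives in an object, it can be invoked
  // without first constructing an instance.
  def main(args: Array[String]): Unit = {
    println(describe(args))
    // Spark session creation and the analysis code would go here.
  }
}
```

The fully qualified name of this object is what you would pass to the --class option of spark-submit.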
In general, the flow for creating a Spark application that uses MDS is going to follow the same basic pattern as building any Spark application:
1. Create a Spark Session. (Figure 2, line 75)
The SparkSession is the main entry point into Spark and is the point from which Spark SQL can make JDBC calls. Before the SparkSession can actually retrieve any data, you have to tell it what kind of data you want. You need to set the data source type to JDBC and configure it to use the MDS-specific JDBC driver to get SMF data from MDS. You will also need to supply a JDBC URL with your credentials, and specify the data set you want to access and the name of the virtual SMF table that you want to map over it. For this example I chose the SMF_03000_SMF30ID table, which is the Identification Section of the SMF30 Record.
Figure 2 - Creating and configuring a Spark session
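Since the figure is not reproduced here, the following is a rough sketch of what this step can look like using the standard Spark SQL JDBC data source. It is not the code from Figure 2. The object and helper names are hypothetical, and the JDBC URL and MDS driver class name are deliberately taken from the command line rather than hard-coded, because their exact values come from your MDS installation and are not given in the text. How the SMF data set itself is identified to MDS is likewise not shown, so the sketch leaves it out.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical sketch; only the table name comes from the article.
object SmfSessionSketch {

  // Virtual table for the SMF30 Identification section.
  val smfTable = "SMF_03000_SMF30ID"

  // "url", "driver" and "dbtable" are standard options of Spark's
  // built-in JDBC data source.
  def jdbcOptions(url: String, driverClass: String, table: String): Map[String, String] =
    Map("url" -> url, "driver" -> driverClass, "dbtable" -> table)

  // Set the source type to JDBC, apply the options, and load a DataFrame.
  def loadTable(spark: SparkSession, options: Map[String, String]): DataFrame =
    spark.read.format("jdbc").options(options).load()

  def main(args: Array[String]): Unit = {
    if (args.length < 2) {
      System.err.println("usage: SmfSessionSketch <jdbc-url> <driver-class>")
      sys.exit(1)
    }
    // The master is left unset so that spark-submit can supply it.
    val spark = SparkSession.builder().appName("SmfSessionSketch").getOrCreate()
    try {
      val ids = loadTable(spark, jdbcOptions(args(0), args(1), smfTable))
      ids.printSchema() // list the columns the virtual table exposes
    } finally {
      spark.stop()
    }
  }
}
```

Keeping the option map in its own function makes it easy to check the configuration without a live connection to MDS.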
2. Load the data into a DataFrame. (Figure 3, line 45)
Spark SQL includes the DataFrame class, which lets you format the data into tables where it can be accessed and manipulated with SQL queries. You can create a DataFrame for any SMF data set of a single record type. With an instantiated DataFrame, the programmer can then perform any kind of analysis on any of its fields.
3. Perform your Analysis (Figure 3, lines 47-49, 52-54)
DataFrames contain a select method that returns only the fields you need, in our case SMF30JBN and SMF30RUD. They also contain methods for performing joins, groupBys, orderBys, and just about every other SQL operation. I wanted to find the runaway jobs causing this massive spike and also the users responsible for them. So my first goal was to select the names of the most frequently occurring jobs and display them, which is shown in lines 47-49 in Figure 3. Then I added code to display the top user IDs (lines 52-54).
Figure 3 - Load data and perform analysis
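As a stand-in for the missing figure, here is a small sketch of the count-and-rank analysis described above. It is not the original code. To keep it runnable without MDS, it builds a tiny in-memory DataFrame with made-up job names and user IDs in place of the real Identification section, and it sets a local master only for that reason. The object name and the topN helper are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.desc

// Hypothetical sketch of the "top 10" analysis on stand-in data.
object TopOffendersSketch {

  // Count rows per distinct value of `column`, then keep the n most frequent.
  def topN(df: DataFrame, column: String, n: Int): DataFrame =
    df.groupBy(column).count().orderBy(desc("count")).limit(n)

  def main(args: Array[String]): Unit = {
    // Local master is used only so the sketch runs standalone.
    val spark = SparkSession.builder()
      .appName("TopOffendersSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up rows standing in for the SMF30 Identification section.
    val ids = Seq(
      ("JOBA1", "USER1"),
      ("JOBA1", "USER1"),
      ("JOBA2", "USER1"),
      ("JOBB1", "USER2")
    ).toDF("SMF30JBN", "SMF30RUD")

    // Top jobnames, then top user IDs.
    topN(ids.select("SMF30JBN"), "SMF30JBN", 10).show()
    topN(ids.select("SMF30RUD"), "SMF30RUD", 10).show()

    spark.stop()
  }
}
```

With real data, the DataFrame loaded from MDS would simply replace the in-memory one, and the same helper would serve both the jobname and the user ID rankings.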
Act IV - The Verdict
I created the Spark application and ran it against the SMF type 30 records using the spark-submit command, as follows:
spark-submit --class "com.ibm.zos.smf.SparkSMF.SMF30Jobs" --master
local --jars /u/myuser/dv-jdbc-3.1.201609290024.jar --driver-
While I specified the --master option as "local", you can also use this to point to the master node of your Spark cluster.
Here is the output from our Spark application - listing the top 10 jobnames:
Figure 4 - Top 10 jobnames
And then we ran the application specifying a different class to list the top 10 user IDs:
Figure 5 - Top 10 user IDs
From this view it is pretty easy to tell which user caused the spike. The count (or number of occurrences) for the top user ID is greater than the second highest by over seventeen million. More incriminating is that the number of occurrences (19,605,051) just so happens to equal the sum of all occurrences of the top 10 jobs. To circle back to the original problem, all of these 'jobs' came from one user ID, and since the suffix of the jobnames is a random number, these were z/OS UNIX processes, so apparently the user's script went into some kind of loop.
So now you know how to use Spark not only to analyze SMF data, but also to solve a real business problem quickly by chasing down a user causing an issue.
If you'd like to give Spark a try, you can download it by ordering IBM z/OS Platform for Apache Spark V1.1.0 (Program Number 5655-AAB) from ShopzSeries. It contains both Apache Spark and the optimized data layer (MDS) for access to z data.
Note: APAR PI70302 delivers Spark V2.0.2, which was used for this application.
About the authors:
Connor Hayes is a co-op on the development team for the IBM z/OS Platform for Apache Spark.
Erin Farr is the development Team Lead for IBM z/OS Platform for Apache Spark.
Neil Shah is a z/OS Systems Programmer for IBM Global Technology Services.
Special thanks to Dale Voon from Rocket Software for his expert review.