This week's guest blogger is Ravi Kumar. Ravi is a Senior Managing Consultant at IBM (Analytics Platform, North American Lab Services) and a Distinguished IT Specialist (Open Group certified) with more than 23 years of IT experience. He has a Master's degree in Business Administration (MBA) from the University of Nebraska, Lincoln. He has contributed to 7 other Redbooks in the areas of Database, Analytics Accelerator, and Information Management tools. His social profile can be viewed at: http://www.linkedin.com/in/ravikalyanasundaram
IBM SPSS Modeler is a powerful analytic tool that supports all phases of the data analytics process, including data preparation, model building, deployment, and model maintenance. You can use SPSS Modeler to build analytical models for statistical analysis, data mining, and machine learning. Data scientists can use the user-friendly SPSS Modeler client interface to access mainframe data as easily as data from any other platform they are accustomed to. SPSS Modeler can also take advantage of in-database transformation and in-database modeling, using IBM DB2 Analytics Accelerator for z/OS (IDAA) as the data analytics hub on z/OS.
Until recently, z Systems did not offer an efficient solution for complex mathematical processing. In the past, you may therefore have offloaded operational data (a snapshot from a prior point in time) from z Systems to a distributed platform in order to implement machine learning. Those solutions often produced obsolete and unreliable results, in addition to unwanted security exposures.
Now, with IBM DB2 Analytics Accelerator, you can enable machine learning for the OLTP applications that produce and consume z Systems data, while accelerating data transformation and analytical modeling with the power and performance of the MPP (Massively Parallel Processing) architecture of the IBM Netezza appliance. All of this happens without offloading data from z Systems to distributed environments, which also removes a potential source of data breaches.
In-transaction scoring with the predictive models created using the above approach can scale with your DB2 for z/OS transactional environment. This is accomplished through in-database scoring using the SPSS Scoring Adapter for DB2 for z/OS, which performs real-time scoring with your predictive models to quickly reveal what's interesting in your data. When the predictive model is published from SPSS, the Scoring Adapter for DB2 for z/OS uses PACK/UNPACK functions to move parameters efficiently and can create an SQL statement that calls the HUMSPSS.SCORE_COMPONENT UDF. This generated SQL statement can be embedded in your OLTP application. The other popular alternative is to generate the scoring model in the open-standard PMML (Predictive Model Markup Language) format. The score can then be combined with your business rules to make real-time decisions on your DB2 for z/OS data from within your mainframe applications. You may also use a vendor tool called Zementis, which uses the generated PMML to implement in-application scoring in CICS and Java applications accessing DB2 for z/OS.
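To make the idea of combining a PMML score with a business rule more concrete, here is a minimal Python sketch. It is not how the SPSS Scoring Adapter or Zementis works internally. The PMML fragment is hand-written for illustration (a real exported model would be much larger), and the field names (amount, risk, trusted_customer), the threshold, and the decide() rule are all hypothetical. The sketch only understands the two predicate operators used in the fragment.

```python
import xml.etree.ElementTree as ET

# Namespace used by PMML 4.2 documents.
NS = {"p": "http://www.dmg.org/PMML-4_2"}

# Hand-written, hypothetical PMML tree model: amounts above 1000 are "high" risk.
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <Header/>
  <DataDictionary numberOfFields="2">
    <DataField name="amount" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="categorical" dataType="string"/>
  </DataDictionary>
  <TreeModel functionName="classification">
    <MiningSchema>
      <MiningField name="amount"/>
      <MiningField name="risk" usageType="target"/>
    </MiningSchema>
    <Node score="low">
      <True/>
      <Node score="high">
        <SimplePredicate field="amount" operator="greaterThan" value="1000"/>
      </Node>
      <Node score="low">
        <SimplePredicate field="amount" operator="lessOrEqual" value="1000"/>
      </Node>
    </Node>
  </TreeModel>
</PMML>"""

# Only the operators that appear in the fragment above are supported here.
OPERATORS = {
    "greaterThan": lambda a, b: a > b,
    "lessOrEqual": lambda a, b: a <= b,
}


def node_matches(node, record):
    """Return True if the node's predicate holds for the record."""
    pred = node.find("p:SimplePredicate", NS)
    if pred is None:
        return node.find("p:True", NS) is not None
    compare = OPERATORS[pred.get("operator")]
    return compare(record[pred.get("field")], float(pred.get("value")))


def score(node, record):
    """Descend into the first matching child; a leaf returns its score."""
    for child in node.findall("p:Node", NS):
        if node_matches(child, record):
            return score(child, record)
    return node.get("score")


root = ET.fromstring(PMML_DOC)
tree_root = root.find("p:TreeModel/p:Node", NS)


def decide(record):
    """Hypothetical business rule layered on top of the model score."""
    risk = score(tree_root, record)
    if risk == "high" and not record["trusted_customer"]:
        return "hold for review"
    return "approve"
```

For example, decide({"amount": 2500.0, "trusted_customer": False}) holds the transaction for review, while the same amount from a trusted customer is approved. The model supplies the score and the application-side rule makes the final call.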
The above approach gives your OLTP and batch applications that access mainframe data an early machine learning capability: they can learn hidden patterns in your operational data using mathematical modeling algorithms that are readily available with IDAA (as INZA stored procedures that run entirely on the Accelerator). With IDAA V5.1, you can use five major predictive analytics algorithms: K-Means, Naive Bayes, Decision Tree, Regression Tree, and Two-Step.
Unsupervised learning algorithms such as K-Means and Two-Step use descriptive statistics to analyze the natural patterns and relationships that occur within your operational data on DB2 for z/OS. Unsupervised learning models can identify clusters of similar records and/or relationships between different fields within an accelerated DB2 for z/OS table. For example, the K-Means and Two-Step clustering algorithms (available through stored procedures such as INZA.KMEANS and INZA.TWOSTEP) can enable machine learning in areas such as market segmentation, geostatistics, market basket analysis (by association learning), and so on.
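For readers who have not seen clustering in action, the following Python sketch shows the basic K-Means idea (Lloyd's algorithm) on a handful of records. It is a conceptual illustration only and says nothing about how INZA.KMEANS is implemented or invoked on the Accelerator. The customer data (monthly spend, store visits) is invented for the example.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct records
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster as is
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, assignments


# Hypothetical customers: (monthly spend, store visits per month).
customers = [
    (20.0, 1.0), (25.0, 2.0), (22.0, 1.0),
    (200.0, 9.0), (210.0, 10.0), (190.0, 8.0),
]
centroids, segments = kmeans(customers, k=2)
```

On this toy data the first three customers end up in one segment and the last three in the other, which is the kind of grouping a market segmentation exercise looks for. No labels were supplied, so the algorithm found the segments purely from the shape of the data.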
Supervised learning uses historical (training) data to build a model, for example a decision tree, and the model is then used to predict future values. Classification can be used to identify which group or type a new record being inserted into your DB2 for z/OS table belongs to, based on the values of its key fields. Regression can be used to predict future values of a given field based on its historical values. Algorithms such as Naive Bayes, Decision Tree, and Regression Tree can be used to solve classification and regression problems. Thus, predictive models built with supervised learning algorithms (available through stored procedures such as INZA.DECTREE, INZA.REGTREE, and INZA.NAIVEBAYES) can be used to predict whether a customer will buy or leave, credit card fraud, up-selling opportunities, voters' responsiveness to different types of election campaigns, and so on.
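As a small illustration of classification, here is a Python sketch of a categorical Naive Bayes classifier that predicts whether a customer will stay or leave. It is a textbook version with Laplace smoothing, not the INZA.NAIVEBAYES implementation. The training records, the feature names (contract type, volume of support calls), and the labels are made up for the example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    vocab = defaultdict(set)             # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab


def predict(model, row):
    """Return the label with the highest log posterior for the row."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        log_score = math.log(n / total)  # class prior
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            # Laplace smoothing keeps unseen values from zeroing the product.
            log_score += math.log((count + 1) / (n + len(vocab[i])))
        scores[label] = log_score
    return max(scores, key=scores.get)


# Hypothetical history: (contract type, support call volume) -> outcome.
history = [
    ("monthly", "high"), ("monthly", "high"), ("monthly", "low"),
    ("annual", "low"), ("annual", "low"), ("annual", "high"),
]
outcomes = ["leave", "leave", "stay", "stay", "stay", "stay"]

model = train_naive_bayes(history, outcomes)
```

With this toy history, predict(model, ("monthly", "high")) returns "leave" and predict(model, ("annual", "low")) returns "stay". The labelled historical records are what make this supervised: the model learns from past outcomes and applies that to a new record.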
Summary: Neuroscientists say that pattern recognition and emotional tagging help humans make quick decisions. Algorithms are a big part of machine learning, and these algorithms can help executives make more evidence-based decisions using hot operational data on z/OS. Executives can now combine the processing power of modern machines with their own ingenuity to avoid the flawed decisions that emotional tagging sometimes causes.