I am especially pleased today to introduce two guest authors: Changhai Ke and Thomas Baudel. Changhai was developer #1 on the JRules rule engine -- there is very little about rules, inference and decision management he doesn't know! Thomas is an expert in visualization technology, and has recently taken on IP review/patent responsibilities.
I hope you enjoy this deep-dive on how we integrated JRules and SPSS in the recently released Analytics SupportPac.
Rules and predictive analytics are two complementary forms of knowledge. In a decision support system, or more widely in decision management, these two techniques may collaborate to produce better decisions.
Rules express deliberate heuristics for making decisions. They are most often written by human experts or business people, and are rooted in human experience, policies, regulations, or other forms of guidance. Below is an example of a business rule that accepts or rejects a loan:
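The rule listing from the original post is not reproduced here. As a rough stand-in, the sketch below expresses the same kind of accept/reject policy in plain Java; in JRules it would be written in the business rule language instead. The class names, fields and thresholds are assumptions made purely for illustration.

```java
// Hypothetical sketch: a hand-written loan policy expressed as plain Java.
// Names and thresholds are illustrative, not taken from JRules or the SupportPac.
class LoanRuleSketch {

    static class Borrower {
        final int age;
        final double debtToIncomeRatio;

        Borrower(int age, double debtToIncomeRatio) {
            this.age = age;
            this.debtToIncomeRatio = debtToIncomeRatio;
        }
    }

    static class Loan {
        boolean approved = true;   // accepted unless a rule rejects it
        String reason = "";

        void reject(String why) {
            approved = false;
            reason = why;
        }
    }

    // "If the borrower is a minor or too indebted, then reject the loan."
    static void applyPolicy(Borrower borrower, Loan loan) {
        if (borrower.age < 18) {
            loan.reject("Borrower is under 18");
        } else if (borrower.debtToIncomeRatio > 0.4) {
            loan.reject("Debt-to-income ratio above 40%");
        }
    }

    public static void main(String[] args) {
        Loan loan = new Loan();
        applyPolicy(new Borrower(35, 0.55), loan);
        System.out.println(loan.approved + " " + loan.reason);
    }
}
```

The point of the sketch is that every condition is a deliberate choice made by a person, which is what distinguishes a rule from a learned model.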
Predictive analytics comes from data mining. Previous transactions may have produced a significant quantity of historical data. Data mining techniques analyze this data and build a model, which is then used to predict future outcomes, hence the names “predictive analytics”, “predictive models”, and “predictive modeling”.
For example, still in the area of credit loans, it is useful to build a model that predicts whether a borrower may default on a loan. This is often expressed as a likelihood: how likely is this person to default, taking into account the historical data?
Saying “taking into account the historical data” is not quite accurate. At run time, only the predictive model is executed to predict an outcome (here, the likelihood of default). The historical data was used at design time to build the model. The model itself consists only of algorithms (mathematical formulas and the like), which are executed to produce a result.
For example, a model to predict a default may use easily quantifiable customer data:
- The age of the person
- The yearly income
- The education level
- The number of years in the current job
- The number of years at the current address / residency
- The current level of debt (debt to income ratio)
Along with this data, we also need to know whether each person has defaulted in the past x years, so that we can build a model. In this particular case, a Bayesian network method is often used. The Bayesian method relates the input factors listed above to the result (the defaulter value) in a statistically robust and executable form.
Once the model is built, one can execute it by providing the input data. Some inputs can even be marked as optional.
It is essential that these two decision techniques collaborate, so that one unified decision system can produce the best result, combining human knowledge and experience with statistical insight. We have seen above that rules are closer to human reasoning; they are often the façade of a decision system and drive the use of predictive models.
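To make the design-time versus run-time distinction concrete, here is a minimal sketch of what executing a built model can look like: a formula applied to the inputs, with no historical data in sight. A simple logistic formula stands in for the Bayesian network, and every coefficient and default value is invented for illustration.

```java
import java.util.OptionalDouble;

// Hypothetical sketch: at run time a "model" is just a formula over its inputs.
// A logistic formula stands in for the Bayesian network; all coefficients are invented.
class DefaultModelSketch {

    // Values that a design-time training step would have produced (made up here).
    static final double INTERCEPT = -1.0;
    static final double W_DEBT_RATIO = 4.0;
    static final double W_YEARS_IN_JOB = -0.2;

    // Fallback used when the optional input is not supplied (an assumption).
    static final double DEFAULT_YEARS_IN_JOB = 5.0;

    // Returns the estimated likelihood of default, between 0 and 1.
    static double likelihoodOfDefault(double debtToIncomeRatio, OptionalDouble yearsInJob) {
        double years = yearsInJob.orElse(DEFAULT_YEARS_IN_JOB);
        double z = INTERCEPT + W_DEBT_RATIO * debtToIncomeRatio + W_YEARS_IN_JOB * years;
        return 1.0 / (1.0 + Math.exp(-z));
    }

    public static void main(String[] args) {
        // Required input only, then both inputs.
        System.out.println(likelihoodOfDefault(0.5, OptionalDouble.empty()));
        System.out.println(likelihoodOfDefault(0.5, OptionalDouble.of(2)));
    }
}
```

The optional input is modelled with OptionalDouble so that the caller has to state explicitly when a value is missing; a real model would define its own handling of missing inputs.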
The JRules Analytics SupportPac, released in early 2011, provides an integration between the JRules rule engine and SPSS. It binds a predictive model deployed in IBM SPSS so that the model can be invoked from JRules. JRules Rule Studio first asks for a description of the predictive model, in terms of its input and output data, and then builds a XOM class that maps to that model. In JRules, a XOM class is an executable class, which can be a concrete Java class or a virtual class that bridges to another data model.
Here the XOM class is the interface to SPSS.
Typically, the rule engine executes Java XOM classes: the Java classes are reflected, instances are built, and the engine modifies them using getter and setter methods. However, there may be other sources of data, such as databases, XML, or web services. These sources are not Java classes, yet we still want the rule engine to process instances of these models. The XOM has therefore been “virtualized”: a XOM class is no longer necessarily backed by a Java class. It can be virtual, as long as it specifies how its getters and setters are implemented, how its instances are identified, and how its methods are executed. This virtual XOM has been used as the basis for the ILOG XML binding technology, and is now also used for SPSS model binding, as implemented in the Analytics SupportPac.
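The idea of a virtual class can be illustrated with a small sketch: the engine-side code depends only on named attribute access, and the object behind it need not be a Java class with matching fields. The interface and class names below are hypothetical and are not the JRules XOM API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "virtual class" idea; this is not the JRules XOM API.
class VirtualXomSketch {

    // What an engine needs from any object it reasons on: named attribute access.
    interface XomObject {
        Object get(String attribute);
        void set(String attribute, Object value);
    }

    // Virtual class: no Java class with matching fields exists. Values live in a map
    // that could equally be filled from an XML element, a database row or a service.
    static class MapBackedObject implements XomObject {
        private final Map<String, Object> values = new HashMap<>();

        @Override
        public Object get(String attribute) {
            return values.get(attribute);
        }

        @Override
        public void set(String attribute, Object value) {
            values.put(attribute, value);
        }
    }

    // Engine-side code only sees the XomObject interface.
    static boolean isAdult(XomObject person) {
        return ((Integer) person.get("age")) >= 18;
    }

    public static void main(String[] args) {
        XomObject person = new MapBackedObject();
        person.set("age", 42);
        System.out.println(isAdult(person));
    }
}
```

A reflection-based implementation of the same interface would cover the ordinary Java case, which is why the engine does not need to know which kind of object it is handling.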
As an illustration, below is the XOM class that results from the binding with an SPSS model that predicts whether a borrower will default on a credit loan.
After you build an instance of this class and fill in the input fields, you call getScore() to invoke the SPSS model. Then you can read the output fields, defaulter and MarkovProb. Note that MarkovProb is a probability between 0 and 1 that expresses the confidence in the produced result.
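The generated class itself is not reproduced here, so the following is only a sketch of the usage pattern just described: set the inputs, call getScore(), read the outputs. Only getScore(), defaulter and MarkovProb come from the text; the input field names, the Score type and the ScoringBackend interface (a stand-in for the call to SPSS) are assumptions.

```java
// Hypothetical sketch of the shape of the class described above.
// Only getScore(), defaulter and MarkovProb come from the text; the rest is assumed.
class SpssXomSketch {

    // Result of one scoring call.
    static class Score {
        final boolean defaulter;
        final double probability;

        Score(boolean defaulter, double probability) {
            this.defaulter = defaulter;
            this.probability = probability;
        }
    }

    // Stand-in for the call to the deployed SPSS model (not a real SPSS API).
    interface ScoringBackend {
        Score score(int age, double income, double debtToIncomeRatio);
    }

    static class DefaulterModel {
        // Input fields, filled in before scoring.
        int age;
        double income;
        double debtToIncomeRatio;

        // Output fields, meaningful only after getScore() has run.
        boolean defaulter;
        double MarkovProb;

        private final ScoringBackend backend;

        DefaulterModel(ScoringBackend backend) {
            this.backend = backend;
        }

        // Sends the inputs to the backend and copies the outputs onto this instance.
        void getScore() {
            Score s = backend.score(age, income, debtToIncomeRatio);
            defaulter = s.defaulter;
            MarkovProb = s.probability;
        }
    }

    public static void main(String[] args) {
        // Fake backend with an invented cut-off, used only to exercise the class.
        DefaulterModel model = new DefaulterModel(
            (age, income, ratio) -> new Score(ratio > 0.4, 0.8));
        model.age = 41;
        model.income = 60000;
        model.debtToIncomeRatio = 0.55;
        model.getScore();
        System.out.println(model.defaulter + " " + model.MarkovProb);
    }
}
```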
After this XOM class is produced, you can use it to implement any virtual BOM method. For example, we add a “predictDefaulter()” method to the class Borrower, and we implement this method using the XOM class produced by the SPSS binding capability:
The BOM method can then be verbalized and used in the business rule language by business users. For example, we can improve our previous rule by invoking this method. To express a stricter or better credit policy, we can invoke the model to predict the likelihood that the borrower will default, and then accept or reject the credit loan based on the result.
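In JRules the body of a virtual BOM method is written in the rule tooling rather than in a Java source file, and the original listing is not reproduced here. The plain-Java sketch below only illustrates the delegation: the business-facing method fills in the model inputs, triggers the scoring, and returns the prediction. DefaulterModel is a stub with an invented cut-off, and the Borrower fields are assumptions.

```java
// Hypothetical sketch of a business-facing method delegating to a model class.
class BomMethodSketch {

    // Minimal stand-in for the class generated by the SPSS binding (assumed shape).
    static class DefaulterModel {
        double income;
        double debtToIncomeRatio;
        boolean defaulter;
        double MarkovProb;

        // A real binding would call the deployed model; this stub uses invented values.
        void getScore() {
            defaulter = debtToIncomeRatio > 0.4;
            MarkovProb = 0.75; // placeholder confidence
        }
    }

    static class Borrower {
        final double yearlyIncome;
        final double debt;

        Borrower(double yearlyIncome, double debt) {
            this.yearlyIncome = yearlyIncome;
            this.debt = debt;
        }

        // Hides the model behind a call that rule authors can read as plain business language.
        boolean predictDefaulter() {
            DefaulterModel model = new DefaulterModel();
            model.income = yearlyIncome;
            model.debtToIncomeRatio = debt / yearlyIncome;
            model.getScore();
            return model.defaulter;
        }
    }

    public static void main(String[] args) {
        System.out.println(new Borrower(50000, 30000).predictDefaulter());
    }
}
```

Rule authors never see the model class; they only see a question they can ask about a borrower.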
We can modify our initial rule:
This rule is stricter, since only a person who is not predicted to be a defaulter is accepted. We can implement a finer policy by using the probability produced by the model.
It is very useful to let rules determine the context in which predictive models are to be invoked. In turn, the results of predictive modeling help refine the rules, making them more accurate. This is one of the foundational pillars of decision management.
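One possible reading of a finer policy is sketched below in plain Java: the prediction is trusted only when the model's confidence is high enough, and other cases are set aside. The 0.7 threshold and the manual-review outcome are assumptions for illustration, not part of the original rule.

```java
// Hypothetical sketch: combining the predicted flag with the model's confidence.
class FinerPolicySketch {

    enum Decision { ACCEPT, REJECT, MANUAL_REVIEW }

    // Minimum confidence required to act on the prediction (invented value).
    static final double MIN_CONFIDENCE = 0.7;

    // Strict version: accept only borrowers not predicted to default.
    static Decision strictRule(boolean predictedDefaulter) {
        return predictedDefaulter ? Decision.REJECT : Decision.ACCEPT;
    }

    // Finer version: fall back to a human when the model is unsure.
    static Decision finerRule(boolean predictedDefaulter, double confidence) {
        if (confidence < MIN_CONFIDENCE) {
            return Decision.MANUAL_REVIEW;
        }
        return strictRule(predictedDefaulter);
    }

    public static void main(String[] args) {
        System.out.println(finerRule(true, 0.9));
        System.out.println(finerRule(true, 0.55));
    }
}
```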
The approach presented in this article aims at embedding predictive models in rules. Here, the predictive models are encapsulated in JRules BOM classes, and made accessible to rule authors. The embedding is transparent to rule authors and pertinent in many cases. Other couplings will also make sense, depending on the application architecture. In the example described, we employ two decision technologies, and integrate them to produce a coherent business decision.