Think big! Scale your business rules solutions up to the world of big data

Build an app that uses Business Rules and BigInsights for Apache Hadoop services on IBM Bluemix

To learn how big data analysis can be integrated into business rules apps, build an example airline passenger app that uses Business Rules and BigInsights for Apache™ Hadoop® services on IBM® Bluemix™.

Traditional business rule applications process records of a few megabytes of data at a time. Records are usually processed as client server requests or in a batch, one record at a time. As solutions move to the cloud, and applications apply rules to terabytes of data, these traditional approaches cannot keep up. To scale business rules solutions up to the world of big data, consider using the Business Rules and the BigInsights for Apache Hadoop services in IBM Bluemix.

This tutorial describes a generic application called RulesAdaptor that uses services in IBM Bluemix to integrate the business rules of IBM Operational Decision Manager (ODM) and the big data capabilities of Apache™ Hadoop®. This application opens up the possibility for data scientists to analyze big data with business rules.

Data scientists analyze structured and unstructured data to provide meaningful answers to business questions. Using business rules to analyze data has the following benefits:

  • There is no need to rely on developers to create technical scripts in code, such as Jaql, Pig, and Java.
  • Changes to business rules are governed and controlled.
  • Business rules can be easily understood by both business and technical users.

The developerWorks tutorial Integrating Hadoop and IBM Operational Decision Management V8 describes how to integrate the classic rule engine of IBM ODM with Hadoop to create a high-performance rule engine for big data. With the RulesAdaptor application described in this tutorial, you can explore extending that solution to run on Bluemix, where the application invokes the IBM Business Rules service through a REST API.

The following graphic illustrates the architecture of the RulesAdaptor application, and how it works with the business rules and data at run time:

Architecture of the RulesAdaptor application

The following descriptions correlate to the numbers in the graphic:

  1. Defining the ruleset signature to correspond to the data:
    The RulesAdaptor application operates on comma-separated value (CSV) files, which are converted to a hash map and ingested by the rule engine. In the RulesAdaptor application you specify the CSV column names, and in Rule Designer you create a ruleset signature that correlates to those column names (a sketch of this conversion follows the list). The ruleset signature must be based on one or more of the following Java types:
    • Integer
    • Double
    • Float
    • String
    • Boolean
    • Date
  2. Building rules, based on the ruleset signature created in step 1.
  3. Deploying the decision service to Bluemix:
    The rules are deployed to the Business Rules service on Bluemix.
  4. Uploading the data to BigInsights for Apache Hadoop to be processed:
    The input data is uploaded to the input folder. The data must be in CSV format.
  5. Configuring and running the RulesAdaptor application:
    The RulesAdaptor application is configured to point to the decision service. The CSV input column names must correspond to the ruleset signature defined in step 1. Setting the Bluemix mode to true makes Hadoop run the business rules through the Rule Execution Server on Bluemix. Setting the Bluemix mode to false makes each map job run the rules in a dedicated rule engine. A dedicated engine offers higher performance but requires additional licensing.
  6. Viewing the results that are stored in the output folder:
    After completion, the output folder contains a single CSV file that contains the results.
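
To make step 1 concrete, the following minimal sketch shows how one CSV row might be converted into the typed hash map that the rule engine ingests. It is an illustration only, not the actual RulesAdaptor source: the class name, the column-to-type pairing, and the conversion helper are all assumptions.

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: converts one CSV row into the kind of
// hash map that the rule engine ingests.
public class CsvRowMapper {

    // Dates in the CSV are expected in yyyy-mm-dd format, such as 2015-12-31.
    private static final SimpleDateFormat DATE_FORMAT =
            new SimpleDateFormat("yyyy-MM-dd");

    // Maps a CSV line onto the configured column names. columnTypes pairs
    // each column with one of the supported types: Integer, Double, Float,
    // String, Boolean, or Date.
    public static Map<String, Object> toParameterMap(
            String csvLine, String[] columns, Class<?>[] columnTypes)
            throws ParseException {
        String[] values = csvLine.split(",");
        Map<String, Object> params = new HashMap<String, Object>();
        for (int i = 0; i < columns.length; i++) {
            params.put(columns[i], convert(values[i].trim(), columnTypes[i]));
        }
        return params;
    }

    // Converts one raw CSV field into the matching supported type.
    private static Object convert(String raw, Class<?> type)
            throws ParseException {
        if (type == Integer.class) return Integer.valueOf(raw);
        if (type == Double.class) return Double.valueOf(raw);
        if (type == Float.class) return Float.valueOf(raw);
        if (type == Boolean.class) return Boolean.valueOf(raw);
        if (type == java.util.Date.class) return DATE_FORMAT.parse(raw);
        return raw; // String
    }
}

For the PNR example later in this tutorial, columns would hold the CSV column names (passportNumber, customerName, dateOfBirth, and so on) and columnTypes would pair each of them with String or Date as appropriate.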

This tutorial shows you how to deploy an example rule application to the Business Rules service and run it from the BigInsights for Apache Hadoop service.

The tutorial example creates an airline passenger profiler that vets passengers before they fly. The profiler applies business rules against Passenger Name Records (PNRs). A PNR is created when an airline ticket is purchased, as shown in the following simplified example:

Booking Ref | passportNumber | dateOfBirth | customerName | flightDate | route     | flightNumber
8972897498  | P631342929     | 1991-02-23  | Celia Beck   | 2014-12-31 | BOCLAXLHR | BA944

The ruleset signature corresponding to the PNR is shown in the following screen capture. Note that the order of the parameters is unimportant, but the parameter names and types are critical and must match the data. Dates should be in yyyy-mm-dd format, such as 2015-12-31. The ruleset signature does not need to cover every column in the CSV file; it can use a subset of the columns.

The ruleset signature corresponding to the PNR

The Tutorial 1 example project applies two stateless profiling rules.

  1. The first stateless rule checks whether the passenger is flying from Bocas International Airport to Los Angeles (LAX) and is aged between 20 and 30, as shown in the following screen capture:
    Rule that checks whether the passenger is flying from Bocas International Airport and is aged between 20 and 30
  2. The second rule checks whether the passport is blocked. Blocked passports are stored in a simple list within the rule, as shown in the following screen capture:
    Rule that checks whether the passport is on a watch list

The Tutorial 2 example project uses Apache HBase™ to create stateful rules. In the following screen capture of a decision table, you can see that row 1 checks that the passenger is aged between 16 and 60, has been away for at least one month, has a profile score of at least 5, and has flown from Los Angeles (LAX) to Mexico City (MEX) and back again.

Screen capture of example decision table

In this scenario, the passenger is assigned Response Code D254, which is sent to the airport authorities at LAX. The LAX authorities then stop and search the passenger.

What you need for your application

To get the sample code from GitHub, complete the following steps:

  1. Click the Clone or download button and download all files as a .zip file.
  2. Open the .zip file and extract the quickstart directory to your local machine.

Tutorial example project 1: Run Hadoop jobs using the Bluemix Rule Execution Server

This first tutorial project shows how to integrate IBM ODM and Hadoop using the Rule Execution Server REST API. The advantages of this approach are that the integration is simple, there is no need to include the rule engine libraries within the Hadoop job, and licensing is simpler because it is managed by the Rule Execution Server. The disadvantage is that the rules run on a different server (the Rule Execution Server), so performance is slower.
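
As a sketch of how a map job might call the deployed ruleset over REST, the following snippet assumes the ruleset is exposed as a hosted transparent decision service (HTDS) under the usual /DecisionService/rest/v1/... path. The class name is hypothetical, the error handling is simplified, and the exact JSON payload depends on your ruleset signature.

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

// Illustrative sketch only: posts a JSON request to a ruleset hosted
// on the Bluemix Rule Execution Server.
public class RestRuleInvoker {

    public static String invoke(String host, String basicAuthPassword,
                                String jsonRequest) throws Exception {
        // Ruleset path from this tutorial: /validatePnrApp/1.0/validatePnr/1.0
        URL url = new URL("https://" + host
                + "/DecisionService/rest/v1/validatePnrApp/1.0/validatePnr/1.0");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        // The long (Basic Auth) password is already a complete header value,
        // for example "Basic cmVzQWRtaW46...".
        conn.setRequestProperty("Authorization", basicAuthPassword);
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(jsonRequest.getBytes(StandardCharsets.UTF_8));
        }
        // Read the whole response body.
        try (Scanner in = new Scanner(conn.getInputStream(), "UTF-8")) {
            return in.useDelimiter("\\A").next();
        }
    }
}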

The following diagram shows the execution architecture:

Diagram of rule execution architecture

Step 1. Create Bluemix services

  1. Log in to your IBM Bluemix account, go to the catalog, and add the Business Rules and BigInsights for Apache Hadoop services.
  2. Open the Business Rules service, click the Connection Settings tab, and take note of the user name and the short and long (Basic Auth) passwords, as shown in the following example. You can click Show Password to see the text.
    Screen capture of the Business Rules Connection Settings tab

Step 2. Log in to the Rule Execution Server console

  1. Click the Open Console link and log in using the user name and short password that you noted in the previous step.
  2. Go to the Explorer tab and select Deploy RuleApp Archive, as shown in the following example:
    Screen capture of Rule Execution Server Explorer tab
  3. In the Deploy RuleApp Archive window, click the Choose file button and select the validatePnrApp.jar file that was downloaded with the sample code. Leave the default versioning policy and click Deploy.
  4. Make a note of the deployed RuleApp version. The first time you deploy, it should be /validatePnrApp/1.0/validatePnr/1.0.

Step 3. Log in to BigInsights for Apache Hadoop

  1. Go back to the Bluemix dashboard and select the BigInsights for Apache Hadoop service.
  2. Click Manage Cluster and then select the cluster name to see the details of your Hadoop cluster, as shown in the following example:
    Screen capture of the Manage Cluster window for the Apache Hadoop service
  3. Make a note of your SSH Host and then log in from any UNIX shell using the following command, where [user] is your user name and [SSH Host] is the host name of your cluster:
    ssh [user]@[SSH Host]

    Following the same example as the screen capture, the command is
    ssh neddy@bi-hadoop-prod-4162.bi.services.us-south.bluemix.net

  4. When prompted, enter the password that you defined for your Hadoop cluster.

Step 4. Import the quickstart folder

From your local UNIX shell, in the extracted quickstart directory, upload its contents to your home directory on Bluemix:
scp *.* [user]@[SSH Host]:/home/[user]

Step 5. Create HDFS directories

  1. Within your home directory on Bluemix, enter the following command to create the HDFS input directory:
    hdfs dfs -mkdir input
  2. Copy the .csv files into the input directory:
    hdfs dfs -put *.csv input

Step 6. Configure the RulesAdaptor application

Edit run.sh and set the following parameters to configure the RulesAdaptor application. (For this Tutorial 1 example project, you only need to change the parameters indicated with [square brackets].)

  • Input directory: input
  • Output directory: output
  • Columns: passportNumber,customerName,dateOfBirth,flightNumber,flightDate,route
  • Ruleset version: /validatePnrApp/1.0/validatePnr/1.0
  • Rule Execution Server host: [Host name of your Bluemix Rule Execution Server]
  • Rule Execution Server user: resAdmin
  • Rule Execution Server password: [Business Rules Password]
  • Rule Engine password: "[IBM Business Rules Basic Auth password]"
  • Bluemix mode: true
  • HTTPS: true

See Step 1 of this tutorial to determine the Rule Execution Server host, the Rule Execution Server password, and the Rule Engine password. The Rule Execution Server host is the host name on its own, without the port number or any other part of the Rule Execution Server URL. The Rule Engine password must be enclosed in quotes.

Step 7. Run the RulesAdaptor application

Run the OdmRuleAdaptor: ./run.sh.

The Hadoop map/reduce job is started, and multiple map jobs are created.

Because you are running in Bluemix mode, each map job accesses the same Rule Execution Server decision service. The service is multi-threaded and capable of servicing requests in parallel.

If the example runs successfully, you should see the map/reduce job detect three profiled passengers:

Passenger: Perry T. Adkins fits profile on flight TJ164 route BOCLAX flying on 2014-12-23
Passenger: Jacqueline Russo on watch list.  Flight BZ795 flying at 2014-11-29 passport F607631362
Passenger: Harris Tweed on watch list.  Flight FF594 flying at 2015-08-20 passport D574829235

If the job failed, check the credentials that you entered. If they seem correct, check the YARN logs. See the troubleshooting section at the end of this tutorial for more details.

Tutorial example project 2: Run rules in embedded mode

The second tutorial example project extends the first by running the rules in embedded mode. At initialization time, the rules are extracted from the Rule Execution Server; at execution time, they are run in an embedded rule engine within each map job.

The benefit of this approach is that the rules are run at the speed of a native Java Hadoop job. Additionally, the rules can access the features of the Hadoop stack, including Apache HBase. In this tutorial, we use HBase as a high-performance database to store passenger information.
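
The following minimal sketch shows the style of HBase access involved, using the standard HBase client API against the passengers table (column family cf) that you create in Step 2 below. The class name, row-key choice, and lastRoute column qualifier are assumptions for illustration; the real XOM inside odmhadoop.jar may store its state differently.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Illustrative sketch only: reads and writes passenger state in HBase.
public class PassengerStateStore {

    public static void recordFlight(String passportNumber, String route)
            throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("passengers"))) {

            // Key rows by passport number, as in the tutorial output.
            Put put = new Put(Bytes.toBytes(passportNumber));
            put.addColumn(Bytes.toBytes("cf"),
                    Bytes.toBytes("lastRoute"),  // assumed column qualifier
                    Bytes.toBytes(route));
            table.put(put);

            // Read the state back, for example when the next PNR arrives.
            Get get = new Get(Bytes.toBytes(passportNumber));
            Result result = table.get(get);
            byte[] lastRoute = result.getValue(
                    Bytes.toBytes("cf"), Bytes.toBytes("lastRoute"));
            System.out.println("Last route: " + Bytes.toString(lastRoute));
        }
    }
}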

Diagram of rule execution architecture of running rules in embedded mode

Step 1. Log in to the Rule Execution Server console

Follow step 2 in the procedure you used for the tutorial 1 example project, but this time deploy validatePnrAppHbase.jar and select Replace Ruleset version so that the version remains 1.0.

Step 2. Create the database table

  1. Start the HBase shell from UNIX by typing hbase shell.
  2. Enter the following commands:
    create 'passengers', 'cf'
    exit

Step 3. Edit runhbase.sh

  1. Edit runhbase.sh.
  2. Change the HBASE_HOME directory to point to the HBase configuration on your server. For example: HBASE_HOME=/usr/iop/4.2.0.0/hbase
  3. Ensure that the parameters are set as shown in the following list. (As in Tutorial 1, you only need to change the parameters indicated with [square brackets].)
    • Input directory: input
    • Output directory: output
    • Columns: passportNumber,customerName,dateOfBirth,flightNumber,flightDate,route
    • Ruleset version: /validatePnrApp/1.0/validatePnr/1.0
    • Host: [Host name of your Bluemix Rule Execution Server]
    • Rule Execution Server user: resAdmin
    • Rule Execution Server password: [Business Rules Password]
    • Rule Engine password: [IBM Business Rules Basic Auth password]
    • Bluemix mode: false
    • HTTPS: true
    Note that the Bluemix Mode is set to false to enable local execution of the rules.

Step 4. Run the RulesAdaptor application

Run the script: ./runhbase.sh.

The Hadoop map/reduce job is started, and multiple map jobs are created. Because you run this example in embedded mode, each map job accesses an embedded decision engine. The Java Execution Object Model (XOM) within odmhadoop.jar accesses HBase to store the passenger state.

If all goes well, you see two profiled passengers:

T023: Benjamin Wellcorn fits profile on flight BA343 route DAMLHR flying 2017-04-04
D345: Nigel T. Crowther fits profile on flight BA123 route AMSLHR flying 2017-03-04

If the job failed, see the troubleshooting section at the end of this tutorial for more details.

You can use the HBase shell to view profiled passengers by their passport number. Enter the following commands from the Unix command line:

hbase shell
scan 'passengers'

To clear the database, run the following commands from the HBase shell:

disable 'passengers'
drop 'passengers'

Troubleshooting IBM ODM Applications on Hadoop

  • Make sure that you are using the correct Rule Execution Server credentials. There are two Rule Execution Server passwords: a long one and a short one. Ensure that the long one (the Basic Auth password) is surrounded by quotes, like this: "Basic cmVzQWRtaW46MWh2MGZ0bG0wajl5cw=="
  • Ensure that the host name you are providing in the run script does not include any part of the Rule Execution Server URL. For example, if the Rule Execution Server URL is https://brsv2-7a461af4.ng.bluemix.net/res, the host is brsv2-7a461af4.ng.bluemix.net.
  • Use the Ambari logs to view problems. Click Launch Console. Then in the Bluemix Ambari console, navigate to YARN > Resource Manager UI. Navigate down to your map jobs to view the output.
  • Test your decision service using a simple web service call or DVS tests before running on Hadoop. This approach helps you work out simple rule problems before moving into Hadoop, where problems are harder to diagnose. (A hypothetical smoke test follows this list.)
  • When testing your application on Hadoop, use small datasets.
  • Develop on UNIX, not Windows. Windows can cause file incompatibility issues, such as Windows line endings in shell scripts.
  • Use Logger output from the rules and view it in the Hadoop logs.
  • Use the HBase shell to view the state of your persisted data.
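
As a hypothetical example of such a standalone test, the following snippet reuses the RestRuleInvoker sketch from the Tutorial 1 section. The JSON payload shape and the placeholder host are assumptions; check the JSON contract that your deployed decision service publishes.

// Hypothetical smoke test for the decision service, run outside Hadoop.
public class DecisionServiceSmokeTest {
    public static void main(String[] args) throws Exception {
        // Assumed payload shape: one JSON field per ruleset parameter,
        // using the PNR example from this tutorial.
        String json = "{"
                + "\"passportNumber\": \"P631342929\","
                + "\"customerName\": \"Celia Beck\","
                + "\"dateOfBirth\": \"1991-02-23\","
                + "\"flightNumber\": \"BA944\","
                + "\"flightDate\": \"2014-12-31\","
                + "\"route\": \"BOCLAXLHR\""
                + "}";
        System.out.println(RestRuleInvoker.invoke(
                "brsv2-xxxxxxxx.ng.bluemix.net",  // your Business Rules host
                "Basic ...",                      // your Basic Auth password
                json));
    }
}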

Conclusion

This tutorial presented a solution for integrating Business Rules and BigInsights for Apache Hadoop services on IBM Bluemix. It demonstrated a BigInsights for Apache Hadoop application called RulesAdaptor that combines these technologies. The first example project walked through using IBM Bluemix in the cloud to run the RulesAdaptor application against passenger name records, using the Rule Execution Server REST API. In the second example project, you ran rules in embedded mode and used Apache HBase to store passenger data.

Now that you have successfully run your first sample IBM ODM and Apache Hadoop programs, you can start developing your own. Start by importing the sample code into Rule Designer, then build and deploy the rules to get a feel for how it all fits together. Try the Business Rules service, try the BigInsights for Apache Hadoop service, and contact the author with any questions.

Acknowledgements

The author would like to thank Jonathon Carr, Pierre Berlandier, and Duncan Clark for reviewing this tutorial.

