Think Big with Decision Composer on IBM Cloud

Build a codeless app that combines Decision Composer and Apache Hadoop on IBM Cloud

Traditionally, business-rule applications process a few megabytes of data as client-server requests, one record at a time. Generally, this works fine, but as solutions increasingly move to the cloud, the data is measured in terabytes, not megabytes. Apache Hadoop, a distributed data processing framework, evolved to meet these demands. But for data scientists familiar with Hadoop, another requirement has emerged: the ability to model, build, and test decisions quickly against these large data sets. This is where Decision Composer, with its graphical decision modeling environment, stands out. Note: At the time of this writing, Decision Composer is experimental.

"This tutorial opens up the possibility for data scientists to analyze big data using graphical decision models instead of code."

By combining Decision Composer and Hadoop on IBM Cloud, you can scale your rule solutions up to the world of big data and rapidly develop decision models to analyze large data sets in the cloud without any coding.

This tutorial provides the glue you need to integrate decision models created in Decision Composer and run these models against big data on Hadoop. It opens up the possibility for data scientists to analyze big data using graphical decision models instead of code and brings the following benefits:

  • There is no need for technical scripting. You build and test graphical decision models in Decision Composer and then run them against big data.
  • Decision Composer automatically versions and controls changes to decision models.
  • Both business and technical users can understand decision models.

The following figure illustrates the integration architecture of Decision Composer, the Business Rules service, and Hadoop:

Figure: Integration architecture of Decision Composer, the Business Rules service, and Hadoop

The following descriptions correlate to the numbers in the figure:

  1. Build and test your decision model using Decision Composer.
  2. Deploy the executable decision model to the Business Rules service.
  3. Upload data to the Hadoop input folder.
  4. Configure and run the Hadoop job:
    The job is configured to point to the Business Rules service and the Hadoop input and output directories. Set Cloud mode to true to execute the decision model through the Business Rules service, or set it to false to execute the rules in a dedicated rule engine within each map job. A dedicated engine offers higher performance but requires additional licensing.
  5. View and analyze the Hadoop execution results in the output folder.

What you'll need

To follow this tutorial, you need an IBM Cloud account and the sample code from this tutorial's GitHub repository. Open the repository in your browser, where you will see the following page:

Figure: The GitHub repository page

Complete these steps:

  1. Click the Clone or download button and download the repository as a .zip file.
  2. Open the .zip file and extract the quickstart directory to your local machine.

Example 1: Run the sample project using REST

Create the Decision Composer project

  1. Log into IBM Cloud and go to the Decision Composer URL: https://decision-composer.mybluemix.net
  2. Within the Decision Composer initial screen, select the Load Project option:
    Figure: New project, Import sample(s), and Load project (highlighted) options
  3. Select the PNR_Check.dproject file provided in the quickstart folder. The following figure shows what the project looks like:
    Figure: The PNR Check project, showing the Score Passenger decision

This Decision Model profiles airline passengers before they fly. The model applies scoring rules against Passenger Name Records (PNRs). If the score reaches a defined threshold, then the passengers are stopped.

The model defines input data (green ellipses) that feed into decision nodes (blue rectangles). The Score Passenger decision node contains a decision table to score passengers and is shown in the following figure:

Figure: Decision table that scores passengers

The Stop Passenger decision node includes a single action rule that determines whether a passenger should be stopped based on their score:

Figure: The Stop Passenger decision node rule

You can test this Decision service within Decision Composer:

  1. You are currently in Design mode. Switch to Test mode by clicking Run.
  2. Within the Saved inputs panel (shown below), click Stop Fred. You should see the following screen with the input prepopulated:
    Figure: Saved inputs panel
  3. Now, click the Run Decision icon (the circle containing an arrow) to execute. After a few moments, you should see the following screen showing that the passenger should be stopped:
    Figure: Profile match for Fred Flanders; the passenger is stopped

Now that you have run the decision service within Decision Composer, it is time to run the same service within Hadoop.

Create the Business Rules service

  1. Log in to your IBM Cloud account, navigate to the Catalog, and add the Business Rules service.
  2. Open the Business Rules service and click the Connection Settings tab. Take note of the username and the short and long (Basic Auth) passwords, as shown in the following example. You need to click Show Password to see the text.
Figure: Business Rules service connection settings
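
Before you wire these credentials into Hadoop, you can optionally sanity-check them with curl from any shell. The management REST path below (/res/api/v1/ruleapps) is an assumption based on the standard Rule Execution Server REST API; adjust it if your service exposes a different path, and substitute your own host and short password.

  # List the RuleApps deployed to your Business Rules service (path is an assumption)
  curl -u resAdmin:[short password] https://[your RES host]/res/api/v1/ruleapps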

Deploy the Decision service

  1. Go back to the Decision Composer Project view, click the three-dots icon to the right of the Passenger Name Record Check project, and click Deploy:
    Figure: Deploying the Passenger Name Record Check project
  2. Enter the RES URL, the RES username, and the RES password that you noted when you created the Business Rules service. You should see the following screen (but it should contain your RES URL and login credentials):
    Figure: The Deploy dialog for the Passenger Name Record Check project
  3. Click Deploy.
  4. Make a note of the deployed ruleset path. When deploying the first time, it should be: /PNRCheckRuleApp/1.0/PNRCheck/1.0
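
Although the Hadoop job makes this call for you in Cloud mode, you can confirm the deployment by invoking the decision service directly over REST. The URL pattern below follows the usual hosted decision service convention and is an assumption for this service, and the request body is only a placeholder: the exact JSON fields depend on the input data defined in the decision model.

  # Invoke the deployed ruleset over REST (URL pattern and payload are assumptions)
  curl -u resAdmin:[short password] \
       -H "Content-Type: application/json" \
       -d '[JSON input matching the decision model input data]' \
       https://[your RES host]/DecisionService/rest/v1/PNRCheckRuleApp/1.0/PNRCheck/1.0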

Create an instance of Hadoop using the Analytics Engine

  1. Go back to the IBM Cloud dashboard, go to the Catalog, and search for the Analytics Engine service.
  2. Select the service and click Configure. Note that Hadoop is included as one of the default components. Click Create:
    Figure: The AE 1.0 Spark software package selected
  3. After the service is created, click the Service Credentials tab in the left panel, select New Credential, and click Add.
  4. After the credentials are created, click View Credentials, which shows the credentials specific to your Analytics cluster:
     "cluster": {
        "cluster_id": "20171107-205445-503-IAPgshwm",
        "user": "clsadmin",
        "password": "XXXXXXXXXXX",
        "service_endpoints": {
          "ambari_console": "https://chs-mfw-836-mn001.bi.services.us-south.bluemix.net:9443",
          "livy": "https://chs-mfw-836-mn001.bi.services.us-south.bluemix.net:8443/gateway/default/livy/v1/batches",
          "oozie_rest": "https://chs-mfw-836-mn001.bi.services.us-south.bluemix.net:8443/gateway/default/oozie",
          "notebook_gateway": "https://chs-mfw-836-mn001.bi.services.us-south.bluemix.net:8443/gateway/default/jkg/",
          "webhdfs": "https://chs-mfw-836-mn001.bi.services.us-south.bluemix.net:8443/gateway/default/webhdfs/v1/",
          "ssh": "ssh clsadmin@chs-mfw-836-mn003.bi.services.us-south.bluemix.net",
          ...
        }
      }
  5. Make a note of the user, password, and ssh values (shown in the previous listing).
  6. Verify your ssh credentials from any UNIX shell using this command: ssh [user]@[SSH Host], where [user] is your user name and [SSH Host] is the host name of your cluster. Using the text from the screen capture, the command is: ssh clsadmin@chs-mfw-836-mn003.bi.services.us-south.bluemix.net
  7. When prompted, enter the password.

Import the quickstart folder

From your local UNIX shell, upload the contents of the quickstart folder to your home account on IBM Cloud: scp *.* [user]@[SSH Host]:/home/wce/clsadmin
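
Using the sample credentials shown earlier and running from inside your local quickstart directory, the command would look like this:

  scp *.* clsadmin@chs-mfw-836-mn003.bi.services.us-south.bluemix.net:/home/wce/clsadmin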

Create HDFS directories

  1. Within your account home folder on IBM Cloud, enter the following command to create the HDFS input directory:
    hdfs dfs -mkdir input
  2. Copy the .json file into the input directory:
    hdfs dfs -put pnr.json input
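
Before you run the job, you can confirm that the data landed in HDFS with the standard HDFS shell commands:

  hdfs dfs -ls input
  hdfs dfs -cat input/pnr.json | head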

Configure the Hadoop Job

Edit run.sh and enter the following parameters to configure the Hadoop job. For Example 1, you only need to change the parameters shown within [square brackets].

  • Input directory: input
  • Output directory: output
  • Ruleset version: /PNRCheckRuleApp/1.0/PNRCheck/1.0
  • Rule Execution Server host: [Host name of your IBM Cloud Rule Execution Server]
  • Rule Execution Server user: resAdmin
  • Rule Execution Server password: [Business Rules Password]
  • Rule Engine password: "[IBM Business Rules Basic Auth password]"
  • Cloud mode: true
  • HTTPS: true

See the section on creating the Business Rules service to determine the Rule Execution Server host, the Rule Execution Server password, and the Rule Engine password. The Rule Execution Server host is the host name on its own, without the port number or any other part of the Rule Execution Server URL. The Rule Engine password must be enclosed in quotes.
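
As a rough illustration only, the edited section of run.sh might look something like the following. The variable names here are hypothetical and may not match the script exactly, so treat this as a sketch rather than a copy-and-paste configuration; the important details are the bare host name and the quoted Basic Auth password.

  # Hypothetical layout of the run.sh parameters (names may differ in the actual script)
  INPUT_DIR=input
  OUTPUT_DIR=output
  RULESET_PATH=/PNRCheckRuleApp/1.0/PNRCheck/1.0
  RES_HOST=brsv2-7a461af4.ng.bluemix.net          # host name only, no port or /res path
  RES_USER=resAdmin
  RES_PASSWORD=[Business Rules password]
  RULE_ENGINE_PASSWORD="Basic cmVzQWRtaW46..."    # long Basic Auth password, in quotes
  CLOUD_MODE=true
  HTTPS=true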

Run the Hadoop Job

Run the Hadoop job:
./run.sh

The Hadoop map/reduce job starts, and multiple map jobs are created.

Because you are running in Cloud mode, each map job accesses the Rule Execution Server decision service via the REST API. The service is multithreaded and capable of servicing requests in parallel.

If the example runs successfully, you should see the job detect one passenger:
Profile Match for Fred Flanders on flight 2017-10-22T12:28:52. Passport L87343411
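
The results are written to the HDFS output directory that you configured, so you can list and inspect them with the HDFS shell:

  hdfs dfs -ls output
  hdfs dfs -cat output/part-*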

If the job failed, check the credentials you entered. If they seem correct, check the Yarn logs. See the Troubleshooting section for more details.
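
If you are logged in to the cluster over ssh, one quick way to pull the logs is the YARN command line, assuming you noted the application ID that was printed when the job started:

  yarn logs -applicationId application_XXXXXXXXXXXXX_XXXX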

Example 2: Run the sample project using the embedded rule engine

In Example 1, you invoked the Business Rules service REST API for every row of data. The Business Rules service is multithreaded and can handle several map jobs running in parallel. However, REST calls to the Business Rules service are not efficient for high-volume batch processing. To use Hadoop at full power, the rule engine must be embedded within the map job. Embedded mode gives better performance than Cloud mode because the rules and data run within the same job; there is no call to the Business Rules service other than an initial fetch to get the rules and XOM.

Configure the batch job for embedded mode

As in Example 1, edit run.sh, but this time change the Cloud mode parameter to false:
Cloud mode: false

Run the batch job in embedded mode

Run the Hadoop Job:
./run.sh

This time, you should see the same output as in Example 1, but for large data sets the execution time is significantly faster. For the sample data provided, there is little difference in execution time because only a few rows of data are supplied.

Where next

Now that you have run your first Decision Composer model on Hadoop, you can start developing your own models. The adaptor code provided in the sample should work with any decision service generated by Decision Composer. I suggest that you first import the Decision Composer samples to get a feel for how things work, and then progress to building your own model.

Contact the author if you need any assistance.

Troubleshooting ODM Applications on Hadoop

  • Make sure that you are using the correct RES credentials. There are two RES passwords: a long one and a short one. Ensure that quotes surround the long one (the Basic Auth password), like this: "Basic cmVzQWRtaW46MWh2MGZ0bG0wajl5cw=="
  • Ensure that the host name you provide in the run script does not include any part of the RES URL. For example, if the RES URL is https://brsv2-7a461af4.ng.bluemix.net/res, the host is brsv2-7a461af4.ng.bluemix.net.
  • Use the Ambari logs to view problems. Click Launch Console, and then in the Ambari console, navigate to YARN > Resource Manager UI. Drill down to your Map jobs to view the output.
  • Test your Decision service using a simple web service call or using DVS tests before running on Hadoop. This helps you iron out simple rule problems before moving into Hadoop where problems are harder to diagnose.
  • When testing your application on Hadoop, use small datasets.
  • Develop on UNIX, not Windows®, because Windows can cause file incompatibility issues.
  • Use print statements in the rules and view their output in the Hadoop logs.
  • Ensure your Hadoop cluster and the Business Rules service are located in the same region.

Conclusion

In this tutorial, you explored a solution for integrating IBM Decision Composer and Hadoop on IBM Cloud. It provided a Hadoop job that executed a Profiling Decision service generated by Decision Composer. The first example ran the Decision service using the Business Rules service REST API. The second example ran the Decision service in embedded mode to deliver maximum performance.

I encourage you to extend this tutorial to use Hadoop to execute your own decision services created by Decision Composer.

