By Laurent Grateau and Pierre Feillet

Introduction

IBM® Operational Decision Manager (ODM) empowers business users and developers to collaborate when modeling, authoring, testing, and deploying business rules to automate their business policies. Customers use ODM to automate their decision making while providing governance and agility in their decision change management. Rule execution components are usually scaled out in clusters or farms of servers running on bare metal, virtual machines, or containers.
In the ODM platform, the Rule Execution Server (RES) console empowers users to deploy new versions of a decision service dynamically and notify all connected rule engines to pick up the latest version. Although you do not need a RES console instance up and running for rule execution, having the RES console can be quite useful as it offers features such as:

  • Deploying new versions of decision services
  • Notifying execution components of decision service updates
  • Gathering statistics for rule execution

The RES console (V8.9.0 or earlier) is a stateful web application implemented on JMX. A common practice in deployed topologies is to run a single RES console and restart it when necessary.
This tutorial shows you how to configure RES console instances for high availability (HA). The deployment pattern consists of active and passive RES console instances connected to execution units (XU) through a virtual IP address. This solution automatically and transparently switches your decision services from one RES console instance to another when necessary.
The goal is to keep deployment and notification capabilities running so that your decision services can be updated in production without interruption.
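
As a rough illustration of the active/passive pattern, the following Python sketch models which console holds the virtual IP address: among the nodes whose health check passes, the one with the highest priority wins. The Node class and the vip_owner function are hypothetical names introduced for this example, and the priorities 200 and 100 are the ones used in the Keepalived configurations later in this tutorial. Real VRRP elections also involve advertisement timers and preemption settings, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    name: str       # hypothetical label for a RES console host
    priority: int   # VRRP priority; the higher value wins
    healthy: bool   # outcome of the health check on that host


def vip_owner(nodes: Sequence[Node]) -> Optional[str]:
    """Return the name of the node that should hold the virtual IP address.

    Simplified rule: among healthy nodes, the highest priority wins.
    Returns None when no node is healthy.
    """
    candidates = [node for node in nodes if node.healthy]
    if not candidates:
        return None
    return max(candidates, key=lambda node: node.priority).name


if __name__ == "__main__":
    cluster = [Node("primary", 200, True), Node("secondary", 100, True)]
    print(vip_owner(cluster))   # primary
    cluster[0].healthy = False  # simulate a stopped RES console
    print(vip_owner(cluster))   # secondary
```
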
This configuration has been tested with the ODM Decision Server Standard V8.7.1 and V8.9.0 running on the WebSphere® Application Server Liberty. It is expected to work similarly with other ODM versions that support RES TCP/IP notification and other supported application servers.

What you need to build your RES console cluster for HA

Prerequisites
This tutorial is for developers and IT engineers who are familiar with the Decision Server Rules architecture. The following components are required:

  • A primary Linux server, referred to as the primary server
  • A standby Linux server, referred to as the secondary server
  • The Keepalived package installed on both the primary and secondary servers
  • An available IP address for VRRP, referred to as the VIP address

Figure: ODM RES console high availability topology

Step 1. Installing and configuring primary and secondary RES consoles

Install and configure the primary server:

  • Install a Liberty server
    Note – This article uses Liberty for the examples, but you can apply the steps to other supported application servers.
  • Configure the RES console in TCP/IP mode. Follow the procedure described in Configuring the Rule Execution Server EAR for TCP/IP management in IBM Knowledge Center.
  • Deploy the RES console. Follow the procedure described in Deploying the Rule Execution Server management WAR in IBM Knowledge Center.
  • Open the RES console in a web browser with the URL https://<IPServer>:<LibertyPort>/res.
    • Log in to the RES console.
    • Perform a diagnostic.

Repeat the preceding steps to install the secondary RES console.

Step 2. Installing the Keepalived package

Install the open source package Keepalived to set up VRRP on the primary and secondary servers.
For more information about Keepalived, see the Keepalived documentation.
Installing the Keepalived package on Red Hat Enterprise Linux 7.2
Run the following command to install the Keepalived package and required dependencies:
yum install keepalived
Installing the Keepalived package on Debian
Run the following command to install the Keepalived package and all the required dependencies by using Debian’s APT package handling utility:
apt-get install keepalived

Step 3. Configuring Keepalived in the primary server

1. In the primary server, copy the following content into the /etc/keepalived/keepalived.conf file:

vrrp_script chk_res {
    script "/usr/bin/wget -O /dev/null -o /dev/null 'http://localhost:9080/res'"
    interval 2
}
vrrp_instance VI_TEST_1 {
    # see man keepalived.conf.
    state MASTER
    priority 200
    preempt
    interface <Interface>
    virtual_router_id 255
    advert_int 2
    # Admin could be notified by mail.
    # smtp_alert
    virtual_ipaddress {
        <VIPAddress>/22 brd <BCast> dev <Interface> scope global
    }
    track_script {
        chk_res
    }
}

Edit the file and change the following placeholders:

  • <VIPAddress>: The virtual IP address reserved for VRRP.
  • <Interface>: The network interface (example: eth1). You can retrieve it with ifconfig.
  • <BCast>: The broadcast address (example: 9.20.215.255). You can retrieve it from the Bcast field of the ifconfig output.
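
If ifconfig is not at hand, the broadcast address can also be derived from the VIP address and the prefix length. The following Python sketch uses the standard ipaddress module; the broadcast_for function is a hypothetical helper, and the sample value combines the address 9.20.214.11 that appears in the logs of this tutorial with the /22 prefix from the virtual_ipaddress block.

```python
import ipaddress


def broadcast_for(address_with_prefix: str) -> str:
    """Return the broadcast address of the subnet that contains the address."""
    # strict=False accepts a host address rather than only a network address.
    network = ipaddress.ip_network(address_with_prefix, strict=False)
    return str(network.broadcast_address)


if __name__ == "__main__":
    print(broadcast_for("9.20.214.11/22"))  # 9.20.215.255
```
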

2. Start the Keepalived service, for example by running service keepalived start (or systemctl start keepalived on systemd-based systems).
3. Check the log messages and verify that there are no errors.
> tail -f /var/log/messages.log
You should see something similar to the following messages:

May 11 09:44:14 vtt-odm007 Keepalived_vrrp[18639]: VRRP_Script(chk_res) succeeded
May 11 09:44:15 vtt-odm007 Keepalived_vrrp[18639]: VRRP_Instance(VI_TEST_1) Transition to MASTER STATE
May 11 09:44:17 vtt-odm007 Keepalived_vrrp[18639]: VRRP_Instance(VI_TEST_1) Entering MASTER STATE
May 11 09:44:17 vtt-odm007 Keepalived_vrrp[18639]: VRRP_Instance(VI_TEST_1) setting protocol VIPs.
…
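
The chk_res script relies on the exit status of wget: a zero status means that the RES console answered, and a nonzero status (for example 4, which wget documents as a network failure) marks the check as failed. As an illustration of the same idea, here is a minimal Python equivalent. The function names res_console_is_up and check_exit_code are hypothetical, and the URL is the one used in the configuration above.

```python
import urllib.error
import urllib.request


def res_console_is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True when the URL answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP 4xx/5xx answer.
        return False


def check_exit_code(url: str = "http://localhost:9080/res") -> int:
    """Map the check result to a process exit status: 0 is healthy, 1 is failed."""
    return 0 if res_console_is_up(url) else 1
```

A script that ends with sys.exit(check_exit_code()) could, in principle, take the place of the wget command in the script line of chk_res. Where such a script is stored is your choice and is not prescribed by this tutorial.
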

Step 4. Configuring Keepalived in the secondary server

1. In the secondary server, copy the following content into the /etc/keepalived/keepalived.conf file:

vrrp_script chk_res {
    script "/usr/bin/wget -O /dev/null -o /dev/null 'http://localhost:9080/res'"
    interval 2
}

vrrp_instance VI_TEST_1 {
    # see man keepalived.conf.
    state BACKUP
    priority 100
    nopreempt

    interface <Interface>
    virtual_router_id 255
    advert_int 2
    # Admin could be notified by mail.
    # smtp_alert
    virtual_ipaddress {
        <VIPAddress>/22 brd <BCast> dev <Interface> scope global
    }
    track_script {
        chk_res
    }
}

Edit the file and change the following placeholders:

  • <VIPAddress>: The virtual IP address reserved for VRRP. Use the same value as on the primary server.
  • <Interface>: The network interface (example: eth1). You can retrieve it with ifconfig.
  • <BCast>: The broadcast address (example: 9.20.215.255). You can retrieve it from the Bcast field of the ifconfig output.
  • The URL in the chk_res script (http://localhost:9080/res in this example) must point to where the RES console is running on this server.
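
The two configuration files differ only in the state, priority, and preemption settings. To make that difference explicit, the following Python sketch renders both files from a single template. The names KEEPALIVED_TEMPLATE, ROLE_SETTINGS, and render_keepalived_conf are hypothetical, and the sample interface and addresses are illustrative values taken from the logs in this tutorial; replace them with your own.

```python
from string import Template

# Template that mirrors the keepalived.conf listings of this tutorial.
KEEPALIVED_TEMPLATE = Template("""\
vrrp_script chk_res {
    script "/usr/bin/wget -O /dev/null -o /dev/null '$check_url'"
    interval 2
}
vrrp_instance VI_TEST_1 {
    state $state
    priority $priority
    $preempt_mode
    interface $interface
    virtual_router_id 255
    advert_int 2
    virtual_ipaddress {
        $vip/22 brd $bcast dev $interface scope global
    }
    track_script {
        chk_res
    }
}
""")

# Only these three settings differ between the two servers.
ROLE_SETTINGS = {
    "primary": {"state": "MASTER", "priority": 200, "preempt_mode": "preempt"},
    "secondary": {"state": "BACKUP", "priority": 100, "preempt_mode": "nopreempt"},
}


def render_keepalived_conf(role, interface, vip, bcast,
                           check_url="http://localhost:9080/res"):
    """Return the keepalived.conf text for the 'primary' or 'secondary' role."""
    return KEEPALIVED_TEMPLATE.substitute(
        interface=interface, vip=vip, bcast=bcast, check_url=check_url,
        **ROLE_SETTINGS[role],
    )


if __name__ == "__main__":
    # Illustrative values only.
    print(render_keepalived_conf("secondary", "eth1", "9.20.214.11", "9.20.215.255"))
```
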

2. Start the Keepalived service, for example by running service keepalived start (or systemctl start keepalived on systemd-based systems).
3. Check the log messages and verify that there are no errors.
> tail -f /var/log/messages.log
You should see something similar to the following messages:

May 11 09:42:57 vtt-odm034 Keepalived_vrrp[10533]: VRRP_Instance(VI_TEST_1) Entering BACKUP STATE
May 11 09:42:57 vtt-odm034 Keepalived_vrrp[10533]: VRRP sockpool: [ifindex(2), proto(112), unicast(0), fd(10,11)]
May 11 09:42:57 vtt-odm034 Keepalived_healthcheckers[10532]: Using LinkWatch kernel netlink reflector...
May 11 09:42:57 vtt-odm034 Keepalived_vrrp[10533]: VRRP_Script(chk_res) succeeded
May 11 09:44:11 vtt-odm034 Keepalived_vrrp[10533]: VRRP_Instance(VI_TEST_1) Transition to MASTER STATE

Step 5. Testing the RES console failover

You have completed steps 3 and 4 successfully and Keepalived is running correctly on the primary and secondary servers.
Verify that the RES console is correctly configured for failover.

  1. Open the RES console at https://<VIPAddress>:<LibertyPort>/res
    • Log in to the RES console.
    • Perform a diagnostic.
  2. In the primary server, stop the Liberty server.

In the primary server, run the command tail -f /var/log/messages.log
The following messages should be displayed:

May 11 10:18:46 vtt-odm007 Keepalived_vrrp[18639]: pid 26693 exited with status 4
May 11 10:18:46 vtt-odm007 Keepalived_vrrp[18639]: VRRP_Script(chk_res) failed
May 11 10:18:47 vtt-odm007 Keepalived_vrrp[18639]: VRRP_Instance(VI_TEST_1) Entering FAULT STATE
May 11 10:18:47 vtt-odm007 Keepalived_vrrp[18639]: VRRP_Instance(VI_TEST_1) removing protocol VIPs.
May 11 10:18:47 vtt-odm007 Keepalived_healthcheckers[18638]: Netlink reflector reports IP 9.20.214.11 removed
May 11 10:18:47 vtt-odm007 Keepalived_vrrp[18639]: VRRP_Instance(VI_TEST_1) Now in FAULT state
May 11 10:18:48 vtt-odm007 Keepalived_vrrp[18639]: pid 26700 exited with status 4

Figure: Stopping the RES console 1 on the primary server

In the secondary server, run the command tail -f /var/log/messages.log
The following messages should be displayed:

May 11 10:26:48 vtt-odm034 ntpd[1968]: Deleting interface #14 eth1, 9.20.214.11#123, interface stats: received=0, sent=0, dropped=0, active_time=476 secs
May 11 10:26:49 vtt-odm034 Keepalived_vrrp[10533]: VRRP_Instance(VI_TEST_1) Transition to MASTER STATE
May 11 10:26:51 vtt-odm034 Keepalived_vrrp[10533]: VRRP_Instance(VI_TEST_1) Entering MASTER STATE
May 11 10:26:51 vtt-odm034 Keepalived_vrrp[10533]: VRRP_Instance(VI_TEST_1) setting protocol VIPs.

Figure: Switching to the RES console 2 on the secondary server

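When the logs are long, the state changes can be hard to spot by eye. The following Python sketch extracts the "Entering ... STATE" lines from Keepalived log output. The state_changes function and the regular expression are assumptions based only on the message format shown in this tutorial; other Keepalived versions may word these messages differently.

```python
import re

# Matches lines such as:
# "May 11 10:26:51 vtt-odm034 Keepalived_vrrp[10533]: VRRP_Instance(VI_TEST_1) Entering MASTER STATE"
_STATE_LINE = re.compile(
    r"^(?P<time>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+Keepalived_vrrp\[\d+\]:\s+"
    r"VRRP_Instance\((?P<instance>[^)]+)\)\s+Entering\s+(?P<state>\w+)\s+STATE"
)


def state_changes(lines):
    """Yield (time, host, instance, state) for each 'Entering ... STATE' line."""
    for line in lines:
        match = _STATE_LINE.match(line)
        if match:
            yield (match["time"], match["host"], match["instance"], match["state"])


if __name__ == "__main__":
    sample = [
        "May 11 10:18:46 vtt-odm007 Keepalived_vrrp[18639]: VRRP_Script(chk_res) failed",
        "May 11 10:18:47 vtt-odm007 Keepalived_vrrp[18639]: VRRP_Instance(VI_TEST_1) Entering FAULT STATE",
        "May 11 10:26:51 vtt-odm034 Keepalived_vrrp[10533]: VRRP_Instance(VI_TEST_1) Entering MASTER STATE",
    ]
    for change in state_changes(sample):
        print(change)
```
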
Refresh your browser. You are now redirected to the RES console login page.

  1. Log in to the RES console.
  2. Perform a diagnostic.
  3. Click the Update RuleApp button in the Server Info tab to refresh the RuleApps from the database. You should see the RuleApps that were deployed through the RES console 1.
     Figure: Screen shot of the RES console showing the RuleApps button
  4. In the primary server, restart the Liberty server.
  5. Verify that the RES console in the primary server takes over and becomes active again.
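
To get a feel for how long the console is unreachable during a switch, one option is to poll the VIP address until the console answers again. The following Python sketch shows only the polling loop. The wait_until_up function is a hypothetical helper, and probe stands for any callable that returns True when the console responds, for example an HTTP GET against https://<VIPAddress>:<LibertyPort>/res. The demo uses a fake probe so that it runs without a server.

```python
import time


def wait_until_up(probe, timeout=60.0, interval=1.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it returns True.

    Returns the elapsed time in seconds, or None if the timeout is reached first.
    The clock and sleep parameters exist so that the loop can be tested offline.
    """
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)


if __name__ == "__main__":
    # Offline demo: a fake probe that fails twice and then succeeds.
    answers = iter([False, False, True])
    elapsed = wait_until_up(lambda: next(answers), timeout=10.0, interval=0.1)
    print(f"console reachable again after about {elapsed:.1f} s")
```
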

Conclusion

You have learned how to configure a highly available RES console environment by using virtual routing. The HA RES console pattern has been tested with WebSphere Application Server Liberty and the Keepalived tool. You can apply this configuration to other application servers supported by ODM and to other VRRP implementations such as F5.

Further information

Configuring for high availability is also discussed in the IBM Knowledge Center for certain application servers.

Glossary of terms used in this article

High availability
High availability is the maximum system uptime. Terms stated in SLAs determine the degree of high availability in the system. A system that is designed to be highly available withstands failures that are caused by planned or unplanned outages.
Uptime
Uptime is the length of time when services or applications are available.
Failover
Failover is the process in which one or more server resources are transferred to another server or servers in the same cluster because of failure or maintenance.
Primary (active) server
A primary or active server is a member of a cluster, which owns the cluster resources and runs processes against those resources. When the application is not working, the ownership of these resources stops and is switched to the standby server.
Standby (secondary, passive, or failover) server
A standby server, also known as a passive or failover server, is a member of a cluster that is capable of accessing resources and running processes. However, it is in a state of hold until the primary server is compromised or has to be stopped.
Active/passive configuration
An active/passive configuration consists of a server that owns the cluster resources and other servers that are capable of accessing the resources that are on standby until the cluster resource owner is no longer available.
The resources are configured to run on the active node. When the cluster is started, only the active node serves the resources. The passive nodes are running but do not have any resources in production.
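VRRP: Virtual Router Redundancy Protocol
VRRP (Virtual Router Redundancy Protocol) is a protocol that provides high availability for a network (or subnetwork) by letting a group of hosts share a virtual IP address. One host acts as the master and holds the address, and a backup host takes it over if the master fails.
Keepalived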
Keepalived is a Linux package that leverages VRRP to deliver high availability among Linux servers. It also delivers load-balancing services, but this article concentrates on getting started with just the virtual routing feature.
