Install and configure a Decision Server Insights reference topology

The new IBM ODM Advanced V8.7 Decision Server Insights product is designed to enable businesses to detect and respond to risks and opportunities at the earliest possible stage. Decision Server Insights receives data in the form of events originating from multiple sources, such as business systems, social networks, mobile devices, and sensors. To derive actionable insights that enable informed decision making, events are processed as they happen, using analytics capabilities and business rules that act on the data from the events and on the data stored in memory.

To accomplish this type of processing in an efficient way, Decision Server Insights requires a runtime platform that is capable of handling large numbers of events, typically on the order of tens of thousands per second, and that can scale in a linear way. The distributed data grid capabilities of WebSphere eXtreme Scale and the WebSphere Application Server Liberty profile provide the right mix of non-functional capabilities to enable in-memory transactional processing at this scale.

When you complete this tutorial, you should be able to install and configure a Decision Server Insights runtime production environment that meets your non-functional requirements, including high availability, security and recoverability. This tutorial shows how to achieve non-functional capabilities in the context of the Decision Server Insights reference topology.

Motivation for a reference topology

To be clear, there is no "one-size-fits-all" topology that is capable of addressing all customer requirements. Different applications might have different levels of non-functional requirements and risk tolerance. The motivation for describing a reference topology for Decision Server Insights is twofold: 1) to address the "typical" set of non-functional requirements in a balanced way while taking into account any software limitations of IBM ODM and the software it runs on or depends on, and 2) to provide insight into the requirements, parameters and capabilities that determine and constrain the preference for a particular topology.

WebSphere eXtreme Scale basic concepts

This section introduces the minimum set of WebSphere eXtreme Scale concepts required to understand the configuration of the reference topology.

WebSphere eXtreme Scale is an "elastic" in-memory data grid. To be able to configure this grid, at the very minimum, you must be familiar with the following concepts:

  • Map: A map is a simple data structure consisting of key-value pairs.
  • Map set: A map set is a collection of logically related maps that can be partitioned and replicated over a number of servers.
  • Grid: In WebSphere eXtreme Scale, a grid is a collection of map sets that might span multiple Java virtual machines and that you can connect to in order to access data.
  • Partition: A partition comprises a subset of the data contained in the map set plus any replicas that each subset might have. The number of partitions (n) is a configurable attribute of a map set. The map set's data is spread across the n partitions based on the key object's hashcode() value.
  • Replica: Replication provides the redundancy mechanism by means of which high availability is achieved in a WebSphere eXtreme Scale environment. A replica is a copy of the primary data in a partition that is stored remotely with respect to the primary and other replicas. A synchronous replica is updated transactionally when the primary is updated, thus ensuring no loss of data if the primary data is lost. An asynchronous replica is not updated transactionally with respect to the primary, and no guarantee is given that an asynchronous replica is identical to the primary data.
  • Shard: A shard provides the physical storage for the primary or for each replica in a partition. In a partition there is always a primary shard and there might be one or more replica shards depending on the degree of high availability needs.
  • Grid container: A grid container is a container for the shards.
  • Grid container server: A grid container server is a Java Virtual Machine that runs WebSphere eXtreme Scale and hosts one or more grid containers. Decision Server Insights uses a WebSphere Application Server Liberty profile as its grid container servers.
    Note: For Decision Server Insights, the preferred grid topology has only one grid container per grid container server. In addition, only one grid container server runs on each host machine. For this reason, this tutorial uses the term container to refer to both the grid container server and grid container, except in cases where appropriate differentiation is required. This tutorial refers to the host machine as the container host.
  • Catalog service: Some intelligence is required to keep track of partitions and shards, to redistribute shards when a container joins or leaves the grid, to monitor the health of the grid, and to provide data location services to grid clients. These actions are made available through a catalog service. One or more catalog servers coordinate work among the servers to provide the catalog services. When there is more than one catalog server, one of them has special responsibilities and it is known as the master or primary catalog server. The group of catalog servers, together with the group of container servers that they oversee, constitute a catalog service domain.
  • Quorum: An agreement between members of the catalog server group on what needs to be done is required for grid lifecycle operations. For example, when a network brownout occurs, communication between catalog servers might be lost, and more than one catalog server becomes the primary server. This situation is known as split-brain syndrome. If quorum is enabled, grid work is suspended when a situation like this occurs, but recovery from the loss of quorum typically requires manual intervention. For details, see Catalog server quorums in the WebSphere eXtreme Scale V8.6 documentation. As of WebSphere eXtreme Scale V8.6, the introduction of majority quorum means that as long as the majority (more than half) of the members in a catalog service group remain active and aware of each other, quorum is achieved, and grid work can continue to be performed. This is one good reason to have 3 catalog servers instead of 2 (or any odd number of servers instead of an even number).
  • Catalog server cluster endpoints: Containers use catalog server cluster endpoints, which are included in the container server configuration, to establish a communications link with the catalog servers. Through these endpoints, containers join the catalog service domain and thus become part of the grid.
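The majority-quorum rule described above can be sketched in a few lines. The following is an illustration only (the function is hypothetical, not WebSphere eXtreme Scale code); it also shows why 3 catalog servers tolerate a single failure that 2 cannot:

```python
def has_majority_quorum(total_servers, active_servers):
    """Quorum holds while more than half of the catalog servers remain active."""
    return active_servers > total_servers / 2

# With 3 catalog servers, losing 1 still leaves a majority (2 > 1.5).
print(has_majority_quorum(3, 2))  # True
# With 2 catalog servers, losing 1 loses quorum (1 > 1.0 is false).
print(has_majority_quorum(2, 1))  # False
```

This is the arithmetic behind the recommendation to use an odd number of catalog servers.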

Decision Server Insights basic concepts

A detailed view of the capabilities of Decision Server Insights is outside the scope of this tutorial. For more information, see the IBM Operational Decision Manager Advanced V8.7 Decision Server Insights product documentation. This tutorial is limited to presenting the minimum set of basic concepts that are relevant to the reference topology.

Decision Server Insights solutions, which can be considered a type of application, store entities (data objects) in the grid and process events that are typically correlated to those entities. Events carry a correlation ID and are routed to the container where the correlated data resides. In the container, they are processed by associated rule, Java, or predictive scoring (IBM SPSS Statistics) agents.

Events from external systems can be submitted to the Decision Server Insights runtime environment directly through the Solution Gateway API (using Java) or indirectly through inbound connectivity. For inbound connectivity, configure one or more Liberty profile servers as inbound connectivity servers to act as a bridge between external messaging endpoints and the Solution Gateway API. Outbound events that target external systems are routed to one or more outbound connectivity servers.

Decision Server Insights reference topology

The Decision Server Insights reference topology is a single site topology. Zoned or multi-master topologies, while desired in many instances, are outside the scope of this tutorial.


The reference topology is based on the following goals, which are conditioned by the capabilities of the software:

  • The grid must be highly and continuously available in a "normal" operation mode.
  • The grid must withstand the loss of one container server without any loss of data and without loss of access to data.
  • The grid must tolerate the controlled shutdown of one container at a time for the purpose of applying "rolling updates" or for any other maintenance activity.
  • The grid should accept and use new containers (within a limit that is determined by the configured number of partitions and replicas).
  • The grid should tolerate the simultaneous loss of 2 containers with the accepted risk of some data loss.
  • The catalog service should be highly available and rely on majority quorum.
  • The grid data must be recoverable in the case of a disaster where more than 2 containers are lost (for example, a power outage). In this case, you accept the risk of some data loss.
  • Grid data must be capable of being recovered without loss in the case of a controlled shutdown of the grid (for example, if there is a need to increase the number of partitions).
  • The system should tolerate the failure of at least one inbound and one outbound connectivity server.
  • Event throughput should not suffer significantly as a result of the previous requirements.

These goals are used to define the Decision Server Insights reference topology, which is shaped in broad terms by the inbound and outbound connectivity servers, the catalog servers, container servers and a database that is used for persistence of grid data to disk storage media.

Inbound and outbound connectivity servers

Inbound and outbound connectivity servers are responsible for the event flow in and out of the system, so there should be some redundancy for these servers. A good starting point is to use one of each on two different machines, which is the minimum requirement to avoid a single point of failure. There is no rule of thumb for determining the actual number of inbound or outbound servers that are required for a particular event throughput. The number of inbound and outbound servers varies based on many parameters, including the size of the event data, the type of event transformations that might be performed, whether or not event persistence is used for Java Message Service (JMS), and the type of protocol used (for example, HTTP or JMS). For best results, start with the two servers and monitor resource usage (for example, CPU and memory). For high event rates, you can keep adding inbound and outbound connectivity servers to the same machine until the machine's ideal resource consumption threshold is reached. After the threshold is reached, you can add new machines if needed.

Catalog service

A previous section discussed the catalog service and why three or more catalog servers are required in a production environment. If you choose to use three catalog servers, they should be on separate machines for high availability reasons.

Container servers

The number of required container servers depends on the number and type of solutions running on the grid and the volume of events that they process. This number can grow or shrink, but typically there is more than one in a production environment (to support high availability requirements).

Based on the topology goals, you can infer that, in addition to the primary data, you need both a synchronous replica and an asynchronous one. This is not only to satisfy your high availability goals but also to ensure that "rolling updates", or any maintenance activity performed one container server at a time, is done in an efficient way. For example, if the primary data is lost, the synchronous replica, which is identical to the primary, immediately becomes the primary. Likewise, the asynchronous replica can then very quickly be promoted to become the synchronous replica and you don't have to wait for a primary replica to be recreated, which would be the case if you had no additional replicas.

Based on these observations, ideally you should have a minimum of 4 containers running on 4 separate container host machines. With 4 containers you can stop one container for maintenance and still have the level of redundancy required with the 3 remaining machines. This is important because the catalog service will not place primary or replica shards that belong to the same partition in the same virtual machine or at the same IP address if the development mode flag is set to false.

The maximum number of containers that you can dynamically add to the grid is constrained by the number of partitions and the number of replicas that you configure in each partition. This constraint stems from the fact that the number of partitions can only be changed when the grid is down, and because it doesn't make sense to have more container servers than the maximum number that can be used. This number is determined by the number of partitions x (1 + the number of replicas). For example, if the number of partitions is 20 and you have 2 replicas, you have 20 primaries + 20 x 2 replicas (= 20 x (1 + 2)), which results in a maximum of 60 containers that can be added to the grid and actually used (with one replica or primary shard on each machine).
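The container limit above is simple arithmetic. As an illustrative sketch (the function name is mine, not a product API):

```python
def max_usable_containers(partitions, replicas_per_partition):
    # Each partition contributes one primary shard plus its replicas;
    # with one shard per container, this caps the number of usable containers.
    return partitions * (1 + replicas_per_partition)

# The example from the text: 20 partitions, 2 replicas each.
print(max_usable_containers(20, 2))  # 60
```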

With this constraint in mind, choosing a large number of partitions seems to facilitate growth and allow for better balancing of memory resources. However, you must balance this against the greater number of threads, increased grid communication overhead, and greater amount of rebalancing work whenever the grid changes. When the number of partitions is large and the number of containers is small, the amount of grid reconfiguration work can be significant when removing or adding a container.

In practice, given the recommended one container server per machine, it's a good rule of thumb to set the number of partitions to the maximum number of containers that you expect to have (C) times the number of cores of the container host (M) times 2 (so C x M x 2). This assumes that you have an equal number of cores in each container host machine. In addition, given that certain WebSphere eXtreme Scale operations are more efficient if the number of partitions is a prime number, it's a good practice to round that number to the next prime number. For example, assuming that you expect to have a maximum of 6 container server hosts, each with 4 cores, a good value for the number of partitions would be: nextPrime(6 x 4 x 2) = 53. The default product configuration for the number of partitions is 127.

Figure 1 shows a grid with 3 containers and n partitions, each with a primary and 2 replica shards. With a large value of n there will also be a large number of partitions per container. As you add containers, the partitions are distributed across the new containers thus reducing the number of partitions per container.

Figure 1. Grid with 3 containers and n partitions

Database persistence

Decision Server Insights operates on an in-memory data grid. However, in a single site topology, persistence of grid data to a database is necessary to enable recovery from a disaster or to enable any rare maintenance activities that require the grid to be shut down. You have a choice between synchronous (write-through) or asynchronous (write-behind) database persistence modes. The use of synchronous persistence needs to be pondered carefully, because it typically implies a high performance penalty. On the other hand, asynchronous persistence performs the database updates in batches and carries a small performance penalty. The frequency of database updates might vary depending on the level of accepted risk and it is set in terms of a time interval and the number of batched updates using a parameter called writeBehind. For example, a writeBehind configuration value of 'T20;C200' means that a write to the database happens every 20 seconds or every time the number of pending updates reaches 200, whichever condition happens first.
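As an illustration of how a writeBehind value such as 'T20;C200' breaks down, the following is a hypothetical parser, not the WebSphere eXtreme Scale implementation:

```python
def parse_write_behind(value):
    """Split a writeBehind setting such as 'T20;C200' into its two thresholds."""
    seconds, count = None, None
    for part in value.split(";"):
        if part.startswith("T"):
            seconds = int(part[1:])   # flush at least every N seconds
        elif part.startswith("C"):
            count = int(part[1:])     # or once N updates are pending
    return seconds, count

print(parse_write_behind("T20;C200"))  # (20, 200)
```

Whichever threshold is reached first triggers the batched write to the database.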

Note that in a controlled shutdown of the grid there should be no data loss, provided that the following conditions are met:

  • The flow of incoming events is first stopped.
  • All events are processed.
  • Any outstanding batched updates are written to the database.

Wait to shut down the grid until these conditions are met.

For the reference topology, this tutorial uses an asynchronous database persistence configuration. Unless it is critical to have no data loss when a disaster occurs, it's not worth paying the performance and throughput penalty of the synchronous persistence option.

WebSphere eXtreme Scale can queue the database updates, allowing for some database downtime, but ideally the database should be highly available (for example, by using DB2 high availability disaster recovery). For very high event loads, consider using a highly scalable database to ensure that the database does not become a bottleneck.

Figure 2 shows the Decision Server Insights reference topology as described.

Note that the example uses a different physical server for each Decision Server Insights server in the topology, which is a configuration that provides a high degree of isolation between components. The high availability and recoverability objectives can be met with a minimum of 3 physical servers, not counting the database. However, a topology with 4 servers would be more appropriate as a minimal high availability topology. Such a topology allows you to have 4 container servers, one in each host machine. And 3 of those machines can have a catalog server. Additionally, you should have inbound and outbound servers on at least 2 of the 4 physical servers.

Figure 2. Decision Server Insights reference topology

Installation and configuration

To install and configure the Decision Server Insights reference topology, use the following strategy:

  1. Install the IBM Installation Manager on each server to enable Decision Server Insights to be installed and updated using the Installation Manager.
  2. Use the Installation Manager to install Decision Server Insights on each of the required servers. In this tutorial example, there are 12 machines. Three of those machines are used for the catalog cluster, 4 are used as container servers and there are 2 outbound and 2 inbound servers. The remaining machine (referred to as the prototypes machine) is used for creating your server prototypes. Its purpose is to help with the deployment of new servers and nothing else. The prototypes are used to clone the servers to each of the 11 machines in the topology and to any additional machines that might be added in the future.
  3. In the prototypes machine, create and configure the catalog servers:
    1. Create a catalog server prototype using the cisCatalog template.
    2. Configure the bootstrap properties file for that server.
    3. Configure the Secure Sockets Layer (SSL).
    4. Configure the user registry and roles.
    5. Clone the catalog server to the 3 machines that will host the catalog cluster.
  4. In the prototypes machine, create and configure the container servers:
    1. Create a container server prototype using the cisContainer template.
    2. Configure the bootstrap properties file for that server.
    3. Configure the SSL.
    4. Configure the user registry and roles.
    5. Configure the grid.
    6. Configure database persistence.
    7. Clone the container server to the 4 machines used to host the 4 container servers.
  5. In the prototypes machine, create and configure the inbound and outbound servers:
    1. Create inbound and outbound connectivity server prototypes using the cisInbound and cisOutbound templates, respectively.
    2. Configure the bootstrap properties file for that server.
    3. Configure the SSL.
    4. Configure the user registry and roles.
    5. Clone the configured inbound server to 2 of the machines and the outbound server to the remaining 2.

Creating and configuring the catalog servers

To have high availability for the extreme scale catalog, create a catalog server domain consisting of 3 catalog servers.

Step 1: Create a catalog server prototype

From installation_directory/runtime/wlp/bin on the prototypes machine, run the following command to create a server from the cisCatalog template:

./server create cisCatalog --template=cisCatalog

Note that you use 2 dashes (--) for the template parameter.

Step 2: Configure the bootstrap properties file

Configure the catalog cluster endpoints in the bootstrap properties file. Change vmwtpmxxxx to your own host names.
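As an illustrative sketch only — the property name and the serverName:host:clientPort:peerPort endpoint format are assumptions based on the ${ia.clusterEndpoints} variable used later in this tutorial and on WebSphere eXtreme Scale conventions, so check the template's generated file for the exact names and ports — such an entry might look like:

```
ia.clusterEndpoints=catalog1:vmwtpm0001:6601:6602,catalog2:vmwtpm0002:6601:6602,catalog3:vmwtpm0003:6601:6602
```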


Note: To facilitate cloning of catalog servers, avoid the ${ia.serverName} and ${} variables when specifying the catalog cluster endpoints.

Step 3: Configure security

To create a secure server environment, you must configure the following settings:

  • SSL, for secure communications.
  • A user registry, typically LDAP.
  • An administrator role for management access (Java Management Extensions).
  • Read and write roles for the REST API (for container servers only).

Note: This configuration assumes a secure network and does not address the security of servers in the grid and the communication between those servers. For more information, see the Enabling Decision Server Insights Grid Security technote.

  1. Configure the SSL.
    To enable SSL communication for your servers you must first create a keystore file. The keystore contains a certificate for the server that, in production environments, should be issued or signed by a trusted certificate authority. For detailed information, see Enabling SSL communication for the Liberty profile in the product documentation. Create a keystore containing a self-signed certificate using the security utility in the installation_directory/runtime/wlp/bin directory as shown in the following example: ./securityUtility createSSLCertificate --server=cisCatalog --password=ins1ghts --validity=1000 --subject=CN=*,O=IBM,C=UK
    When the createSSLCertificate command runs, you should see output similar to the following example:
    Creating keystore /opt/IBM/ODMInsights87/runtime/wlp/usr/servers/cisCatalog/resources
    Created SSL certificate for server cisCatalog
    Add the following line to the server.xml file, immediately after the closing </featureManager> tag, to enable SSL:
    <keyStore id="defaultKeyStore" password="{xor}NjEsbjg3Kyw=" />

    The SSL feature definition is already in the server.xml file, so you do not need to add the SSL feature definition to this file. However, the keyStore definition <keyStore id="defaultKeyStore" password="{xor}NjEsbjg3Kyw=" /> must be copied to the server.xml file. Alternatively, you can remove comments from the keystore definition already in the server.xml file and just add the password. This configuration is necessary for all servers that are part of the grid and for all connectivity servers.

  2. Configure the user registry and roles.
    For brevity, this tutorial skips the LDAP registry configuration that you typically see in a production environment. For more information see Configuring LDAP user registries with the Liberty profile in the WebSphere Application Server Network Deployment V8.5.5 documentation. Instead of LDAP, the following example uses a basic registry, which is configured in the server.xml:
    <basicRegistry id="basic" realm="DWRealm">
        <user name="admin" password="ins1ghts"/>
        <group name="DWGroup">
            <member name="admin"/>
        </group>
    </basicRegistry>

    Still in the server.xml file, define the administrator role used for JMX access:
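A minimal sketch of such an entry, using the Liberty administrator-role element and the DWGroup group defined in the basic registry above:

```xml
<administrator-role>
    <group>DWGroup</group>
</administrator-role>
```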


Step 4: Enable majority quorum

To enable quorum, add the property enableQuorum="true" to the xsServer entry in the server.xml file, as shown in the following example:

<xsServer catalogServer="true" catalogClusterEndpoints="${ia.clusterEndpoints}"
 listenerPort="${ia.listenerPort}" transport="XIO" serverName="${ia.serverName}" enableQuorum="true"/>

In addition, to have majority quorum, you must set the corresponding Java Virtual Machine property to 'true' in the jvm.options file (which is in the cisCatalog directory).

After the environment is configured and started, verify that quorum and majority quorum are enabled. Look for the following messages in the messages logs after starting the catalog servers:

I CWOBJ1251I: Quorum is enabled for the catalog service.
I CWOBJ0054I: The value of the "" property is "true"

Step 5: Clone the catalog servers

Clone 3 catalog servers from the prototype catalog server by running the following command from /opt/IBM/installation_directory/runtime/ia/bin, once for each catalog host:

./cloneServer cisCatalog root catalog_host_name  /opt/IBM/installation_directory/runtime

Creating and configuring the container servers

Next, create and configure the container servers.

Step 1: Create a container server prototype

From installation_directory/runtime/wlp/bin, run the following command to create a server from the cisContainer template:

./server create cisContainer --template=cisContainer

Step 2: Configure the bootstrap properties file for the container

Configure the catalog bootstrap endpoints in the container's bootstrap properties file, using 'localhost' as the host name.


During cloning, 'localhost' is automatically replaced with the relevant host name.

Step 3: Configure the SSL

For the container servers, use the same security configuration that you used for the catalog servers. Copy the resources directory of the cisCatalog prototype (for example, /opt/IBM/ODMInsights87/runtime/wlp/usr/servers/cisCatalog/resources) to the cisContainer directory of the container server prototype.

Add the keyStore definition (<keyStore id="defaultKeyStore" password="{xor}NjEsbjg3Kyw=" />) to the server.xml file of the container server prototype.

Step 4: Configure the user registry and roles

Configure the basic registry in the server.xml file in the same way that you did for the catalog server:

<basicRegistry id="basic" realm="DWRealm">
    <user name="admin" password="ins1ghts"/>
    <group name="DWGroup">
        <member name="admin"/>
    </group>
</basicRegistry>

In the server.xml file, define the administrator role used for JMX access, in the same way as for the catalog server.


For container servers you must configure the reader (get) and writer (get, post, put and delete) security roles used by the REST API, as shown in the following example:

<authorization-roles id="iaAuthorization">
    <security-role name="iaRESTWriter">
        <group name="DWGroup"/>
    </security-role>
    <security-role name="iaRESTReader">
        <group name="DWGroup"/>
    </security-role>
</authorization-roles>

Step 5: Customize the grid

Open the installation_directory/runtime/wlp/usr/servers/cisContainer/grids/objectGridDeployment.xml file.

You'll notice that the default configuration for the grid is already customized for the recommended 1 synchronous replica and 1 asynchronous replica.

Important: To avoid unnecessary balancing work when the grid is started, after starting the catalog servers and before starting the container servers run the installation_directory/runtime/ia/bin/serverManager suspendbalancing command. After starting all the container servers issue the serverManager resumebalancing command to ensure that grid balancing occurs only once during start up.

This tutorial example changes the default number of partitions from 127 to 53.

After configuration, the 'iaMaps' map set looks something like the following example:

<mapSet name="iaMaps" numberOfPartitions="53" numInitialContainers="3" minSyncReplicas="0"
 minAsyncReplicas="0" maxSyncReplicas="1" maxAsyncReplicas="1" developmentMode="false">

The 'iaPreloadMaps' map set has a similar configuration, but note that iaGrMaps has only one partition.

Step 6: Persist entities and events

By default, data is maintained in the grid and not persisted. To ensure that your data is persisted for disaster recovery purposes, complete the following steps:

  • Create a database
  • Create the Decision Server Insights database tables using the DDL supplied in your_installation_location\runtime\ia\persistence\sql\DB2\DB2Distrib.sql. Note that your DBA might want to make some changes to the DDL based on performance considerations or specific organization standards.
  • Next, complete the following steps to enable the JDBC feature and create a datasource definition in the server.xml of the cisContainer prototype:
  1. Add <feature>jdbc-4.0</feature> to the featureManager element.
  2. Remove comments from the datasource definition in the server.xml file and supply your own properties, for example:
	<dataSource jndiName="jdbc/ia/persistence">
	    <jdbcDriver>
	        <library>
	            <fileset dir="/opt/DB2/java" includes="db2jcc4.jar db2jcc_license_cisuz.jar" />
	        </library>
	    </jdbcDriver>
	    <properties.db2.jcc serverName="your_database_host" databaseName="your_database" portNumber="50000" />
	</dataSource>

Make sure you also remove comments from the ia_persistence entry:

	<ia_persistence datasourceName="jdbc/ia/persistence" maxBatchSize="10000" maxCacheAge="1000" />

Finally, configure which maps to persist in the objectgrid.xml file. The easiest way to do this is to use the template in installation_directory/runtime/ia/persistence/grids/write-behind/objectgrid.xml. This tutorial uses the write-behind template so that all maps persist asynchronously.

You might also choose to change the default batch time delay and batch size between database writes, for example:

<backingMap name="Entity.*" template="true" lockStrategy="PESSIMISTIC"
 copyMode="COPY_TO_BYTES" pluginCollectionRef="EntityPlugins"
 writeBehind="T10;C100" />

Step 7: Clone the Container server

Clone 4 container servers from the prototype container server using the following command, run from /opt/IBM/installation_directory/runtime/ia/bin:

./cloneServer cisContainer root container_host_name  /opt/IBM/installation_directory/runtime

Step 8: Create the inbound and outbound connectivity servers

Create the prototype inbound and outbound servers from the corresponding templates.

Configure the bootstrap properties file for the server with the catalog server endpoints.

Configure security in the same way that you did for the container server (you do not need to configure the REST API authorization roles).

Remove comments from the application security feature in the server.xml file.
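A sketch of the uncommented feature entry, assuming the standard Liberty application security feature (appSecurity-2.0):

```xml
<featureManager>
    <feature>appSecurity-2.0</feature>
</featureManager>
```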

Configure the inbound or outbound connectivity features as required by the solution.



This tutorial described a reference topology for Decision Server Insights. You learned the necessary WebSphere eXtreme Scale and Decision Server Insights basic concepts and important configurable elements of a grid, such as the number of partitions. The tutorial walked through a set of non-functional requirements for the topology and described the topology layout and configuration choices in terms of those requirements. Finally, you learned the installation and the configuration steps for the reference topology.


The authors would like to thank David Martin, Decision Server Insights product architect, for his review of this article. We also wish to thank David Rushall, David Granshaw and Lewis Evans for their valuable input.
