Deploy the DB2 pureScale Feature on Linux

Easy as 1-2-3

The IBM® DB2® pureScale™ Feature lets you scale out your database system by easily adding machines to your cluster. This article walks you through the process of deploying the DB2 pureScale Feature on SUSE Linux®. It uses a 10 gigabit Ethernet infrastructure in a two-server System x 3850 X5 configuration connected to a DS5100 storage controller. The article also includes post-installation steps and basic information for using DB2 pureScale, such as how to add and remove members and how to make sure you're prepared for high availability and disaster recovery.

Aslam Nomani, DB2 Quality Assurance Manager, IBM

Aslam Nomani has been with the IBM Toronto Lab for 15 years with most of that time spent with the Quality Assurance team. His main area of focus has been on high availability and disaster recovery solutions. Aslam is currently the Quality Assurance Architect for the DB2 pureScale Feature. Aslam has published over 20 papers related to high availability and disaster recovery solutions.



Pandu Mutyala (pmutyala@ca.ibm.com), DB2 System Verification Test, IBM

Pandu Mutyala holds a Masters in Information Sciences and Technology from the Missouri University of Science and Technology. He has been a member of the DB2 Quality Assurance team since 2010, and is responsible for testing of DB2 pureScale software on Linux operating systems.



Yugandhra Rayanki (yrayanki@in.ibm.com), DB2 System Verification Test, IBM

Yugandhra Rayanki has been with IBM for over 6 years, working with different teams within the DB2 Quality Assurance group. He is currently working with system verification testing for DB2 pureScale and he is an IBM DB2 Certified Advanced DBA. He has special interests in the areas of HADR, backup, and recovery.



Aimin Wu (aiminw@ca.ibm.com), DB2 Software Development, IBM

Aimin Wu is the usability focal point of the IBM System Optimization Competency Center (SOCC) including management, maintenance, and monitoring. She spent the first 10 years of her IBM career with the DB2 install team, starting with Platform Installer on UNIX and Linux. She was the architect of DB2 Install: Up & Running on UNIX and Linux before she moved to the SOCC.



07 April 2011


Introduction

In today's highly competitive marketplace you need to deploy a data processing architecture that not only meets your immediate tactical needs, but also provides the flexibility to adapt to your future strategic requirements.

In December of 2009, IBM introduced the DB2 pureScale Feature for Enterprise Server Edition (also known as the data sharing feature for ESE), which brings to open systems an active-active, shared-disk database implementation based on the proven DB2 for z/OS data-sharing architecture.

You get the following benefits when using the DB2 pureScale Feature:

Virtually unlimited capacity
You can scale out your system by easily adding additional machines to your cluster. The DB2 pureScale Feature can scale to 128 members and has a centralized management facility that allows for efficient scale-out capabilities. It uses a technology called Remote Direct Memory Access (RDMA) that provides a highly efficient inter-node communication mechanism that aids its scaling capabilities.
Application transparency
You can leverage your existing applications without changes. An application running in a DB2 pureScale environment does not need any knowledge of the different members in the cluster, or need to be concerned about partitioning data. The DB2 pureScale Feature will automatically route applications to the most appropriate members.
The DB2 pureScale Feature provides native support for a great deal of syntax used by other database vendors, allowing those applications to run in a DB2 pureScale environment with minimal or no changes. In fact, the benefits of the DB2 pureScale Feature can be achieved, in many cases, without having to modify your applications.
Continuous availability
The DB2 pureScale Feature provides an active-active architecture with inherent redundancy. If one member goes down, processing can continue on the remaining active members. During a failure, only the data being modified on the failing member is temporarily unavailable until database recovery completes for that set of data. This approach is in direct contrast to other competing solutions where an entire system freeze may occur during database recovery.
Reduced total cost of ownership (TCO)
The DB2 pureScale Feature reduces TCO because the interfaces handle the deployment and maintenance of integrated components, which reduces the steep learning curves associated with some of the competing technologies.

To better understand how the DB2 pureScale Feature offers these benefits, you should understand a bit more about the architecture. Figure 1 shows the different components of a DB2 pureScale configuration. Even though there are multiple advanced components, a significant portion of this configuration is transparent to the end user because the DB2 pureScale Feature deploys and manages these components.

Figure 1. DB2 pureScale Feature topology overview
Clients connect to database through integrated cluster services and data sharing architecture

Notice that there is a deployment of four members and two cluster caching facilities (CF). Clients can connect to any member, and the DB2 pureScale Feature can automatically load balance the clients across the different members based on machine usage. If any host in the configuration fails, the DB2 pureScale Feature will redirect clients to the active members on the remaining hosts.

Each DB2 member represents a DB2 processing engine. Up to 128 members can be deployed in a single DB2 pureScale configuration. The members cooperate with each other and the CF to provide coherent access to the database from any member. Members can be added and removed as processing demands change without any impact to clients. As discussed later in this article, members and CFs can coexist on the same physical machine.

A cluster services layer integrated with the DB2 pureScale Feature provides failure detection, recovery automation, and a clustered file system. These capabilities are built on IBM technologies optimized for DB2 software: IBM Tivoli System Automation for Multiplatforms (Tivoli SA MP), Reliable Scalable Cluster Technology (RSCT), and General Parallel File System (GPFS).

The DB2 pureScale Feature automatically deploys and configures these technologies according to a best practice pre-defined configuration. You do not need to determine how to configure the clustering technology that comes with the DB2 pureScale Feature because it is transparent to the end user.

In the DB2 pureScale configuration, the members and CFs communicate efficiently using RDMA technology. RDMA allows one machine to read or write the memory of another machine without requiring any processor cycles on the target machine. This mechanism, along with high-speed networks such as 10 gigabit Ethernet, provides a very efficient transport layer that helps the DB2 pureScale Feature scale. This configuration can also run across an InfiniBand network.

The CFs provide a scalable, centralized locking mechanism to ensure data coherency. They also act as a fast cache for DB2 pages, using RDMA technology to provide increased performance in situations where a physical disk operation might otherwise be required. The CF, along with the efficient transport layer, allows the DB2 pureScale Feature to scale easily because each member does not have to negotiate with all other members when performing a task.

Since the DB2 pureScale Feature uses a shared-disk technology, any member can read or write to any portion of the database. If any member fails, the full set of data is still accessible from the other active members.


Deploying the DB2 pureScale Feature

Configuration overview

In the following scenario, you will deploy the DB2 pureScale Feature V9.8.3 on two physical System x 3850 X5 machines. For a list of other supported server models, refer to the DB2 documentation.

Each physical machine has the following characteristics:

  • It exists on a public network that allows for client connectivity.
  • It has a 10 gigabit Ethernet card for high-speed, low-latency communication between members and CFs. The 10 gigabit Ethernet also allows for RDMA over Ethernet.
  • It has shared connectivity to a common set of disks.

Figure 2 shows a typical configuration of the main hardware components of a DB2 pureScale Feature deployment.

Figure 2. Sample DB2 pureScale Feature hardware configuration
Clients connect over ethernet to X3850x5 joined by 10gbe switch, from there through SAN to DS5100

Table 1 lists the high-level configurations of each physical node.

Table 1. Configuration overview
Hostname: coralinst07 / coralinst08
OS level (both hosts): SUSE Linux Enterprise Server 10 SP3 (x86_64), Linux kernel 2.6.16.60-0.69.1-smp
Server type: coralinst07 is Member 0 + primary CF; coralinst08 is Member 1 + secondary CF
Cores (per host): 8
RAM (per host): 64 GB
BIOS firmware (both hosts): Version -[G0E122DUS-1.23]- (required for X5 only (x3850))
Shared disks (common to both hosts):
  /dev/sdd - disk to hold shared DB2 instance files
  /dev/sde - disk leveraged for DB2 data
  /dev/sdf - disk leveraged for DB2 transaction logs
  /dev/sdg - disk leveraged by the DB2 cluster services layer
  Note: Disk sizes vary based on specific requirements.
Disk device driver (both hosts): Linux RDAC driver package for kernel 2.6, version 09.03.0C05.0439
Ethernet interface (both hosts): eth0
10 gigabit Ethernet card firmware (both hosts): 2.7.700 (fw-25408-2_7_700-DB2_59Y1905.bin) from mellanox.com
10 gigabit Ethernet interface hostname: coralinst07-10ge / coralinst08-10ge
10 gigabit Ethernet interface (both hosts): eth4
OpenSSH (both hosts): openssh-4.2p1-18.40.35
OFED (both hosts): OFED-IBM-DB2-pureScale-PTF-1.5.2-4.1404.1.PTF.604678

For information on how to configure the 10 gigabit Ethernet, refer to Appendix A, or refer to Appendix B to learn more about deploying an InfiniBand network.


DB2 pureScale Feature pre-installation steps

Unless otherwise specified, the commands listed in these steps are run as a user with root privileges.

  1. Ensure that passwordless SSH is set up at the root level between all the physical machines participating in the DB2 pureScale cluster. You can validate the SSH configuration by issuing the following command from each machine to every other machine in the cluster, and ensuring that it returns the valid hostname without any prompting:
    # ssh <target machine> hostname

    Instance level SSH will be set up by the installer during the instance setup.

  2. Ensure there is at least 10 GB of free space in both the /tmp and /var file systems on each machine.
  3. Make sure that the following packages are installed as part of the OS installation:
    cpp, gcc, gcc-c++, kernel-source, binutils, and libstdc++ (both the 32-bit and 64-bit packages).
  4. Identify the disks to be used for the DB2 pureScale Feature, and ensure that each disk is tagged with a WWID/WWN and that the IDs are the same across all nodes. A small verification sketch follows this list.

    You can use the fdisk -l command to list all physical volumes available on a machine, along with sizes of the disks. The following example shows output from that command:


    Disk /dev/sdd: 214.7 GB, 214749020160 bytes
    255 heads, 63 sectors/track, 26108 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes
  5. On systems with the IBM RDAC driver, do the following:
    1. Determine the LUN mapping using the lsvdev command:
      coralinst07:~ # /opt/mpp/lsvdev
      
      Array Name      Lun    sd device
      -------------------------------------
      DS5300SVT1      0     -> /dev/sdc
      DS5300SVT1      1     -> /dev/sdd
      DS5300SVT1      2     -> /dev/sde
      DS5300SVT1      3     -> /dev/sdf
      DS5300SVT1      4     -> /dev/sdg
    2. Get a list of Storage Arrays seen by the host:
      coralinst07:~ # /usr/sbin/mppUtil -a
      Hostname    =  coralinst07
      Domainname  = N/A
      Time        = GMT 08/06/2010 16:27:59
      
      ---------------------------------------------------------------
      Info of Array Modules seen by this Host.
      ---------------------------------------------------------------
      ID              WWN                      Type     Name
      ---------------------------------------------------------------
      0      600a0b800012abc600000000402756fc FC     FASTSVT1
      1      600a0b800047bf3c000000004a9553b8 FC     DS5300SVT1
      ---------------------------------------------------------------
    3. Get a list of WWN of the disks associated to the LUNS:
      coralinst07:# mppUtil -a DS5300SVT1 | awk '/WWN/' | grep -v Restore
      Lun #0 - WWN: 600a0b800047bf3c0000803e4baca3df
      Lun #1 - WWN: 600a0b800047b9ca00008bec4baca3e1
      Lun #2 - WWN: 600a0b800047bf3c000080404baca406
      Lun #3 - WWN: 600a0b800047b9ca00008bee4baca406
    4. Get a list of the WWIDs of the disks. The WWID is the same as the WWN except for the first digit:
      coralinst07# cd /dev/disk/by-id
      scsi-3600a0b800047b9ca00008bec4baca3e1 -> ../../sdd
      scsi-3600a0b800047bf3c000080404baca406 -> ../../sde
      scsi-3600a0b800047b9ca00008bee4baca406 -> ../../sdf 
      scsi-3600a0b800047bf3c0000803e4baca3df -> ../../sdg
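
The following is a minimal sketch, assuming bash and the two host names used in this article, that checks the required packages, passwordless root SSH between every pair of hosts, and the consistency of the WWID-to-device mappings across hosts. Adapt the host list and package names to your environment.

  #!/bin/bash
  # Hypothetical pre-installation check; the host names are the ones used in
  # this article and the package list reflects step 3 above.
  HOSTS="coralinst07 coralinst08"

  for h in $HOSTS; do
      echo "=== Required packages on $h ==="
      ssh "$h" "rpm -q cpp gcc gcc-c++ kernel-source binutils"

      echo "=== Passwordless SSH from $h to every host ==="
      for t in $HOSTS; do
          # Each pair must return the target host name without any prompting.
          ssh "$h" "ssh -o BatchMode=yes $t hostname" || echo "SSH check $h -> $t FAILED"
      done

      echo "=== WWID-to-device mapping on $h (should match on all hosts) ==="
      ssh "$h" "ls -l /dev/disk/by-id | grep scsi-3"
  done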

DB2 pureScale Feature installation steps

  1. Identify one of the nodes as the installation-initiating host (IIH). This host will run the installation program for the DB2 pureScale Feature. Ensure that the display is set to show graphical tools. In this case, you are using coralinst07 as the IIH.
  2. To launch the DB2 graphical installer, go to the ese_dsf folder on the downloaded product image, or to the root directory of the product installation DVD, and run the db2setup command:
    #./db2setup -t /tmp/db2setup.trc -l /tmp/db2setup.log
  3. From the welcome screen, as shown in Figure 3, you can view documentation before you install DB2 pureScale. For example, the Architecture Overview topic provides a high-level overview of the DB2 pureScale instance environment. To install the DB2 pureScale Feature, click Install a Product on the left pane.
    Figure 3. DB2 setup launchpad
    Welcome to DB2 Version 9.8
  4. From the Install a Product as root screen, click Install New, as shown in Figure 4.
    Figure 4. DB2 setup launchpad - Install a product
    install a product as root
  5. The DB2 Setup wizard provides a set of intuitive instructions to help you navigate through the remainder of the deployment, as shown in Figure 5. Click Next to continue the installation. You can optionally click View Features to see which features will be installed.
    Figure 5. DB2 setup wizard
    Welcome to the DB2 setup wizard
  6. Read the license agreement, as shown in Figure 6. Click Accept if you accept the licensing terms, and then click Next.
    Figure 6. Software License Agreement
    Welcome to the DB2 setup wizard
  7. As shown in Figure 7, you can choose to install the pureScale Feature immediately, set up a response file to install it later, or install it immediately and keep your settings in a response file (a response-file-driven silent install is sketched after these installation steps). For this example you want to install it immediately without creating a response file, so select DB2 Enterprise Server Edition with the pureScale Feature, and then click Next.
    Figure 7. Installation type
    Select installation, response file creation, or both
  8. From the Select the installation directory screen shown in Figure 8, specify the installation directory for the DB2 binaries. In most cases, you can accept the default path. Click Next after determining the installation path.
    Figure 8. Installation directory
    Select the installation directory
  9. From the Set up a DB2 Instance screen, as shown in Figure 9, you can either create a DB2 instance, or wait until after the install is done. For this example you want to create an instance, so select Create a DB2 instance and click Next.
    Figure 9. Set up a DB2 instance
    Set up a DB2 instance
  10. From the Set user information for the DB2 instance owner screen, as shown in Figure 10, type the information for the instance owner, and then click Next. If an existing user is selected as the DB2 instance owner or the DB2 fenced user, the user must exist on all the hosts with the same UID, GID, group name, and $HOME path. The hosts should not share the instance owner's $HOME directory because it will be local to each host. If you choose to create a new user, that user must not already exist on any of the hosts.
    Figure 10. Instance owner information
    Set user information for the DB2 instance owner
  11. From the Set user information for the fenced user screen, as shown in Figure 11, enter the information about the fenced user and click Next.
    Figure 11. Fenced user information
    Set user information for the fenced user
  12. From the Set up a DB2 Cluster File System screen, as shown in Figure 12, you will use one of the predefined disks to create a shared file system used by the DB2 pureScale instance environment for instance files that are shared across all machines. The file system will be mounted as /db2sd<timestamp> and the directory /db2sd<timestamp>/<instance_name> will be the default database path (as defined by the DFTDBPATH configuration parameter).

    Another small disk will be leveraged for automatic internal cluster recovery purposes. Provide the full path to the disks and then click Next.

    Figure 12. DB2 Cluster File System setup
    Set up a DB2 cluster file system
  13. From the Host List screen, as shown in Figure 13, add the remaining hosts that should be part of the DB2 pureScale cluster. By default, the IIH will already be included. For each host that you need to add, click Add.
    Figure 13. Host list selection
    Screen cap shows the host list
  14. After you click Add, as shown in Figure 14, you are prompted to enter a host name (the output when running the hostname command). Type the host name and then click OK. Repeat this step to add each host that you need.
    Figure 14. Remote host name input
    shows popup for entering host name
    After each additional host is entered, the DB2 installation program will validate each new host, as shown in Figure 15.
    Figure 15. Installation settings progress indicator
    Validation of installation settings in progress
  15. After you add all the hosts, you will see the list of hosts, as shown in Figure 16. A check mark beside each host confirms that it has been validated. When you are satisfied with the configuration identified in the bottom portion of the window, click Next. At this time, the DB2 installation program will do one additional validation of the passwordless SSH configuration.
    Figure 16. Host list confirmation
    shows host list

    If there is more than one physical machine in the configuration, the installation wizard will automatically assign each CF to a different machine.

  16. If you want to have both a member and a CF installed on the same host, from the Host List screen, click the Advanced button, and then click Manually assign Cluster Caching Facilities (CF), as shown in Figure 17.
    Figure 17. CF assignment
    Advanced screen

    As shown in Figure 18, under Preferred Primary CF, click Configure the host as both a CF and a DB2 member. Similarly, you can select the check box under Preferred Secondary CF to create a secondary CF, and a DB2 member, on a different host in the cluster.

    Figure 18. Preferred primary and secondary CF assignment
    advanced screen showing preferred primary and secondary CF assignment

    You should have two CFs identified on two distinct physical machines to avoid creating a single point of failure.

  17. After the DB2 installation program validates the passwordless SSH, it will show a summary of the inputs before proceeding with the installation, as shown in Figure 19. If you are satisfied with the inputs, click Finish to start the installation.
    Figure 19. Installation settings confirmation
    start copying files
  18. During the actual installation, a progress monitor bar shows you the installation progress, as shown in Figure 20. This step will take several minutes because the DB2 pureScale Feature and the selected components are being deployed to all hosts in the configuration.
    Figure 20. Installation progress indicator
    shows progress monitor bar
  19. Upon successful completion of the DB2 pureScale Feature installation, the Setup has completed successfully screen provides additional information on where to find log and setup files, as shown in Figure 21. Click Finish.
    Figure 21. Installation completion
    Setup has completed successfully
    The installation of the DB2 pureScale Feature across both hosts is complete with a DB2 instance ready for your use.
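
If you chose to save your settings in a response file in step 7, you can repeat the same deployment later without the graphical wizard. The following is a minimal sketch, assuming the wizard saved a response file named /tmp/db2dsf.rsp; the file name is an assumption for illustration.

  # Hypothetical silent install driven by the response file saved by the wizard
  # (/tmp/db2dsf.rsp is a placeholder name).
  ./db2setup -r /tmp/db2dsf.rsp -l /tmp/db2setup.log -t /tmp/db2setup.trc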

DB2 pureScale Feature post-installation steps

With a DB2 pureScale Feature for Enterprise Server Edition instance ready for use, you should create a file system to use for the data and the logs. You can create a file system using the db2cluster command. As root, complete the following steps:

  1. Create one file system for data and one file system for logs:
    #<DB2 Install Path>/bin/db2cluster -cfs -create -filesystem data -disk /dev/sde
    #<DB2 Install Path>/bin/db2cluster -cfs -create -filesystem log -disk /dev/sdf

    The DB2 Install Path in this deployment would be /opt/ibm/db2/V9.8. The data and log file systems will be created under /db2fs by default, and will be accessible on all hosts in the DB2 pureScale instance.

  2. Modify the owner of the file system to be the DB2 instance owner so it has full access to this file system. In this case, db2sdin1 is the instance owner's name and db2iadm1 is the instance owner's group name.
    #chown db2sdin1:db2iadm1 /db2fs/data
    #chown db2sdin1:db2iadm1 /db2fs/log
  3. Start the DB2 instance by issuing the db2start command. You can see the state of the DB2 pureScale instance at any point by using the db2instance command.
    > db2start
    04/19/2010 11:02:08 0 0 SQL1063N DB2START 
    processing was successful.
    04/19/2010 11:02:08 1 0 SQL1063N DB2START 
    processing was successful.
    SQL1063N DB2START processing was successful.

    You can view the state of a DB2 pureScale cluster using the db2instance -list command.
  4. Create the database and move the logs to the log file system. The following commands must be run from member hosts, not CF hosts.
    > db2 create db testdb on /db2fs/data
    > db2 update db cfg for testdb using newlogpath /db2fs/log
  5. Catalog client connections to any active pureScale member and connect to the database, as sketched below.
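
The following is a minimal sketch of cataloging the database from a remote DB2 client and connecting to it. The node name (node07), port number (50000), and user are assumptions for illustration; use the service port (SVCENAME) configured for your instance.

  # Hypothetical client-side catalog and connection test; node07 and port
  # 50000 are placeholders.
  db2 catalog tcpip node node07 remote coralinst07 server 50000
  db2 catalog db testdb at node node07
  db2 terminate

  db2 connect to testdb user db2sdin1
  db2 "select count(*) from syscat.tables"
  db2 connect reset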

Using the DB2 pureScale Feature

There are many advantages to the DB2 pureScale Feature, and this section provides use cases that demonstrate that added value. The simple deployment of the DB2 pureScale Feature has already demonstrated how it can help reduce total cost of ownership.

Adding and removing members

The DB2 pureScale Feature lets you add members to the configuration quickly and without any data redistribution requirements. The DB2 installation binaries are automatically stored on the IIH, so you do not require access to the original installation media when members are being added. You can add a member by stopping the instance and running the following command from the IIH:

db2iupdt -d -add -m ServerX:ServerX-10ge db2sdin1

Similarly, a member can be removed by running the following command from the IIH:

db2iupdt -d -drop -m ServerX:ServerX-10ge db2sdin1
Members can be started or quiesced transparently, so that applications are unaware that a change has occurred.
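
The following is a minimal sketch of the complete add-member sequence, run as root from the IIH. The new host name coralinst09 (and its 10 gigabit Ethernet host name) and the db2iupdt location under the installation path are assumptions for illustration.

  # Hypothetical sequence for adding a member on a new host; coralinst09 and
  # coralinst09-10ge are placeholder names.
  su - db2sdin1 -c "db2stop"                       # stop the instance
  /opt/ibm/db2/V9.8/instance/db2iupdt -d -add -m coralinst09:coralinst09-10ge db2sdin1
  su - db2sdin1 -c "db2start"                      # restart the instance
  su - db2sdin1 -c "db2instance -list"             # confirm the new member is active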

Automatic workload balancing

The DB2 pureScale Feature provides the ability to dynamically distribute a workload across all the active members, based on the utilization characteristics of the different machines. Multi-threaded CLI applications have connection-level workload balancing by default, without any changes. This workload balancing can be modified so that it applies at the transaction level rather than the connection level. For multi-threaded Java applications, you can set enableSysplexWLB=true in the connection string to take advantage of transaction-level workload balancing.
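
For illustration only, a connection string with transaction-level workload balancing enabled looks like the following (shown here as a shell variable; the host name, port, and database name are assumptions for your environment):

  # Hypothetical JDBC URL with Sysplex workload balancing enabled;
  # coralinst07, port 50000, and testdb are placeholders.
  DB2_JDBC_URL="jdbc:db2://coralinst07:50000/testdb:enableSysplexWLB=true;"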

As additional members are started, clients will automatically route to the new members without any interruption of service. Also, members can be stopped, as described in the Stealth maintenance section, without the application knowing this operation has even occurred.

You can also configure clients to prefer connecting to a particular member. This feature is referred to as client affinity and can be beneficial if a partitioned workload already exists.

To take advantage of DB2 pureScale features such as transaction-level workload balancing or client affinity, the minimum client level should be DB2 9.7 Fix Pack 3 or the corresponding JCC level. To correlate the JCC levels included in the various fix pack levels, see the Resources section for more details.

Stealth maintenance

In many cases it is critical to apply maintenance to a system without any negative impact to the client applications. Stealth maintenance lets all transactions on a member complete and then transparently routes those applications to another member. For example, to drain member 1, you can run the following command:

db2stop member 1 quiesce

In some cases, a user session may have started a unit of work (UOW) that has not yet been committed or rolled back. Unless a timeout value is specified, the db2stop quiesce will wait for that UOW to complete before stopping the member. For situations like this, you can specify a timeout value, for example ten minutes, which allows the application ten minutes to complete the UOW. If the UOW has not completed after ten minutes, the DB2 software automatically forces off that application. Any applications that complete within the ten minutes are automatically re-routed to the active members when they complete their UOW. To drain member 1 with a ten-minute timeout, you can run the following command:

db2stop member 1 quiesce 10
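
The following is a minimal sketch of a complete stealth-maintenance cycle on member 1, run as the instance owner; the ten-minute timeout follows the example above.

  # Drain member 1, allowing up to 10 minutes for in-flight units of work.
  db2stop member 1 quiesce 10

  # Verify that member 1 has stopped while the rest of the cluster stays active.
  db2instance -list

  # ... apply maintenance on the member's host ...

  # Bring the member back; clients are transparently rebalanced onto it.
  db2start member 1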

High availability

One of the significant value propositions of the DB2 pureScale Feature is the high availability characteristics integrated into the architecture. All necessary resources are automatically monitored by the DB2 pureScale cluster services and restarted as needed. Applications that are connected to a failing member will automatically be re-routed to an active member where the application can re-issue any failed transactions. Applications connected to a non-failing component will not be impacted.

One of the distinguishing factors of the DB2 pureScale Feature compared to most competing technologies is that no cluster-wide freeze occurs when a member fails. In fact, only data in the process of being updated on the failing member is temporarily unavailable until recovery is complete. Applications on active members trying to access the locked data on the failing member will be briefly in a lock-wait state, and by default will not receive any errors. The recovery will be completed quickly so that data availability through a member failure will look similar to the hypothetical representation shown in Figure 22.

Figure 22. Typical data availability pattern during member recovery
Graph shows only data in-flight updates locked during recovery

Disaster recovery

While the DB2 pureScale Feature inherently provides a local high availability solution, many customers will also require a disaster recovery solution to meet their business continuity requirements. The DB2 pureScale Feature can use remote disk mirroring technology and is also designed to work with database replication products, as shown in Figure 23.

Figure 23. Typical disaster recovery setup
diagram, production instance site A mirrored to DR instance site B

If the entire primary site running a DB2 pureScale instance fails, the remote site can be used to allow business operations to continue. The DB2 pureScale Feature can also use the traditional database backup, restore, and roll-forward functionality for disaster recovery solutions, as sketched below.
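
The following is a minimal sketch of that backup-based approach, using the testdb database created earlier; the backup path is an assumption for illustration.

  # On the production site: take an online backup (including logs) to a
  # location that can be shipped to the DR site (/backup is a placeholder).
  db2 backup db testdb online to /backup include logs

  # On the DR site: restore the image and roll forward through the shipped logs.
  db2 restore db testdb from /backup
  db2 rollforward db testdb to end of logs and stop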


Conclusion

Using CF and RDMA technologies, the DB2 pureScale Feature for Enterprise Server Edition provides a database solution that can scale effectively to meet the growing and dynamic needs of different organizations, including your most demanding customers. Additional members can be added to the DB2 pureScale environment to meet the demands of peak processing times without any impact to existing applications.

The DB2 pureScale Feature automatically balances the workload across all DB2 members in the cluster without any application changes, taking full advantage of the additional processing capacity. If a DB2 member fails, applications will be automatically routed among the other active members. When the failed member host returns, applications will be transparently routed to the restarted member.

The DB2 pureScale Feature's design and capabilities can help reduce total cost of ownership compared to other solutions, allowing for a simplified deployment and maintenance model. The installation of the DB2 pureScale Feature manages the deployment and configuration of all the bundled software components to all the hosts in the DB2 pureScale environment. Once the DB2 pureScale Feature environment is up and running, its operating status is monitored and maintained easily from any of the active members.

Contributors

We would like to acknowledge the following additional contributors to this article:

  • Serge Boivin, DB2 LUW Information Development
  • Jason Shayer, DB2 LUW Information Development
  • Matthew Huras, Lead Architect, DB2 LUW

Appendix A: Configuring pureScale cluster to support RDMA over Ethernet (RoCE)

As discussed previously in this article, the DB2 pureScale Feature leverages a 10 gigabit Ethernet network for RDMA over Ethernet to allow optimal communication between members and CF.

RDMA over Ethernet is an alternative to InfiniBand, and is supported for System X / Linux implementations of pureScale. RDMA over Ethernet is an efficient and light-weight transport layered directly over Ethernet.

This appendix describes the high-level steps to deploy a 10 gigabit Ethernet network (for cluster inter-connect) to support RDMA over Converged Ethernet (RoCE).

  1. Deploy a 10 gigabit Ethernet switch that supports priority-based flow control and lossless Ethernet, similar to a typical switch deployment.
  2. In this configuration, a single 10 gigabit Ethernet card (MT26448, with two ports) is attached to each node in the cluster.
  3. Use SFP+ cables to connect the switch and hosts.
  4. Install OFED 1.5.2 (OFED-IBM-DB2-pureScale-PTF-1.5.2-4.1404.2.PTF.604678). OFED (Open Fabrics Enterprise Distribution) is a device driver package for Linux that installs VERBS, Utils, uDAPL, RDMA CM, Mellanox components, and others. For detailed instructions on installing OFED, please refer to the 9.8.0.3 Information Center and corresponding tech note.
  5. Edit the network scripts as follows under /etc/sysconfig/network/ for each node to assign an IP for each interface, or use YaST to configure interfaces.
    BOOTPROTO='static'
    BROADCAST=''
    ETHTOOL_OPTIONS=''
    IPADDR='192.168.1.103/24'
    MTU=''
    NAME='MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]'
    NETWORK=''
    REMOTE_IPADDR=''
    STARTMODE='auto'
    If you do not configure this, you will get a DAT_INVALID_ADDR error.
  6. Edit the /etc/hosts file on each node for routing purposes, similar to the following:
    #10ge Network
    192.168.1.100 coralinst07-10ge
    192.168.1.101 coralinst08-10ge
  7. On each node ensure the /etc/dat.conf file has a format similar to the following:
    ofa-v2-roe u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth4 0" ""
    Here, ofa-v2-roe is the device name that is used to make a connection, dapl.2.0 is the uDAPL 2.0 library, and eth4 is the interface name. If you do not configure this, you will get a DAT_INTERNAL_ERROR.
  8. Run /etc/init.d/openibd restart to load the modules.
  9. Ensure the openibd is running using chkconfig -a openibd, and all the modules and the devices are loaded using service openibd status.

You can validate the state of RoCE on each node by running the ibv_devinfo or ibstatus commands as a root user. Verify that the configured ports are active and that the link is up.

For example:

port:   1
	state:                  PORT_ACTIVE (4)
	max_mtu:                2048 (4)
	active_mtu:             1024 (3)
	sm_lid:                 0
	port_lid:               0
	port_lmc:               0x00
	link_layer:             Ethernet

Also, you should perform a ping test using the addresses and hostnames defined in the /etc/hosts file. You can also run dtest, a utility that comes with OFED.

Run dtest -P ofa-v2-roe on host 1, and check that you can see a listening connection; then, from host 2, run dtest -P ofa-v2-roe -h <host1-10ge>. You should see that the connection has PASSED.
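
The following is a minimal sketch that consolidates these validation steps, run as root; the host names are the ones used in this article and are assumptions for your environment.

  # RoCE validation sketch for this article's hosts (placeholders for yours).
  ibstatus                                   # link should be up
  ibv_devinfo | grep -E 'state|link_layer'   # expect PORT_ACTIVE and Ethernet

  ping -c 3 coralinst08-10ge                 # reachability over the 10GbE network

  # dtest: start the listening side on coralinst07 ...
  dtest -P ofa-v2-roe &
  # ... then, from coralinst08, connect back:
  #   dtest -P ofa-v2-roe -h coralinst07-10ge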

If the host where the CF or the member resides has more than 64 GB of memory, then the Mellanox HCA driver (mlx4_core) module's parameter log_mtts_per_seg must be increased from 3 (the default) to 7 for larger memory registrations.

To increase the size, issue the following command as root:

  1. On SUSE:
    echo "options mlx4_core log_mtts_per_seg=7" >> /etc/modprobe.conf.local
  2. On RHEL:
    echo "options mlx4_core log_mtts_per_seg=7" >> /etc/modprobe.conf

For this change to take effect, you must reboot the server. To check if your change is effective on the module, enter:
<host-name>/sys/module/mlx4_core/parameters # cat /sys/module/mlx4_core/parameters/log_mtts_per_seg

Subnet manager is not required for RoCE, but an IP should be configured on the switch so that RSCT/TSA can monitor the associated network resources.

The following example shows you how you can make the Switch IP a default gateway to the RoCE network.

Assign an IP on the RoCE switch, and use route add, or use the following code:

coralinst07:~ # route add -net 192.168.1.0 netmask 255.255.255.0 gw 192.168.1.3

This route is not persistent. To make it permanent, append a line such as 192.168.1.0 192.168.1.3 eth-id-00:02:c9:08:28:10 to the /etc/sysconfig/network/routes file, and query it using the route command. This requires a network service restart, as follows: /etc/init.d/network restart.

coralinst07:~ # route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.1.0     192.168.1.3     255.255.255.0   UG    0      0        0 eth2
192.168.1.0     *               255.255.255.0   U     0      0        0 eth2
9.26.92.0       *               255.255.252.0   U     0      0        0 eth0
loopback        *               255.0.0.0       U     0      0        0 lo
default         rsb-v94-hsrp.to 0.0.0.0         UG    0      0        0 eth0

Appendix B: Configuring pureScale cluster to support InfiniBand

InfiniBand is a switched fabric communication link used for high-speed communication. Its features include high throughput, low latency, and point-to-point bidirectional links. Like RoCE, InfiniBand also requires OFED.

This appendix shows you the high-level steps to deploy InfiniBand.

  1. Deploy an IB switch (Mellanox IS5030 with 36 ports). You will need to enable the subnet manager (SM) on the switch, which also needs a separate license to activate it.
  2. In this configuration, a single MT26428 IB card with two ports is attached to each node in the cluster.
  3. QSFP cables should be used to connect to the switch and the hosts.
  4. Install OFED 1.5.2 (OFED-IBM-DB2-pureScale-PTF-1.5.2-4.1404.2.PTF.604678). OFED (Open Fabrics Enterprise Distribution) is a device driver package for Linux which installs VERBS, Utils, uDAPL, RDMA CM, Mellanox components, and so on.
  5. Edit the /etc/dat.conf to add the following lines in the configuration file:
    ofa-v2-ib0 u2.0 nonthreadsafe default 
    libdaplofa.so.2 dapl.2.0 "ib0 0" ""

    Here, ofa-v2-ib0 is the IB device name to make a connection, and dapl.2.0 is uDAPL. If you do not configure dat.conf, then you will get DAT_INTERNAL_ERROR.
  6. Edit /etc/sysconfig/network/ifcfg-ib0 to configure the static IP for the ib0 interface. For example, for ifcfg-ib0, use the following:
    DEVICE=ib0
    BOOTPROTO='static'
    IPADDR='10.1.1.154'        # replace with the IP address the IB port is to use
    NETMASK='255.255.255.0'    # change if the IB addresses are in a larger range than class C
    STARTMODE='onboot'
    WIRELESS='no'
  7. Add the netname and the IP configured in /etc/hosts on all the machines in the cluster. For example:
    10.1.1.130 host130-ib0
    10.1.1.131 host131-ib0
  8. Run /etc/init.d/openibd restart to load the modules.
  9. Ensure that the openibd is running using chkconfig -a openibd, and that all the modules and devices are loaded using service openibd status.

You can validate the state of InfiniBand on each node by running the ibv_devinfo or ibstatus commands as a root user. Verify that the configured ports are active and that the link is up.

For Example, for port 1:

state:                  PORT_ACTIVE (4)
max_mtu:                2048 (4)
active_mtu:             2048 (4)
sm_lid:                 1
port_lid:               21
port_lmc:               0x00
link_layer:             IB

You should also perform a ping test using the addresses and hostnames defined in the /etc/hosts file.
You should also run dtest, which is a utility that comes with OFED. Run dtest -P ofa-v2-ib0 on host 1, and ensure you see a listening connection. From host2, run dtest -P ofa-v2-ib0 -h <host1-ib0>. You will see the connection has PASSED.

If the host where the CF or the member resides has more than 64 GB of memory, the Mellanox HCA driver (mlx4_core) module's parameter log_mtts_per_seg must be increased from 3 (the default) to 7 for larger memory registrations.

To increase the size, issue the following command as root:

  1. On SUSE:
    echo "options mlx4_core log_mtts_per_seg=7" >> /etc/modprobe.conf.local
  2. On RHEL:
    echo "options mlx4_core log_mtts_per_seg=7" >> /etc/modprobe.conf

For this change to take effect, you must reboot the server. To check if your change is effective on the module, run:

<host-name>/sys/module/mlx4_core/parameters # cat /sys/module/mlx4_core/parameters/log_mtts_per_seg

Resources

Learn

Get products and technologies

  • Build your next development project with IBM trial software, available for download directly from developerWorks.
  • Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, use a product in a cloud environment, or spend a few hours in the SOA Sandbox learning how to implement Service Oriented Architecture efficiently.

Discuss
