In today's highly competitive marketplace, you need to deploy a data processing architecture that not only meets your immediate tactical needs but also provides the flexibility to adapt to your future strategic requirements.
In December 2009, IBM introduced the DB2 pureScale Feature for Enterprise Server Edition (also known as the data sharing feature for ESE), which brings shared-disk clustering to open systems through an active-active shared-disk database implementation based on the DB2 for z/OS data-sharing architecture.
You get the following benefits when using the DB2 pureScale Feature:
- Virtually unlimited capacity: You can scale out your system by adding machines to your cluster. The DB2 pureScale Feature can scale to 128 members and has a centralized management facility that allows for efficient scale-out. It uses Remote Direct Memory Access (RDMA), a highly efficient inter-node communication mechanism that aids its scaling capabilities.
- Application transparency: You can use your existing applications without changes. An application running in a DB2 pureScale environment does not need any knowledge of the different members in the cluster, and does not need to be concerned about partitioning data. The DB2 pureScale Feature automatically routes applications to the most appropriate members. It also provides native support for a great deal of syntax used by other database vendors, allowing those applications to run in a DB2 pureScale environment with minimal or no changes.
- Continuous availability: The DB2 pureScale Feature provides an active-active architecture with inherent redundancy. If one member goes down, processing can continue on the remaining active members. During a failure, only the data being modified on the failing member is temporarily unavailable until database recovery completes for that set of data. This approach is in direct contrast to competing solutions in which the entire system may freeze during database recovery.
- Reduced total cost of ownership (TCO): The DB2 pureScale Feature reduces TCO because its interfaces handle the deployment and maintenance of the integrated components, which reduces the steep learning curves associated with some competing technologies.
To better understand how the DB2 pureScale Feature offers these benefits, you should understand a bit more about the architecture. Figure 1 shows the different components of a DB2 pureScale configuration. Even though there are multiple advanced components, a significant portion of this configuration is transparent to the end user because the DB2 pureScale Feature deploys and manages these components.
Figure 1. DB2 pureScale Feature topology overview
Figure 1 shows a deployment of four members and two cluster caching facilities (CFs). Clients can connect to any member, and the DB2 pureScale Feature can automatically balance the client load across the members based on machine usage. If any host in the configuration fails, the DB2 pureScale Feature redirects clients to the active members on the remaining hosts.
Each DB2 member represents a DB2 processing engine. Up to 128 members can be deployed in a single DB2 pureScale configuration. The members cooperate with each other and the CF to provide coherent access to the database from any member. Members can be added and removed as processing demands change without any impact to clients. As discussed later in this article, members and CFs can coexist on the same physical machine.
The DB2 pureScale Feature integrates a cluster services layer that provides failure detection, recovery automation, and a clustered file system. This layer is built on IBM technologies optimized for DB2 software: IBM Tivoli System Automation for Multiplatforms (Tivoli SA MP), Reliable Scalable Cluster Technology (RSCT), and General Parallel File System (GPFS).
The DB2 pureScale Feature automatically deploys and configures these technologies according to a predefined best-practice configuration. Because this clustering layer is transparent to the end user, you do not need to work out how to configure it yourself.
In the DB2 pureScale configuration, the members and CFs communicate efficiently using RDMA. RDMA allows one machine to read or write the memory of another machine without consuming processor cycles on the target machine. This mechanism, combined with a high-speed network such as 10 gigabit Ethernet, gives the DB2 pureScale Feature a very efficient transport layer on which to scale. The configuration can also run over an InfiniBand network.
The CF provides a scalable, centralized locking mechanism to ensure data coherency. It also acts as a fast cache for DB2 pages, using RDMA to improve performance in situations where a physical disk operation might otherwise have been required. The CF, together with the efficient transport layer, lets the DB2 pureScale Feature scale easily because a member does not have to negotiate with all other members when performing a task.
Since the DB2 pureScale Feature uses a shared-disk technology, any member can read or write to any portion of the database. If any member fails, the full set of data is still accessible from the other active members.
Deploying the DB2 pureScale Feature
In the following scenario, you will deploy the DB2 pureScale Feature V9.8 Fix Pack 3 on two physical IBM System x3850 X5 machines. For a list of other supported server models, refer to the DB2 documentation.
Each physical machine has the following characteristics:
- It exists on a public network that allows for client connectivity.
- It has a 10 gigabit Ethernet card for high-speed, low-latency communication between members and CFs. The 10 gigabit Ethernet also allows for RDMA over Ethernet.
- It has shared connectivity to a common set of disks.
Figure 2 shows a typical configuration of the main hardware components of a DB2 pureScale Feature deployment.
Figure 2. Sample DB2 pureScale Feature hardware configuration
Table 1 lists the high-level configurations of each physical node.
Table 1. Configuration overview
| Hostname | coralinst07 | coralinst08 |
|---|---|---|
| OS level | SUSE Linux Enterprise Server 10 SP3 (x86_64), Linux kernel 2.6.16.60-0.69.1-smp | SUSE Linux Enterprise Server 10 SP3 (x86_64), Linux kernel 2.6.16.60-0.69.1-smp |
| Server type | Member 0 + primary CF | Member 1 + secondary CF |
| Cores | 8 | 8 |
| RAM | 64 GB | 64 GB |
| BIOS firmware | Version -[G0E122DUS-1.23]- (required for x3850 X5 only) | Version -[G0E122DUS-1.23]- (required for x3850 X5 only) |
| Shared disks | /dev/sdd: shared DB2 instance files; /dev/sde: DB2 data; /dev/sdf: DB2 transaction logs; /dev/sdg: DB2 cluster services layer. Disk sizes vary based on specific requirements. | Same shared disks as coralinst07 |
| Disk device driver | Linux RDAC driver package for kernel 2.6, 09.03.0C05.0439 | Linux RDAC driver package for kernel 2.6, 09.03.0C05.0439 |
| Ethernet interface | eth0 | eth0 |
| 10 gigabit Ethernet card firmware | 2.7.700 (fw-25408-2_7_700-DB2_59Y1905.bin) from mellanox.com | 2.7.700 (fw-25408-2_7_700-DB2_59Y1905.bin) from mellanox.com |
| 10 gigabit Ethernet interface hostname | coralinst07-10ge | coralinst08-10ge |
| 10 gigabit Ethernet interface | eth4 | eth4 |
| OpenSSH | openssh-4.2p1-18.40.35 | openssh-4.2p1-18.40.35 |
| OFED | OFED-IBM-DB2-pureScale-PTF-1.5.2-4.1404.1.PTF.604678 | OFED-IBM-DB2-pureScale-PTF-1.5.2-4.1404.1.PTF.604678 |
For information on configuring the 10 gigabit Ethernet network, refer to Appendix A; to deploy an InfiniBand network instead, refer to Appendix B.
DB2 pureScale Feature pre-installation steps
Unless otherwise specified, the commands listed in these steps are run as a user with root privileges.
- Ensure that passwordless SSH is set up for root among all the physical machines participating in the DB2 pureScale cluster. You can validate the SSH configuration by issuing the following command from each machine to every other machine in the cluster, and ensuring that it returns the target's host name without any prompting:
    # ssh <target machine> hostname
  Instance-level SSH is set up by the installer during instance setup.
- Have at least 10 GB of free space in the /tmp and in /var file systems of each machine.
- Make sure that the following packages are installed as part of the OS installation: cpp, gcc, gcc-c++, kernel-source, binutils, and both the 32-bit and 64-bit versions of libstdc++.
- Identify the disks to be used for the DB2 pureScale Feature, and ensure that each is tagged with a WWID/WWN that is the same across all nodes.
  You can use the fdisk -l command to list all physical volumes available on a machine, along with their sizes. The following example shows output from that command:
    Disk /dev/sdd: 214.7 GB, 214749020160 bytes
    255 heads, 63 sectors/track, 26108 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes
- On systems with the IBM RDAC driver, do the following:
  - Determine the LUN mapping using the lsvdev command:
      coralinst07:~ # /opt/mpp/lsvdev
              Array Name      Lun    sd device
              -------------------------------------
              DS5300SVT1      0     -> /dev/sdc
              DS5300SVT1      1     -> /dev/sdd
              DS5300SVT1      2     -> /dev/sde
              DS5300SVT1      3     -> /dev/sdf
              DS5300SVT1      4     -> /dev/sdg
  - Get a list of the storage arrays seen by the host:
      coralinst07:~ # /usr/sbin/mppUtil -a
      Hostname    = coralinst07
      Domainname  = N/A
      Time        = GMT 08/06/2010 16:27:59
      ---------------------------------------------------------------
      Info of Array Modules seen by this Host.
      ---------------------------------------------------------------
      ID      WWN                                   Type    Name
      ---------------------------------------------------------------
       0      600a0b800012abc600000000402756fc      FC      FASTSVT1
       1      600a0b800047bf3c000000004a9553b8      FC      DS5300SVT1
      ---------------------------------------------------------------
  - Get a list of the WWNs of the disks associated with the LUNs:
      coralinst07:~ # mppUtil -a DS5300SVT1 | awk '/WWN/' | grep -v Restore
      Lun #0 - WWN: 600a0b800047bf3c0000803e4baca3df
      Lun #1 - WWN: 600a0b800047b9ca00008bec4baca3e1
      Lun #2 - WWN: 600a0b800047bf3c000080404baca406
      Lun #3 - WWN: 600a0b800047b9ca00008bee4baca406
  - Get a list of the WWIDs of the disks from /dev/disk/by-id. The WWID is the WWN with an extra leading digit, 3:
      coralinst07:~ # ls -l /dev/disk/by-id
      scsi-3600a0b800047b9ca00008bec4baca3e1 -> ../../sdd
      scsi-3600a0b800047bf3c000080404baca406 -> ../../sde
      scsi-3600a0b800047b9ca00008bee4baca406 -> ../../sdf
      scsi-3600a0b800047bf3c0000803e4baca3df -> ../../sdg
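The pre-installation checks above lend themselves to a small script. The following Bash sketch is one way to do it, assuming it runs as root on each machine: it checks passwordless SSH to each host, checks for 10 GB of free space in /tmp and /var, and derives the /dev/disk/by-id name that a given WWN should map to. The function names are hypothetical helpers, not DB2 tools.

```bash
#!/bin/bash
# Hypothetical pre-installation checks for a DB2 pureScale cluster (sketch).
# Run as root; the helper names are illustrative only.

# Succeeds if passwordless SSH to host $1 works and the remote host reports
# the same short name. BatchMode=yes makes ssh fail instead of prompting.
check_ssh() {
    local host="$1" remote
    remote=$(ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" hostname 2>/dev/null) || return 1
    [ "${remote%%.*}" = "${host%%.*}" ]
}

# Succeeds if the file system holding directory $1 has at least $2 GB free.
check_free_gb() {
    local dir="$1" need_gb="$2" avail_kb
    # df -P prints one line per file system; field 4 is available 1K blocks.
    avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 {print $4}')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge $(( need_gb * 1024 * 1024 )) ]
}

# The /dev/disk/by-id entry for a LUN is "scsi-3" followed by the WWN that
# mppUtil reports (the WWID carries the extra leading 3).
wwn_to_by_id() {
    printf 'scsi-3%s\n' "$1"
}

# Runs the SSH checks against every host given, then the local space checks.
run_checks() {
    local rc=0 host dir
    for host in "$@"; do
        if check_ssh "$host"; then echo "OK   ssh $host"; else echo "FAIL ssh $host"; rc=1; fi
    done
    for dir in /tmp /var; do
        if check_free_gb "$dir" 10; then echo "OK   $dir >= 10 GB free"; else echo "FAIL $dir < 10 GB free"; rc=1; fi
    done
    return "$rc"
}
```

For example, run `run_checks coralinst07 coralinst08` on each machine. To confirm that a LUN is visible under the same WWID on a node, `readlink -e /dev/disk/by-id/$(wwn_to_by_id 600a0b800047b9ca00008bec4baca3e1)` prints the local device (/dev/sdd on coralinst07 in this configuration) and fails if the entry does not exist.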
DB2 pureScale Feature installation steps
- Identify one of the nodes as the installation-initiating host (IIH). This host will run the installation program for the DB2 pureScale Feature. Ensure that the display is set to show graphical tools. In this case, you are using coralinst07 as the IIH.
- To launch the DB2 graphical installer, go to the ese_dsf folder on the downloaded product image, or to the root directory of the product installation DVD, and run the db2setup command (the -t option writes a trace file and the -l option writes a log file):
    # ./db2setup -t /tmp/db2setup.trc -l /tmp/db2setup.log
- From the welcome screen, as shown in Figure 3, you can view documentation before you install DB2 pureScale. For example, the Architecture Overview topic provides a high-level overview of the DB2 pureScale instance environment. To install the DB2 pureScale Feature, click Install a Product in the left pane.
Figure 3. DB2 setup launchpad
- From the Install a Product as root screen click
Install New, as shown in Figure 4.
Figure 4. DB2 setup launchpad - Install a product
- The DB2 Setup wizard provides a set of intuitive instructions to help
you navigate through the remainder of the deployment, as shown in
Figure 5. Click Next to continue the installation.
You can optionally click View Features to see which
features will be installed.
Figure 5. DB2 setup wizard
- Read the license agreement, as shown in Figure 6. Click
Accept if you accept the licensing terms, and
then click Next.
Figure 6. Software License Agreement
- As shown in Figure 7, you can choose to install the pureScale Feature
immediately, set up a response file to install it later, or install it
immediately and keep your settings in a response file. For this
example you want to install it immediately without creating a response
file, so select DB2 Enterprise Server Edition with the
pureScale Feature, and then click Next.
Figure 7. Installation type
- From the Select the installation directory screen
shown in Figure 8, you provide information for the installation
directory for the DB2 binaries. Click Next after
determining the installation path. In most cases, you can accept the
default path.
Figure 8. Installation directory
- From the Set up a DB2 Instance screen, as shown in
Figure 9, you can either create a DB2 instance, or wait until after
the install is done. For this example you want to create an instance,
so select Create a DB2 instance and click
Next.
Figure 9. Set up a DB2 instance
- From the Set user information for the DB2 instance owner screen, as shown in Figure 10, type the information for the instance owner, and then click Next. If you select an existing user as the DB2 instance owner or the DB2 fenced user, the user must exist on all the hosts with the same UID, GID, group name, and $HOME path. The instance owner's $HOME directory must not be shared between the hosts; it must be local to each host. If you choose to create a new user, that user must not already exist on any of the hosts.
Figure 10. Instance owner information
- From the Set user information for the fenced user
screen, as shown in Figure 11, enter the information about the fenced
user and click Next.
Figure 11. Fenced user information
- From the Set up a DB2 Cluster File System screen, as
shown in Figure 12, you will use one of the predefined disks to create
a shared file system used by the DB2 pureScale instance environment
for instance files that are shared across all machines. The file
system will be mounted as /db2sd<timestamp> and the directory
/db2sd<timestamp>/<instance_name> will be the default
database path (as defined by the DFTDBPATH configuration parameter).
Another small disk will be leveraged for automatic internal cluster recovery purposes. Provide the full path to the disks and then click Next.
Figure 12. DB2 Cluster File System setup
- From the Host List screen, as shown in Figure 13, add
the remaining hosts that should be part of the DB2 pureScale cluster.
By default, the IIH will already be included. For each host that you
need to add, click Add.
Figure 13. Host list selection
- After you click Add, as shown in Figure 14, you are prompted to enter a host name (the output of the hostname command). Type the host name and then click OK. Repeat this step for each host that you need to add.
Figure 14. Remote host name input
After each additional host is entered, the DB2 installation program will validate each new host, as shown in Figure 15.
Figure 15. Installation settings progress indicator
- After you add all the hosts, you will see the list of hosts, as shown
in Figure 16. A check mark beside each host confirms that it has been
validated. When you are satisfied with the configuration identified in
the bottom portion of the window, click Next. At this
time, the DB2 installation program will do one additional validation
of the passwordless SSH configuration.
Figure 16. Host list confirmation
If there is more than one physical machine in the configuration, the installation wizard will automatically assign each CF to a different machine.
- If you want to have both a member and a CF installed on the same host,
from the Host List screen, click the
Advanced button, and then click Manually
assign Cluster Caching Facilities (CF), as shown in
Figure 17.
Figure 17. CF assignment
As shown in Figure 18, under Preferred Primary CF, click Configure the host as both a CF and a DB2 member. Similarly, you can select the check box under Preferred Secondary CF to create a secondary CF, and a DB2 member, on a different host in the cluster.
Figure 18. Preferred primary and secondary CF assignment
You should have two CFs identified on two distinct physical machines to avoid creating a single point of failure.
- After the DB2 installation program validates the passwordless
SSH, it will show a summary of the inputs before proceeding with the
installation, as shown in Figure 19. If you are satisfied with the
inputs, click Finish to start the installation.
Figure 19. Installation settings confirmation
- During the actual installation, a progress monitor bar shows you the
installation progress, as shown in Figure 20. This step will take
several minutes because the DB2 pureScale Feature and the selected
components are being deployed to all hosts in the configuration.
Figure 20. Installation progress indicator
- Upon successful completion of the DB2 pureScale Feature installation,
the Setup has completed successfully screen provides additional
information on where to find log and setup files, as shown in Figure
21. Click Finish.
Figure 21. Installation completion
The installation of the DB2 pureScale Feature across both hosts is complete with a DB2 instance ready for your use.
DB2 pureScale Feature post-installation steps
With a DB2 pureScale Feature for Enterprise Server Edition instance ready for use, you should create file systems for the data and the logs. You can create a file system using the db2cluster command. As root, complete the following steps:
- Create one file system for data and one file system for logs:
    # <DB2 Install Path>/bin/db2cluster -cfs -create -filesystem data -disk /dev/sde
    # <DB2 Install Path>/bin/db2cluster -cfs -create -filesystem log -disk /dev/sdf
  In this deployment, <DB2 Install Path> is /opt/ibm/db2/V9.8. The data and log file systems are created under /db2fs by default and are accessible on all hosts in the DB2 pureScale instance.
- Change the owner of each file system to the DB2 instance owner so that the instance has full access to it. In this case, db2sdin1 is the instance owner's name and db2iadm1 is the instance owner's group name:
    # chown db2sdin1:db2iadm1 /db2fs/data
    # chown db2sdin1:db2iadm1 /db2fs/log
- As the instance owner, start the DB2 instance by issuing the db2start command:
    > db2start
    04/19/2010 11:02:08     0   0   SQL1063N  DB2START processing was successful.
    04/19/2010 11:02:08     1   0   SQL1063N  DB2START processing was successful.
    SQL1063N  DB2START processing was successful.
  You can view the state of the DB2 pureScale instance at any point by using the db2instance -list command.
- Create the database and move the logs to the log file system. The following commands must be run from member hosts, not CF hosts:
    > db2 create db testdb on /db2fs/data
    > db2 update db cfg for testdb using newlogpath /db2fs/log
- Catalog client connections to any active pureScale members and connect
to the database.
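The post-installation steps above can be collected into one script. The following Bash sketch assumes the values used in this article (install path /opt/ibm/db2/V9.8, instance owner db2sdin1 in group db2iadm1, disks /dev/sde and /dev/sdf) and, for the client side, a hypothetical node name pnode and a port you supply; check the instance's actual port before using it. By default it only prints the commands (DRY_RUN=1), so you can review the sequence before running it as root on a member host.

```bash
#!/bin/bash
# Sketch of the post-installation steps; all values are assumptions taken
# from this article's example and can be overridden from the environment.
DB2DIR=${DB2DIR:-/opt/ibm/db2/V9.8}
INST=${INST:-db2sdin1}
INST_GROUP=${INST_GROUP:-db2iadm1}
DRY_RUN=${DRY_RUN:-1}

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Run a command line as the instance owner, whose login profile sets up the
# DB2 environment needed by db2start and the db2 command line processor.
run_as_inst() {
    run su - "$INST" -c "$1"
}

# Server side: run as root on a member host (not a CF-only host).
post_install() {
    run "$DB2DIR/bin/db2cluster" -cfs -create -filesystem data -disk /dev/sde || return
    run "$DB2DIR/bin/db2cluster" -cfs -create -filesystem log -disk /dev/sdf || return
    run chown "$INST:$INST_GROUP" /db2fs/data /db2fs/log || return
    run_as_inst "db2start" || return
    run_as_inst "db2instance -list" || return
    run_as_inst "db2 create db testdb on /db2fs/data" || return
    run_as_inst "db2 update db cfg for testdb using newlogpath /db2fs/log"
}

# Client side: catalog member host $1 at port $2 under the hypothetical node
# name pnode, then connect. Run as a user with a DB2 client environment.
catalog_client() {
    local host="$1" port="$2"
    run db2 catalog tcpip node pnode remote "$host" server "$port" || return
    run db2 catalog database testdb at node pnode || return
    run db2 terminate || return
    run db2 connect to testdb
}
```

After sourcing the file, `post_install` prints the server-side sequence for review; `DRY_RUN=0 post_install` runs it, stopping at the first failing step.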
Using the DB2 pureScale Feature
There are many advantages to the DB2 pureScale Feature. This next section provides use cases that demonstrate that added value. The simple deployment of the DB2 pureScale Feature already demonstrated that it can help to reduce total cost of ownership.
The DB2 pureScale Feature lets you add members to the configuration quickly and without any data redistribution. The DB2 installation binaries are automatically stored on the IIH, so you do not need access to the original installation media when adding members. You can add a member by stopping the instance and running the following command from the IIH:
db2iupdt -d -add -m ServerX:ServerX-10ge db2sdin1
Similarly, a member can be removed by running the following command from the IIH:
db2iupdt -d -drop -m ServerX:ServerX-10ge db2sdin1
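Because adding or dropping a member means stopping the instance, running db2iupdt as root, and starting the instance again, the sequence is easy to wrap. The following sketch assumes the article's naming convention (<host>:<host>-10ge for the member's netname), the instance db2sdin1, that db2iupdt lives in the instance subdirectory of /opt/ibm/db2/V9.8, and that the instance is stopped for a drop as well as an add; change_member is a hypothetical helper, and it only prints the commands unless DRY_RUN=0.

```bash
#!/bin/bash
# Hypothetical wrapper around db2iupdt for adding or dropping a member.
DB2DIR=${DB2DIR:-/opt/ibm/db2/V9.8}
INST=${INST:-db2sdin1}
DRY_RUN=${DRY_RUN:-1}

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Usage: change_member add|drop <host>   (run as root on the IIH)
change_member() {
    local action="$1" host="$2"
    case "$action" in
        add|drop) ;;
        *) echo "usage: change_member add|drop <host>" >&2; return 2 ;;
    esac
    [ -n "$host" ] || { echo "missing host name" >&2; return 2; }
    # Stop the instance as its owner before changing the topology.
    run su - "$INST" -c "db2stop" || return 1
    # -d turns on debug mode, as in the article's example; the member's
    # netname follows the <host>-10ge convention from Table 1.
    run "$DB2DIR/instance/db2iupdt" -d "-$action" -m "$host:$host-10ge" "$INST" || return 1
    run su - "$INST" -c "db2start"
}
```

For example, `change_member add ServerX` prints the stop, db2iupdt, and start commands; `DRY_RUN=0 change_member add ServerX` runs them.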
You can start or quiesce members transparently, so that applications are unaware that a change has occurred.
The DB2 pureScale Feature can dynamically distribute a workload across all the active members, based on the utilization characteristics of the different machines. Multi-threaded CLI applications get connection-level workload balancing by default, without any changes. This workload balancing can be modified so that it applies at the transaction level instead of the connection level. Multi-threaded Java applications can set enableSysplexWLB=true in the connection string to take advantage of transaction-level workload balancing.
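For Java clients, transaction-level balancing is switched on through a driver property. As a small illustration, assuming the IBM Data Server Driver for JDBC and SQLJ (JCC) type 4 URL form, in which properties follow the database name after a colon and each ends with a semicolon, the following hypothetical Bash helper builds such a URL; the host and port in the example are assumptions.

```bash
#!/bin/bash
# Hypothetical helper: build a JCC type 4 URL that enables transaction-level
# workload balancing. Usage: jcc_wlb_url <member-host> <port> <database>
jcc_wlb_url() {
    local host="$1" port="$2" db="$3"
    if [ -z "$host" ] || [ -z "$port" ] || [ -z "$db" ]; then
        echo "usage: jcc_wlb_url <host> <port> <database>" >&2
        return 2
    fi
    # Property list: name=value pairs, each terminated by a semicolon.
    printf 'jdbc:db2://%s:%s/%s:enableSysplexWLB=true;\n' "$host" "$port" "$db"
}
```

For example, `jcc_wlb_url coralinst07 50001 testdb` prints `jdbc:db2://coralinst07:50001/testdb:enableSysplexWLB=true;`.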
As additional members are started, clients automatically route to the new members without any interruption of service. Members can also be stopped, as described under stealth maintenance, without the application knowing that the operation has occurred.
You can also configure clients to prefer a particular member. This feature, called client affinity, can be beneficial if a partitioned workload already exists.
To take advantage of DB2 pureScale features such as transaction-level workload balancing or client affinity, the minimum client level is Version 9.7 Fix Pack 3 or the corresponding JCC level. For the JCC levels included in the various fix packs, see the Resources section.
In many cases it is critical to apply maintenance to a system without any negative impact on client applications. Stealth maintenance lets all transactions on a member complete and then transparently routes the applications to another member. For example, to drain member 1, you can run the following command:
db2stop member 1 quiesce
In some cases, a user session may have started a unit of work (UOW) that has been neither committed nor rolled back. Unless a timeout value is specified, db2stop quiesce waits for that UOW to complete before stopping the member. For situations like this, you can specify a timeout value in minutes, for example ten minutes, which gives the application ten minutes to complete the UOW. If the UOW is still not complete after ten minutes, the DB2 software automatically forces off that application. Applications that complete their UOW within the ten minutes are automatically re-routed to the active members. To drain member 1 with a ten-minute timeout, you can run the following command:
db2stop member 1 quiesce 10
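A maintenance window on one member then follows a fixed pattern: drain the member, do the work, and start the member again so that clients are routed back to it. The following Bash sketch, meant to run as the instance owner, wraps that pattern; maintain_member is a hypothetical helper, the maintenance step is whatever command you pass in, and by default the commands are only printed (DRY_RUN=1).

```bash
#!/bin/bash
# Hypothetical stealth-maintenance wrapper (sketch). Run as the instance owner.
DRY_RUN=${DRY_RUN:-1}

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Usage: maintain_member <member-id> <timeout-minutes|""> <command> [args...]
maintain_member() {
    local member="$1" timeout="$2"
    shift 2 || return 2
    [ "$#" -gt 0 ] || { echo "missing maintenance command" >&2; return 2; }
    # Drain: wait for in-flight units of work; with a timeout, applications
    # still holding a UOW after that many minutes are forced off.
    if [ -n "$timeout" ]; then
        run db2stop member "$member" quiesce "$timeout" || return 1
    else
        run db2stop member "$member" quiesce || return 1
    fi
    # The maintenance step itself (for example, a patch script).
    if ! run "$@"; then
        echo "maintenance failed; member $member left stopped" >&2
        return 1
    fi
    # Restart the member; clients are routed back to it automatically.
    run db2start member "$member"
}
```

For example, `maintain_member 1 10 /path/to/patch-script` prints the drain, the maintenance command, and the restart for member 1. If the maintenance step fails, the member is deliberately left stopped so that it does not rejoin the cluster in an unknown state.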
One of the significant value propositions of the DB2 pureScale Feature is the high availability characteristics integrated into the architecture. All necessary resources are automatically monitored by the DB2 pureScale cluster services and restarted as needed. Applications that are connected to a failing member will automatically be re-routed to an active member where the application can re-issue any failed transactions. Applications connected to a non-failing component will not be impacted.
One of the distinguishing factors of the DB2 pureScale Feature compared to most competing technologies is that no cluster-wide freeze occurs when a member fails. In fact, only data in the process of being updated on the failing member is temporarily unavailable until recovery is complete. Applications on active members trying to access the locked data on the failing member will be briefly in a lock-wait state, and by default will not receive any errors. The recovery will be completed quickly so that data availability through a member failure will look similar to the hypothetical representation shown in Figure 22.
Figure 22. Typical data availability pattern during member recovery
While the DB2 pureScale Feature inherently brings a local high availability solution, many customers also require a disaster recovery solution to meet their business continuity requirements. The DB2 pureScale Feature can use remote disk mirroring technology and is also designed to work with database replication products, as shown in Figure 23.
Figure 23. Typical disaster recovery setup
If the entire primary site running a DB2 pureScale instance fails, the remote site can be used to allow business operations to continue. The DB2 pureScale Feature can use the traditional database backup, restore and roll-forward functionality for disaster recovery solutions.
Using CF and RDMA technologies, the DB2 pureScale Feature for Enterprise Server Edition provides a database solution that can scale effectively to meet the growing and dynamic needs of different organizations, including your most demanding customers. Additional members can be added to the DB2 pureScale environment to meet the demands of peak processing times without any impact on existing applications.
The DB2 pureScale Feature automatically balances the workload across all DB2 members in the cluster without any application changes, taking full advantage of the additional processing capacity. If a DB2 member fails, applications will be automatically routed among the other active members. When the failed member host returns, applications will be transparently routed to the restarted member.
The DB2 pureScale Feature's design and capabilities can help reduce total cost of ownership compared to other solutions, allowing for a simplified deployment and maintenance model. The installation of the DB2 pureScale Feature manages the deployment and configuration of all the bundled software components to all the hosts in the DB2 pureScale environment. Once the DB2 pureScale Feature environment is up and running, its operating status is monitored and maintained easily from any of the active members.
We would like to acknowledge the following additional contributors to this article:
- Serge Boivin, DB2 LUW Information Development
- Jason Shayer, DB2 LUW Information Development
- Matthew Huras, Lead Architect, DB2 LUW
Appendix A: Configuring pureScale cluster to support RDMA over Ethernet (RoCE)
As discussed previously in this article, the DB2 pureScale Feature leverages a 10 gigabit Ethernet network for RDMA over Ethernet to allow optimal communication between members and CF.
RDMA over Ethernet is an alternative to InfiniBand and is supported for System x Linux implementations of pureScale. It is an efficient, lightweight transport layered directly over Ethernet.
This appendix describes the high-level steps to deploy a 10 gigabit Ethernet network (for cluster inter-connect) to support RDMA over Converged Ethernet (RoCE).
- Deploy a 10 gigabit Ethernet switch that supports priority-based flow control and lossless Ethernet, as you would a typical switch.
- In this configuration, a single two-port 10 gigabit Ethernet card (Mellanox MT26448) is attached to each node in the cluster.
- Use SFP+ cables to connect the switch and hosts.
- Install OFED 1.5.2 (OFED-IBM-DB2-pureScale-PTF-1.5.2-4.1404.2.PTF.604678). OFED (Open Fabrics Enterprise Distribution) is a device driver package for Linux that installs VERBS, Utils, uDAPL, RDMA CM, Mellanox components, and others. For detailed instructions on installing OFED, please refer to the 9.8.0.3 Information Center and corresponding tech note.
- Edit the network scripts under /etc/sysconfig/network/ on each node to assign an IP address to each interface, or use YaST to configure the interfaces. For example:
    BOOTPROTO='static'
    BROADCAST=''
    ETHTOOL_OPTIONS=''
    IPADDR='192.168.1.103/24'
    MTU=''
    NAME='MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]'
    NETWORK=''
    REMOTE_IPADDR=''
    STARTMODE='auto'
  If you don't configure this, you will get a DAT_INVALID_ADDR error.
- Edit the /etc/hosts file on each node for name resolution, similar to the following:
    # 10ge network
    192.168.1.100 coralinst07-10ge
    192.168.1.101 coralinst08-10ge
- On each node, ensure that the /etc/dat.conf file contains an entry similar to the following:
    ofa-v2-roe u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth4 0" ""
  Here, ofa-v2-roe is the device name used to make a connection, dapl.2.0 is DAT 2.0, and eth4 is the interface name. If you don't configure this, you will get a DAT_INTERNAL_ERROR error.
- Run /etc/init.d/openibd restart to load the modules.
- Ensure that openibd is enabled using chkconfig -a openibd, and that all the modules and devices are loaded using service openibd status.
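The file edits above are easy to get wrong by hand on every node. The following Bash sketch appends the RoCE uDAPL entry and the 10GE host names only if they are not already present; the addresses, host names, and interface eth4 are this article's example values, the target files can be redirected through DAT_CONF and HOSTS_FILE, and the helper names are hypothetical. It does not replace an existing ofa-v2-roe line that names a different interface.

```bash
#!/bin/bash
# Hypothetical helpers for the RoCE name and uDAPL configuration (sketch).
# Values are this article's example; override the files to test elsewhere.
DAT_CONF=${DAT_CONF:-/etc/dat.conf}
HOSTS_FILE=${HOSTS_FILE:-/etc/hosts}

# Append line $2 to file $1 unless an identical line is already present.
append_once() {
    local file="$1" line="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage: configure_roce_names [interface]   (defaults to eth4, as in Table 1)
configure_roce_names() {
    local iface="${1:-eth4}"
    append_once "$DAT_CONF" "ofa-v2-roe u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 \"$iface 0\" \"\""
    append_once "$HOSTS_FILE" "192.168.1.100 coralinst07-10ge"
    append_once "$HOSTS_FILE" "192.168.1.101 coralinst08-10ge"
}
```

Running `configure_roce_names` twice leaves each file with a single copy of each entry; restart openibd afterward as described above.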
You can validate the state of RoCE on each node by running the ibv_devinfo or ibstatus command as root. Verify that the ports (or the configured ports) are active and that the link is up. For example:
    port:   1
        state:          PORT_ACTIVE (4)
        max_mtu:        2048 (4)
        active_mtu:     1024 (3)
        sm_lid:         0
        port_lid:       0
        port_lmc:       0x00
        link_layer:     Ethernet
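Checking the port state on several nodes is easier if the output is parsed rather than read. This Bash sketch reads ibv_devinfo output on standard input and succeeds only if every listed port is PORT_ACTIVE and reports the expected link layer (Ethernet for RoCE). ports_ok is a hypothetical helper; on a two-port card with only one port cabled, restrict the output to that port, for example with ibv_devinfo -i 1.

```bash
#!/bin/bash
# Hypothetical check: read `ibv_devinfo` output on stdin; succeed only if at
# least one port is listed and every port is PORT_ACTIVE with link layer $1.
ports_ok() {
    local want_layer="$1"
    awk -v want="$want_layer" '
        $1 == "state:"      { n++; if ($2 != "PORT_ACTIVE") bad = 1 }
        $1 == "link_layer:" { if ($2 != want) bad = 1 }
        END { if (n > 0 && !bad) exit 0; exit 1 }
    '
}
```

For example: `ibv_devinfo -i 1 | ports_ok Ethernet && echo "RoCE port 1 is active"`.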
You should also perform a ping test using the addresses and host names defined in the /etc/hosts file. You can also run dtest, a utility that comes with OFED. Run dtest -P ofa-v2-roe on host 1 and check that you see a listening connection; then, from host 2, run dtest -P ofa-v2-roe -h <host1-10ge>. You should see that the connection has PASSED.
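The ping test can be scripted from the same /etc/hosts entries. This Bash sketch, with the hypothetical helper ping_suffix_hosts, pings every host name in a hosts file that ends with a given suffix (here -10ge, following this article's naming) and reports each result; it covers only the ping test, not dtest.

```bash
#!/bin/bash
# Hypothetical helper: ping every host name in a hosts file (default
# /etc/hosts) that ends with suffix $1, e.g. -10ge; prints one line per host
# and returns non-zero if any host does not answer.
ping_suffix_hosts() {
    local suffix="$1" hosts_file="${2:-/etc/hosts}" rc=0 name
    for name in $(awk -v s="$suffix" '
            /^[[:space:]]*#/ { next }
            { for (i = 2; i <= NF; i++)
                  if (length($i) >= length(s) && substr($i, length($i) - length(s) + 1) == s) print $i }
        ' "$hosts_file"); do
        # Three echo requests, waiting at most 2 seconds for each reply.
        if ping -c 3 -W 2 "$name" >/dev/null 2>&1; then
            echo "OK   $name"
        else
            echo "FAIL $name"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, `ping_suffix_hosts -10ge` on coralinst07 pings coralinst07-10ge and coralinst08-10ge.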
If the host where the CF or the member resides has more than 64 GB of
memory, then the Mellanox HCA driver (mlx4_core) module's parameter
log_mtts_per_seg must be increased from 3 (the
default) to 7 for larger memory registrations.
To increase the size, issue the following command as root:
- On SUSE:
echo "options mlx4_core log_mtts_per_seg=7" >> /etc/modprobe.conf.local
- On RHEL:
echo "options mlx4_core log_mtts_per_seg=7" >> /etc/modprobe.conf
For this change to take effect, you must reboot the server. To check if
your change is effective on the module, enter:
<host-name>/sys/module/mlx4_core/parameters # cat /sys/module/mlx4_core/parameters/log_mtts_per_seg
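The decision and the edit above can be combined. The following Bash sketch (hypothetical helper names) reads MemTotal from /proc/meminfo, treats more than 64 GB as the trigger described above, and appends the option once to the distribution's modprobe file. Note that MemTotal is slightly below the installed RAM, so a host with exactly 64 GB installed, like those in Table 1, falls under the threshold by this test.

```bash
#!/bin/bash
# Hypothetical helpers for the mlx4_core log_mtts_per_seg tuning (sketch).

# Succeeds if MemTotal in the given meminfo file (default /proc/meminfo)
# is more than 64 GB (64 * 1024 * 1024 kB).
needs_mtts_tuning() {
    local meminfo="${1:-/proc/meminfo}" mem_kb
    mem_kb=$(awk '$1 == "MemTotal:" {print $2}' "$meminfo" 2>/dev/null)
    [ -n "$mem_kb" ] && [ "$mem_kb" -gt $(( 64 * 1024 * 1024 )) ]
}

# Print the modprobe configuration file used by the distribution ($1: suse|rhel).
modprobe_file_for() {
    case "$1" in
        suse) echo /etc/modprobe.conf.local ;;
        rhel) echo /etc/modprobe.conf ;;
        *) echo "unknown distribution: $1" >&2; return 1 ;;
    esac
}

# Append the module option to file $1 once; a reboot is still required.
add_mtts_option() {
    local file="$1" line="options mlx4_core log_mtts_per_seg=7"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

On a SUSE host, for example: `needs_mtts_tuning && add_mtts_option "$(modprobe_file_for suse)"`, followed by a reboot; the cat command above should then print 7.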
A subnet manager is not required for RoCE, but an IP address should be configured on the switch so that RSCT/TSA can monitor the associated network resources.
The following example shows how you can make the switch IP address the gateway for the RoCE network.
Assign an IP address to the RoCE switch, and then add a route with the route add command, for example:
coralinst07:~ # route add -net 192.168.1.0 netmask 255.255.255.0 gw 192.168.1.3
This route is not persistent. To make it permanent, append a line such as the following to the /etc/sysconfig/network/routes file:
192.168.1.0 192.168.1.3 255.255.255.0 eth-id-00:02:c9:08:28:10
This requires a network service restart (/etc/init.d/network restart). You can then query the routing table with the route command:
coralinst07:~ # route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.1.0     192.168.1.3     255.255.255.0   UG    0      0        0 eth2
192.168.1.0     *               255.255.255.0   U     0      0        0 eth2
9.26.92.0       *               255.255.252.0   U     0      0        0 eth0
loopback        *               255.0.0.0       U     0      0        0 lo
default         rsb-v94-hsrp.to 0.0.0.0         UG    0      0        0 eth0
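To confirm after a reboot or network restart that the persistent route is in place, the routing table can be checked by script. This Bash sketch (hypothetical helper has_gw_route) reads route -n output on standard input; -n is used so that destinations and gateways appear as addresses rather than names such as loopback or *.

```bash
#!/bin/bash
# Hypothetical check: read `route -n` output on stdin and succeed if there is
# a gateway route (flag G) to network $1 via gateway $2 with netmask $3.
has_gw_route() {
    awk -v dst="$1" -v gw="$2" -v mask="$3" '
        $1 == dst && $2 == gw && $3 == mask && $4 ~ /G/ { found = 1 }
        END { if (found) exit 0; exit 1 }
    '
}
```

For example: `route -n | has_gw_route 192.168.1.0 192.168.1.3 255.255.255.0 || echo "RoCE gateway route missing"`.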
Appendix B: Configuring pureScale cluster to support InfiniBand
InfiniBand is a switched-fabric communication link used for high-speed communication. Its features include high throughput, low latency, and point-to-point bidirectional links. As with RoCE, OFED is required for InfiniBand.
This appendix shows you the high-level steps to deploy InfiniBand.
- Deploy an IB switch (in this configuration, a 36-port Mellanox IS5030). You must enable the subnet manager (SM) on the switch, which requires a separate license to activate.
- In this configuration, a single two-port Mellanox MT26428 IB card is attached to each node in the cluster.
- QSFP cables should be used to connect to the switch and the hosts.
- Install OFED 1.5.2 (OFED-IBM-DB2-pureScale-PTF-1.5.2-4.1404.2.PTF.604678). OFED (Open Fabrics Enterprise Distribution) is a device driver package for Linux which installs VERBS, Utils, uDAPL, RDMA CM, Mellanox components, and so on.
- Edit the /etc/dat.conf file to add the following line:
ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" ""
Here, ofa-v2-ib0 is the name of the IB device used to make a connection, and dapl.2.0 identifies the uDAPL provider version. If you do not configure dat.conf, you will get a DAT_INTERNAL_ERROR.
- Edit /etc/sysconfig/network/ifcfg-ib0 to configure a static IP address for the ib0 interface. For example, use the following in ifcfg-ib0:
DEVICE=ib0
BOOTPROTO='static'
# Replace with the IP address that the IB port is to use
IPADDR='10.1.1.154'
# Change if the IB addresses span a larger range than a class C network
NETMASK='255.255.255.0'
STARTMODE='onboot'
WIRELESS='no'
- Add the IB host name (netname) and IP address of each node to /etc/hosts on all the machines in the cluster. For example:
10.1.1.130 host130-ib0
10.1.1.131 host131-ib0
- Run /etc/init.d/openibd restart to load the modules.
- Ensure that openibd is enabled by running chkconfig -a openibd, and that all the modules and devices are loaded by running service openibd status.
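The dat.conf and ifcfg-ib0 edits above can be captured in a small script so that identical content is applied on every node. This is a hedged bash sketch: ensure_dat_entry and write_ifcfg_ib0 are hypothetical helper names, and the root directory is a parameter so that you can generate the files in a scratch directory and review them before writing under /.

```shell
#!/usr/bin/env bash
# Sketch only: write the uDAPL entry and the static ib0 configuration.
# ROOT defaults to "/" (pass a scratch directory to preview the files).

DAT_ENTRY='ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" ""'

# ensure_dat_entry ROOT  (hypothetical helper): append DAT_ENTRY once.
ensure_dat_entry() {
    local root=${1%/}
    local conf="$root/etc/dat.conf"
    mkdir -p "$root/etc" || return 1
    touch "$conf" || return 1
    # -x whole-line match, -F fixed string, -q quiet
    grep -qxF -- "$DAT_ENTRY" "$conf" || printf '%s\n' "$DAT_ENTRY" >> "$conf"
}

# write_ifcfg_ib0 ROOT IPADDR [NETMASK]  (hypothetical helper)
write_ifcfg_ib0() {
    local root=${1%/} ip=$2 mask=${3:-255.255.255.0}
    local dir="$root/etc/sysconfig/network"
    mkdir -p "$dir" || return 1
    cat > "$dir/ifcfg-ib0" <<EOF
DEVICE=ib0
BOOTPROTO='static'
IPADDR='$ip'
NETMASK='$mask'
STARTMODE='onboot'
WIRELESS='no'
EOF
}
```

For example, ensure_dat_entry / followed by write_ifcfg_ib0 / 10.1.1.154 would write the dat.conf line and the ifcfg-ib0 file with the example address; restart openibd afterwards as described in the steps.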
You can validate the state of InfiniBand on each node by running the ibv_devinfo or ibstatus command as root. Verify that the configured ports are active and that the link is up.
For example, for port 1:
state: PORT_ACTIVE (4)
max_mtu: 2048 (4)
active_mtu: 2048 (4)
sm_lid: 1
port_lid: 21
port_lmc: 0x00
link_layer: IB
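When there are several nodes and ports to check, it can help to reduce the ibv_devinfo output to one line per port. The following bash sketch uses hypothetical helper names (ib_port_states, all_ports_active) and assumes the usual ibv_devinfo layout, in which each port: line is followed by that port's state: line; it relies only on awk.

```shell
#!/usr/bin/env bash
# Sketch only: reduce `ibv_devinfo` output (stdin) to "HCA PORT STATE" lines.

# ib_port_states  (hypothetical helper)
ib_port_states() {
    awk '
        $1 == "hca_id:"                { hca = $2 }
        $1 == "port:"                  { port = $2 }
        $1 == "state:" && port != ""   { print hca, port, $2; port = "" }
    '
}

# all_ports_active  (hypothetical helper)
# Succeeds only if at least one port is found and every port is PORT_ACTIVE.
all_ports_active() {
    ib_port_states | awk '
        { n++; if ($3 != "PORT_ACTIVE") bad++ }
        END { exit !(n > 0 && bad == 0) }
    '
}
```

For example, ibv_devinfo | all_ports_active && echo "all IB ports active" can be run on each node; ibv_devinfo | ib_port_states shows which port is not active when the check fails.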
You should also perform a ping test using the addresses and hostnames
defined in the /etc/hosts file.
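The ping test can be looped over the IB entries in /etc/hosts. In this sketch, ib_hosts and ping_ib_hosts are hypothetical names, and the script assumes that IB host names end in -ib0, as in the /etc/hosts example above; adjust the pattern to your own naming convention.

```shell
#!/usr/bin/env bash
# Sketch only: ping every IB address and host name found in a hosts file.
# Assumption: IB host names end in "-ib0" (adjust the pattern as needed).

# ib_hosts [FILE]  (hypothetical helper): print "IP NAME" for IB entries.
ib_hosts() {
    awk '
        /^[ \t]*#/ { next }    # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i ~ /-ib0$/) print $1, $i }
    ' "${1:-/etc/hosts}"
}

# ping_ib_hosts [FILE]  (hypothetical helper): one ping per address and name.
ping_ib_hosts() {
    local ip name target rc=0
    while read -r ip name; do
        for target in "$ip" "$name"; do
            if ping -c 1 -W 2 "$target" >/dev/null 2>&1; then
                echo "OK   $target"
            else
                echo "FAIL $target"
                rc=1
            fi
        done
    done < <(ib_hosts "$1")
    return "$rc"
}
```

Pinging both the address and the name checks the IB link and the /etc/hosts entry at the same time; a FAIL on the name only usually points to a hosts file typo.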
You should also run dtest, a utility that comes with OFED.
Run dtest -P ofa-v2-ib0 on host1, and ensure that it reports a listening connection. Then, from host2, run dtest -P ofa-v2-ib0 -h <host1-ib0>.
The output should show that the connection has PASSED.
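To keep a record of the dtest result for each host pair, the client side can be wrapped so that it saves its output and reports success only when PASSED appears. This is a hedged sketch with hypothetical function names; it assumes, as described above, that a successful run prints PASSED, and that the server side has already been started with dtest -P ofa-v2-ib0 on host1.

```shell
#!/usr/bin/env bash
# Sketch only: run the dtest client side and keep its output in a log file.
# Assumes the server side was started first with: dtest -P ofa-v2-ib0

# dtest_passed  (hypothetical helper): succeed if stdin contains "PASSED".
dtest_passed() {
    grep -q 'PASSED'
}

# run_dtest_client SERVER [PROVIDER]  (hypothetical helper)
run_dtest_client() {
    local server=$1 provider=${2:-ofa-v2-ib0} log
    if ! command -v dtest >/dev/null 2>&1; then
        echo "dtest not found; it is installed with OFED" >&2
        return 2
    fi
    log=$(mktemp /tmp/dtest.XXXXXX) || return 1
    dtest -P "$provider" -h "$server" 2>&1 | tee "$log"
    if dtest_passed < "$log"; then
        echo "dtest to $server PASSED (log: $log)"
    else
        echo "dtest to $server did not report PASSED (log: $log)" >&2
        return 1
    fi
}
```

For example, on host2 run run_dtest_client host1-ib0; the log file is kept so that a failed run can be inspected afterwards.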
If the host where the CF or the member resides has more than 64 GB of
memory, the log_mtts_per_seg parameter of the Mellanox HCA driver module
(mlx4_core) must be increased from 3 (the
default) to 7 for larger memory registrations.
To increase the size, issue the following command as root:
- On SUSE:
echo "options mlx4_core log_mtts_per_seg=7" >> /etc/modprobe.conf.local
- On RHEL:
echo "options mlx4_core log_mtts_per_seg=7" >> /etc/modprobe.conf
For this change to take effect, you must reboot the server. To check whether the change is in effect for the module, run:
<host-name>:/sys/module/mlx4_core/parameters # cat /sys/module/mlx4_core/parameters/log_mtts_per_seg
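The memory check and the modprobe edit can be combined so that the option is added only where it is needed and only once. This bash sketch uses hypothetical helper names; it treats 64 GB as 64 * 1024 * 1024 kB of MemTotal in /proc/meminfo (an assumption about how the threshold is measured), and takes the modprobe file as an argument so you can pass /etc/modprobe.conf.local on SUSE or /etc/modprobe.conf on RHEL.

```shell
#!/usr/bin/env bash
# Sketch only: add "log_mtts_per_seg=7" for mlx4_core on hosts with > 64 GB.

OPTION_LINE='options mlx4_core log_mtts_per_seg=7'

# mem_over_64gb [MEMINFO]  (hypothetical helper)
# MemTotal is reported in kB; 64 GB is taken here as 64 * 1024 * 1024 kB.
mem_over_64gb() {
    awk '$1 == "MemTotal:" { kb = $2 } END { exit !(kb > 64 * 1024 * 1024) }' \
        "${1:-/proc/meminfo}"
}

# ensure_mtt_option FILE  (hypothetical helper)
# FILE is /etc/modprobe.conf.local on SUSE or /etc/modprobe.conf on RHEL.
ensure_mtt_option() {
    local file=$1
    touch "$file" || return 1
    grep -qxF -- "$OPTION_LINE" "$file" || printf '%s\n' "$OPTION_LINE" >> "$file"
}

# current_log_mtts  (hypothetical helper): value used by the loaded module.
current_log_mtts() {
    cat /sys/module/mlx4_core/parameters/log_mtts_per_seg
}
```

For example, on SUSE run mem_over_64gb && ensure_mtt_option /etc/modprobe.conf.local as root; after the reboot, current_log_mtts should print 7.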
Aslam Nomani has been with the IBM Toronto Lab for 15 years, with most of that time spent on the Quality Assurance team. His main area of focus has been on high availability and disaster recovery solutions. Aslam is currently the Quality Assurance Architect for the DB2 pureScale Feature. Aslam has published over 20 papers related to high availability and disaster recovery solutions.
Pandu Mutyala holds a master's degree in Information Sciences and Technology from the Missouri University of Science and Technology. He has been a member of the DB2 Quality Assurance team since 2010, and is responsible for testing DB2 pureScale software on Linux operating systems.
Yugandhra Rayanki has been with IBM for over 6 years, working with different teams within the DB2 Quality Assurance group. He is currently working on system verification testing for DB2 pureScale, and he is an IBM DB2 Certified Advanced DBA. He has special interests in the areas of HADR, backup, and recovery.
Aimin Wu is the usability focal point of the IBM System Optimization Competency Center (SOCC), covering management, maintenance, and monitoring. She spent the first 10 years of her IBM career with the DB2 install team, starting with Platform Installer on UNIX and Linux. She was the architect of DB2 Install: Up & Running on UNIX and Linux before she moved to the SOCC.




