
Introduction to clustering on IBM eServer OpenPower 710

Examples for high-performance and high-availability clustering

John Engel (engel@us.ibm.com), Linux on POWER Technical Consultant, IBM eServer Solutions Enablement
John Engel is a Linux technical consultant for the IBM eServer Solutions Enablement organization at IBM. He is based in Rochester, MN. John's main role is to help solution developers bring their applications to Linux on POWER. While working at IBM, he has also held various positions in Linux software development.

Summary:  This paper introduces basic clustering components and software tools that can be used to build a Linux™ cluster on IBM® eServer® OpenPower™ 710 by utilizing its 64-bit POWER5™ architecture. This paper is intended as an introduction for someone interested in building a new Linux cluster running on OpenPower 710 servers or converting an existing cluster to OpenPower 710 servers.

Date:  25 Jan 2005
Level:  Introductory

Introduction

Clustering, the practice of networking many compute nodes together so that they act as a single computing solution, has been popular on Intel systems running Linux for years. With the announcement of the IBM eServer OpenPower 710, a Linux on POWER-based cluster is an ideal solution because the 710 offers one or two 1.65 GHz 64-bit POWER5 processors. This article first introduces some general clustering concepts and software, and then provides examples of how the OpenPower 710 can be used in the two most common types of clusters: high performance clusters and high availability clusters.

IBM eServer OpenPower 710

The IBM eServer OpenPower 710 is a high-volume, entry-level server that uses the POWER5 architecture. It is a 2U 26" rack-mounted server that can contain up to two 1.65 GHz processors. Memory starts at a minimum of 512 MB and can be expanded up to 32 GB. It has dual-port Gigabit Ethernet for network connectivity. The POWER5 architecture includes simultaneous multithreading (SMT), which allows two threads to execute at the same time on a single processor. Like other POWER5 processor-based hardware, the OpenPower 710 also supports micro-partitioning and virtual I/O functions, which can be enabled as features when you purchase the server or through the Capacity Upgrade on Demand process.


Cluster components

A cluster can consist of any number of components that work together to perform a certain task. A quick overview of these components is presented in order to provide a better understanding of the example clusters in this paper.

Cluster management

Cluster management involves "keeping tabs" on each of the nodes in a cluster by having one or more management nodes monitor the other nodes, such as compute or storage nodes. The management node is also responsible for performing software updates on the other nodes. Many tools are available for cluster management, ranging from minimal tools such as Webmin to larger packages such as IBM Cluster Systems Management (CSM). For more information on these tools, follow the links under "Cluster management" in the Resources section.

Monitoring

Monitoring the nodes of a cluster is a sub-component of cluster management that tracks the status of each node. Heartbeat is an application used to monitor the nodes in a cluster. It should be run on its own private network or serial link to reduce the points of failure. When Heartbeat detects that a node has gone down, it performs the proper action to ensure cluster stability. For more information about Heartbeat, refer to the link under "Monitoring" in Resources.
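The idea behind this kind of monitoring can be sketched in a few lines of Python. This is only an illustration of timeout-based failure detection, not how the Heartbeat application is implemented; the class name, the node names, and the three-second timeout are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Tracks when each node was last heard from (illustrative only)."""

    timeout: float  # seconds of silence before a node is considered down
    last_seen: dict = field(default_factory=dict)  # node name -> timestamp

    def record(self, node, now):
        """Note that a heartbeat message from `node` arrived at time `now`."""
        self.last_seen[node] = now

    def down_nodes(self, now):
        """Return, in sorted order, the nodes silent for longer than `timeout`."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


monitor = HeartbeatMonitor(timeout=3.0)
monitor.record("compute1", now=0.0)
monitor.record("compute2", now=0.0)
monitor.record("compute1", now=2.0)  # compute2 stays silent after time 0
print(monitor.down_nodes(now=4.0))   # ['compute2']
```

Timestamps are passed in explicitly so that the behavior is easy to follow; a real monitor would read a clock and receive the messages over the dedicated network or serial link.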

File systems

Linux offers many different file systems, but a distributed file system is usually the best fit for a clustered environment. Concurrent access to data from all nodes in a cluster and the high availability built into some distributed file systems are two reasons why they suit a cluster environment. Some common distributed file systems are IBM General Parallel File System (GPFS), InterMezzo, and OpenAFS. For more information on these, refer to the links under "File systems" in Resources.

Volume management

Volume management is the ability to handle storage dynamically without negatively impacting a cluster's storage system. It is useful, for example, when a storage server has filled up with data and storage needs to be added. Volume management allows you to add storage to the existing file system without disrupting the storage server. Two such facilities are the Logical Volume Manager (LVM) and the Enterprise Volume Management System (EVMS). For more information on these facilities, follow the links under "Volume management" in the Resources section.

Job management

Job management handles resource management in a cluster. Job management software keeps track of the jobs submitted to a cluster and decides which nodes have the proper resources available to run them. Two available job management packages are Balance and Torque. For more information on Balance and Torque, refer to the links under "Job management" in Resources.

Schedulers

Scheduling is a sub-component of job management that handles the assignment of jobs to resources in a cluster. In a high performance cluster, a high priority is placed on scheduling, as it is important to maximize CPU utilization across all nodes when working with complex computational problems. Condor and Maui are examples of schedulers that are available for Linux. For more information, see the links under "Schedulers" in the Resources section.
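To make the assignment of jobs to resources concrete, here is a minimal sketch of one possible policy: place the highest-priority job first, on the node with the most idle CPUs. It is not the algorithm used by Condor or Maui, and the job names, priorities, and node names are made up for the example. Each node is given two CPUs to match a two-processor OpenPower 710.

```python
def schedule(jobs, free_cpus):
    """Assign jobs to nodes, highest priority first.

    jobs: list of (name, cpus_needed, priority) tuples.
    free_cpus: dict mapping node name -> number of idle CPUs.
    Returns (placements, waiting): a dict of job name -> node, and the
    list of job names that did not fit on any node.
    """
    free = dict(free_cpus)  # work on a copy; the caller's dict is unchanged
    placements, waiting = {}, []
    for name, cpus, _priority in sorted(jobs, key=lambda job: -job[2]):
        # Only nodes with enough idle CPUs can take this job.
        candidates = [node for node, idle in free.items() if idle >= cpus]
        if not candidates:
            waiting.append(name)
            continue
        # Prefer the node with the most idle CPUs; break ties by node name.
        node = min(candidates, key=lambda n: (-free[n], n))
        free[node] -= cpus
        placements[name] = node
    return placements, waiting


nodes = {"node1": 2, "node2": 2, "node3": 2, "node4": 2}
jobs = [("linpack", 2, 10), ("render", 1, 5), ("report", 2, 1), ("big", 4, 7)]
placements, waiting = schedule(jobs, nodes)
print(placements)  # {'linpack': 'node1', 'render': 'node2', 'report': 'node3'}
print(waiting)     # ['big']
```

The job called "big" asks for four CPUs, which no single two-CPU node can supply, so it stays in the waiting list. Production schedulers add queues, reservations, and fairness policies on top of this basic matching step.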

Message passing

Message passing is the efficient transfer of data between nodes in a high performance cluster. The Message Passing Interface (MPI) is a library specification that describes how applications should be written to take advantage of parallel computing concepts. MPICH is a software package that implements MPI and provides for efficient parallel computing. For more information on MPI and MPICH, refer to the links under "Message passing" in Resources.
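The pattern that MPI programs typically follow (scatter the data, compute locally, gather the results) can be illustrated without an MPI installation. The sketch below is a stand-in that uses Python threads and queues inside one process; it does not call MPI or MPICH, and the function names are invented for the example. It shows the communication pattern only and is not a way to gain speed.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """Receive one chunk of numbers, then send back (rank, partial sum)."""
    chunk = inbox.get()             # blocking receive
    outbox.put((rank, sum(chunk)))  # send the partial result back


def parallel_sum(data, n_workers):
    """Scatter `data` to the workers by message, then gather partial sums."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], results))
        for rank in range(n_workers)
    ]
    for thread in threads:
        thread.start()
    # Scatter: worker `rank` receives every n_workers-th element.
    for rank, inbox in enumerate(inboxes):
        inbox.put(data[rank::n_workers])
    # Gather: collect exactly one message from each worker.
    partials = dict(results.get() for _ in range(n_workers))
    for thread in threads:
        thread.join()
    return sum(partials.values())


print(parallel_sum(list(range(1, 101)), n_workers=4))  # 5050
```

In a real cluster each worker would be a separate process on a separate node, and the queue operations would be replaced by the send and receive calls of an MPI library.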


High performance cluster example

The OpenPower 710 is an ideal solution for high performance clustering with its POWER5 processor-based 64-bit architecture and SMT capabilities. In this example, the Linpack benchmark suite will be the high performance application that runs in the cluster. The cluster components we will focus on are node management, monitoring, and scheduling.

Objective

The high performance cluster described in this example consists of five OpenPower 710 servers: one management node and four compute nodes. Instead of installing a high performance application package, the Linpack benchmark is installed on this cluster in order to test performance. The Linpack benchmark solves a dense system of linear equations. You will find a link to a parallel implementation of this benchmark in Resources.
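To show the kind of work the benchmark performs, here is a small single-node sketch in Python that solves a random dense system by Gaussian elimination with partial pivoting and reports a rough rate. It is not the Linpack code itself; the problem size of 100, the random seed, and the function name are assumptions, and the operation count 2n³/3 + 2n² is the figure commonly quoted for this benchmark.

```python
import random
import time


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    `a` is a list of n rows of n floats and `b` is a list of n floats.
    Both are copied, so the caller's data is left untouched.
    """
    n = len(b)
    a = [row[:] for row in a]
    b = b[:]
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(a[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (b[row] - tail) / a[row][row]
    return x


n = 100
rng = random.Random(0)
a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
b = [rng.uniform(-1.0, 1.0) for _ in range(n)]

start = time.perf_counter()
x = solve(a, b)
elapsed = time.perf_counter() - start

# Largest entry of |a x - b|, used as a correctness check.
residual = max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))
flops = 2 * n**3 / 3 + 2 * n**2  # approximate operation count
print(f"residual={residual:.2e}  rate={flops / elapsed / 1e6:.1f} Mflop/s")
```

Pure Python is far slower than the compiled code a real benchmark run uses, so the reported rate only illustrates how the figure is derived. The parallel version of the benchmark distributes the matrix across the compute nodes instead of solving it on one.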

Details

CSM software runs on the management node and provides a single point of control for node management. The management node will communicate with the compute nodes on the "management" network, which is a private network running on eth1 as shown in Figure 1. Definitions for each of the compute nodes exist on the management node. CSM provides the flexibility to define the list of applications that need to be installed on each compute node. This can be tuned to support the type of high performance software that is needed. After the compute nodes are installed, the management node monitors the compute nodes and provides software updates on the compute nodes using functions provided by CSM.


Figure 1. High performance cluster

Parallel computing involves dividing a large amount of data and spreading it across the compute nodes for faster processing. As shown in Figure 1, a library that implements the Message Passing Interface (MPI) specification passes the data between the nodes. The parallel version of the Linpack benchmark requires that an MPI library be installed. In this example we use the MPICH package. (See Resources for a link.)

Communication between the compute nodes occurs on the "compute" network, which is a private network running on eth0, as shown in Figure 1. Monitoring is important in a high performance cluster since it is essential to know when a compute node has failed or when the desired performance is not being achieved. In this example, the Heartbeat application is installed and run over a serial link between the nodes in the cluster. We chose a serial link to reduce the points of failure.

You should now have a small high performance cluster set up that can exercise the parallel build of the Linpack benchmark. If a compute node crashes, the management node can log the failure. You will then know that the results of that benchmark run are not accurate and that the proper action is being taken to restore the failed compute node. Once the node is up and running again, you can rerun the benchmark and analyze the results.


High availability cluster example

A high availability (HA) cluster contains enough redundant resources to provide system functionality in case of any component failure. This example presents a simple HA cluster that consists of three nodes: a master node, a backup node, and a management node. The master and backup nodes are connected to an IBM DS4500 storage server. This kind of cluster is typical in a small Web serving environment.

Objective

Because redundancy is the key feature of an HA cluster, the components that are essential in this example are node management, a distributed file system, volume management, and monitoring. This example shows how fault tolerance is possible using two OpenPower 710 models: one acting as a primary server and the other acting as a backup server. Monitoring software is used to detect when the primary server goes down.

Details

As with the high performance cluster described above, CSM will be installed on the management node. The management node is connected to the master and backup nodes on a private management network running on eth1, as shown in Figure 2. The management node installs both the master and backup nodes and provides software updates as required.


Figure 2. High availability cluster

The Heartbeat package provides the software that monitors the nodes. In order to reduce the number of points of failure, a serial connection is used instead of a network connection to link the nodes that will be monitored. Heartbeat is configured to switch over to the backup node if the master node goes down. This is a fairly high-level view of how the Heartbeat application works; more information can be found at the High-Availability Linux Project Web site.

GPFS is the distributed file system used with the DS4500 storage server. The storage server is connected to the master and backup nodes with a high speed Myrinet or Fibre Channel connection, as shown in Figure 2. The combination of GPFS and the DS4500 provides a redundant storage subsystem in case of disk failure. GPFS allows both the master and backup nodes to have concurrent access to all of the files in the cluster.

Volume management is handled by installing LVM on the cluster. LVM provides the ability to dynamically add storage to, and remove storage from, the DS4500 without interrupting service to the running cluster. For instance, if the file system in which you are storing your Web content fills up, LVM allows you to add another disk to the file system to increase storage without impacting your existing data.
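The following toy model in Python illustrates the idea of pooling disks so that a volume can grow when a disk is added. It is a conceptual sketch only: it does not call LVM, it ignores everything a real volume manager does on disk, and the class, disk, and volume names and the sizes are assumptions made for the example.

```python
class VolumeGroup:
    """Toy model of a pooled set of disks; all sizes are in gigabytes."""

    def __init__(self):
        self.disks = {}    # disk name -> capacity
        self.volumes = {}  # logical volume name -> allocated size

    @property
    def free(self):
        """Capacity not yet allocated to any logical volume."""
        return sum(self.disks.values()) - sum(self.volumes.values())

    def add_disk(self, name, size):
        """Grow the pool; existing volumes are not touched."""
        self.disks[name] = size

    def extend(self, volume, extra):
        """Grow a volume (creating it if needed) out of the pool's free space."""
        if extra > self.free:
            raise ValueError(f"only {self.free} GB free in the pool")
        self.volumes[volume] = self.volumes.get(volume, 0) + extra


group = VolumeGroup()
group.add_disk("disk1", 100)
group.extend("web", 100)      # the Web content volume fills the first disk
group.add_disk("disk2", 200)  # add storage while "web" stays in place
group.extend("web", 50)
print(group.volumes["web"], group.free)  # 150 150
```

The point of the model is that the volume named "web" keeps its existing allocation when the second disk joins the pool and simply gains room to grow.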

You should now have all the components necessary to handle a failure of the master node, if one should occur. If the Heartbeat application notices that the master node is not available, the backup node becomes active with the proper network configuration. The backup node then handles the Web serving, and no failure is noticed from the outside, routable network. Meanwhile, maintenance can be performed on the master node to analyze why it went down.
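The switch-over decision described above can be sketched as a small Python function. This illustrates the logic only and is not taken from the Heartbeat application; the function name, the node labels, and the optional failback behavior are assumptions made for the example.

```python
def choose_active(current, alive, failback=False):
    """Return the node that should hold the service, or None if none can.

    current: "master" or "backup", the node serving right now.
    alive: set of node names that monitoring currently reports as up.
    failback: if True, hand the service back as soon as the master returns.
    """
    if current in alive and not (failback and "master" in alive):
        return current  # the serving node is healthy; nothing changes
    for node in ("master", "backup"):  # the master is preferred
        if node in alive:
            return node
    return None


current = "master"
history = []
# Master fails in the second step and is repaired by the third.
for alive in [{"master", "backup"}, {"backup"}, {"master", "backup"}]:
    current = choose_active(current, alive)
    history.append(current)
print(history)  # ['master', 'backup', 'backup']
```

With the default setting the backup node keeps serving after the master is repaired, which leaves time to analyze the failure. Passing failback=True would return the service to the master as soon as it is reported up again.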


Summary

The OpenPower 710, with its 64-bit POWER5 processor-based architecture, is an ideal solution for high availability and high performance clusters. In addition, much of the Linux clustering software that is currently available has been ported to the 64-bit POWER architecture, making this entry-level server an affordable 64-bit platform for a Linux on POWER-based cluster solution.


Acknowledgements

I would like to acknowledge Linda Kinnunen for her document template and helpful reviews and Brent Baude and Steve Dibbell for their technical reviews of this document.


Resources

