Innovations within reach: Stretching the limits with elastic software

New concepts and strategies require changes in vocabulary. With the move toward lower-cost, highly flexible, cloud-friendly architectures, the concept of elasticity has taken hold as a quality of enterprise IT solutions. This article explores a specific definition of elasticity by describing examples present in IBM® WebSphere® eXtreme Scale, an elastic in-memory data grid. This content is part of the IBM WebSphere Developer Technical Journal.

Robert Wisniewski, Technical Evangelist, IBM

Robert Wisniewski is a Software Engineer specializing in performance and scalability. He previously worked on WebSphere Application Server performance for seven years, focusing on all areas of the product from EJB/JPA to autonomic computing and benchmark design. His current position as Technical Evangelist refocuses this experience on customer scenarios and the application of XTP strategies in the real world.



03 March 2010

Each installment of Innovations within reach features new information and discussions on topics related to emerging technologies, from both developer and practitioner standpoints, plus behind-the-scenes looks at leading edge IBM® WebSphere® products.

Putting substance behind the buzz

One thing we have plenty of in the enterprise software industry is buzzwords. Overwhelming at times, buzzwords are necessary in order to expand the vocabulary we use to describe the solutions and tools available to solve an ever-evolving set of business problems. Without this expansion, many of these concepts would struggle to leave their infancy. One concept for which we have been attempting to champion a very specific meaning is the use of the term elastic to describe an enterprise solution.

It's easy to fall into the trap of using the idea of elasticity loosely, as shorthand for whatever goal has been set for a given solution. In its simplest form, a solution might be called elastic simply because it enables resources to be added or removed without bringing the system offline. For the sake of creating a higher and more useful standard, I'd like to propose a more ambitious and specific definition.

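Elasticity in a system or component of a system (I'll use software as an example, since I work with IBM WebSphere eXtreme Scale every day) implies three specific degrees of freedom:

  • Scaling with no reasonable limitation
  • Fault tolerance and self-healing
  • Administrative simplicity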

Now, before you label these as buzzwords or empty concepts, allow me to put some substance behind these ideas.

Scaling with no reasonable limitation

We shouldn't expect much controversy regarding the idea that an elastic system can be scaled up and down without a significant effect on the availability of the system during these operations. However, I believe we should also expect that the system itself not place any real restriction on a reasonable scale-up scenario. By this, I mean that the infrastructure itself should be architected to enable the continued growth of the system and make the new resources available with little or no overhead. This implies the possibility of true linear scaling.

We've addressed the concept of elasticity within WebSphere eXtreme Scale by considering the effects of extremely large grids on every aspect of the product. A few examples can illustrate this nicely:

  • First, the architecture of the grid membership infrastructure itself is componentized into smaller, solvable, and contained problems of scale. Rather than wrangling thousands of servers into a single core group, the catalog service (an administrative process which handles the structure of the grid) divides the members into groups of 20. Each of these individual groups then runs a membership view algorithm involving heart-beating, which has a proven track record and shares function with IBM WebSphere Application Server. An elected "leader" of each group keeps the catalog service up to date on the status of that group, so the catalog service only needs to stay in contact with 1/20th of the total members of the grid.
  • Another example is the client interactions with the grid itself. One question that comes up often is the possible bottleneck that a single administrative process, such as the catalog service, presents. Catalog services can be duplicated and clustered, as well, but that's simply for redundancy. The truth is that a single catalog service can actually handle the needs of a nearly unlimited number of clients because those clients interact with the catalog service only once to bootstrap into the grid. In that interaction, the catalog service returns information about the grid, including a complete routing map defining the location of all grid partitions and the associated key space for each. After this, the clients interact directly with the partitions and even keep this routing table up to date through subchannel interactions during the normal transaction process. The catalog service is then free to focus its attention on simply managing the balance and membership of the grid as resources are added and removed.
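To make the second example more concrete, here is a minimal Java sketch of the bootstrap-once pattern. It is an illustration only: the class names (`CatalogService`, `RoutingMap`, `GridClient`) are hypothetical and are not WebSphere eXtreme Scale APIs, and the use of a key hash to pick a partition is my assumption for the sake of a runnable example. The in-band refresh of the routing table is left out.

```java
import java.util.List;

// Hypothetical sketch: none of these names are WebSphere eXtreme Scale APIs.
final class RoutingSketch {

    /** Routing map handed out at bootstrap: partition index -> container address. */
    static final class RoutingMap {
        private final List<String> partitionOwners;

        RoutingMap(List<String> partitionOwners) {
            this.partitionOwners = List.copyOf(partitionOwners);
        }

        // Assumption: keys map to partitions by hash. The article only says
        // that each partition has an associated key space.
        String ownerOf(Object key) {
            int partition = Math.floorMod(key.hashCode(), partitionOwners.size());
            return partitionOwners.get(partition);
        }
    }

    /** Stand-in for the catalog service; counts how often clients contact it. */
    static final class CatalogService {
        private final List<String> owners;
        private int bootstrapCalls = 0;

        CatalogService(List<String> owners) {
            this.owners = List.copyOf(owners);
        }

        RoutingMap bootstrap() {
            bootstrapCalls++;
            return new RoutingMap(owners);
        }

        int bootstrapCalls() {
            return bootstrapCalls;
        }
    }

    /** Client: one bootstrap call, then purely local routing decisions. */
    static final class GridClient {
        private final RoutingMap routing;

        GridClient(CatalogService catalog) {
            this.routing = catalog.bootstrap();
        }

        String route(Object key) {
            return routing.ownerOf(key);
        }
    }

    public static void main(String[] args) {
        CatalogService catalog =
                new CatalogService(List.of("container-A", "container-B", "container-C"));
        GridClient client = new GridClient(catalog);
        for (int id = 0; id < 1000; id++) {
            client.route(id); // no catalog traffic on this path
        }
        System.out.println("catalog contacts: " + catalog.bootstrapCalls()); // prints 1
    }
}
```

However many requests the client routes, the catalog stand-in is contacted exactly once, which is the property that keeps a single administrative process from becoming a bottleneck.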

With approaches like these, we've been able to effectively scale a grid to an arbitrarily large size. In the lab, we've achieved a 1,500-container grid with no real difference in perceived performance. After that, we simply ran out of time to go further, but there is no specific or reasonable limitation to this scaling. This is an important factor in truly considering a solution to be elastic.
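The arithmetic behind the groups-of-20 membership design is worth spelling out for a grid of that size. The sketch below is hypothetical: only the group size of 20 and the 1,500-container figure come from this article, and assigning members to groups by join order is an assumption made purely for illustration.

```java
// Hypothetical sketch: only the group size (20) and the 1,500-container
// figure come from the article; everything else is illustrative.
final class MembershipGroupsSketch {

    static final int GROUP_SIZE = 20;

    /** Number of groups, and therefore elected leaders, for a grid of this size. */
    static int leaderCount(int members) {
        return (members + GROUP_SIZE - 1) / GROUP_SIZE; // ceiling division
    }

    /** Group index for a member, assuming members are numbered from 0 in join order. */
    static int groupOf(int memberIndex) {
        return memberIndex / GROUP_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(leaderCount(1500)); // prints 75
    }
}
```

Under these assumptions, the catalog service for a 1,500-container grid would hear from 75 leaders, while each heart-beating group stays at 20 members no matter how large the grid becomes.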

It's important to know that this doesn't imply that EVERY deployment of an elastic infrastructure will provide the overall application with linear scaling as resources are added. There are still considerations regarding the logic and business being conducted within that infrastructure, and whether or not they employ scalable extreme transaction processing fundamentals. In this regard, the enterprise application itself must also have elastic characteristics. An elastic infrastructure should, however, provide the plumbing to effectively achieve these goals.

Fault tolerance and self-healing

If you're going to expect a deployer to trust your solution to scale indefinitely, you must also tolerate the events that occur with greater probability and frequency as a system grows, such as the addition or loss of nodes due to maintenance or fault, network faults and changes, and so on. With more resources comes a greater chance for failure, and an elastic system must be able to overcome these failures in a predictable and efficient manner, while again returning to a state of fault tolerance, if possible.

Continuing with our WebSphere eXtreme Scale data grid example, as you grow to grids of hundreds or thousands of container processes, the loss or maintenance of one of those processes is more and more probable. Through replication -- which is a core competency of WebSphere eXtreme Scale and similar in-memory data grid offerings -- these events can be tolerated. Not only that, but since the placement and migration of the data is completely transparent behind the "black box" of the WebSphere eXtreme Scale client APIs, a new replica is automatically created and fault tolerance is achieved again.
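The recovery sequence just described (tolerate the loss, then return to a fault-tolerant state) can be sketched in a few lines of Java. This is a simplified model and not the product's implementation: the `Partition` class and `onContainerLost` method are hypothetical names, each partition is assumed to have a single replica, and the replacement replica is simply placed on the first eligible survivor, with no attempt at balancing.

```java
import java.util.List;

// Hypothetical model of replica promotion and re-creation; not product code.
final class SelfHealingSketch {

    /** One partition with a primary copy and at most one replica. */
    static final class Partition {
        String primary;
        String replica; // null while the partition is unprotected

        Partition(String primary, String replica) {
            this.primary = primary;
            this.replica = replica;
        }

        boolean faultTolerant() {
            return primary != null && replica != null;
        }
    }

    /** React to the loss of one container: promote if needed, then re-create the replica. */
    static void onContainerLost(String lost, List<Partition> partitions, List<String> survivors) {
        for (Partition p : partitions) {
            if (lost.equals(p.primary)) {
                p.primary = p.replica; // promote the surviving copy
                p.replica = null;
            } else if (lost.equals(p.replica)) {
                p.replica = null;
            }
            if (p.primary == null) {
                continue; // both copies gone: nothing left to replicate from
            }
            if (p.replica == null) {
                for (String candidate : survivors) {
                    if (!candidate.equals(p.primary)) {
                        p.replica = candidate; // simplistic: first eligible survivor
                        break;
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        List<Partition> grid = List.of(
                new Partition("A", "B"),
                new Partition("B", "C"),
                new Partition("C", "A"));
        onContainerLost("B", grid, List.of("A", "C"));
        for (Partition p : grid) {
            System.out.println(p.primary + " / " + p.replica + " / " + p.faultTolerant());
        }
    }
}
```

From the caller's point of view nothing changes: in this model, every partition that had a surviving copy ends the call with a primary and a replica on different containers.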

Elasticity needs to have this conceptual addendum in order to be truly useful as deployments grow and become more complex.

Administrative simplicity

Specialized needs regarding the administration and maintenance of a system can be subtle when considering the meaning of an elastic infrastructure. However, similar to the requirement of fault tolerance as systems grow and become more complex, you must also consider the ability of the deployer to perform common administrative tasks.

The key concept here is that the configuration and maintenance of each node should be either identical or very minimally different. You shouldn't expect the deployer to provide a list of all member machines or processes for the system to operate. There should be some level of automatic discovery and management based on a common set of configuration artifacts.

In the case of WebSphere eXtreme Scale, the approach is fairly straightforward. Configuration information focuses on the structure and characteristics of the grid itself, not on any details of the specific member processes. For example, you configure how many partitions to split the data into, and how those partitions should be replicated. Given this information, WebSphere eXtreme Scale maps that to the available grid members and enforces the policies set forth in the configuration. The exact same set of configuration artifacts is provided to each grid member when started, and the details of that member's place in the grid's world are managed and determined automatically.
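One way to picture this separation is a policy object that knows nothing about individual members, plus a placement function that maps it onto whoever happens to be available. The Java sketch below is a toy under stated assumptions: `Policy` and `place` are hypothetical names, and the round-robin placement is my own stand-in, not the algorithm the product uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: round-robin placement stands in for the real algorithm.
final class PlacementSketch {

    /** Grid-level policy: the only thing the deployer configures here. */
    static final class Policy {
        final int partitions;
        final int replicasPerPartition;

        Policy(int partitions, int replicasPerPartition) {
            this.partitions = partitions;
            this.replicasPerPartition = replicasPerPartition;
        }
    }

    /** For each partition, the members holding a copy; index 0 is the primary. */
    static List<List<String>> place(Policy policy, List<String> members) {
        if (members.isEmpty()) {
            throw new IllegalArgumentException("no grid members available");
        }
        // Never put two copies of a partition on the same member.
        int copies = Math.min(1 + policy.replicasPerPartition, members.size());
        List<List<String>> placement = new ArrayList<>();
        for (int p = 0; p < policy.partitions; p++) {
            List<String> holders = new ArrayList<>();
            for (int c = 0; c < copies; c++) {
                holders.add(members.get((p + c) % members.size()));
            }
            placement.add(holders);
        }
        return placement;
    }

    public static void main(String[] args) {
        Policy policy = new Policy(4, 1); // 4 partitions, 1 replica each
        System.out.println(place(policy, List.of("m0", "m1", "m2")));
        // Same policy, one more member: only the placement changes.
        System.out.println(place(policy, List.of("m0", "m1", "m2", "m3")));
    }
}
```

The policy is identical in both calls; only the member list differs. That is the sense in which every member can be started with the same configuration artifacts.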

This philosophy is carried down throughout the administration and maintenance spectrum, with each interaction designed to separate the details of the physical grid from the logical structure of the grid constructs.

We can find many more examples of this, such as the decoupling of replica placement through the use of the zone abstraction, or the ability to upgrade the actual grid code level without bringing the grid itself offline. The key concept is that administrative tasks should have a constant complexity as the system scales outwards, or at least as close to constant as possible. From this, you can see how nicely elastic software and elastic hardware (that is, virtualization and cloud deployments) can dovetail to provide a new level of freedom to enterprise solutions.


In praise of buzzwords

It's probably safe to say that even our most pervasive foundational technologies within computing started out as some sort of buzzword. We simply need to strive to define meaning and useful purpose when adding a new concept or goal set to our vocabulary. In this way, I think it's clear that elasticity in enterprise solutions can be a valuable concept when clearly defined and thought through to logical conclusions. We have consistently tried to relate concrete and useful meaning when we talk about WebSphere eXtreme Scale as an elastic data grid, and will strive to continue to do so as we apply these concepts to other solutions that are designed to create truly flexible infrastructures.
