Each installment of Innovations within reach features new information and discussions on topics related to emerging technologies, from both developer and practitioner standpoints, plus behind-the-scenes looks at leading edge IBM® WebSphere® products.
Putting substance behind the buzz
One thing we have plenty of in the enterprise software industry is buzzwords. Overwhelming at times, buzzwords are necessary in order to expand the vocabulary we use to describe the solutions and tools available to solve an ever-evolving set of business problems. Without this expansion, many of these concepts would struggle to leave their infancy. One term for which we have been attempting to champion a very specific meaning is elastic, as used to describe an enterprise solution.
It's easy to fall into the trap of using the idea of elasticity loosely, to describe whatever goal has been set for a given solution. In its simplest form, a solution might be called elastic simply because resources can be added or removed without bringing the system offline. For the sake of creating a higher and more useful standard, I'd like to propose a more ambitious and more specific definition.
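Elasticity in a system or component of a system (I'll use software as an example, since I work with IBM WebSphere eXtreme Scale every day) implies three specific degrees of freedom:
- Scaling with no reasonable limitation
- Fault tolerance and self-healing
- Administrative complexity that stays constant, or nearly so, as the system grows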
Now, before you label these as buzzwords or empty concepts, allow me to put some substance behind these ideas.
Scaling with no reasonable limitation
We shouldn't expect much controversy regarding the idea that an elastic system can be scaled up and down without a significant effect on the availability of the system during these operations. However, I believe we should also expect that the system itself not place any real restriction on a reasonable scale-up scenario. By this, I mean that the infrastructure itself should be architected to enable the continued growth of the system and make the new resources available with little or no overhead. This implies the possibility of true linear scaling.
We've addressed the concept of elasticity within WebSphere eXtreme Scale by considering the effects of extremely large grids on every aspect of the product. A few examples can illustrate this nicely:
- First, the architecture of the grid membership infrastructure itself is broken down into smaller, solvable, and containable problems of scale. Rather than wrangling thousands of servers into a single core group, the catalog service (an administrative process which handles the structure of the grid) divides the members into groups of 20. Each of these individual groups then runs a membership view algorithm involving heart-beating, which has a proven track record and shares function with IBM WebSphere Application Server. An elected "leader" of each group keeps the catalog service up to date on the status of that group, so the catalog service only needs to stay in contact with 1/20th of the total members of the grid.
- Another example is the client interactions with the grid itself. One question that comes up often is the possible bottleneck that a single administrative process, such as the catalog service, presents. Catalog services can be duplicated and clustered, but that's simply for redundancy. The truth is that a single catalog service can handle the needs of a nearly unlimited number of clients, because those clients interact with the catalog service only once to bootstrap into the grid. In that interaction, the catalog service returns information about the grid, including a complete routing map defining the location of all grid partitions and the associated key space for each. After this, the clients interact directly with the partitions and even keep this routing table up to date through subchannel interactions during the normal transaction process. The catalog service is then free to focus its attention on simply managing the balance and membership of the grid as resources are added and removed.
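The bootstrap-once pattern in the second example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the WebSphere eXtreme Scale client API: the class names, the `epoch` field, the CRC-based key hashing, and the idea of a newer map arriving attached to a normal response are all hypothetical stand-ins for the routing map and subchannel updates described above.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingMap:
    """Snapshot of which container owns each partition (hypothetical shape)."""
    epoch: int
    partition_owner: tuple  # index = partition id, value = container name


class CatalogService:
    """Stand-in for the administrative process; counts how often it is asked."""

    def __init__(self, owners):
        self.routing = RoutingMap(epoch=1, partition_owner=tuple(owners))
        self.bootstrap_calls = 0

    def bootstrap(self):
        self.bootstrap_calls += 1
        return self.routing


class GridClient:
    def __init__(self, catalog):
        # The only catalog interaction: fetch the full routing map once.
        self.routing = catalog.bootstrap()

    def partition_for(self, key):
        # Stable hash so every client maps a given key to the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.routing.partition_owner)

    def route(self, key):
        # Resolved locally from the cached map; no catalog round trip.
        return self.routing.partition_owner[self.partition_for(key)]

    def on_response(self, attached_map):
        # Assumption: containers attach a newer map to ordinary responses.
        if attached_map is not None and attached_map.epoch > self.routing.epoch:
            self.routing = attached_map


catalog = CatalogService(["c0", "c1", "c2", "c3"])
client = GridClient(catalog)
owners = [client.route(f"order-{i}") for i in range(1000)]
```

However many requests the client routes, the catalog stand-in is contacted exactly once, which is the property that keeps a single administrative process from becoming a bottleneck.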
With approaches like these, we've been able to effectively scale a grid to an arbitrarily large size. In the lab, we've achieved a 1,500-container grid with no real difference in perceived performance. After that, we simply ran out of time to go further, but there is no specific or reasonable limitation to this scaling. This is an important factor in truly considering a solution to be elastic.
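To make the membership arithmetic concrete, here is a minimal sketch of what splitting a 1,500-container grid into groups of 20 looks like. The function names and the "lowest name wins" election are placeholders chosen for illustration; the article does not describe how the real leader election works.

```python
def split_into_groups(members, group_size=20):
    """Divide the member list into consecutive groups of at most group_size."""
    return [members[i:i + group_size] for i in range(0, len(members), group_size)]


def elect_leader(group):
    # Placeholder election rule (lowest name); the real algorithm is not shown here.
    return min(group)


containers = [f"container-{i:04d}" for i in range(1500)]
groups = split_into_groups(containers)
leaders = [elect_leader(group) for group in groups]

# The catalog service only needs status reports from the leaders.
print(len(containers), "containers,", len(leaders), "leaders")
```

With these numbers the catalog service tracks 75 leaders rather than 1,500 individual processes, and each heart-beating group stays at a fixed size of 20 no matter how large the grid becomes.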
It's important to know that this doesn't imply that every deployment of an elastic infrastructure will provide the overall application with linear scaling as resources are added. There are still considerations regarding the logic and business being conducted within that infrastructure, and whether or not they employ scalable extreme transaction processing fundamentals. In this regard, the enterprise application itself must also have elastic characteristics. An elastic infrastructure should, however, provide the plumbing to effectively achieve these goals.
Fault tolerance and self-healing
If you're going to expect a deployer to trust your solution to scale indefinitely, you must also tolerate the events that occur with greater probability and frequency as a system grows, such as the addition or loss of nodes due to maintenance or fault, network faults and changes, and so on. With more resources comes a greater chance for failure, and an elastic system must be able to overcome these failures in a predictable and efficient manner, while again returning to a state of fault tolerance, if possible.
Continuing with our WebSphere eXtreme Scale data grid example, as you grow to grids of hundreds or thousands of container processes, the loss or maintenance of one of those processes is more and more probable. Through replication -- which is a core competency of WebSphere eXtreme Scale and similar in-memory data grid offerings -- these events can be tolerated. Not only that, but since the placement and migration of the data is completely transparent behind the "black box" of the WebSphere eXtreme Scale client APIs, a new replica is automatically created and fault tolerance is achieved again.
Elasticity needs to have this conceptual addendum in order to be truly useful as deployments grow and become more complex.
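The recovery sequence described in this section (tolerate the loss, then restore fault tolerance) can be sketched as follows. This is a toy model under simplifying assumptions, not the product's placement algorithm: one replica per partition, a hypothetical `Placement` class, and a simple least-loaded rule for choosing where the new replica goes.

```python
class Placement:
    """Tracks one primary and one replica container per partition (toy model)."""

    def __init__(self, containers, num_partitions):
        self.containers = set(containers)
        ordered = sorted(containers)
        self.primary = {}
        self.replica = {}
        for p in range(num_partitions):
            self.primary[p] = ordered[p % len(ordered)]
            self.replica[p] = ordered[(p + 1) % len(ordered)]

    def _load(self, container):
        # Number of primaries and replicas currently hosted by the container.
        hosted = list(self.primary.values()) + list(self.replica.values())
        return sum(1 for owner in hosted if owner == container)

    def _least_loaded(self, exclude):
        candidates = sorted(self.containers - {exclude})
        return min(candidates, key=self._load)

    def container_lost(self, lost):
        self.containers.discard(lost)
        for p in self.primary:
            if self.primary[p] == lost:
                # Tolerate the fault: promote the surviving replica.
                self.primary[p] = self.replica[p]
                self.replica[p] = None
            elif self.replica[p] == lost:
                self.replica[p] = None
        for p in self.primary:
            if self.replica[p] is None:
                # Self-heal: create a new replica on a different container.
                self.replica[p] = self._least_loaded(exclude=self.primary[p])


grid = Placement(["c0", "c1", "c2", "c3"], num_partitions=8)
grid.container_lost("c1")
```

After the loss, every partition again has a primary and a replica on two different surviving containers, and nothing in the calling code had to know where the data moved.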
Specialized needs regarding the administration and maintenance of a system can be subtle when considering the meaning of an elastic infrastructure. However, similar to the requirement of fault tolerance as systems grow and become more complex, you must also consider the ability of the deployer to perform common administrative tasks.
The key concept here is that the configuration and maintenance of each node should be either identical or very minimally different. You shouldn't expect the deployer to provide a list of all member machines or processes for the system to operate. There should be some level of automatic discovery and management based on a common set of configuration artifacts.
In the case of WebSphere eXtreme Scale, the approach is fairly straightforward. Configuration information focuses on the structure and characteristics of the grid itself, not on any details of the specific member processes. For example, you configure how many partitions to split the data into, and how those partitions should be replicated. Given this information, WebSphere eXtreme Scale maps that to the available grid members and enforces the policies set forth in the configuration. The exact same set of configuration artifacts is provided to each grid member when started, and the details of that member's place in the grid's world are managed and determined automatically.
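As a rough illustration of configuring the logical structure rather than the members, the sketch below applies one unchanged policy to grids of two different sizes. The `GridPolicy` fields and the round-robin `place` function are hypothetical and far simpler than real placement; they only show that the configuration never names a machine or process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridPolicy:
    """Logical grid structure only; no member names appear here."""
    num_partitions: int
    replicas_per_partition: int


def place(policy, members):
    """Map each partition to [primary, replica, ...] across whoever is present."""
    members = sorted(members)
    copies = policy.replicas_per_partition + 1
    if len(members) < copies:
        # Simplification for this sketch: refuse rather than run under-replicated.
        raise ValueError("not enough members to keep copies on separate members")
    return {
        p: [members[(p + k) % len(members)] for k in range(copies)]
        for p in range(policy.num_partitions)
    }


policy = GridPolicy(num_partitions=12, replicas_per_partition=1)
small_grid = place(policy, ["m1", "m2", "m3"])
large_grid = place(policy, ["m1", "m2", "m3", "m4", "m5", "m6"])
```

The same `policy` object drives both placements. Growing the grid from three members to six changes the result of `place`, but not a single line of configuration.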
This philosophy is carried down throughout the administration and maintenance spectrum, with each interaction designed to separate the details of the physical grid from the logical structure of the grid constructs.
We can find many more examples of this, such as the decoupling of replica placement through the use of the zone abstraction, or the ability to upgrade the actual grid code level without bringing the grid itself offline. The key concept is that administrative tasks should have a constant complexity as the system scales outwards, or at least as close to constant as possible. From this, you can see how nicely elastic software and elastic hardware (that is, virtualization and cloud deployments) can dovetail to provide a new level of freedom to enterprise solutions.
In praise of buzzwords
It's probably safe to say that even our most pervasive foundational technologies within computing started out as some sort of buzzword. We simply need to strive to define meaning and useful purpose when adding a new concept or goal set to our vocabulary. In this way, I think it's clear that elasticity in enterprise solutions can be a valuable concept when clearly defined and thought through to logical conclusions. We have consistently tried to convey concrete and useful meaning when we talk about WebSphere eXtreme Scale as an elastic data grid, and will strive to continue to do so as we apply these concepts to other solutions that are designed to create truly flexible infrastructures.