Solution isolation in service-oriented environments

Many IT environments are designed and deployed in what you might call a vertical fashion: each line of business has its own set of solutions, which are isolated from each other. Among other things, SOA is an attempt to establish a more horizontal view, promoting better reuse of IT functionality and operational resources across organizational boundaries. While this is beneficial, it requires extra care and due diligence to achieve operational excellence. This article describes some of the operational challenges introduced by this horizontal view and suggests how to address them, namely by isolating solution components from each other. Included are examples showing how this isolation can happen on many levels, either individually or in concert, all based on a set of prioritized criteria. This content is part of the IBM WebSphere Developer Technical Journal.

Andre Tost, Senior Technical Staff Member, IBM China

Andre Tost works as a Senior Technical Staff Member in the IBM Software Services for WebSphere organization, where he helps IBM customers establish Service-Oriented Architectures. His special focus is on Web services and Enterprise Service Bus technology. Before his current assignment, he spent ten years in various partner enablement, development and architecture roles in IBM software development, most recently for the WebSphere Business Development group. Originally from Germany, Andre now lives and works in Rochester, Minnesota. In his spare time, he likes to spend time with his family and play and watch soccer whenever possible.


Fred Tucci (ftucci@ca.ibm.com), Senior IT Specialist, IBM China

Fred Tucci is a Senior IT Specialist in the Architecture and Integration area of IBM Global Business Services. As a middleware integration specialist for over two decades, Fred has led many complex I.T. projects to successfully implement IBM technologies including WebSphere Application Server, WebSphere ESB and WebSphere DataPower. Fred engages in many facets of a solution's life cycle including concept, design, build, test, performance tuning, implementation and production management, following a solution to its logical end. These days Fred has his head in the clouds pondering how to best leverage cloud technologies in large enterprise infrastructures to resolve the biggest problems.



Ozair Sheikh (ozairs@ca.ibm.com), IT Specialist, IBM

Ozair Sheikh is a WebSphere DataPower IT Specialist on the IBM WebSphere Software Services team at the IBM Toronto Lab. He helps customers adopt and implement Enterprise Service Bus and Web services solutions with WebSphere DataPower Appliances. In his spare time, he is an avid hockey and basketball fan and enjoys spending time with family and friends. You can contact Ozair at ozairs@ca.ibm.com.



30 March 2011

Also available in Chinese

Introduction

The term “solution” is used in this article in a rather broad sense, representing services or collections of services that participate in addressing one particular business problem.

There has always been great focus on the architectural aspects of service orientation and how it influences the development of IT solutions. There has been less focus on the impact service-oriented architecture (SOA) has on an enterprise’s operational environment and its related procedures. One of the benefits of services is that they can often be reused across multiple lines of business and multiple IT solutions, meaning that the same logic serves many different uses and scenarios. This not only applies to the services themselves, but also to other supporting components, like mediations that exist in an Enterprise Service Bus (ESB).

Since all of these components exist somewhere in the operational environment, the underlying resources (machines, networks, processes, message queues, and so on) are effectively shared by solutions to a larger degree than what was traditionally the case. A positive effect of this is the ability to provide more centralized IT management and better utilization of physical resources than can normally be achieved without SOA. (This also impacts the operational organization, since there are now roles and teams that work horizontally, such as ESB developers, SOA governance, and so on, but delving further into the organizational impact is beyond the scope of this article.)


Operational risk as a factor of maturity

While service-oriented solutions promote loosely coupled systems, components that are of the same type often share the same operational unit. For example, mediations might virtualize distinct services, but they run on the same machine and possibly even within the same IT process. Or, business processes, though supporting different lines of business, might all use the same relational database to store process state data. Or, an ESB gateway uses a small set of IBM® WebSphere® MQ queues for all incoming and outgoing MQ traffic.

This operational coupling implies that solution components could compete for limited resources. One solution could lock up resources that are then no longer available to other solutions. Examples of these resources are threads, database connections, network connections, internal queues, heap-based memory, and so on. This creates problems for management and maintenance and, in the worst case, can lead to an outage of an entire environment caused by only one component.

If, for example, one solution uses up all network connections to a remote host that executes extremely slowly, those connections are no longer available to other solutions running on the same machine. If two solutions share a common queue for incoming traffic, and if a large number of messages are sent to that queue for one solution, then messages targeting the other solution might not come through quickly enough. Or, if a faulty administrative change was made for one solution, causing an entire process to crash or hang, then all solution components hosted by that same process will crash or hang too.
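
The first of these scenarios can be reproduced in a few lines of plain Java. The following is a minimal sketch, not WebSphere code: the class name, the pool size of two, and the 200 ms wait are all assumptions chosen for illustration. A "slow" solution occupies every thread of a pool while waiting for a remote host, and a "fast" solution then tries to run a request either on that same pool or on a pool of its own.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class; the names are illustrative only.
class SharedPoolContention {

    /**
     * Returns true if the fast solution's request completes within 200 ms
     * while the slow solution holds every thread of its pool.
     */
    static boolean fastRequestCompletes(boolean isolatedPools) {
        final int poolSize = 2;
        ExecutorService slowPool = Executors.newFixedThreadPool(poolSize);
        // Shared case: the fast solution uses the very same pool.
        ExecutorService fastPool = isolatedPools
                ? Executors.newFixedThreadPool(poolSize)
                : slowPool;
        // Stands in for a remote host that has not answered yet.
        CountDownLatch remoteHostAnswers = new CountDownLatch(1);
        try {
            // The slow solution occupies all threads of its pool.
            for (int i = 0; i < poolSize; i++) {
                slowPool.submit(() -> {
                    try {
                        remoteHostAnswers.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // The fast solution submits one trivial request.
            Future<String> reply = fastPool.submit(() -> "ok");
            try {
                reply.get(200, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false; // starved: no free thread in the shared pool
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
        } finally {
            remoteHostAnswers.countDown();
            slowPool.shutdownNow();
            fastPool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("shared pool:   " + fastRequestCompletes(false));
        System.out.println("isolated pool: " + fastRequestCompletes(true));
    }
}
```

With a shared pool the fast request is still queued when the wait expires; with its own pool it completes regardless of what the slow solution is doing. The same reasoning applies to connections and other pooled resources.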

In a perfect world, of course, this would never happen. Solutions are expected to be thoroughly tested before going into production, and changes should only be made once any unwanted side effects have been precluded.

In the real world, however, there is always a margin of error. The risk of failure within any IT environment always depends on the level of maturity and governance that the used technology, the developed solutions, and the overall organization have achieved.

Unwanted consequences must be taken into consideration whenever new components are added to an operational environment, but for the reasons mentioned earlier it is especially important in the context of a SOA. The impact of a poorly managed change is no longer isolated to the hosting solution, and could include other solutions running in the same environment.


Making a case for solution isolation

To overcome the challenges mentioned above, you should design and utilize an isolation strategy for your service-oriented environments. The concept of isolation is not new and has been used frequently in IT solutions to enhance reliability, but can those traditional isolation strategies be used successfully in a service-oriented environment? The answer is yes, but with one very critical caveat: isolating too much can devalue the deployment and run time benefits of an SOA.

Any isolation strategy, when taken beyond its logical usefulness, will incur additional operational cost. However, in a service-oriented environment that point is reached sooner and can have a higher cost implication. As such, care must be taken to find the right level of isolation that balances business needs and operational costs. There is no exact formula for doing this, but a few very important elements that need to be considered when creating an isolation strategy for your service-oriented environment are highlighted here. In addition, you’ll see a concrete example that puts these elements into practice.

As you can imagine, no two IT shops will have the same isolation strategy, but there are characteristics that every successful isolation strategy shares. Namely, the strategy:

  • Takes operational maturity and the associated risk into consideration.
  • Is dictated by the services themselves and the business context around those services.
  • Is dictated by the ESB run time technologies used for those services.

The rest of this article will help you define and customize an isolation strategy to meet your unique IT requirements. Be aware that it is imperative that this strategy be defined before any new environment is deployed, because it can greatly influence the way topologies are designed.

There are two types of isolation to consider:

  • Architectural isolation

    SOA promotes design principles like loose coupling, separation of concerns, implementation encapsulation, and standardized contracts. Following these principles will help create isolation layers between components, promote reuse, and enable the establishment of central points of control and management, like an ESB. Because this kind of isolation is well documented and typically well understood, we won’t focus on architectural isolation here.

  • Operational isolation

    What’s new here -- and something that is often overlooked -- is that you also want to isolate components operationally. Ideally, problems in one solution should never affect another. Components that are sharing IT resources at run time should be shielded from each other. This includes consuming components that are used only in one solution, the ESB that exposes virtual services, and the provider components themselves. Moreover, this group contains supporting components for monitoring, security, caching, or load balancing, just to name some examples.

    Operational isolation can happen on many different levels. This begins at a rather fine-grained level: configurations that ensure queues are not reused across solutions, or configurations that ensure separate connection pools are used between solutions. But it goes all the way to physically separating solutions by giving them their own hardware altogether.


Elements of an isolation strategy for service-oriented environments

Properly developing an isolation strategy requires a multi-dimensional view.

The first and most important dimension in this view is driven by the services themselves and the business needs for those services. In order to have a clear view of this dimension, a thorough understanding of your services and their characteristics, both technical and non-technical, is required. It is also crucial that a standard approach be used to gain this understanding. A very helpful tool for defining this dimension is a service classification system. Criteria for this classification system are outlined later in this article.

A second dimension that needs to be considered in an isolation strategy is the run time technologies on top of which the services will be layered. Like any other technology, service-oriented run time technologies have their own methods and best practices for deployment, configuration, partitioning mechanisms, topology, and management. Your engineering staff responsible for these technologies is an integral part of designing your isolation strategy. While the first dimension identifies what needs to be isolated and why, this dimension identifies how you use your run time environment to achieve that isolation.

Isolation criteria

With a focus on the services and the underlying run time technologies, you will be able to define the right set of criteria for your strategy. Those criteria will effectively drive where components are placed and at which levels they are isolated from each other.

Many different parameters can be used, and each IT organization must identify its priorities and define the right isolation approach from there. As a starting point, a classification of solution components must be conducted, which helps determine the boundaries of what is being isolated.

One set of criteria for such a classification is based primarily on technical parameters. For example (in no particular order):

  • Physical location
  • Message formats
  • Network protocols
  • Frequency of change
  • Dynamicity of run time changes
  • Performance
  • Scale
  • Maintenance and currency strategy
  • Complexity
  • Availability
  • Message size
  • Ease of administration
  • Interaction pattern(s)
  • Security
  • Maturity.

Moreover, additional criteria driving isolation are related to business aspects. For example:

  • Business criticality
  • Impact and cost of outage
  • Channel
  • Business domain.

When it comes to determining solution isolation priorities, business level criteria often have higher priority than technical criteria. For example, a mandate might be given that business critical solutions are to be completely isolated from non-critical ones (and potentially from each other), since they incur the highest impact in case of an outage. This assumes, of course, that critical solutions are more mature and therefore less prone to failure.
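
To make the idea of a classification driving placement more tangible, here is a minimal sketch in Java. Everything in it is an assumption made for illustration: the `Solution` record with only two criteria, the group naming scheme, and the policy itself (each critical solution gets its own isolation group, while non-critical solutions share a group per business domain). A real classification would weigh many more of the criteria listed above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical planner; not part of any WebSphere product.
class IsolationPlanner {

    enum Criticality { CRITICAL, NON_CRITICAL }

    /** Illustrative classification entry with two business criteria. */
    static final class Solution {
        final String name;
        final Criticality criticality;
        final String businessDomain;

        Solution(String name, Criticality criticality, String businessDomain) {
            this.name = name;
            this.criticality = criticality;
            this.businessDomain = businessDomain;
        }
    }

    /**
     * Maps each solution to an isolation group. Assumed policy: critical
     * solutions are isolated from everything else, including each other;
     * non-critical solutions share one group per business domain.
     */
    static Map<String, List<String>> plan(List<Solution> solutions) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (Solution s : solutions) {
            String group = s.criticality == Criticality.CRITICAL
                    ? "critical-" + s.name
                    : "shared-" + s.businessDomain;
            groups.computeIfAbsent(group, k -> new ArrayList<>()).add(s.name);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Solution> solutions = Arrays.asList(
                new Solution("payments", Criticality.CRITICAL, "banking"),
                new Solution("newsletter", Criticality.NON_CRITICAL, "marketing"),
                new Solution("surveys", Criticality.NON_CRITICAL, "marketing"));
        plan(solutions).forEach((group, members) ->
                System.out.println(group + " -> " + members));
    }
}
```

Each resulting group could then be mapped to whatever isolation unit your run time technology offers, such as a separate machine, cluster, or pool.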


Solution isolation mechanisms with WebSphere software

There are a number of mechanisms that enable solution isolation in environments that use WebSphere products. Below, two rather different sets of examples are described for implementation techniques you can apply, namely for IBM WebSphere Application Server and IBM WebSphere DataPower® Appliances. These mechanisms can be applied individually or as a group, depending on the set of prioritized isolation criteria described above.

WebSphere Application Server

The descriptions provided here are all at a rather high level. See Resources for details on the covered WebSphere Application Server mechanisms.

The load balancing and workload management capabilities of WebSphere Application Server, as well as its failover support, are the basis of many isolation mechanisms. These capabilities enable the distribution of solution components across multiple processes and multiple physical machines by defining a "clustered" topology.

Machine level

The simplest, but also the most costly way of isolating solution components from each other is to put them on separate hardware, typically according to a defined classification system.

In an extreme case, solutions are placed not only on separate hardware, but also in distinct WebSphere Application Server cells. In many cases, however, this is not a good use of IT resources, as it requires a lot of (underutilized) hardware and offers very little in terms of central management and maintenance.

To be sure, it is often advisable to have more than one WebSphere Application Server cell. Having a so-called maintenance cell enables upgrading or migrating an existing environment without incurring an outage. Also, if the upgrade behaves badly in production, you can quickly fall back to the other cell, significantly reducing recovery time.

WebSphere Application Server offers a way of deploying solutions to one logical target, and then to distribute them physically. In a nutshell, this is done by defining clusters that host certain solutions, or parts of a solution. These clusters are then hosted on one or more nodes (which are typically mapped to physical machines). In other words, solutions can be isolated on a hardware level by defining appropriate clusters and then deploying them to separate physical machines. Keep in mind, though, that the clusters would still all belong to one WebSphere Application Server cell, and there is additional risk associated with that; for example, if administrative changes are made at the cell level.

Process level

As mentioned above, WebSphere Application Server offers the ability to define clusters. A common way of distributing functionality across clusters is described in the so-called WebSphere Application Server "golden" topology (Figure 1). In this topology, separate clusters exist for applications, messaging, and support. Each cluster has one or more members, each of which is represented by a separate process or JVM.

Figure 1. WebSphere Application Server "golden" topology

Thus, on top of defining clusters for separate machines, you define cluster members to distribute solutions on a process level. This is done in part for scalability reasons, but an important aspect of it is increased availability due to redundancy and isolation of components.

Figure 1 shows an example with three clusters across two nodes, where each node contains three cluster members and each cluster has two members (one on each node). This provides very little isolation of solutions from each other, since all application-specific components are deployed to one cluster only. A failure in one cluster member can be compensated by using the other cluster member, but in a case where a misbehaving component causes problems, it is likely that those problems will occur in both.

An increased level of isolation can be reached by creating additional clusters or cluster members. For example, if your analysis of appropriate isolation criteria has led you to favor separating mission critical solutions from non-mission critical ones, you might consider creating a cluster for each. The solution components will then run in separate processes (since they are in different cluster members). Now, when a non-critical application uses up, say, all of the threads within a process, your mission critical solutions will not be affected at all.

Be aware, however, that creating additional clusters and cluster members comes with a price in terms of memory, CPU capacity, and other things, so you should not create a large number of them. Therefore, simply putting each application in its own cluster is not practical. For more information on best practices and considerations for expanding clusters, see Resources.

Application server instance level

So far, you have seen how to isolate solutions from each other by placing them on separate machines or in separate processes, but there are ways of isolating solutions also within a process, or, rather, within an application server instance.

Solutions deployed in WebSphere Application Server are generally based on Java EE. This is true for "regular" WebSphere Application Server applications as well as for BPEL processes running in IBM WebSphere Process Server, or mediations running in IBM WebSphere ESB, just to name two examples. These applications have access to a number of application server resources, some of which are user defined and some of which are system defined. Below are examples of resources that are frequently in contention and, thus, if properly configured, can help increase the level of isolation between components.

  • Threads

    There are various settings that influence how WebSphere Application Server allocates threads at run time. Some are always shared, like the thread pools used by the Web and EJB containers, and some are associated with specific resources, such as the thread pool used by a messaging engine.

    Configuring these thread pools is primarily a part of tuning an application server for optimal performance. When it comes to solution isolation, you should look at it from the perspective of ensuring that components that are part of the different solutions use their own thread pool. There are not many places where this can be achieved. One example is a Java™ EE WorkManager, which enables managing threads from within Java logic in the application server. Each WorkManager has its own thread pool, so that you achieve isolation by creating multiple WorkManagers and assigning them to different solutions.

    Another area where you can influence how many threads are available to a component (and the pool from which these threads are taken) is in messaging. WebSphere Application Server supports the concept of a messaging bus. All messaging resources (connection factories, queues, topics, and so on) are assigned to a named bus. A bus can span multiple nodes and machines, and has one or more servers or clusters as its bus members. A bus member in turn uses a messaging engine to actually process messages. Only one messaging engine can be active per cluster.

    In short, a bus uses messaging engines to process messages. And, you guessed it: you can configure the size of the thread pool that is available to a messaging engine. More importantly, you can also define which solution component uses which bus, thus ensuring that solution components using different buses also use different thread pools.

    Overall, however, the concern remains that misbehaving components can use and lock all available threads for a process, even if you have properly configured isolated thread pools. One way of addressing this is to make testing thread allocation and management part of your regular system test.

  • Connections

    Just like threads, connections are a limited resource and thus are pooled. There are different types of connections, such as JDBC connections to a database, JMS connections, and connections to any other external resource reached through a J2C resource adapter.

    Problems can occur if any single solution uses up all of the connections of one type, leaving no connections for other solutions to use. Configuring proper values for connection pool sizes is a matter of tuning an application server. Solution isolation is typically achieved by making sure that the components of the system associated with one solution do not share connection pools with other components running in the same server.

    How this is done depends on the type of connection that is to be isolated. For example, connections to a JDBC-backed database are pooled via the JDBC datasource that is used; JMS and other J2C connection pools are defined through the activation specification. Thus, in the example of JMS, solution components can be isolated from each other by using different activation specifications, or components using the same database can be isolated by giving them separate data sources.

    When using a product that runs on top of WebSphere Application Server, like IBM WebSphere Process Server, the deployment of solution components can result in the automatic generation of resources that use connection pools. For example, a WebSphere Process Server module leads to the generation of a number of activation specs that are used to manage the communication of the module’s components with each other and with other external partners. This leads to a default level of isolation that does not require any additional intervention.

    As mentioned earlier, keep in mind that knowledge about which connection pools are used by which components not only provides you with a view of the level of isolation that exists, but it also gives you a good starting point for tuning the system for good performance.

  • Queues

    The last example is messaging queues. Many solutions leverage asynchronous communication in one way or another, either explicitly (by containing logic that uses JMS, for example) or implicitly (by defining asynchronous interactions between WebSphere ESB mediation module components, for example).

    If solutions are sharing queues for their messaging needs, then there is a risk that one solution’s behavior could impact another. For example, assume a service provider offers the retrieval of customer data over a set of JMS queues. One consumer for this service exists that is used in an enterprise’s call center. The staff using this solution must be able to retrieve customer information very quickly. Another consumer is a solution that consolidates customer information across multiple legacy systems, running once a day, with no human intervention. This second consumer can flood the request queue of the service with a large number of messages in a short time, making it difficult for the call center solution to continue to perform as appropriate.

    In this example, isolation is required not between components of separate service providers, but between consumers. Here, isolation can be achieved by assigning separate queues to separate consumers, and ensuring that the service provider serves all queues in a prioritized manner. Be aware that in the example of the daily, batch-style consumer, additional throttling may be needed to control the number of messages that are concurrently being served.
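
The call center example above can be sketched as follows. This is an in-memory model using plain Java collections, not JMS or WebSphere messaging; the class, its method names, and the per-cycle batch limit are assumptions made for illustration. Each consumer has its own queue, the provider always drains the interactive queue first, and the batch consumer is throttled to a fixed number of messages per cycle.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical provider-side dispatcher; illustrative only.
class PrioritizedQueueServer {

    private final Queue<String> interactiveQueue = new ArrayDeque<>();
    private final Queue<String> batchQueue = new ArrayDeque<>();
    private final int maxBatchPerCycle; // throttle for the batch consumer

    PrioritizedQueueServer(int maxBatchPerCycle) {
        this.maxBatchPerCycle = maxBatchPerCycle;
    }

    /** Request from the call center solution. */
    void submitInteractive(String message) {
        interactiveQueue.add(message);
    }

    /** Request from the daily consolidation job. */
    void submitBatch(String message) {
        batchQueue.add(message);
    }

    /**
     * Serves one cycle: every waiting interactive request first, then at
     * most maxBatchPerCycle batch requests. Returns the messages in the
     * order in which they were served.
     */
    List<String> serveCycle() {
        List<String> served = new ArrayList<>();
        String message;
        while ((message = interactiveQueue.poll()) != null) {
            served.add(message);
        }
        for (int i = 0; i < maxBatchPerCycle
                && (message = batchQueue.poll()) != null; i++) {
            served.add(message);
        }
        return served;
    }

    public static void main(String[] args) {
        PrioritizedQueueServer server = new PrioritizedQueueServer(2);
        server.submitBatch("b1");
        server.submitBatch("b2");
        server.submitBatch("b3");
        server.submitInteractive("c1"); // arrives after the batch flood
        System.out.println(server.serveCycle());
        System.out.println(server.serveCycle());
    }
}
```

Even though the call center request arrives after three batch requests, it is served first, and the remaining batch request waits for the next cycle. With a single shared queue, the same request would have waited behind the entire batch.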

WebSphere DataPower SOA Appliance

Over recent years, a new generation of purpose-built appliances has entered the enterprise IT landscape. These hardware devices are similar in nature to network devices like routers and switches in that they combine hardware and software in a turnkey package. However, they differentiate themselves from typical network devices as they offer capabilities higher in the network stack and are focused on application layer capabilities.

This new generation of appliances offers simplicity and speed of deployment, characteristics that every enterprise IT division desires. At the core of these appliances is hardware and software, and as such the need to consider and implement a solution isolation strategy applies.

This section explores IBM’s WebSphere DataPower SOA Appliances and outlines the available elements that can be utilized to implement a solution isolation strategy.

As stated earlier, the most extreme level and arguably the safest level of isolation is physical isolation; however, this strategy can translate into a high total cost of ownership that might not justify the business results. As such, this section will primarily focus on internal elements of WebSphere DataPower that can be applied to achieve isolation before resorting to physical isolation. (This should not be confused with the need for additional devices to meet capacity requirements. Capacity planning is a key discipline for a successful WebSphere DataPower implementation, but this topic is beyond the scope of this article.)

While identifying the internal components of WebSphere DataPower that play an important role in solution isolation, we'll look briefly at how the physical topology of a WebSphere DataPower environment also plays a role in it.

Like any other computing platform, WebSphere DataPower has a software stack, which includes an operating system and software components that implement a set of capabilities. However, due to its purpose-built nature, the entire software stack (including the operating system) is highly customized or "right-sized" for its intended use. Consequently, some of the isolation methods available in software stacks running on general purpose operating systems are not available. Because the core elements of WebSphere DataPower’s operating system are not exposed, constraining a solution to a process and assigning it a finite amount of resources (memory, threads, and so on) is not possible. This requires a slight shift in focus when isolating solutions within WebSphere DataPower: toward creating behavior boundaries for a solution rather than resource boundaries. The goal is to ensure that one solution's workload does not monopolize the WebSphere DataPower resources and impact other solutions. As stated earlier, capacity planning is an important element here and will dictate the number of physical devices that are required for a set of workloads. However, for the purpose of this article, assume that the proper capacity is planned for and provisioned. Thus, we will focus on intra-device isolation principles in a situation where a device is shared across many solutions. Essentially, you are concerned with protecting both the device itself and its workloads from negative, unexpected situations.

Fortunately, WebSphere DataPower has a very powerful capability called service level monitoring (SLM) that can be utilized to define and enforce behavior boundaries. SLM enables the management of individual Web services or sets of Web services. For example, you can define thresholds for request and response message throughput, triggering alerts and possibly throttling once a threshold has been reached. This prevents one solution from consuming too many of the appliance’s resources and not leaving sufficient capacity for others.
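
The notion of a behavior boundary can be illustrated with a small conceptual model. The sketch below is plain Java and does not represent DataPower SLM configuration or its actual algorithm; the class name, the fixed-window counting scheme, and the explicit timestamp parameter are assumptions made for illustration. It counts requests per solution within a time window and reports when a solution has exceeded its threshold.

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual model of a throughput threshold; not DataPower code.
class ThroughputBoundary {

    enum Decision { ACCEPT, THROTTLE }

    private final int maxRequestsPerWindow;
    private final long windowMillis;
    // Per solution: element 0 = window start time, element 1 = request count.
    private final Map<String, long[]> windows = new HashMap<>();

    ThroughputBoundary(int maxRequestsPerWindow, long windowMillis) {
        this.maxRequestsPerWindow = maxRequestsPerWindow;
        this.windowMillis = windowMillis;
    }

    /**
     * Records one request for the given solution at the given time and
     * decides whether it is within the solution's boundary. The timestamp
     * is passed in explicitly so the behavior is easy to test.
     */
    Decision onRequest(String solution, long nowMillis) {
        long[] window = windows.computeIfAbsent(
                solution, k -> new long[] { nowMillis, 0 });
        if (nowMillis - window[0] >= windowMillis) {
            window[0] = nowMillis; // start a new window
            window[1] = 0;
        }
        window[1]++;
        return window[1] > maxRequestsPerWindow
                ? Decision.THROTTLE
                : Decision.ACCEPT;
    }

    public static void main(String[] args) {
        ThroughputBoundary boundary = new ThroughputBoundary(2, 1000);
        System.out.println(boundary.onRequest("batch", 0));
        System.out.println(boundary.onRequest("batch", 10));
        System.out.println(boundary.onRequest("batch", 20));
        System.out.println(boundary.onRequest("callcenter", 20));
    }
}
```

Because the counters are kept per solution, a burst from one solution triggers throttling for that solution only; requests from other solutions on the same device continue to be accepted.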

Although perhaps obvious, it's important enough to mention that before any of the behavior boundaries are implemented, you need to understand the expected behavior of the solution and each service within WebSphere DataPower that is part of the solution. The behavior boundaries must be defined and implemented in the context of the solution’s non-functional requirements. The implementation must be validated through adequate pre-production testing and monitored in production to ensure the solution is not limited from executing properly.

Another important aspect of isolation within a device is the configuration artifacts needed to tell WebSphere DataPower which messages to process and how to process them. WebSphere DataPower implements an object oriented model for its configuration, which leads to a significant number of independent objects ideally suited for reuse and sharing within a device. This reuse of configuration items simplifies the administration, configuration, and deployment of a WebSphere DataPower device and its capabilities. However, care needs to be exercised to ensure that no configuration item associated with one service can prevent other services from operating. The key configuration items that should be considered to ensure the proper level of configuration isolation will be discussed next.

For a detailed discussion of WebSphere DataPower and its internal services, see WebSphere DataPower SOA Appliance Handbook.

WebSphere DataPower provides a number of service types that can be configured. We’ll focus here on WebSphere DataPower’s Web Service Proxy service type. The Web Service Proxy is a feature rich WSDL-based proxy for Web service providers. Since all WebSphere DataPower service types have the same basic characteristics, isolation concepts discussed for the Web Service Proxy can be equally applied to other WebSphere DataPower service types.

Figure 2 outlines the key configuration objects that help create boundaries within a WebSphere DataPower device. These objects need to be considered when implementing an isolation strategy. Let’s look at each one.

Figure 2. WebSphere DataPower internal isolation points

Application domain

At the core of configuration isolation is the application domain (hereafter referred to as a domain). A WebSphere DataPower domain is a logical partition that provides the ability to isolate services. A domain contains one or more services, including the WebSphere DataPower configuration objects required for those services. By default, objects within a domain are not visible to other domains. As such, segregating your services across multiple domains within a WebSphere DataPower device is a good method for isolating configuration changes and minimizing their risk. You must also take special care to clean up “orphaned” objects (objects not required by any service); otherwise, startup time will be longer and the complexity of managing the configuration will increase.

The challenge is how to break down and group your services into a logical domain structure. The grouping options are numerous, but one that stands out is grouping the services through a bottom-up approach by understanding an existing service governance model, as reflected in the service’s development cycle. Services will often be developed and maintained as a set, and using this as the basis for creating your domains will provide a logical flow of services from pre-production to production.

A WebSphere DataPower domain can be created that is environment-agnostic and self-contained. As such, a domain can become a very convenient package for deployment that is not only simple to deploy, but also provides good configuration isolation.

Application domains offer good configuration and deployment isolation, but other means need to be utilized to protect services from each other and to provide isolation at run time. These run time isolation elements will be discussed later in this article.

There is one important caution that needs to be noted. WebSphere DataPower’s architecture is such that the computing resources of a device are shared equally across the deployed domains. As shown in Figure 2, the full complement of network, CPU, and memory is accessible by all components (and thus solutions) deployed within WebSphere DataPower. In addition, these resources cannot be partitioned or bound to any one solution.

Web Service Proxy

The Web Service Proxy is a configuration construct that encapsulates and ultimately defines how a WSDL-based service transaction is processed. A Web Service Proxy can contain one or more WSDL-based services, as shown in Figure 2.

Similar to the Application Domain, the Web Service Proxy can help isolate the configuration objects of one service from another. Although there are a great number of tuning and setup options within a Web Service Proxy, the majority of these options (we'll discuss exceptions shortly) are geared toward how a message or service is handled, and do very little to help with solution isolation or creating behavior boundaries. However, segregating services across multiple Web Service Proxies within a domain can greatly assist in administration and help isolate the configuration.

As mentioned earlier, there are a few key parameters at the Web Service Proxy level that can be used to apply some boundaries to a service:

  • How long an in-flight request is held in WebSphere DataPower waiting for activity, either from the back side service provider or front side consumer:
    • Back side timeout
    • Front side timeout
  • How long an idle persistent TCP connection is held before it is disconnected by WebSphere DataPower:
    • Back side persistent timeout
    • Front side persistent timeout

Both of these sets of parameters help to optimize resource usage within WebSphere DataPower during unexpected behavior from consumers or provider services. Keeping too many idle requests or TCP connections around for too long can cause a "pile-up" effect in a device, which can then impact other requests. The words "idle" and "too long" are relative and depend on expected response times from things like the back end provider. What is considered "idle for too long" to one service may be the expected latency for another. Thus, it is important that response time goals and actual latency times be defined and validated via performance tests. Only then will you be able to set these values to ensure you are properly defining the behavior boundary for the service, yet not constraining it to the degree that requests are inappropriately failed.
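As a rough illustration of turning performance test results into a timeout value, the sketch below derives a suggested timeout from measured back end latencies. It is not a WebSphere DataPower API; the function name, the choice of the 99th percentile, and the headroom factor are assumptions made for this example, and your own response time goals should drive the actual values.

```python
import math


def suggest_timeout(latencies_s, percentile=99.0, headroom=1.5, floor_s=1):
    """Suggest a timeout in whole seconds from measured latencies.

    Takes the nearest-rank percentile of the samples, multiplies it by a
    headroom factor, and rounds up. The defaults are illustrative
    assumptions, not product recommendations.
    """
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_s)
    # Nearest-rank percentile: the smallest sample that covers the
    # requested share of all observations.
    rank = max(1, math.ceil(percentile * len(ordered) / 100.0))
    observed = ordered[rank - 1]
    return max(floor_s, math.ceil(observed * headroom))


# Hypothetical test results: mostly fast responses with a slow tail.
slow_tail = [0.2] * 98 + [2.0, 4.0]
fast_only = [0.05] * 10
```

With samples like `slow_tail`, the suggested value tolerates the expected slow responses while still cutting off requests that hang far beyond them. A service with samples like `fast_only` would receive a much tighter boundary, which is the point made above: "too long" is relative to each service.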

A similar caveat applies to the Web Service Proxy as it does to a domain. Although a Web Service Proxy can isolate the configuration of a service or set of services, it does not contribute greatly to solution isolation on its own or by default.

Front side handler

The front side handler (FSH) object is what exposes the services defined within WebSphere DataPower to the world. The FSH is defined to listen on a specific IP address and network port. It is configured to handle one of the many protocols that WebSphere DataPower supports. In this case, we will limit the discussion to the HTTP-based FSH. The FSH will interact with the consumers accepting inbound messages and pass them on to the Web Service Proxy and ultimately the service defined within.

The FSH operates at the protocol layer and does not provide the ability to constrain network resources such that no one application can monopolize the network stack. However, you can achieve some performance and operational isolation by assigning different inbound ports to different services via FSHs; depending on the load balancing and failover capabilities of your network, you might find benefits in isolating sets of services to their own port.

There is one important WebSphere DataPower behavior linked to the FSH that could greatly impact how you design your FSH objects. As noted in the WebSphere DataPower support technote Multiple Web Service Proxy using the same Front Side Handler, there is an implication to assigning many services to an FSH. The issue stems from the fact that all services that share an FSH become "linked," and so any change within those linked configuration objects will cause a refresh of all the services that are linked. This can become an operational issue and needs to be considered in conjunction with your operational processes.
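To get a feel for the operational impact of shared handlers, you can inventory which proxies use which FSH and compute which other proxies would be affected by a change. The sketch below works on a plain dictionary; the proxy and handler names are hypothetical, and it only considers directly shared handlers, which is a simplifying assumption about the linking behavior described in the technote.

```python
def linked_proxies(changed, proxy_handlers):
    """Return the proxies that share at least one front side handler
    with the changed proxy, sorted by name.

    proxy_handlers maps a proxy name to the handler names it uses.
    """
    handlers = set(proxy_handlers[changed])
    return sorted(
        name
        for name, used in proxy_handlers.items()
        if name != changed and handlers & set(used)
    )


# Hypothetical inventory: two proxies share one handler, one is on its own.
layout = {
    "orders": ["fsh_8080"],
    "billing": ["fsh_8080"],
    "hr": ["fsh_9090"],
}
```

In this layout, a change to `orders` also touches `billing`, while `hr` is unaffected because it has its own handler. A report like this can feed into change planning when deciding whether services should share an FSH.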

Service level monitoring

The service level monitoring (SLM) functionality of WebSphere DataPower appliances enables fine-grained behavioral control over the workloads processed by individual services. The SLM features enable you to control the workload, the credential executing it, and the time interval when it is being processed. Several methods are available to handle workload that exceeds the set thresholds, such as shaping (queuing the requests) or rejecting them outright.

When performing your isolation analysis, you must consider isolating services from each other to ensure that a single service does not overwhelm the system resources of the appliance. WebSphere DataPower system resources are shared by all services on the appliance, and care must be taken to ensure that no one service consumes more than its fair share. For example, you might run into a situation where one of your services fronted by WebSphere DataPower is experiencing delays in sending responses. In this situation, you need to protect the back end from being overwhelmed and ensure that WebSphere DataPower remains stable. In an SLM policy, you can configure the number of connections permitted into WebSphere DataPower or specify the number of transactions to execute based on the back end latency. Based on real-time transactional data, WebSphere DataPower is able to trigger an SLM policy to ensure that the right number of connections are opened or transactions are sent to the back end. The optimal type of workload for WebSphere DataPower consists of short transactions; a large number of long-running transactions requires more system resources, such as CPU and memory, to keep connections open. Hence, tuning your SLM policies with the right values helps ensure a fair share of resources among services.

The configuration of SLM policies is not a trivial task. You need to conduct performance testing with various workloads to identify the appropriate configuration values for your SLM policies. Setting up and running a performance test is outside the scope of this article, but your enterprise should have testing tools and methodologies that you can use. Furthermore, your SLM policies should be tuned on a regular basis through transactional analysis of production data, since workloads change over time.
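The difference between rejecting and shaping excess workload can be illustrated with a small model. The sketch below is a conceptual simulation only, not WebSphere DataPower configuration or its internal algorithm; the class name, the fixed-interval counting, and the parameters `threshold`, `interval_s`, and `max_queue` are assumptions chosen for this example.

```python
from collections import deque


class IntervalLimiter:
    """Toy model of a per-interval threshold with two over-limit actions:
    'reject' fails excess requests, 'shape' queues them for later."""

    def __init__(self, threshold, interval_s, action="reject", max_queue=10):
        if action not in ("reject", "shape"):
            raise ValueError("action must be 'reject' or 'shape'")
        self.threshold = threshold
        self.interval_s = interval_s
        self.action = action
        self.max_queue = max_queue
        self.queue = deque()
        self._window_start = None
        self._count = 0

    def _roll(self, now):
        # Start a new counting interval once the current one has elapsed.
        if self._window_start is None or now - self._window_start >= self.interval_s:
            self._window_start = now
            self._count = 0

    def offer(self, request, now):
        """Return 'accepted', 'queued', or 'rejected' for a new request."""
        self._roll(now)
        if self._count < self.threshold:
            self._count += 1
            return "accepted"
        if self.action == "shape" and len(self.queue) < self.max_queue:
            self.queue.append(request)
            return "queued"
        return "rejected"

    def release(self, now):
        """Forward queued requests that fit into the current interval."""
        self._roll(now)
        released = []
        while self.queue and self._count < self.threshold:
            released.append(self.queue.popleft())
            self._count += 1
        return released
```

In this model, a caller would invoke `release` at the start of each interval before offering new requests, so that queued work is forwarded first. Shaping smooths short bursts at the cost of added latency and memory for the queue, while rejecting keeps resource usage flat but pushes the retry decision to the consumer. Which trade-off is right for a service is exactly what the performance tests mentioned above should tell you.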

XML manager

The XML manager controls a number of characteristics of a service defined in WebSphere DataPower. There is a default XML manager that is automatically attached to the Web Service Proxy and enforces some boundaries for each service defined to the proxy.

Although the default settings work well, it is important that you familiarize yourself with the XML manager. There are two key aspects of a service that the XML manager can help control:

  • Message size.
  • Document cache size.

Applying boundaries to both of these elements for a service will prevent any rogue messages or processing from over-consuming the shared memory on the device and impacting other services.

The XML Parser tab of a Web Service Proxy provides various message size constraints that should be reviewed and set appropriately for the service. As stated earlier, it is important to understand the non-functional requirements (NFRs) for a service, and maximum message size should be part of those NFRs. If message size limits are set too aggressively, then valid service transactions can fail (this applies to both the request and response message); by contrast, setting them too generously will permit large messages that require a large amount of memory or CPU to process.

The Document Cache tab of a Web Service Proxy provides various constraints related to caching documents that services can use within WebSphere DataPower. The same thought process applies here as it does to the XML parser limits.

Physical isolation

As expensive as it is, no isolation strategy is complete unless it includes physical isolation as part of its makeup.

There are cases within WebSphere DataPower for which physical isolation becomes the right answer. Obviously, there is the capacity element, but for the purpose of this article, it is assumed that capacity has already been taken into consideration.

As noted earlier, certain amounts of computing resources cannot be constrained in WebSphere DataPower. The computing power within WebSphere DataPower is laid out as a linear plane accessible to all workloads flowing through the device.

Some workloads are just too important to share resources with others. They might need an extremely high level of availability, require specialized configuration settings that encroach on other services in the device, or monopolize resources (as non-streamable, large message-size, or high latency workloads do). This is where physical isolation comes into play.

With physical isolation, you take a solution or set of solutions with similar characteristics (driven from the service categories discussed earlier) and isolate them onto their own set of WebSphere DataPower devices. Now you have the luxury of applying a purpose-configured WebSphere DataPower environment for those solutions.

Note that even in this situation, applying intra-device behavior boundaries remains important.

Although beyond the scope of this article, a WebSphere DataPower implementation that includes multiple physical devices (whether for capacity, isolation, or failover reasons) requires a sound network and load balancing design. Various intelligent load balancing solutions exist that can be integrated to work with WebSphere DataPower; however, the new application optimization features within WebSphere DataPower are highly recommended, as they provide an integrated and powerful solution for intelligent load balancing.


Putting it all together

A complete solution isolation strategy is built by creating a decision tree that maps the isolation criteria you have identified to the relevant mechanisms offered by the runtimes and products that are in use. You can implement isolation on one level, using one specific technique, or on several levels using multiple techniques.

One way of doing so is to create a flow that represents each criterion as a decision node. At the end of the decision tree is an indicator defining how a component is isolated. Assume, for example, that for an environment containing a number of WebSphere Application Server clusters across three nodes, application components need to be placed according to isolation requirements. The decision flow in Figure 3 shows an example of how the components can be distributed in a way that maps isolation criteria to the physical environment.

Figure 3. Example decision tree for isolation of components with WebSphere Application Server

Figure 3 depicts a WebSphere Application Server cell with three nodes and three clusters defined on them. Isolation criteria (business criticality, availability, frequency of change, and so on) are represented by decision nodes so that an appropriate cluster can be identified for each deployed application.
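A decision flow of this kind is straightforward to capture in code, which makes the placement rules reviewable and repeatable. The sketch below is a minimal example and does not reproduce the exact flow in Figure 3; the cluster names, the order of the decision nodes, and the three boolean criteria are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    business_critical: bool
    high_availability: bool
    changes_frequently: bool


def choose_cluster(component):
    """Walk the decision nodes in priority order and return a cluster name.

    The cluster names are hypothetical placeholders for the three
    clusters in the example cell.
    """
    # Highest priority: critical or highly available components get
    # the cluster reserved for them.
    if component.business_critical or component.high_availability:
        return "ClusterA"
    # Frequently changing components are kept together so that their
    # redeployments do not disturb stable applications.
    if component.changes_frequently:
        return "ClusterB"
    # Everything else shares the general-purpose cluster.
    return "ClusterC"


# Hypothetical components used to exercise each branch.
payments = Component("payments", True, True, False)
campaigns = Component("campaigns", False, False, True)
reports = Component("reports", False, False, False)
```

Because the criteria are evaluated in a fixed order, the order itself encodes their priority: here, criticality and availability outrank frequency of change. Reordering the checks is how you would express a different set of priorities for your environment.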


Conclusion

This article introduced the concept of solution isolation as a way of increasing operational availability of service-oriented environments. Because SOA increases the level of reuse among components, whether because they are used by multiple consumers or because they share the same underlying run time environment (or both), there is an increased risk of components negatively impacting each other. This risk can be mitigated by defining and deploying a solution isolation strategy. Examples of criteria that help outline the right strategy for your environment were presented here, along with examples of isolation mechanisms for WebSphere Application Server and WebSphere DataPower.


Acknowledgements

The authors thank Rachel Reinitz, Greg Flurry, and Alexandre Polozoff for their help with this article.
