Build and employ a threshold criteria for critical cloud components

More and more businesses and government agencies are migrating applications to the cloud to save costs and physical space through virtualization. Unfortunately, many organizations still respond reactively when a failure occurs instead of taking a more prudent, proactive step: creating threshold criteria for critical cloud components. The author illustrates cloud-specific threshold criteria and scenarios showing what proactive actions can be taken when failures happen.

Judith M. Myerson, Systems Engineer and Architect

Judith M. Myerson is a systems architect and engineer. Her areas of interest include enterprise-wide systems, middleware technologies, database technologies, cloud computing, threshold policies, industries, network management, security, RFID technologies, presentation management, and project management.


22 February 2013

Threshold performance is the sum of all threshold levels based on threshold criteria for critical cloud components. Cloud threshold performance is critical in determining whether a cloud service will perform at the guaranteed level of service availability in an external service level agreement (SLA).

To ensure a common understanding of what threshold criteria are, a threshold policy should be crafted that outlines what cloud service consumers and providers should do. Such a policy can save providers countless hours of management time.

Critical cloud components to consider when developing threshold criteria are:

  • Resource threshold: Ensures resource consumption by applications in the cloud is balanced dynamically at or below the threshold level.
  • User threshold: Ensures users can concurrently access the application, up to the limit specified in the provider's user license, at or below the threshold level.
  • Data request threshold: Ensures data requests to the application can be processed immediately at or below the threshold level.
  • Response threshold: Ensures the application responds to a user or a data request in a timely manner at or below the threshold level.
  • Replicated service instance threshold: Ensures service instances are replicated so that they can survive host failures within the range of acceptable threshold levels. The maximum is determined by the availability of resources.
  • Virtual machine threshold: Ensures the number of virtual machines running on the same host is at or below the threshold level.
  • Packet-switched network latency threshold: Ensures the latency of sending a packet to the destination and back over the external network is at or below the threshold level. The latency includes the queuing delay incurred while multiple packets from different sources are held.
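
The "at or below the threshold level" test that every criterion in this list shares can be sketched in a few lines of Python. This is only an illustration: the class name, the metric names, and the numeric levels are hypothetical, and real levels would come from the negotiated threshold policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """One threshold criterion: a named metric and its maximum allowed level."""
    name: str
    level: float  # unit depends on the metric (users, milliseconds, VMs, ...)

    def is_met(self, observed: float) -> bool:
        # "At or below the threshold level" is the passing condition.
        return observed <= self.level


def violations(thresholds, observed):
    """Return the names of thresholds whose observed value exceeds the level.

    Metrics missing from `observed` are skipped rather than treated as failures.
    """
    return [t.name for t in thresholds
            if t.name in observed and not t.is_met(observed[t.name])]


# Hypothetical levels, for illustration only.
criteria = [
    Threshold("concurrent_users", 500),
    Threshold("response_ms", 800),
    Threshold("vms_per_host", 12),
]
sample = {"concurrent_users": 520, "response_ms": 640, "vms_per_host": 12}
print(violations(criteria, sample))  # ['concurrent_users']
```

A value exactly equal to the level, such as the 12 virtual machines here, still counts as meeting the threshold.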

The first three of these were described in previous articles.

Each threshold criterion is primarily shaped by:

  • The type of cloud service the provider hosts: Software as a Service (SaaS), Platform as a Service (PaaS), or Infrastructure as a Service (IaaS).
  • How much control the cloud service consumer has over the operating systems, hardware, and software.
  • The type of industry the consumer represents, such as retail, energy and utilities, financial markets, health care, or chemical and petroleum.

To satisfy consumer demand to review threshold criteria, all providers should provide consumers with copies of these criteria. The providers should encourage consumers to send questions that might need to be resolved or require negotiation before the consumer rents or subscribes to a cloud service type.

User-control impacts on threshold criteria

Following are SaaS control impacts that influence threshold criteria focus.

SaaS user: The only control an end user has is to access the provider's application from a desktop, laptop, or mobile device. The user does not set the threshold levels.

SaaS provider: At a minimum, the provider controls operating systems, hardware, network infrastructure, application upgrades, and patches. The provider sets the user, data request, resource, and response threshold levels and limits negotiation with the user on these levels.

Next are PaaS control impacts that influence threshold criteria focus.

PaaS developer: The developer controls all the applications found in a full business life cycle created and hosted by independent software vendors, startups, or units of large businesses. The developer builds, deploys, and runs, say, a custom retail management application, and manages upgrades and patches to all functionalities of this application. As part of the life cycle, the developer uses spreadsheets, word processors, backups, billing, payroll processing, and invoicing.

The developer, in negotiation with the provider, sets the user, resource, response, and data request threshold levels to check the performance of applications on the PaaS. The developer can change the threshold levels for each SaaS application on the PaaS to meet the requirements of SaaS users as PaaS co-residents.

PaaS provider: At a minimum, the provider controls operating systems, hardware, network infrastructure, and resource management. The provider is flexible in negotiating with the developer on setting threshold levels.

Here are IaaS control impacts that influence threshold criteria focus.

IaaS infrastructure and network specialists: The specialists control the operating systems, network equipment, and deployed applications at the virtual machine level. They can scale virtual servers or blocks of storage up or down, and they set the virtual machine and network latency threshold levels.

IaaS provider: At a minimum, the provider controls the infrastructure of traditional computing resources in the cloud environment and builds or upgrades that infrastructure to provide better-performing virtual machines. The provider is flexible in negotiating with the developer on setting threshold levels.
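
The division of control described in this section can be summarized as a lookup table. The sketch below is one possible encoding, not a standard: the role and threshold identifiers are hypothetical labels, and it lists only the threshold levels that this section says each role sets.

```python
# Which thresholds each role sets, per the descriptions in this section.
# Identifiers are illustrative labels, not names from any product or API.
SETS_THRESHOLDS = {
    ("SaaS", "user"): set(),
    ("SaaS", "provider"): {"user", "data_request", "resource", "response"},
    ("PaaS", "developer"): {"user", "data_request", "resource", "response"},
    ("IaaS", "specialist"): {"virtual_machine", "network_latency"},
}


def may_set(service, role, threshold):
    """Return True if the role on that service type sets the given threshold."""
    return threshold in SETS_THRESHOLDS.get((service, role), set())


print(may_set("IaaS", "specialist", "network_latency"))  # True
print(may_set("SaaS", "user", "response"))               # False
```

Unknown service and role combinations fall through to an empty set, so the check defaults to "not allowed."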


SaaS-specific threshold criteria

The SaaS threshold criteria focus on the measurable aspects of managing thresholds on access to specific applications rented to consumers, whether they are private individuals, businesses, or government agencies.

Following are examples of the tasks that SaaS-specific threshold criteria should include for proactive use of four thresholds: user, resource, response, and data request.

Tasks

At a minimum, the SaaS users should be allowed to:

  • Access the SaaS application up to the maximum number of accesses per user, at or below the threshold level.
  • Update records based on the roles the users are assigned, at or below the threshold level.

Only the provider can:

  • Set all threshold criteria.
  • Purchase a software upgrade license.
  • Manage patches to SaaS applications.

Threshold scenarios

The provider limits negotiation with users over:

  • User threshold criteria on the maximum number of accesses per user, depending on the user roles.
  • Resource threshold criteria on dynamic resource consumption.
  • Data request threshold criteria on the maximum number of data requests per user and the maximum number of queues to hold these requests.
  • Response threshold criteria on the optimal time for an application to respond to a user's data request.

The provider does not negotiate with users over:

  • Virtual machine threshold criteria.
  • Network latency threshold criteria.
  • Replicated services threshold criteria.

Let's take a look at a SaaS scenario without a threshold to illustrate this knowledge.


SaaS without threshold scenario

A company's on-premises retail application was successfully migrated as a SaaS application to an external provider hosting a multitenant environment in data center regions in the United States, about six months before the Christmas shopping season. All users were happy. Threshold criteria policies were not in place; they did not seem necessary.

An unlimited number of users could access the SaaS application with no resource consumption issues. Each user sent data requests to the same application. When a data request arrived at the queue, it left the queue very quickly for processing by the application, and the application returned the expected response to the user very quickly.

Then one day in the middle of the Christmas shopping season, the system at a data center crashed due to:

  • Too many users. The number of users attempting to access the application far exceeded the maximum limit specified in the user license.
  • Data request queue overflow. Only one queue was available for all data requests. When the queue filled up, there was nowhere for more data requests to go.
  • Insufficient resources. Resources were consumed faster than they were released.
  • Slow application response. The application hung or slowed down mid-operation. Transitions from the state of one application component to the state of another did not complete.

Following are proactive actions the provider can take to skip the damage and fix the problem by implementing the user, data request, resource, and response threshold criteria.

Skip the damage

To skip the damage, plan ahead by preparing alerts that warn users when SaaS application performance falls below the level of guaranteed service availability in the SLA. These alerts should include information on:

  • Whether the number of users concurrently accessing the application has exceeded the maximum limit allowed in the user license.
  • Whether the number of data requests for each user has exceeded the limit allowed in the user license.
  • Whether there are sufficient resources for consumption.
  • How much slower the application response has become.
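
As a rough sketch of how the four pieces of alert information listed above might be assembled, consider the following Python function. All parameter names, the message wording, and the idea of comparing response time with a baseline are assumptions made for illustration; the article does not prescribe an alert format.

```python
def build_alerts(concurrent_users, user_limit,
                 requests_per_user, request_limit,
                 free_resource_fraction, min_free_fraction,
                 response_ms, baseline_ms):
    """Return one message per condition that is outside its limit."""
    alerts = []
    if concurrent_users > user_limit:
        alerts.append(f"users: {concurrent_users} concurrent, limit is {user_limit}")
    # Users whose individual data request count exceeds the licensed limit.
    over = sorted(u for u, n in requests_per_user.items() if n > request_limit)
    if over:
        alerts.append(f"data requests: over limit {request_limit} for {', '.join(over)}")
    if free_resource_fraction < min_free_fraction:
        alerts.append(f"resources: only {free_resource_fraction:.0%} free")
    if response_ms > baseline_ms:
        alerts.append(f"response: {response_ms / baseline_ms:.1f}x slower than baseline")
    return alerts


# Hypothetical readings taken during a busy period.
for line in build_alerts(520, 500, {"ann": 12, "bob": 3}, 10,
                         0.05, 0.10, 1600, 800):
    print(line)
```

With these sample readings all four conditions are out of bounds, so four messages are produced; readings that are all within limits yield an empty list.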

Fix the problem

To fix the problem, the provider sets the following thresholds, with limited negotiation with users:

  • User threshold to ensure the number of users concurrently accessing the application is at or below the threshold level.
  • Data request threshold to ensure fast processing of data requests to the application at or below the threshold level.
  • Response threshold to ensure fast response from the application at or below the threshold level.
  • Resource threshold to ensure resources are balanced dynamically at or below the threshold level.
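
The queue overflow in the scenario arose because a single queue had nowhere to send excess requests. A minimal sketch of a data request threshold, assuming a fixed number of bounded queues and a policy of rejecting requests once every queue is full, could look like this. The class name and the rejection policy are hypothetical; a real service might instead delay or redirect the request.

```python
import queue


class RequestIntake:
    """Holds data requests in several bounded queues instead of one."""

    def __init__(self, queue_count, queue_size):
        self.queues = [queue.Queue(maxsize=queue_size)
                       for _ in range(queue_count)]

    def submit(self, request):
        """Place the request in the first queue with room.

        Returns False when every queue is full, so the caller can tell the
        user the threshold was reached instead of letting the system crash.
        """
        for q in self.queues:
            try:
                q.put_nowait(request)
                return True
            except queue.Full:
                continue
        return False


intake = RequestIntake(queue_count=2, queue_size=2)
results = [intake.submit(i) for i in range(5)]
print(results)  # [True, True, True, True, False]
```

The fifth request is refused because both queues already hold two requests each, which is the explicit limit this sketch enforces.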

The provider should use a stable numerical analysis method to calculate approximate threshold levels.
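
The article does not name a particular method. As one possibility, assuming the threshold is derived from past measurements, Welford's one-pass algorithm gives a numerically stable mean and standard deviation, and the level can then be approximated as the mean plus a chosen number of standard deviations. The function names and the multiplier `k` are illustrative choices, not part of the original text.

```python
import math


def running_mean_std(samples):
    """Welford's one-pass algorithm: mean and sample standard deviation."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in samples:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    std = math.sqrt(m2 / (n - 1)) if n > 1 else 0.0
    return mean, std


def approximate_threshold(samples, k=3.0):
    """Approximate a threshold level as mean + k standard deviations."""
    mean, std = running_mean_std(samples)
    return mean + k * std


# Hypothetical response times in milliseconds.
print(approximate_threshold([10, 10, 10]))  # 10.0
```

Welford's update avoids subtracting two large, nearly equal sums, which is where a naive variance calculation loses precision.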


PaaS-specific threshold criteria

The PaaS threshold criteria focus on the measurable aspects of managing threshold criteria for all phases of the application life cycle, from design to deployment.

Following are examples of the tasks that PaaS-specific threshold criteria should include for proactive use of threshold criteria.

Tasks

PaaS developers can:

  • Set the four threshold criteria — user, resource, data requests and response.
  • Set replicated service instance threshold criteria.
  • Build, deploy, and run applications based on threshold criteria.
  • Flexibly customize their platforms to react to local market conditions that could impact threshold criteria.

At a minimum, only the provider can:

  • Set threshold criteria.
  • Run system applications.
  • Run virtual machines.
  • Access the traditional computing infrastructure underlying virtual machines.

Threshold scenarios

The provider allows PaaS developers to set user, resource, data request, and response threshold criteria, up to the maximum allowed by the provider, for the entire application life cycle, from design to deployment.

The provider also allows developers to develop service-pooling applications that can survive host failures within the range of acceptable replicated service instance threshold levels. The maximum is determined by the availability of resources allocated to the developers.

The developers can apply numerical analysis methods to test the stability of approximate threshold levels they set.

The provider will not negotiate with developers over:

  • Virtual machine threshold requirements.
  • Network latency threshold requirements.

Let's take a look at a PaaS scenario without a threshold to illustrate this knowledge.


PaaS without threshold scenario

A "resource optimization" application provided by either the developer or the provider fails. This failure causes all PaaS platforms hosted by the same provider to grind slowly to a complete halt. It could have been caused by failure to set resource and replicated service instance threshold criteria.

Without the resource threshold criteria, there is no way of knowing whether resource consumption is balanced dynamically. Resources are consumed faster than they are released. When resource consumption reaches the point where no resources remain available, the system crashes. This may be due, for instance, to a numerical analysis method that causes endless loops of resource consumption, with each loop consuming more and more resources until none remain.

Without the replicated service instance threshold criteria, there is no way of knowing whether service instances are replicated to survive host failures. Neither the developer nor the provider identifies the failures.

Following are proactive actions the developer can take to skip the damage and fix the problem by implementing the resource and replicated service instance threshold criteria (up to the maximum allowed by the provider).

Skip the damage

To skip the damage, plan ahead by preparing alerts that warn the developer when PaaS performance falls below the level of guaranteed service availability in the SLA. These alerts should include information on:

  • How much faster resources are consumed than they are released when running the applications on the platform.
  • How many service instances have been replicated (probably none).
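
The first alert item compares the consumption rate with the release rate. A small sketch, assuming both rates can be measured in the same units per second, turns that comparison into an estimate of how long the platform has before its free resources run out. The function and parameter names are hypothetical.

```python
def time_to_exhaustion(free_units, consume_rate, release_rate):
    """Seconds until free resources reach zero.

    Returns None when consumption is not outpacing release, because the
    pool is then stable or growing and never runs out.
    """
    net = consume_rate - release_rate
    if net <= 0:
        return None
    return free_units / net


# Hypothetical readings: 1000 free units, 30 consumed and 10 released per second.
print(time_to_exhaustion(1000, 30, 10))  # 50.0
print(time_to_exhaustion(1000, 10, 10))  # None
```

An alert that reports this figure gives the developer a deadline rather than just a warning that consumption is high.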

Fix the problem

To fix the problem, the developer uses stable numerical analysis methods to calculate resource threshold levels, compiles a list of results, and then sets dynamic resource consumption below or at the threshold level selected from the list.

The developer creates multiple redundant copies so they can be used at healthy data centers in case of an application or platform failure. To achieve this, the developer decomposes the components of a system into independent pools, then uses numerical analysis methods to set the number of replicated service instances at or below the threshold level.
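
Whether a given placement of replicated service instances can survive host failures can be checked with a worst-case calculation. The sketch below assumes the worst case is losing the hosts that carry the most instances; the function name, host names, and instance counts are hypothetical.

```python
def survives_host_failures(instances_per_host, failures, min_instances):
    """Check that enough instances remain after the worst-case host losses.

    Worst case: the `failures` hosts carrying the most instances go down.
    """
    remaining = sorted(instances_per_host.values())
    if failures > 0:
        remaining = remaining[:-failures]  # drop the largest counts
    return sum(remaining) >= min_instances


placement = {"host-a": 2, "host-b": 2, "host-c": 1}
print(survives_host_failures(placement, failures=1, min_instances=3))  # True
print(survives_host_failures(placement, failures=2, min_instances=3))  # False
```

Spreading instances evenly across hosts raises the number of failures a placement can tolerate for the same total instance count.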


IaaS-specific threshold criteria

The IaaS threshold criteria focus on the measurable aspects of managing threshold criteria on virtual machines that sit atop the infrastructure of traditional computing resources.

Following are examples of the tasks that IaaS-specific threshold criteria should include for proactive use of threshold criteria.

Tasks

The IaaS specialists can:

  • Set virtual machine threshold criteria.
  • Set packet-switched network latency threshold criteria.

The specialists are allowed to:

  • Develop, manage, and access virtual machines below or at the threshold level.
  • Authorize PaaS developers to develop applications on the PaaS atop virtual machines on the same host.
  • Scan virtual machines for vulnerabilities.

At a minimum, only the provider can access the infrastructure of traditional computing resources underlying the virtual machines.

Threshold scenarios

The IaaS specialist sets the network latency threshold level and ensures that network latency between virtual machines on the same host stays at or below this threshold level. The specialist ensures all delays, including queuing and processing delays, are accounted for when computing the threshold level.

Queuing delays occur when a network gateway receives an unexpected increase in the number of packets from multiple sources heading toward the same destination.

High processing delays occur when a gateway takes longer to determine what to do with a newly received packet.

Another delay type that can cause increased latency (and noticeable jitter) is bufferbloat. As the name implies, it is caused by excess buffering of packets inside the network.
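
To make the accounting of delays concrete, the following sketch adds up one-way delay components into a round-trip figure and estimates jitter from a series of latency samples. It assumes symmetric paths and defines jitter as the mean absolute difference between consecutive samples; both are simplifying assumptions, and the function names are hypothetical.

```python
def round_trip_latency_ms(propagation_ms, queuing_ms, processing_ms):
    """Sum the one-way delay components and double them for the return trip.

    Assumes the path back has the same delays, which real networks
    do not guarantee.
    """
    return 2 * (propagation_ms + queuing_ms + processing_ms)


def mean_jitter_ms(latencies_ms):
    """Mean absolute difference between consecutive latency samples."""
    diffs = [abs(b - a) for a, b in zip(latencies_ms, latencies_ms[1:])]
    return sum(diffs) / len(diffs) if diffs else 0.0


latency = round_trip_latency_ms(propagation_ms=10, queuing_ms=3, processing_ms=2)
print(latency)        # 30
print(latency <= 40)  # True: at or below a hypothetical 40 ms threshold
```

Tracking jitter alongside the latency total helps reveal bufferbloat, because bloated buffers tend to show up as growing and erratic queuing delay.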

The provider does not allow IaaS specialists to develop thresholds on replicated services through service-pooling applications.

Let's take a look at an IaaS scenario without a threshold to illustrate this knowledge.


IaaS without threshold scenario

Virtual machines fail due to a lack of additional resources needed for consumption at high I/O points. Or they slow down to a crawl due to very high network latency between virtual machines.

Following are proactive actions the IaaS specialist can take to skip the damage and fix the problem by implementing the virtual machine and network latency thresholds.

Skip the damage

To skip the damage, plan ahead by preparing capacity studies on resource consumption and network latency between virtual machines. The studies should show numerical results indicating that the performance of virtual machines at high I/O points would most likely stay at or above the level of guaranteed service availability in the SLA.

Fix the problem

To fix the problem, the IaaS specialist uses the information from capacity studies to determine optimal resource consumption and network latency at or below the threshold levels. The specialist should periodically check for changes in virtual machine capacity on the same host.
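
The periodic capacity check can be sketched as a comparison of per-host virtual machine counts with the virtual machine threshold. The host names, counts, and threshold value below are hypothetical, and how the counts are collected is left out because it depends on the hypervisor or management tool in use.

```python
def hosts_over_vm_threshold(vm_counts, vm_threshold):
    """Return hosts whose running VM count exceeds the threshold, sorted by name."""
    return sorted(host for host, count in vm_counts.items()
                  if count > vm_threshold)


def headroom(vm_counts, vm_threshold):
    """VMs each host can still accept before reaching the threshold (never negative)."""
    return {host: max(vm_threshold - count, 0)
            for host, count in vm_counts.items()}


counts = {"host-a": 9, "host-b": 12, "host-c": 14}
print(hosts_over_vm_threshold(counts, vm_threshold=12))  # ['host-c']
print(headroom(counts, vm_threshold=12))
```

A host sitting exactly at the threshold, like host-b here, is not flagged but has no headroom left for new virtual machines.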


In conclusion

Determining critical cloud threshold performance levels requires plenty of pre-planning to resolve the issues with setting threshold levels based on the threshold criteria for users, resources, data requests, response, replicated service instances, virtual machines, and packet-switched network latency. Providers must communicate with users, developers, and infrastructure specialists about how much control a consumer should have, how user controls could impact threshold criteria, what tasks and threshold scenarios the SaaS-, PaaS-, and IaaS-specific threshold criteria should include, and how to skip the damage and fix the problem should the cloud service fail. Above all, a cloud consumer should get a copy of the threshold policy criteria negotiated with the provider.
