Threshold performance is the sum of all threshold levels based on threshold criteria for critical cloud components. Cloud threshold performance is critical in determining whether a cloud service will perform at the guaranteed level of service availability in an external service level agreement (SLA).
To ensure a common understanding of what threshold criteria are, a threshold criteria policy should be crafted to outline what cloud service consumers and providers should do. Such a policy can save providers countless hours of management time.
Critical cloud components to consider when developing threshold criteria are:
- Resource threshold: Ensures resource consumption is balanced dynamically for applications in the cloud below or at the threshold level.
- User threshold: Ensures users can concurrently access the application, up to the limit specified in a user license from the provider, below or at the threshold level.
- Data request threshold: Ensures data requests to the application can be processed immediately below or at the threshold level.
- Response threshold: Ensures the application responds to a user or a data request in a timely manner below or at the threshold level.
- Replicated service instance threshold: Ensures service instances are replicated so that they can survive host failures within the range of acceptable threshold levels. The maximum is determined by the availability of resources.
- Virtual machine threshold: Ensures the number of virtual machines running on the same host is below or at the threshold level.
- Packet-switched network latency threshold: Ensures the latency sending a packet to the destination and back over the external network is below or at the threshold level. The latency includes queuing delay to hold multiple packets from different sources.
The first three of these were described in previous articles.
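As a minimal sketch of how the criteria listed above might be recorded and checked, the following Python example models each threshold as a named limit and reports which observed values are no longer "below or at the threshold level." The `Threshold` class, the `breached` helper, and all limits and observed values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """One threshold criterion (hypothetical structure for illustration)."""
    name: str
    limit: float
    unit: str

    def is_met(self, observed: float) -> bool:
        # "Below or at the threshold level" means observed <= limit.
        return observed <= self.limit


def breached(thresholds, observations):
    """Return the names of thresholds whose observed value exceeds the limit."""
    return [t.name for t in thresholds if not t.is_met(observations[t.name])]


# Illustrative limits only; real values would come from the negotiated policy.
thresholds = [
    Threshold("user", 500, "concurrent users"),
    Threshold("data_request", 2000, "queued requests"),
    Threshold("response", 2.0, "seconds"),
    Threshold("virtual_machine", 12, "VMs per host"),
]
observations = {"user": 480, "data_request": 2000, "response": 3.1, "virtual_machine": 12}

print(breached(thresholds, observations))  # ['response']
```

Note that a value exactly at the limit (2000 queued requests, 12 virtual machines) still counts as meeting the threshold.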
Each threshold criteria is primarily shaped by:
- What type of cloud service the provider hosts — Software as a Service (SaaS), Platform as a Service (PaaS), or Infrastructure as a Service (IaaS).
- How much control the cloud service consumer has over the operating systems, hardware, and software.
- Whether the consumer represents a broad industry type, such as retail, energy and utilities, financial markets, health care, or chemical and petroleum.
To satisfy consumer demand to review threshold criteria, all providers should provide consumers with copies of these criteria. The providers should encourage consumers to send questions that might need to be resolved or require negotiation before the consumer rents or subscribes to a cloud service type.
User-control impacts on threshold criteria
Following are SaaS control impacts that influence threshold criteria focus.
SaaS user: The only control an end user has is to access the provider's application from a desktop, laptop, or mobile device. The user does not set the threshold levels.
SaaS provider: At a minimum, the provider controls operating systems, hardware, network infrastructure, application upgrades, and patches. The provider sets the user, data requests, resource and response threshold levels and limits negotiation with the user on these threshold levels.
Next are PaaS control impacts that influence threshold criteria focus.
PaaS developer: The developer controls all the applications found in a full business life cycle created and hosted by independent software vendors, startups, or units of large businesses. The developer builds, deploys, and runs, say, a custom retail management application, and manages upgrades and patches to all functionalities of this application. As part of the life cycle, the developer uses spreadsheets, word processors, backups, billing, payroll processing, and invoicing.
The developer in negotiation with the provider sets the user, resource, response and data requests threshold levels to check the performance of applications on the PaaS. The developer can change the threshold levels for each SaaS application on the PaaS to meet the requirements of SaaS users as PaaS co-residents.
PaaS provider: At a minimum, the provider controls operating systems, hardware, network infrastructure, and resource management. The provider is flexible in negotiating with the developer on setting threshold levels.
Here are IaaS control impacts that influence threshold criteria focus.
IaaS infrastructure and network specialists: The specialist controls the operating systems, network equipment, and deployed applications at the virtual machine level. The infrastructure specialist can scale up or down virtual servers or blocks of storage area. The specialist sets virtual machine and network latency threshold levels.
IaaS provider: At a minimum, the provider controls the infrastructure of traditional computing resources in the cloud environment. The provider builds or upgrades infrastructure to provide better performing virtual machines. The provider is flexible in negotiating with the specialist on setting threshold levels.
SaaS-specific threshold criteria
The SaaS threshold criteria focuses on the measurable aspects of managing thresholds on access to specific applications rented to consumers whether they are private individuals, businesses, or government agencies.
Following are examples of what tasks should be included in a SaaS-specific threshold criteria for proactive use of four threshold criteria: user, resource, response and data requests.
At a minimum, the SaaS users should be allowed to:
- Access the SaaS application up to the maximum number of accesses per user, below or at the threshold level.
- Update records based on the roles the users are assigned below or at the threshold level.
Only the provider can:
- Set all threshold criteria.
- Purchase a software upgrade license.
- Manage patches to SaaS applications.
The provider limits negotiation with users over:
- User threshold criteria on the maximum number of accesses per user, depending on the user roles.
- Resource threshold criteria on dynamic resource consumption.
- Data requests criteria on the maximum number of data requests per user and the maximum number of queues to hold these requests.
- Response threshold criteria on optimal response time for an application to respond to a user's data request.
The provider does not negotiate with users over:
- Virtual machine threshold criteria.
- Network latency threshold criteria.
- Replicated services threshold criteria.
Let's take a look at a SaaS scenario without a threshold to illustrate this knowledge.
SaaS without threshold scenario
A company's on-premises retail application was successfully migrated as a SaaS application to an external provider hosting a multitenancy environment in data center regions in the United States about six months before the Christmas shopping season. All users were happy. Threshold criteria policies were not in place; they did not seem necessary.
An unlimited number of users could access the SaaS application with no resource consumption issues. Each user sent data requests to the same application. When a data request arrived at a queue, it left the queue very quickly for processing by the application. The application sent the user the expected response very fast.
Then one day in the middle of the Christmas shopping season, the system at a data center crashed due to:
- Too many users. The actual increase in the users attempting to access the application far exceeded the maximum limit specified in the user license.
- Data requests queue overflow. Only one queue was available for all data requests. When the queue was filled up, there was nowhere for more data requests to go.
- Insufficient resource for consumption. Resources were consumed faster than they were released.
- Slow application response. The application hung or slowed down partway through processing. Transitions from the state of one component in the application to the state of another component did not complete. The resources were consumed faster than they were released.
Following are proactive actions the provider can take, by implementing the user, data requests, resource, and response threshold criteria, to skip the damage and fix the problem.
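The queue overflow in the list above can be illustrated with a short sketch. It uses a bounded `queue.Queue` from the Python standard library as a stand-in for the single data request queue; the `enqueue_requests` helper and the capacity of 5 are assumptions made for the example, not details from the scenario.

```python
import queue


def enqueue_requests(requests, capacity):
    """Try to place every request on one bounded queue.

    Returns the number of requests queued and the list of rejected requests.
    """
    q = queue.Queue(maxsize=capacity)
    rejected = []
    for request in requests:
        try:
            q.put_nowait(request)  # raises queue.Full once the queue is at capacity
        except queue.Full:
            rejected.append(request)  # nowhere for the request to go
    return q.qsize(), rejected


queued, rejected = enqueue_requests(range(8), capacity=5)
print(queued, rejected)  # 5 [5, 6, 7]
```

With a data request threshold in place, the provider would know the queue limit in advance and could add queues or throttle users before requests start being rejected.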
Skip the damage
To skip the damage, plan ahead by preparing alerts that notify users when the SaaS application performance falls below the level of guaranteed service availability in the SLA. These alerts should include information on:
- Whether the number of users concurrently accessing the application has exceeded the maximum limit allowed in the user license.
- Whether the number of data requests for each user has exceeded the limit allowed in the user license.
- Whether there are sufficient resources for consumption.
- How much slower the application response has become.
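A minimal sketch of how the four alert items above could be assembled follows. The `build_alerts` function, its parameters, and the sample figures are all hypothetical; a real provider would draw these values from its own monitoring system and user license terms.

```python
def build_alerts(concurrent_users, user_limit, requests_per_user, request_limit,
                 free_resource_units, baseline_response_s, current_response_s):
    """Return alert messages covering users, data requests, resources, and response."""
    alerts = []
    if concurrent_users > user_limit:
        alerts.append(f"users: {concurrent_users} concurrent, licensed limit is {user_limit}")
    # Users whose data request count exceeds the per-user limit.
    over_limit = sorted(u for u, n in requests_per_user.items() if n > request_limit)
    if over_limit:
        alerts.append(f"data requests: over the limit of {request_limit} for {', '.join(over_limit)}")
    if free_resource_units <= 0:
        alerts.append("resources: none left for consumption")
    if current_response_s > baseline_response_s:
        slowdown = current_response_s / baseline_response_s
        alerts.append(f"response: {slowdown:.1f}x slower than baseline")
    return alerts


alerts = build_alerts(
    concurrent_users=620, user_limit=500,
    requests_per_user={"alice": 40, "bob": 75}, request_limit=50,
    free_resource_units=12,
    baseline_response_s=0.5, current_response_s=2.0,
)
for message in alerts:
    print(message)
```

In this run, three alerts are produced: one for users, one for data requests (for `bob` only), and one for response time. No resource alert appears because 12 units are still free.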
Fix the problem
To fix the problem, the provider limits negotiation with users on setting:
- User threshold to ensure users can concurrently access the application below or at the threshold level.
- Data requests threshold to ensure fast processing of data requests to the application below or at the threshold level.
- Response threshold to ensure fast response from an application below or at the threshold level.
- Resource threshold to ensure resources are balanced dynamically below or at the threshold level.
The provider should use a stable numerical analysis method to calculate approximate threshold levels.
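The article does not name a specific numerical method. As one possible illustration, the sketch below uses Welford's online algorithm, a numerically stable way to compute a mean and standard deviation in a single pass, and then sets an approximate threshold at the mean plus `k` standard deviations. The choice of algorithm, the `k = 2` multiplier, and the sample values are assumptions, not the author's method.

```python
import math


def stable_mean_std(samples):
    """Welford's one-pass algorithm: mean and population standard deviation."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for x in samples:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    if n == 0:
        raise ValueError("at least one sample is required")
    return mean, math.sqrt(m2 / n)


def approximate_threshold(samples, k=2.0):
    """Approximate threshold level: mean plus k standard deviations (assumed rule)."""
    mean, std = stable_mean_std(samples)
    return mean + k * std


# Hypothetical observations, for example concurrent users in hundreds.
observed = [2, 4, 4, 4, 5, 5, 7, 9]
print(approximate_threshold(observed))  # approximately 9.0 (mean 5, std 2)
```

Welford's update avoids subtracting two large, nearly equal sums, which is where the naive "sum of squares minus square of sum" formula loses precision.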
PaaS-specific threshold criteria
The PaaS threshold criteria focuses on the measurable aspects of managing threshold criteria for all phases of application life cycle, from design to deployment.
Following are examples of what tasks should be included in a PaaS-specific threshold criteria for proactive use of threshold criteria.
PaaS developers can:
- Set the four threshold criteria — user, resource, data requests and response.
- Set replicated service instance threshold criteria.
- Build, deploy, and run applications based on threshold criteria.
- Flexibly customize their platforms to react to local market conditions that could impact threshold criteria.
At a minimum, only the provider can:
- Set threshold criteria.
- Run system applications.
- Run virtual machines.
- Access the traditional computing infrastructure underlying virtual machines.
The provider allows PaaS developers to set user, resource, data requests, and response threshold criteria up to the maximum allowed by the provider for the entire application life cycle, from design to deployment.
The provider also allows developers to develop service-pooling applications that can survive host failures within the range of acceptable replicated service instance threshold levels. The maximum is determined by the availability of resources allocated to the developers.
The developers can apply numerical analysis methods to test the stability of approximate threshold levels they set.
The provider will not negotiate with developers over:
- Virtual machine threshold requirements.
- Network latency threshold requirements.
Let's take a look at a PaaS scenario without a threshold to illustrate this knowledge.
PaaS without threshold scenario
A "resource optimization" application provided by either the developer or the provider fails. This failure causes all PaaS platforms hosted by the same provider to grind slowly to a complete halt. It could have been caused by failure to set resource and replicated service instance threshold criteria.
Without the resource threshold criteria, there is no way of knowing if resource consumption is balanced dynamically. Resources are consumed faster than they are released. When resource consumption reaches the point where remaining resources are no longer available for consumption, the system crashes. This may be due, for instance, to a numerical analysis method that causes endless loops of resource consumption, with each loop consuming resources more and more until there are no more resources to consume.
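The effect of resources being consumed faster than they are released can be sketched with a small simulation. The `steps_until_breach` function, the fixed per-step rates, and the threshold level of 80 units are hypothetical simplifications; real consumption would vary from step to step.

```python
def steps_until_breach(threshold_level, consume_per_step, release_per_step, max_steps=10_000):
    """Return the first step at which resources in use exceed the threshold level.

    Returns None if the threshold is never exceeded within max_steps.
    """
    in_use = 0
    for step in range(1, max_steps + 1):
        # Resources in use never drop below zero.
        in_use = max(0, in_use + consume_per_step - release_per_step)
        if in_use > threshold_level:
            return step
    return None


# Consumption outpaces release by 4 units per step: the threshold of 80 is exceeded at step 21.
print(steps_until_breach(threshold_level=80, consume_per_step=10, release_per_step=6))  # 21

# Balanced consumption and release: the threshold is never exceeded.
print(steps_until_breach(threshold_level=80, consume_per_step=6, release_per_step=6))  # None
```

A resource threshold gives the developer a defined point at which to raise an alert, well before the remaining resources run out.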
Without the replicated service threshold criteria, there is no way of knowing if service instances are replicated to survive host failures. The developer or the provider does not identify the failures.
Following are proactive actions the developer can take, by implementing the resource and replicated service instance threshold criteria (up to the maximum allowed by the provider), to skip the damage and fix the problem.
Skip the damage
To skip the damage, plan ahead by preparing alerts that notify the developer when the PaaS performance falls below the level of guaranteed service availability in the SLA. These alerts should include information on:
- How much faster the resources are consumed than they are released in running the applications on the platform.
- How many service instances have been replicated (probably none).
Fix the problem
To fix the problem, the developer uses stable numerical analysis methods to calculate resource threshold levels, compiles a list of results, and then sets dynamic resource consumption below or at the threshold level selected from the list.
The developer creates multiple redundant copies so they can be used at healthy data centers in case of an application or platform failure. To achieve this, the developer decomposes the components of a system into independent pools (see Resources). The developer uses numerical analysis methods to set the number of replicated service instances below or at the threshold level.
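One simple way to reason about the acceptable range of replicated service instances is sketched below. It assumes one replica per host, so that surviving `f` host failures needs at least `f + 1` replicas, and it caps the maximum by the resources allocated to the developer. The `replica_bounds` function, the one-replica-per-host rule, and the resource figures are assumptions for illustration only.

```python
def replica_bounds(host_failures_to_survive, available_units, units_per_instance):
    """Return the (minimum, maximum) number of replicated service instances.

    Minimum: one more replica than the host failures to survive (one replica per host).
    Maximum: how many instances fit in the resources allocated to the developer.
    """
    minimum = host_failures_to_survive + 1
    maximum = available_units // units_per_instance
    if maximum < minimum:
        raise ValueError("allocated resources cannot support the required replicas")
    return minimum, maximum


# Survive 2 host failures with 16 resource units allocated, 4 units per instance.
print(replica_bounds(host_failures_to_survive=2, available_units=16, units_per_instance=4))  # (3, 4)
```

If the allocated resources cannot cover the minimum, the function raises an error, which mirrors the point that the maximum is determined by the availability of resources.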
IaaS-specific threshold criteria
The IaaS threshold criteria focuses on the measurable aspects of managing threshold criteria on virtual machines that sit atop the infrastructure of traditional computing resources underlying the virtual machines.
Following are examples of what tasks should be included in an IaaS-specific threshold criteria for proactive use of threshold criteria.
The IaaS specialists can:
- Set virtual machine threshold criteria.
- Set packet-switched network latency threshold criteria.
The specialists are allowed to:
- Develop, manage, and access virtual machines below or at the threshold level.
- Authorize PaaS developers to develop applications on the PaaS atop virtual machines on the same host.
- Scan virtual machines for vulnerabilities.
At a minimum, only the provider can access the infrastructure of traditional computing resources underlying the virtual machines.
The IaaS specialist sets the network latency threshold level and ensures the network latency over virtual machines on the same host remains below or at this threshold level. The specialist ensures all delays, including queuing and processing, are accounted for when computing the threshold level.
Queuing delays occur when a network gateway receives an unexpected increase in the number of multiple packets from multiple sources heading towards the same destination.
High processing delays occur when it takes a gateway longer to determine what to do with a newly received packet.
Another delay type that can cause increased latency (and noticeable jitter) is bufferbloat. As the name implies, bufferbloat is the excess buffering of packets inside the network.
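The delay types above can be combined into a simple latency budget. The sketch below adds propagation and transmission delay, the other standard components of packet delay, to the queuing and processing delays discussed here, and assumes a symmetric path so that round-trip latency is twice the one-way total. The function names, the symmetric-path assumption, and all millisecond values are hypothetical.

```python
def round_trip_latency_ms(propagation_ms, transmission_ms, queuing_ms, processing_ms):
    """Round-trip latency as twice the one-way sum of delays (assumes a symmetric path)."""
    one_way = propagation_ms + transmission_ms + queuing_ms + processing_ms
    return 2 * one_way


def within_latency_threshold(latency_ms, threshold_ms):
    """True when latency is below or at the threshold level."""
    return latency_ms <= threshold_ms


normal = round_trip_latency_ms(propagation_ms=10, transmission_ms=1, queuing_ms=4, processing_ms=2)
print(normal, within_latency_threshold(normal, threshold_ms=40))  # 34 True

# Excess buffering inflates the queuing delay and pushes latency over the threshold.
bloated = round_trip_latency_ms(propagation_ms=10, transmission_ms=1, queuing_ms=30, processing_ms=2)
print(bloated, within_latency_threshold(bloated, threshold_ms=40))  # 86 False
```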
The provider does not allow IaaS specialists to develop thresholds on replicated services through service-pooling applications.
Let's take a look at an IaaS scenario without a threshold to illustrate this knowledge.
IaaS without threshold scenario
Virtual machines fail due to a lack of additional resources needed for consumption at high I/O points. Or they slow down to a crawl due to very high network latency between virtual machines.
Following are proactive actions the IaaS specialist can take, by implementing the virtual machine and network latency thresholds, to skip the damage and fix the problem.
Skip the damage
To skip the damage, plan ahead by preparing capacity studies on resource consumption and network latency over virtual machines. The studies should show, with numerical results, that the performance of virtual machines at high I/O points would most likely stay at or above the level of guaranteed service availability in the SLA.
Fix the problem
To fix the problem, the IaaS specialist uses the information from capacity studies to determine optimal resource consumption and network latency below or at the threshold levels. The specialist should periodically check the changes in virtual machine capacity on the same host.
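A periodic check of virtual machine capacity on each host could look like the following sketch. The `hosts_over_vm_threshold` function, the placement mapping, and the threshold of 2 virtual machines per host are hypothetical; a real check would read placements from the provider's management tooling.

```python
from collections import Counter


def hosts_over_vm_threshold(placements, vm_threshold):
    """Return the hosts running more virtual machines than the threshold allows.

    placements maps each virtual machine name to the host it runs on.
    """
    counts = Counter(placements.values())
    return sorted(host for host, count in counts.items() if count > vm_threshold)


placements = {
    "vm-a": "host-1",
    "vm-b": "host-1",
    "vm-c": "host-1",
    "vm-d": "host-2",
}
print(hosts_over_vm_threshold(placements, vm_threshold=2))  # ['host-1']
```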
Determining critical cloud threshold performance levels requires plenty of advance planning to resolve the issues with setting threshold levels based on the threshold criteria for user, resource, data requests, response, replicated service instance, virtual machine, and packet-switched network latency. Providers must communicate with users, developers, and infrastructure specialists on how much control a consumer should have, how user controls could impact threshold criteria, what tasks and threshold scenarios should be included for threshold criteria specific to SaaS, PaaS, and IaaS, and how to skip the damage and fix the problem should the cloud service fail. Most important of all, a cloud consumer should get a copy of the threshold policy criteria negotiated with the provider.
- Get more information about building a service pooling application in "Build a cloud failover policy".
- The Practical Guide for Service Level Agreements, published by the Cloud Standards Customer Council (CSCC), highlights the critical elements of a service level agreement for cloud computing and provides guidance on what to expect and what to be aware of when negotiating an SLA.
- Read more about cloud metrics in the Report on Cloud Computing to the OSG Steering Committee, written by the Cloud Computing Working Group in the SPEC Open Systems Group.