The degraded operations pitfall


Let’s consider an information system supporting a highly critical online activity. Critical means it cannot fail; and if it does fail, there must be a contingency infrastructure that allows operations to continue “reasonably” well, with zero or tolerable performance impact.

Someone trying to reduce acquisition cost decides to provision the contingency infrastructure with only half of the processing capacity. Should you, as an expert sizer, feel comfortable with this decision?

To illustrate the problem, let us take the SAP SD benchmark as the workload (see the “Phases of the SAP Benchmark” entry in this blog). The simplified response time curve is shown in Figure 1: the system supports 80,000 users with a response time of 1 s.

Figure 1: The response time versus the number of users graph for the normal mode server (blue).

If we put this workload on the 50% capacity degraded-mode infrastructure, that is, the same population and the same activity but with half the capacity, what happens? Look closely at Figure 2.

Figure 2: The response time graph for the normal mode server (blue) and for the contingency one (red) with 50% performance capacity.

With 50% capacity, the response time for 80,000 users (1 s on the normal-mode server) would be around 12 s! Would anyone consider this a usable system?
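The blow-up can be reproduced with a simplified closed-model approximation (the asymptotic bound of a single-queue interactive system, R(N) ≈ max(D, N·D − Z)). The think time Z and per-request service demand D below are illustrative assumptions, chosen so the full-capacity curve passes through 80,000 users at 1 s; they are not the actual SAP SD benchmark parameters.

```python
# Simplified response-time sketch for a closed interactive system.
# Assumed illustrative parameters, not actual SAP SD benchmark values.

def response_time(n_users, service_demand, think_time):
    """High-load asymptotic approximation: R(N) ~ max(D, N*D - Z)."""
    return max(service_demand, n_users * service_demand - think_time)

Z = 10.0            # think time between requests, seconds (assumed)
D = 11.0 / 80000    # service demand per request, seconds (chosen so
                    # 80,000 users see ~1 s at full capacity)

r_full = response_time(80000, D, Z)       # normal-mode server
r_half = response_time(80000, 2 * D, Z)   # 50% capacity doubles the demand

print(f"full capacity: {r_full:.1f} s")   # -> 1.0 s
print(f"half capacity: {r_half:.1f} s")   # -> 12.0 s
```

Note that halving capacity does not simply double the response time: once the server saturates, every extra second of demand per user multiplies through the whole waiting line.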


How can the above situation be solved? Two lines of action are possible:

  • At the workload side: reduce the number of users, that is, propose a significant restriction on the number of users that can use the system in degraded mode.

  • At the capacity side: increase capacity in contingency, that is, increase the capacity of the contingency server, ideally to 100% of the normal mode.
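The first line of action can be sized with the same simplified model: in the saturated region, the largest population that still meets a response-time target R satisfies N·D − Z ≤ R. Again, the think time and service demand below are illustrative assumptions, not benchmark figures.

```python
# Sizing the degraded-mode user restriction, using the same simplified
# closed-model approximation and assumed illustrative parameters.

def max_users(target_response, service_demand, think_time):
    """Largest N with N*D - Z <= target R (saturated region)."""
    n = (target_response + think_time) / service_demand
    return int(n + 1e-9)  # guard against floating-point round-off

Z = 10.0            # think time, seconds (assumed)
D = 11.0 / 80000    # full-capacity service demand, seconds (assumed)

# At 50% capacity the per-request demand doubles:
print(max_users(1.0, 2 * D, Z))   # -> 40000 users keep the 1 s target
```

Under these assumptions the 50% capacity server holds the 1 s target only up to half the normal population, which is exactly the kind of restriction the business must be told about in advance.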

Summarizing: when sizing degraded-mode infrastructures you must pay close attention to the response time, and not only to the bandwidth (maximum throughput).

 

Mirror: http://demystperf.blogspot.com

