IBM Support

Why P99 Latency Metrics Are Unreliable for Low Traffic Workloads

Troubleshooting


Problem

Summary

The 99th percentile (P99) latency metric is widely used to measure system performance by capturing the latency experienced by the slowest 1% of requests. While it is a powerful metric for high-throughput systems, P99 latency can be extremely unreliable for workloads with low traffic, such as quiet periods when the system handles only a few requests per second.

This article explains why P99 latency metrics are unreliable in such scenarios and suggests alternative approaches for monitoring and optimizing performance.

Applies to

  • Astra Serverless
  • Astra Classic
  • DSE (DataStax Enterprise)
  • All database systems

Symptoms

Uncharacteristically high p99 and p999 latency metrics are observed while the database is handling only a few requests per second.

Cause

This is a statistical artifact, not a problem with the database itself. Because of the small sample size, the p99 and p999 latency metrics become increasingly unreliable during low-traffic periods.

  1. Small Sample Size

    • P99 metrics require a statistically significant sample to be meaningful. For a workload of only 5 requests per second:
      • Over a 1-minute interval, there are just 300 requests.
      • Only 3 of those requests lie above the 99th percentile.
    • With so few samples in the tail, any single slow request disproportionately skews the P99 metric.
  2. Outlier Sensitivity

    • In low traffic scenarios, even one anomalously slow request (e.g., due to a transient issue) can dramatically inflate the P99 latency.
    • This makes it difficult to distinguish between actual performance problems and random outliers.
  3. Lack of Granularity

    • At low traffic rates, P99 metrics may not accurately reflect typical system behavior. Instead, they highlight extreme cases that occur infrequently and may not impact overall user experience.
  4. Irregular Request Patterns

    • During quiet periods, requests may arrive sporadically rather than in a steady stream. This irregularity can introduce latency variability that is not representative of normal operations.
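The sample-size arithmetic behind point 1 can be sketched in a few lines. The request rates below are illustrative, not measurements from any specific system:

```python
# Sketch: how many requests actually sit above the tail percentiles
# for a given request rate over a 1-minute window.

def tail_sample_count(requests_per_second, window_seconds=60, percentile=99.0):
    """Return (total requests in window, requests above the percentile)."""
    total = requests_per_second * window_seconds
    tail = int(total * (100 - percentile) / 100)
    return total, tail

for rate in (5, 100, 1000):
    total, tail = tail_sample_count(rate)
    print(f"{rate:>4} req/s -> {total:>6} requests/min, "
          f"{tail:>3} requests above p99")
# 5 req/s -> 300 requests/min, only 3 requests above p99
```

At 5 requests per second, everything the p99 metric reports rests on just 3 samples per minute; for p999 the tail rounds down to zero samples.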

Example

Consider a system handling 5 requests per second during a quiet period. Over a 1-minute window:

  • 300 total requests are processed.
  • The 99th percentile falls at roughly the 297th fastest request, so only the 3 slowest requests lie above it.

If a single request experiences a network glitch or disk contention, its latency might jump to several seconds. Because the reported P99 is interpolated from the handful of slowest samples, that single anomaly can dominate it even though the other 299 requests completed in milliseconds. As a result, the reported P99 latency becomes misleading.


Example illustrations

Here are some metrics taken from a database during a low-traffic period over the New Year holiday.

[Figure: Read request load]

[Figure: p99 latency]

[Figure: p50 latency]

The inverse relationship between request rate and reported latency is clearly visible and is amplified for the p99 metric. The effect is most pronounced when the request rate drops below 5 requests per second.

Alternative Metrics for Low Traffic Scenarios

To monitor and optimize low-traffic workloads, consider these alternatives:

  1. P90 or Median (P50) Latency

    • These metrics are less sensitive to outliers and provide a more realistic view of typical system performance.
  2. Latency Distribution

    • Analyze the full distribution of request latencies over a longer time window to identify patterns and trends.
  3. Request Success Rate

    • Monitor the success rate at the application level to ensure the system is meeting reliability goals, even if individual requests are slow.
  4. Time-Weighted Aggregates

    • Use time-weighted averages or moving percentiles over longer intervals to smooth out the impact of outliers.
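As a rough sketch of the last idea, a moving percentile over a longer window can be maintained with a bounded buffer. The class name, window size, and sample values below are illustrative assumptions, not part of any product API:

```python
import statistics
from collections import deque

class MovingPercentile:
    """Percentile over the most recent `window_size` latency samples."""

    def __init__(self, window_size=3000):  # ~10 minutes at 5 req/s
        self.samples = deque(maxlen=window_size)

    def observe(self, latency_ms):
        self.samples.append(latency_ms)

    def value(self, p=90):
        if len(self.samples) < 2:
            return None
        # n=100 yields 99 cut points; index p-1 is the p-th percentile
        return statistics.quantiles(self.samples, n=100)[p - 1]

tracker = MovingPercentile()
for latency in [2.0] * 300 + [5000.0]:   # one outlier in 301 samples
    tracker.observe(latency)
print(tracker.value(90))   # p90 stays at the typical 2.0 ms
```

Because the window spans many minutes of traffic, a single outlier no longer dominates the reported tail percentile.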

Conclusion

P99 and P999 latency metrics can be misleading and unreliable in low-traffic scenarios because of small sample sizes, sensitivity to outliers, and irregular request patterns. For systems with low throughput, it is better to rely on alternative metrics such as median latency, the full latency distribution, and success rates to gain meaningful insights into performance.

When request loads drop below 5 requests per second, p99 and p999 latency metrics are effectively meaningless.

By tailoring your monitoring strategy to the characteristics of your workload, you can achieve more accurate performance evaluations and make better-informed optimization decisions.

Last Reviewed Date: 7 January 2025

Document Location

Worldwide


Historical Number

ka0Ui0000003jQPIAY

Document Information

Modified date:
30 January 2026

UID

ibm17258480