Why p99 Latency Matters More Than p50
November 2024 · 5 min read
Most performance dashboards show average latency. Average latency is almost useless. Here's why tail latencies — p95, p99 — are the metrics you should care about.
The problem with averages
Imagine 100 requests: 99 complete in 10ms, one takes 10 seconds. The average is (99 × 10ms + 10,000ms) / 100 ≈ 110ms, a number that looks fine and hides a catastrophic experience for 1% of users.
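A quick sketch of that toy dataset makes the gap concrete (nearest-rank p99 on a sorted list; the numbers are the made-up ones from above):

```python
# Toy dataset from above: 99 requests at 10 ms, one at 10 s (10,000 ms).
latencies_ms = [10] * 99 + [10_000]

mean = sum(latencies_ms) / len(latencies_ms)
# Nearest-rank p99: the value at the 99th-percentile position of the sorted list.
p99 = sorted(latencies_ms)[int(0.99 * len(latencies_ms))]

print(f"mean: {mean:.1f} ms, p99: {p99} ms")
# → mean: 109.9 ms, p99: 10000 ms
```

The mean barely registers the outlier; the p99 is the outlier.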
What percentiles tell you
p50 (median): Half your requests are faster, half are slower. Useful for capacity planning.
p95: 95% of users get this or better. A good SLO target for most APIs.
p99: 1% of users get this or worse. This is where infrastructure problems hide.
p99.9: One in a thousand. Matters at scale: on a service handling ten million requests a day, a p99.9 of 10s means roughly 10,000 painfully slow requests every day.
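Computing these percentiles from raw samples is a few lines. Below is a minimal sketch using the nearest-rank method; the workload is synthetic (mostly fast requests with a 1% slow tail), invented here for illustration:

```python
import math
import random

def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]

# Synthetic workload: ~99% of requests near 10 ms, ~1% in a 100-1000 ms tail.
random.seed(1)
samples = [random.gauss(10, 1) if random.random() < 0.99 else random.uniform(100, 1000)
           for _ in range(10_000)]

for p in (50, 95, 99, 99.9):
    print(f"p{p}: {percentile(samples, p):.1f} ms")
```

In production you would typically use a histogram-based estimator rather than sorting raw samples, but the definition is the same.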
Why tail latencies spike
Common causes of p99 spikes:
Lock contention: A slow database query holding a lock that blocks other requests under concurrency
GC pauses: Stop-the-world garbage collection in JVM or .NET runtimes
Cold starts: Unwarmed cache or connection pool exhaustion
Resource limits: CPU throttling, memory pressure, or disk I/O saturation
How PerfMonk helps
PerfMonk's AI analysis automatically identifies the cause of tail latency spikes — not just that p99 is high, but why. Contact us to learn more.