Why p99 Latency Matters More Than p50
November 2024 · 5 min read
Most performance dashboards show average latency. Average latency is almost useless. Here's why tail latencies — p95, p99 — are the metrics you should care about.
The problem with averages
Imagine 100 requests: 99 complete in 10ms, one takes 10 seconds. The average is (99 × 10ms + 10,000ms) / 100 ≈ 110ms, a number that looks fine and hides a catastrophic experience for 1% of users.
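A quick sketch of that toy dataset makes the gap concrete (nearest-rank p99 on a sorted list; the numbers are the made-up ones from above):

```python
# Toy dataset from above: 99 requests at 10 ms, one at 10 s (10,000 ms).
latencies_ms = [10] * 99 + [10_000]

mean = sum(latencies_ms) / len(latencies_ms)
# Nearest-rank p99: the value at the 99th-percentile position of the sorted list.
p99 = sorted(latencies_ms)[int(0.99 * len(latencies_ms))]

print(f"mean: {mean:.1f} ms, p99: {p99} ms")
# → mean: 109.9 ms, p99: 10000 ms
```

The mean barely registers the outlier; the p99 is the outlier.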
What percentiles tell you
p50 (median): Half your requests are faster, half are slower. Useful for capacity planning.
p95: 95% of users get this or better. A good SLO target for most APIs.
p99: 1% of users get this or worse. This is where infrastructure problems hide.
p99.9: One in a thousand. Matters at scale: on a service handling ten million requests a day, a p99.9 of 10s means roughly 10,000 painfully slow requests every day.
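Computing these percentiles from raw samples is a few lines. Below is a minimal sketch using the nearest-rank method; the workload is synthetic (mostly fast requests with a 1% slow tail), invented here for illustration:

```python
import math
import random

def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]

# Synthetic workload: ~99% of requests near 10 ms, ~1% in a 100-1000 ms tail.
random.seed(1)
samples = [random.gauss(10, 1) if random.random() < 0.99 else random.uniform(100, 1000)
           for _ in range(10_000)]

for p in (50, 95, 99, 99.9):
    print(f"p{p}: {percentile(samples, p):.1f} ms")
```

In production you would typically use a histogram-based estimator rather than sorting raw samples, but the definition is the same.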
Why tail latencies spike
Common causes of p99 spikes:
Lock contention: A slow database query holding a lock that blocks other requests under concurrency
GC pauses: Stop-the-world garbage collection in JVM or .NET runtimes
Cold starts: Unwarmed cache or connection pool exhaustion
Resource limits: CPU throttling, memory pressure, or disk I/O saturation
How PerfMonk helps
PerfMonk's AI analysis automatically identifies the cause of tail latency spikes — not just that p99 is high, but why. Contact us to learn more.