M/M/1 Tail Latency Estimator

Percentiles for Wq (wait) and W (time-in-system) under M/M/1 assumptions.


Inputs

Arrival rate λ (requests / second)
Service rate μ (requests / second)
Important: This is a math model, not a guarantee. Real systems have timeouts, burstiness, batching, priority queues, retries, non-exponential service, and multi-stage pipelines.

Outputs

Utilization ρ = λ/μ
Stability requires ρ < 1
Mean W (time in system)
Mean Wq (wait time)
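
These outputs follow from standard M/M/1 results: mean W = 1/(μ−λ) and mean Wq = ρ/(μ−λ). Below is a minimal Python sketch of that arithmetic. The function name mm1_means and the dictionary keys are illustrative choices, not this tool's own code; rates are assumed to be in requests per second, so times come out in seconds.

```python
def mm1_means(lam: float, mu: float) -> dict:
    """Utilization and mean times for an M/M/1 queue.

    lam: arrival rate (requests/second), mu: service rate (requests/second).
    Names and return shape are hypothetical, chosen for illustration.
    """
    if lam <= 0 or mu <= 0:
        raise ValueError("lam and mu must be positive")
    rho = lam / mu                  # utilization
    if rho >= 1:
        raise ValueError("unstable queue: stability requires rho < 1")
    mean_w = 1.0 / (mu - lam)       # mean time in system, seconds
    mean_wq = rho / (mu - lam)      # mean wait before service, seconds
    return {"rho": rho, "mean_w": mean_w, "mean_wq": mean_wq}


if __name__ == "__main__":
    print(mm1_means(80.0, 100.0))
```

The two means differ by exactly the mean service time 1/μ, which is a quick sanity check on any implementation.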

Percentiles

p    Wq (wait)    W (system)

How this is computed

For M/M/1 with λ<μ:
  • Sojourn time W is exponential with rate μ−λ: P(W > t) = e^{-(μ-λ)t}.
  • Wait time Wq has an atom at 0: P(Wq=0)=1-ρ, and for t>0: P(Wq > t)=ρ·e^{-(μ-λ)t}.
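  • Inverting these tails gives closed-form percentiles for 0 < p < 1: W_p = −ln(1−p)/(μ−λ), and Wq_p = 0 when p ≤ 1−ρ (the flat region caused by the atom at 0), otherwise Wq_p = ln(ρ/(1−p))/(μ−λ).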
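
The tail formulas above can be inverted directly. Here is a minimal Python sketch, assuming p is given as a fraction strictly between 0 and 1 and rates are in requests per second. The function name mm1_percentiles is hypothetical and not taken from this tool.

```python
import math


def mm1_percentiles(lam: float, mu: float, p: float) -> tuple:
    """Return (wq_p, w_p) in seconds for an M/M/1 queue.

    Solves P(W > t) = exp(-(mu - lam) * t) = 1 - p for W, and
    P(Wq > t) = rho * exp(-(mu - lam) * t) = 1 - p for Wq.
    """
    if not (0 < lam < mu):
        raise ValueError("need 0 < lam < mu")
    if not (0 < p < 1):
        raise ValueError("p must be strictly between 0 and 1")
    rho = lam / mu
    rate = mu - lam
    # Time in system is exponential with rate mu - lam.
    w_p = -math.log(1.0 - p) / rate
    # Wait time has mass 1 - rho at zero, so low percentiles are exactly 0.
    if p <= 1.0 - rho:
        wq_p = 0.0
    else:
        # max() guards against a tiny negative value from rounding near the boundary.
        wq_p = max(0.0, math.log(rho / (1.0 - p)) / rate)
    return wq_p, w_p


if __name__ == "__main__":
    for p in (0.5, 0.9, 0.99):
        wq_p, w_p = mm1_percentiles(80.0, 100.0, p)
        print(f"p={p}: Wq={wq_p:.4f}s  W={w_p:.4f}s")
```

For λ = 80 and μ = 100 (ρ = 0.8), p = 0.99 gives W of about 0.230 s and Wq of about 0.219 s, while any p ≤ 0.2 gives Wq = 0.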