API Rate Limiting Strategies: Token Bucket, Leaky Bucket, and Sliding Window

Every production API needs rate limiting. Without it, a single abusive client can saturate your database connections, exhaust your worker pool, and degrade service for every other user. The question is not whether to rate limit — it is which algorithm to use. The four dominant strategies — token bucket, leaky bucket, fixed window, and sliding window — each make different trade-offs between accuracy, memory, and burst allowance.
Token Bucket: Controlled Bursts
The token bucket algorithm allows bursts up to a configurable capacity while enforcing a steady long-term rate. Tokens are added at a fixed refill rate (for example, one token per second) until the bucket is full. Each request consumes one token. If the bucket is empty, the request is denied.
// Token Bucket implementation with Redis (node-redis v4+ API)
import { createClient } from 'redis'
const client = createClient()
await client.connect() // top-level await needs an ES module; otherwise connect during app startup

async function tokenBucket(userId: string): Promise<boolean> {
  const key = `rate-limit:${userId}`
  const now = Date.now()
  const capacity = 10 // max burst
  const refillRate = 1 // tokens per second
  // The Lua script runs atomically inside Redis, so concurrent requests cannot race
  const result = await client.eval(`
    local key = KEYS[1]
    local now = tonumber(ARGV[1])
    local capacity = tonumber(ARGV[2])
    local refillRate = tonumber(ARGV[3])
    local data = redis.call('HMGET', key, 'tokens', 'lastRefill')
    local tokens = tonumber(data[1]) or capacity
    local lastRefill = tonumber(data[2]) or now
    local elapsed = math.max(0, now - lastRefill)
    -- Keep fractional tokens: flooring here while resetting lastRefill discards partial refill progress
    local newTokens = math.min(capacity, tokens + elapsed * refillRate / 1000)
    local allowed = 0
    if newTokens >= 1 then
      newTokens = newTokens - 1
      allowed = 1
    end
    redis.call('HSET', key, 'tokens', tostring(newTokens), 'lastRefill', now)
    redis.call('EXPIRE', key, 60)
    return allowed
  `, {
    keys: [key],
    // node-redis expects script arguments as strings
    arguments: [String(now), String(capacity), String(refillRate)],
  })
  return result === 1
}
The token bucket is ideal for APIs where occasional bursts are legitimate — a user refreshing a dashboard or batch uploading files. It permits short-term spikes while protecting long-term average throughput.
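To see the burst-then-steady behavior without a Redis instance, here is a minimal in-memory sketch. The class name `InMemoryTokenBucket` and the explicit `nowMs` parameter are assumptions made for illustration, so the timing can be driven by hand. It is single-process only and not a replacement for the Redis version.

```typescript
// Illustrative in-memory token bucket (hypothetical helper, single process only).
class InMemoryTokenBucket {
  private tokens: number
  private lastRefillMs: number
  private readonly capacity: number // maximum burst size
  private readonly refillPerSec: number // steady-state rate

  constructor(capacity: number, refillPerSec: number, startMs: number) {
    this.capacity = capacity
    this.refillPerSec = refillPerSec
    this.tokens = capacity // start full so an initial burst is allowed
    this.lastRefillMs = startMs
  }

  // Returns true if a request arriving at nowMs is allowed.
  tryConsume(nowMs: number): boolean {
    const elapsedMs = Math.max(0, nowMs - this.lastRefillMs)
    // Refill continuously; fractional tokens carry over between calls.
    this.tokens = Math.min(
      this.capacity,
      this.tokens + (elapsedMs * this.refillPerSec) / 1000,
    )
    this.lastRefillMs = nowMs
    if (this.tokens >= 1) {
      this.tokens -= 1
      return true
    }
    return false
  }
}

// Capacity 3, refill 1 token/second: a burst of 3 passes, the 4th is denied.
const bucket = new InMemoryTokenBucket(3, 1, 0)
const burst = [0, 0, 0, 0].map((t) => bucket.tryConsume(t))
// One second later, exactly one more token has been refilled.
const afterOneSecond = [1000, 1000].map((t) => bucket.tryConsume(t))
```

Passing the clock in as a parameter is a testing convenience; production code would normally read the time itself.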
Leaky Bucket: Smoothing Traffic
The leaky bucket enforces a strict processing rate. Requests enter a queue and are processed at a fixed rate, regardless of incoming burst size. If the queue overflows, excess requests are dropped.
// Leaky bucket: bounded in-memory queue drained at a fixed rate
class LeakyBucket {
  private queue: Array<() => Promise<void>> = []
  private readonly capacity: number
  private readonly leakRate: number // requests per second
  private processing = false

  constructor(capacity: number, leakRate: number) {
    this.capacity = capacity
    this.leakRate = leakRate
  }

  async add(request: () => Promise<void>): Promise<boolean> {
    if (this.queue.length >= this.capacity) {
      return false // overflow, request dropped
    }
    this.queue.push(request)
    void this.process() // start draining in the background; do not block the caller
    return true
  }

  private async process() {
    // Only one drain loop may run at a time
    if (this.processing || this.queue.length === 0) return
    this.processing = true
    while (this.queue.length > 0) {
      const request = this.queue.shift()!
      const start = Date.now()
      try {
        await request()
      } catch (err) {
        // A failed request must not stop the drain loop, but it should not vanish silently either
        console.error('LeakyBucket: queued request failed', err)
      }
      // Enforce leak rate
      const elapsed = Date.now() - start
      const minInterval = 1000 / this.leakRate
      if (elapsed < minInterval) {
        await new Promise((r) => setTimeout(r, minInterval - elapsed))
      }
    }
    this.processing = false
  }
}
Use the leaky bucket for downstream integrations with strict throughput limits — webhook delivery, batch email sending, or third-party API proxies where exceeding the limit causes hard failures.
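Because accepted requests wait in the queue, the capacity also determines how long a caller can be kept waiting. The helpers below are a rough sizing sketch, not part of any library. The names `queueDelayMs` and `capacityForMaxWait` are hypothetical, and the arithmetic assumes each request finishes within one leak interval and ignores the request currently in flight.

```typescript
// Hypothetical sizing helpers for a queue-based leaky bucket.

// Approximate wait before a newly accepted request starts,
// given how many requests are already queued ahead of it.
function queueDelayMs(queuedAhead: number, leakRatePerSec: number): number {
  return (queuedAhead * 1000) / leakRatePerSec
}

// Largest queue capacity that keeps the approximate worst-case wait within maxWaitMs.
function capacityForMaxWait(maxWaitMs: number, leakRatePerSec: number): number {
  return Math.floor((maxWaitMs * leakRatePerSec) / 1000)
}

// A full queue of 50 at 5 requests/second delays the last request by about 10 seconds.
const worstCaseMs = queueDelayMs(50, 5)
// To keep waits under roughly 2 seconds at the same rate, cap the queue at 10.
const suggestedCapacity = capacityForMaxWait(2000, 5)
```

If handlers regularly take longer than the leak interval, the real wait will be longer than this estimate.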
Fixed Window: Simple but Bumpy
Fixed window divides time into discrete buckets (e.g., 100 requests per minute). The implementation is trivial but suffers from boundary spikes — a client can send 100 requests at 10:59:59 and 100 more at 11:00:00, effectively doubling throughput at window edges.
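// Fixed window rate limiter
async function fixedWindow(userId: string, limit: number, windowMs: number): Promise<boolean> {
  // The window index is part of the key, so each window gets a fresh counter
  const key = `fixed:${userId}:${Math.floor(Date.now() / windowMs)}`
  const count = await client.incr(key)
  if (count === 1) {
    // First request in this window: set a TTL so stale counters are cleaned up
    await client.expire(key, Math.ceil(windowMs / 1000))
  }
  return count <= limit
}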
Fixed window is acceptable for non-critical rate limiting like login attempt throttling where occasional boundary bursts are tolerable. It uses minimal memory — one key per user per window.
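The boundary spike is easy to reproduce with an in-memory counter. The sketch below is illustrative only: `makeFixedWindow` is a hypothetical helper, and timestamps are passed in explicitly so the two bursts can be placed on either side of a window edge.

```typescript
// Illustrative in-memory fixed window counter (hypothetical helper, single client).
function makeFixedWindow(limit: number, windowMs: number) {
  let currentWindow = -1
  let count = 0
  return (nowMs: number): boolean => {
    const windowIndex = Math.floor(nowMs / windowMs)
    if (windowIndex !== currentWindow) {
      // A new window starts with a fresh counter; nothing carries over.
      currentWindow = windowIndex
      count = 0
    }
    count += 1
    return count <= limit
  }
}

// Limit: 100 requests per minute.
const allowFixed = makeFixedWindow(100, 60000)
let acceptedAcrossBoundary = 0
// 100 requests one second before the window boundary...
for (let i = 0; i < 100; i++) if (allowFixed(59000)) acceptedAcrossBoundary++
// ...and 100 more exactly at the boundary.
for (let i = 0; i < 100; i++) if (allowFixed(60000)) acceptedAcrossBoundary++
// All 200 are accepted within about one second of wall-clock time.
```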
Sliding Window: Smooth and Accurate
Sliding window rate limiting tracks requests across a rolling time window, eliminating the boundary spike problem. The log approach stores timestamps; the counter approach approximates using weighted counts of the current and previous windows.
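// Sliding window log with sorted sets
// Note: the steps below are separate commands, so concurrent requests can slip past the limit;
// wrap them in a Lua script if the limit must be strict.
async function slidingWindow(userId: string, limit: number, windowMs: number): Promise<boolean> {
  const key = `sliding:${userId}`
  const now = Date.now()
  const windowStart = now - windowMs
  // Remove expired timestamps
  await client.zRemRangeByScore(key, 0, windowStart)
  // Count requests in current window
  const count = await client.zCard(key)
  if (count >= limit) {
    return false // rate limited
  }
  // Add current request; the random suffix keeps members unique when two requests share a millisecond
  const member = `${now}-${Math.random().toString(36).slice(2)}`
  await client.zAdd(key, { score: now, value: member })
  await client.expire(key, Math.ceil(windowMs / 1000))
  return true
}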
For high-throughput APIs, use the sliding window counter approximation instead of logs to avoid O(n) memory per user.
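One way to sketch the counter approximation: keep only two counts per client, one for the current fixed window and one for the previous, and weight the previous count by how much of it the rolling window still covers. The class below is an in-memory illustration. `SlidingWindowCounter` and the explicit `nowMs` parameter are hypothetical, and the estimate assumes requests in the previous window were spread evenly.

```typescript
// Illustrative sliding window counter (hypothetical in-memory helper, single client).
class SlidingWindowCounter {
  private currentWindow = 0 // index of the current fixed window
  private currentCount = 0
  private previousCount = 0
  private readonly limit: number
  private readonly windowMs: number

  constructor(limit: number, windowMs: number) {
    this.limit = limit
    this.windowMs = windowMs
  }

  tryAcquire(nowMs: number): boolean {
    const windowIndex = Math.floor(nowMs / this.windowMs)
    if (windowIndex !== this.currentWindow) {
      // Roll over. The old count only matters if it belongs to the adjacent window.
      this.previousCount =
        windowIndex === this.currentWindow + 1 ? this.currentCount : 0
      this.currentCount = 0
      this.currentWindow = windowIndex
    }
    // Fraction of the previous window still inside the rolling window.
    const elapsedInWindow = nowMs - windowIndex * this.windowMs
    const previousWeight = 1 - elapsedInWindow / this.windowMs
    const estimated = this.previousCount * previousWeight + this.currentCount
    if (estimated >= this.limit) return false
    this.currentCount += 1
    return true
  }
}

// Limit: 100 requests per minute. Fill the first window just before the boundary.
const counter = new SlidingWindowCounter(100, 60000)
for (let i = 0; i < 100; i++) counter.tryAcquire(59000)

// At the boundary the previous window still counts in full, so nothing gets through.
let acceptedAtBoundary = 0
for (let i = 0; i < 100; i++) if (counter.tryAcquire(60000)) acceptedAtBoundary++

// Halfway through the next window, half of the previous count has aged out.
let acceptedMidWindow = 0
for (let i = 0; i < 100; i++) if (counter.tryAcquire(90000)) acceptedMidWindow++
```

In Redis the same idea would need two integer counters per client rather than one sorted-set entry per request, which is where the memory saving comes from.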
Distributed Rate Limiting with Redis
In multi-instance deployments, rate limit state must be shared. Redis provides atomic operations, sorted sets, and Lua scripting — everything needed for distributed rate limiting.
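// Distributed rate limit middleware
import { Request, Response, NextFunction } from 'express'

function rateLimit(options: { limit: number; windowMs: number; algorithm: string }) {
  return async (req: Request, res: Response, next: NextFunction) => {
    // x-user-id is client-supplied; only rely on it behind a gateway that sets it
    const header = req.headers['x-user-id']
    const userId = req.ip || (Array.isArray(header) ? header[0] : header)
    if (!userId) {
      return res.status(400).json({ error: 'Unable to identify client' })
    }
    try {
      let allowed: boolean
      switch (options.algorithm) {
        case 'token-bucket':
          allowed = await tokenBucket(userId)
          break
        case 'sliding-window':
          allowed = await slidingWindow(userId, options.limit, options.windowMs)
          break
        default:
          allowed = await fixedWindow(userId, options.limit, options.windowMs)
      }
      if (!allowed) {
        const retryAfter = Math.ceil(options.windowMs / 1000)
        // Standard header so well-behaved clients know when to retry
        res.set('Retry-After', String(retryAfter))
        return res.status(429).json({ error: 'Too many requests', retryAfter })
      }
      next()
    } catch (err) {
      // Hand Redis failures to Express error handling instead of leaving the request hanging
      next(err)
    }
  }
}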
Choosing Your Strategy
Match the algorithm to your traffic pattern. Token bucket for APIs serving interactive dashboards and authenticated users. Leaky bucket for downstream proxy integrations. Fixed window for simple, non-critical throttling. Sliding window for billing-grade rate limits where accuracy matters at every boundary.
Implementation Guidance
Rate limiting is not a one-size-fits-all problem. Start with the simplest algorithm that meets your requirements, instrument it with monitoring, and iterate as traffic patterns reveal edge cases.
At SoniNow, we implement production rate limiting across applications serving millions of requests daily. Our web development services include API architecture, rate limit strategy design, and Redis-based backend optimization.