API Rate Limiting Strategies: Token Bucket, Leaky Bucket, and Sliding Window | SoniNow Blog

Tags: api rate limiting, backend, security, performance, api

Published: 2026-06-23

Read time: 5 mins

Every production API needs rate limiting. Without it, a single abusive client can saturate your database connections, exhaust your worker pool, and degrade service for every other user. The question is not whether to rate limit — it is which algorithm to use. The four dominant strategies — token bucket, leaky bucket, fixed window, and sliding window — each make different trade-offs between accuracy, memory, and burst allowance.

Token Bucket: Controlled Bursts

The token bucket algorithm allows bursts up to a configurable capacity while enforcing a steady long-term rate. Tokens are added at a fixed refill rate (one token per second in the example below) until the bucket is full. Each request consumes one token. If the bucket is empty, the request is denied.

// Token bucket implementation with Redis (node-redis v4 API)
import { createClient } from 'redis'

const client = createClient()
// Call `await client.connect()` once at startup before issuing commands.

async function tokenBucket(userId: string): Promise<boolean> {
  const key = `rate-limit:${userId}`
  const now = Date.now()
  const capacity = 10     // max burst
  const refillRate = 1    // tokens per second

  // The Lua script runs atomically, so concurrent requests cannot race.
  const result = await client.eval(`
    local key = KEYS[1]
    local now = tonumber(ARGV[1])
    local capacity = tonumber(ARGV[2])
    local refillRate = tonumber(ARGV[3])

    local data = redis.call('HMGET', key, 'tokens', 'lastRefill')
    local tokens = tonumber(data[1]) or capacity
    local lastRefill = tonumber(data[2]) or now

    -- Credit only whole tokens, and advance lastRefill only by the time
    -- those tokens account for, so partial progress is not discarded.
    local elapsed = math.max(0, now - lastRefill)
    local earned = math.floor(elapsed * refillRate / 1000)
    local newTokens = math.min(capacity, tokens + earned)
    if newTokens >= capacity then
      lastRefill = now
    else
      lastRefill = lastRefill + earned * 1000 / refillRate
    end

    if newTokens >= 1 then
      redis.call('HSET', key, 'tokens', newTokens - 1, 'lastRefill', lastRefill)
      redis.call('EXPIRE', key, 60)
      return 1
    end

    return 0
  `, {
    keys: [key],
    arguments: [String(now), String(capacity), String(refillRate)],
  })

  return result === 1
}

The token bucket is ideal for APIs where occasional bursts are legitimate — a user refreshing a dashboard or batch uploading files. It permits short-term spikes while protecting long-term average throughput.
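The refill arithmetic can also be seen without Redis. The following is a minimal in-memory sketch, assuming a single process and a caller-supplied clock in milliseconds. The class name `InMemoryTokenBucket` and its parameters are hypothetical, and unlike the Redis version it credits fractional tokens continuously.

```typescript
// In-memory token bucket with continuous (fractional) refill.
// All names are illustrative and not part of any library.
class InMemoryTokenBucket {
  private tokens: number
  private lastRefillMs: number
  private readonly capacity: number        // maximum burst size
  private readonly refillPerSecond: number // steady long-term rate

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity
    this.refillPerSecond = refillPerSecond
    this.tokens = capacity // start full so an initial burst is allowed
    this.lastRefillMs = nowMs
  }

  // Returns true if a request arriving at `nowMs` is allowed.
  tryConsume(nowMs: number): boolean {
    const elapsedMs = Math.max(0, nowMs - this.lastRefillMs)
    const refilled = this.tokens + (elapsedMs / 1000) * this.refillPerSecond
    this.tokens = Math.min(this.capacity, refilled)
    this.lastRefillMs = nowMs
    if (this.tokens >= 1) {
      this.tokens -= 1
      return true
    }
    return false
  }
}

// Capacity 3, refill 1 token per second, clock starting at t = 0.
const bucket = new InMemoryTokenBucket(3, 1, 0)
const burst = [0, 0, 0, 0].map((t) => bucket.tryConsume(t)) // fourth call finds the bucket empty
const afterHalfSecond = bucket.tryConsume(500) // only half a token has accrued
const afterOneSecond = bucket.tryConsume(1000) // a full token has accrued
```

Passing the clock in as an argument keeps the sketch deterministic. A real limiter would read `Date.now()` and keep the state in a shared store.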

Leaky Bucket: Smoothing Traffic

The leaky bucket enforces a strict processing rate. Requests enter a queue and are processed at a fixed rate, regardless of incoming burst size. If the queue overflows, excess requests are dropped.

// Leaky bucket: queued requests drain at a fixed rate
class LeakyBucket {
  private queue: Array<() => Promise<void>> = []
  private readonly capacity: number
  private readonly leakRate: number // requests per second
  private processing = false

  constructor(capacity: number, leakRate: number) {
    this.capacity = capacity
    this.leakRate = leakRate
  }

  // Resolves to false if the queue is full. A result of true means the
  // request was queued, not that it has finished.
  async add(request: () => Promise<void>): Promise<boolean> {
    if (this.queue.length >= this.capacity) {
      return false // overflow, request dropped
    }
    this.queue.push(request)
    void this.process() // start draining in the background if not already running
    return true
  }

  private async process() {
    if (this.processing || this.queue.length === 0) return
    this.processing = true
    const minInterval = 1000 / this.leakRate // ms between request starts

    while (this.queue.length > 0) {
      const request = this.queue.shift()!
      const start = Date.now()
      try {
        await request()
      } catch (err) {
        // A failed request must not stop the drain loop; log it and continue.
        console.error('LeakyBucket request failed:', err)
      }

      // Enforce leak rate
      const elapsed = Date.now() - start
      if (elapsed < minInterval) {
        await new Promise((r) => setTimeout(r, minInterval - elapsed))
      }
    }

    this.processing = false
  }
}

Use the leaky bucket for downstream integrations with strict throughput limits — webhook delivery, batch email sending, or third-party API proxies where exceeding the limit causes hard failures.
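When requests should be rejected immediately rather than queued, the same idea is often applied as a meter: the bucket holds a level that drains at the leak rate, and a request is accepted only if it fits. The following is a sketch of that variant, not the queue-based class above. The name `LeakyBucketMeter` and the explicit clock parameter are assumptions made for illustration.

```typescript
// Leaky bucket used as a meter: no queue, just a level that drains over time.
// All names are illustrative and not part of any library.
class LeakyBucketMeter {
  private level = 0 // current amount of "water" in the bucket
  private lastLeakMs: number
  private readonly capacity: number
  private readonly leakPerSecond: number

  constructor(capacity: number, leakPerSecond: number, nowMs: number) {
    this.capacity = capacity
    this.leakPerSecond = leakPerSecond
    this.lastLeakMs = nowMs
  }

  // Returns true if a request arriving at `nowMs` fits in the bucket.
  tryAdd(nowMs: number): boolean {
    const elapsedMs = Math.max(0, nowMs - this.lastLeakMs)
    // Drain whatever has leaked out since the last call.
    this.level = Math.max(0, this.level - (elapsedMs / 1000) * this.leakPerSecond)
    this.lastLeakMs = nowMs
    if (this.level + 1 > this.capacity) {
      return false // bucket would overflow
    }
    this.level += 1
    return true
  }
}

// Capacity 2, leaking 1 request per second, clock starting at t = 0.
const meter = new LeakyBucketMeter(2, 1, 0)
const atStart = [0, 0, 0].map((t) => meter.tryAdd(t)) // third request overflows
const oneSecondLater = [1000, 1000].map((t) => meter.tryAdd(t)) // one slot has drained
```

The meter trades the smoothing of a queue for immediate feedback: callers learn at once that they were rejected instead of waiting in line.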

Fixed Window: Simple but Bumpy

Fixed window divides time into discrete buckets (e.g., 100 requests per minute). The implementation is trivial but suffers from boundary spikes — a client can send 100 requests at 10:59:59 and 100 more at 11:00:00, effectively doubling throughput at window edges.

// Fixed window rate limiter
async function fixedWindow(userId: string, limit: number, windowMs: number) {
  const key = `fixed:${userId}:${Math.floor(Date.now() / windowMs)}`
  const count = await client.incr(key)
  if (count === 1) {
    await client.expire(key, Math.ceil(windowMs / 1000))
  }
  return count <= limit
}

Fixed window is acceptable for non-critical rate limiting like login attempt throttling where occasional boundary bursts are tolerable. It uses minimal memory — one key per user per window.
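The boundary spike is easy to reproduce in memory. This sketch assumes a hypothetical `InMemoryFixedWindow` class with a caller-supplied clock, mirroring the `Math.floor(now / windowMs)` bucketing used in the Redis version. It sends 100 requests just before a window boundary and 100 just after.

```typescript
// In-memory fixed window counter, used only to demonstrate the boundary spike.
// All names are illustrative and not part of any library.
class InMemoryFixedWindow {
  private counts = new Map<number, number>() // window index -> request count
  private readonly limit: number
  private readonly windowMs: number

  constructor(limit: number, windowMs: number) {
    this.limit = limit
    this.windowMs = windowMs
  }

  allow(nowMs: number): boolean {
    const windowIndex = Math.floor(nowMs / this.windowMs)
    const count = (this.counts.get(windowIndex) ?? 0) + 1
    this.counts.set(windowIndex, count)
    return count <= this.limit
  }
}

const limiter = new InMemoryFixedWindow(100, 60000) // 100 requests per minute
let accepted = 0

// 100 requests one second before the boundary (t = 59,000 ms, window 0)
for (let i = 0; i < 100; i++) {
  if (limiter.allow(59000)) accepted++
}
// 100 more right at the boundary (t = 60,000 ms, window 1)
for (let i = 0; i < 100; i++) {
  if (limiter.allow(60000)) accepted++
}

// Window 1 is now full, so the next request in it is rejected.
const nextRejected = limiter.allow(60500) === false
```

All 200 requests are accepted within a single second even though the nominal limit is 100 per minute, which is the behavior the sliding window approaches are designed to avoid.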

Sliding Window: Smooth and Accurate

Sliding window rate limiting tracks requests across a rolling time window, eliminating the boundary spike problem. The log approach stores timestamps; the counter approach approximates using weighted counts of the current and previous windows.

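// Sliding window log with sorted sets
import { randomUUID } from 'node:crypto'

async function slidingWindow(userId: string, limit: number, windowMs: number) {
  const key = `sliding:${userId}`
  const now = Date.now()
  const windowStart = now - windowMs
  // Unique member so two requests in the same millisecond are both counted
  const member = `${now}:${randomUUID()}`

  // MULTI/EXEC runs the commands as one atomic unit, so concurrent
  // requests cannot both slip under the limit between the count and the add.
  const replies = await client
    .multi()
    .zRemRangeByScore(key, 0, windowStart) // remove expired timestamps
    .zAdd(key, { score: now, value: member }) // record this request
    .zCard(key) // count requests in the window, including this one
    .expire(key, Math.ceil(windowMs / 1000))
    .exec()

  const count = Number(replies[2])
  if (count > limit) {
    await client.zRem(key, member) // rejected requests should not consume quota
    return false // rate limited
  }

  return true
}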

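For high-throughput APIs, use the sliding window counter approximation instead of the log. The log stores one entry per request, so memory per user grows as O(n) in the number of requests within the window, while the counter keeps only two integers per user.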
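The counter approximation weights the previous window's count by how much of it still overlaps the rolling window, then adds the current window's count. The following is a sketch of that arithmetic only, assuming the two counts are already available (for example from two fixed-window keys). The function names are hypothetical.

```typescript
// Sliding window counter: estimate requests in the rolling window from the
// counts of the previous and current fixed windows.
// All names are illustrative and not part of any library.
function slidingWindowEstimate(
  previousCount: number,
  currentCount: number,
  nowMs: number,
  windowMs: number,
): number {
  const elapsedInCurrent = nowMs % windowMs
  // Fraction of the previous window still covered by the rolling window.
  const previousWeight = 1 - elapsedInCurrent / windowMs
  return previousCount * previousWeight + currentCount
}

function allowByEstimate(
  previousCount: number,
  currentCount: number,
  nowMs: number,
  windowMs: number,
  limit: number,
): boolean {
  return slidingWindowEstimate(previousCount, currentCount, nowMs, windowMs) < limit
}

// 15 seconds into a 60-second window, so 75% of the previous window still counts.
const estimate = slidingWindowEstimate(80, 30, 75000, 60000) // 80 * 0.75 + 30
const underLimit = allowByEstimate(80, 30, 75000, 60000, 100)
const atLimit = allowByEstimate(80, 40, 75000, 60000, 100)
```

The estimate assumes requests in the previous window were evenly spread, so it can be slightly off for bursty clients. That is the accuracy given up in exchange for constant memory.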

Distributed Rate Limiting with Redis

In multi-instance deployments, rate limit state must be shared. Redis provides atomic operations, sorted sets, and Lua scripting — everything needed for distributed rate limiting.

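// Distributed rate limit middleware
import { Request, Response, NextFunction } from 'express'

function rateLimit(options: { limit: number; windowMs: number; algorithm: string }) {
  return async (req: Request, res: Response, next: NextFunction) => {
    // Identify the client by IP, falling back to the x-user-id header.
    // The header is client-supplied, so rely on it only when a trusted
    // gateway sets it. Behind a proxy, req.ip requires Express "trust proxy".
    const userId = req.ip || req.headers['x-user-id'] as string

    try {
      let allowed: boolean

      switch (options.algorithm) {
        case 'token-bucket':
          // tokenBucket uses its own capacity and refill rate, not options
          allowed = await tokenBucket(userId)
          break
        case 'sliding-window':
          allowed = await slidingWindow(userId, options.limit, options.windowMs)
          break
        default:
          allowed = await fixedWindow(userId, options.limit, options.windowMs)
      }

      if (!allowed) {
        const retryAfter = Math.ceil(options.windowMs / 1000)
        res.set('Retry-After', String(retryAfter)) // standard header clients can honor
        return res.status(429).json({ error: 'Too many requests', retryAfter })
      }

      next()
    } catch (err) {
      next(err) // pass Redis failures to Express error handling instead of hanging
    }
  }
}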
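The middleware reports the full window length as the retry delay, which is an upper bound. For the fixed window algorithm, whose windows are aligned to multiples of `windowMs`, the remaining time can be computed exactly. The helper below is a hypothetical sketch of that calculation, not part of Express or Redis.

```typescript
// Seconds until the current fixed window ends, for use in a Retry-After header.
// Assumes windows are aligned to multiples of windowMs, as in
// Math.floor(Date.now() / windowMs). The function name is illustrative.
function retryAfterSeconds(nowMs: number, windowMs: number): number {
  const windowEndMs = (Math.floor(nowMs / windowMs) + 1) * windowMs
  // Round up and never return less than 1 second.
  return Math.max(1, Math.ceil((windowEndMs - nowMs) / 1000))
}

const midWindow = retryAfterSeconds(45000, 60000)   // 15 seconds remain in window 0
const nearBoundary = retryAfterSeconds(59999, 60000) // 1 ms remains, rounded up to 1 second
const windowStart = retryAfterSeconds(60000, 60000) // a full 60 seconds remain in window 1
```

Token bucket and sliding window limits would need different calculations, since their reset time depends on the refill rate or on the oldest timestamp in the log.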

Choosing Your Strategy

Match the algorithm to your traffic pattern. Use the token bucket for APIs that serve interactive dashboards and authenticated users, where short bursts are legitimate. Use the leaky bucket for downstream proxy integrations with strict throughput limits. Use the fixed window for simple, non-critical throttling. Use the sliding window for billing-grade rate limits where accuracy matters at every boundary.

Implementation Guidance

Rate limiting is not a one-size-fits-all problem. Start with the simplest algorithm that meets your requirements, instrument it with monitoring, and iterate as traffic patterns reveal edge cases.

At SoniNow, we implement production rate limiting across applications serving millions of requests daily. Our web development services include API architecture, rate limit strategy design, and Redis-based backend optimization.