Webhook Integration Architecture: Reliable Event-Driven Systems

Webhooks are the backbone of event-driven system integration. Unlike polling APIs that waste resources checking for updates, webhooks push events the moment they happen. But building a reliable webhook system requires careful attention to delivery guarantees, security, and error handling. Here is how to get it right.
Delivery Guarantees with Idempotency
Networks fail, so exactly-once delivery cannot be guaranteed. The practical approach is at-least-once delivery: the sender retries until it gets a success response, and the consumer uses idempotency keys to discard duplicates.
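// Webhook sender: delivers one event and reports whether the consumer accepted it
interface WebhookEvent {
  id: string
  type: string
  payload: unknown
  idempotencyKey: string
}

async function deliverWebhook(
  url: string,
  event: WebhookEvent,
  signature: string
): Promise<boolean> {
  try {
    const response = await fetch(url, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'X-Signature': signature,
        'X-Idempotency-Key': event.idempotencyKey,
      },
      body: JSON.stringify(event),
      // Don't let a slow consumer hold the sender indefinitely (10 s is an assumed value)
      signal: AbortSignal.timeout(10_000),
    })
    // Any 2xx status counts as delivered
    return response.ok
  } catch {
    // fetch rejects on network errors and timeouts: report a failed delivery instead of crashing
    return false
  }
}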
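The idempotency key only works if it stays the same across every retry of the same event, so it should be assigned once, when the event is created, rather than per delivery attempt. A minimal sketch, assuming randomUUID from node:crypto and a hypothetical createEvent helper that reuses the event id as the key (one simple choice among several):

```typescript
import { randomUUID } from 'node:crypto'

interface WebhookEvent {
  id: string
  type: string
  payload: unknown
  idempotencyKey: string
}

// Hypothetical helper: the key is fixed at creation time, so every retry of
// this event carries the same X-Idempotency-Key header
function createEvent(type: string, payload: unknown): WebhookEvent {
  const id = randomUUID()
  // Reusing the event id as the idempotency key is one simple option
  return { id, type, payload, idempotencyKey: id }
}
```

Generating the key inside deliverWebhook instead would give each retry a fresh key, and the consumer would treat every retry as a new event.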
The consumer must deduplicate using the idempotency key:
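// Webhook consumer: idempotent processing
// hasBeenProcessed, markProcessed, and processEvent are application helpers;
// back the processed-key checks with a shared store such as Redis
declare function hasBeenProcessed(key: string): Promise<boolean>
declare function markProcessed(key: string): Promise<void>
declare function processEvent(event: unknown): Promise<void>

export async function handleWebhook(request: Request): Promise<Response> {
  const idempotencyKey = request.headers.get('X-Idempotency-Key')
  if (!idempotencyKey) {
    return new Response('Missing idempotency key', { status: 400 })
  }

  // Duplicate delivery: acknowledge with 200 so the sender stops retrying
  if (await hasBeenProcessed(idempotencyKey)) {
    return new Response('Already processed', { status: 200 })
  }

  const event = await request.json()
  await processEvent(event)
  // Mark only after success; if processEvent throws, the sender's retry runs it again
  await markProcessed(idempotencyKey)
  return new Response('OK', { status: 200 })
}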
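Store processed keys in a shared store such as Redis, with a TTL at least as long as your retry window. This prevents duplicate processing even when the sender retries after a timeout, and unlike an in-memory set, it survives restarts and works across multiple consumer instances.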
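The consumer above checks and then marks as two separate steps, so two concurrent deliveries of the same event can both pass the check. A safer pattern claims the key atomically before processing. A minimal sketch, assuming a hypothetical DedupStore interface (in Redis, claim would map to SET key 1 NX PX ttlMs and release to DEL key), with an in-memory implementation standing in for Redis so the logic can run locally:

```typescript
// Hypothetical store contract: atomic "claim if absent" with an expiry
interface DedupStore {
  claim(key: string, ttlMs: number): Promise<boolean>
  release(key: string): Promise<void>
}

// In-memory stand-in for local testing only; it is not shared across processes
class MemoryDedupStore implements DedupStore {
  private readonly expiries = new Map<string, number>()
  private readonly now: () => number

  constructor(now: () => number = Date.now) {
    this.now = now
  }

  async claim(key: string, ttlMs: number): Promise<boolean> {
    const t = this.now()
    const expiry = this.expiries.get(key)
    if (expiry !== undefined && expiry > t) return false // still claimed
    this.expiries.set(key, t + ttlMs)
    return true
  }

  async release(key: string): Promise<void> {
    this.expiries.delete(key)
  }
}

// Claim first, then process, so concurrent duplicates cannot both run the handler.
// On failure, release the claim so the sender's retry can process the event.
async function processOnce(
  store: DedupStore,
  key: string,
  ttlMs: number,
  work: () => Promise<void>
): Promise<'processed' | 'duplicate'> {
  if (!(await store.claim(key, ttlMs))) return 'duplicate'
  try {
    await work()
    return 'processed'
  } catch (err) {
    await store.release(key)
    throw err
  }
}
```

A 'duplicate' result maps to the "Already processed" 200 response; the TTL passed to claim should cover the sender's full retry window.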
Signature Verification for Security
Sign every webhook payload so the consumer can verify that it came from you and was not altered in transit. HMAC-SHA256 over the request body, with a secret shared between you and each consumer, is the most widely used scheme:
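import { createHmac, timingSafeEqual } from 'node:crypto'

// Hex-encoded HMAC-SHA256 of the exact payload string that is sent
function signPayload(payload: string, secret: string): string {
  return createHmac('sha256', secret)
    .update(payload)
    .digest('hex')
}

function verifySignature(
  payload: string,
  signature: string,
  secret: string
): boolean {
  const expected = Buffer.from(signPayload(payload, secret))
  const received = Buffer.from(signature)
  // timingSafeEqual throws when lengths differ, so reject those up front
  if (received.length !== expected.length) return false
  // Constant-time comparison: runtime does not depend on where the bytes differ
  return timingSafeEqual(received, expected)
}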
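Always compare signatures with timingSafeEqual. An ordinary string comparison returns as soon as it hits a mismatched character, and timing those responses can let an attacker work out a valid signature byte by byte. Document your signing scheme (algorithm, header name, and exactly which bytes are signed) in your API documentation, and give integrators a way to send test events so they can check their verification code.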
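The consumer must verify the signature against the raw body it received, before parsing it. A minimal sketch, repeating signPayload and verifySignature so it is self-contained; readVerifiedEvent and VerifyResult are hypothetical names, and the X-Signature header matches the sender code above:

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto'

function signPayload(payload: string, secret: string): string {
  return createHmac('sha256', secret).update(payload).digest('hex')
}

function verifySignature(payload: string, signature: string, secret: string): boolean {
  const expected = Buffer.from(signPayload(payload, secret))
  const received = Buffer.from(signature)
  // Length check first: timingSafeEqual throws on unequal lengths
  return received.length === expected.length && timingSafeEqual(received, expected)
}

type VerifyResult = { ok: true; event: unknown } | { ok: false }

// Read the body as text, verify those exact bytes, and only then parse.
// Parsing first and re-serializing can change whitespace or key order,
// which would make a genuine request fail verification.
async function readVerifiedEvent(request: Request, secret: string): Promise<VerifyResult> {
  const rawBody = await request.text()
  const signature = request.headers.get('X-Signature') ?? ''
  if (!verifySignature(rawBody, signature, secret)) return { ok: false }
  return { ok: true, event: JSON.parse(rawBody) }
}
```

A handler would return 401 when ok is false and continue with the idempotency check otherwise.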
Retry Strategy with Exponential Backoff
A single failed attempt should not discard the event. Retry with exponential backoff, up to a maximum number of attempts:
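// sleep is a small local helper; enqueueDeadLetter is supplied by your queue layer
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))
declare function enqueueDeadLetter(event: WebhookEvent): Promise<void>

async function deliverWithRetry(
  url: string,
  event: WebhookEvent,
  secret: string
): Promise<void> {
  const maxAttempts = 5
  const baseDelay = 1000 // 1 second
  // The body is identical on every attempt, so sign it once
  const signature = signPayload(JSON.stringify(event), secret)

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    // Treat a thrown network error the same as a non-2xx response
    const success = await deliverWebhook(url, event, signature).catch(() => false)
    if (success) return
    // No point waiting after the final attempt
    if (attempt < maxAttempts - 1) {
      // 1s, 2s, 4s, 8s, each scaled by a random factor in [0.5, 1)
      const delay = baseDelay * 2 ** attempt * (0.5 + Math.random() * 0.5)
      await sleep(delay)
    }
  }

  // Move to dead letter queue for manual inspection
  await enqueueDeadLetter(event)
}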
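The jitter (the random multiplier) spreads retries out over time. Without it, every event that failed during an outage would retry on the same schedule, and the whole backlog would hit the downstream service at once just as it recovers: the thundering herd problem.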
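With five attempts the longest wait is under 8 seconds, but the delay doubles on every attempt, so raising the limit to 15 attempts would push the final wait past an hour. A cap keeps late retries bounded. A minimal sketch of the same jitter scheme with a cap added; backoffDelay and capMs are hypothetical names, and random is injectable so the schedule can be tested deterministically:

```typescript
// Delay before the retry that follows attempt number `attempt` (0-based):
// exponential growth, capped at capMs, then scaled by a random factor in [0.5, 1)
function backoffDelay(
  attempt: number,
  baseMs = 1000,
  capMs = 60_000,
  random: () => number = Math.random
): number {
  const exponential = Math.min(capMs, baseMs * 2 ** attempt)
  return exponential * (0.5 + random() * 0.5)
}
```

Applying the cap before the jitter keeps some randomness even once retries reach the ceiling, so a large backlog still spreads out.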
Monitoring and Dead Letter Queues
Not all webhooks will succeed. A proper monitoring setup tracks delivery rates, latency percentiles, and failure reasons:
- Set up alerts when delivery success rate drops below 99%
- Log every delivery attempt with status code, latency, and response body
- Implement a dead letter queue for events that fail after all retries
- Build a dashboard that shows active webhook health per integration
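The success-rate alert and latency percentiles above can be computed from the per-attempt logs. A minimal sketch, assuming a hypothetical DeliveryAttempt log shape, treating any 2xx as success (matching response.ok), and using the nearest-rank method for p95:

```typescript
// Hypothetical shape of one logged delivery attempt
interface DeliveryAttempt {
  integrationId: string
  statusCode: number | null // null when no response arrived (network error, timeout)
  latencyMs: number
}

interface DeliveryStats {
  successRate: number
  p95LatencyMs: number
  alert: boolean
}

// Nearest-rank percentile over an ascending-sorted array
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) return 0
  const rank = Math.ceil((p * sorted.length) / 100)
  return sorted[Math.max(0, rank - 1)] ?? 0
}

// Summarize a window of attempts; flag it when the success rate drops below the threshold
function summarize(attempts: DeliveryAttempt[], alertBelow = 0.99): DeliveryStats {
  const ok = attempts.filter(
    (a) => a.statusCode !== null && a.statusCode >= 200 && a.statusCode < 300
  )
  const successRate = attempts.length === 0 ? 1 : ok.length / attempts.length
  const latencies = attempts.map((a) => a.latencyMs).sort((x, y) => x - y)
  return {
    successRate,
    p95LatencyMs: percentile(latencies, 95),
    alert: successRate < alertBelow,
  }
}
```

For per-integration health on the dashboard, group attempts by integrationId and call summarize on each group.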
// Dead letter queue handler
// db is a Prisma-style client and notifyAdmin an alerting helper, both supplied by your app
async function processDeadLetter(): Promise<void> {
  const failed = await db.deadLetterWebhook.findMany({
    where: { retryCount: { gte: 5 } }, // 5 matches maxAttempts in deliverWithRetry
  })
  for (const item of failed) {
    console.error(`Webhook ${item.id} permanently failed after ${item.retryCount} attempts`)
    await notifyAdmin(item)
  }
}
Once the downstream service has recovered, replay the dead-lettered events; because consumers deduplicate on the idempotency key, replaying is safe. Automate this as a scheduled job rather than relying on manual intervention.
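A scheduled replay job can combine an endpoint health check with redelivery. A minimal sketch, where DeadLetterItem, ReplayDeps, and replayDeadLetters are hypothetical names and the health check, redelivery, and storage calls are injected rather than tied to a specific database or HTTP client:

```typescript
// Hypothetical shape of a stored dead letter; adapt to your own schema
interface DeadLetterItem {
  id: string
  url: string
}

// Injected dependencies keep the job independent of storage and transport
interface ReplayDeps {
  isHealthy(url: string): Promise<boolean>
  redeliver(item: DeadLetterItem): Promise<boolean>
  remove(id: string): Promise<void>
}

// Replays items whose endpoint looks healthy; returns how many were delivered
async function replayDeadLetters(
  items: DeadLetterItem[],
  deps: ReplayDeps
): Promise<number> {
  const health = new Map<string, boolean>() // one health check per endpoint per run
  let replayed = 0
  for (const item of items) {
    let healthy = health.get(item.url)
    if (healthy === undefined) {
      healthy = await deps.isHealthy(item.url)
      health.set(item.url, healthy)
    }
    if (!healthy) continue // leave it for the next scheduled run
    if (await deps.redeliver(item)) {
      await deps.remove(item.id)
      replayed++
    }
  }
  return replayed
}
```

Run it from a cron job or a repeating queue job; items whose endpoint is still down, or whose redelivery fails, stay in the dead letter queue for the next run.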
Graceful Shutdown and Queue Persistence
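Webhooks in flight during a server restart must not be lost. Use a persistent message queue, such as BullMQ (the successor to Bull) backed by Redis, instead of an in-memory queue. On shutdown, stop accepting new jobs and let in-flight deliveries finish; on startup, check for any pending deliveries and resume them.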
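Graceful shutdown means refusing new work once a termination signal arrives and waiting for deliveries already in progress. A queue library usually handles this for you (in BullMQ, await worker.close() waits for active jobs to finish); the hypothetical DeliveryTracker class below is a minimal sketch of the same idea for an in-process sender:

```typescript
// Tracks in-flight deliveries so a shutdown can wait for them to finish
class DeliveryTracker {
  private readonly inFlight = new Set<Promise<unknown>>()
  private closing = false

  run<T>(task: () => Promise<T>): Promise<T> {
    if (this.closing) {
      // Refuse new work; the job should remain in the persistent queue
      return Promise.reject(new Error('Shutting down'))
    }
    const pending = task()
    this.inFlight.add(pending)
    // Drop the promise from the set once it settles, whether it succeeded or failed
    const forget = () => { this.inFlight.delete(pending) }
    pending.then(forget, forget)
    return pending
  }

  // Stop accepting work, then wait for everything already running
  async shutdown(): Promise<void> {
    this.closing = true
    await Promise.allSettled([...this.inFlight])
  }
}
```

Wire it to the process signal, for example process.once('SIGTERM', () => tracker.shutdown().then(() => process.exit(0))). Jobs refused during shutdown are not acknowledged, so they should still be in the persistent queue when the process restarts.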
Building a reliable webhook system requires infrastructure thinking — persistence, retries, idempotency, and monitoring. At SoniNow, we design event-driven architectures that handle millions of webhook deliveries with sub-second latency and zero data loss.
Building an integration platform? Talk to SoniNow about your webhook architecture and let us help you build a reliable event delivery system.