Disaster Recovery Planning for Web Applications: Backup and Restore Strategies

Every team has a disaster recovery plan. Most teams discover theirs is inadequate at 3 AM when the database is corrupt and the last backup is two weeks old. A proper disaster recovery plan isn't a document that collects dust—it's a tested, automated system that restores critical services within your recovery time objective (RTO) with data loss within your recovery point objective (RPO).

Defining RTO and RPO

Two numbers govern every disaster recovery plan:

Recovery Point Objective (RPO): Maximum acceptable data loss measured in time. If RPO is one hour, you can lose at most one hour of data.
Recovery Time Objective (RTO): Maximum acceptable downtime. If RTO is four hours, the system must be fully operational within four hours of disaster declaration.

These numbers drive every architectural decision. A 15-minute RPO requires continuous replication, not hourly backups. A 30-minute RTO demands automated failover infrastructure, not manual restore procedures.
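
To make the arithmetic concrete, here is a minimal sketch that checks a periodic backup schedule against an RPO. The function name meets_rpo is hypothetical, and the model is an assumption: worst-case data loss is taken as the backup interval plus the time a backup needs to finish, since a failure just before a backup completes forces a fallback to the previous one.

```shell
#!/bin/bash
# Hypothetical helper (not part of any tool): does a periodic backup
# schedule satisfy an RPO? All arguments are in seconds.
# Assumption: worst-case loss = backup interval + backup duration.
meets_rpo() {
  local interval_s=$1 duration_s=$2 rpo_s=$3
  local worst_case_s=$(( interval_s + duration_s ))
  echo "worst-case data loss: ${worst_case_s}s (RPO: ${rpo_s}s)"
  # Exit status is 0 only when the schedule fits inside the RPO
  [ "${worst_case_s}" -le "${rpo_s}" ]
}
```

Under this model, meets_rpo 3600 600 900 (hourly backups that take ten minutes, against a 15-minute RPO) returns a non-zero status, which is why a short RPO pushes the design toward continuous replication.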

Database Backup Strategies

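PostgreSQL offers three backup approaches with different RPO implications: logical dumps, physical base backups, and continuous WAL archiving layered on top of base backups.

pg_dump logical backups are portable and can be restored to newer PostgreSQL versions or different architectures. They are slow for large databases and capture only a single point-in-time snapshot, so the RPO equals the interval between dumps: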

# Daily full backup
pg_dump -h localhost -U app_user \
  --format=custom \
  --file=/backups/daily/app_db_$(date +%Y%m%d).dump \
  --compress=9 \
  app_db

# Delete backups older than 30 days
find /backups/daily -name "*.dump" -mtime +30 -delete
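
A scheduled dump only helps if it actually ran. The following is a minimal sketch of a freshness check that could run alongside the backup job; the function name newest_backup_is_fresh is hypothetical, and it assumes GNU find (for -mmin and -quit) and the *.dump naming used above.

```shell
#!/bin/bash
# Hypothetical check: succeed only if the directory holds at least one
# *.dump file modified less than max_age_min minutes ago.
newest_backup_is_fresh() {
  local dir=$1 max_age_min=$2
  local fresh
  # -mmin -N matches files modified less than N minutes ago;
  # -print -quit stops at the first match (GNU find).
  fresh=$(find "${dir}" -name "*.dump" -type f -mmin "-${max_age_min}" -print -quit)
  if [ -n "${fresh}" ]; then
    echo "OK: recent backup found: ${fresh}"
  else
    echo "STALE: no backup newer than ${max_age_min} minutes in ${dir}" >&2
    return 1
  fi
}
```

For a daily dump, a call such as newest_backup_is_fresh /backups/daily 1500 (25 hours) leaves some slack for a slow run while still catching a missed night.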

Physical backups with pg_basebackup copy the entire cluster directory. They're faster than logical backups and support point-in-time recovery (PITR) when combined with WAL archiving:

# Base backup (daily); -z (gzip) is only valid with tar format
pg_basebackup -h localhost -U replicator \
  -D /backups/base/$(date +%Y%m%d) \
  -F tar -X stream -z -P

# WAL archiving (continuous)
# In postgresql.conf:
archive_mode = on
archive_command = 'aws s3 cp %p s3://myapp-wal-archive/%f'

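With continuous WAL archiving, RPO drops to the time it takes to fill and ship one WAL segment, which you can cap with archive_timeout (for example, 60 seconds). You can restore to any point in time between the end of a base backup and the last archived WAL segment.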
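
PostgreSQL treats a zero exit status from archive_command as proof that the segment is safely stored, and its documentation recommends that the command refuse to overwrite an existing file. A minimal sketch of a wrapper with those two properties follows. The function name archive_wal is hypothetical, and it copies to a local or mounted directory as a stand-in for the S3 upload shown above.

```shell
#!/bin/bash
# Hypothetical archive wrapper.
# Usage: archive_wal <path-to-segment> <segment-name> <archive-dir>
# Returns non-zero on any failure so PostgreSQL keeps the segment and retries.
archive_wal() {
  local src=$1 name=$2 archive_dir=$3
  # Never overwrite a segment that is already archived.
  if [ -e "${archive_dir}/${name}" ]; then
    echo "refusing to overwrite ${archive_dir}/${name}" >&2
    return 1
  fi
  # Copy under a temporary name, then rename, so a partial copy
  # is never visible under the final segment name.
  cp "${src}" "${archive_dir}/${name}.tmp" \
    && mv "${archive_dir}/${name}.tmp" "${archive_dir}/${name}"
}
```

Saved as a script that ends with archive_wal "$@", it could be referenced from postgresql.conf with %p and %f as the first two arguments; the script path and archive directory would be your own choices.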

Off-Site Backup Storage

Never store backups on the same machine as the database. Use a different region, a different provider, or at minimum a different availability zone. Encrypt backups before uploading:

# Encrypt and upload to S3
gpg --encrypt --recipient backup-key \
  /backups/daily/app_db_20260622.dump

aws s3 cp /backups/daily/app_db_20260622.dump.gpg \
  s3://myapp-backups/database/ \
  --storage-class STANDARD_IA

# Copy to the DR bucket in a second region
# (S3 Cross-Region Replication can automate this step)
aws s3 cp s3://myapp-backups/database/app_db_20260622.dump.gpg \
  s3://myapp-backups-dr-eu-west-1/database/

Test that you can decrypt and restore from your off-site storage. Encryption at rest is useless if the decryption key is stored next to the backups.
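
One inexpensive part of that test is confirming that the file you download is byte-for-byte the file you uploaded. A minimal sketch, assuming GNU coreutils sha256sum is available; the function names write_checksum and verify_checksum are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers: record a SHA-256 checksum before upload and
# verify it after download.

# Writes <file>.sha256 next to the file, in the format
# that `sha256sum --check` expects.
write_checksum() {
  local file=$1
  ( cd "$(dirname "${file}")" \
      && sha256sum "$(basename "${file}")" > "$(basename "${file}").sha256" )
}

# Succeeds only if the file still matches its recorded checksum.
verify_checksum() {
  local file=$1
  ( cd "$(dirname "${file}")" \
      && sha256sum --check --status "$(basename "${file}").sha256" )
}
```

Upload the .sha256 file together with the encrypted dump, then run verify_checksum on the downloaded copy before attempting to decrypt it. A checksum only detects corruption; it does not replace a full decrypt-and-restore test.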

Multi-Region Replication

For critical applications, database replication across regions provides the fastest recovery:

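# AWS Aurora global database
# Primary-region stack (for example us-east-1)
Resources:
  GlobalCluster:
    Type: AWS::RDS::GlobalCluster
    Properties:
      GlobalClusterIdentifier: myapp-global   # illustrative name
      SourceDBClusterIdentifier: !Ref PrimaryCluster

  PrimaryCluster:
    Type: AWS::RDS::DBCluster
    Properties:
      Engine: aurora-postgresql
      MasterUsername: dbadmin   # "admin" is reserved on PostgreSQL engines
      MasterUserPassword: !Ref MasterPassword
      StorageEncrypted: true

# Secondary-region stack (separate template, deployed in the DR region)
# An encrypted secondary typically also needs a KmsKeyId from its own region
Resources:
  SecondaryCluster:
    Type: AWS::RDS::DBCluster
    Properties:
      Engine: aurora-postgresql
      GlobalClusterIdentifier: myapp-global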

Aurora Global Database typically replicates across regions with less than one second of lag. In a disaster, promote the secondary cluster to a standalone primary. DNS failover (Route 53 health checks) then redirects traffic to the new region.
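
Health-check-driven failover usually waits for several consecutive failed probes so that a single dropped request does not move traffic. The sketch below illustrates that rule in plain shell. The function name is_unhealthy is hypothetical, the probe is whatever command you pass in, and it is an illustration of the threshold idea rather than a description of how Route 53 is implemented.

```shell
#!/bin/bash
# Hypothetical probe loop.
# Usage: is_unhealthy <threshold> <interval_seconds> <probe command...>
# Returns 0 ("unhealthy") only after <threshold> consecutive probe failures.
is_unhealthy() {
  local threshold=$1 interval_s=$2
  shift 2
  local failures=0
  while [ "${failures}" -lt "${threshold}" ]; do
    if "$@" >/dev/null 2>&1; then
      return 1   # any successful probe means the endpoint is still healthy
    fi
    failures=$(( failures + 1 ))
    # Wait between probes, but not after the final one
    if [ "${failures}" -lt "${threshold}" ]; then
      sleep "${interval_s}"
    fi
  done
  return 0       # threshold reached: reasonable to start failover
}
```

A call such as is_unhealthy 3 10 curl -fsS --max-time 5 https://primary.example.com/health would report unhealthy after three failed probes ten seconds apart; the URL is a placeholder.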

Restore Testing and Automation

Manual restore procedures fail under pressure. Automate the full restore workflow and run it weekly:

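#!/bin/bash
# restore_from_backup.sh: download, decrypt, restore, and sanity-check one backup
set -euo pipefail

BACKUP_DATE="${1:?usage: restore_from_backup.sh BACKUP_DATE REGION}"
REGION="${2:?usage: restore_from_backup.sh BACKUP_DATE REGION}"
WORKDIR="$(mktemp -d)"
VERIFY_DB=temp_restore_verify

# Always remove the scratch database and decrypted files, even on failure
cleanup() {
  dropdb --if-exists "${VERIFY_DB}" || true
  rm -rf "${WORKDIR}"
}
trap cleanup EXIT

echo "Downloading backup from ${BACKUP_DATE}"
aws s3 cp "s3://myapp-backups-${REGION}/database/app_db_${BACKUP_DATE}.dump.gpg" \
  "${WORKDIR}/backup.dump.gpg"

echo "Decrypting backup"
gpg --output "${WORKDIR}/backup.dump" --decrypt "${WORKDIR}/backup.dump.gpg"

echo "Creating temporary database"
createdb "${VERIFY_DB}"

echo "Restoring backup"
pg_restore --dbname="${VERIFY_DB}" --jobs=4 --exit-on-error \
  "${WORKDIR}/backup.dump"

echo "Verifying data integrity"
# Fail the run if a core table restored empty
for table in users orders; do
  rows=$(psql -d "${VERIFY_DB}" -At -c "SELECT count(*) FROM ${table}")
  echo "${table}: ${rows} rows"
  [ "${rows}" -gt 0 ] || { echo "ERROR: ${table} is empty" >&2; exit 1; }
done

echo "✅ Backup ${BACKUP_DATE} verified successfully"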

Automate this script to run in a CI/CD pipeline weekly. If the restore fails, alert the team immediately. A backup that can't be restored is worthless.
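
A restore drill is also the only honest measurement of your RTO. The sketch below wraps any restore command, times it, and reports failures or overruns. The function name run_restore_drill is hypothetical, and the ALERT lines written to standard error are a stand-in for whatever paging or chat integration your team uses.

```shell
#!/bin/bash
# Hypothetical drill wrapper: time a restore command and compare it with the RTO.
# Usage: run_restore_drill <rto_seconds> <command> [args...]
# Returns 1 if the restore fails, 2 if it succeeds but exceeds the RTO.
run_restore_drill() {
  local rto_s=$1
  shift
  local start=${SECONDS}
  if ! "$@"; then
    echo "ALERT: restore drill failed: $*" >&2
    return 1
  fi
  local elapsed=$(( SECONDS - start ))
  if [ "${elapsed}" -gt "${rto_s}" ]; then
    echo "ALERT: restore took ${elapsed}s, RTO is ${rto_s}s" >&2
    return 2
  fi
  echo "restore drill passed in ${elapsed}s (RTO: ${rto_s}s)"
}
```

With a four-hour RTO, the scheduled job could call run_restore_drill 14400 ./restore_from_backup.sh 20260622 dr-eu-west-1 and fail the pipeline on any non-zero status.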

Runbooks and Communication

Document the exact steps for declaring a disaster, notifying stakeholders, failing over, and communicating status. Keep runbooks in a system accessible without the infrastructure that's down—GitHub, a wiki, or a runbook tool like PagerDuty Runbooks.

Include in every runbook: who to call, what to say, which services are affected, the expected restoration time, and the criteria for concluding the incident.

Prepare Your Recovery Plan with SoniNow

Disaster recovery planning is insurance you hope never to use but must trust completely. SoniNow designs and tests disaster recovery architectures so your team knows exactly what to do when things go wrong.
