Disaster Recovery Planning for Web Applications: Backup and Restore Strategies

Every team has a disaster recovery plan. Most teams discover theirs is inadequate at 3 AM when the database is corrupt and the last backup is two weeks old. A proper disaster recovery plan isn't a document that collects dust. It's a tested, automated system that restores critical services within your recovery time objective (RTO) while keeping data loss within your recovery point objective (RPO).
Defining RTO and RPO
Two numbers govern every disaster recovery plan:
- Recovery Point Objective (RPO): Maximum acceptable data loss measured in time. If RPO is one hour, you can lose at most one hour of data.
- Recovery Time Objective (RTO): Maximum acceptable downtime. If RTO is four hours, the system must be fully operational within four hours of disaster declaration.
These numbers drive every architectural decision. A 15-minute RPO requires continuous replication, not hourly backups. A 30-minute RTO demands automated failover infrastructure, not manual restore procedures.
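One way to make an RPO enforceable is to check backup freshness automatically. The sketch below is illustrative only: the function names are invented for this example, and it assumes bash with GNU find on Linux. It reports whether the newest file in a backup directory is younger than the RPO.

```shell
# Hypothetical helpers (not from any tool): compare the age of the newest
# backup file with an RPO given in seconds. Assumes GNU find.
newest_backup_age_seconds() {
  local dir=$1 newest
  # Modification time (epoch seconds) of the most recent regular file
  newest=$(find "$dir" -maxdepth 1 -type f -printf '%T@\n' 2>/dev/null | sort -n | tail -n 1)
  [ -n "$newest" ] || return 1          # no backups at all
  echo $(( $(date +%s) - ${newest%.*} ))
}

rpo_is_met() {
  local dir=$1 rpo_seconds=$2 age
  age=$(newest_backup_age_seconds "$dir") || return 1
  [ "$age" -le "$rpo_seconds" ]
}
```

For a one-hour RPO, `rpo_is_met /backups/daily 3600` returns a non-zero status once the newest backup is more than an hour old, which a monitoring job can turn into an alert.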
Database Backup Strategies
PostgreSQL offers three backup approaches with different RPO implications:
pg_dump logical backups are portable and can be restored to different PostgreSQL versions or architectures. They're slow for large databases and only capture a point-in-time snapshot:
# Daily full backup
pg_dump -h localhost -U app_user \
--format=custom \
--file=/backups/daily/app_db_$(date +%Y%m%d).dump \
--compress=9 \
app_db
# Delete backups older than 30 days
find /backups/daily -name "*.dump" -mtime +30 -delete
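A dump that is interrupted halfway can look like a valid backup file. A possible safeguard, sketched here under the assumption that the backup command writes to standard output (pg_dump does so when --file is omitted), is to write to a temporary name and rename only on success. The helper name atomic_backup is hypothetical.

```shell
# Hypothetical helper: run a command that writes a backup to stdout and
# publish the file under its final name only if the command succeeded.
atomic_backup() {
  local dest=$1; shift
  local tmp="${dest}.partial"
  if "$@" > "$tmp"; then
    mv "$tmp" "$dest"      # rename is atomic within one filesystem
  else
    rm -f "$tmp"           # never leave a truncated dump behind
    return 1
  fi
}

# Example invocation (not run here):
# atomic_backup "/backups/daily/app_db_$(date +%Y%m%d).dump" \
#   pg_dump -h localhost -U app_user --format=custom app_db
```

Because partial files end in .partial, a cleanup rule that matches "*.dump" never mistakes them for finished backups.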
Physical backups with pg_basebackup copy the entire cluster directory. They're faster than logical backups and support point-in-time recovery (PITR) when combined with WAL archiving:
# Base backup (daily); tar format (-Ft) is required for -z compression
pg_basebackup -h localhost -U replicator \
-D /backups/base/$(date +%Y%m%d) \
-Ft -X stream -z -P
# WAL archiving (continuous)
# In postgresql.conf:
archive_mode = on
archive_command = 'aws s3 cp %p s3://myapp-wal-archive/%f'
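With continuous WAL archiving, you can restore to any point in time between the end of the last base backup and the last archived WAL segment. Because archive_command runs only when a WAL segment is complete, RPO is bounded by how often segments are switched. Setting archive_timeout (for example, to 60 seconds) caps that interval on a quiet database, which brings RPO down to about a minute instead of a day.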
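Restoring to a chosen moment could look like the following sketch, which assumes PostgreSQL 12 or later and a base backup already unpacked into a fresh data directory. The function name and its arguments are illustrative; the bucket name is the one used in the archive_command above.

```shell
# Hypothetical helper: prepare an unpacked base backup for point-in-time
# recovery (PostgreSQL 12+ reads these settings and the recovery.signal file).
configure_pitr() {
  local data_dir=$1 target_time=$2
  cat >> "$data_dir/postgresql.auto.conf" <<EOF
restore_command = 'aws s3 cp s3://myapp-wal-archive/%f %p'
recovery_target_time = '$target_time'
recovery_target_action = 'promote'
EOF
  # An empty recovery.signal file makes the server start in targeted recovery
  touch "$data_dir/recovery.signal"
}
```

After `configure_pitr /var/lib/postgresql/data '2026-06-22 14:30:00+00'` (both values are examples), starting the server replays archived WAL up to the target time and then promotes the instance.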
Off-Site Backup Storage
Never store backups on the same machine as the database. Use a different region, a different provider, or at minimum a different availability zone. Encrypt backups before uploading:
# Encrypt and upload to S3
gpg --encrypt --recipient backup-key \
/backups/daily/app_db_20260622.dump
aws s3 cp /backups/daily/app_db_20260622.dump.gpg \
s3://myapp-backups/database/ \
--storage-class STANDARD_IA
# Copy to a bucket in the DR region (S3 Cross-Region Replication can automate this)
aws s3 cp s3://myapp-backups/database/app_db_20260622.dump.gpg \
s3://myapp-backups-dr-eu-west-1/database/
Test that you can decrypt and restore from your off-site storage. Encryption at rest is useless if the decryption key is stored next to the backups, and the backups are unrecoverable if the only copy of the key is lost along with the primary site.
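Part of that test can be a plain integrity check on the downloaded file. The sketch below is only one way to do it: it assumes GNU coreutils (sha256sum), and the function names are made up for this example. It records a checksum beside the encrypted backup before upload and verifies it after download.

```shell
# Hypothetical helpers: store a SHA-256 checksum next to a backup file and
# later confirm that a downloaded copy still matches it.
write_checksum() {
  local file=$1
  ( cd "$(dirname "$file")" &&
    sha256sum "$(basename "$file")" > "$(basename "$file").sha256" )
}

verify_checksum() {
  local file=$1
  # --check re-hashes the file named in the .sha256 file and compares
  ( cd "$(dirname "$file")" &&
    sha256sum --check --quiet "$(basename "$file").sha256" )
}
```

Uploading the .sha256 file together with the .gpg file lets a restore job fail early on a corrupted or truncated transfer, before any time is spent on decryption.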
Multi-Region Replication
For critical applications, database replication across regions provides the fastest recovery:
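# AWS Aurora global database
# Template 1: deploy in the primary region (us-east-1)
Resources:
  GlobalCluster:
    Type: AWS::RDS::GlobalCluster
    Properties:
      GlobalClusterIdentifier: myapp-global   # placeholder name
      Engine: aurora-postgresql
      StorageEncrypted: true
  PrimaryCluster:
    Type: AWS::RDS::DBCluster
    Properties:
      Engine: aurora-postgresql
      GlobalClusterIdentifier: !Ref GlobalCluster
      # "admin" is rejected as a reserved word for PostgreSQL engines
      MasterUsername: dbadmin
      MasterUserPassword: !Ref MasterPassword
      StorageEncrypted: true

# Template 2: deploy as a separate stack in the DR region
Resources:
  SecondaryCluster:
    Type: AWS::RDS::DBCluster
    Properties:
      Engine: aurora-postgresql
      GlobalClusterIdentifier: myapp-global
      # Encrypted secondaries need a KMS key from the DR region;
      # DrKmsKeyArn is a placeholder parameter
      KmsKeyId: !Ref DrKmsKeyArn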
Aurora Global Database typically replicates across regions with less than one second of lag. In a disaster, promote the secondary cluster to a standalone primary that accepts writes. DNS failover (Route 53 health checks) then redirects traffic to the new region.
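The promotion step can be scripted so that nobody has to assemble the command during an outage. This is a minimal sketch, assuming the AWS CLI's remove-from-global-cluster operation is the chosen promotion path; the function name is hypothetical, all identifiers are placeholders, and it defaults to printing the command rather than running it.

```shell
# Hypothetical helper: detach a secondary Aurora cluster from its global
# database so it becomes a standalone, writable cluster.
# Prints the command unless DRY_RUN=0 is set.
promote_secondary() {
  local global_id=$1 secondary_arn=$2 region=$3
  set -- aws rds remove-from-global-cluster \
    --global-cluster-identifier "$global_id" \
    --db-cluster-identifier "$secondary_arn" \
    --region "$region"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"      # dry run: show what would be executed
  else
    "$@"
  fi
}
```

Keeping the dry run as the default makes the helper safe to exercise in drills; the real failover requires an explicit DRY_RUN=0.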
Restore Testing and Automation
Manual restore procedures fail under pressure. Automate the full restore workflow and run it on a schedule:
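#!/bin/bash
# restore_from_backup.sh: download, decrypt, restore, and sanity-check one backup
# Usage: restore_from_backup.sh <YYYYMMDD> <dr-region>   e.g. 20260622 eu-west-1
set -euo pipefail
BACKUP_DATE=$1
REGION=$2
WORK_DIR=$(mktemp -d)
VERIFY_DB=temp_restore_verify
# Always drop the scratch database and working files, even when a step fails
cleanup() {
  dropdb --if-exists "$VERIFY_DB"
  rm -rf "$WORK_DIR"
}
trap cleanup EXIT
echo "Downloading backup from ${BACKUP_DATE}"
aws s3 cp "s3://myapp-backups-dr-${REGION}/database/app_db_${BACKUP_DATE}.dump.gpg" \
  "$WORK_DIR/backup.dump.gpg"
echo "Decrypting backup"
gpg --decrypt --output "$WORK_DIR/backup.dump" "$WORK_DIR/backup.dump.gpg"
echo "Creating temporary database"
createdb "$VERIFY_DB"
echo "Restoring backup"
pg_restore --dbname="$VERIFY_DB" --jobs=4 "$WORK_DIR/backup.dump"
echo "Verifying data integrity"
# Fail the run if a critical table restored empty
for table in users orders; do
  rows=$(psql -d "$VERIFY_DB" -At -c "SELECT count(*) FROM ${table}")
  echo "${table}: ${rows} rows"
  [ "$rows" -gt 0 ] || { echo "ERROR: ${table} is empty" >&2; exit 1; }
done
echo "✅ Backup ${BACKUP_DATE} verified successfully"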
Automate this script to run in a CI/CD pipeline weekly. If the restore fails, alert the team immediately. A backup that can't be restored is worthless.
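A scheduled job can also measure how long the restore takes, since that duration is a lower bound on your real RTO. The wrapper below is a sketch, not a prescribed implementation: run_drill and the alert command passed to it are hypothetical names, and the alert command is assumed to accept a single message argument.

```shell
# Hypothetical wrapper: run a restore drill, alert on failure, and compare
# its duration with the RTO (in seconds).
# Usage: run_drill <rto_seconds> <alert_command> <restore command...>
run_drill() {
  local rto_seconds=$1 alert_cmd=$2; shift 2
  local start elapsed
  start=$(date +%s)
  if ! "$@"; then
    "$alert_cmd" "Restore drill failed: $*"
    return 1
  fi
  elapsed=$(( $(date +%s) - start ))
  if [ "$elapsed" -gt "$rto_seconds" ]; then
    "$alert_cmd" "Restore took ${elapsed}s, over the ${rto_seconds}s RTO"
    return 2
  fi
  echo "Restore drill passed in ${elapsed}s"
}
```

With a four-hour RTO, a pipeline step might call `run_drill 14400 send_alert ./restore_from_backup.sh "$BACKUP_DATE" "$REGION"`, where send_alert stands in for whatever paging or chat integration the team uses.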
Runbooks and Communication
Document the exact steps for declaring a disaster, notifying stakeholders, failing over, and communicating status. Keep runbooks in a system accessible without the infrastructure that's down—GitHub, a wiki, or a runbook tool like PagerDuty Runbooks.
Include in every runbook: who to call, what to say, which services are affected, the expected restoration time, and the criteria for concluding the incident.
Prepare Your Recovery Plan with SoniNow
Disaster recovery planning is insurance you hope never to use but must trust completely. SoniNow designs and tests disaster recovery architectures so your team knows exactly what to do when things go wrong.