Runbook: Backup PostgreSQL
Overview
Section titled “Overview”Propósito: Backup completo de base de datos PostgreSQL multi-tenant (todos los schemas).
Frecuencia: Diario (automatizado via cron)
Tiempo estimado: 10-30 minutos (depending on DB size)
Criticidad: 🔴 CRÍTICA - Required before deploys
Prerequisites
Section titled “Prerequisites”- Acceso SSH a servidor de base de datos
- Usuario PostgreSQL con permis de
pg_dump - Espacio en disco suficiente (~2-3x tamaño DB actual)
- Backup storage configurado (local + remote)
Backup Strategies
Section titled “Backup Strategies”Full Database Backup (Recommended)
Section titled “Full Database Backup (Recommended)”Método: pg_dumpall - Dump completo de toda la instancia PostgreSQL.
Ventajas:
- ✅ Incluye todos los schemas (tenants + central + parametros)
- ✅ Incluye roles y permisos
- ✅ Single file, fácil de restore
Desventajas:
- ❌ Archivo grande (~1-5GB depending on data)
- ❌ Restore es todo-o-nada (no se puede restore un solo tenant)
Per-Schema Backup
Section titled “Per-Schema Backup”Método: pg_dump --schema=tenant_X - Backup de schema individual.
Ventajas:
- ✅ Archivos pequeños (solo datos de un tenant)
- ✅ Restore granular (solo un tenant si se necesita)
Desventajas:
- ❌ Requiere script para iterar sobre todos los schemas
- ❌ No incluye schema
centralniparametrosautomáticamente
Procedure - Full Backup
Section titled “Procedure - Full Backup”-
Conectar a servidor de base de datos
Terminal window cd /opt/backups -
Create backup directory (si no existe)
Terminal window mkdir -p /opt/backups/$(date +%Y-%m)Organización por mes:
/opt/backups/2026-01/ -
Execute pg_dumpall
Terminal window BACKUP_FILE="/opt/backups/$(date +%Y-%m)/backup_$(date +%Y-%m-%d_%H-%M).sql"sudo -u postgres pg_dumpall > "$BACKUP_FILE"Expected output: Archivo ~1-5GB creado en
/opt/backups/2026-01/backup_2026-01-18_02-00.sql -
Compress backup
Terminal window gzip "$BACKUP_FILE"Compression ratio: Típicamente 5-10x (1GB → 100-200MB).
-
Verify backup integrity
Terminal window gunzip -t "$BACKUP_FILE.gz"Expected: (no output = OK). Si corrupto:
gzip: ... unexpected end of file -
Upload to remote storage
Terminal window aws s3 cp "$BACKUP_FILE.gz" s3://nostromo-backups/postgres/ -
Cleanup old backups (retention policy)
Terminal window find /opt/backups -name "backup_*.sql.gz" -mtime +7 -deleteRetention policy: Local 7 días, Remote 30 días, Archive 1 año.
Procedure - Per-Schema Backup
Section titled “Procedure - Per-Schema Backup”Procedure - Per-Schema Backup
Section titled “Procedure - Per-Schema Backup”1. List all tenant schemas
Section titled “1. List all tenant schemas”SELECT nspnameFROM pg_namespaceWHERE nspname LIKE 'tenant_%'ORDER BY nspname;Expected output:
tenant_empresa_123tenant_empresa_456tenant_empresa_7892. Backup each schema
Section titled “2. Backup each schema”SCHEMAS=$(psql -U postgres -d nostromo -t -c \ "SELECT nspname FROM pg_namespace WHERE nspname LIKE 'tenant_%'")
for SCHEMA in $SCHEMAS; do BACKUP_FILE="/opt/backups/schemas/$SCHEMA_$(date +%Y-%m-%d).sql"
pg_dump -U postgres -d nostromo \ --schema="$SCHEMA" \ -F c \ -f "$BACKUP_FILE"
gzip "$BACKUP_FILE"doneOutput: Un archivo por schema (ej: tenant_empresa_123_2026-01-18.sql.gz).
3. Backup shared schemas
Section titled “3. Backup shared schemas”Schema central:
pg_dump -U postgres -d nostromo --schema=central -F c -f /opt/backups/schemas/central_$(date +%Y-%m-%d).sqlSchema parametros:
pg_dump -U postgres -d nostromo --schema=parametros -F c -f /opt/backups/schemas/parametros_$(date +%Y-%m-%d).sqlAutomated Backup (Cron)
Section titled “Automated Backup (Cron)”Setup cron job
Section titled “Setup cron job”# Editar crontabcrontab -eAdd cron entry:
# Full backup daily at 2 AM0 2 * * * /opt/scripts/postgres_backup.sh >> /var/log/postgres_backup.log 2>&1
# Per-schema backup weekly (Sunday 3 AM)0 3 * * 0 /opt/scripts/postgres_backup_schemas.sh >> /var/log/postgres_backup_schemas.log 2>&1Backup script
Section titled “Backup script”File: /opt/scripts/postgres_backup.sh
#!/bin/bashset -e
# VariablesBACKUP_DIR="/opt/backups/$(date +%Y-%m)"BACKUP_FILE="$BACKUP_DIR/backup_$(date +%Y-%m-%d_%H-%M).sql"S3_BUCKET="s3://nostromo-backups/postgres/"
# Create directorymkdir -p "$BACKUP_DIR"
# Backupecho "Starting backup at $(date)"sudo -u postgres pg_dumpall > "$BACKUP_FILE"
# Compressecho "Compressing backup..."gzip "$BACKUP_FILE"
# Upload to S3echo "Uploading to S3..."aws s3 cp "$BACKUP_FILE.gz" "$S3_BUCKET"
# Cleanup old local backupsecho "Cleaning up old backups..."find /opt/backups -name "backup_*.sql.gz" -mtime +7 -delete
echo "Backup completed at $(date)"Make executable:
chmod +x /opt/scripts/postgres_backup.shRestore Procedure
Section titled “Restore Procedure”Full Database Restore
Section titled “Full Database Restore”1. Stop all services
Section titled “1. Stop all services”Option 1: PM2:
pm2 stop allOption 2: systemctl:
sudo systemctl stop orchestrator sevastopol2. Drop existing database
Section titled “2. Drop existing database”sudo -u postgres psql -c "DROP DATABASE nostromo;"3. Recreate database
Section titled “3. Recreate database”sudo -u postgres psql -c "CREATE DATABASE nostromo;"4. Restore from backup
Section titled “4. Restore from backup”# Descomprimir backupBACKUP_FILE="/opt/backups/2026-01/backup_2026-01-18_02-00.sql.gz"gunzip -c "$BACKUP_FILE" | sudo -u postgres psql
# Alternative: download from S3 firstaws s3 cp s3://nostromo-backups/postgres/backup_2026-01-18_02-00.sql.gz .gunzip -c backup_2026-01-18_02-00.sql.gz | sudo -u postgres psqlExpected: Proceso toma 15-45 minutos depending on backup size.
5. Verify restoration
Section titled “5. Verify restoration”-- Conectar a DBsudo -u postgres psql -d nostromo
-- Verificar schemas existen\dn
-- Expected: central, parametros, tenant_empresa_123, etc.
-- Verificar dataSELECT COUNT(*) FROM central.tenants;SELECT COUNT(*) FROM tenant_empresa_123.employees;6. Restart services
Section titled “6. Restart services”Option 1: PM2:
pm2 start allOption 2: systemctl:
sudo systemctl start orchestrator sevastopolPer-Schema Restore
Section titled “Per-Schema Restore”Para restaurar UN solo tenant (sin afectar otros):
SCHEMA="tenant_empresa_123"BACKUP_FILE="/opt/backups/schemas/${SCHEMA}_2026-01-18.sql.gz"
# Drop schemasudo -u postgres psql -d nostromo -c "DROP SCHEMA IF EXISTS $SCHEMA CASCADE;"
# Restoregunzip -c "$BACKUP_FILE" | sudo -u postgres pg_restore -d nostromoMonitoring
Section titled “Monitoring”Backup Success/Failure
Section titled “Backup Success/Failure”Check cron logs:
tail -f /var/log/postgres_backup.logExpected output:
Starting backup at Sat Jan 18 02:00:01 CLT 2026Compressing backup...Uploading to S3...Backup completed at Sat Jan 18 02:15:34 CLT 2026Alerting: Configure monitoring para detectar si backup falla:
- Slack webhook si script exit code ≠ 0
- Email si backup file size = 0 (indicador de fallo)
Backup Size Tracking
Section titled “Backup Size Tracking”# Listar backups por tamañols -lh /opt/backups/2026-01/ | grep backup
# Expected growth: ~5-10% monthlyAlert si:
- Backup size crece mayor a 50% month-over-month (posible data leak)
- Backup size = 0 bytes (backup falló)
Troubleshooting
Section titled “Troubleshooting”Backup fails with “out of space”
Section titled “Backup fails with “out of space””Error: No space left on device
Solution:
# Check disk usagedf -h /opt/backups
# Cleanup old backups manualmenterm /opt/backups/2025-*/backup_*.sql.gz
# Alternatively: move to larger diskpg_dumpall hangs
Section titled “pg_dumpall hangs”Síntoma: Backup corre por horas sin completar.
Debugging:
# Check active queriessudo -u postgres psql -c \ "SELECT pid, query, state FROM pg_stat_activity WHERE state != 'idle';"
# Si hay query bloqueando, kill itsudo -u postgres psql -c "SELECT pg_terminate_backend(12345);"Restore fails with “role does not exist”
Section titled “Restore fails with “role does not exist””Error: ERROR: role "orchestrator_user" does not exist
Solution: pg_dumpall debería incluir roles, pero si no:
# Restore SOLO roles firstgunzip -c backup.sql.gz | grep "CREATE ROLE" | sudo -u postgres psql
# Then restore datagunzip -c backup.sql.gz | sudo -u postgres psqlBest Practices
Section titled “Best Practices”- ✅ Backup ANTES de cada deploy (ver Deploy Runbook)
- ✅ Test restores mensualmente - Backup sin restore tested es inútil
- ✅ Remote backup obligatorio - Local disk puede fallar
- ✅ Encrypt backups si contienen PII (GDPR compliance)
- ✅ Document restore time - Saber cuánto tarda restore ayuda en DR planning
Related Documentation
Section titled “Related Documentation”- Runbook: Deploy - Backup requerido antes de deploy
- Runbook: Recovery - Disaster recovery completo
- ADR-001: Multi-Tenant - Schema-per-tenant strategy
- Infrastructure: PostgreSQL - PostgreSQL config
Changelog
Section titled “Changelog”| Fecha | Version | Cambios |
|---|---|---|
| 2026-01-18 | 1.0 | Runbook inicial creado |