
Runbook: Backup PostgreSQL

Purpose: Full backup of the multi-tenant PostgreSQL database (all schemas).

Frequency: Daily (automated via cron)

Estimated time: 10-30 minutes (depending on DB size)

Criticality: 🔴 CRITICAL - Required before deploys


  • SSH access to the database server
  • PostgreSQL user with pg_dump privileges
  • Sufficient disk space (~2-3x current DB size)
  • Backup storage configured (local + remote)
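
The disk-space prerequisite can be verified with a small pre-flight sketch. The 1 GB figure below is an assumption standing in for the real database size, which would normally come from `psql -c "SELECT pg_database_size('nostromo')"`:

```shell
# Pre-flight check: require at least 2x the current DB size free on the
# backup disk. DB_SIZE_KB is a placeholder for the pg_database_size() query.
DB_SIZE_KB=1048576                                          # assumption: ~1 GB
AVAIL_KB=$(df -k /opt/backups 2>/dev/null | tail -1 | awk '{print $4}')
AVAIL_KB=${AVAIL_KB:-0}                                     # 0 if path missing
if [ "$AVAIL_KB" -lt $((DB_SIZE_KB * 2)) ]; then
  echo "WARNING: less than 2x DB size free on backup disk"
else
  echo "disk space OK"
fi
```

Running this before step 3 catches the most common backup failure (a full disk) before `pg_dumpall` starts writing.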

Method: pg_dumpall - Full dump of the entire PostgreSQL instance.

Advantages:

  • ✅ Includes all schemas (tenants + central + parametros)
  • ✅ Includes roles and permissions
  • ✅ Single file, easy to restore

Disadvantages:

  • ❌ Large file (~1-5GB depending on data)
  • ❌ Restore is all-or-nothing (you cannot restore a single tenant)

Method: pg_dump --schema=tenant_X - Backup of an individual schema.

Advantages:

  • ✅ Small files (only one tenant's data)
  • ✅ Granular restore (a single tenant, if needed)

Disadvantages:

  • ❌ Requires a script to iterate over all schemas
  • ❌ Does not automatically include the central or parametros schemas

  1. Connect to the database server

    cd /opt/backups
  2. Create the backup directory (if it doesn't exist)

    mkdir -p /opt/backups/$(date +%Y-%m)

    Organized by month: /opt/backups/2026-01/

  3. Execute pg_dumpall

    BACKUP_FILE="/opt/backups/$(date +%Y-%m)/backup_$(date +%Y-%m-%d_%H-%M).sql"
    sudo -u postgres pg_dumpall > "$BACKUP_FILE"

    Expected output: a ~1-5GB file created at /opt/backups/2026-01/backup_2026-01-18_02-00.sql

  4. Compress backup

    gzip "$BACKUP_FILE"

    Compression ratio: typically 5-10x (1GB → 100-200MB).

  5. Verify backup integrity

    gunzip -t "$BACKUP_FILE.gz"

    Expected: no output = OK. If corrupt: gzip: ... unexpected end of file

  6. Upload to remote storage

    aws s3 cp "$BACKUP_FILE.gz" s3://nostromo-backups/postgres/
  7. Cleanup old backups (retention policy)

    find /opt/backups -name "backup_*.sql.gz" -mtime +7 -delete

    Retention policy: local 7 days, remote 30 days, archive 1 year.
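Both the integrity check (step 5) and the retention rule (step 7) can be rehearsed safely on throwaway files before trusting them on real backups; `touch -d` backdates one file past the 7-day window:

```shell
TMP=$(mktemp -d)

# Integrity: gunzip -t passes on a sound archive, fails on a truncated one
echo "-- PostgreSQL database cluster dump" > "$TMP/demo.sql"
gzip "$TMP/demo.sql"
gunzip -t "$TMP/demo.sql.gz" && GOOD="intact"
head -c 10 "$TMP/demo.sql.gz" > "$TMP/truncated.gz"   # keep only the header
gunzip -t "$TMP/truncated.gz" 2>/dev/null || BAD="corrupt detected"

# Retention: backdate one file past 7 days, then apply the find rule
touch "$TMP/backup_new.sql.gz"
touch -d "10 days ago" "$TMP/backup_old.sql.gz"
find "$TMP" -name "backup_*.sql.gz" -mtime +7 -delete
REMAINING=$(ls "$TMP" | grep '^backup_')

echo "$GOOD / $BAD / $REMAINING"   # intact / corrupt detected / backup_new.sql.gz
rm -rf "$TMP"
```

This confirms what a corrupt archive looks like to `gunzip -t`, and that `find -mtime +7` deletes only the backdated file.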


List the tenant schemas to back up:

SELECT nspname
FROM pg_namespace
WHERE nspname LIKE 'tenant_%'
ORDER BY nspname;

Expected output:

tenant_empresa_123
tenant_empresa_456
tenant_empresa_789
SCHEMAS=$(psql -U postgres -d nostromo -t -c \
  "SELECT nspname FROM pg_namespace WHERE nspname LIKE 'tenant_%'")
for SCHEMA in $SCHEMAS; do
  # ${SCHEMA}_ needs braces: $SCHEMA_ would be read as an undefined variable
  BACKUP_FILE="/opt/backups/schemas/${SCHEMA}_$(date +%Y-%m-%d).sql"
  pg_dump -U postgres -d nostromo \
    --schema="$SCHEMA" \
    -F c \
    -f "$BACKUP_FILE"
  gzip "$BACKUP_FILE"
done

Output: one file per schema (e.g. tenant_empresa_123_2026-01-18.sql.gz).
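One pitfall worth calling out in the loop's filename construction: without braces, the shell parses `$SCHEMA_` as a single (undefined) variable name and the schema name is silently dropped from the path:

```shell
SCHEMA="tenant_empresa_123"
DATE="2026-01-18"
WRONG="$SCHEMA_$DATE.sql"     # the shell reads an undefined variable SCHEMA_
RIGHT="${SCHEMA}_$DATE.sql"   # braces make the variable boundary explicit
echo "$WRONG"                 # 2026-01-18.sql  -> schema name silently lost
echo "$RIGHT"                 # tenant_empresa_123_2026-01-18.sql
```

With the unbraced form every tenant would overwrite the same date-named file, so only the last schema's dump would survive.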

The central schema:

pg_dump -U postgres -d nostromo --schema=central -F c -f /opt/backups/schemas/central_$(date +%Y-%m-%d).sql

The parametros schema:

pg_dump -U postgres -d nostromo --schema=parametros -F c -f /opt/backups/schemas/parametros_$(date +%Y-%m-%d).sql

# Edit the crontab
crontab -e

Add cron entry:

# Full backup daily at 2 AM
0 2 * * * /opt/scripts/postgres_backup.sh >> /var/log/postgres_backup.log 2>&1
# Per-schema backup weekly (Sunday 3 AM)
0 3 * * 0 /opt/scripts/postgres_backup_schemas.sh >> /var/log/postgres_backup_schemas.log 2>&1
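One optional hardening step (an assumption, not part of the original entries): wrap the cron command in `flock` so a slow backup never overlaps the next scheduled run, e.g. `0 2 * * * flock -n /var/lock/pg_backup.lock /opt/scripts/postgres_backup.sh ...`. A minimal demo of the locking behavior:

```shell
# flock (util-linux): -n fails immediately instead of waiting if the lock
# file is already held by another process.
LOCK=/tmp/demo_backup.lock
OUT=$(flock -n "$LOCK" -c 'echo "got lock"')
echo "$OUT"   # got lock
```

If a previous backup were still holding the lock, `flock -n` would exit non-zero and the new run would be skipped instead of piling up.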

File: /opt/scripts/postgres_backup.sh

#!/bin/bash
set -e
# Variables
BACKUP_DIR="/opt/backups/$(date +%Y-%m)"
BACKUP_FILE="$BACKUP_DIR/backup_$(date +%Y-%m-%d_%H-%M).sql"
S3_BUCKET="s3://nostromo-backups/postgres/"
# Create directory
mkdir -p "$BACKUP_DIR"
# Backup
echo "Starting backup at $(date)"
sudo -u postgres pg_dumpall > "$BACKUP_FILE"
# Compress
echo "Compressing backup..."
gzip "$BACKUP_FILE"
# Upload to S3
echo "Uploading to S3..."
aws s3 cp "$BACKUP_FILE.gz" "$S3_BUCKET"
# Cleanup old local backups
echo "Cleaning up old backups..."
find /opt/backups -name "backup_*.sql.gz" -mtime +7 -delete
echo "Backup completed at $(date)"

Make executable:

chmod +x /opt/scripts/postgres_backup.sh
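A size sanity check could also be appended to the script before the S3 upload, in line with the alerting section further down. The 1 KB threshold here is an arbitrary assumption, demonstrated on a throwaway file:

```shell
# Demo: fail loudly if the compressed backup is suspiciously small.
TMP=$(mktemp)
head -c 2048 /dev/zero > "$TMP"   # fake 2 KB "backup"
SIZE=$(stat -c%s "$TMP")
MIN_BYTES=1024                    # threshold is an assumption; tune per DB
if [ "$SIZE" -lt "$MIN_BYTES" ]; then
  RESULT="too small"              # in the real script: exit 1 before upload
else
  RESULT="size OK"
fi
echo "$RESULT"   # size OK
rm -f "$TMP"
```

In the real script, `$TMP` would be `$BACKUP_FILE.gz` and the "too small" branch would `exit 1` so the truncated file never reaches S3.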

Option 1: PM2:

pm2 stop all

Option 2: systemctl:

sudo systemctl stop orchestrator sevastopol

sudo -u postgres psql -c "DROP DATABASE nostromo;"

sudo -u postgres psql -c "CREATE DATABASE nostromo;"

# Decompress and restore the backup
BACKUP_FILE="/opt/backups/2026-01/backup_2026-01-18_02-00.sql.gz"
gunzip -c "$BACKUP_FILE" | sudo -u postgres psql
# Alternative: download from S3 first
aws s3 cp s3://nostromo-backups/postgres/backup_2026-01-18_02-00.sql.gz .
gunzip -c backup_2026-01-18_02-00.sql.gz | sudo -u postgres psql

Expected: the process takes 15-45 minutes depending on backup size.
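Since the best practices below recommend documenting restore time, a minimal timing wrapper is worth keeping next to the restore commands. Here `sleep 1` stands in for the actual restore pipeline:

```shell
# Time the restore for DR planning; replace the sleep with the real pipeline:
#   gunzip -c "$BACKUP_FILE" | sudo -u postgres psql
START=$(date +%s)
sleep 1   # stand-in for the restore
END=$(date +%s)
DURATION=$((END - START))
echo "restore took ${DURATION}s"
```

Logging `$DURATION` after each monthly restore test builds the history needed to estimate recovery time objectives.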


-- Connect to the DB
sudo -u postgres psql -d nostromo
-- Verify the schemas exist
\dn
-- Expected: central, parametros, tenant_empresa_123, etc.
-- Verify data
SELECT COUNT(*) FROM central.tenants;
SELECT COUNT(*) FROM tenant_empresa_123.employees;

Option 1: PM2:

pm2 start all

Option 2: systemctl:

sudo systemctl start orchestrator sevastopol

To restore a SINGLE tenant (without affecting the others):

SCHEMA="tenant_empresa_123"
BACKUP_FILE="/opt/backups/schemas/${SCHEMA}_2026-01-18.sql.gz"
# Drop the schema
sudo -u postgres psql -d nostromo -c "DROP SCHEMA IF EXISTS $SCHEMA CASCADE;"
# Restore
gunzip -c "$BACKUP_FILE" | sudo -u postgres pg_restore -d nostromo

Check cron logs:

tail -f /var/log/postgres_backup.log

Expected output:

Starting backup at Sat Jan 18 02:00:01 CLT 2026
Compressing backup...
Uploading to S3...
Backup completed at Sat Jan 18 02:15:34 CLT 2026

Alerting: configure monitoring to detect backup failures:

  • Slack webhook if the script exit code ≠ 0
  • Email if the backup file size = 0 (failure indicator)
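The exit-code alert can be sketched as a wrapper around the cron command. The webhook call is stubbed with `echo` here, since the real Slack incoming-webhook URL is deployment-specific:

```shell
notify_failure() {
  # In production, replace this echo with a curl POST to the team's
  # Slack incoming webhook (URL is an assumption, not in this runbook).
  echo "ALERT: postgres backup failed on $(hostname)"
}
# 'false' stands in for /opt/scripts/postgres_backup.sh
false || notify_failure
```

In the crontab this becomes `/opt/scripts/postgres_backup.sh || notify_failure`, so any non-zero exit from the backup script fires the alert.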

# List backups by size
ls -lh /opt/backups/2026-01/ | grep backup
# Expected growth: ~5-10% monthly

Alert si:

  • Backup size grows more than 50% month-over-month (possible data leak)
  • Backup size = 0 bytes (backup failed)
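The month-over-month check is plain shell arithmetic. The `du` commands and directory names in the comments are assumptions that follow the runbook's monthly layout; fixed numbers are used here so the math is visible:

```shell
PREV_MB=100   # stand-in for: du -sm /opt/backups/2025-12 | cut -f1
CURR_MB=180   # stand-in for: du -sm /opt/backups/2026-01 | cut -f1
GROWTH=$(( (CURR_MB - PREV_MB) * 100 / PREV_MB ))
if [ "$GROWTH" -gt 50 ]; then
  echo "ALERT: backups grew ${GROWTH}% month-over-month"
fi
```

With 100 MB → 180 MB the growth is 80%, which crosses the 50% threshold and triggers the alert.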

Error: No space left on device

Solution:

# Check disk usage
df -h /opt/backups
# Manually clean up old backups
rm /opt/backups/2025-*/backup_*.sql.gz
# Alternatively: move to a larger disk

Symptom: the backup runs for hours without completing.

Debugging:

# Check active queries
sudo -u postgres psql -c \
"SELECT pid, query, state FROM pg_stat_activity WHERE state != 'idle';"
# If a query is blocking, terminate it (replace 12345 with the actual pid)
sudo -u postgres psql -c "SELECT pg_terminate_backend(12345);"

Restore fails with “role does not exist”


Error: ERROR: role "orchestrator_user" does not exist

Solution: pg_dumpall should include roles, but if it did not:

# Restore ONLY the roles first
gunzip -c backup.sql.gz | grep "CREATE ROLE" | sudo -u postgres psql
# Then restore data
gunzip -c backup.sql.gz | sudo -u postgres psql
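Note that grepping only for `CREATE ROLE` drops the `ALTER ROLE` lines that carry passwords and attributes, so matching both is safer; the snippet below demonstrates the pattern on a synthetic dump fragment. A cleaner long-term fix is to dump roles separately with `pg_dumpall --roles-only` (or globals with `-g`) alongside the data backup:

```shell
# Synthetic dump fragment: real dumps put the password in an ALTER ROLE
# line, so 'grep "CREATE ROLE"' alone would restore a role with no password.
DUMP="CREATE ROLE orchestrator_user;
ALTER ROLE orchestrator_user WITH LOGIN PASSWORD 'md5abc';
CREATE TABLE t (id int);"
ROLES=$(printf '%s\n' "$DUMP" | grep -E '^(CREATE|ALTER) ROLE')
printf '%s\n' "$ROLES"   # both role lines, table DDL excluded
```

In the real procedure this becomes `gunzip -c backup.sql.gz | grep -E '^(CREATE|ALTER) ROLE' | sudo -u postgres psql`.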

  • Backup BEFORE every deploy (see the Deploy Runbook)
  • Test restores monthly - a backup without a tested restore is useless
  • Remote backup is mandatory - the local disk can fail
  • Encrypt backups if they contain PII (GDPR compliance)
  • Document restore time - knowing how long a restore takes helps with DR planning


Date | Version | Changes
2026-01-18 | 1.0 | Initial runbook created