Runbook: Database Migrations
Overview
Section titled “Overview”Propósito: Ejecutar migraciones de esquema en la base de datos PostgreSQL multi-tenant.
Frecuencia: Por release (cada 1-2 semanas)
Tiempo estimado: 10-30 minutos (dependiendo de tamaño de datos)
Criticidad: 🔴 ALTA - Puede causar downtime si falla
Tipos de Migraciones
Section titled “Tipos de Migraciones”| Tipo | Riesgo | Ejemplo |
|---|---|---|
| DDL Safe | Bajo | ADD COLUMN nullable |
| DDL Risky | Alto | DROP COLUMN, ALTER TYPE |
| DML | Medio | UPDATE masivo, data backfill |
| Schema Multi-Tenant | Alto | Aplicar a todos los tenant schemas |
Pre-Migration Checklist
Section titled “Pre-Migration Checklist”- Backup completo realizado (ver Runbook: Backup)
- Migration testeada en staging/desarrollo
- Downtime window aprobado (si aplica)
- Rollback script preparado
- Team notificado
Procedure - Standard Migration
Section titled “Procedure - Standard Migration”-
Conectar a servidor de producción
Terminal window cd /opt/nostromo/orchestrator -
Verificar estado actual de migraciones
Terminal window npm run migrate:statusExpected output:
Applied migrations:✓ 001_initial_schema.sql✓ 002_add_indexes.sql✓ 003_add_audit_tables.sqlPending migrations:○ 004_add_employee_fields.sql -
Crear backup pre-migration
Terminal window BACKUP_FILE="/opt/backups/pre-migration-$(date +%Y%m%d-%H%M).sql.gz"sudo -u postgres pg_dumpall | gzip > "$BACKUP_FILE" -
Ejecutar migration en modo dry-run (si disponible)
Terminal window npm run migrate:dry-run -
Ejecutar migration
Terminal window npm run migrate:upExpected output:
Running migration: 004_add_employee_fields.sql✓ Migration applied successfully -
Verificar que migration se aplicó
Terminal window npm run migrate:statusTambién verificar en DB:
SELECT * FROM central.schema_migrations ORDER BY applied_at DESC LIMIT 5; -
Smoke test de aplicación
Terminal window curl http://localhost:8000/health
Procedure - Multi-Tenant Migration
Section titled “Procedure - Multi-Tenant Migration”Para migraciones que afectan todos los tenant schemas:
-
Listar todos los tenant schemas
SELECT schema_name FROM central.tenants WHERE active = true; -
Ejecutar migration para cada tenant
Terminal window # Script que itera sobre tenantsnpm run migrate:tenantsO manualmente:
Terminal window for SCHEMA in tenant_empresa_1 tenant_empresa_2; doecho "Migrating $SCHEMA..."psql -U postgres -d nostromo -c "SET search_path TO $SCHEMA; \i migration.sql"done -
Verificar cada tenant
SELECT schema_name,(SELECT COUNT(*) FROM information_schema.columnsWHERE table_schema = t.schema_name AND column_name = 'new_column') as has_columnFROM central.tenants t;
Procedure - Rollback
Section titled “Procedure - Rollback”Si la migration falla o causa problemas:
-
Evaluar severidad
- ¿Aplicación funciona? → Puede esperar
- ¿Errores 500? → Rollback inmediato
-
Ejecutar rollback script
Terminal window npm run migrate:downO manualmente:
-- Rollback específico (ejemplo)ALTER TABLE employees DROP COLUMN IF EXISTS new_field;DELETE FROM central.schema_migrations WHERE name = '004_add_employee_fields.sql'; -
Restaurar desde backup (si necesario)
Terminal window gunzip -c /opt/backups/pre-migration-*.sql.gz | sudo -u postgres psql -
Verificar estado post-rollback
Terminal window npm run migrate:statuscurl http://localhost:8000/health -
Post-mortem
Documentar qué falló y cómo prevenirlo.
Migraciones Seguras vs Riesgosas
Section titled “Migraciones Seguras vs Riesgosas”✅ Operaciones Seguras (sin downtime)
Section titled “✅ Operaciones Seguras (sin downtime)”-- Agregar columna nullableALTER TABLE employees ADD COLUMN middle_name VARCHAR(100);
-- Agregar índice concurrenteCREATE INDEX CONCURRENTLY idx_employees_email ON employees(email);
-- Agregar constraint con validación diferidaALTER TABLE employees ADD CONSTRAINT check_rut CHECK (rut ~ '^[0-9]+-[0-9Kk]$') NOT VALID;⚠️ Operaciones Riesgosas (requieren downtime)
Section titled “⚠️ Operaciones Riesgosas (requieren downtime)”-- Cambiar tipo de columna (bloquea tabla)ALTER TABLE employees ALTER COLUMN salary TYPE DECIMAL(12,2);
-- Agregar columna NOT NULL sin defaultALTER TABLE employees ADD COLUMN required_field VARCHAR(50) NOT NULL;
-- Borrar columnaALTER TABLE employees DROP COLUMN old_field;🔄 Patrón Seguro para Cambios Riesgosos
Section titled “🔄 Patrón Seguro para Cambios Riesgosos”-- Paso 1: Agregar columna nueva nullableALTER TABLE employees ADD COLUMN salary_new DECIMAL(12,2);
-- Paso 2: Backfill en batches (fuera de transaction)UPDATE employees SET salary_new = salary::decimal(12,2) WHERE id BETWEEN 1 AND 10000;UPDATE employees SET salary_new = salary::decimal(12,2) WHERE id BETWEEN 10001 AND 20000;-- ...
-- Paso 3: Agregar NOT NULL constraintALTER TABLE employees ALTER COLUMN salary_new SET NOT NULL;
-- Paso 4: Renombrar columnas (en siguiente release)ALTER TABLE employees RENAME COLUMN salary TO salary_old;ALTER TABLE employees RENAME COLUMN salary_new TO salary;
-- Paso 5: Drop columna vieja (en release posterior)ALTER TABLE employees DROP COLUMN salary_old;Tracking de Migraciones
Section titled “Tracking de Migraciones”Tabla de Control
Section titled “Tabla de Control”CREATE TABLE central.schema_migrations ( id SERIAL PRIMARY KEY, name VARCHAR(255) NOT NULL UNIQUE, applied_at TIMESTAMP DEFAULT NOW(), checksum VARCHAR(64), execution_time_ms INTEGER);Registrar Migration Manualmente
Section titled “Registrar Migration Manualmente”INSERT INTO central.schema_migrations (name, checksum, execution_time_ms)VALUES ('004_add_employee_fields.sql', 'sha256hash...', 1500);Migraciones Multi-Tenant - Script Template
Section titled “Migraciones Multi-Tenant - Script Template”#!/usr/bin/env python3import psycopg2import sys
MIGRATION_FILE = sys.argv[1]
conn = psycopg2.connect(dbname='nostromo', user='postgres')conn.autocommit = Truecur = conn.cursor()
# Get active tenantscur.execute("SELECT schema_name FROM central.tenants WHERE active = true")tenants = [row[0] for row in cur.fetchall()]
# Read migrationwith open(MIGRATION_FILE) as f: migration_sql = f.read()
# Apply to each tenantfor schema in tenants: print(f"Migrating {schema}...") try: cur.execute(f"SET search_path TO {schema}") cur.execute(migration_sql) print(f" ✓ {schema} OK") except Exception as e: print(f" ✗ {schema} FAILED: {e}") # Continue or abort based on your policy
cur.close()conn.close()Troubleshooting
Section titled “Troubleshooting”Error: “relation already exists”
Section titled “Error: “relation already exists””-- Verificar si ya existeSELECT EXISTS ( SELECT 1 FROM information_schema.columns WHERE table_name = 'employees' AND column_name = 'new_column');
-- Usar IF NOT EXISTSALTER TABLE employees ADD COLUMN IF NOT EXISTS new_column VARCHAR(50);Error: “lock wait timeout”
Section titled “Error: “lock wait timeout””-- Ver locks activosSELECT * FROM pg_locks WHERE NOT granted;
-- Cancelar query bloqueadoraSELECT pg_cancel_backend(pid);Migration parcialmente aplicada
Section titled “Migration parcialmente aplicada”-- Verificar estadoSELECT * FROM central.schema_migrations WHERE name LIKE '%004%';
-- Limpiar y reintentarBEGIN; DELETE FROM central.schema_migrations WHERE name = '004_...'; -- Rollback manual de cambios parcialesCOMMIT;Related Documentation
Section titled “Related Documentation”- Runbook: Backup - Backup antes de migrations
- Runbook: Recovery - Disaster recovery
- PostgreSQL Config - Configuración DB
- ADR-001: Multi-Tenant - Arquitectura de schemas
Changelog
Section titled “Changelog”| Fecha | Version | Cambios |
|---|---|---|
| 2026-01-18 | 1.0 | Runbook inicial creado |