Troubleshooting Guide
Overview
Guide to diagnosing and resolving common problems in Orchestrator, Sevastopol, and PostgreSQL.
🔴 Orchestrator Not Responding
Symptoms
- `curl http://localhost:8000/health` does not respond
- Frontend shows “Network Error”
- Logs show timeouts
Diagnosis
    # 1. Check the process status
    pm2 status orchestrator

    # 2. View error logs
    pm2 logs orchestrator --err --lines 50

    # 3. Check the port
    lsof -i :8000
Solutions
| Cause | Solution |
|---|---|
| Dead process | `pm2 restart orchestrator` |
| Port in use | `kill -9 $(lsof -t -i:8000)`, then restart |
| OOM (Out of Memory) | Add RAM or reduce `max_connections` |
| DB connection failed | See the “PostgreSQL Not Connecting” section |
🔴 PostgreSQL Not Connecting
Symptoms
- `Error: connect ECONNREFUSED 127.0.0.1:5432`
- `FATAL: connection refused`
Diagnosis
    # 1. Check the PostgreSQL service
    sudo systemctl status postgresql

    # 2. Check whether it is accepting connections
    pg_isready

    # 3. Check the logs
    sudo tail -50 /var/log/postgresql/postgresql-16-main.log
Solutions
| Cause | Solution |
|---|---|
| Service down | `sudo systemctl start postgresql` |
| Max connections reached | Increase `max_connections` in `postgresql.conf` |
| `pg_hba.conf` | Verify that it allows local connections |
| Disk full | Free up disk space |
🟡 Slow Queries
Symptoms
- API timeouts (> 30s)
- Dashboard loads slowly
- High PostgreSQL CPU usage
Diagnosis
    -- Currently running queries
    SELECT pid, now() - pg_stat_activity.query_start AS duration, query
    FROM pg_stat_activity
    WHERE state = 'active'
    ORDER BY duration DESC;

    -- Top slow queries (requires pg_stat_statements)
    SELECT query, calls, mean_exec_time::numeric(10,2) AS mean_ms
    FROM pg_stat_statements
    ORDER BY mean_exec_time DESC
    LIMIT 10;
Solutions
| Cause | Solution |
|---|---|
| Missing index | Create an index on the filtered columns |
| Table bloat | `VACUUM ANALYZE <table>` |
| Full table scan | Review the `EXPLAIN ANALYZE` output |
| Lock contention | See the “Deadlocks in PostgreSQL” section |
Example - Add a missing index:
    -- If a query frequently filters by employee_id
    CREATE INDEX CONCURRENTLY idx_contracts_employee
    ON contracts(employee_id);
🟡 Authentication Error
Symptoms
- “Invalid credentials” on login
- “Token expired”
- 401 Unauthorized
Diagnosis
    # Check that JWT_SECRET is set (printf avoids counting a trailing newline)
    printf '%s' "$JWT_SECRET" | wc -c
    # Must be > 32 characters

    # Check that cookies are being sent
    curl -v --cookie-jar - http://localhost:8000/api/command/auth/login \
      -H "Content-Type: application/json" \
Solutions
| Cause | Solution |
|---|---|
| Empty `JWT_SECRET` | Set it in `.env.production` |
| Expired token | Log in again (the JWT expires after 24h) |
| Cookie does not persist | Check the `SameSite` and `Secure` flags |
| Deactivated user | `UPDATE users SET active=true WHERE email='...'` |
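As a rough sketch of the checks behind the `JWT_SECRET` and expired-token rows, the helpers below validate the secret length and test a token's age. The function names and the choice to measure age from the issue time are assumptions made for illustration; only the 32-character minimum and the 24h lifetime come from this guide.

```typescript
// Hypothetical helpers; not taken from the Orchestrator codebase.
const MIN_SECRET_LENGTH = 32; // the guide asks for more than 32 characters
const TOKEN_TTL_SECONDS = 24 * 60 * 60; // the guide states that the JWT expires after 24h

/** Returns a description of the problem, or null when the secret looks usable. */
function checkJwtSecret(secret: string | undefined): string | null {
  if (secret === undefined || secret.length === 0) {
    return "JWT_SECRET is empty or unset";
  }
  if (secret.length <= MIN_SECRET_LENGTH) {
    return `JWT_SECRET has ${secret.length} characters; expected more than ${MIN_SECRET_LENGTH}`;
  }
  return null;
}

/** True when a token issued at `issuedAtSeconds` (Unix seconds) is past its 24h lifetime. */
function isTokenExpired(issuedAtSeconds: number, nowMs: number): boolean {
  const expiresAtSeconds = issuedAtSeconds + TOKEN_TTL_SECONDS;
  return Math.floor(nowMs / 1000) >= expiresAtSeconds;
}
```

Calling `checkJwtSecret(process.env.JWT_SECRET)` at startup would surface an empty secret before the first login fails, and `isTokenExpired(iat, Date.now())` separates a genuinely expired token from a cookie problem.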
🟡 Tenant Error
Symptoms
- “Tenant not found”
- “Schema does not exist”
- Data mixed between customers
Diagnosis
    -- Check that the tenant exists
    SELECT * FROM central.tenants WHERE id = '<TENANT_ID>';

    -- Check that the schema exists
    SELECT nspname FROM pg_namespace WHERE nspname = 'tenant_<ID>';

    -- Check the current search_path in the session
    SHOW search_path;
Solutions
| Cause | Solution |
|---|---|
| Tenant does not exist | Create the tenant via admin |
| Schema deleted | Restore from backup |
| Incorrect `search_path` | Check the tenant middleware |
| Missing `X-Tenant-ID` header | Verify that the frontend sends the header |
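To make the `search_path` and missing-header rows concrete, here is a minimal sketch of how a tenant middleware might turn the `X-Tenant-ID` value into a schema name. The `tenant_<ID>` pattern comes from the diagnostic query in this section; the function names and the allowed character set are assumptions, and the real middleware may differ.

```typescript
// Hypothetical helpers; the ID format below is an assumption, not Orchestrator's actual rule.
// "tenant_" takes 7 characters of PostgreSQL's 63-character identifier limit.
const TENANT_ID_PATTERN = /^[A-Za-z0-9_]{1,56}$/;

/** Maps an X-Tenant-ID header value to its schema name, rejecting missing or unsafe IDs. */
function tenantSchemaName(tenantId: string | undefined): string {
  if (tenantId === undefined || tenantId.trim() === "") {
    throw new Error("Missing X-Tenant-ID header");
  }
  const id = tenantId.trim();
  if (!TENANT_ID_PATTERN.test(id)) {
    throw new Error(`Invalid tenant id: ${id}`);
  }
  return `tenant_${id}`;
}

/** Builds the statement that scopes a session to a single tenant schema. */
function searchPathStatement(tenantId: string | undefined): string {
  // The pattern above excludes quotes, so the identifier can be quoted safely.
  return `SET search_path TO "${tenantSchemaName(tenantId)}"`;
}
```

Failing fast on a missing or malformed ID keeps a request from running against whatever `search_path` the pooled connection had last, which is one way data ends up mixed between customers.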
🟡 Sevastopol Build Fails
Symptoms
- `npm run build` fails
- TypeScript errors
- MDX parsing errors
Diagnosis
    # Clear the cache
    rm -rf node_modules/.vite
    rm -rf dist

    # Reinstall dependencies
    rm -rf node_modules
    npm ci

    # Verbose build
    npm run build -- --verbose
Solutions
| Error | Solution |
|---|---|
| MDX parsing error | Escape special characters (`<`, `>`, `{`, `}`) |
| TypeScript error | Fix the types or temporarily use `// @ts-ignore` |
| Memory heap | `NODE_OPTIONS="--max-old-space-size=4096" npm run build` |
| Dependency conflict | Delete the lockfile and regenerate it |
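For the MDX parsing row, one way to escape the four characters is to replace them with character references before the text reaches an `.mdx` file. The sketch below assumes the input is plain prose; text inside code spans or fenced blocks should be left alone, because references are not decoded there. The function name is hypothetical.

```typescript
// Hypothetical helper for plain prose only; do not run it over code spans or JSX.
const MDX_ENTITIES: Record<string, string> = {
  "<": "&lt;",
  ">": "&gt;",
  "{": "&#123;",
  "}": "&#125;",
};

/** Replaces <, >, { and } with character references so MDX treats them as text. */
function escapeMdxText(text: string): string {
  return text.replace(/[<>{}]/g, (ch) => MDX_ENTITIES[ch] ?? ch);
}
```

For example, `escapeMdxText("Array<string> and {x}")` yields `Array&lt;string&gt; and &#123;x&#125;`.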
🟡 CORS Errors
Symptoms
- “Access-Control-Allow-Origin” error in the browser
- Requests blocked in the frontend
- Preflight OPTIONS requests failing
Diagnosis
    # Check the response headers
    curl -v -X OPTIONS http://localhost:8000/api/query/employees \
      -H "Origin: http://localhost:4321" \
      -H "Access-Control-Request-Method: GET"
Solutions
| Cause | Solution |
|---|---|
| Origin not allowed | Add the origin to the CORS list |
| Credentials not allowed | `Access-Control-Allow-Credentials: true` |
| Headers not allowed | Add the headers to `Allow-Headers` |
| Nginx proxy | Check the `proxy_set_header` config |
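When diagnosing the “Origin not allowed” row, it can help to reproduce the allowlist decision in isolation. The sketch below uses the two origins from this guide's Orchestrator config example; the function name and the rule of accepting requests with no `Origin` header are assumptions.

```typescript
// Origins copied from this guide's Orchestrator config example.
const ALLOWED_ORIGINS: readonly string[] = [
  "http://localhost:4321",
  "https://app.nostromo.cl",
];

/** Hypothetical check: exact string match of the Origin header against the allowlist. */
function isOriginAllowed(
  origin: string | undefined,
  allowed: readonly string[] = ALLOWED_ORIGINS,
): boolean {
  // Assumption: requests without an Origin header (curl, server-to-server) are accepted.
  if (origin === undefined) return true;
  return allowed.indexOf(origin) !== -1;
}
```

The comparison is exact, so a list entry with a trailing slash or a different port never matches what the browser sends, which is a common source of this error.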
Example config (Orchestrator):
    app.use(
      cors({
        origin: ["http://localhost:4321", "https://app.nostromo.cl"],
        credentials: true,
        allowedHeaders: ["Content-Type", "Authorization", "X-Tenant-ID"],
      }),
    );
🟡 Memory Leak
Symptoms
- Process RAM grows constantly
- `pm2 monit` shows memory climbing
- OOM kills in the system logs
Diagnosis
    # Monitor memory in real time
    pm2 monit

    # View the process memory
    ps aux | grep node

    # Generate a heap dump (Node.js; requires starting the process with --heapsnapshot-signal=SIGUSR2)
    kill -USR2 <PID>
Solutions
| Cause | Solution |
|---|---|
| Event listeners without cleanup | Review `removeEventListener` calls |
| DB connections not closed | Check `pool.release()` |
| Unbounded cache | Add an LRU cache with a max size |
| Circular references | Use `WeakMap`/`WeakRef` |
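The “LRU cache with a max size” fix can be sketched with a `Map`, which iterates keys in insertion order. This is one possible shape, not necessarily how Orchestrator caches data; the class name and API are hypothetical.

```typescript
// Hypothetical bounded cache; evicts the least recently used entry once maxSize is reached.
class LruCache<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly maxSize: number;

  constructor(maxSize: number) {
    if (!Number.isInteger(maxSize) || maxSize < 1) {
      throw new Error("maxSize must be a positive integer");
    }
    this.maxSize = maxSize;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert so the key becomes the most recently used one.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.entries.has(key)) {
      this.entries.delete(key);
    } else if (this.entries.size >= this.maxSize) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
    this.entries.set(key, value);
  }

  get size(): number {
    return this.entries.size;
  }
}
```

With `new LruCache<string, number>(2)`, setting `a` and `b`, reading `a`, and then setting `c` evicts `b`, so memory stays bounded at two entries however many keys pass through.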
Automatic restart configuration:
    # PM2 - restart when memory exceeds 500MB
    pm2 start dist/server.js --name orchestrator --max-memory-restart 500M
🟠 Deadlocks in PostgreSQL
Symptoms
- “deadlock detected” in the logs
- Hung transactions
- Queries that never finish
Diagnosis
    -- View current locks
    SELECT
      blocked.pid AS blocked_pid,
      blocked.query AS blocked_query,
      blocking.pid AS blocking_pid,
      blocking.query AS blocking_query
    FROM pg_stat_activity blocked
    JOIN pg_locks blocked_locks ON blocked.pid = blocked_locks.pid
    JOIN pg_locks blocking_locks
      ON blocked_locks.locktype = blocking_locks.locktype
      AND blocked_locks.relation = blocking_locks.relation
      AND blocked_locks.pid != blocking_locks.pid
    JOIN pg_stat_activity blocking ON blocking_locks.pid = blocking.pid
    WHERE NOT blocked_locks.granted;
Solutions
| Cause | Solution |
|---|---|
| Long transaction | Reduce the size of transactions |
| Inconsistent lock order | Access tables in a consistent order |
| Low lock timeout | `SET lock_timeout = '10s'` |
| Blocking query | `SELECT pg_cancel_backend(<PID>)` |
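The “inconsistent lock order” fix amounts to making every transaction touch its tables in the same sequence. A minimal sketch, assuming the application knows up front which tables a transaction will write to; the function name is hypothetical and the table names in the note are illustrative.

```typescript
/**
 * Hypothetical helper: returns the tables in one canonical order (deduplicated,
 * sorted by name) so that concurrent transactions acquire locks in the same sequence.
 */
function consistentLockOrder(tables: readonly string[]): string[] {
  const unique = Array.from(new Set(tables));
  // The default sort compares strings by UTF-16 code units, which is deterministic.
  return unique.sort();
}
```

A transaction that needs `payments` then `contracts` and another that needs `contracts` then `payments` both end up with `["contracts", "payments"]`, so neither can hold one table while waiting for the other in the opposite order.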
🟠 Disk Full
Symptoms
- “No space left on device”
- PostgreSQL in read-only mode
- Logs are not being written
Diagnosis
    # View disk usage
    df -h

    # Find large files
    du -sh /* | sort -rh | head -20

    # View large logs
    du -sh /var/log/*
Solutions
| Cause | Solution |
|---|---|
| Large logs | Rotate logs: `logrotate -f /etc/logrotate.conf` |
| Accumulated WAL | `pg_switch_wal()` + check replication |
| Local backups | Move them to remote storage |
| Core dumps | Delete `/var/crash/*` |
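To catch a full disk before PostgreSQL does, the `df -h` output can be parsed and compared against a threshold. The sketch below assumes the usual six-column layout (`Filesystem Size Used Avail Use% Mounted on`) and filesystem names without spaces; `df -P` gives a more predictable format. The function names and the threshold are illustrative.

```typescript
interface DiskUsage {
  mount: string;
  usedPercent: number;
}

/** Parses `df -h` style output into mount points and their Use% values. */
function parseDfOutput(output: string): DiskUsage[] {
  const lines = output.trim().split("\n").slice(1); // skip the header row
  const result: DiskUsage[] = [];
  for (const line of lines) {
    const cols = line.trim().split(/\s+/);
    if (cols.length < 6) continue; // not a data row
    const usedPercent = parseInt(cols[4] ?? "", 10); // "92%" parses as 92
    if (Number.isNaN(usedPercent)) continue;
    result.push({ mount: cols.slice(5).join(" "), usedPercent });
  }
  return result;
}

/** Returns the mount points whose usage is at or above the given percentage. */
function mountsAbove(output: string, thresholdPercent: number): string[] {
  return parseDfOutput(output)
    .filter((entry) => entry.usedPercent >= thresholdPercent)
    .map((entry) => entry.mount);
}
```

Feeding the function the captured output of `df -h` with a threshold such as 90 returns the mount points that need attention first.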
Diagnostic Tools
    # Orchestrator
    pm2 logs orchestrator --lines 100

    # PostgreSQL
    sudo tail -f /var/log/postgresql/postgresql-16-main.log

    # Nginx
    sudo tail -f /var/log/nginx/error.log

    # System
    sudo journalctl -u orchestrator -f
Metrics
    # CPU and memory
    htop

    # Disk I/O
    iotop

    # Network
    nethogs

    # PostgreSQL stats
    psql -c "SELECT * FROM pg_stat_activity;"
Escalation Contacts
| Level | Problem | Contact |
|---|---|---|
| L1 | Service down | On-call engineer |
| L2 | Data corruption | DBA team |
| L3 | Security breach | Security team + management |
Related Documentation
- Runbook: Recovery - Disaster recovery procedures
- Runbook: Backup - Backup and restore
- Infrastructure: PostgreSQL - DB config
- Infrastructure: Networking - Network config
Changelog
| Date | Version | Changes |
|---|---|---|
| 2026-01-18 | 1.0 | Initial guide created |