Total Services: 12 (across all environments)
Healthy: 9 (operating normally)
Warnings: 2 (needs attention)
Critical: 1 (immediate action required)
worker-queue — Active Incident
Memory leak detected on worker-queue — process restart scheduled for off-peak (02:00 UTC)
1h ago
staging-db — Active Incident
staging-db disk at 88.7% — scheduled cleanup of old WAL files during next maintenance window
1h ago
Service Health
12 services
| Service | Status | CPU / Memory | Disk | Response | SSL | Env |
|---|---|---|---|---|---|---|
| api-gateway-01 (api-gw-01.proviso.internal) | healthy | | | 89ms | 47d, ✓ valid | production |
| api-gateway-02 (api-gw-02.proviso.internal) | healthy | | | 94ms | 47d, ✓ valid | production |
| cdn-edge-us (cdn-us.proviso.internal) | healthy | | | 18ms | 120d, ✓ valid | production |
| monitoring-agent (monitor.proviso.internal) | healthy | | | 22ms | 180d, ✓ valid | production |
| postgres-primary (db-primary.proviso.internal) | healthy | | | 120ms | 365d, ✓ valid | production |
| postgres-replica (db-replica.proviso.internal) | healthy | | | 115ms | 365d, ✓ valid | production |
| redis-cache (redis-01.proviso.internal) | healthy | | | 12ms | 365d, ✓ valid | production |
| web-server-01 (web-01.proviso.internal) | healthy | | | 45ms | 31d, ✓ valid | production |
| web-server-02 (web-02.proviso.internal) | warning | | | 340ms | 31d, ✓ valid | production |
| worker-queue (worker-01.proviso.internal) | critical | | | 520ms | 14d, ✓ valid | production |
| staging-db (staging-db.proviso.internal) | warning | | | 280ms | 60d, ✓ valid | staging |
| staging-web (staging-web.proviso.internal) | healthy | | | 110ms | 60d, ✓ valid | staging |
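
The summary counts at the top of the page (12 total, 9 healthy, 2 warnings, 1 critical) can be cross-checked against the Service Health rows. The following is a minimal sketch, assuming the status column is available as a simple name-to-status mapping; `SERVICE_STATUS` and `summarize` are hypothetical names introduced here for illustration and are not part of the dashboard itself.

```python
from collections import Counter

# Status values copied from the Service Health table, in row order.
# The mapping itself is an assumption about how the data might be held.
SERVICE_STATUS = {
    "api-gateway-01": "healthy",
    "api-gateway-02": "healthy",
    "cdn-edge-us": "healthy",
    "monitoring-agent": "healthy",
    "postgres-primary": "healthy",
    "postgres-replica": "healthy",
    "redis-cache": "healthy",
    "web-server-01": "healthy",
    "web-server-02": "warning",
    "worker-queue": "critical",
    "staging-db": "warning",
    "staging-web": "healthy",
}


def summarize(statuses):
    """Tally services by status for the summary cards."""
    counts = Counter(statuses.values())
    return {
        "total": len(statuses),
        "healthy": counts["healthy"],
        "warning": counts["warning"],
        "critical": counts["critical"],
    }


print(summarize(SERVICE_STATUS))
# → {'total': 12, 'healthy': 9, 'warning': 2, 'critical': 1}
```

The tally matches the summary cards, with web-server-02 and staging-db accounting for the two warnings and worker-queue for the single critical service.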
Agent Actions
2 open
staging-db disk at 88.7% — scheduled cleanup of old WAL files during next maintenance window
Memory leak detected on worker-queue — process restart scheduled for off-peak (02:00 UTC)
Rolled back staging-db migration v0.9.4 — column type mismatch caused 100% error rate
Restarted nginx after 502 spike (error rate 12% → 0.1% in 40s)
Cleared /tmp after disk usage reached 92% — freed 8.4 GB
Auto-renewed SSL certificate for web-01.proviso.internal (was 7 days to expiry)
Provisioned 2 additional web-02 instances — p95 latency was 1.8s during traffic spike
Increased postgres-primary connection pool from 50 → 100 (connection wait >200ms detected)
Rightsized redis-cache instance from r6g.xlarge → r6g.large — saving $48/mo (avg utilization 28%)
Restarted api-gateway-01 worker pool — 3 zombie processes consuming 31% CPU
Rebuilt 2 bloated postgres indexes — query time for /api/search improved 340ms → 41ms
Daily postgres-primary backup verified — 4.2 GB, restore tested successfully
Blocked 1,847 requests from 3 IPs showing brute-force pattern on /api/auth
Updated nginx rate limits on web-01 — max_conns increased to handle traffic growth
All 12 services passed health checks — 0 incidents in last 24h