Outage 2026-02-10
Summary
Primary database failed during peak load. Failover lagged due to replication delay.
Timeline
- 02:10 UTC: latency spikes
- 02:14 UTC: failover initiated
- 02:21 UTC: traffic restored
Root Cause
Replication lag due to oversized batch writes.
Actions
- Cap batch size
- Increase replica throughput