Playbooks describe repeatable responses to common operational scenarios. Keep them short, sequential, and verifiable.
Examples
- Provider degradation: fail over to the backup model tier; relax timeouts temporarily.
- Cost spike: enable dynamic throttling; switch to a smaller model for low-sensitivity surfaces.
- Safety regression: freeze promotions; revert to the last known-good prompt version.
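To make "sequential and verifiable" concrete, each playbook step can be modeled as an action paired with a check that proves the action took effect. The sketch below encodes the safety-regression example under stated assumptions: the in-memory flag store, the flag names `promotions_frozen` and `active_prompt_version`, and the version labels are all hypothetical and do not come from any real flag service. The runner executes steps in order and stops at the first step whose check fails, so an on-call engineer never continues past an unverified change.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical in-memory flag store; a real deployment would call its own
# flag or config service here instead.
flags: dict[str, object] = {
    "promotions_frozen": False,
    "active_prompt_version": "v42",  # hypothetical version label
}

LAST_KNOWN_GOOD_PROMPT = "v41"  # hypothetical version label


def set_flag(name: str, value: object) -> None:
    """Write a value to the hypothetical flag store."""
    flags[name] = value


@dataclass
class Step:
    """One playbook step: an action plus a check that proves it took effect."""
    description: str
    action: Callable[[], None]
    verify: Callable[[], bool]


def run_playbook(steps: list[Step]) -> list[str]:
    """Run steps in order; stop at the first step whose verification fails."""
    log: list[str] = []
    for i, step in enumerate(steps, start=1):
        step.action()
        if not step.verify():
            log.append(f"step {i} FAILED: {step.description}")
            break  # do not continue past an unverified step
        log.append(f"step {i} ok: {step.description}")
    return log


safety_regression = [
    Step(
        "freeze promotions",
        lambda: set_flag("promotions_frozen", True),
        lambda: flags["promotions_frozen"] is True,
    ),
    Step(
        "revert to last known-good prompt version",
        lambda: set_flag("active_prompt_version", LAST_KNOWN_GOOD_PROMPT),
        lambda: flags["active_prompt_version"] == LAST_KNOWN_GOOD_PROMPT,
    ),
]

if __name__ == "__main__":
    for line in run_playbook(safety_regression):
        print(line)
```

Pairing every action with its own check keeps each step independently verifiable, and halting on the first failure preserves the strict ordering the playbook depends on.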
Approval
Changes to playbooks require sign-off from both the service owner and the incident commander group. Keep a changelog at the top of each document.
Note
Playbooks should be runnable by on-call engineers with limited context. Include the exact commands to run, links to the relevant dashboards, and exact switch names.
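One way to enforce this requirement before a playbook is approved is a simple lint over its documented steps. The sketch below assumes each step is recorded with `command`, `dashboard`, and `switch` fields and, as a deliberately strict assumption, requires all three on every step. The CLI name `flagctl`, the switch names, and the dashboard URL are hypothetical placeholders, not real tools or endpoints.

```python
from dataclasses import dataclass

# Fields every documented step must fill in (a strict, assumed policy).
REQUIRED_FIELDS = ("command", "dashboard", "switch")


@dataclass
class StepDoc:
    """A documented playbook step as an on-call engineer would read it."""
    title: str
    command: str = ""
    dashboard: str = ""
    switch: str = ""


def lint_playbook(steps: list[StepDoc]) -> list[str]:
    """Return one problem string for each required field left blank."""
    problems: list[str] = []
    for step in steps:
        for field_name in REQUIRED_FIELDS:
            if not getattr(step, field_name).strip():
                problems.append(f"{step.title}: missing {field_name}")
    return problems


cost_spike = [
    StepDoc(
        title="enable dynamic throttling",
        command="flagctl set llm.throttle.dynamic=on",  # hypothetical CLI
        dashboard="https://dashboards.example.com/llm-cost",  # placeholder URL
        switch="llm.throttle.dynamic",  # hypothetical switch name
    ),
    StepDoc(
        title="switch low-sensitivity surfaces to a smaller model",
        switch="llm.model_tier.low_sensitivity",  # hypothetical switch name
    ),
]

if __name__ == "__main__":
    for problem in lint_playbook(cost_spike):
        print(problem)
```

Here the second step is flagged for its missing command and dashboard. Teams whose steps do not all flip a switch may prefer to relax `REQUIRED_FIELDS` per step type.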