Operational Playbooks

Playbooks describe repeatable responses to common operational scenarios. Keep them short, sequential, and verifiable.

Examples

Provider degradation: fail over to the backup model tier; temporarily relax timeouts.
Cost spike: enable dynamic throttling; switch low-sensitivity surfaces to a smaller model.
Safety regression: freeze promotions; revert to the last known good prompt version.
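
One way to keep playbooks like these short, sequential, and verifiable is to store each one as structured data and render it as a numbered list. The sketch below is illustrative only: the `Step` and `Playbook` types, the dictionary keys, and the `verify` placeholders are assumptions made for this example, not an existing tool or schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: str  # what the on-call engineer does
    verify: str  # how to confirm the step worked (placeholder text here)


@dataclass(frozen=True)
class Playbook:
    scenario: str
    steps: tuple  # ordered sequence of Step objects


# Actions mirror the examples above; verify strings are placeholders to fill in.
PLAYBOOKS = {
    "provider_degradation": Playbook(
        "Provider degradation",
        (
            Step("Fail over to the backup model tier", "<check to fill in>"),
            Step("Temporarily relax timeouts", "<check to fill in>"),
        ),
    ),
    "cost_spike": Playbook(
        "Cost spike",
        (
            Step("Enable dynamic throttling", "<check to fill in>"),
            Step("Switch low-sensitivity surfaces to a smaller model", "<check to fill in>"),
        ),
    ),
    "safety_regression": Playbook(
        "Safety regression",
        (
            Step("Freeze promotions", "<check to fill in>"),
            Step("Revert to the last known good prompt version", "<check to fill in>"),
        ),
    ),
}


def render(playbook: Playbook) -> str:
    """Render a playbook as a title line followed by numbered steps."""
    lines = [playbook.scenario]
    for number, step in enumerate(playbook.steps, start=1):
        lines.append(f"{number}. {step.action} (verify: {step.verify})")
    return "\n".join(lines)
```

Calling `render(PLAYBOOKS["cost_spike"])` yields the scenario name followed by its two steps in order. Pairing every action with a verification keeps the "verifiable" requirement visible while the playbook is being written.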

Approval

Changes to playbooks require sign-off from both the service owner and the incident commander group. Keep a changelog at the top of each document.

Note

Playbooks should be runnable by on-call engineers with limited context. Include the exact commands to run, the dashboards to check, and exact switch names.
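
A lightweight completeness check can flag a playbook that omits any of these before it reaches on-call. This is a minimal sketch under stated assumptions: the field names `commands`, `dashboards`, and `switches` and the dictionary layout are hypothetical, chosen only to mirror the three items listed above.

```python
# Hypothetical field names mirroring "commands, dashboards, and exact switch names".
REQUIRED_FIELDS = ("commands", "dashboards", "switches")


def missing_fields(playbook: dict) -> list:
    """Return the required fields that are absent or empty in a playbook."""
    return [name for name in REQUIRED_FIELDS if not playbook.get(name)]


# Placeholder entry: the dashboards list is empty, so it should be flagged.
draft = {
    "commands": ["<exact command>"],
    "dashboards": [],
    "switches": ["<exact switch name>"],
}

print(missing_fields(draft))
```

A check like this could run when a playbook change is submitted for sign-off, so incomplete drafts are caught before an incident rather than during one.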
