Alerting in production systems
Anyone who has been (or is going) through the wringer of a noisy pager rotation may enjoy this talk, delivered at intercom.io in Dublin on July 1st:
Pager Bound: high-signal alerting in production systems.
- Good alerting is something that needs to be designed: organic growth tends to not go so well;
- page, ticket or log: eliminate email alerts;
- if we must be paged out of bed, it should be for something that really needs human attention;
- we can only handle ~2 events well per shift;
- service-level objectives are a really useful way to orient our alerting to customer experience & business priorities;
- page on the symptom as it relates to our SLOs, not the cause.
Following Rob Ewaschuk’s philosophy on alerting.