Run & Operations
Make the run phase observable and actionable: baselines, runbooks, ownership, and operational routines packaged as reusable modules.
Platform Engineering isn’t only about “shipping”. At scale, much of the cost lies in operating what has been shipped: incidents, drift, unclear ownership, and runbooks that exist only as tribal knowledge.
Argy’s goal is to productize the run phase:
- the platform team defines standards once,
- developers consume them by default,
- outcomes stay observable.
What “run automation” means in Argy
Observability baselines
Instead of asking every team to reinvent dashboards and alert rules, a golden path can bundle:
- a baseline dashboard set,
- actionable alerting rules,
- SLO targets and error budget signals.
This keeps reliability practices consistent across services and shortens onboarding for new ones.
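To make "SLO targets and error budget signals" concrete, here is a minimal sketch of the arithmetic such a baseline could rely on. The class and function names (SloBaseline, burn_rate, budget_remaining) are hypothetical and are not part of Argy's API; the formulas are the standard error budget definitions (budget = 1 - target).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloBaseline:
    """Hypothetical SLO baseline a golden path could ship by default."""
    name: str
    target: float  # e.g. 0.999 means 99.9% of requests must succeed

    @property
    def error_budget(self) -> float:
        # Fraction of requests allowed to fail over the SLO window.
        return 1.0 - self.target


def burn_rate(slo: SloBaseline, failed: int, total: int) -> float:
    """Observed error rate divided by the budget; 1.0 means exactly on budget."""
    if total == 0:
        return 0.0
    return (failed / total) / slo.error_budget


def budget_remaining(slo: SloBaseline, failed: int, total: int) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    if total == 0:
        return 1.0
    allowed_failures = slo.error_budget * total
    return 1.0 - failed / allowed_failures
```

With a 99.9% target, 5 failures in 1,000 requests gives a burn rate of roughly 5, meaning the budget is being spent about five times faster than the target allows. An alerting rule can key on that ratio rather than on a raw error count, which is what makes the alert actionable across services with different traffic.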
Runbooks as part of the product
Runbooks should not live on a separate wiki page. They should be:
- linked to the service,
- versioned with the golden path,
- written for the most common failure modes.
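One way to keep runbooks tied to the service is to check that every alert rule points at a runbook that actually ships with the golden path. The sketch below assumes a hypothetical AlertRule record and a set of runbook paths; neither reflects a real Argy schema.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical alert rule as a golden path might declare it."""
    name: str
    runbook_path: Optional[str] = None  # assumed path inside the golden path repo


def alerts_missing_runbooks(alerts: List[AlertRule], runbooks: Set[str]) -> List[str]:
    """Return the names of alerts with no runbook, or with a dangling reference."""
    return [
        alert.name
        for alert in alerts
        if alert.runbook_path is None or alert.runbook_path not in runbooks
    ]
```

Run as part of the golden path's own validation, a check like this fails the release when an alert is added without its runbook, so the two stay versioned together.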
Ownership and escalation
Operational automation only works if ownership is explicit:
- who owns the product,
- who owns the platform components,
- how escalation works.
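Explicit ownership can be treated as data that is validated before a service goes live. The following is a sketch under assumed names (Ownership, ownership_gaps, next_escalation); the fields mirror the three questions above and do not describe Argy's actual model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Ownership:
    """Hypothetical ownership record attached to a service."""
    service: str
    product_owner: str   # team that owns the product
    platform_owner: str  # team that owns the platform components
    escalation: List[str] = field(default_factory=list)  # ordered contacts


def ownership_gaps(record: Ownership) -> List[str]:
    """List the ownership fields that are still empty."""
    gaps = []
    if not record.product_owner.strip():
        gaps.append("product_owner")
    if not record.platform_owner.strip():
        gaps.append("platform_owner")
    if not record.escalation:
        gaps.append("escalation")
    return gaps


def next_escalation(record: Ownership, level: int) -> str:
    """Contact for an escalation level; the last contact absorbs any overflow."""
    if not record.escalation:
        raise ValueError(f"{record.service} has no escalation path")
    index = min(max(level, 0), len(record.escalation) - 1)
    return record.escalation[index]
```

A non-empty result from ownership_gaps can block provisioning, which turns "ownership is explicit" from a convention into a checked precondition.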
Operational routines
Examples of routines that can be standardized:
- incident response checklist,
- rollbacks and safe deploy patterns,
- postmortem templates,
- cost/FinOps baselines.
Why it matters
Run automation is not about removing humans. It’s about:
- reducing “support debt”,
- making systems easier to operate,
- making governance measurable.
Next steps
- Security and guardrails: Policies & Guardrails
- How modules package run baselines: Building Modules
- Outcome-driven scenarios: Use cases