Observability & Reliability Engineering
Complete observability & reliability engineering system. Use when designing monitoring, implementing structured logging, setting up distributed tracing, buil...
Comprehensive SRE platform enabling SLO definition, reliability assessment, incident response, chaos engineering, and error budget management without externa...
This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.
Install skill "SRE & Incident Management Platform" with this command: npx skills add afrexai-sre-platform
Related skills, matched by shared tags or category signals:
ITIL-aligned incident, problem, and change management for AI agents. Use when: detecting service crashes, analyzing recurring failures, tracking incidents to...
Operational tooling for teams running local LLM infrastructure. Request tracing with full scoring breakdowns, per-application usage analytics via request tag...
Automates logging of deployments, incidents, changes, and decisions into a searchable ops journal with incident timelines and postmortem generation.