Clay Incident Runbook

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "clay-incident-runbook" with this command: npx skills add jeremylongshore/claude-code-plugins-plus-skills/jeremylongshore-claude-code-plugins-plus-skills-clay-incident-runbook

Overview

Rapid incident response procedures for Clay-related outages.

Prerequisites

  • Access to Clay dashboard and status page

  • kubectl access to production cluster

  • Prometheus/Grafana access

  • Communication channels (Slack, PagerDuty)

Severity Levels

| Level | Definition | Response Time | Examples |
|-------|------------|---------------|----------|
| P1 | Complete outage | < 15 min | Clay API unreachable |
| P2 | Degraded service | < 1 hour | High latency, partial failures |
| P3 | Minor impact | < 4 hours | Webhook delays, non-critical errors |
| P4 | No user impact | Next business day | Monitoring gaps |
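The severity levels above can be encoded as data so that tooling and humans share one definition. The following is a minimal sketch, not part of the upstream skill: the `Severity` class, the `classify` helper, and its three boolean inputs are hypothetical names chosen for illustration, and the mapping from symptoms to levels is an assumption based on the examples column.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class Severity:
    level: str
    definition: str
    # None stands for "next business day", which has no fixed duration.
    response_within: Optional[timedelta]


# Values copied from the severity table in this runbook.
SEVERITIES = {
    "P1": Severity("P1", "Complete outage", timedelta(minutes=15)),
    "P2": Severity("P2", "Degraded service", timedelta(hours=1)),
    "P3": Severity("P3", "Minor impact", timedelta(hours=4)),
    "P4": Severity("P4", "No user impact", None),
}


def classify(api_reachable: bool, degraded: bool, user_impact: bool) -> Severity:
    """Pick a severity from coarse symptoms (hypothetical mapping)."""
    if not api_reachable:
        return SEVERITIES["P1"]
    if degraded:
        return SEVERITIES["P2"]
    if user_impact:
        return SEVERITIES["P3"]
    return SEVERITIES["P4"]


if __name__ == "__main__":
    incident = classify(api_reachable=True, degraded=True, user_impact=True)
    print(incident.level, incident.definition, incident.response_within)
```

Checking the most severe condition first means an incident that matches several rows is always assigned the highest applicable level.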

Instructions

Step 1: Quick Triage

Check the Clay status page, your integration's health endpoint, error-rate metrics, and recent pod logs.
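One way to keep triage fast is to run every check through a single helper that never aborts on the first failure. The sketch below assumes each check is supplied as a zero-argument callable returning `True` when healthy; `run_triage` and the check names are hypothetical, and the real checks (HTTP calls, metric queries, log fetches) are left to the caller.

```python
from typing import Callable, Dict


def run_triage(checks: Dict[str, Callable[[], bool]]) -> Dict[str, str]:
    """Run every check and record 'ok', 'failing', or the error raised."""
    results: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "failing"
        except Exception as exc:  # a broken check must not stop the others
            results[name] = f"error: {exc}"
    return results


if __name__ == "__main__":
    # Stub checks for illustration only; replace with real probes.
    report = run_triage({
        "clay_status_page": lambda: True,
        "integration_health": lambda: False,
    })
    for name, outcome in report.items():
        print(f"{name}: {outcome}")
```

Capturing exceptions as results keeps the report complete even when a dependency such as the metrics backend is itself down.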

Step 2: Follow Decision Tree

  • If the Clay API returns errors and status.clay.com shows an incident, enable your fallback and wait for Clay to resolve it.

  • If the Clay API returns errors but no Clay incident is posted, check your credentials and configuration.

  • If there are no API errors but your service is unhealthy, investigate your infrastructure.
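The decision tree in this step can be written as a small pure function. This is a sketch under stated assumptions: the function name and its three boolean inputs are hypothetical, and the final "no action" branch is not in the runbook and is added only so the function is total.

```python
def next_action(api_errors: bool, clay_incident: bool, service_healthy: bool) -> str:
    """Return the runbook branch that matches the observed symptoms."""
    if api_errors:
        if clay_incident:
            return "Enable fallback and wait for Clay to resolve the incident"
        return "Check credentials and configuration"
    if not service_healthy:
        return "Investigate infrastructure"
    # Assumption: nothing in the runbook covers the all-healthy case.
    return "No action required"


if __name__ == "__main__":
    print(next_action(api_errors=True, clay_incident=False, service_healthy=True))
```

Keeping the tree free of I/O makes it easy to unit test and to reuse from a chat bot or an on-call script.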

Step 3: Execute Immediate Actions

  • 401/403: Verify API key in secrets, update if rotated, restart pods

  • 429: Check rate limit headers, enable request queuing

  • 500/503: Enable graceful degradation, monitor Clay status
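The status-code actions above lend themselves to a lookup table. The following is a minimal sketch: `IMMEDIATE_ACTIONS` and `immediate_actions` are hypothetical names, the action text is taken from the bullets, and returning an empty list for unlisted codes is an assumption.

```python
from typing import Dict, List

_AUTH = ["Verify API key in secrets", "Update the key if rotated", "Restart pods"]
_RATE_LIMIT = ["Check rate limit headers", "Enable request queuing"]
_SERVER = ["Enable graceful degradation", "Monitor Clay status"]

# HTTP status code -> ordered list of immediate actions from the runbook.
IMMEDIATE_ACTIONS: Dict[int, List[str]] = {
    401: _AUTH,
    403: _AUTH,
    429: _RATE_LIMIT,
    500: _SERVER,
    503: _SERVER,
}


def immediate_actions(status_code: int) -> List[str]:
    """Return a copy of the actions for a status code, or [] if unlisted."""
    return list(IMMEDIATE_ACTIONS.get(status_code, []))


if __name__ == "__main__":
    for step in immediate_actions(429):
        print("-", step)
```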

Step 4: Communicate Status

Post to the internal Slack channel with the severity, impact, current action, and next update time. Update the external status page with a description of the user-facing impact.
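A consistent message layout makes internal updates easier to scan. The sketch below only formats text and does not call Slack; the function name, the field labels, and the choice of UTC for the next update time are assumptions, not the upstream communication template.

```python
from datetime import datetime, timedelta, timezone


def format_status_update(
    severity: str, impact: str, current_action: str, next_update: datetime
) -> str:
    """Build an internal status message with the four fields the runbook lists."""
    return (
        f"[{severity}] Clay incident\n"
        f"Impact: {impact}\n"
        f"Current action: {current_action}\n"
        f"Next update: {next_update.astimezone(timezone.utc):%H:%M} UTC"
    )


if __name__ == "__main__":
    message = format_status_update(
        severity="P2",
        impact="High latency on Clay API calls",
        current_action="Request queuing enabled",
        next_update=datetime.now(timezone.utc) + timedelta(minutes=30),
    )
    print(message)
```

Pass a timezone-aware `datetime`; a naive value would be interpreted as local time by `astimezone`.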

For complete triage scripts, remediation commands, communication templates, and postmortem template, load the reference guide: Read(${CLAUDE_SKILL_DIR}/references/implementation-guide.md)

Output

  • Issue identified and categorized

  • Remediation applied

  • Stakeholders notified

  • Evidence collected for postmortem

Error Handling

| Issue | Cause | Solution |
|-------|-------|----------|
| Can't reach status page | Network issue | Use mobile or VPN |
| kubectl fails | Auth expired | Re-authenticate |
| Metrics unavailable | Prometheus down | Check backup metrics |
| Secret rotation fails | Permission denied | Escalate to admin |

Resources

  • Clay Status Page

  • Clay Support

Next Steps

For data handling, see clay-data-handling.

Examples

Basic usage: Apply the Clay incident runbook to a standard project setup with the default configuration.

Advanced scenario: Customize the Clay incident runbook for production environments with multiple constraints and team-specific requirements.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
