DevOps

Automate deployments, manage infrastructure, and build reliable CI/CD pipelines.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.


Install skill "DevOps" with this command: npx skills add ivangdavila/devops

DevOps Rules

CI/CD Pipelines

  • Fail fast: run linting and unit tests before expensive integration tests — saves time and compute
  • Cache dependencies between runs — npm install on every build wastes minutes
  • Pin actions to a full commit SHA, not a tag: a tag such as actions/checkout@v3 can be moved to point at different code, while a commit SHA is immutable
  • Secrets in environment variables, never in code or logs — mask them in CI output
  • Parallel jobs for independent steps — test, lint, and build can run simultaneously
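
The SHA-pinning rule is easy to enforce in CI itself. Below is a minimal sketch, assuming GitHub Actions workflow files in which steps reference actions as uses: owner/repo@ref. The function name find_unpinned_actions is hypothetical, and the line-based regex is a simplification of proper YAML parsing.

```python
import re

# Matches "uses: owner/repo@ref" (optionally quoted, optionally a list item).
# A line-based regex is a simplification; a real linter would parse the YAML.
USES_RE = re.compile(r"^\s*-?\s*uses:\s*['\"]?([^'\"\s#@]+)@([^'\"\s#]+)")
# A full Git commit SHA is exactly 40 lowercase hexadecimal characters.
FULL_SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def find_unpinned_actions(workflow_text: str) -> list[str]:
    """Return 'action@ref' strings whose ref is a tag or branch, not a full SHA."""
    unpinned = []
    for line in workflow_text.splitlines():
        match = USES_RE.match(line)
        if match is None:
            continue  # not a "uses:" line, or a local action without "@ref"
        action, ref = match.groups()
        if action.startswith("docker://"):
            continue  # container images are pinned by digest, a different scheme
        if not FULL_SHA_RE.match(ref):
            unpinned.append(f"{action}@{ref}")
    return unpinned
```

Running it over each file under .github/workflows and failing the job when the list is non-empty turns the rule into a check rather than a convention. Keeping the tag as a trailing comment after the SHA (for example # v4) preserves the human-readable version.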

Deployment Strategies

  • Blue-green: run new version alongside old, switch traffic atomically — instant rollback by switching back
  • Canary: route a small percentage of traffic to the new version, catching issues before full rollout
  • Rolling: update instances incrementally — balance between speed and risk
  • Always have a rollback plan before deploying: know exactly how to revert
  • Deploy the same artifact to all environments — build once, promote through stages
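
Canary routing is commonly done by hashing a stable identifier, so a given user keeps seeing the same version while the percentage is ramped up. A minimal sketch, assuming each request carries a user ID; bucket and choose_version are hypothetical names, and in practice this weighting usually lives in a load balancer or service mesh rather than in application code.

```python
import hashlib


def bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Same input always yields the same digest, hence the same bucket.
    return int.from_bytes(digest[:8], "big") % 100


def choose_version(user_id: str, canary_percent: int) -> str:
    """Send roughly canary_percent% of users to the canary, the rest to stable."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return "canary" if bucket(user_id) < canary_percent else "stable"
```

Because a user's bucket never changes, raising canary_percent from 10 to 50 only adds users to the canary; nobody flips back and forth between versions mid-rollout. Setting it back to 0 routes everyone to the stable version.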

Infrastructure as Code

  • Version control all infrastructure: Terraform, Ansible, and CloudFormation definitions belong in git
  • Never apply changes without plan/diff review — terraform plan before apply
  • State files can contain secrets in plain text: store them remotely with encryption, never in git
  • Modules for reusable components — don't copy-paste infrastructure definitions
  • Separate environments with workspaces or directories — dev changes shouldn't affect prod
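
Plan review can be backed by a mechanical check on the machine-readable plan. A minimal sketch, assuming the plan was saved with terraform plan -out=tfplan and converted with terraform show -json tfplan, whose JSON output lists resource_changes entries with a change.actions array. The helper destructive_changes is hypothetical; it only flags deletions and replacements for a human to look at and does not replace reading the plan.

```python
import json


def destructive_changes(plan_json: str) -> list[str]:
    """Return addresses of resources the plan would delete or replace."""
    plan = json.loads(plan_json)
    flagged = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        # A plain delete is ["delete"]; a replacement is ["delete", "create"]
        # or ["create", "delete"]. Either way, something gets destroyed.
        if "delete" in actions:
            flagged.append(rc["address"])
    return flagged
```

A pipeline could feed it the output of terraform show -json tfplan, require explicit approval whenever the list is non-empty, and then run terraform apply tfplan so that exactly the reviewed plan is applied.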

Containers

  • One process per container — containers are not VMs
  • Health checks are mandatory — orchestrators need them for routing and restarts
  • Don't run as root — use non-root USER in Dockerfile
  • Immutable images: config via environment, not baked in — same image in all environments
  • Tag images with git SHA, not just latest — know exactly what's deployed
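
A health check only helps if the orchestrator can query it. A minimal sketch of an HTTP health endpoint using only Python's standard library. The /healthz path and the port are assumptions (use whatever the orchestrator's probe is configured to call), and is_healthy is a hypothetical placeholder for real dependency checks.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def is_healthy() -> bool:
    """Hypothetical placeholder: check database connections, queues, etc. here."""
    return True


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/healthz":
            self.send_error(404)
            return
        ok = is_healthy()
        body = b"ok" if ok else b"unhealthy"
        # 200 means healthy; a non-2xx status (here 503) marks the instance unhealthy.
        self.send_response(200 if ok else 503)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence per-request access logs; probes hit this endpoint constantly.
        pass


def make_server(port: int = 8080) -> HTTPServer:
    # Bind all interfaces: probes usually arrive on the container's network
    # address, not on loopback.
    return HTTPServer(("0.0.0.0", port), HealthHandler)
```

Start it with make_server().serve_forever(), or add an equivalent route to the service's existing web framework; the orchestrator's probe (for example a Kubernetes livenessProbe, or a Docker HEALTHCHECK that runs curl) then calls /healthz.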

Secrets Management

  • Never store secrets in environment files committed to git — use vault, sealed secrets, or CI secret storage
  • Rotate secrets regularly — automation makes rotation painless
  • Different secrets per environment — dev leak shouldn't compromise prod
  • Audit secret access — know who accessed what and when
  • Secrets in memory, not disk when possible — temp files persist longer than expected
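
Even when secrets are held only in memory, they can still leak through log output. A minimal sketch of the CI-style masking described earlier, applied to application logs with a standard logging.Filter. RedactSecrets is a hypothetical class, and literal matching only hides values it has been given, not secrets it does not know about.

```python
import logging


class RedactSecrets(logging.Filter):
    """Replace known secret values in log messages with a fixed mask."""

    def __init__(self, secrets, mask="***"):
        super().__init__()
        # Drop empty values so an unset secret doesn't mask every message.
        self.secrets = [s for s in secrets if s]
        self.mask = mask

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merge msg and args first
        for secret in self.secrets:
            message = message.replace(secret, self.mask)
        # Store the redacted text and clear args so it isn't formatted again.
        record.msg = message
        record.args = None
        return True  # keep the record, just redacted
```

For example, handler.addFilter(RedactSecrets([os.environ.get("DB_PASSWORD", "")])), where DB_PASSWORD is a hypothetical variable name. Attach the filter to handlers rather than loggers, because a logger-level filter does not see records propagated up from child loggers. Tracebacks attached with exc_info are not redacted by this sketch.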

Monitoring & Alerting

  • Four golden signals: latency, traffic, errors, saturation — start here
  • Alert on symptoms, not causes — "users seeing errors" not "CPU high"
  • Every alert must be actionable — if you can't do anything, it's noise
  • Dashboard per service with key metrics — one glance shows health
  • Structured logs (JSON) for machine parsing — grep works, but queries are better
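
Python's logging module can emit structured logs with a small custom formatter. A minimal sketch; JsonFormatter is a hypothetical class, and passing structured data through extra={"fields": {...}} is a convention assumed by this sketch, not a logging standard.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line so log tools can query fields."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "time": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Merge structured fields passed as logger.info(..., extra={"fields": {...}}).
        entry.update(getattr(record, "fields", {}))
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)
```

With handler.setFormatter(JsonFormatter()), a call such as logger.info("order placed", extra={"fields": {"order_id": 42}}) produces a single JSON line, so a log backend can filter on order_id instead of grepping message text.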

Reliability

  • Define SLOs before building alerting — what does "healthy" mean for this service?
  • Error budgets: some failures are acceptable; a 99.9% availability target allows roughly 8.76 hours of downtime per year (0.1% of 8,760 hours)
  • Chaos engineering in staging — break things intentionally before prod breaks accidentally
  • Runbooks for common incidents — 3am is not the time to figure out recovery steps
  • Post-mortems without blame — focus on systems, not people
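
The error-budget arithmetic is the complement of the SLO applied to a time window or to a request count; for 99.9% over a 365-day year that is 0.001 × 8,760 h = 8.76 h. A minimal sketch; both function names are hypothetical.

```python
def allowed_downtime_hours(slo: float, window_days: float = 365.0) -> float:
    """Hours of full outage an availability SLO allows over the window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1, e.g. 0.999")
    return (1.0 - slo) * window_days * 24.0


def remaining_budget_fraction(slo: float, total_requests: int, failed_requests: int) -> float:
    """Share of a request-based error budget still unspent (negative means overspent)."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1, e.g. 0.999")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    budget = (1.0 - slo) * total_requests  # number of failures the SLO tolerates
    return (budget - failed_requests) / budget
```

Over a 30-day window the same 99.9% target leaves only 0.72 hours (43.2 minutes). A team might use remaining_budget_fraction as a release gate: ship freely while it is positive, and shift effort to reliability work once it goes negative.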

Common Mistakes

  • SSH into prod to fix things — all changes through automation, or you'll forget what you did
  • No staging environment — "works on my machine" doesn't mean works in prod
  • Ignoring flaky tests — they erode trust in CI, either fix or delete
  • Manual steps in deployment — if it's not automated, it'll be done wrong eventually
  • Monitoring only happy paths — check error rates and edge cases too
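
Fixing or deleting flaky tests starts with finding them. A test that both passed and failed on the same commit is almost certainly flaky, since the code under test did not change between runs. A minimal sketch, assuming CI results can be exported as (commit, test name, passed) records; the record shape and find_flaky_tests are hypothetical.

```python
from collections import defaultdict


def find_flaky_tests(results):
    """Return sorted names of tests with both a pass and a fail on the same commit.

    `results` is an iterable of (commit_sha, test_name, passed) tuples.
    """
    outcomes = defaultdict(set)  # (commit, test) -> set of observed outcomes
    for commit, test, passed in results:
        outcomes[(commit, test)].add(bool(passed))
    # Seeing both True and False for one (commit, test) pair means nondeterminism.
    return sorted({test for (_, test), seen in outcomes.items() if len(seen) == 2})
```

A test that fails on every run of a commit is a genuine failure rather than flakiness, so it is deliberately not reported here.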

Networking

  • Internal services don't need public IPs — use private subnets, expose only load balancers
  • TLS everywhere, including internal traffic — zero trust, even behind firewall
  • DNS for service discovery — hardcoded IPs break when things move
  • Load balancer health checks separate from app health — LB needs fast response, app health can be thorough
  • Firewall default deny — explicitly allow what's needed, block everything else
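
A default-deny policy reduces to "drop unless some rule explicitly matches". A minimal sketch of that evaluation logic; AllowRule and is_allowed are hypothetical and model only source CIDR, destination port, and protocol, whereas real firewalls and cloud security groups also handle direction, port ranges, and connection state.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class AllowRule:
    source: str  # CIDR block, e.g. "10.0.0.0/16"
    port: int
    protocol: str = "tcp"


def is_allowed(rules, src_ip: str, port: int, protocol: str = "tcp") -> bool:
    """Default deny: permit traffic only if some rule explicitly matches it."""
    addr = ipaddress.ip_address(src_ip)
    for rule in rules:
        if (
            rule.protocol == protocol
            and rule.port == port
            and addr in ipaddress.ip_network(rule.source)
        ):
            return True
    return False  # nothing matched, so the traffic is dropped
```

For example, rules allowing 443 from anywhere (for the load balancer) and 5432 only from a private 10.0.0.0/16 range leave SSH on port 22 closed without any explicit deny rule.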

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

Power Automate Monitoring

**Pro+ subscription required.** Tenant-wide Power Automate flow health monitoring, failure rate analytics, and asset inventory using the FlowStudio MCP cache...

General

Power Automate Governance

Govern Power Automate flows and Power Apps at scale using the FlowStudio MCP cached store. Classify flows by business impact, detect orphaned resources, audi...

General

Secretary Memory

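OpenClaw secretary-style multi-partition memory system v3.0. Modeled on a modern secretary's notebook classification method, it supports: (1) concurrent multi-partition search + 3 context recalls per partition, (2) automatic session summaries, (3) automatic preference extraction + user relationship graph, (4) proactive memory-conflict detection, (5) scheduled consolidation + session-end hook, (6) fine-grained recovery/backtracking,...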

General

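Ops Assistant v2.0

Ops Assistant v2.0 - supports local, remote, and multi-server cluster monitoring (health checks, log analysis, performance monitoring, batch operations, file transfer)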
