DevOps Engineer

CI/CD pipeline design, optimization, and deployment strategy. Six modes: generate workflows, generate GitHub Action steps or jobs, optimize build times, design deployment strategies, review existing pipelines, and debug CI failures.

Scope: CI/CD pipelines and deployment automation only. NOT for infrastructure provisioning (infrastructure-coder), application code, monitoring setup, or database migrations (database-architect).

Canonical Vocabulary

Use these terms exactly throughout all modes:

Term	Definition
workflow	A CI/CD pipeline definition file (.github/workflows/*.yml, .gitlab-ci.yml)
job	A named unit of work within a workflow containing one or more steps
step	A single action within a job (run command, uses action)
stage	A logical grouping of jobs (build, test, deploy)
artifact	Build output passed between jobs or stages
cache	Dependency/build cache persisted across runs to reduce build time
matrix	Parameterized job expansion across multiple configurations
concurrency group	Mutual exclusion mechanism preventing parallel runs
environment	Deployment target with protection rules (staging, production)
promotion	Moving artifacts through environments (dev -> staging -> prod)
rollback	Reverting a deployment to a previous known-good state
canary	Incremental traffic shift to new version (1% -> 5% -> 25% -> 100%)
blue/green	Two identical environments with instant traffic switch
rolling	Gradual instance-by-instance replacement
gate	Manual or automated approval checkpoint before deployment proceeds
runner	Execution environment for CI/CD jobs (GitHub-hosted, self-hosted)
reusable workflow	Callable workflow template invoked from other workflows
composite action	Multi-step action packaged as a single reusable unit

Dispatch

$ARGUMENTS	Mode
`pipeline <requirements>`	Generate: new CI/CD workflow from requirements
`action <description>`	Action: GitHub Action step/job generation
`optimize <workflow>`	Optimize: pipeline build time optimization
`deploy <strategy>`	Deploy: deployment strategy design
`review <workflow>`	Review: audit existing pipeline
`debug <logs>`	Debug: analyze CI failure logs
Natural language about CI/CD	Auto-detect appropriate mode
Empty	Show mode menu with examples
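
The dispatch table above can be read as a small routing function. The following is a minimal sketch, not the skill's actual implementation: the names `MODES` and `dispatch` and the `"Menu"` and `"Auto-detect"` labels are hypothetical, and keyword matching is assumed to be case-insensitive.

```python
# Hypothetical mapping from the first word of $ARGUMENTS to a mode name.
MODES = {
    "pipeline": "Generate",
    "action": "Action",
    "optimize": "Optimize",
    "deploy": "Deploy",
    "review": "Review",
    "debug": "Debug",
}


def dispatch(arguments: str) -> tuple[str, str]:
    """Return (mode, remainder) for a raw $ARGUMENTS string."""
    text = arguments.strip()
    if not text:
        # Empty input: show the mode menu with examples.
        return ("Menu", "")
    keyword, _, rest = text.partition(" ")
    mode = MODES.get(keyword.lower())
    if mode is not None:
        return (mode, rest.strip())
    # No recognized keyword: treat the whole input as natural language.
    return ("Auto-detect", text)
```

Under these assumptions, `dispatch("review ci.yml")` routes to Review with `ci.yml` as the remainder, and free-form text falls through to auto-detection.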

Mode 1: Generate (`pipeline`)

Design and generate CI/CD workflow files from requirements.

Steps

Gather requirements -- language, framework, test suite, deployment targets, branch strategy
Select platform -- GitHub Actions (default), GitLab CI, or both
Load patterns -- read references/github-actions-patterns.md or references/gitlab-ci-patterns.md
Design structure -- jobs, stages, dependencies, triggers, caching strategy
Generate workflow -- complete YAML file with inline comments explaining non-obvious choices
Validate -- run `uv run python skills/devops-engineer/scripts/workflow-analyzer.py <file>` on the generated output

Output

Complete workflow YAML file written to the appropriate location.

Mode 2: Action (`action`)

Generate individual GitHub Action steps or jobs.

Parse description -- what the action should accomplish
Load patterns -- read references/github-actions-patterns.md
Generate -- step or job YAML with correct uses, with, env configuration
Context check -- if an existing workflow is referenced, read it and integrate the new action

Output: YAML snippet ready for insertion into a workflow file.

Mode 3: Optimize (`optimize`)

Analyze and optimize pipeline build times.

Analysis

Analyze -- run `uv run python skills/devops-engineer/scripts/workflow-analyzer.py <workflow>`
Estimate costs -- run `uv run python skills/devops-engineer/scripts/pipeline-cost-estimator.py <workflow>`
Load techniques -- read references/pipeline-optimization.md

Optimization Opportunities

Identify opportunities:
- Missing caches (dependency, build artifact, Docker layer)
- Sequential jobs that could run in parallel
- Missing matrix strategy for multi-version testing
- Unnecessary full checkouts (use sparse-checkout or shallow clone)
- Redundant steps across jobs
- Missing path filters for selective runs
- Oversized runner for lightweight tasks
Present plan -- ranked optimization recommendations with estimated time savings
Implement -- apply approved optimizations to the workflow file
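
The "Present plan" step asks for ranked recommendations, each with an estimated time saving. One way to sketch that ranking is shown below. The `Recommendation` type and both function names are hypothetical, and the estimates are assumed to be supplied by whoever performs the analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    title: str
    # Required field: every recommendation must carry an estimate.
    estimated_savings_seconds: int


def rank(recommendations: list[Recommendation]) -> list[Recommendation]:
    """Order recommendations from largest to smallest estimated saving."""
    return sorted(
        recommendations,
        key=lambda rec: rec.estimated_savings_seconds,
        reverse=True,
    )


def format_plan(recommendations: list[Recommendation]) -> str:
    """Render the ranked plan as a numbered list, one recommendation per line."""
    lines = []
    for position, rec in enumerate(rank(recommendations), start=1):
        minutes, seconds = divmod(rec.estimated_savings_seconds, 60)
        lines.append(f"{position}. {rec.title} (est. saving: {minutes}m {seconds:02d}s)")
    return "\n".join(lines)
```

Making the estimate a required field means a recommendation without one cannot be constructed, which mirrors the rule that every optimization recommendation includes estimated time savings.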

Mode 4: Deploy (`deploy`)

Design deployment strategies with rollback plans.

Assess requirements -- uptime SLA, rollback speed, traffic management capability
Load strategies -- read references/deployment-strategies.md
Recommend strategy -- blue/green, canary, or rolling based on requirements

Factor	Blue/Green	Canary	Rolling
Rollback speed	Instant	Fast	Slow
Resource cost	2x	1.1-1.5x	1x
Risk exposure	None (pre-switch)	Gradual	Gradual
Complexity	Medium	High	Low
Best for	Critical services	High-traffic APIs	Cost-sensitive apps

Generate -- deployment workflow with health checks, gates, and rollback triggers
Document -- runbook with rollback procedure and escalation path
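
The comparison table can be encoded as data and filtered against requirements. This is a rough sketch under stated assumptions. The selection rules are illustrative, not the skill's actual logic. `recommend`, its parameters, and the reading of "1.1-1.5x" as an upper bound of 1.5 are all hypothetical.

```python
# Values transcribed from the comparison table; max_cost is the upper end
# of each resource-cost range (an assumption for ranges such as 1.1-1.5x).
STRATEGIES = {
    "blue/green": {"rollback": "instant", "max_cost": 2.0, "complexity": "medium"},
    "canary": {"rollback": "fast", "max_cost": 1.5, "complexity": "high"},
    "rolling": {"rollback": "slow", "max_cost": 1.0, "complexity": "low"},
}

ROLLBACK_RANK = {"slow": 0, "fast": 1, "instant": 2}


def recommend(min_rollback: str, cost_budget: float, supports_weighted_traffic: bool) -> list[str]:
    """Return the strategies that satisfy the constraints, cheapest first."""
    candidates = []
    for name, traits in STRATEGIES.items():
        if ROLLBACK_RANK[traits["rollback"]] < ROLLBACK_RANK[min_rollback]:
            continue  # rollback is slower than required
        if traits["max_cost"] > cost_budget:
            continue  # exceeds the resource budget multiplier
        if name == "canary" and not supports_weighted_traffic:
            continue  # canary relies on incremental traffic shifting
        candidates.append(name)
    return sorted(candidates, key=lambda n: STRATEGIES[n]["max_cost"])
```

An empty result means no strategy fits the stated constraints, which is a signal to revisit the requirements rather than to pick a default.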

Mode 5: Review (`review`)

Audit an existing CI/CD pipeline for issues and improvements.

Audit Process

Read workflow -- parse the target workflow file(s)
Analyze -- run `uv run python skills/devops-engineer/scripts/workflow-analyzer.py <workflow>`
Load checklists -- read references/pipeline-review-checklist.md

Evaluation Dimensions

Evaluate dimensions:
- Security: secrets management, permissions scope, unpinned actions, script injection
- Reliability: retry logic, timeout configuration, concurrency handling
- Performance: caching, parallelization, selective triggers
- Maintainability: DRY (reusable workflows/composite actions), readability, documentation
- Cost: runner selection, unnecessary matrix combinations, artifact retention
Present findings -- categorized by severity (critical/warning/info) with fix recommendations
Implement -- apply approved fixes
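
The "Present findings" step groups results by severity and attaches a fix to each one. A minimal sketch of that grouping follows. The `Finding` type and `group_by_severity` are hypothetical names, and only the three severity levels listed above are assumed to be valid.

```python
from dataclasses import dataclass

# Severity levels from the review process, most severe first.
SEVERITIES = ("critical", "warning", "info")


@dataclass(frozen=True)
class Finding:
    severity: str   # one of SEVERITIES
    dimension: str  # e.g. "security", "reliability", "performance"
    message: str
    fix: str        # recommended fix, applied only after user approval


def group_by_severity(findings: list[Finding]) -> dict[str, list[Finding]]:
    """Bucket findings by severity; buckets appear in SEVERITIES order."""
    grouped: dict[str, list[Finding]] = {level: [] for level in SEVERITIES}
    for finding in findings:
        if finding.severity not in grouped:
            raise ValueError(f"unknown severity: {finding.severity!r}")
        grouped[finding.severity].append(finding)
    return grouped
```

Because the function only reads and groups, it fits the read-only nature of Review mode before fixes are approved.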

Mode 6: Debug (`debug`)

Analyze CI failure logs to identify root causes and fixes.

Ingest logs -- read the provided log file or inline content. For large logs (>500 lines), keep the first 50 and the last 200 lines, then sample the middle sections around error patterns
Parse errors -- run `uv run python skills/devops-engineer/scripts/log-parser.py <logfile>`
Load triage protocol -- read references/ci-failure-triage.md
Classify failures by category:

Category	Examples	Common Fixes
dependency	Version conflict, missing package, registry timeout	Pin versions, add retry, use cache
build	Compilation error, type error, out of memory	Fix code, increase runner memory
test	Assertion failure, flaky test, timeout	Fix test, add retry for flaky, increase timeout
lint	Format violation, rule violation	Run formatter, update config
deploy	Permission denied, health check fail, resource limit	Fix permissions, check config, scale resources
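
The category table can be approximated with keyword matching. The sketch below is illustrative only and does not reflect how `log-parser.py` works. The patterns are assumptions derived from the example column, real CI logs vary widely, and the first matching category wins.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical patterns, derived from the "Examples" column of the table.
CATEGORY_PATTERNS = [
    ("dependency", re.compile(r"version conflict|missing package|registry timeout", re.I)),
    ("build", re.compile(r"compilation error|type error|out of memory", re.I)),
    ("test", re.compile(r"assertion|flaky|test timed out", re.I)),
    ("lint", re.compile(r"format violation|rule violation|would reformat", re.I)),
    ("deploy", re.compile(r"permission denied|health check fail|resource limit", re.I)),
]


def classify(line: str) -> Optional[str]:
    """Return the first category whose pattern matches the line, else None."""
    for category, pattern in CATEGORY_PATTERNS:
        if pattern.search(line):
            return category
    return None


def summarize(log_lines: list[str]) -> Counter:
    """Count classified lines per category, ignoring unmatched lines."""
    return Counter(c for c in map(classify, log_lines) if c is not None)
```

A summary like this only narrows the search. Tracing the root cause still means following the error chain back to the originating failure.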

Trace root cause -- follow error chain to the originating failure
Recommend fix -- specific actionable steps with code/config changes
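
The log-ingestion rule in the first step (keep the first 50 and last 200 lines of logs over 500 lines, then sample the middle around error patterns) might look like the following sketch. `ERROR_PATTERN`, the `context` window of two lines, and the function name are assumptions rather than values specified by the skill.

```python
import re

# Assumed error markers; adjust to the CI system's actual log format.
ERROR_PATTERN = re.compile(r"error|failed|fatal|exception", re.IGNORECASE)


def sample_log(lines, head=50, tail=200, threshold=500, context=2):
    """Return (line_number, text) pairs kept for analysis (1-based numbers)."""
    total = len(lines)
    if total <= threshold:
        # Small logs are analyzed in full.
        return list(enumerate(lines, start=1))
    # Always keep the start and the end of the log.
    keep = set(range(head)) | set(range(total - tail, total))
    # In the middle, keep a small window around each line that looks like an error.
    for index in range(head, total - tail):
        if ERROR_PATTERN.search(lines[index]):
            start = max(head, index - context)
            stop = min(total - tail, index + context + 1)
            keep.update(range(start, stop))
    return [(i + 1, lines[i]) for i in sorted(keep)]
```

Returning the original line numbers alongside the text keeps the sampled excerpt traceable back to the full log.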

Reference Files

Load ONE reference at a time. Do not preload all references into context.

File	Content	Read When
`references/github-actions-patterns.md`	Workflow patterns, reusable workflows, composite actions, security hardening	Generate, Action, Review modes
`references/gitlab-ci-patterns.md`	GitLab CI pipeline patterns, includes, rules, environments	Generate mode (GitLab)
`references/deployment-strategies.md`	Blue/green, canary, rolling strategies with comparison and rollback	Deploy mode
`references/pipeline-optimization.md`	Caching, parallelization, selective runs, matrix optimization	Optimize mode
`references/pipeline-review-checklist.md`	Security, reliability, performance, maintainability, cost checklists	Review mode
`references/ci-failure-triage.md`	Error category taxonomy, root cause patterns, fix recipes	Debug mode
`references/artifact-management.md`	Artifact passing, retention, environment promotion patterns	Generate, Deploy modes

Script	When to Run
`scripts/workflow-analyzer.py`	Analyze workflow structure, detect issues, find optimization opportunities
`scripts/pipeline-cost-estimator.py`	Estimate CI minutes and identify cost savings
`scripts/log-parser.py`	Extract actionable errors from CI failure logs

Template	When to Render
`templates/dashboard.html`	After analysis -- inject pipeline health data into the dashboard

Critical Rules

Never generate workflows with unpinned third-party actions -- always use full SHA pins (uses: actions/checkout@<sha>)
Never combine pull_request_target with an actions/checkout of the PR head -- untrusted PR code would run with access to the base repository's secrets and token
Always set explicit permissions block -- never rely on default (overly broad) permissions
Never hardcode secrets in workflow files -- use ${{ secrets.NAME }} or environment variables
Always include a concurrency group for deployment workflows to prevent parallel deploys
Always add timeout-minutes to every job -- prevent runaway jobs consuming quota
Never generate runs-on: self-hosted without explicit user request -- security implications
Always validate generated YAML by running workflow-analyzer.py before presenting
Deployment workflows must include health checks and rollback triggers
Debug mode must truncate/sample large logs (>500 lines) before analysis -- do not load entire CI logs into context
Review mode is read-only until user approves fixes (approval gate)
Load ONE reference file at a time -- do not preload all references into context
Every optimization recommendation must include estimated time savings
Generated workflows must include inline comments explaining non-obvious configuration choices
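
Two of these rules, full SHA pins and an explicit permissions block, lend themselves to a quick text-level check. The sketch below is an assumption-laden illustration, not a description of what `workflow-analyzer.py` does. It scans raw workflow text with regular expressions instead of parsing YAML, and both function names are hypothetical.

```python
import re

# Matches "uses: <ref>" lines, with or without a leading list dash.
USES_LINE = re.compile(r"^\s*(?:-\s*)?uses:\s*(\S+)", re.MULTILINE)
# A full pin ends with "@" followed by a 40-character hexadecimal commit SHA.
FULL_SHA = re.compile(r"@[0-9a-f]{40}$")
PERMISSIONS_KEY = re.compile(r"^\s*permissions:", re.MULTILINE)


def unpinned_actions(workflow_text: str) -> list[str]:
    """Return `uses:` references that are not pinned to a full commit SHA."""
    refs = [ref.strip("'\"") for ref in USES_LINE.findall(workflow_text)]
    # Local actions (./path) live in the same repository and carry no ref.
    return [ref for ref in refs if not ref.startswith("./") and not FULL_SHA.search(ref)]


def has_permissions_block(workflow_text: str) -> bool:
    """Report whether any explicit `permissions:` key appears in the text."""
    return PERMISSIONS_KEY.search(workflow_text) is not None
```

A text scan like this can miss unusual YAML layouts, so it is best treated as a first pass before a structured analysis.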
