DevOps & CI/CD

Complete framework for building automated pipelines that let you ship faster with confidence.

When to Use

Setting up CI/CD for new projects
Improving existing deployment pipelines
Dockerizing applications
Implementing infrastructure as code
Choosing deployment strategies
Setting up monitoring and alerts

Core DevOps Principles

Automate Everything:

If you do it twice, script it
If you script it twice, make it a pipeline
Reduce manual intervention

Shift Left:

Test early
Security early
Quality early
Catch issues before production

Infrastructure as Code:

Version control your infrastructure
Review changes like code
Reproducible environments

Workflow

Step 1: CI/CD Pipeline Design

Pipeline Stages:

COMMIT → BUILD → TEST → SECURITY → DEPLOY → MONITOR

COMMIT ├── Code pushed to GitHub ├── Trigger pipeline └── Validate commit message
BUILD ├── Install dependencies ├── Compile/bundle ├── Build Docker image └── Cache for speed
TEST ├── Lint code ├── Type check ├── Unit tests ├── Integration tests └── Coverage check
SECURITY ├── SAST (static analysis) ├── Dependency scan ├── Secret detection └── Container scan
DEPLOY ├── Deploy to environment ├── Run migrations ├── Health checks └── Smoke tests
MONITOR ├── Performance metrics ├── Error rates ├── Alerting └── Rollback if needed

Step 2: GitHub Actions CI/CD

Complete Pipeline Example:

name: CI/CD

on: push: branches: [main, develop] pull_request: branches: [main]

jobs:

Build and test

build: runs-on: ubuntu-latest

steps:
  - uses: actions/checkout@v4

  - name: Setup Node.js
    uses: actions/setup-node@v4
    with:
      node-version: '20'
      cache: 'npm'

  - name: Install dependencies
    run: npm ci

  - name: Lint
    run: npm run lint

  - name: Type check
    run: npm run typecheck

  - name: Unit tests
    run: npm test -- --coverage

  - name: Build
    run: npm run build

  - name: Upload coverage
    uses: codecov/codecov-action@v3
    with:
      token: ${{ secrets.CODECOV_TOKEN }}

Security scanning

security: runs-on: ubuntu-latest

steps:
  - uses: actions/checkout@v4

  - name: Run Snyk to check for vulnerabilities
    uses: snyk/actions/node@master
    env:
      SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}

  - name: Check for secrets
    uses: trufflesecurity/trufflehog@main
    with:
      path: ./

Build and push Docker image

docker: needs: [build, security] runs-on: ubuntu-latest if: github.event_name == 'push' && github.ref == 'refs/heads/main'

steps:
  - uses: actions/checkout@v4

  - name: Set up Docker Buildx
    uses: docker/setup-buildx-action@v3

  - name: Login to Docker Hub
    uses: docker/login-action@v3
    with:
      username: ${{ secrets.DOCKER_USERNAME }}
      password: ${{ secrets.DOCKER_PASSWORD }}

  - name: Build and push
    uses: docker/build-push-action@v5
    with:
      context: .
      push: true
      tags: myapp/api:${{ github.sha }},myapp/api:latest
      cache-from: type=registry,ref=myapp/api:latest
      cache-to: type=inline

Deploy to staging

deploy-staging: needs: [docker] runs-on: ubuntu-latest environment: staging

steps:
  - name: Deploy to staging
    run: |
      # SSH into staging server and deploy
      ssh ${{ secrets.STAGING_USER }}@${{ secrets.STAGING_HOST }} &#x3C;&#x3C; 'EOF'
        docker pull myapp/api:${{ github.sha }}
        docker-compose up -d
      EOF

  - name: Run E2E tests
    run: npm run test:e2e
    env:
      BASE_URL: https://staging.myapp.com

  - name: Smoke tests
    run: |
      curl -f https://staging.myapp.com/health || exit 1

Deploy to production (manual approval required)

deploy-production: needs: [deploy-staging] runs-on: ubuntu-latest environment: production

steps:
  - name: Deploy to production
    run: |
      ssh ${{ secrets.PROD_USER }}@${{ secrets.PROD_HOST }} &#x3C;&#x3C; 'EOF'
        docker pull myapp/api:${{ github.sha }}
        docker-compose up -d --no-deps api
      EOF

  - name: Health check
    run: |
      curl -f https://myapp.com/health || exit 1

  - name: Notify Slack
    uses: slackapi/slack-github-action@v1
    with:
      webhook-url: ${{ secrets.SLACK_WEBHOOK }}
      payload: |
        {
          "text": "Deployed ${{ github.sha }} to production ✅"
        }

Step 3: Docker Best Practices

Multi-Stage Dockerfile:

Stage 1: Build

FROM node:20-alpine AS builder

WORKDIR /app

Copy package files first (cache optimization)

COPY package*.json ./

Install ALL dependencies (including devDependencies for build)

RUN npm ci

Copy source code

COPY . .

Build application

RUN npm run build

Stage 2: Production

FROM node:20-alpine

WORKDIR /app

Install only production dependencies

COPY package*.json ./ RUN npm ci --only=production

Copy built files from builder stage

COPY --from=builder /app/dist ./dist

Create non-root user

RUN addgroup -S appgroup && adduser -S appuser -G appgroup

Change ownership

RUN chown -R appuser:appgroup /app

Switch to non-root user

USER appuser

EXPOSE 3000

Health check

HEALTHCHECK --interval=30s --timeout=3s --start-period=40s
CMD node healthcheck.js || exit 1

CMD ["node", "dist/index.js"]

docker-compose.yml:

version: '3.8'

services: api: image: myapp/api:latest restart: unless-stopped ports: - "3000:3000" environment: - NODE_ENV=production - DATABASE_URL=${DATABASE_URL} - REDIS_URL=${REDIS_URL} depends_on: - db - redis healthcheck: test: ["CMD", "curl", "-f", "http://localhost:3000/health"] interval: 30s timeout: 3s retries: 3

db: image: postgres:15-alpine restart: unless-stopped volumes: - postgres_data:/var/lib/postgresql/data environment: - POSTGRES_PASSWORD=${DB_PASSWORD} - POSTGRES_DB=myapp

redis: image: redis:7-alpine restart: unless-stopped volumes: - redis_data:/data

volumes: postgres_data: redis_data:

Step 4: Deployment Strategies

Blue-Green Deployment:

CONCEPT: ├── Two identical environments (Blue/Green) ├── One serves traffic, one is idle ├── Deploy to idle, then switch traffic └── Instant rollback by switching back

FLOW:

Blue is live, Green is idle
Deploy new version to Green
Test Green thoroughly
Switch load balancer to Green
Green is now live (Blue becomes rollback target)

Kubernetes Blue-Green:

green-deployment.yaml

apiVersion: apps/v1 kind: Deployment metadata: name: api-green spec: replicas: 3 selector: matchLabels: app: api version: green template: metadata: labels: app: api version: green spec: containers: - name: api image: myapp/api:v2.0.0

Switch traffic by updating service selector

apiVersion: v1 kind: Service metadata: name: api spec: selector: app: api version: green # Change from "blue" to "green" ports: - port: 80 targetPort: 3000

Canary Deployment:

CONCEPT: ├── Release to small subset first (5% traffic) ├── Monitor for issues ├── Gradually increase traffic (25%, 50%, 100%) └── Rollback if problems detected

KUBERNETES EXAMPLE:

95% traffic to stable, 5% to canary

apiVersion: v1 kind: Service metadata: name: api-stable spec: selector: app: api version: stable ports: - port: 80

apiVersion: v1 kind: Service metadata: name: api-canary spec: selector: app: api version: canary ports: - port: 80

Ingress splits traffic

apiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: api annotations: nginx.ingress.kubernetes.io/canary: "true" nginx.ingress.kubernetes.io/canary-weight: "5" # 5% to canary spec: rules: - host: api.myapp.com http: paths: - path: / pathType: Prefix backend: service: name: api-canary port: number: 80

Rolling Deployment:

Kubernetes rolling update

apiVersion: apps/v1 kind: Deployment metadata: name: api spec: replicas: 10 strategy: type: RollingUpdate rollingUpdate: maxUnavailable: 1 # Max pods down during update maxSurge: 1 # Max extra pods during update template: spec: containers: - name: api image: myapp/api:v2.0.0

Step 5: Infrastructure as Code

Terraform Example:

main.tf

terraform { required_providers { aws = { source = "hashicorp/aws" version = "~> 5.0" } } }

provider "aws" { region = var.region }

VPC

resource "aws_vpc" "main" { cidr_block = "10.0.0.0/16" enable_dns_hostnames = true

tags = { Name = "${var.project}-vpc" Environment = var.environment } }

ECS Cluster

resource "aws_ecs_cluster" "main" { name = "${var.project}-${var.environment}"

setting { name = "containerInsights" value = "enabled" } }

RDS Database

resource "aws_db_instance" "main" { identifier = "${var.project}-${var.environment}" engine = "postgres" engine_version = "15.4" instance_class = var.db_instance_class allocated_storage = 20 storage_encrypted = true skip_final_snapshot = var.environment != "production"

db_name = var.db_name username = var.db_username password = var.db_password

vpc_security_group_ids = [aws_security_group.db.id] db_subnet_group_name = aws_db_subnet_group.main.name

backup_retention_period = var.environment == "production" ? 7 : 0

tags = { Name = "${var.project}-db" Environment = var.environment } }

Step 6: Monitoring & Health Checks

Health Endpoints:

// health.js app.get('/health', (req, res) => { res.json({ status: 'healthy', timestamp: new Date().toISOString(), version: process.env.VERSION || 'unknown', uptime: process.uptime() }); });

// Readiness check (can serve traffic) app.get('/ready', async (req, res) => { try { // Check database await db.query('SELECT 1');

// Check cache
await redis.ping();

// Check critical dependencies
// await externalAPI.healthCheck();

res.json({
  status: 'ready',
  checks: {
    database: 'ok',
    redis: 'ok'
  }
});

} catch (error) { res.status(503).json({ status: 'not ready', error: error.message }); } });

Key Metrics (Golden Signals):

LATENCY: How long requests take TRAFFIC: Requests per second ERRORS: Error rate percentage SATURATION: Resource utilization (CPU, memory)

DevOps Checklist

DevOps Review: [Project]

CI/CD

Docker

Infrastructure

Infrastructure as code (Terraform/CloudFormation)
Secrets managed securely (not in code)
Environments are reproducible
Backups configured and tested
Disaster recovery plan documented

Monitoring

Health endpoints implemented
Metrics collected (Golden Signals)
Logging centralized
Alerts configured (PagerDuty, Slack)
Dashboards created (Grafana, Datadog)

Security

Dependencies scanned automatically
Secrets not in code or logs
Network policies defined
Least privilege access (IAM, RBAC)
Security headers configured

Common Pitfalls

Don't Do

Manual deployments Automate with CI/CD

Skip tests in pipeline Run all tests before deploy

Use latest tag Use specific versions/SHAs

Run as root in containers Create non-root user

Commit secrets to git Use secrets management

Deploy straight to prod Deploy to staging first

No rollback plan Test rollback procedure

Ignore failed health checks Fail deployment if unhealthy

Tools & Technologies

CI/CD:

GitHub Actions
GitLab CI
CircleCI
Jenkins

Containers:

Docker
Kubernetes
Docker Compose
Podman