blue-green-deploy

Blue-Green & Deployment Strategies

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "blue-green-deploy" with this command: npx skills add bagelhole/devops-security-agent-skills/bagelhole-devops-security-agent-skills-blue-green-deploy

Blue-Green & Deployment Strategies

Implement zero-downtime deployment patterns for production systems.

When to Use This Skill

Use this skill when:

  • Implementing zero-downtime deployments

  • Reducing deployment risk

  • Enabling instant rollbacks

  • Running canary releases

  • Performing A/B testing in production

Prerequisites

  • Load balancer or ingress controller

  • Container orchestration (K8s) or cloud platform

  • CI/CD pipeline

  • Health check endpoints

Deployment Strategy Overview

┌─────────────────────────────────────────────────────────────┐ │ DEPLOYMENT STRATEGIES │ ├─────────────┬─────────────┬─────────────┬──────────────────┤ │ Blue-Green │ Canary │ Rolling │ Recreate │ ├─────────────┼─────────────┼─────────────┼──────────────────┤ │ Full env │ Gradual % │ Pod by pod │ All at once │ │ swap │ rollout │ replacement │ │ ├─────────────┼─────────────┼─────────────┼──────────────────┤ │ Instant │ Slow, safe │ Moderate │ Fast, risky │ │ rollback │ rollback │ rollback │ │ ├─────────────┼─────────────┼─────────────┼──────────────────┤ │ 2x resources│ +10-25% │ Same │ Same │ │ needed │ resources │ resources │ │ └─────────────┴─────────────┴─────────────┴──────────────────┘

Blue-Green Deployment

Concept

Before: ┌─────────┐ ┌───────────────┐ │ Users │────▶│ Blue (v1) │ ◀── Active └─────────┘ └───────────────┘ ┌───────────────┐ │ Green (v2) │ ◀── Staging └───────────────┘

After Switch: ┌─────────┐ ┌───────────────┐ │ Users │ │ Blue (v1) │ ◀── Standby └─────────┘ └───────────────┘ │ ┌───────────────┐ └────────▶│ Green (v2) │ ◀── Active └───────────────┘

Kubernetes Implementation

blue-deployment.yaml

apiVersion: apps/v1 kind: Deployment metadata: name: myapp-blue labels: app: myapp version: blue spec: replicas: 3 selector: matchLabels: app: myapp version: blue template: metadata: labels: app: myapp version: blue spec: containers: - name: myapp image: myapp:v1.0.0 ports: - containerPort: 8080 readinessProbe: httpGet: path: /health port: 8080 initialDelaySeconds: 5 periodSeconds: 5

green-deployment.yaml

apiVersion: apps/v1 kind: Deployment metadata: name: myapp-green labels: app: myapp version: green spec: replicas: 3 selector: matchLabels: app: myapp version: green template: metadata: labels: app: myapp version: green spec: containers: - name: myapp image: myapp:v2.0.0 ports: - containerPort: 8080 readinessProbe: httpGet: path: /health port: 8080 initialDelaySeconds: 5 periodSeconds: 5

service.yaml - Switch by changing selector

apiVersion: v1 kind: Service metadata: name: myapp spec: selector: app: myapp version: blue # Change to 'green' to switch ports:

  • port: 80 targetPort: 8080

Switch Script

#!/bin/bash

blue-green-switch.sh

CURRENT=$(kubectl get svc myapp -o jsonpath='{.spec.selector.version}') NEW_VERSION=$1

echo "Current version: $CURRENT" echo "Switching to: $NEW_VERSION"

Verify new deployment is ready

kubectl rollout status deployment/myapp-$NEW_VERSION

Check health

HEALTH=$(kubectl exec -it deployment/myapp-$NEW_VERSION -- curl -s localhost:8080/health) if [ "$HEALTH" != "ok" ]; then echo "Health check failed" exit 1 fi

Switch traffic

kubectl patch svc myapp -p "{"spec":{"selector":{"version":"$NEW_VERSION"}}}"

echo "Switched to $NEW_VERSION"

AWS ECS Blue-Green

AWS CodeDeploy appspec.yml

version: 0.0 Resources:

  • TargetService: Type: AWS::ECS::Service Properties: TaskDefinition: "arn:aws:ecs:region:account:task-definition/myapp:2" LoadBalancerInfo: ContainerName: "myapp" ContainerPort: 8080 Hooks:
  • BeforeInstall: "LambdaFunctionToValidateBeforeTrafficShift"
  • AfterInstall: "LambdaFunctionToValidateAfterTrafficShift"
  • AfterAllowTestTraffic: "LambdaFunctionToValidateTestTraffic"
  • BeforeAllowTraffic: "LambdaFunctionToValidateBeforeAllowTraffic"
  • AfterAllowTraffic: "LambdaFunctionToValidateAfterAllowTraffic"

Canary Deployment

Kubernetes with Istio

VirtualService for traffic splitting

apiVersion: networking.istio.io/v1beta1 kind: VirtualService metadata: name: myapp spec: hosts:

  • myapp http:
  • match:
    • headers: x-canary: exact: "true" route:
    • destination: host: myapp subset: canary
  • route:
    • destination: host: myapp subset: stable weight: 90
    • destination: host: myapp subset: canary weight: 10

apiVersion: networking.istio.io/v1beta1 kind: DestinationRule metadata: name: myapp spec: host: myapp subsets:

  • name: stable labels: version: stable
  • name: canary labels: version: canary

Argo Rollouts

apiVersion: argoproj.io/v1alpha1 kind: Rollout metadata: name: myapp spec: replicas: 5 strategy: canary: steps: - setWeight: 10 - pause: {duration: 5m} - setWeight: 25 - pause: {duration: 5m} - setWeight: 50 - pause: {duration: 5m} - setWeight: 75 - pause: {duration: 5m} analysis: templates: - templateName: success-rate startingStep: 2 args: - name: service-name value: myapp selector: matchLabels: app: myapp template: metadata: labels: app: myapp spec: containers: - name: myapp image: myapp:v2.0.0 ports: - containerPort: 8080

apiVersion: argoproj.io/v1alpha1 kind: AnalysisTemplate metadata: name: success-rate spec: args:

  • name: service-name metrics:
  • name: success-rate interval: 1m successCondition: result[0] >= 0.95 failureLimit: 3 provider: prometheus: address: http://prometheus:9090 query: | sum(rate(http_requests_total{service="{{args.service-name}}",status=~"2.*"}[5m])) / sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))

Rolling Deployment

Kubernetes Default

apiVersion: apps/v1 kind: Deployment metadata: name: myapp spec: replicas: 5 strategy: type: RollingUpdate rollingUpdate: maxSurge: 1 # Max pods above desired maxUnavailable: 0 # Max pods unavailable selector: matchLabels: app: myapp template: metadata: labels: app: myapp spec: containers: - name: myapp image: myapp:v2.0.0 ports: - containerPort: 8080 readinessProbe: httpGet: path: /ready port: 8080 initialDelaySeconds: 5 periodSeconds: 5 livenessProbe: httpGet: path: /health port: 8080 initialDelaySeconds: 10 periodSeconds: 10

Rolling Update Commands

Update image

kubectl set image deployment/myapp myapp=myapp:v2.0.0

Watch rollout

kubectl rollout status deployment/myapp

Pause rollout

kubectl rollout pause deployment/myapp

Resume rollout

kubectl rollout resume deployment/myapp

Rollback

kubectl rollout undo deployment/myapp

Rollback to specific revision

kubectl rollout undo deployment/myapp --to-revision=2

View history

kubectl rollout history deployment/myapp

Health Checks

Comprehensive Health Endpoint

Flask health endpoint

from flask import Flask, jsonify import psycopg2 import redis

app = Flask(name)

@app.route('/health') def health(): """Liveness probe - is the app running?""" return jsonify({'status': 'healthy'}), 200

@app.route('/ready') def ready(): """Readiness probe - can the app serve traffic?""" checks = {}

# Database check
try:
    conn = psycopg2.connect(DATABASE_URL)
    conn.close()
    checks['database'] = 'ok'
except Exception as e:
    checks['database'] = str(e)
    return jsonify({'status': 'unhealthy', 'checks': checks}), 503

# Redis check
try:
    r = redis.from_url(REDIS_URL)
    r.ping()
    checks['redis'] = 'ok'
except Exception as e:
    checks['redis'] = str(e)
    return jsonify({'status': 'unhealthy', 'checks': checks}), 503

return jsonify({'status': 'healthy', 'checks': checks}), 200

Rollback Procedures

Automated Rollback

#!/bin/bash

auto-rollback.sh

DEPLOYMENT=$1 THRESHOLD=0.95 INTERVAL=60

echo "Monitoring deployment $DEPLOYMENT"

while true; do

Get success rate from Prometheus

SUCCESS_RATE=$(curl -s "http://prometheus:9090/api/v1/query?query=sum(rate(http_requests_total{status=~\"2.*\"}[5m]))/sum(rate(http_requests_total[5m]))" | jq -r '.data.result[0].value[1]')

echo "Current success rate: $SUCCESS_RATE"

if (( $(echo "$SUCCESS_RATE < $THRESHOLD" | bc -l) )); then echo "Success rate below threshold! Rolling back..." kubectl rollout undo deployment/$DEPLOYMENT exit 1 fi

sleep $INTERVAL done

Manual Rollback Checklist

Rollback Checklist

Before Rollback

  • Confirm issue is deployment-related
  • Document current error rates
  • Notify team in #deployments channel

During Rollback

  • Execute rollback command
  • Monitor rollback progress
  • Verify old version is serving traffic

After Rollback

  • Confirm error rates normalized
  • Update incident ticket
  • Schedule post-mortem

Common Issues

Issue: Slow Deployments

Problem: Rollout takes too long Solution: Increase maxSurge, decrease minReadySeconds

Issue: Failed Health Checks

Problem: Pods not becoming ready Solution: Check probe endpoints, increase timeouts

Issue: Traffic During Rollback

Problem: Errors during switch Solution: Use connection draining, implement graceful shutdown

Best Practices

  • Always implement health checks

  • Use connection draining

  • Test rollback procedures regularly

  • Monitor key metrics during deployment

  • Implement circuit breakers

  • Use deployment slots/environments

  • Automate deployment verification

  • Document rollback procedures

Related Skills

  • kubernetes-ops - K8s deployment basics

  • argocd-gitops - GitOps deployments

  • feature-flags - Progressive rollout

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Security

sops-encryption

No summary provided by upstream source.

Repository SourceNeeds Review
Security

linux-administration

No summary provided by upstream source.

Repository SourceNeeds Review
Security

linux-hardening

No summary provided by upstream source.

Repository SourceNeeds Review