autoscaling-policy-designer

Design autoscaling policies based on traffic patterns, cost constraints, and performance SLOs

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Installation

Copy this command and send it to your AI assistant: npx skills add charlie-morrison/autoscaling-policy-designer

Autoscaling Policy Designer

Design autoscaling policies that balance performance, cost, and reliability. This skill teaches an AI agent to analyze historical traffic patterns, recommend scaling thresholds, configure Kubernetes HPA/KEDA or cloud-native autoscalers, simulate behavior under load, and model the cost impact of different scaling strategies.

Use when: "design autoscaling", "scaling policy", "HPA configuration", "KEDA setup", "scale to zero", "autoscaling thresholds", "scaling costs", "traffic spike handling", "over-provisioned", "under-provisioned"

Commands

1. analyze -- Study traffic patterns

Before designing a policy, understand the workload. Collect metrics, identify patterns, and classify the traffic shape.

Step 1: Collect historical utilization data

# Kubernetes: Get CPU/memory utilization over 7 days from Prometheus
curl -s "$PROMETHEUS_URL/api/v1/query_range" \
  --data-urlencode 'query=avg(rate(container_cpu_usage_seconds_total{namespace="production",pod=~"api-.*"}[5m])) by (pod)' \
  --data-urlencode "start=$(date -d '7 days ago' +%s)" \
  --data-urlencode "end=$(date +%s)" \
  --data-urlencode 'step=1h' | python3 -c "
import json, sys
from datetime import datetime

data = json.load(sys.stdin)
for series in data['data']['result']:
    pod = series['metric'].get('pod', 'aggregate')
    values = sorted(float(v[1]) for v in series['values'])
    n = len(values)
    print(f'{pod}:')
    print(f'  min:  {values[0]:.3f} cores')
    print(f'  avg:  {sum(values)/n:.3f} cores')
    print(f'  max:  {values[-1]:.3f} cores')
    print(f'  p95:  {values[int(n*0.95)]:.3f} cores')
    print(f'  p99:  {values[int(n*0.99)]:.3f} cores')
"

# AWS: Get CloudWatch CPU utilization for an ASG
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=AutoScalingGroupName,Value="$ASG_NAME" \
  --start-time "$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%S)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \
  --period 3600 \
  --statistics Average Maximum \
  --output json | python3 -c "
import json, sys
data = json.load(sys.stdin)
points = sorted(data['Datapoints'], key=lambda x: x['Timestamp'])
for p in points:
    print(f'{p[\"Timestamp\"]:>25}  avg={p[\"Average\"]:5.1f}%  max={p[\"Maximum\"]:5.1f}%')
"

Step 2: Identify the traffic pattern class

Classify the workload into one of these patterns, because each requires a different scaling strategy:

import json
from collections import defaultdict
from datetime import datetime

def classify_traffic(timestamps_values):
    """Classify traffic into a pattern type based on 7 days of hourly data."""
    by_hour = defaultdict(list)
    by_weekday = defaultdict(list)

    for ts, val in timestamps_values:
        dt = datetime.fromtimestamp(float(ts))
        by_hour[dt.hour].append(float(val))
        by_weekday[dt.weekday()].append(float(val))

    hourly_avgs = {h: sum(v)/len(v) for h, v in by_hour.items()}
    weekday_avgs = {d: sum(v)/len(v) for d, v in by_weekday.items()}

    peak_hour = max(hourly_avgs, key=hourly_avgs.get)
    trough_hour = min(hourly_avgs, key=hourly_avgs.get)
    peak_to_trough = hourly_avgs[peak_hour] / max(hourly_avgs[trough_hour], 0.001)

    weekday_avg = sum(weekday_avgs.get(d, 0) for d in range(5)) / 5
    weekend_avg = sum(weekday_avgs.get(d, 0) for d in range(5, 7)) / 2

    all_values = [v for _, v in timestamps_values]
    max_val = max(float(v) for v in all_values)
    avg_val = sum(float(v) for v in all_values) / len(all_values)
    spike_ratio = max_val / max(avg_val, 0.001)

    pattern = {
        "peak_hour": f"{peak_hour}:00",
        "trough_hour": f"{trough_hour}:00",
        "peak_to_trough_ratio": round(peak_to_trough, 1),
        "weekday_vs_weekend_ratio": round(weekday_avg / max(weekend_avg, 0.001), 1),
        "spike_ratio": round(spike_ratio, 1),
    }

    if peak_to_trough > 3:
        pattern["type"] = "DAILY_CYCLE"
        pattern["strategy"] = "Predictive scaling + reactive HPA. Pre-warm before peak hours."
    elif spike_ratio > 5:
        pattern["type"] = "SPIKE"
        pattern["strategy"] = "Aggressive scale-up (short stabilization window), conservative scale-down."
    elif weekday_avg / max(weekend_avg, 0.001) > 2:
        pattern["type"] = "WEEKLY_CYCLE"
        pattern["strategy"] = "Scheduled scaling for weekday/weekend transitions + HPA for within-day variation."
    else:
        pattern["type"] = "STEADY_STATE"
        pattern["strategy"] = "Simple target-tracking policy. Right-size the baseline."

    return pattern

# Example: parse Prometheus query_range output
# result = classify_traffic(data['data']['result'][0]['values'])
# print(json.dumps(result, indent=2))

Step 3: Analyze request-level metrics (for RPS-based scaling)

# Get requests per second over 7 days
curl -s "$PROMETHEUS_URL/api/v1/query_range" \
  --data-urlencode 'query=sum(rate(http_requests_total{namespace="production",service="api"}[5m]))' \
  --data-urlencode "start=$(date -d '7 days ago' +%s)" \
  --data-urlencode "end=$(date +%s)" \
  --data-urlencode 'step=1h' | python3 -c "
import json, math, sys
data = json.load(sys.stdin)
rps_values = sorted(float(v[1]) for v in data['data']['result'][0]['values'])
n = len(rps_values)
print('RPS over 7 days:')
print(f'  min:  {rps_values[0]:.0f} rps')
print(f'  avg:  {sum(rps_values)/n:.0f} rps')
print(f'  max:  {rps_values[-1]:.0f} rps')
print(f'  p99:  {rps_values[int(n*0.99)]:.0f} rps')
print('  Capacity per pod (from load tests): ~200 rps')
print(f'  Min pods needed at peak: {math.ceil(rps_values[-1]/200)}')
print(f'  Min pods needed at trough: {max(1, math.ceil(rps_values[0]/200))}')
"

# Get response latency percentiles to determine SLO baseline
curl -s "$PROMETHEUS_URL/api/v1/query" \
  --data-urlencode 'query=histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket{service="api"}[5m])) by (le))' | \
  python3 -c "
import json, sys
data = json.load(sys.stdin)
p99 = float(data['data']['result'][0]['value'][1])
print(f'Current p99 latency: {p99*1000:.0f}ms')
if p99 < 0.2:
    print('SLO headroom: GOOD (p99 < 200ms)')
elif p99 < 0.5:
    print('SLO headroom: TIGHT (p99 200-500ms)')
else:
    print('SLO headroom: CRITICAL (p99 > 500ms, scaling may be needed now)')
"

Report template

## Traffic Pattern Analysis

**Service:** api-service
**Period:** YYYY-MM-DD to YYYY-MM-DD (7 days)
**Data source:** Prometheus

### Utilization Summary
- CPU: avg 0.35 cores, p95 1.2 cores, max 2.1 cores
- Memory: avg 512MB, p95 780MB, max 1.1GB
- RPS: avg 450, p95 1,200, max 2,800

### Pattern Classification
- **Type:** DAILY_CYCLE
- **Peak hours:** 09:00-17:00 UTC
- **Trough hours:** 02:00-06:00 UTC
- **Peak-to-trough ratio:** 4.2x
- **Weekend reduction:** 60% lower than weekday

### Scaling Implications
- Minimum pods needed at trough: 3
- Minimum pods needed at peak: 14
- Currently running: 10 (fixed) -- overprovisioned at night, tight at peak
- Recommended strategy: Predictive scaling + reactive HPA

2. design -- Create a scaling policy

Based on the traffic analysis, generate a concrete autoscaler configuration.

Step 1: Kubernetes HPA (resource-based)

# Standard HPA for daily-cycle workloads
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3           # Floor: handles trough traffic + one pod failure
  maxReplicas: 25          # Ceiling: cost cap
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60    # React to traffic in 1 min
      policies:
        - type: Percent
          value: 100                    # Can double capacity per minute
          periodSeconds: 60
        - type: Pods
          value: 4                      # But add at least 4 pods at a time
          periodSeconds: 60
      selectPolicy: Max                 # Use whichever adds more pods
    scaleDown:
      stabilizationWindowSeconds: 300   # Wait 5 min before scaling down
      policies:
        - type: Percent
          value: 25                     # Remove at most 25% per 2 min
          periodSeconds: 120
      selectPolicy: Min                 # Use whichever removes fewer pods
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 65        # Target 65% -- headroom for spikes
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75

Step 2: KEDA (event-driven scaling)

For workloads that should scale based on queue depth, RPS, or custom metrics.

# KEDA ScaledObject for a queue-processing worker
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: worker
  pollingInterval: 15
  cooldownPeriod: 120
  minReplicaCount: 0        # Scale to zero when queue is empty
  maxReplicaCount: 50
  triggers:
    - type: rabbitmq
      metadata:
        # Prefer a TriggerAuthentication referencing a Secret over inline credentials
        host: amqp://user:pass@rabbitmq.production:5672/
        queueName: jobs
        queueLength: "10"    # 1 pod per 10 queued messages
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        query: sum(rate(http_requests_total{service="api"}[2m]))
        threshold: "100"     # 1 pod per 100 rps
        activationThreshold: "5"  # Don't scale from zero until 5 rps

Step 3: AWS Auto Scaling Group policy

# Create a target-tracking scaling policy for an ASG
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name "$ASG_NAME" \
  --policy-name "cpu-target-tracking" \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration '{
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    },
    "TargetValue": 65.0,
    "ScaleInCooldown": 300,
    "ScaleOutCooldown": 60
  }'
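Target tracking adjusts capacity proportionally to the ratio of observed metric to target; this is the same rule the Kubernetes HPA documents as desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). A minimal sketch of that proportional rule, a simplification that ignores cooldowns and instance warm-up:

```python
import math

def desired_capacity(current, metric_value, target):
    """Proportional scaling rule: desired = ceil(current * metric / target)."""
    return math.ceil(current * metric_value / target)

# 8 instances averaging 90% CPU against the 65% target -> scale out
print(desired_capacity(8, 90.0, 65.0))   # → 12
# At 32.5% average CPU, 10 instances shrink toward 5
print(desired_capacity(10, 32.5, 65.0))  # → 5
```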

# Add a scheduled scaling action for known daily pattern
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name "$ASG_NAME" \
  --scheduled-action-name "morning-scaleup" \
  --recurrence "0 8 * * MON-FRI" \
  --min-size 6 \
  --desired-capacity 8

aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name "$ASG_NAME" \
  --scheduled-action-name "evening-scaledown" \
  --recurrence "0 20 * * *" \
  --min-size 2 \
  --desired-capacity 3

Step 4: Validate the design

# Check current HPA status
kubectl get hpa -n production -o wide

# Verify HPA can read the metrics it needs
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/production/pods" | python3 -c "
import json, sys
data = json.load(sys.stdin)
for pod in data['items']:
    name = pod['metadata']['name']
    for c in pod['containers']:
        cpu = c['usage']['cpu']
        mem = c['usage']['memory']
        print(f'{name}: cpu={cpu}, mem={mem}')
"

# Check if custom metrics API is available (needed for RPS-based scaling)
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" 2>/dev/null && echo "Custom metrics API available" || echo "Custom metrics API NOT available -- install prometheus-adapter"

3. simulate -- Model behavior under load

Before deploying a scaling policy, simulate how it would react to different traffic scenarios.

Step 1: Replay historical traffic against the proposed policy

import json

def simulate_hpa(traffic_rps, capacity_per_pod, target_utilization,
                 min_replicas, max_replicas, scaleup_window_s, scaledown_window_s,
                 interval_s=60):
    """Simulate HPA behavior over a traffic timeline."""
    current_replicas = min_replicas
    history = []
    scaleup_cooldown = 0
    scaledown_cooldown = 0

    for i, rps in enumerate(traffic_rps):
        timestamp_min = i * interval_s // 60
        total_capacity = current_replicas * capacity_per_pod
        utilization = rps / max(total_capacity, 1)

        desired = max(min_replicas, min(max_replicas,
                      int(rps / (capacity_per_pod * target_utilization)) + 1))

        if desired > current_replicas and scaleup_cooldown <= 0:
            # Scale up: can double at most
            scale_to = min(desired, current_replicas * 2, max_replicas)
            current_replicas = scale_to
            scaleup_cooldown = scaleup_window_s // interval_s
            event = "SCALE UP"
        elif desired < current_replicas and scaledown_cooldown <= 0:
            # Scale down: remove at most 25%
            scale_to = max(desired, int(current_replicas * 0.75), min_replicas)
            current_replicas = scale_to
            scaledown_cooldown = scaledown_window_s // interval_s
            event = "SCALE DOWN"
        else:
            event = ""

        scaleup_cooldown = max(0, scaleup_cooldown - 1)
        scaledown_cooldown = max(0, scaledown_cooldown - 1)

        slo_ok = utilization < 0.85  # SLO: stay under 85% utilization
        history.append({
            "minute": timestamp_min,
            "rps": rps,
            "replicas": current_replicas,
            "utilization": round(utilization * 100, 1),
            "slo_ok": slo_ok,
            "event": event
        })

    return history

# Scenario 1: Normal daily cycle (24 hours, 1-min intervals)
import math
daily_traffic = [int(200 + 800 * max(0, math.sin((h - 6) * math.pi / 12)))
                 for h in range(24) for _ in range(60)]

result = simulate_hpa(
    traffic_rps=daily_traffic,
    capacity_per_pod=200,
    target_utilization=0.65,
    min_replicas=3,
    max_replicas=25,
    scaleup_window_s=60,
    scaledown_window_s=300
)

slo_violations = sum(1 for r in result if not r['slo_ok'])
max_replicas_used = max(r['replicas'] for r in result)
print(f"Daily cycle simulation:")
print(f"  SLO violations: {slo_violations} / {len(result)} minutes ({slo_violations/len(result)*100:.1f}%)")
print(f"  Max replicas used: {max_replicas_used}")
print(f"  Scale events: {sum(1 for r in result if r['event'])}")

Step 2: Simulate a traffic spike

# Scenario 2: 10x traffic spike lasting 15 minutes
spike_traffic = [300] * 60 + [3000] * 15 + [300] * 60  # baseline, spike, recovery

result = simulate_hpa(
    traffic_rps=spike_traffic,
    capacity_per_pod=200,
    target_utilization=0.65,
    min_replicas=3,
    max_replicas=25,
    scaleup_window_s=60,
    scaledown_window_s=300
)

# Find how long until capacity catches up (search only the spike window,
# otherwise the post-spike traffic drop masks a policy that never caught up)
spike_start, spike_end = 60, 75
for r in result[spike_start:spike_end]:
    if r['utilization'] < 85:
        print(f"Capacity caught up {r['minute'] - spike_start} minutes after spike start")
        break
else:
    print("WARNING: Capacity never caught up during the 15-minute spike")

slo_violations_during_spike = sum(1 for r in result[60:75] if not r['slo_ok'])
print(f"SLO violations during spike: {slo_violations_during_spike} / 15 minutes")
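The simulated catch-up time can be sanity-checked analytically: with the 100%-per-minute scale-up policy, capacity at most doubles each minute, so catch-up takes roughly log2 of the replica gap. A back-of-envelope sketch that ignores pod startup and metric-propagation delays:

```python
import math

def catch_up_minutes(current_pods, needed_pods):
    """Minutes until capacity reaches demand when replicas can at most
    double once per minute (Percent=100, periodSeconds=60)."""
    if needed_pods <= current_pods:
        return 0
    return math.ceil(math.log2(needed_pods / current_pods))

# 10x spike: 3 pods must grow to ~24 (3000 rps / (200 rps/pod * 0.65 target))
print(catch_up_minutes(3, 24))  # → 3 (3 -> 6 -> 12 -> 24)
```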

Step 3: Check for flapping

# Scenario 3: Oscillating traffic (tests stabilization windows)
import random
oscillating = [500 + 300 * (1 if i % 6 < 3 else -1) + random.randint(-50, 50)
               for i in range(120)]

result = simulate_hpa(
    traffic_rps=oscillating,
    capacity_per_pod=200,
    target_utilization=0.65,
    min_replicas=3,
    max_replicas=25,
    scaleup_window_s=60,
    scaledown_window_s=300
)

scale_events = [r for r in result if r['event']]
print(f"Oscillation test: {len(scale_events)} scale events in {len(result)} minutes")
if len(scale_events) > 20:
    print("WARNING: Possible flapping. Increase stabilization windows.")
else:
    print("OK: Scaling is stable under oscillating load.")

4. cost -- Project scaling costs

Model the monthly cost of the autoscaling policy versus alternatives.

Step 1: Calculate cost for different strategies

import json

def model_monthly_cost(
    strategy,
    min_pods, max_pods,
    cpu_per_pod, mem_gb_per_pod,
    cpu_cost_hr, mem_cost_hr_gb,
    peak_hours_per_day=8,
    avg_pods_at_peak=None,
    avg_pods_off_peak=None
):
    """Model monthly cost of a scaling strategy."""
    hours_per_month = 730  # ~24 * 30.44 days

    if strategy == "fixed_at_peak":
        pods = max_pods
        cost = pods * hours_per_month * (cpu_per_pod * cpu_cost_hr + mem_gb_per_pod * mem_cost_hr_gb)
        return {"strategy": strategy, "monthly_cost": round(cost, 2), "avg_pods": pods}

    elif strategy == "fixed_at_average":
        pods = (min_pods + max_pods) // 2
        cost = pods * hours_per_month * (cpu_per_pod * cpu_cost_hr + mem_gb_per_pod * mem_cost_hr_gb)
        return {"strategy": strategy, "monthly_cost": round(cost, 2), "avg_pods": pods,
                "risk": "Under-provisioned at peak, SLO violations likely"}

    elif strategy == "autoscaled":
        peak_hours = peak_hours_per_day * 30.4
        off_peak_hours = hours_per_month - peak_hours
        peak_pods = avg_pods_at_peak or int(max_pods * 0.7)
        off_peak_pods = avg_pods_off_peak or min_pods
        cost = ((peak_pods * peak_hours + off_peak_pods * off_peak_hours) *
                (cpu_per_pod * cpu_cost_hr + mem_gb_per_pod * mem_cost_hr_gb))
        return {"strategy": strategy, "monthly_cost": round(cost, 2),
                "avg_pods_peak": peak_pods, "avg_pods_off_peak": off_peak_pods}

    elif strategy == "scale_to_zero":
        # For batch/worker: assume active only when queue has items
        active_hours = peak_hours_per_day * 30.4
        avg_pods = avg_pods_at_peak or max_pods // 2
        cost = avg_pods * active_hours * (cpu_per_pod * cpu_cost_hr + mem_gb_per_pod * mem_cost_hr_gb)
        return {"strategy": strategy, "monthly_cost": round(cost, 2),
                "active_hours_per_month": round(active_hours, 0)}

    raise ValueError(f"Unknown strategy: {strategy}")

# Compare strategies
params = dict(min_pods=3, max_pods=20, cpu_per_pod=0.5, mem_gb_per_pod=1.0,
              cpu_cost_hr=0.048, mem_cost_hr_gb=0.006, peak_hours_per_day=8,
              avg_pods_at_peak=14, avg_pods_off_peak=3)

strategies = ["fixed_at_peak", "fixed_at_average", "autoscaled", "scale_to_zero"]
results = []
for s in strategies:
    results.append(model_monthly_cost(strategy=s, **params))

baseline = results[0]["monthly_cost"]
print(f"{'Strategy':<20} {'Monthly Cost':>12} {'vs Fixed Peak':>14}")
print("-" * 48)
for r in results:
    savings = (1 - r["monthly_cost"] / baseline) * 100
    print(f"{r['strategy']:<20} ${r['monthly_cost']:>10.2f} {savings:>+12.1f}%")

Step 2: Factor in spot/preemptible instances

# AWS: Compare on-demand vs spot pricing for the instance type
aws ec2 describe-spot-price-history \
  --instance-types m5.large \
  --product-descriptions "Linux/UNIX" \
  --start-time "$(date -u -d '1 day ago' +%Y-%m-%dT%H:%M:%S)" \
  --query 'SpotPriceHistory[*].{AZ:AvailabilityZone,Price:SpotPrice,Time:Timestamp}' \
  --output table

# GKE: Check if node pool supports spot VMs
gcloud container node-pools describe "$NODE_POOL" \
  --cluster "$CLUSTER" --zone "$ZONE" \
  --format="value(config.spot)"

Report template

## Autoscaling Cost Projection

**Service:** api-service
**Instance type:** m5.large (2 vCPU, 8GB RAM)
**Region:** us-east-1

### Strategy Comparison (monthly)
| Strategy | Monthly Cost | Cost vs Fixed Peak | Risk |
|----------|-------------|--------------------|------|
| Fixed at peak (20 pods) | $1,576.80 | baseline | None (over-provisioned) |
| Fixed at average (11 pods) | $867.24 | -45.0% | SLO violations at peak |
| Autoscaled (3-20 pods) | $623.88 | -60.4% | 1-2 min lag on spikes |
| Scale-to-zero + autoscale | $412.32 | -73.8% | Cold start latency |

### Recommended: Autoscaled (3-20 pods)
- Estimated savings: $952.92/month ($11,435/year) vs fixed-at-peak
- SLO risk: Minimal (simulation shows 0.3% violation rate)
- Cold start: N/A (min 3 pods always warm)

### Spot instance opportunity
- Current on-demand cost per pod: $0.054/hr
- Current spot price: $0.018/hr (67% discount)
- If 50% of scale-out pods use spot: additional $156/month savings
- Recommendation: Use spot for pods above minReplicas, on-demand for baseline
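
The spot-savings estimate follows from applying the spot discount only to the scale-out pod-hours that run above minReplicas. A minimal sketch; the pod-hour and price inputs are hypothetical illustrations, not values derived from this report:

```python
def spot_savings(scale_out_pod_hours, ondemand_hr, spot_hr, spot_fraction=0.5):
    """Monthly savings from moving a fraction of scale-out pod-hours to spot.
    Baseline (minReplicas) pods stay on-demand for reliability."""
    return scale_out_pod_hours * spot_fraction * (ondemand_hr - spot_hr)

# Hypothetical: 11 scale-out pods for 8 h/day over a 30.4-day month
pod_hours = 11 * 8 * 30.4
print(f"${spot_savings(pod_hours, 0.054, 0.018):.2f}/month")
```

Actual savings depend on measured scale-out pod-hours and current spot prices, which the commands above retrieve.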
