deco-site-scaling-tuning

Deco Site Scaling Tuning

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "deco-site-scaling-tuning" with this command: npx skills add decocms/deco-start/decocms-deco-start-deco-site-scaling-tuning


Analyze a site's Prometheus metrics to discover the optimal autoscaling parameters. This skill helps you find the CPU/concurrency threshold where latency degrades and recommends scaling configuration accordingly.

When to Use This Skill

  • A site is overscaled (too many pods for its traffic)

  • A site oscillates between scaling up and down (panic mode loop)

  • You need to switch the scaling metric (concurrency vs. CPU vs. RPS)

  • You need to find the right target value for a site

  • You have deployed scaling changes and want to verify they are working

Prerequisites

  • kubectl access to the target cluster

  • Prometheus accessible via port-forward (from kube-prometheus-stack in monitoring namespace)

  • Python 3 for analysis scripts

  • At least 6 hours of metric history for meaningful analysis

  • For direct latency data: queue-proxy PodMonitor must be applied (see Step 0)

Quick Start

  1. ENABLE METRICS → Apply queue-proxy PodMonitor if not already done
  2. PORT-FORWARD → kubectl port-forward prometheus-pod 19090:9090
  3. COLLECT DATA → Run analysis scripts against Prometheus
  4. ANALYZE → Find CPU threshold where latency degrades
  5. RECOMMEND → Choose scaling metric and target
  6. APPLY → Use deco-site-deployment skill to apply changes
  7. VERIFY → Monitor for 1-2 hours after change

Files in This Skill

| File | Purpose |
|------|---------|
| SKILL.md | Overview, methodology, analysis procedures |
| analysis-scripts.md | Ready-to-use Python scripts for Prometheus queries |

Step 0: Enable Queue-Proxy Metrics (one-time)

Queue-proxy runs as a sidecar on every Knative pod and exposes request latency histograms. These are critical for precise tuning but are not scraped by default.

Apply this PodMonitor:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: knative-queue-proxy
  namespace: monitoring
  labels:
    release: kube-prometheus-stack
spec:
  namespaceSelector:
    any: true
  selector:
    matchExpressions:
      - key: serving.knative.dev/revision
        operator: Exists
  podMetricsEndpoints:
    - port: http-usermetric
      path: /metrics
      interval: 15s

Save the manifest as queue-proxy-podmonitor.yaml and apply it:

kubectl apply -f queue-proxy-podmonitor.yaml

Wait 2-3 hours for data to accumulate before running latency analysis.

Metrics unlocked by this PodMonitor:

  • revision_app_request_latencies_bucket — request latency histogram (p50/p95/p99)

  • revision_app_request_latencies_sum / _count — for avg latency

  • revision_app_request_count — request rate by response code
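
The _bucket series are cumulative counters keyed by an upper bound, so a percentile such as p95 has to be estimated from them. The sketch below shows the linear interpolation that Prometheus's histogram_quantile function is based on. The function name and the bucket values are hypothetical illustration data, not real measurements, and in practice you would run histogram_quantile in PromQL rather than reimplement it.

```python
import math

def quantile_from_buckets(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) sorted by upper_bound;
    the last entry is the +Inf bucket, written here as math.inf.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Linear interpolation inside the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound

# Hypothetical cumulative counts per latency bound (illustration only).
example_buckets = [(50, 600), (100, 900), (200, 980), (500, 1000), (math.inf, 1000)]
p95 = quantile_from_buckets(0.95, example_buckets)  # about 162.5
```

Average latency needs no interpolation: divide the increase in _sum by the increase in _count over the same window.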

Step 1: Establish Prometheus Connection

PROM_POD=$(kubectl get pods -n monitoring -l app.kubernetes.io/name=prometheus -o jsonpath='{.items[0].metadata.name}')
kubectl port-forward -n monitoring "$PROM_POD" 19090:9090 &

Verify the connection:

curl -s "http://127.0.0.1:19090/api/v1/query?query=up" | jq '.status'
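
For analysis you will mostly use range queries rather than the instant query above. A minimal sketch of building a query_range URL against the forwarded port and flattening the JSON response follows. The helper names are hypothetical and are not taken from analysis-scripts.md; the endpoint path and response shape follow the Prometheus HTTP API.

```python
from urllib.parse import urlencode

PROM_URL = "http://127.0.0.1:19090"  # matches the port-forward above

def range_query_url(query, start, end, step="60s", base=PROM_URL):
    """Build a /api/v1/query_range URL; start and end are Unix timestamps."""
    params = urlencode({"query": query, "start": start, "end": end, "step": step})
    return f"{base}/api/v1/query_range?{params}"

def parse_matrix(payload):
    """Flatten a query_range response into {label_tuple: [(ts, value), ...]}."""
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload.get('error')}")
    series = {}
    for item in payload["data"]["result"]:
        key = tuple(sorted(item["metric"].items()))
        # Sample values arrive as strings in the JSON response.
        series[key] = [(float(ts), float(value)) for ts, value in item["values"]]
    return series
```

To fetch data, pass the URL to urllib.request.urlopen (or any HTTP client) and feed the decoded JSON to parse_matrix.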

Step 2: Collect Current State

Before analyzing, understand what the site is currently configured for.

2a. Read current autoscaler config

SITENAME="<sitename>"
NS="sites-${SITENAME}"

Current revision annotations (active revisions only):

kubectl get rev -n "$NS" -o json |
  jq '.items[]
      | select(any(.status.conditions[]?; .type == "Active" and .status == "True"))
      | {name: .metadata.name,
         annotations: ((.metadata.annotations // {}) | with_entries(select(.key | startswith("autoscaling"))))}'

Global autoscaler defaults:

kubectl get cm config-autoscaler -n knative-serving -o json | jq '.data | del(._example)'

2b. Current pod count and resources

kubectl get pods -n "$NS" --no-headers | wc -l
kubectl top pods -n "$NS" --no-headers | head -20
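
To turn the kubectl top output into numbers you can compare against a target, a small parser is enough. This sketch assumes the usual three-column layout (name, CPU, memory) with CPU reported in millicores or whole cores; the function names and the pod names in the sample are made up.

```python
def parse_cpu_millicores(value):
    """Convert a CPU quantity such as '250m' or '1' to millicores."""
    if value.endswith("m"):
        return float(value[:-1])
    return float(value) * 1000.0

def summarize_top(output):
    """Summarize `kubectl top pods --no-headers` output (assumed: NAME CPU MEMORY)."""
    rows = [line.split() for line in output.splitlines() if line.strip()]
    cpus = [parse_cpu_millicores(cols[1]) for cols in rows]
    return {
        "pods": len(cpus),
        "avg_cpu_m": sum(cpus) / len(cpus) if cpus else 0.0,
        "max_cpu_m": max(cpus, default=0.0),
    }

# Hypothetical sample output.
sample = """\
site-a-00012-deployment-abc   250m   410Mi
site-a-00012-deployment-def   350m   395Mi
"""
print(summarize_top(sample))
```

This is only a point-in-time snapshot; the Prometheus history in Step 3 is what the tuning decision should rest on.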

Step 3: Run Analysis

Use the scripts in analysis-scripts.md. The analysis follows this methodology:

Methodology: Finding the Optimal CPU Target

Goal: Find the CPU level at which latency starts to degrade. This is your scaling target — keep pods below this CPU to maintain good latency.

Approach:

  1. Collect CPU per pod, concurrency per pod, pod count, and (if available) request latency over 6-12 hours.

  2. Bucket data by CPU range (0-200m, 200-300m, ..., 700m+).

  3. For each bucket, compute avg/p95 concurrency per pod.

  4. Compute the "latency inflation factor", i.e. how much concurrency increases beyond what the reduction in pod count explains:

     excess = (avg_conc_above_threshold / avg_conc_below_threshold) / (avg_pods_below / avg_pods_above)

     • excess = 1.0 → concurrency increase fully explained by fewer pods (no latency degradation)

     • excess > 1.0 → latency is inflating concurrency (pods are slowing down)

     • The CPU level where excess crosses ~1.5x is your inflection point

  5. If queue-proxy latency is available, plot avg latency vs CPU directly; the hockey-stick inflection is your target.
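
The bucketing and the excess formula above can be sketched as follows. The sample data is hypothetical and chosen so the arithmetic is easy to follow; the dictionary keys and function names are assumptions for illustration, not the contents of analysis-scripts.md.

```python
from statistics import mean

def bucket_label(cpu_m, edges=(200, 300, 400, 500, 600, 700)):
    """Name the CPU bucket (in millicores) that a sample falls into."""
    lo = 0
    for hi in edges:
        if cpu_m < hi:
            return f"{lo}-{hi}m"
        lo = hi
    return f"{lo}m+"

def excess_factor(samples, threshold_m):
    """Latency inflation factor for one candidate CPU threshold.

    samples: one dict per time step with cpu_m, conc_per_pod and pods.
    Returns None when either side of the threshold has no data.
    """
    below = [s for s in samples if s["cpu_m"] < threshold_m]
    above = [s for s in samples if s["cpu_m"] >= threshold_m]
    if not below or not above:
        return None
    conc_ratio = mean(s["conc_per_pod"] for s in above) / mean(s["conc_per_pod"] for s in below)
    pod_ratio = mean(s["pods"] for s in below) / mean(s["pods"] for s in above)
    return conc_ratio / pod_ratio

# Hypothetical samples (not real measurements).
samples = [
    {"cpu_m": 200, "conc_per_pod": 2.0, "pods": 10},
    {"cpu_m": 300, "conc_per_pod": 2.0, "pods": 10},
    {"cpu_m": 500, "conc_per_pod": 6.0, "pods": 5},
    {"cpu_m": 600, "conc_per_pod": 6.0, "pods": 5},
]

print(bucket_label(250))            # 200-300m
print(excess_factor(samples, 400))  # 1.5
```

In this example concurrency per pod triples above 400m while the pod count only halves, so a factor of 2 is explained by fewer pods and the remaining 1.5x is attributed to slower responses.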

What to Look For

CPU vs Concurrency/pod:

Low CPU (0-200m)      → Low conc/pod   → Pods are idle (overprovisioned)
Medium CPU (200-400m) → Moderate conc  → Healthy range
★ INFLECTION ★        → Conc jumps     → Latency starting to degrade
High CPU (500m+)      → High conc/pod  → Pods overloaded, latency bad

The inflection point is where you want your scaling target.

Decision Matrix

IMPORTANT: CPU target is in millicores (not percentage). E.g., target: 400 means scale when CPU reaches 400m.

| Inflection CPU | Recommended metric | Target | Notes |
|----------------|--------------------|--------|-------|
| < CPU request | CPU scaling | target = inflection value in millicores | Standard case |
| ~ CPU request | CPU scaling | target = CPU_request × 0.8 | Conservative |
| > CPU request (no limit) | CPU scaling | target = CPU_request × 0.8, increase CPU request | Need more CPU headroom |
| No clear inflection | Concurrency scaling | Keep current but tune target | CPU isn't the bottleneck |
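
The matrix can be expressed as a small function, which is handy when comparing several sites. This is a sketch under one stated assumption: the matrix does not define how close "~ CPU request" is, so the 10% tolerance below is an arbitrary choice you should adjust. The function name and return shape are hypothetical.

```python
def recommend(inflection_m, cpu_request_m, tolerance=0.1):
    """Map an inflection point (millicores) to a scaling recommendation.

    inflection_m is None when no clear inflection was found.
    tolerance is an assumed definition of "approximately equal to the request".
    """
    if inflection_m is None:
        return {"metric": "concurrency", "target": None,
                "note": "CPU isn't the bottleneck; keep and tune the concurrency target"}
    lower = cpu_request_m * (1 - tolerance)
    upper = cpu_request_m * (1 + tolerance)
    if inflection_m < lower:
        return {"metric": "cpu", "target": round(inflection_m), "note": "Standard case"}
    conservative = round(cpu_request_m * 0.8)
    if inflection_m <= upper:
        return {"metric": "cpu", "target": conservative, "note": "Conservative"}
    return {"metric": "cpu", "target": conservative,
            "note": "Need more CPU headroom; increase CPU request"}

print(recommend(400, 1000))  # CPU scaling at 400 millicores, standard case
```

All targets are in millicores, consistent with the note above the matrix.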

Common Patterns

Pattern: CPU-bound app (Deno SSR)

  • Baseline CPU: 200-300m (Deno runtime + V8 JIT)

  • Inflection: 400-500m

  • Recommendation: CPU scaling with target = inflection (e.g., 400 millicores)

Pattern: IO-bound app (mostly external API calls)

  • CPU stays low even under high concurrency

  • Inflection not visible in CPU

  • Recommendation: Keep concurrency scaling, tune the target

Pattern: Oscillating (panic loop)

  • Symptoms: pods cycle between min and max

  • Cause: concurrency scaling + low target + scale-down-delay ratchet

  • Fix: Switch to CPU scaling (breaks the latency→concurrency feedback loop)
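
The feedback loop behind the oscillating pattern follows from Little's law: average in-flight requests equal arrival rate times time in system. The sketch below is a deliberately simplified model, not the Knative autoscaler's actual algorithm (it ignores windows, utilization percentages and panic thresholds), and all numbers are hypothetical.

```python
import math

def in_flight(rps, latency_ms):
    """Little's law: average concurrent requests = arrival rate x latency."""
    return rps * latency_ms / 1000

def pods_wanted(rps, latency_ms, target_per_pod):
    """Simplified concurrency-based sizing: total in-flight / per-pod target."""
    return math.ceil(in_flight(rps, latency_ms) / target_per_pod)

# Same traffic (100 RPS), per-pod concurrency target of 10.
print(pods_wanted(100, 200, 10))  # 2
print(pods_wanted(100, 600, 10))  # 6
```

Traffic is identical in both calls; only latency changed. A concurrency-based autoscaler reads slower responses as more load, which is the loop that CPU-based scaling avoids.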

Step 4: Apply Changes

Use the deco-site-deployment skill to:

  • Update the state secret with new scaling config

  • Redeploy on both clouds

Example for CPU-based scaling (target is in millicores):

NEW_STATE=$(echo "$STATE" | jq ' .scaling.metric = { "type": "cpu", "target": 400 } ')
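
If you prepare the state in Python instead of jq, the same update looks like the sketch below. It assumes only what the jq filter shows, namely that the state is a JSON object with the metric at .scaling.metric; the function name is hypothetical.

```python
import copy

def with_scaling_metric(state, metric_type, target, **extra):
    """Return a copy of the state with .scaling.metric replaced."""
    new_state = copy.deepcopy(state)  # leave the original untouched for rollback
    new_state.setdefault("scaling", {})["metric"] = {
        "type": metric_type,
        "target": target,
        **extra,
    }
    return new_state

# Hypothetical current state.
state = {"scaling": {"metric": {"type": "concurrency", "target": 15}}}
new_state = with_scaling_metric(state, "cpu", 400)  # target in millicores
```

Keeping the original object unmodified means the previous configuration is still at hand if you need to revert.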

Step 5: Verify After Change

Monitor for 1-2 hours after applying changes:

Watch the pod count stabilize:

watch -n 10 "kubectl get pods -n sites-<sitename> --no-headers | wc -l"

Check whether panic mode triggers. This should not apply with HPA/CPU scaling: HPA has no panic mode, which is one of its advantages.

Verify that the HPA is active:

kubectl get hpa -n sites-<sitename>

Check the HPA status:

kubectl describe hpa -n sites-<sitename>

Success Criteria

  • Pod count stabilizes (no more oscillation)

  • Avg CPU per pod stays below your target during normal traffic

  • CPU crosses target only during genuine traffic spikes (and scales up proportionally)

  • No panic mode events (HPA doesn't have panic mode)

  • Latency stays acceptable (check with queue-proxy metrics if available)
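
The first criterion can be checked mechanically on a series of pod counts sampled during the verification window. This is a rough sketch: the limit of two direction reversals is an assumed threshold, not a value from this skill, and the series below are hypothetical.

```python
def direction_changes(pod_counts):
    """Count reversals between scaling up and scaling down in a series."""
    changes = 0
    last_direction = 0
    for prev, cur in zip(pod_counts, pod_counts[1:]):
        direction = (cur > prev) - (cur < prev)  # +1 up, -1 down, 0 flat
        if direction != 0:
            if last_direction != 0 and direction != last_direction:
                changes += 1
            last_direction = direction
    return changes

def looks_stable(pod_counts, max_changes=2):
    """Assumed rule of thumb: few reversals means no oscillation."""
    return direction_changes(pod_counts) <= max_changes

print(looks_stable([3, 3, 4, 4, 4, 3, 3]))  # True
print(looks_stable([2, 8, 2, 8, 2, 8]))     # False
```

A single rise and fall around a traffic peak counts as one reversal; repeated swings between min and max are what the check is meant to catch.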

Rollback

If the new scaling is worse, revert by changing the state secret back to concurrency scaling:

NEW_STATE=$(echo "$STATE" | jq ' .scaling.metric = { "type": "concurrency", "target": 15, "targetUtilizationPercentage": 70 } ')

Related Skills

  • deco-site-deployment — Apply scaling changes and redeploy

  • deco-site-memory-debugging — Debug memory issues on running pods

  • deco-incident-debugging — Incident response and triage
