Kubernetes
Production-grade K8s manifests with security-first defaults and educational comments.
Resource Detection & Adaptation
Before generating manifests, detect the target environment:
Detect node resources
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}: {.status.capacity.memory}, {.status.capacity.cpu}{"\n"}{end}'
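Detect if Docker Desktop (local) or real cluster (dots inside the label key must be escaped; a missing label prints nothing rather than failing, so default explicitly)
INSTANCE_TYPE=$(kubectl get nodes -o jsonpath='{.items[0].metadata.labels.node\.kubernetes\.io/instance-type}' 2>/dev/null)
echo "${INSTANCE_TYPE:-local}"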
Detect available resources
kubectl describe nodes | grep -A 5 "Allocated resources"
Adapt configurations based on detection:
| Detected Environment | Profile | Default Limits | Agent Action |
|---|---|---|---|
| Docker Desktop < 6GB | Minimal | 128Mi-256Mi | Warn, reduce replicas |
| Docker Desktop 6-10GB | Standard | 256Mi-512Mi | Normal deployment |
| Cloud/Real cluster | Production | Based on node size | Full features |
Agent Behavior
- Detect cluster type and resources before generating manifests
- Adapt resource requests/limits to cluster capacity
- Warn if requested workload exceeds available resources
- Calculate safe limits: (node_memory * 0.7) / expected_pod_count
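The safe-limit formula above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the function names are hypothetical, and it assumes node memory arrives as a Kubernetes quantity string such as `8388608Ki` (the form `kubectl` reports for node capacity).

```python
import re

# Bytes per unit for the Kubernetes quantity suffixes handled in this sketch.
_UNITS = {"": 1, "k": 1000, "M": 1000**2, "G": 1000**3,
          "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def memory_to_mib(quantity: str) -> float:
    """Convert a quantity such as '8388608Ki' or '8Gi' to MiB."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|k|M|G)?", quantity.strip())
    if match is None:
        raise ValueError(f"unsupported memory quantity: {quantity!r}")
    value = int(match.group(1))
    unit = match.group(2) or ""
    return value * _UNITS[unit] / 1024**2


def safe_memory_limit_mib(node_memory: str, expected_pod_count: int,
                          usable_fraction: float = 0.7) -> int:
    """Apply (node_memory * 0.7) / expected_pod_count, rounded down to MiB."""
    if expected_pod_count < 1:
        raise ValueError("expected_pod_count must be at least 1")
    return int(memory_to_mib(node_memory) * usable_fraction / expected_pod_count)
```

For an 8 GiB node shared by four pods, `safe_memory_limit_mib("8388608Ki", 4)` gives a per-pod ceiling of roughly 1.4 GiB; the remaining 30% is left for system components.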
Adaptive Resource Templates
Local/Constrained (< 6GB allocatable):
resources:
  requests:
    memory: 128Mi
    cpu: 100m
  limits:
    memory: 256Mi
    cpu: 500m
Standard (6-16GB allocatable):
resources:
  requests:
    memory: 256Mi
    cpu: 100m
  limits:
    memory: 512Mi
    cpu: 1000m
Production (> 16GB or cloud):
resources:
  requests:
    memory: 512Mi
    cpu: 250m
  limits:
    memory: 1Gi
    cpu: 2000m
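One way to pick among the three templates is a small lookup keyed on allocatable memory. The sketch below is illustrative only: `PROFILES` and `choose_profile` are hypothetical names, and treating exactly 16GB as "Standard" is an assumption drawn from the range labels above.

```python
# Values copied from the three templates above.
PROFILES = {
    "local": {
        "requests": {"memory": "128Mi", "cpu": "100m"},
        "limits": {"memory": "256Mi", "cpu": "500m"},
    },
    "standard": {
        "requests": {"memory": "256Mi", "cpu": "100m"},
        "limits": {"memory": "512Mi", "cpu": "1000m"},
    },
    "production": {
        "requests": {"memory": "512Mi", "cpu": "250m"},
        "limits": {"memory": "1Gi", "cpu": "2000m"},
    },
}


def choose_profile(allocatable_gib: float, is_cloud: bool = False) -> str:
    """Return the template name for a node with the given allocatable memory."""
    if is_cloud or allocatable_gib > 16:
        return "production"
    if allocatable_gib >= 6:
        return "standard"
    return "local"
```

`PROFILES[choose_profile(8)]` then yields the Standard `resources` block ready to be written into a manifest.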
Pre-Deployment Validation
Before applying manifests, the agent should verify:
Check if deployment would exceed node capacity
kubectl get nodes -o jsonpath='{.items[0].status.allocatable.memory}'
If insufficient: warn user and suggest scaling down or increasing Docker Desktop resources.
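The capacity check could look roughly like this. It is a simplified sketch: `capacity_warning` is a hypothetical helper, it compares total memory requests against a single node's allocatable memory, and it ignores memory already requested by other pods.

```python
from typing import Optional


def capacity_warning(allocatable_mib: int, replicas: int,
                     request_mib_per_pod: int) -> Optional[str]:
    """Return a warning when total requests exceed allocatable memory, else None."""
    needed = replicas * request_mib_per_pod
    if needed <= allocatable_mib:
        return None
    return (
        f"Workload requests {needed}Mi but only {allocatable_mib}Mi is allocatable; "
        "scale down replicas or increase Docker Desktop resources."
    )
```

The scheduler places pods by their requests rather than their limits, which is why the sketch sums requests.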
What This Skill Does
Analysis & Detection:
- Auto-detects from Dockerfile: ports, health endpoints, resources
- Identifies workload type from project structure
- Reads existing manifests to understand patterns
- Detects GPU requirements from dependencies
Generation:
- Creates production-hardened manifests (non-root, read-only, resource limits)
- Generates all supporting resources (Service, ConfigMap, HPA, PDB)
- Creates namespace governance (ResourceQuota, LimitRange, NetworkPolicy)
- Supports multi-team isolation with environment progression (dev → staging → prod)
- Adds educational comments explaining WHY each config choice
- Outputs ArgoCD-compatible directory structure
Validation:
- Verifies kubectl context exists
- Creates namespace if needed
- Deploys to local cluster (kind/minikube)
- Confirms pods are running before delivering
Security:
- Non-root user by default (runAsNonRoot: true)
- Read-only root filesystem
- No privilege escalation
- Dropped capabilities
- Resource limits always set
- Unprivileged ports only (>=1024); privileged ports (<1024) require root
What This Skill Does NOT Do
- Generate Helm charts (document in references for future)
- Create Kustomize overlays (document in references for future)
- Handle Dapr sidecar injection (separate skill)
- Deploy Kafka/Strimzi operators (separate skill)
- Generate ArgoCD Application CRDs (separate skill)
Before Implementation
Gather context to ensure successful implementation:
| Source | Gather |
|---|---|
| Codebase | Dockerfile, existing manifests, port/health patterns |
| Conversation | Target environment, namespace, special requirements |
| Skill References | Security contexts, health probes, resource limits |
| User Guidelines | Cluster conventions, naming standards |
Required Clarifications
After auto-detection, confirm with user if ambiguous:
| Topic | Question to Ask |
|---|---|
| Target environment | "Deploying to local (kind/minikube) or remote cluster?" |
| Namespace | "Use existing namespace or create new?" |
| Image availability | "Is image in registry or needs to be built/loaded?" |
| Service exposure | "Internal only (ClusterIP) or external access needed?" |
| Namespace governance | "Need ResourceQuota/LimitRange for resource isolation?" |
| Multi-team setup | "Single team or multi-team with namespace isolation?" |
| Environment progression | "Creating dev/staging/prod namespaces with quota progression?" |
Pre-flight Checks (CRITICAL)
Before generating manifests, verify:
1. Cluster access
kubectl cluster-info
2. Current context
kubectl config current-context
3. Target namespace (create if needed)
kubectl get namespace $NAMESPACE || kubectl create namespace $NAMESPACE
4. Image exists (or build it)
docker images | grep $IMAGE_NAME || docker build -t $IMAGE_NAME .
5. For local clusters: load image
kind load docker-image $IMAGE_NAME # or minikube image load
If any check fails → stop and report. Don't generate manifests for broken state.
Auto-Detection Matrix
From Dockerfile
| Detect | How | Example |
|---|---|---|
| Port | EXPOSE instruction | EXPOSE 8000 → containerPort: 8000 |
| Health | CMD with health endpoint | uvicorn → /health or /healthz |
| User | USER instruction | USER 1000 → runAsUser: 1000 |
| Workdir | WORKDIR instruction | Context for volume mounts |
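The Dockerfile rows above can be approximated with a simple line scanner. This is a rough sketch rather than a full Dockerfile parser: `detect_from_dockerfile` is a hypothetical name, line continuations and multi-stage builds are not handled, and only numeric `USER` values are mapped to `runAsUser`.

```python
def detect_from_dockerfile(text: str) -> dict:
    """Extract containerPorts, runAsUser, and workdir hints from Dockerfile text."""
    ports = []
    user = None
    workdir = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        instruction, arg = parts[0].upper(), parts[1].strip()
        if instruction == "EXPOSE":
            # EXPOSE may list several ports, each optionally suffixed with /tcp or /udp.
            for token in arg.split():
                port = token.split("/")[0]
                if port.isdigit():
                    ports.append(int(port))
        elif instruction == "USER":
            user = arg.split(":")[0]  # drop an optional :group part
        elif instruction == "WORKDIR":
            workdir = arg
    result = {"containerPorts": ports, "workdir": workdir}
    if user is not None and user.isdigit():
        result["runAsUser"] = int(user)
    return result
```

Named users (for example `USER app`) are deliberately left out of the result, since the numeric UID cannot be known without inspecting the image.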
Port Selection (CRITICAL for Security)
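Privileged ports (<1024) conflict with runAsNonRoot: true. On most clusters a non-root process cannot bind them without the NET_BIND_SERVICE capability, which drop: ["ALL"] removes.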
| Detected Port | Action |
|---|---|
| 80, 443 | ⚠️ Use unprivileged variant (nginx-unprivileged:8080) or remap |
| 8080, 8000, 3000+ | ✅ Compatible with non-root |
Common remappings:
| Standard Image | Security-Compatible Alternative |
|---|---|
| nginx (port 80) | nginxinc/nginx-unprivileged (port 8080) |
| httpd (port 80) | Configure Listen 8080 or use unprivileged image |
| redis (port 6379) | ✅ Already unprivileged |
| postgres (port 5432) | ✅ Already unprivileged |
Service abstracts this: Service port: 80 → targetPort: 8080 keeps external API stable.
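The port rule can be expressed as a small planning function. This sketch is an assumption-laden illustration: `plan_ports` is a hypothetical helper, and remapping a privileged port by adding 8000 (80 → 8080, 443 → 8443) is a common convention rather than something the skill mandates. The application or image must also be configured to listen on the remapped port.

```python
def plan_ports(detected_port: int, service_port: int = 80) -> dict:
    """Return the Service port, the unprivileged targetPort, and whether a remap occurred."""
    if not 1 <= detected_port <= 65535:
        raise ValueError(f"invalid port: {detected_port}")
    remapped = detected_port < 1024
    # Assumed convention: shift privileged ports into the unprivileged range.
    target_port = detected_port + 8000 if remapped else detected_port
    return {"servicePort": service_port, "targetPort": target_port, "remapped": remapped}
```

Clients keep connecting to the Service on port 80 in both cases; only the container's listening port changes.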
From Code
| Detect | How | Example |
|---|---|---|
| Framework health | Route definitions | FastAPI /health, Express /healthz |
| Readiness | DB connection check | /health/ready with DB ping |
| Startup time | Heavy imports | ML models → startupProbe needed |
Workload Type Decision
Is this a one-time task that completes? → Job (or CronJob if scheduled)
Does it need stable network identity or ordered deployment? → StatefulSet
Must run on every node? → DaemonSet
Otherwise → Deployment (default)
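The decision questions above translate directly into an ordered set of checks. The function and parameter names below are hypothetical; the order of the checks follows the order of the questions.

```python
def choose_workload_kind(runs_to_completion: bool = False,
                         scheduled: bool = False,
                         needs_stable_identity: bool = False,
                         runs_on_every_node: bool = False) -> str:
    """Map the workload-type questions to a Kubernetes kind."""
    if runs_to_completion:
        return "CronJob" if scheduled else "Job"
    if needs_stable_identity:
        return "StatefulSet"
    if runs_on_every_node:
        return "DaemonSet"
    return "Deployment"
```

A typical web API answers "no" to every question and falls through to `Deployment`.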
Workflow
1. PRE-FLIGHT
   - Verify kubectl context
   - Check namespace exists
   - Verify image exists or build it
2. ANALYZE PROJECT
   - Read Dockerfile for EXPOSE, HEALTHCHECK, USER
   - Scan code for health endpoints
   - Check existing k8s/ directory
   - Detect GPU requirements (torch, tensorflow)
3. DETERMINE WORKLOAD TYPE
   - Deployment (default)
   - Job/CronJob (batch processing)
   - StatefulSet (databases, ordered)
   - DaemonSet (node-level agents)
4. GENERATE MANIFESTS
   - Deployment/Job/StatefulSet with hardened security
   - Service (ClusterIP, NodePort, or LoadBalancer)
   - ConfigMap for non-secret config
   - HPA if autoscaling needed
   - PDB for availability
   - All with educational comments
5. VALIDATE
   - kubectl apply --dry-run=server
   - kubectl apply -n $NAMESPACE
   - kubectl wait --for=condition=Ready pod
   - kubectl logs to verify startup
6. DELIVER
   - Files in k8s/base/ directory
   - Summary of what was created
   - Next steps for production
Generated Directory Structure
k8s/
├── base/                      # Raw manifests (ArgoCD-compatible)
│   ├── namespace.yaml         # Optional, if new namespace
│   ├── resourcequota.yaml     # Namespace-wide resource caps
│   ├── limitrange.yaml        # Per-container defaults and bounds
│   ├── networkpolicy.yaml     # Namespace isolation rules
│   ├── deployment.yaml        # Or job.yaml, statefulset.yaml
│   ├── service.yaml           # ClusterIP by default
│   ├── configmap.yaml         # Non-secret configuration
│   ├── hpa.yaml               # If autoscaling enabled
│   ├── pdb.yaml               # Pod Disruption Budget
│   └── kustomization.yaml     # For future Kustomize use
└── README.md                  # Deployment instructions
Manifest Patterns
Deployment (Default)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${APP_NAME}
  labels:
    # Standard K8s labels (see references/labels-annotations.md)
    app.kubernetes.io/name: ${APP_NAME}
    app.kubernetes.io/instance: ${APP_NAME}-${ENV}
    app.kubernetes.io/version: "${VERSION}"
    app.kubernetes.io/component: api  # or worker, frontend
    app.kubernetes.io/part-of: ${PROJECT}
    app.kubernetes.io/managed-by: kubectl
spec:
  replicas: 2  # WHY: Minimum for availability during rolling updates
  selector:
    matchLabels:
      app.kubernetes.io/name: ${APP_NAME}
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ${APP_NAME}
    spec:
      # WHY: Security hardening - never run as root
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: ${APP_NAME}
          image: ${IMAGE}:${TAG}  # WHY: Never use :latest - breaks reproducibility
          imagePullPolicy: IfNotPresent
          ports:
            # WHY: Port must be >=1024 for runAsNonRoot (privileged ports need root)
            # Use Service port:80 → targetPort:8080 to expose standard ports externally
            - containerPort: ${PORT}  # Must be >=1024 (e.g., 8080, 8000, 3000)
              protocol: TCP
          # WHY: Container-level security context
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
          # WHY: Prevent resource starvation, enable HPA
          resources:
            requests:
              cpu: "100m"  # 0.1 CPU cores
              memory: "128Mi"
            limits:
              cpu: "500m"  # 0.5 CPU cores
              memory: "512Mi"
          # WHY: K8s restarts if app deadlocks
          livenessProbe:
            httpGet:
              path: /health/live
              port: ${PORT}
            initialDelaySeconds: 10
            periodSeconds: 15
            failureThreshold: 3
          # WHY: Only route traffic when ready
          readinessProbe:
            httpGet:
              path: /health/ready
              port: ${PORT}
            initialDelaySeconds: 5
            periodSeconds: 10
          # WHY: Slow-starting apps (ML models) need longer startup
          startupProbe:
            httpGet:
              path: /health/live
              port: ${PORT}
            initialDelaySeconds: 0
            periodSeconds: 10
            failureThreshold: 30  # 5 minutes to start
          # WHY: Graceful shutdown for in-flight requests
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 5"]
      # WHY: Allow time for graceful shutdown
      terminationGracePeriodSeconds: 30
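The "5 minutes to start" figure in the startupProbe follows from the probe settings. The hypothetical helper below makes the arithmetic explicit; treating the budget as `initialDelaySeconds + periodSeconds * failureThreshold` is an approximation that ignores per-probe timeouts.

```python
def startup_budget_seconds(initial_delay_seconds: int, period_seconds: int,
                           failure_threshold: int) -> int:
    """Approximate how long a container may take to start before it is restarted."""
    return initial_delay_seconds + period_seconds * failure_threshold
```

With the values in the Deployment (0, 10, 30) the budget is 300 seconds, so a model that needs longer to load would call for a higher `failureThreshold`.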
Service
apiVersion: v1
kind: Service
metadata:
  name: ${APP_NAME}
  labels:
    app.kubernetes.io/name: ${APP_NAME}
spec:
  # WHY: ClusterIP is safest default - internal only
  # Use NodePort for dev/testing, LoadBalancer for prod external access
  type: ClusterIP
  ports:
    # WHY: Service abstracts internal port - clients connect to :80, Pod runs on :8080
    # This allows standard external ports while container runs unprivileged
    - port: 80             # WHY: Service port (what clients connect to)
      targetPort: ${PORT}  # WHY: Pod port (>=1024, e.g., 8080)
      protocol: TCP
      name: http
  selector:
    # CRITICAL: Must EXACTLY match Pod template labels from Deployment
    # Mismatch = zero endpoints = Service routes to nothing
    app.kubernetes.io/name: ${APP_NAME}
Verify Service→Pod connection: kubectl get endpoints ${APP_NAME}
- Shows Pod IPs if selector matches
- Shows <none> if selector MISMATCHES Pod labels
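The selector rule can also be checked before deploying by comparing the two label maps. This is an illustrative sketch with a hypothetical function name; it models equality-based selectors only, which is the form a Service uses.

```python
def selector_matches(selector: dict, pod_labels: dict) -> bool:
    """True when every selector key/value pair is present in the Pod labels."""
    if not selector:
        # A Service without a selector gets no automatically managed endpoints.
        return False
    return all(pod_labels.get(key) == value for key, value in selector.items())
```

Pods may carry extra labels beyond the selector; a match only fails when a selector key is missing or has a different value.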
Security Context (Always Applied)
See references/security-contexts.md for full patterns.
Pod level
securityContext:
  runAsNonRoot: true       # WHY: Never run as root
  runAsUser: 1000          # WHY: Consistent non-root UID
  runAsGroup: 1000         # WHY: Consistent GID
  fsGroup: 1000            # WHY: Volume permissions
  seccompProfile:
    type: RuntimeDefault   # WHY: Block dangerous syscalls
Container level
securityContext:
  allowPrivilegeEscalation: false  # WHY: Prevent root escalation
  readOnlyRootFilesystem: true     # WHY: Immutable container
  capabilities:
    drop: ["ALL"]                  # WHY: Minimal capabilities
Output Checklist
Before delivering, verify:
Pre-flight
- kubectl context is valid
- Namespace exists or was created
- Image exists locally or in registry
- For kind/minikube: image loaded into cluster
Manifests
- All manifests have app.kubernetes.io/* labels
- Security context applied (runAsNonRoot, readOnlyRootFilesystem)
- containerPort >= 1024 (privileged ports incompatible with runAsNonRoot)
- Resource requests AND limits defined
- Liveness and readiness probes configured
- No hardcoded secrets (use Secret references or env vars)
Namespace Governance (if applicable)
- ResourceQuota sets namespace-wide CPU/memory/pod limits
- LimitRange provides default requests/limits for containers
- LimitRange max prevents single container from consuming quota
- NetworkPolicy isolates namespace (default-deny + explicit allows)
- Monitoring namespace allowed to scrape metrics
Validation
- kubectl apply --dry-run=server passes
- Deployed to cluster successfully
- Pods reach Running state
- Health endpoints respond
- Service has endpoints (kubectl get endpoints shows Pod IPs, not <none>)
Documentation
- Comments explain WHY for each config choice
- README.md with deployment instructions
Reference Files
Always Read First
| File | Purpose |
|---|---|
| references/security-contexts.md | CRITICAL: Hardened security patterns |
| references/health-probes.md | CRITICAL: Liveness/readiness/startup |
| references/resource-limits.md | CRITICAL: CPU/memory guidance |
| references/namespace-governance.md | CRITICAL: ResourceQuota, LimitRange, NetworkPolicy, multi-team isolation |
Debugging & Operations
| File | When to Read |
|---|---|
| references/debugging-workflow.md | CRITICAL: CrashLoopBackOff, command safety, logs, exec, debug containers |
| references/deployment-gotchas.md | CRITICAL: Architecture mismatch, ImagePull failures, pre-deploy validation, Helm gotchas |
| references/networking-patterns.md | DEBUGGING: Service has no endpoints, selector mismatch, DNS issues |
| references/control-plane.md | DEBUGGING: When deployments fail, pods stuck, rollback needed |
Workload-Specific
| File | When to Read |
|---|---|
| references/workload-types.md | Choosing Deployment vs Job vs StatefulSet |
| references/init-sidecar-patterns.md | Init containers (model download, db wait), sidecars (logging, metrics) |
| references/autoscaling-patterns.md | HPA, custom metrics, KEDA |
| references/gpu-workloads.md | AI/ML workloads with GPU |
| references/keda-patterns.md | Event-driven scale-to-zero |
Infrastructure
| File | When to Read |
|---|---|
| references/networking-patterns.md | Service types, Ingress, mesh |
| references/storage-patterns.md | PVC, ephemeral, shared storage |
| references/configmap-patterns.md | ConfigMap creation, env vars, volumes, hot-reload |
| references/secrets-patterns.md | ESO, Sealed Secrets, K8s Secrets |
| references/rbac-patterns.md | SECURITY: ServiceAccount, Role, RoleBinding, least privilege |
| references/labels-annotations.md | Standard labels, ArgoCD compat |