Sync to Production Skill
This skill provides workflows for synchronizing Kubernetes kustomization configurations from the staging environment to the production environment in the simplex-gitops repository.
⚠️ CRITICAL: Production Deployment Policy
Production deployments must be performed manually; automatic sync is prohibited.
The workflow is:
- ✅ Update kustomization.yaml (can be automated)
- ✅ Commit and push to GitLab (can be automated)
- ⛔ ArgoCD sync to production cluster - MUST BE MANUAL
After pushing changes, inform the user:
- Changes are pushed to the repository
- Production ArgoCD app will detect the changes but will NOT auto-sync
- User must manually trigger sync via ArgoCD UI or CLI when ready
View pending changes (safe, read-only)
argocd app get simplex-aws-prod
argocd app diff simplex-aws-prod
Manual sync (ONLY when user explicitly requests)
argocd app sync simplex-aws-prod
NEVER run argocd app sync simplex-aws-prod automatically.
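The policy above can be encoded as a small guard in any automation that wraps the ArgoCD CLI. The following is a minimal sketch, not part of the skill itself: the function name `argocd_command` and the `user_requested_sync` flag are hypothetical, and only the `get`, `diff`, and `sync` subcommands named in this document are covered.

```python
# Hypothetical guard: the names here are illustrative, not part of the skill.
READ_ONLY_ACTIONS = {"get", "diff"}
PROD_APP = "simplex-aws-prod"


def argocd_command(action, user_requested_sync=False):
    """Return the argocd argv for the production app, or raise if not allowed."""
    if action in READ_ONLY_ACTIONS:
        # Read-only commands are always safe to build.
        return ["argocd", "app", action, PROD_APP]
    if action == "sync":
        # A sync is only allowed when the user explicitly asked for it.
        if not user_requested_sync:
            raise PermissionError(
                "production sync requires an explicit user request"
            )
        return ["argocd", "app", "sync", PROD_APP]
    raise ValueError(f"unsupported action: {action!r}")
```

The function only builds the argument list; whether and how it is executed is left to the caller, so the default path can never trigger a production sync.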
File Locations
kubernetes/overlays/aws-staging/kustomization.yaml # Staging config
kubernetes/overlays/aws-prod/kustomization.yaml # Production config
Quick Commands
View Image Differences
Using the sync script
python3 ~/.claude/skills/sync-to-prod/scripts/sync_images.py --diff
Or use the make target (when in the kubernetes/ directory)
make compare-images
Sync Images
Sync specific services
python3 ~/.claude/skills/sync-to-prod/scripts/sync_images.py --images front,anotherme-agent
Sync all images (dry-run first)
python3 ~/.claude/skills/sync-to-prod/scripts/sync_images.py --all --dry-run
Sync all images (apply changes)
python3 ~/.claude/skills/sync-to-prod/scripts/sync_images.py --all
Sync Workflow
Step 1: Compare Environments
Run the diff command to see what's different between staging and production:
python3 ~/.claude/skills/sync-to-prod/scripts/sync_images.py --diff
This shows:
- 🔄 DIFFERENT TAGS: Services with different versions
- ✅ SAME TAGS: Services already in sync
- ⚠️ STAGING ONLY: Services only in staging
- ⚠️ PROD ONLY: Services only in production
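The four categories amount to a set comparison over the image tags of the two overlays. The sketch below illustrates that comparison and is not the actual logic of sync_images.py: the function name `classify_images` is hypothetical, and the service tags are made-up sample values.

```python
def classify_images(staging, prod):
    """Group image names by how their tags compare across environments.

    staging / prod: dicts mapping image name -> tag.
    """
    report = {"different": [], "same": [], "staging_only": [], "prod_only": []}
    for name in sorted(set(staging) | set(prod)):
        if name not in prod:
            report["staging_only"].append(name)
        elif name not in staging:
            report["prod_only"].append(name)
        elif staging[name] != prod[name]:
            report["different"].append(name)
        else:
            report["same"].append(name)
    return report


# Sample data (tags are invented for illustration).
staging = {"front": "v1.4.0", "anotherme-agent": "v2.1.0", "crawler": "v0.3.0"}
prod = {"front": "v1.3.2", "anotherme-agent": "v2.1.0", "litellm": "v1.0.0"}
report = classify_images(staging, prod)
```

With this sample data, `front` lands in `different`, `anotherme-agent` in `same`, `crawler` in `staging_only`, and `litellm` in `prod_only`.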
Step 2: Review and Select Services
Decide which services to promote. Common patterns:
Promote a single critical service
python3 ~/.claude/skills/sync-to-prod/scripts/sync_images.py --images front --dry-run
Promote frontend services
python3 ~/.claude/skills/sync-to-prod/scripts/sync_images.py --images front,front-homepage --dry-run
Promote all AI services
python3 ~/.claude/skills/sync-to-prod/scripts/sync_images.py --images anotherme-agent,anotherme-api,anotherme-search,anotherme-worker --dry-run
Promote everything
python3 ~/.claude/skills/sync-to-prod/scripts/sync_images.py --all --dry-run
Step 3: Apply Changes
After reviewing dry-run output, apply the changes:
python3 ~/.claude/skills/sync-to-prod/scripts/sync_images.py --images <services>
Step 4: Commit and Push
cd /path/to/simplex-gitops
git add kubernetes/overlays/aws-prod/kustomization.yaml
git commit -m "chore: promote <services> to production"
git push
Important: after the push, ArgoCD will detect the change but will NOT automatically sync it to the production cluster.
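If the commit step is automated, the git invocations can be assembled from the list of promoted services. This is a sketch under assumptions: the helper name `commit_commands` is hypothetical, and joining service names with commas in the commit message is one possible way to fill in the `<services>` placeholder. It stops at `git push` and never touches ArgoCD.

```python
PROD_KUSTOMIZATION = "kubernetes/overlays/aws-prod/kustomization.yaml"


def commit_commands(services):
    """Build the git argv lists for committing a promotion (hypothetical helper)."""
    if not services:
        raise ValueError("at least one service is required")
    # Assumption: <services> is rendered as a comma-separated list.
    message = f"chore: promote {', '.join(services)} to production"
    return [
        ["git", "add", PROD_KUSTOMIZATION],
        ["git", "commit", "-m", message],
        ["git", "push"],
    ]
```

Each inner list can be passed to `subprocess.run(..., check=True)` from the repository root.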
Step 5: Manual Production Sync (User Action Required)
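After the push completes, the user must manually trigger the production sync:
View changes pending sync
argocd app get simplex-aws-prod
argocd app diff simplex-aws-prod
Sync manually after the user confirms
argocd app sync simplex-aws-prod
Or click the Sync button manually in the ArgoCD Web UI:
- Find the simplex-aws-prod application
- Click the "SYNC" button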
Configuration Sections That May Need Sync
Beyond image tags, these sections may differ between environments:
- Image Tags (Primary Sync Target)
Located in the images: section. This is what the sync script handles.
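As an illustration of what promoting image tags involves, the sketch below copies `newTag` values from a staging `images:` list into a production one. It is an assumption-laden example rather than the sync script's real implementation: the function name `promote_tags` is hypothetical, the entries are shown as already-parsed Python dicts (YAML loading is omitted), and the tags are sample values.

```python
import copy


def promote_tags(staging_images, prod_images, selected=None):
    """Return (updated_prod_images, changes) without mutating the inputs.

    Each list mirrors a kustomization `images:` section: dicts with at
    least `name` and `newTag`. `selected=None` means every image that
    exists in both environments.
    """
    staging_tags = {img["name"]: img["newTag"] for img in staging_images}
    updated = copy.deepcopy(prod_images)
    changes = []
    for img in updated:
        name = img["name"]
        if selected is not None and name not in selected:
            continue
        new_tag = staging_tags.get(name)
        if new_tag is not None and new_tag != img["newTag"]:
            changes.append((name, img["newTag"], new_tag))
            img["newTag"] = new_tag
    return updated, changes


# Sample data (tags are invented for illustration).
staging_images = [
    {"name": "front", "newTag": "v1.4.0"},
    {"name": "anotherme-agent", "newTag": "v2.1.0"},
]
prod_images = [
    {"name": "front", "newTag": "v1.3.2"},
    {"name": "anotherme-agent", "newTag": "v2.0.0"},
]
updated, changes = promote_tags(staging_images, prod_images, selected={"front"})
```

Returning the list of changes separately makes a dry-run straightforward: print `changes` and discard `updated`.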
- ConfigMap Patches
Files in patches/ directory may contain environment-specific values:
| Patch File | Purpose | Sync Consideration |
| --- | --- | --- |
| api-cm0-configmap.yaml | API config | Usually environment-specific, don't sync |
| gateway-cm0-configmap.yaml | Gateway config | Usually environment-specific |
| anotherme-agent-env-configmap.yaml | Agent config | May need selective sync |
| anotherme-agent-secrets.yaml | Agent secrets | Never sync, environment-specific |
| anotherme-search-env-configmap.yaml | Search config | May need selective sync |
| simplex-cron-env-configmap.yaml | Cron config | Usually environment-specific |
| simplex-router-cm0-configmap.yaml | Router config | Usually environment-specific |
| frontend-env.yaml | Frontend env vars | Usually environment-specific |
| ingress.yaml | Ingress rules | Never sync, different domains |
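The sync considerations for patch files can be kept as a lookup so that tooling refuses to copy anything marked as never-sync. This is a hypothetical sketch: the policy labels (`"env-specific"`, `"selective"`, `"never"`) and function names are invented here to summarize the guidance above.

```python
# Policy labels are illustrative shorthand for the guidance in this section.
PATCH_POLICY = {
    "api-cm0-configmap.yaml": "env-specific",
    "gateway-cm0-configmap.yaml": "env-specific",
    "anotherme-agent-env-configmap.yaml": "selective",
    "anotherme-agent-secrets.yaml": "never",
    "anotherme-search-env-configmap.yaml": "selective",
    "simplex-cron-env-configmap.yaml": "env-specific",
    "simplex-router-cm0-configmap.yaml": "env-specific",
    "frontend-env.yaml": "env-specific",
    "ingress.yaml": "never",
}


def sync_candidates():
    """Patch files that may need a selective, manually reviewed sync."""
    return sorted(name for name, p in PATCH_POLICY.items() if p == "selective")


def may_copy(patch_file):
    """True only for patches flagged as selective; unknown files are refused."""
    return PATCH_POLICY.get(patch_file) == "selective"
```

Treating unknown files as not copyable keeps the default conservative when a new patch is added without a policy entry.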
- Replica Counts
Staging often runs with fewer replicas. Production uses base defaults or higher. This is intentional and should NOT be synced.
- Node Pool Assignments
- Staging: karpenter.sh/nodepool: staging / singleton-staging
- Production: karpenter.sh/nodepool: production / singleton-production
These are environment-specific and should NOT be synced.
- Storage Classes
Both environments use similar patterns, but production uses gp3 while staging uses ebs-gp3-auto. Usually no sync is needed.
- High Availability Settings
Production has additional HA configurations:
- topologySpreadConstraints for cross-AZ distribution
- terminationGracePeriodSeconds: 60 for graceful shutdown
These are production-specific optimizations and should NOT be synced to staging.
Manual Sync Patterns
For configurations not handled by the script:
Sync a Specific ConfigMap Patch
Compare
diff kubernetes/overlays/aws-staging/patches/anotherme-agent-env-configmap.yaml \
  kubernetes/overlays/aws-prod/patches/anotherme-agent-env-configmap.yaml
Copy if needed (carefully review first!)
cp kubernetes/overlays/aws-staging/patches/anotherme-agent-env-configmap.yaml \
  kubernetes/overlays/aws-prod/patches/anotherme-agent-env-configmap.yaml
Sync New Resources
If staging has new resources (PV, PVC, etc.) that production needs:
- Check staging resources: section for new entries
- Copy the resource files to aws-prod
- Add to aws-prod kustomization.yaml resources section
- Adjust environment-specific values (namespace, labels, etc.)
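The first of these steps, spotting entries that exist only in staging, is a simple list comparison. A minimal sketch follows, assuming the two `resources:` sections have already been loaded as lists of strings; the function name `missing_in_prod` and the file names are hypothetical.

```python
def missing_in_prod(staging_resources, prod_resources):
    """Return staging resource entries absent from production, in staging order."""
    prod = set(prod_resources)
    return [entry for entry in staging_resources if entry not in prod]


# Hypothetical resource entries for illustration only.
staging_resources = ["../../base", "pv-data.yaml", "pvc-data.yaml"]
prod_resources = ["../../base"]
candidates = missing_in_prod(staging_resources, prod_resources)
```

The result is only a list of candidates: each file still needs its environment-specific values adjusted before it is added to the production overlay.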
Verification After Sync
Check ArgoCD Status (Read-Only, Safe)
View application status and changes pending sync
argocd app get simplex-aws-prod
argocd app diff simplex-aws-prod
Manual Sync (User Must Explicitly Request)
⛔ Run only when the user explicitly requests it
argocd app sync simplex-aws-prod
Check Deployed Versions
Production namespace
k1 get pods -n production -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
Staging namespace
k2 get pods -n staging -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
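The commands above print one line per pod: the pod name, a tab, then the container images separated by spaces. To compare the two namespaces programmatically, that output can be parsed into a dictionary. This is a sketch with a hypothetical function name and a made-up registry host and tags.

```python
def parse_pod_images(output):
    """Parse 'pod<TAB>image [image ...]' lines into {pod: [images]}."""
    pods = {}
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, images = line.partition("\t")
        pods[name] = images.split()
    return pods


# Sample output; pod names, registry host and tags are invented.
sample = (
    "front-abc123\tregistry.example.com/front:v1.4.0\n"
    "agent-def456\tregistry.example.com/anotherme-agent:v2.1.0 "
    "registry.example.com/sidecar:v0.1.0\n"
)
pods = parse_pod_images(sample)
```

Pods with several containers yield a list with one image per container.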
Validate Manifests
kubectl kustomize kubernetes/overlays/aws-prod > /tmp/prod-manifests.yaml
kubectl kustomize kubernetes/overlays/aws-staging > /tmp/staging-manifests.yaml
diff /tmp/staging-manifests.yaml /tmp/prod-manifests.yaml
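When the rendered manifests need to be compared inside a script rather than with `diff`, Python's standard `difflib` produces an equivalent unified diff. The sketch below works on in-memory strings; the function name `manifest_diff` and the sample manifest lines are illustrative assumptions.

```python
import difflib


def manifest_diff(staging_text, prod_text):
    """Return unified-diff lines between two rendered manifests."""
    return list(
        difflib.unified_diff(
            staging_text.splitlines(),
            prod_text.splitlines(),
            fromfile="staging",
            tofile="prod",
            lineterm="",
        )
    )


# Invented manifest fragments for illustration.
staging_text = "image: front:v1.4.0\nreplicas: 1"
prod_text = "image: front:v1.3.2\nreplicas: 1"
diff_lines = manifest_diff(staging_text, prod_text)
```

An empty list means the two renders are identical; expected differences such as replica counts or node pools will still appear and need to be read with the environment-specific sections in mind.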
Troubleshooting
Script Not Finding Repository
Ensure you're in the simplex-gitops directory or set the path explicitly:
cd /path/to/simplex-gitops
python3 ~/.claude/skills/sync-to-prod/scripts/sync_images.py --diff
Image Not Found in Staging
The service may use a different image name format (Aliyun vs ECR). Check both formats in the kustomization files.
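One way to check both formats is to reduce each image reference to its final path component and compare those. The sketch below is an assumption about how such matching could be done, not a description of the sync script; the function name `short_name` and both registry hosts are made-up examples.

```python
def short_name(image):
    """Strip registry, namespace, tag and digest from an image reference."""
    last = image.rsplit("/", 1)[-1]   # drop registry host and namespace
    last = last.split("@", 1)[0]      # drop a digest, if present
    return last.split(":", 1)[0]      # drop a tag, if present


# Invented registry hosts in the two general styles mentioned above.
aliyun_style = "registry.cn-hangzhou.aliyuncs.com/simplex/front:v1.4.0"
ecr_style = "123456789012.dkr.ecr.us-east-1.amazonaws.com/front"
same_service = short_name(aliyun_style) == short_name(ecr_style)
```

Splitting on the last `/` first avoids mistaking a registry port (for example `host:5000/front`) for a tag.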
ArgoCD Not Syncing
View application status (read-only)
argocd app get simplex-aws-prod --show-operation
Refresh the application to detect the latest changes (read-only, safe)
argocd app get simplex-aws-prod --refresh
⛔ Manual sync - run only when the user explicitly requests it
argocd app sync simplex-aws-prod
Service Categories Reference
| Category | Services |
| --- | --- |
| AI Core | anotherme-agent, anotherme-api, anotherme-search, anotherme-worker |
| Frontend | front, front-homepage |
| Backend | simplex-cron, simplex-gateway-api, simplex-gateway-worker |
| Data | data-search-api, crawler |
| Infrastructure | litellm, node-server, simplex-router, simplex-router-backend, simplex-router-fronted |
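The categories map naturally onto the comma-separated value expected by `--images`. The following sketch shows one way to expand a category into that argument; the dictionary and the helper `images_arg` are hypothetical conveniences, with service names copied from the reference above.

```python
SERVICE_CATEGORIES = {
    "AI Core": ["anotherme-agent", "anotherme-api", "anotherme-search", "anotherme-worker"],
    "Frontend": ["front", "front-homepage"],
    "Backend": ["simplex-cron", "simplex-gateway-api", "simplex-gateway-worker"],
    "Data": ["data-search-api", "crawler"],
    "Infrastructure": [
        "litellm", "node-server", "simplex-router",
        "simplex-router-backend", "simplex-router-fronted",
    ],
}


def images_arg(*categories):
    """Build the comma-separated value for --images from category names."""
    services = []
    for category in categories:
        services.extend(SERVICE_CATEGORIES[category])  # KeyError on unknown names
    return ",".join(services)


frontend_arg = images_arg("Frontend")
```

For example, `images_arg("Frontend")` yields the same service list used in the frontend promotion command in Step 2, which can then be run with `--dry-run` first.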