promotion-pipeline

The homelab uses an OCI artifact promotion pipeline for immutable, auditable deployments. Changes flow through three stages: build, validate in integration, promote to live. This skill covers end-to-end tracing and debugging.

Pipeline Overview

PR merged to main (kubernetes/ changed)
    |
    v
build-platform-artifact.yaml (GHA)
  • Discovers latest stable tag in GHCR, bumps patch
  • Pushes OCI artifact with tag X.Y.Z-rc.N
  • Adds tags: sha-<short>, integration-<short>
    |
    v
Integration Cluster
  • OCIRepository polls GHCR with semver ">= 0.0.0-0" (includes RCs)
  • Detects new X.Y.Z-rc.N (higher than previous stable)
  • Flux reconciles platform Kustomization
    |
    v
Flux Alert (validation-success)
  • Watches platform Kustomization for "Reconciliation finished"
  • Fires repository_dispatch to GitHub (event_type: Kustomization/platform.flux-system)
  • Idempotency guard: workflow skips if artifact already has validated-<sha> tag
    |
    v
tag-validated-artifact.yaml (GHA)
  • Finds integration-<sha> artifact, extracts RC tag
  • Strips RC suffix: X.Y.Z-rc.N --> X.Y.Z
  • Tags artifact: validated-<sha> + X.Y.Z (stable semver)
    |
    v
Live Cluster
  • OCIRepository polls GHCR with semver ">= 0.0.0" (stable only)
  • Detects new X.Y.Z stable tag
  • Flux reconciles platform (production deployment)
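The idempotency guard in the alert stage can be pictured as a membership test over the artifact's tag list. The following is a minimal sketch, assuming the tags are available one per line on stdin; the function name `already_validated` and the sample tags are hypothetical, and this is not the workflow's actual implementation.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the idempotency guard.
# Reads tags (one per line) on stdin and succeeds only if
# an exact "validated-<sha>" tag is present.
already_validated() {
  local short_sha="$1"
  grep -qxF "validated-${short_sha}"
}

# Example: an artifact that has been built but not yet promoted.
tags=$'1.4.3-rc.1\nsha-abc1234\nintegration-abc1234'
if printf '%s\n' "$tags" | already_validated abc1234; then
  echo "skip: already validated"
else
  echo "promote: not yet validated"
fi
```

Matching the whole line (`-x`) with a fixed string (`-F`) keeps `integration-abc1234` from being mistaken for a validated tag.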

Artifact Tagging Strategy

Each artifact accumulates tags as it progresses through the pipeline:

| Tag | Created By | Stage | Purpose |
|-----|------------|-------|---------|
| X.Y.Z-rc.N | build workflow | Build | Pre-release semver for integration polling |
| sha-<7char> | build workflow | Build | Immutable commit reference |
| integration-<7char> | build workflow | Build | Marks artifact for integration consumption |
| validated-<7char> | tag workflow | Promotion | Traceability for validated artifacts |
| X.Y.Z | tag workflow | Promotion | Stable semver for live polling |

Version numbering: The build workflow queries GHCR for the highest stable X.Y.Z tag, bumps the patch to X.Y.(Z+1), then creates X.Y.(Z+1)-rc.N. When validated, the RC suffix is stripped to produce X.Y.(Z+1).
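The version arithmetic above can be sketched in a few lines of bash. The helper names (`bump_patch`, `rc_tag`, `strip_rc`) are hypothetical, and how the workflow chooses N is not described here, so the sketch takes N as an input.

```shell
#!/usr/bin/env bash
# Hypothetical helpers mirroring the version-numbering rules described above.

# bump_patch X.Y.Z -> X.Y.(Z+1)
bump_patch() {
  local major minor patch
  IFS=. read -r major minor patch <<<"$1"
  echo "${major}.${minor}.$((patch + 1))"
}

# rc_tag X.Y.Z N -> X.Y.(Z+1)-rc.N  (N is supplied by the caller)
rc_tag() {
  echo "$(bump_patch "$1")-rc.$2"
}

# strip_rc X.Y.Z-rc.N -> X.Y.Z
strip_rc() {
  echo "${1%-rc.*}"
}

rc=$(rc_tag 1.4.2 1)
echo "$rc"        # 1.4.3-rc.1
strip_rc "$rc"    # 1.4.3
```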

Source Types by Cluster

| Cluster | Source Type | Semver Constraint | What It Accepts |
|---------|-------------|-------------------|-----------------|
| dev | GitRepository | N/A | Git main branch directly |
| integration | OCIRepository | >= 0.0.0-0 | All versions including pre-releases (-rc.N) |
| live | OCIRepository | >= 0.0.0 | Stable versions only (no -rc suffix) |

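The semver constraint is set in the config module (infrastructure/modules/config/main.tf) and applied via flux-operator bootstrap. The -0 suffix in >= 0.0.0-0 is what admits pre-release versions: -0 is the lowest possible pre-release identifier, so the range explicitly includes pre-release tags, whereas a range with no pre-release component, such as >= 0.0.0, skips them.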
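To make the difference between the two ranges concrete, the sketch below classifies a few sample tags the way each cluster's constraint is described as treating them. It only mimics the effect of the ranges with regular expressions; it is not Flux's matching code, and the function name `accepts_tag` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of which tags each cluster's constraint accepts.
accepts_tag() {
  local cluster="$1" tag="$2"
  local stable='^[0-9]+\.[0-9]+\.[0-9]+$'
  local prerelease='^[0-9]+\.[0-9]+\.[0-9]+-[0-9A-Za-z.-]+$'
  case "$cluster" in
    integration) [[ $tag =~ $stable || $tag =~ $prerelease ]] ;;  # ">= 0.0.0-0"
    live)        [[ $tag =~ $stable ]] ;;                          # ">= 0.0.0"
    *)           return 2 ;;
  esac
}

for tag in 1.4.3-rc.1 1.4.3 sha-abc1234; do
  for cluster in integration live; do
    if accepts_tag "$cluster" "$tag"; then verdict=accepted; else verdict=ignored; fi
    echo "$cluster $tag $verdict"
  done
done
```

In this sketch the RC tag is accepted only by integration, the stable tag by both clusters, and the non-semver `sha-` tag by neither.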

Tracing a Change End-to-End

Stage 1: GitHub Actions Build

Check if build workflow triggered

gh run list --workflow=build-platform-artifact.yaml --limit=5

View specific run details

gh run view <run-id>

Check workflow logs

gh run view <run-id> --log

The build triggers on push to main when kubernetes/** files change. If no Kubernetes files changed, the workflow does not run.

Stage 2: OCI Artifact in GHCR

List recent artifacts and their tags

flux list artifact oci://ghcr.io/<owner>/homelab/platform --limit=10

Find artifact for a specific commit

flux list artifact oci://ghcr.io/<owner>/homelab/platform | grep <short-sha>

Stage 3: Integration Cluster Pickup

Check OCIRepository status (is it seeing the new artifact?)

KUBECONFIG=~/.kube/integration.yaml kubectl get ocirepository -n flux-system -o wide

Check what version is currently deployed

KUBECONFIG=~/.kube/integration.yaml kubectl get ocirepository flux-system -n flux-system -o jsonpath='{.status.artifact.revision}'

Check platform Kustomization reconciliation

KUBECONFIG=~/.kube/integration.yaml kubectl get kustomization platform -n flux-system

Force reconciliation if stuck

KUBECONFIG=~/.kube/integration.yaml flux reconcile source oci flux-system -n flux-system

Stage 4: Validation Alert

Check the validation-success Alert status

KUBECONFIG=~/.kube/integration.yaml kubectl describe alert validation-success -n flux-system

Check the github-dispatch Provider

KUBECONFIG=~/.kube/integration.yaml kubectl get providers -n flux-system

Check if Alert fired recently (events)

KUBECONFIG=~/.kube/integration.yaml kubectl get events -n flux-system --field-selector involvedObject.name=validation-success

Stage 5: Tag Workflow

Check if tag workflow triggered

gh run list --workflow=tag-validated-artifact.yaml --limit=5

If using workflow_dispatch for manual promotion

gh workflow run tag-validated-artifact.yaml -f artifact_sha=<7char-sha>

Stage 6: Live Cluster Pickup

Check OCIRepository status

KUBECONFIG=~/.kube/live.yaml kubectl get ocirepository -n flux-system -o wide

Check current deployed version

KUBECONFIG=~/.kube/live.yaml kubectl get ocirepository flux-system -n flux-system -o jsonpath='{.status.artifact.revision}'

Check platform Kustomization

KUBECONFIG=~/.kube/live.yaml kubectl get kustomization platform -n flux-system
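Stages 3 and 6 run the same three checks against different kubeconfigs, so they can be wrapped in one function. This is a sketch only: the name `pickup_status` is hypothetical, and it assumes the kubeconfigs live at ~/.kube/<cluster>.yaml as in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the pickup checks from Stage 3 / Stage 6
# against one cluster. Assumes kubeconfigs at ~/.kube/<cluster>.yaml.
pickup_status() {
  local cluster="$1"
  case "$cluster" in
    integration|live) ;;
    *) echo "usage: pickup_status integration|live" >&2; return 2 ;;
  esac
  local kubeconfig="$HOME/.kube/${cluster}.yaml"

  echo "== ${cluster}: OCIRepository =="
  KUBECONFIG="$kubeconfig" kubectl get ocirepository -n flux-system -o wide

  echo "== ${cluster}: deployed revision =="
  KUBECONFIG="$kubeconfig" kubectl get ocirepository flux-system -n flux-system \
    -o jsonpath='{.status.artifact.revision}'
  echo  # jsonpath output has no trailing newline

  echo "== ${cluster}: platform Kustomization =="
  KUBECONFIG="$kubeconfig" kubectl get kustomization platform -n flux-system
}
```

Call it as `pickup_status integration` or `pickup_status live`; any other argument is rejected, since dev uses a GitRepository rather than an OCIRepository.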

Debugging: Artifact Stuck in Integration

Is the OCI artifact in GHCR?
|
+-- NO --> Check build-platform-artifact workflow
|     - Did the workflow trigger? (push to main with kubernetes/ changes)
|     - Check GHCR auth: GITHUB_TOKEN must have packages:write
|     - Check workflow logs for "flux push artifact" errors
|
+-- YES --> Is integration OCIRepository seeing it?
    |
    +-- NO --> Check semver constraint
    |     - Must be ">= 0.0.0-0" to accept RC versions
    |     - Run: kubectl get ocirepository -n flux-system -o yaml | grep semver
    |     - Check OCIRepository .status.conditions for errors
    |
    +-- YES --> Is platform Kustomization reconciling?
        |
        +-- NO --> Check Kustomization status
        |     - kubectl describe kustomization platform -n flux-system
        |     - Look for dependency failures, schema errors
        |
        +-- YES --> Is the Alert firing repository_dispatch?
            |
            +-- NO --> Check Alert and Provider
            |     - Alert "validation-success" must watch platform Kustomization
            |     - Provider "github-dispatch" needs flux-system secret with GitHub token
            |     - Token needs repo scope for repository_dispatch
            |
            +-- YES --> Check tag-validated-artifact workflow
                  - Idempotency guard: already has validated-<sha> tag?
                  - Check workflow logs for tag errors

Debugging: Live Not Updating

Is the artifact tagged with stable semver (X.Y.Z)?
|
+-- NO --> Promotion did not complete
|     - Check tag-validated-artifact workflow ran successfully
|     - Verify it created both validated-<sha> and X.Y.Z tags
|
+-- YES --> Is live OCIRepository seeing the stable tag?
    |
    +-- NO --> Check semver constraint
    |     - Must be ">= 0.0.0" (excludes pre-releases)
    |     - Verify the stable tag is higher than current deployed version
    |     - Force poll: flux reconcile source oci flux-system -n flux-system
    |
    +-- YES --> Is Kustomization reconciling?
        |
        +-- NO --> Check Kustomization status and dependencies
        |
        +-- YES --> Deployment should be in progress
              - Check HelmRelease statuses: flux get helmreleases -A
              - Check for failing health checks blocking rollout

Canary-Checker Validation

The platform-validation Canary in the monitoring namespace runs health checks every 60 seconds:

| Check | Type | What It Validates |
|-------|------|-------------------|
| kubernetes-api | HTTP | Kubernetes API responds (200 or 401) |
| flux-pods-healthy | Kubernetes | All Flux pods in Running state with Ready condition |

Check canary status

KUBECONFIG=~/.kube/integration.yaml kubectl get canaries -n monitoring

Check individual check results

KUBECONFIG=~/.kube/integration.yaml kubectl describe canary platform-validation -n monitoring

Check canary-checker metrics in Prometheus (a value of 0 means the check is healthy)

canary_check{name="platform-validation"}

Alerts fire if canary checks fail:

| Alert | Condition | Severity |
|-------|-----------|----------|
| CanaryCheckFailure | canary_check == 1 for 2m | critical |
| CanaryCheckHighFailureRate | 20% failure rate over 15m | warning |

Manual Promotion (Emergency)

When automatic promotion fails, manually tag the artifact:

Authenticate to GHCR

echo "$GITHUB_TOKEN" | docker login ghcr.io -u "$GITHUB_USER" --password-stdin

Find the integration artifact

flux list artifact oci://ghcr.io/<owner>/homelab/platform | grep integration

Tag manually (replace <sha> with 7-char commit SHA)

flux tag artifact \
  oci://ghcr.io/<owner>/homelab/platform:integration-<sha> \
  --tag validated-<sha>

flux tag artifact \
  oci://ghcr.io/<owner>/homelab/platform:integration-<sha> \
  --tag <X.Y.Z> # The stable semver to assign

Alternatively, use workflow_dispatch to trigger the tag workflow manually:

gh workflow run tag-validated-artifact.yaml -f artifact_sha=<7char-sha>
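For the direct tagging route, the two `flux tag artifact` commands can be wrapped with basic input checks so that a mistyped SHA or an RC version is caught before anything is tagged. This is a sketch under assumptions: `promote_manually`, `REPO`, and `DRY_RUN` are hypothetical names, and the `<owner>` placeholder must be replaced before a real run.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the two manual tagging commands.
# REPO and DRY_RUN are assumptions for this sketch; replace <owner> before use.
REPO="${REPO:-oci://ghcr.io/<owner>/homelab/platform}"

promote_manually() {
  local sha="$1" version="$2" new_tag
  local sha_re='^[0-9a-f]{7}$'
  local stable_re='^[0-9]+\.[0-9]+\.[0-9]+$'

  [[ $sha =~ $sha_re ]] || { echo "expected a 7-char commit SHA, got: $sha" >&2; return 2; }
  [[ $version =~ $stable_re ]] || { echo "expected stable X.Y.Z, got: $version" >&2; return 2; }

  for new_tag in "validated-${sha}" "$version"; do
    if [[ ${DRY_RUN:-1} == 1 ]]; then
      # Dry run (the default): print the command instead of running it.
      echo "flux tag artifact ${REPO}:integration-${sha} --tag ${new_tag}"
    else
      flux tag artifact "${REPO}:integration-${sha}" --tag "${new_tag}"
    fi
  done
}

promote_manually abc1234 1.4.3   # prints the two commands it would run
```

Defaulting to a dry run is a deliberate choice for an emergency procedure: set `DRY_RUN=0` only after the printed commands look right.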

Rollback Procedure

Option 1: Pin OCIRepository to a Specific Version

Find previous stable artifact

flux list artifact oci://ghcr.io/<owner>/homelab/platform | grep -E '[0-9]+\.[0-9]+\.[0-9]+([[:space:]]|$)'

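Patch live OCIRepository to pin a specific tag. Flux gives spec.ref.semver precedence over spec.ref.tag, so the patch also clears the semver range:

KUBECONFIG=~/.kube/live.yaml kubectl patch ocirepository flux-system -n flux-system \
  --type=merge \
  -p '{"spec":{"ref":{"semver":null,"tag":"<previous-stable-tag>"}}}'

Remember to revert the pin (restore the original semver range) after fixing the issue; otherwise new promotions will be ignored.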

Option 2: Revert the PR and Let Pipeline Run

The safest rollback is to revert the breaking PR on main. The pipeline will build a new artifact with the reverted state, which will naturally promote through integration to live.

Option 3: Re-tag a Previous Artifact

Tag a known-good artifact with a higher stable semver

flux tag artifact \
  oci://ghcr.io/<owner>/homelab/platform:validated-<old-sha> \
  --tag <higher-X.Y.Z>

This works because the live OCIRepository picks the highest semver. Ensure the new tag is higher than the current one.
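Before re-tagging, it may help to confirm that the new tag really is higher, since a plain string comparison gets cases like 1.4.10 versus 1.4.9 wrong. A minimal sketch, assuming stable X.Y.Z tags only; the function name `semver_gt` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for re-tagging:
# succeeds only if NEW is strictly higher than CURRENT.
# Handles stable X.Y.Z tags only (no pre-release suffixes).
semver_gt() {
  local -a new cur
  local i
  IFS=. read -r -a new <<<"$1"
  IFS=. read -r -a cur <<<"$2"
  for i in 0 1 2; do
    (( new[i] > cur[i] )) && return 0
    (( new[i] < cur[i] )) && return 1
  done
  return 1  # equal versions are not "higher"
}

if semver_gt 1.4.10 1.4.9; then
  echo "ok to tag"
else
  echo "tag would be ignored"
fi
```

Each component is compared numerically, so 1.4.10 is treated as higher than 1.4.9 and the example prints "ok to tag".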

Common Failure Modes

| Symptom | Cause | Fix |
|---------|-------|-----|
| Build succeeds, integration does not update | OCIRepository semver does not match RC tags | Verify >= 0.0.0-0 in OCIRepository spec |
| Validation passes, live does not update | Tag workflow did not create stable semver tag | Check tag-validated-artifact workflow logs |
| repository_dispatch not received by GHA | GitHub token in flux-system secret lacks repo scope | Update token with correct scopes |
| Tag workflow fires repeatedly (~10min) | Alert fires on every Flux reconciliation cycle | Normal; the idempotency guard skips already-validated artifacts |
| Artifact push fails in build workflow | GHCR auth issue | Check GITHUB_TOKEN has packages:write permission |
| Live picks up wrong version | Semver ordering issue with RC numbering | Verify stable tag is strictly higher than current |
| Integration shows "no matching artifact" | OCIRepository URL or semver misconfigured | Check oci_url and oci_semver in cluster bootstrap config |

Key Files Reference

| File | Purpose |
|------|---------|
| .github/workflows/build-platform-artifact.yaml | Build and push OCI artifact on merge to main |
| .github/workflows/tag-validated-artifact.yaml | Promote validated artifact (tag stable semver) |
| kubernetes/platform/config/flux-notifications/canary-alert.yaml | Alert that triggers repository_dispatch |
| kubernetes/platform/config/flux-notifications/github-provider.yaml | GitHub dispatch provider for Flux alerts |
| kubernetes/platform/config/canary-checker/platform-health.yaml | Platform health validation checks |
| infrastructure/modules/config/main.tf | OCI semver constraints per cluster |
| infrastructure/modules/bootstrap/resources/instance-oci.yaml.tftpl | OCIRepository bootstrap template |

Cross-References

| Document | Focus |
|----------|-------|
| .github/CLAUDE.md | Complete pipeline architecture and debugging guide |
| kubernetes/clusters/CLAUDE.md | Per-cluster source types and promotion path |
| kubernetes/platform/CLAUDE.md | Flux patterns, version management |
| flux-gitops skill | Adding Helm releases and ResourceSet patterns |
