fix-buildkite-ci

Diagnose Buildkite failures programmatically and avoid guessing from UI screenshots. Prefer structured build/job JSON plus artifact inspection to find the exact failing test case and mismatch, then implement the smallest correct fix.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "fix-buildkite-ci" with this command: npx skills add risingwavelabs/risingwave/risingwavelabs-risingwave-fix-buildkite-ci

Fix Buildkite CI

Target Selection

Resolve the triage target in this order of precedence:

  • If the user provides a Buildkite build URL, use that build directly.

  • Otherwise, if the user specifies a branch and/or a pipeline (for example, pull-request or main-cron), use the specified scope.

  • Otherwise, default to the current git branch and inspect the checks for the PR associated with that branch.
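
The precedence above can be sketched as a small resolver. This is only an illustration: the function name, the returned dictionary keys, and the assumed URL shape (https://buildkite.com/<org>/<pipeline>/builds/<N>) are hypothetical and not part of the skill itself.

```python
import re
from typing import Optional

# Assumed URL shape: https://buildkite.com/<org>/<pipeline>/builds/<N>
BUILD_URL_RE = re.compile(
    r"https://buildkite\.com/(?P<org>[^/\s]+)/(?P<pipeline>[^/\s]+)/builds/(?P<number>\d+)"
)


def resolve_target(
    build_url: Optional[str] = None,
    branch: Optional[str] = None,
    pipeline: Optional[str] = None,
    current_branch: Optional[str] = None,
) -> dict:
    """Pick the triage target: build URL first, then explicit scope, then current branch."""
    if build_url:
        match = BUILD_URL_RE.search(build_url)
        if match is None:
            raise ValueError(f"not a Buildkite build URL: {build_url}")
        return {
            "mode": "build",
            "org": match["org"],
            "pipeline": match["pipeline"],
            "number": int(match["number"]),
        }
    if branch or pipeline:
        # Explicit scope given by the user; either field may be missing.
        return {"mode": "scope", "branch": branch, "pipeline": pipeline}
    # Nothing specified: fall back to the PR checks of the current git branch.
    return {"mode": "current-branch", "branch": current_branch}
```

A build URL wins even when a branch is also supplied, which matches the first rule in the list.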

Workflow

  • Identify the failing Buildkite build(s).

  • Retrieve build JSON and list failed jobs.

  • Pull job logs and extract the first concrete failure signal.

  • Inspect artifacts when top-level logs are truncated.

  • Map failure to root cause and apply a focused fix.

  • Verify locally where feasible and summarize evidence.

Use the bk CLI first. If auth is unavailable, use the public Buildkite JSON/log/artifact endpoints via curl.

For exact commands and endpoint patterns, read references/buildkite-ci-triage.md.

Step 1: Identify Failing Buildkite Checks

When no explicit target is given, first find the PR for the current branch, then run gh pr checks <PR_NUMBER> to list failing checks and capture their Buildkite URLs (.../builds/<N>).

If the user specifies a branch or pipeline, list and filter builds with bk build list using those parameters. If the user provides a Buildkite build URL, skip discovery and start from that build number.
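
As a rough sketch, the Buildkite URLs of failing checks can be pulled out of captured gh pr checks output with a regular expression. The layout assumed here (one check per line, with a standalone "fail" token and a link on the same line) is an assumption about the CLI's plain-text output and should be confirmed against real output; the function name is hypothetical.

```python
import re
from typing import List

# Matches the build part of a Buildkite URL and ignores any trailing "#<job-id>" fragment.
BUILD_URL_RE = re.compile(r"https://buildkite\.com/[^/\s]+/[^/\s]+/builds/\d+")


def failing_buildkite_urls(checks_output: str) -> List[str]:
    """Return unique Buildkite build URLs found on lines that report a failure."""
    urls: List[str] = []
    for line in checks_output.splitlines():
        # Assumption: a failing check carries the standalone token "fail" on its line.
        if "fail" not in line.split():
            continue
        for match in BUILD_URL_RE.finditer(line):
            url = match.group(0)
            if url not in urls:  # keep first-seen order, drop duplicates
                urls.append(url)
    return urls
```

Several jobs of one build usually link to the same build URL, so deduplicating keeps one entry per build.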

Step 2: Pull Build JSON and Failed Jobs

Fetch builds/<N>.json, then list the failed jobs: those with a non-zero exit_status.

Capture at least:

  • pipeline

  • build number

  • job id

  • job name

  • exit status
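
A minimal sketch of the filtering step, assuming the build JSON has already been fetched and parsed into a dictionary. The assumed shape (a top-level "number" and a "jobs" list whose entries carry "id", "name", and "exit_status") should be checked against the actual payload; the function name and output keys are hypothetical.

```python
from typing import Any, Dict, List


def failed_jobs(build: Dict[str, Any], pipeline: str) -> List[Dict[str, Any]]:
    """Collect the five fields listed above for every job with a non-zero exit status."""
    rows: List[Dict[str, Any]] = []
    for job in build.get("jobs", []):
        status = job.get("exit_status")
        # A missing or null status means the job did not run to completion
        # or is not a command step, so it is skipped here.
        if status is None or int(status) == 0:
            continue
        rows.append(
            {
                "pipeline": pipeline,
                "build_number": build.get("number"),
                "job_id": job.get("id"),
                "job_name": job.get("name"),
                "exit_status": int(status),
            }
        )
    return rows
```

The pipeline is passed in by the caller instead of being read from the JSON, since it is already known from the build URL.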

Step 3: Extract the Concrete Failure

Fetch each failed job log and search for high-signal patterns:

  • query result mismatch

  • [Diff] (-expected|+actual)

  • query is expected to fail with error:

  • panic/assertion lines

  • deterministic simulation error markers

  • OOM/timeout/cancellation markers

Stop once you have one concrete failing file/case and mismatch.
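
The scan can be sketched as a first-match search over the log text. The first three patterns are the literal strings listed above; the panic/assertion regex is an assumption about how such lines look, and markers for simulation errors, OOM, timeouts, and cancellation are left out because they are project-specific. The function name and labels are hypothetical.

```python
import re
from typing import Optional, Tuple

SIGNALS = [
    ("result-mismatch", re.compile(r"query result mismatch")),
    ("diff", re.compile(r"\[Diff\] \(-expected\|\+actual\)")),
    ("expected-error", re.compile(r"query is expected to fail with error:")),
    # Assumed wording for panic/assertion lines; adjust to the real logs.
    ("panic", re.compile(r"panicked at|assertion .*failed")),
]


def first_failure_signal(log_text: str, context: int = 3) -> Optional[Tuple[str, int, str]]:
    """Return (label, 1-based line number, excerpt) for the first high-signal line."""
    lines = log_text.splitlines()
    for index, line in enumerate(lines):
        for label, pattern in SIGNALS:
            if pattern.search(line):
                # Include a few following lines so the diff body is captured too.
                excerpt = "\n".join(lines[index : index + 1 + context])
                return label, index + 1, excerpt
    return None
```

Returning on the first hit mirrors the instruction to stop once one concrete failure is in hand.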

Step 4: Fall Back to Artifacts

If logs only show wrapper errors (for example, command exited with status), inspect artifacts from the same job, especially:

  • risedev-logs.zip

  • risedev-logs/nodetype-*.log

Extract and search artifact logs for the exact mismatch.
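
Assuming the artifact zip has already been downloaded as bytes, the search can be done without unpacking to disk using Python's standard zipfile module. The function name is hypothetical, and the search string is whatever signal Step 3 was looking for.

```python
import io
import zipfile
from typing import Iterator, Tuple


def grep_zip(zip_bytes: bytes, needle: str, suffix: str = ".log") -> Iterator[Tuple[str, int, str]]:
    """Yield (member name, 1-based line number, line) for each matching line in the archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.endswith(suffix):
                continue  # skip non-log members
            with archive.open(name) as handle:
                for lineno, raw in enumerate(handle, start=1):
                    # Logs may contain invalid bytes; replace them instead of failing.
                    line = raw.decode("utf-8", errors="replace")
                    if needle in line:
                        yield name, lineno, line.rstrip("\r\n")
```

For example, list(grep_zip(data, "query result mismatch")) returns every hit along with the log file it came from, which helps narrow down the node that reported the mismatch.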

Step 5: Apply Focused Fixes

Prefer minimal fixes tied to evidence:

  • SQLLogicTest mismatch: update the expected sections in the correct .slt/.slt.part file only when the change in query output is intentional.

  • Wrong runtime behavior: fix source code and keep tests as-is.

  • Flaky/cancellation-only signal (exit code 143): treat as an infrastructure cancellation unless product errors corroborate a real failure.

Avoid broad "retry and hope" actions without root-cause evidence.

Step 6: Verify and Report

Run the narrowest local check that validates the fix when possible. If full validation is not feasible, state it explicitly.

Always report:

  • failing check/build/job identifiers

  • failing file/test/case

  • exact mismatch/error evidence

  • applied fix (files changed)

  • verification status and remaining risk
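
One way to make sure no report field is forgotten is to collect them in a small record before writing the summary. The class and field names below are hypothetical; they simply mirror the checklist above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TriageReport:
    build: str            # failing check/build identifier
    job: str              # failing job identifier
    failing_case: str     # failing file/test/case
    evidence: str         # exact mismatch or error text
    files_changed: List[str] = field(default_factory=list)
    verification: str = "not verified locally"
    remaining_risk: str = "unknown"

    def render(self) -> str:
        """Format the report as plain text, one field per line."""
        fix = ", ".join(self.files_changed) or "none"
        return "\n".join(
            [
                f"Build/job: {self.build} / {self.job}",
                f"Failing case: {self.failing_case}",
                f"Evidence: {self.evidence}",
                f"Fix: {fix}",
                f"Verification: {self.verification}",
                f"Remaining risk: {self.remaining_risk}",
            ]
        )
```

The default verification text states that no local check was run, so an unverified fix is reported as such unless the field is filled in explicitly.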

Buildkite-Specific Heuristics

  • Exit code 105: often wrapper failure from docker-compose/plugin; inspect SLT/e2e logs for true mismatch.

  • Exit code 4: common in simulation/recovery steps; inspect uploaded simulation logs.

  • Exit code 143: usually cancellation/termination, not a deterministic product regression.

  • raw_log_url may be null in JSON; use explicit job log endpoints by job id.

  • Prefer JSON endpoints plus jq; avoid scraping large HTML pages.
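
The exit-code heuristics can be kept as a lookup table so that each failed job gets a first hint. This is a sketch with hypothetical names; the hints restate the list above and are starting points, not verdicts. Exit code 143 corresponds to 128 + 15, which is how a shell reports a process terminated by SIGTERM.

```python
EXIT_CODE_HINTS = {
    105: "likely wrapper failure from docker-compose/plugin; inspect SLT/e2e logs for the true mismatch",
    4: "common in simulation/recovery steps; inspect uploaded simulation logs",
    143: "usually cancellation/termination (128 + SIGTERM), not a deterministic product regression",
}


def triage_hint(exit_status: int) -> str:
    """Return the heuristic for a known exit code, or a generic fallback."""
    return EXIT_CODE_HINTS.get(
        exit_status,
        "no heuristic; read the job log for the first concrete failure signal",
    )


def is_cancellation_only(exit_status: int, product_error_found: bool) -> bool:
    """Treat 143 as infra/cancel unless a product error corroborates a real failure."""
    return exit_status == 143 and not product_error_found
```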
