Fix CI Failures
Diagnose and fix failed GitHub Actions CI jobs for the current branch/PR using gh CLI and git commands.
When to Use
- CI checks have failed on a PR
- You need to understand why a workflow failed
- You want to apply fixes and verify locally
Workflow
Copy this checklist to track progress:
- Verify authentication
- Gather context & find failed jobs
- Download & analyze logs
- Present diagnosis to user
- Apply fix & verify locally
- Push & recheck CI
- Verify Authentication
gh auth status
If authentication fails, prompt the user to run gh auth login with the repo and workflow scopes.
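The check above can be wrapped in a small guard so later steps stop early when the CLI is not logged in. This is a minimal sketch: the function name `require_gh_auth` is hypothetical, and the scope list is taken from the Error Handling notes in this document.

```shell
# Hypothetical helper: return non-zero (with a hint) when gh is not authenticated.
require_gh_auth() {
  if gh auth status >/dev/null 2>&1; then
    return 0
  fi
  echo "gh is not authenticated. Run: gh auth login --scopes repo,workflow" >&2
  return 1
}
```

Usage: `require_gh_auth || exit 1` at the top of a script that calls the commands in the following steps.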
- Gather PR Context
Get PR for current branch
gh pr view --json number,title,url,headRefName
Get PR description and metadata
gh pr view --json title,body,labels,author
List changed files
gh pr diff --name-only
All changes
gh pr diff
- Check CI Status
List all checks (shows pass/fail status)
gh pr checks
Get detailed check info
gh pr checks --json name,state,bucket,workflow,link,startedAt,completedAt
List only failed runs
gh run list --branch $(git branch --show-current) --status failure --limit 10
Check if CI is still running
gh run list --branch $(git branch --show-current) --status in_progress
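To avoid copying the run ID by hand, the failed-run listing can be reduced to a single ID. The sketch below assumes the newest failed run on the current branch is the one of interest; `latest_failed_run_id` is a hypothetical name, not part of gh.

```shell
# Hypothetical helper: print the ID of the newest failed run on the current
# branch, or return non-zero when no failed run is found.
latest_failed_run_id() {
  local branch run_id
  branch=$(git branch --show-current) || return 1
  run_id=$(gh run list --branch "$branch" --status failure --limit 1 \
    --json databaseId --jq '.[0].databaseId // empty') || return 1
  [ -n "$run_id" ] || return 1
  printf '%s\n' "$run_id"
}
```

Usage: `RUN_ID=$(latest_failed_run_id) && gh run view "$RUN_ID"`.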
- Find Failed Jobs
View run details (get RUN_ID from previous step)
gh run view {RUN_ID}
List failed jobs with IDs
gh run view {RUN_ID} --json jobs --jq '.jobs[] | select(.conclusion == "failure") | {id: .databaseId, name: .name}'
List failed jobs with their failed steps
gh run view {RUN_ID} --json jobs --jq '.jobs[] | select(.conclusion == "failure") | {name: .name, steps: [.steps[] | select(.conclusion == "failure") | .name]}'
- Download & Analyze Logs
Primary method:
Get failed logs (the last 250 lines usually contain the error)
gh run view {RUN_ID} --log-failed 2>&1 | tail -250
Target a specific failed job by ID
gh run view {RUN_ID} --job {JOB_ID} --log-failed 2>&1 | tail -100
Fallback for pending logs:
REPO=$(gh repo view --json nameWithOwner --jq '.nameWithOwner')
gh api "/repos/${REPO}/actions/jobs/{JOB_ID}/logs"
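The primary method and the fallback can be chained so the API is only queried when `--log-failed` returns nothing. This is a sketch under that assumption; `fetch_job_logs` is a hypothetical helper name, and an empty result is treated as "logs not ready yet".

```shell
# Hypothetical helper: try --log-failed first, then fall back to the job logs API.
# Usage: fetch_job_logs <run_id> <job_id>
fetch_job_logs() {
  local run_id job_id repo logs
  run_id=$1
  job_id=$2
  # An error or empty output from the primary method triggers the fallback.
  logs=$(gh run view "$run_id" --job "$job_id" --log-failed 2>/dev/null) || logs=""
  if [ -n "$logs" ]; then
    printf '%s\n' "$logs"
    return 0
  fi
  repo=$(gh repo view --json nameWithOwner --jq '.nameWithOwner') || return 1
  gh api "/repos/${repo}/actions/jobs/${job_id}/logs"
}
```

Usage: `fetch_job_logs {RUN_ID} {JOB_ID} | tail -100`.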
Smart log extraction (examples):
Context around failure markers
gh run view {RUN_ID} --log-failed 2>&1 | grep -B 5 -A 10 -iE "error|fail|exception|traceback|panic|fatal" | head -100
Python tests - pytest summary
gh run view {RUN_ID} --log-failed 2>&1 | grep -E -A 50 "FAILED|ERROR|short test summary"
TypeScript/ESLint errors
gh run view {RUN_ID} --log-failed 2>&1 | grep -E -B 2 -A 5 "error TS|error "
E2E snapshot mismatches
gh run view {RUN_ID} --log-failed 2>&1 | grep -E -B 2 -A 5 "Missing snapshot for|Snapshot mismatch for"
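The patterns above can also be used to guess what kind of failure a saved log contains. The sketch below is a heuristic only: the function name and the category labels are hypothetical, and snapshot markers are checked first on the assumption that an E2E log may also contain a pytest summary.

```shell
# Hypothetical helper: guess the failure category from a saved log file.
# Usage: classify_log <log_file>
classify_log() {
  local log_file
  log_file=$1
  if grep -qE 'Missing snapshot for|Snapshot mismatch for' "$log_file"; then
    echo "e2e-snapshots"
  elif grep -q 'error TS' "$log_file"; then
    echo "frontend-types"
  elif grep -q 'short test summary' "$log_file"; then
    echo "python-tests"
  else
    echo "unknown"
  fi
}
```

Usage: `gh run view {RUN_ID} --log-failed > failed.log 2>&1; classify_log failed.log`.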
- Analyze Failure
Identify:
- Error type: Lint, type check, test failure, build error
- Root cause: First/primary error (not cascading failures)
- Affected files: Which files need changes
- Error message: Exact error text
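One way to focus on the first error rather than cascading failures is to print only the earliest failure marker with a little context. This is a sketch: `first_error_context` is a hypothetical helper, it reuses the marker pattern from the log extraction step, and that pattern is a heuristic that can also match harmless lines such as "0 failed".

```shell
# Hypothetical helper: print the first failure marker in a saved log file
# with surrounding context. Returns non-zero when no marker is found.
# Usage: first_error_context <log_file> [lines_before] [lines_after]
first_error_context() {
  local log_file before after line start
  log_file=$1
  before=${2:-5}
  after=${3:-10}
  # Line number of the first match only (-m1).
  line=$(grep -n -m1 -iE 'error|fail|exception|traceback|panic|fatal' "$log_file" | cut -d: -f1)
  [ -n "$line" ] || return 1
  start=$((line - before))
  [ "$start" -ge 1 ] || start=1
  sed -n "${start},$((line + after))p" "$log_file"
}
```

Usage: `gh run view {RUN_ID} --log-failed > failed.log 2>&1; first_error_context failed.log`.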
Common CI failure categories:
| Category | Workflow | Make Command | Auto-fix |
| --- | --- | --- | --- |
| Python lint | python-tests.yml | make python-lint | ✅ make autofix |
| Python types | python-tests.yml | make python-types | ❌ Manual |
| Python tests | python-tests.yml | make python-tests | ❌ Manual |
| Frontend lint | js-tests.yml | make frontend-lint | ✅ make autofix |
| Frontend types | js-tests.yml | make frontend-types | ❌ Manual |
| Frontend tests | js-tests.yml | make frontend-tests | ❌ Manual |
| E2E tests | playwright.yml | make run-e2e-test <file> | ❌ Manual |
| E2E snapshots | playwright.yml | make run-e2e-test <file> | ✅ make update-snapshots |
| NOTICES | js-tests.yml | make update-notices | ✅ make update-notices |
| Min constraints | python-tests.yml | make update-min-deps | ✅ make update-min-deps |
| Pre-commit | enforce-pre-commit.yml | uv run pre-commit run --all-files | ✅ Mostly auto-fix |
| Relative imports | ensure-relative-imports.yml | Check script output | ❌ Manual |
| PR Labels | require-labels.yml | N/A | ⏭️ Ignore |
💡 Quick win: Run make autofix first for lint/formatting failures.
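The category table can be encoded as a lookup so a script picks the matching local command. The sketch below covers only the rows whose command needs no file argument; the category keys (`python-lint`, `min-constraints`, and so on) and both function names are hypothetical.

```shell
# Hypothetical helper: print the local command for a failure category.
# Returns non-zero for categories with no single local command.
local_check_command() {
  case "$1" in
    python-lint)     echo "make python-lint" ;;
    python-types)    echo "make python-types" ;;
    python-tests)    echo "make python-tests" ;;
    frontend-lint)   echo "make frontend-lint" ;;
    frontend-types)  echo "make frontend-types" ;;
    frontend-tests)  echo "make frontend-tests" ;;
    notices)         echo "make update-notices" ;;
    min-constraints) echo "make update-min-deps" ;;
    pre-commit)      echo "uv run pre-commit run --all-files" ;;
    *)               return 1 ;;
  esac
}

# Hypothetical helper: succeed when the table marks the category as auto-fixable.
is_auto_fixable() {
  case "$1" in
    python-lint|frontend-lint|e2e-snapshots|notices|min-constraints|pre-commit) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: `is_auto_fixable python-lint && make autofix`, then run the command printed by `local_check_command python-lint` to confirm the fix.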
- Present Diagnosis
For multiple failures, list them all and let the user choose:
CI Failure Analysis for PR #{NUMBER}: {TITLE}
═══════════════════════════════════════════════════════════════

Found {N} failed jobs/checks:
─────────────────────────────────────────────────────────────────
1. [LINT] Python Unit Tests → Run Linters
   Workflow: python-tests.yml (GitHub Actions)
   Error: Ruff formatting error in lib/streamlit/elements/foo.py
   Auto-fix: ✅ make autofix

2. [TYPE] Javascript Unit Tests → Run type checks
   Workflow: js-tests.yml (GitHub Actions)
   Error: TS2322: Type 'string' is not assignable to type 'number'
   File: frontend/lib/src/components/Bar.tsx:42
   Auto-fix: ❌ Manual fix required
─────────────────────────────────────────────────────────────────
Which failures should I address?
Recommended: "1" (auto-fixable)
Options: "1" | "1,2" | "1-2" | "all" | "only auto-fixable"
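If the user's reply is handled by a script, the numeric options can be expanded into individual failure numbers. This is a minimal sketch: `expand_selection` is a hypothetical helper, it handles "N", "N,M", "N-M" and "all", and it leaves "only auto-fixable" to the caller.

```shell
# Hypothetical helper: expand a selection into one failure number per line.
# Usage: expand_selection <selection> <total_failures>
expand_selection() {
  local selection total part start end i old_ifs
  selection=$1
  total=$2
  if [ "$selection" = "all" ]; then
    selection="1-$total"
  fi
  # Split the selection on commas into positional parameters.
  old_ifs=$IFS
  IFS=','
  set -- $selection
  IFS=$old_ifs
  for part in "$@"; do
    case "$part" in
      *-*) start=${part%-*}; end=${part#*-} ;;
      *)   start=$part; end=$part ;;
    esac
    # Reject anything that is not a plain number.
    for i in "$start" "$end"; do
      case "$i" in
        ''|*[!0-9]*) echo "invalid selection: $part" >&2; return 1 ;;
      esac
    done
    if [ "$start" -lt 1 ] || [ "$end" -gt "$total" ]; then
      echo "out of range: $part" >&2
      return 1
    fi
    i=$start
    while [ "$i" -le "$end" ]; do
      echo "$i"
      i=$((i + 1))
    done
  done
}
```

Usage: `expand_selection "1,3-4" 5` prints 1, 3 and 4 on separate lines.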
For a single failure, show a detailed analysis:
─────────────────────────────────────────────────────────────────
Analyzing: [TYPE] Javascript Unit Tests → Run type checks
─────────────────────────────────────────────────────────────────
Category: TYPE
Workflow: js-tests.yml
Job: js-unit-tests (ID: 12345678)
Step: Run type checks

Error snippet:
  frontend/lib/src/components/Bar.tsx:42:5
  error TS2322: Type 'string' is not assignable to type 'number'.

Proposed Fix:
  Change type annotation or fix the value type
─────────────────────────────────────────────────────────────────
Would you like me to:
  [1] Apply the fix automatically
  [2] Show the proposed changes first
  [3] Run local verification only
  [4] Skip this and move to next failure
- Apply Fix & Verify Locally
After user approval, apply fix and run verification:
Run all checks (lint, types, tests) on changed files
make check
Python tests (specific)
uv run pytest lib/tests/path/to/test_file.py::test_name -v
Frontend tests (specific)
cd frontend && yarn test path/to/test.test.tsx
E2E tests
make run-e2e-test {test_file.py}
E2E snapshots
make update-snapshots
- Summary & Push
git status --short
git diff --stat
Report what failed, what changed, and local verification result.
git add -A
git commit -m "fix: resolve CI failure in {workflow/step}"
git push
- Recheck CI Status
gh pr checks --watch
Or re-run failed jobs
gh run rerun {RUN_ID} --failed
Rules
- Focus on root cause: First error, not cascading failures
- Minimal fixes: Smallest change that fixes the issue
- Don't skip tests: Never disable tests to "fix" CI
- Verify locally: Always run appropriate local command
- Preserve intent: Understand what code was trying to do
Error Handling
| Issue | Solution |
| --- | --- |
| Auth failed | gh auth login with workflow/repo scopes |
| No PR for branch | gh run list to check workflow runs |
| CI still running | gh pr checks --watch |
| Logs pending | Retry with job logs API |
| No failed checks | All passing ✅ |
| Rate limited | Wait and retry |
| Flaky test | Re-run: gh run rerun {RUN_ID} --failed |
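The "wait and retry" advice for rate limits and pending logs can be sketched as a small wrapper. `retry` here is a hypothetical helper with a fixed delay; the attempt count and delay are placeholders to tune, not values from this workflow.

```shell
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: retry <attempts> <delay_seconds> <command> [args...]
retry() {
  local attempts delay n
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
```

Usage: `retry 3 30 gh run view {RUN_ID} --log-failed` tries up to three times, waiting 30 seconds between attempts.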