quality-capture-baseline

Captures a quality-metrics baseline (tests, coverage, type errors, linting, dead code) by running quality gates and storing the results in memory for regression detection. Use at feature start, before refactor work, or after major changes to establish a baseline. Triggers on "capture baseline", "establish baseline", or PROACTIVELY at the start of any feature/refactor work. Works with pytest output, pyright errors, ruff warnings, vulture results, and the memory MCP server for baseline storage.

Install skill "quality-capture-baseline" with this command: npx skills add dawiddutoit/custom-claude/dawiddutoit-custom-claude-quality-capture-baseline

Capture Quality Baseline

Purpose

Establish a quality metrics baseline at the start of a feature or refactor so that subsequent quality gate runs can detect regressions. This skill is a critical component of the project's regression detection strategy.

When to Use

Use this skill when:

  • Starting any new feature (before writing code)
  • Beginning refactor work (to document pre-state)
  • After major changes (dependency upgrades, architecture shifts)
  • When @planner creates a new todo.md
  • When @implementer starts task 0 of a feature
  • User asks to "capture baseline" or "establish baseline metrics"
  • Before major architectural changes to track impact

Quick Start

When to invoke this skill:

  • ✅ At the START of any new feature (before writing code)
  • ✅ Before beginning refactor work (to document pre-state)
  • ✅ After major changes (dependency upgrades, architecture shifts)
  • ❌ NOT after you've already started coding (too late)

Basic usage:

1. Run quality checks: ./scripts/check_all.sh
2. Parse metrics (tests, coverage, type errors, linting, dead code)
3. Create memory entity: baseline_{feature}_{date}
4. Return baseline reference for future comparison
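
A minimal end-to-end sketch in bash, assuming the script path and naming conventions detailed in the steps below:

FEATURE="user_profile"                                # hypothetical feature name
DATE=$(date +"%Y-%m-%d")
./scripts/check_all.sh > output.txt 2>&1              # Steps 1-2: run checks, capture output
PASSED=$(grep -oE "[0-9]+ passed" output.txt | grep -oE "[0-9]+")  # Step 3: parse metrics
BASELINE="baseline_${FEATURE}_${DATE}"                # Step 4: entity name stored in memory
echo "✅ Baseline captured: ${BASELINE}"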

Instructions

Step 1: Prepare Context

Before running quality checks, gather:

  • Feature name or refactor identifier (e.g., "auth", "service-result-migration")
  • Current git commit hash: git rev-parse HEAD
  • Current timestamp: date +"%Y-%m-%d %H:%M:%S"

Example:

FEATURE="user_profile"
GIT_COMMIT=$(git rev-parse HEAD)
TIMESTAMP=$(date +"%Y-%m-%d %H:%M:%S")
DATE=$(date +"%Y-%m-%d")

Step 2: Run Quality Checks

Primary method (recommended):

./scripts/check_all.sh

Fallback (if check_all.sh fails):

# Run individual checks
uv run pytest tests/ -v --cov=src --cov-report=term-missing
uv run pyright src/
uv run ruff check src/
uv run vulture src/

Capture output:

  • Store stdout and stderr
  • Measure execution time
  • Note any failures (document them, don't skip)
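
For example (a sketch; output.txt is the filename the parsing patterns in Step 3 assume):

START_TIME=$(date +%s.%N)                  # GNU date; use +%s if %N is unavailable
./scripts/check_all.sh > output.txt 2>&1   # stdout and stderr together
CHECK_EXIT=$?                              # non-zero exit means failures to document
END_TIME=$(date +%s.%N)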

Step 3: Parse Metrics

Use the parsing patterns from references/parsing.md. Quick reference:

Tests:

# Pattern: "X passed" or "X failed" or "X skipped"
PASSED=$(grep -oE "[0-9]+ passed" output.txt | grep -oE "[0-9]+")
FAILED=$(grep -oE "[0-9]+ failed" output.txt | grep -oE "[0-9]+")
SKIPPED=$(grep -oE "[0-9]+ skipped" output.txt | grep -oE "[0-9]+")

Coverage:

# Pattern: "TOTAL ... X%"
COVERAGE=$(grep "TOTAL" output.txt | grep -oE "[0-9]+%" | tail -1)

Type Errors:

# Pattern: "X errors, Y warnings" or "0 errors"
TYPE_ERRORS=$(grep -oE "[0-9]+ error" output.txt | grep -oE "[0-9]+" | head -1)

Linting:

# Pattern: "Found X errors"
LINTING_ERRORS=$(grep -oE "Found [0-9]+ error" output.txt | grep -oE "[0-9]+")

Dead Code:

# Pattern: count non-empty vulture output lines (one finding per line)
DEAD_CODE=$(uv run vulture src/ 2>&1 | grep -v "^$" | wc -l | xargs)

Execution Time:

# Measure with the time command, or calculate from the timestamps
# captured around the check run (START_TIME/END_TIME, see Step 2)
EXEC_TIME=$(echo "scale=1; $END_TIME - $START_TIME" | bc)

Step 4: Create Memory Entity

Entity structure:

name: baseline_{feature}_{date}
type: quality_baseline
observations:
  - "Tests: {passed} passed, {failed} failed, {skipped} skipped"
  - "Coverage: {coverage}%"
  - "Type errors: {type_errors}"
  - "Linting errors: {linting_errors}"
  - "Dead code: {dead_code} items"
  - "Execution time: {exec_time}s"
  - "Git commit: {git_hash}"
  - "Timestamp: {timestamp}"
  - "Feature: {feature_name}"

Example:

mcp__memory__create_entities({
    "entities": [{
        "name": f"baseline_{feature}_{date}",
        "type": "quality_baseline",
        "observations": [
            f"Tests: {passed} passed, {failed} failed, {skipped} skipped",
            f"Coverage: {coverage}%",
            f"Type errors: {type_errors}",
            f"Linting errors: {linting_errors}",
            f"Dead code: {dead_code} items",
            f"Execution time: {exec_time}s",
            f"Git commit: {git_commit}",
            f"Timestamp: {timestamp}",
            f"Feature: {feature}"
        ]
    }]
})

If pre-existing issues exist, add an additional observation:

"Pre-existing issues: 3 type errors in legacy code (documented, will not be fixed)"

Step 5: Return Baseline Reference

Success output format:

✅ Baseline captured: baseline_{feature}_{date}

Metrics:
- Tests: {passed} passed ({coverage}% coverage)
- Type safety: {status} ({type_errors} errors)
- Code quality: {status} ({linting_errors} linting, {dead_code} dead code items)
- Execution: {exec_time}s
- Git: {git_hash}

Use this baseline for regression detection.

With pre-existing issues:

✅ Baseline captured: baseline_{feature}_{date}

Metrics:
- Tests: 152 passed (89% coverage)
- Type safety: 3 errors (pre-existing, documented)
- Code quality: Clean
- Git: abc123f

⚠️  Note: 3 pre-existing type errors are documented and will not cause regression failures.

Use this baseline for regression detection.
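
A sketch of emitting the summary from the variables parsed in Steps 1-3 (the {status} wording is left to the caller):

cat <<EOF
✅ Baseline captured: baseline_${FEATURE}_${DATE}

Metrics:
- Tests: ${PASSED} passed (${COVERAGE}% coverage)
- Type safety: ${TYPE_ERRORS} errors
- Code quality: ${LINTING_ERRORS} linting, ${DEAD_CODE} dead code items
- Execution: ${EXEC_TIME}s
- Git: ${GIT_COMMIT}

Use this baseline for regression detection.
EOF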

Agent Integration

@planner Usage

When: At plan creation, BEFORE creating todo.md

Workflow:

  1. Receive feature request from user
  2. Analyze requirements
  3. → Invoke capture-quality-baseline
  4. Receive baseline reference
  5. Create todo.md with baseline in header
  6. Create memory entities linking to baseline

Example:

# Todo: User Profile Feature
Baseline: baseline_user_profile_2025-10-16
Created: 2025-10-16

## Tasks
- [ ] Task 1 (references baseline for quality gates)
...

@implementer Usage

When: At feature start (task 0) or before refactor work

Workflow:

  1. Validate ADR (if refactor) using validate-refactor-adr
  2. → Invoke capture-quality-baseline
  3. Receive baseline reference
  4. Reference baseline in quality gate runs
  5. Compare final metrics against baseline

Example:

User: "Start implementing authentication"

@implementer:
1. Invoke capture-quality-baseline
2. Receives: baseline_auth_2025-10-16
3. Begin task 1 implementation
4. After each task: Run quality gates, compare to baseline
5. Report: "All metrics maintained or improved from baseline"

@statuser Usage

When: User requests baseline recapture or progress check

Workflow:

  1. Receive progress check request
  2. If major changes detected: → Invoke capture-quality-baseline
  3. Compare new baseline to original
  4. Report delta and recommend re-baseline if appropriate

Example:

User: "Check progress after dependency upgrade"

@statuser:
1. Detects major change (dependency upgrade)
2. Invokes capture-quality-baseline
3. Receives: baseline_post_upgrade_2025-10-16
4. Compares to original baseline
5. Reports: "2 tests removed (deprecated APIs), coverage maintained, type errors resolved"

Edge Cases

Quality Checks Fail

Scenario: Some checks fail during baseline capture

Handling:

  1. Still capture the baseline (document failures)
  2. Add observation: "Baseline captured WITH failures"
  3. Flag in output: "⚠️ Baseline has failures - must fix before feature work"
  4. Return baseline reference (still usable for comparison)
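
A sketch of capturing despite failures (Step 2 conventions assumed):

if ! ./scripts/check_all.sh > output.txt 2>&1; then
  FAILURE_NOTE="Baseline captured WITH failures"   # recorded as an extra observation in Step 4
fi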

Example:

✅ Baseline captured: baseline_auth_2025-10-16

⚠️  WARNING: Baseline has failures

Metrics:
- Tests: 140 passed, 5 FAILED, 2 skipped
- Type safety: 2 errors
- Code quality: 3 linting errors

🛑 FIX these failures before starting feature work.

No Git Repository

Scenario: Not in a git repository (or git not available)

Handling:

  1. Skip git commit capture
  2. Continue with other metrics
  3. Add observation: "No git commit (not in repository)"
  4. Note in output: "Git: Not available"
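
A one-line guard covers both cases (sketch):

GIT_COMMIT=$(git rev-parse HEAD 2>/dev/null || echo "not-available")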

Script Not Found

Scenario: ./scripts/check_all.sh doesn't exist

Handling:

  1. Try individual checks (pytest, pyright, ruff, vulture)
  2. If any work: Use those results
  3. If none work: Report error and abort
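
A sketch of this fallback chain (assumes a uv-managed project, as in Step 2):

if [ -x ./scripts/check_all.sh ]; then
  ./scripts/check_all.sh > output.txt 2>&1
elif command -v uv >/dev/null 2>&1; then
  uv run pytest tests/ -v --cov=src --cov-report=term-missing > output.txt 2>&1
  uv run pyright src/ >> output.txt 2>&1
  uv run ruff check src/ >> output.txt 2>&1
  uv run vulture src/ >> output.txt 2>&1
else
  echo "❌ Cannot capture baseline: Quality check scripts not found" >&2
  exit 1
fi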

Error message:

❌ Cannot capture baseline: Quality check scripts not found

Tried:
- ./scripts/check_all.sh (not found)
- uv run pytest (not found)
- uv run pyright (not found)

Cannot establish baseline without quality checks.

Memory Service Unavailable

Scenario: Cannot create memory entity (MCP server down)

Handling:

  1. Store baseline in local file: .quality-baseline-{feature}-{date}.json
  2. Return file path as reference
  3. Note in output: "Baseline stored locally (memory unavailable)"

Fallback file format:

{
  "baseline_name": "baseline_auth_2025-10-16",
  "metrics": {
    "tests": {"passed": 145, "failed": 0, "skipped": 2},
    "coverage": 87,
    "type_errors": 0,
    "linting_errors": 0,
    "dead_code": 3,
    "execution_time": 8.3,
    "git_commit": "abc123f",
    "timestamp": "2025-10-16 10:30:00"
  }
}
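
A sketch of writing this file from the variables parsed in Steps 1-3 (empty values default to 0 to keep the JSON valid):

cat > ".quality-baseline-${FEATURE}-${DATE}.json" <<EOF
{
  "baseline_name": "baseline_${FEATURE}_${DATE}",
  "metrics": {
    "tests": {"passed": ${PASSED:-0}, "failed": ${FAILED:-0}, "skipped": ${SKIPPED:-0}},
    "coverage": ${COVERAGE:-0},
    "type_errors": ${TYPE_ERRORS:-0},
    "linting_errors": ${LINTING_ERRORS:-0},
    "dead_code": ${DEAD_CODE:-0},
    "execution_time": ${EXEC_TIME:-0},
    "git_commit": "${GIT_COMMIT}",
    "timestamp": "${TIMESTAMP}"
  }
}
EOF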

Anti-Patterns

DON'T:

  • Skip baseline capture (regression detection requires it)
  • Reuse old baselines for new features (each feature needs its own)
  • Capture baseline after starting work (too late)
  • Ignore failing checks in baseline (document them)
  • Forget to reference baseline in todo.md

DO:

  • Capture baseline BEFORE any feature work
  • Create unique baseline per feature
  • Document pre-existing issues in baseline
  • Reference baseline in todo.md and memory
  • Re-baseline after major changes (upgrades, migrations)

Integration with Other Skills

Skill: run-quality-gates

  • capture-quality-baseline: Establishes the reference point
  • run-quality-gates: Compares current metrics against baseline

Skill: validate-refactor-adr

  • validate-refactor-adr: Validates ADR completeness
  • capture-quality-baseline: Documents pre-refactor state

Skill: manage-refactor-markers

  • capture-quality-baseline: Establishes pre-refactor metrics
  • manage-refactor-markers: Tracks refactor progress

Skill: manage-todo

  • capture-quality-baseline: Provides baseline reference
  • manage-todo: Stores baseline name in todo.md header

Examples

See examples.md for comprehensive scenarios:

  1. Feature start baseline (clean state)
  2. Refactor start baseline (with pre-existing issues)
  3. Re-baseline after major change
  4. Baseline with failures
  5. Agent auto-invocation patterns
  6. Cross-agent baseline sharing
  7. Baseline comparison workflows
  8. Recovery from failed baseline capture

Reference

  • Metrics parsing: See references/parsing.md for detailed parsing logic
  • Baseline lifecycle: See references/lifecycle.md for management best practices
  • Quality gates: See the project's ../run-quality-gates/references/shared-quality-gates.md for gate definitions

Success Criteria

  • Quality checks execute successfully (or failures documented)
  • All 5 metrics extracted (tests, coverage, type errors, linting, dead code)
  • Memory entity created with baseline data
  • Baseline name returned for reference
  • Execution time < 30 seconds (typically ~8s)
  • Git commit captured (if available)
  • Timestamp recorded
  • Pre-existing issues documented (if any)

Troubleshooting

Problem: Metrics parsing fails (empty values)

Solution:

  1. Check quality check output format (may have changed)
  2. Review parsing patterns in references/parsing.md
  3. Update regex patterns if needed
  4. Fallback: Manually parse critical metrics

Problem: Baseline creation succeeds but can't be found later

Solution:

  1. Verify memory entity name format: baseline_{feature}_{date}
  2. Search memory: mcp__memory__search_memories(query="baseline")
  3. Check fallback file: .quality-baseline-*.json

Problem: Quality checks take too long (> 30s)

Solution:

  1. Check if Neo4j is running (connection timeout)
  2. Skip slow tests: pytest -m "not slow"
  3. Run checks in parallel (check_all.sh does this)
  4. Report timing issue to user

Last Updated: 2025-10-16
Priority: High (critical for regression detection)
Dependencies: Quality gate scripts, memory MCP server
