
1-Minute Codebase Evaluation

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "1-min-eval" with this command: npx skills add topvibecoder/eval/topvibecoder-eval-1-min-eval


Fast, parallel evaluation of codebases using Claude CLI with structured metrics.

Features

  • ✅ Smart Scanning: Automatically skips .claude/, node_modules/, .git/, and previous eval_* results

  • ✅ Parallel Evaluation: Runs multiple metrics concurrently for speed

  • ✅ Auto Ranking: Submits to TopVibeCoder API and gets your rank

  • ✅ Progress Tracking: Saves ranking history to track improvements over time

  • ✅ Detailed Reports: Generates comprehensive markdown reports with citations

  • ✅ Terminal Bar Chart: Visual score display with Unicode block characters

Quick Start

Evaluate the current directory (the default)

.claude/skills/1-min-eval/scripts/run_eval.sh .

Evaluate with specific metrics

.claude/skills/1-min-eval/scripts/run_eval.sh /path/to/project --metrics impact,technical

Full evaluation with all metrics (DO NOT use by default)

.claude/skills/1-min-eval/scripts/run_eval.sh /path/to/project --all-metrics

How It Works

  1. Scan: scan_codebase.py extracts the repo tree and source code with line numbers

  2. Evaluate: runs parallel claude -p calls, one per metric

  3. Aggregate: combines the per-metric JSON results into a final report

  4. Visualize: displays a terminal bar chart with the scores
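The Aggregate step can be sketched as follows. This is a minimal illustration, not the skill's actual aggregate.py: it assumes each results/<metric>.json file contains a top-level "score" field, which is a guess at the schema.

```python
import json
from pathlib import Path

def aggregate(results_dir: str) -> dict:
    """Average per-metric JSON results into an overall score.

    Assumes each file in results_dir is <metric>.json with a
    {"score": <float>} payload (hypothetical schema).
    """
    scores = {}
    for path in Path(results_dir).glob("*.json"):
        data = json.loads(path.read_text())
        scores[path.stem] = float(data["score"])
    overall = round(sum(scores.values()) / len(scores), 2) if scores else 0.0
    return {"scores": scores, "overall": overall}
```

A report generator would then render this dict into report.md alongside citations from the per-metric results.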

Example Output

After evaluation completes, you'll see a visual bar chart:

==================================================
📊 Evaluation Scores

presentation  6.25 | ████████████░░░░░░░░
impact        5.25 | ██████████░░░░░░░░░░
technical     1.75 | ███░░░░░░░░░░░░░░░░░
creativity    0.50 | █░░░░░░░░░░░░░░░░░░░
prompt_design 0.00 | ░░░░░░░░░░░░░░░░░░░░
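The bars above can be reproduced with a few lines of Python. This is a sketch of the rendering idea, not the skill's actual code; the truncating division is chosen so the output matches the sample scores shown.

```python
def render_bar(score: float, width: int = 20, scale: float = 10.0) -> str:
    """Render a 0.00-10.00 score as a fixed-width Unicode block bar."""
    filled = int(score / scale * width)  # truncate, so 6.25 -> 12 filled cells
    return "█" * filled + "░" * (width - filled)
```

For example, render_bar(6.25) yields the twelve-block bar shown for presentation above.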

Available Metrics

Metric         Description
impact         Real-world problem solving, usable experience
technical      Architecture, robustness, LLM integration
creativity     Originality, novel LLM usage
presentation   UX clarity, onboarding, demo quality
prompt_design  Prompt structure, staging, constraints
security       Secure coding, auth, dependency hygiene
completion     Description-to-code alignment
monetization   Business potential analysis

Scoring Scale (0.00-10.00)

Range        Meaning
0.00-2.50    Barely functional, major gaps
2.51-4.50    Minimal implementation, weak
4.51-6.50    Working but basic, clear gaps
6.51-8.50    Solid implementation, good quality
8.51-10.00   Excellent, production-ready
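The bands in the table above can be expressed as a small lookup, useful if you post-process results yourself. The function below is an illustrative sketch, not part of the skill:

```python
def score_band(score: float) -> str:
    """Map a 0.00-10.00 score to its qualitative band (table above)."""
    bands = [
        (2.50, "Barely functional, major gaps"),
        (4.50, "Minimal implementation, weak"),
        (6.50, "Working but basic, clear gaps"),
        (8.50, "Solid implementation, good quality"),
        (10.00, "Excellent, production-ready"),
    ]
    for upper, label in bands:
        if score <= upper:
            return label
    raise ValueError("score out of range: %r" % score)
```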

Configuration

Variable        Default                      Description
EVAL_PARALLEL   4                            Number of parallel evaluations
EVAL_TIMEOUT    300                          Timeout per metric (seconds)
EVAL_MAX_CHARS  300000                       Max chars to include
EVAL_MODEL      claude-sonnet-4-5-20250929   Model to use for evaluation
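A script reading these variables might fall back to the documented defaults like this. The parsing code is an assumption for illustration; only the variable names and defaults come from the table above.

```python
import os

def load_config(env=os.environ) -> dict:
    """Read evaluation settings, falling back to the documented defaults."""
    return {
        "EVAL_PARALLEL": int(env.get("EVAL_PARALLEL", "4")),
        "EVAL_TIMEOUT": int(env.get("EVAL_TIMEOUT", "300")),
        "EVAL_MAX_CHARS": int(env.get("EVAL_MAX_CHARS", "300000")),
        "EVAL_MODEL": env.get("EVAL_MODEL", "claude-sonnet-4-5-20250929"),
    }
```

To override a setting for one run, export the variable before invoking run_eval.sh, e.g. EVAL_PARALLEL=8.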

Ranking & Progress Tracking

Results are automatically submitted to the TopVibeCoder ranking API to get:

  • Overall rank and percentile

  • Per-metric rankings (individual rank for each metric)

  • Comparison with nearby apps

  • Historical progress tracking

Rankings are saved to ranking_history.jsonl in the output directory and to .evals/history.jsonl for unified tracking across all evaluations.

Note: The ranking API uses browser-like headers to bypass Cloudflare protection, ensuring reliable submissions. If the API fails, the evaluation continues and results are still saved locally.
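The JSONL history format described above is append-only, one JSON object per line. A minimal sketch of writing and reading it (the entry fields shown are hypothetical, not the skill's actual schema):

```python
import json
from pathlib import Path

def append_ranking(history_path: str, entry: dict) -> None:
    """Append one evaluation's ranking as a single JSON line."""
    with open(history_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

def load_history(history_path: str) -> list:
    """Read all past entries, one dict per line; empty list if none yet."""
    p = Path(history_path)
    if not p.exists():
        return []
    return [json.loads(line) for line in p.read_text().splitlines() if line.strip()]
```

Because each entry is a self-contained line, tracking progress over time is just a matter of comparing successive entries in the list.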

Output Structure

Results saved to .evals/<timestamp>_<project>/ (hidden directory):

  • codebase.md: Scanned source code

  • codebase.json: Structured metadata

  • prompts/: Generated evaluation prompts

  • results/: JSON results per metric

  • logs/: Execution logs

  • report.md: Aggregated markdown report with ranking

  • ranking_history.jsonl: Historical ranking data (one entry per evaluation)

Note: Evaluation results are saved to a hidden .evals/ directory to keep your workspace clean. Add .evals/ to your .gitignore if you don't want to commit evaluation results.

Manual Usage

You can also run components individually:

1. Scan codebase

python3 .claude/skills/1-min-eval/scripts/scan_codebase.py ./project \
  --output /tmp/code.md --max-chars 300000

2. Run single metric evaluation

cat /tmp/code.md | claude -p "Evaluate for IMPACT..." --output-format json

3. Aggregate results

python3 .claude/skills/1-min-eval/scripts/aggregate.py \
  --input-dir ./results --output ./report.md

Adding Custom Metadata

Create metadata.json in project root:

{
  "name": "My App",
  "description": "An AI-powered tool that...",
  "author": "Your Name"
}

Tips

  • Large codebases: Use --max-chars 500000 for more context

  • Debugging: Add --verbose to see detailed output

  • Resume: Results are cached; re-run skips completed metrics

  • Single metric: Use --metrics impact for quick test

