evaluate

Comprehensive quality evaluation for any AI-generated artifact. Produces its report as a visualization.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "evaluate" with this command: npx skills add careerhackeralex/visualize/careerhackeralex-visualize-evaluate

How It Works

┌──────────────────────────────────────────────┐
│                                              │
│  Phase 1: SPEC GENERATION                    │
│    Analyze the artifact type                 │
│    Generate tailored evaluation criteria     │
│    Define scoring dimensions + weights       │
│    Set quality gates                         │
│          │                                   │
│          ▼                                   │
│  Phase 2: EVALUATION                         │
│    Run automated checks (when possible)      │
│    Visual/manual inspection                  │
│    Score each dimension with evidence        │
│    Identify systemic vs local issues         │
│          │                                   │
│          ▼                                   │
│  Phase 3: REPORT (via /visualize)            │
│    Generate a beautiful HTML eval report     │
│    Scores, charts, screenshots, fix list     │
│    Radar chart of dimensions                 │
│    Before/after tracking                     │
│                                              │
└──────────────────────────────────────────────┘

Phase 1: Spec Generation

For any artifact, generate the evaluation spec in four steps:

  1. Identify Artifact Type
  • HTML Visualization → visual design, interactivity, technical, content, shareability

  • Code/Project → correctness, readability, architecture, test coverage, performance

  • Document/Report → clarity, structure, accuracy, completeness, tone

  • Conversation/Agent → helpfulness, accuracy, tone, efficiency, safety

  • Slide Deck → all visualization dims + narrative flow, persuasion, pacing

  • Dashboard → data accuracy, information density, scannability, actionability

  • Custom → derive dimensions from the skill's SKILL.md and stated goals
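The type-to-dimension mapping above can be captured as a lookup table. The sketch below is one possible encoding, not the skill's own implementation: the type keys (`html-visualization`, `slide-deck`, and so on) and the `startingDimensions` helper are hypothetical names, and only the dimension names come from the list.

```javascript
// Starting dimensions per artifact type, copied from the list above.
// The type keys and the helper name are illustrative assumptions.
const VISUALIZATION_DIMS = [
  'visual design', 'interactivity', 'technical', 'content', 'shareability',
];

const STARTING_DIMENSIONS = {
  'html-visualization': VISUALIZATION_DIMS,
  'code': ['correctness', 'readability', 'architecture', 'test coverage', 'performance'],
  'document': ['clarity', 'structure', 'accuracy', 'completeness', 'tone'],
  'conversation': ['helpfulness', 'accuracy', 'tone', 'efficiency', 'safety'],
  // Slide decks reuse every visualization dimension and add three of their own.
  'slide-deck': [...VISUALIZATION_DIMS, 'narrative flow', 'persuasion', 'pacing'],
  'dashboard': ['data accuracy', 'information density', 'scannability', 'actionability'],
};

// Returns a copy of the starting dimensions for a known type, or null for a
// custom artifact, whose dimensions are derived from its SKILL.md and goals.
function startingDimensions(type) {
  const known = Object.prototype.hasOwnProperty.call(STARTING_DIMENSIONS, type);
  return known ? [...STARTING_DIMENSIONS[type]] : null;
}
```

These lists are a starting point only; the next step expands them to the 6-10 weighted dimensions a full spec requires.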

  2. Generate Dimensions

For each artifact type, produce 6-10 evaluation dimensions. Each dimension needs:

  • Name — short, clear label

  • Description — what this dimension measures

  • Weight — percentage (all weights sum to 100%)

  • Scoring anchors — what does a 10, 8, 6, 4 look like?

  • Automated checks — any programmatic tests (if applicable)

  • Deductions — specific issues and their point costs
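A dimension with these six fields maps naturally onto a plain object, and the two numeric rules (6-10 dimensions, weights summing to 100%) are easy to check mechanically. The following is a minimal sketch under assumptions: the field names, the `validateSpec` helper, and every value in `exampleDimension` are made up for illustration and are not defined by the skill.

```javascript
// One dimension of a spec. Field names mirror the list above; the concrete
// values are made-up placeholders, not part of the skill.
const exampleDimension = {
  name: 'Readability',
  description: 'How easily a new contributor can follow the code.',
  weight: 15, // percent
  anchors: {
    10: 'Reads like prose',
    8: 'Clear with minor rough spots',
    6: 'Understandable with effort',
    4: 'Hard to follow',
  },
  automatedChecks: ['linter passes'],
  deductions: [{ issue: 'Unexplained magic number', points: 0.5 }],
};

// Returns a list of problems; an empty list means the spec is structurally valid.
function validateSpec(dimensions) {
  const problems = [];
  if (dimensions.length < 6 || dimensions.length > 10) {
    problems.push(`expected 6-10 dimensions, got ${dimensions.length}`);
  }
  const total = dimensions.reduce((sum, d) => sum + d.weight, 0);
  if (Math.abs(total - 100) > 1e-9) {
    problems.push(`weights sum to ${total}, expected 100`);
  }
  for (const d of dimensions) {
    for (const level of [10, 8, 6, 4]) {
      if (!d.anchors || !d.anchors[level]) {
        problems.push(`${d.name}: missing scoring anchor for ${level}`);
      }
    }
  }
  return problems;
}
```

Returning a list of problems instead of throwing on the first one lets a spec author fix everything in a single pass.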

  3. Set Quality Gates

Define gates based on the artifact's purpose:

| Gate | Criteria | Meaning |
|------|----------|---------|
| 🚀 EXCEPTIONAL | Overall ≥ 9.5, all ≥ 9 | Best-in-class. Share everywhere. |
| ✅ SHIP | Overall ≥ 9.0, all ≥ 8 | Production-ready. |
| ⚠️ ACCEPTABLE | Overall ≥ 8.0, all ≥ 7 | Usable but not impressive. |
| 🔧 NEEDS WORK | Overall ≥ 7.0 or any < 7 | Fix before releasing. |
| ❌ FAIL | Overall < 7.0 or any < 5 | Major rework. |
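The gate criteria overlap: an artifact with one dimension at 4 matches both the NEEDS WORK and FAIL rows. One way to make the mapping deterministic is sketched below. The precedence is an assumption, not something the source states: FAIL is checked first, then the passing gates from strictest to loosest, and NEEDS WORK is whatever remains. The function name `resolveGate` is hypothetical.

```javascript
// Maps an overall score and per-dimension scores to one gate label.
// Assumption: FAIL is checked first, then gates from strictest to loosest,
// so an artifact that matches several rows still gets exactly one label.
function resolveGate(overall, dimensionScores) {
  const lowest = Math.min(...dimensionScores);
  if (overall < 7.0 || lowest < 5) return 'FAIL';
  if (overall >= 9.5 && lowest >= 9) return 'EXCEPTIONAL';
  if (overall >= 9.0 && lowest >= 8) return 'SHIP';
  if (overall >= 8.0 && lowest >= 7) return 'ACCEPTABLE';
  return 'NEEDS WORK';
}
```

Under this ordering a high overall score cannot mask a single very weak dimension: an overall of 9.6 with one dimension at 4 resolves to FAIL.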

  4. Output Spec Document

Write the spec to eval-spec-[artifact-name].md for reference and reuse.

Phase 2: Evaluation

For HTML Visualizations

Open in browser at 3 viewports (1280×720, 768×1024, 375×667).

Automated audit (run in browser console):

(function () {
  const audit = {};
  // Only inline <style> blocks are inspected; linked stylesheets are not read.
  const style = [...document.querySelectorAll('style')].map(s => s.textContent).join(' ');
  const html = document.documentElement.outerHTML;

  // Structure
  // outerHTML starts at <html> and never contains the doctype, so read it from the DOM.
  audit.hasDoctype = !!document.doctype && document.doctype.name.toLowerCase() === 'html';
  audit.hasLangAttr = !!document.documentElement.lang;
  audit.hasCharset = !!document.querySelector('meta[charset]');
  audit.hasViewport = !!document.querySelector('meta[name="viewport"]');
  audit.hasTitle = document.title.length > 0;

  // Menu system
  audit.menuExists = !!document.querySelector('.viz-menu');
  audit.menuHasTheme = !!html.match(/cycleTheme|themeLabel/i);
  audit.menuHasDownload = !!html.match(/htmlToImage|html-to-image/i);
  audit.menuHasPrint = !!html.match(/window\.print/i);

  // Theme system
  audit.hasCSSVars = !!style.match(/--bg\s*:/);
  audit.hasDarkTheme = !!style.match(/(\.theme-dark|:root)[\s\S]*?--bg/);
  audit.hasLightTheme = !!style.match(/\.theme-light/);
  audit.themePersistedToStorage = !!html.match(/localStorage.*theme/i);

  // Typography
  audit.hasInterFont = !!html.match(/fonts\.googleapis.*Inter|font-family.*Inter/i);
  audit.hasFontFallback = !!style.match(/-apple-system|system-ui/);
  audit.bodyFontSize = parseFloat(getComputedStyle(document.body).fontSize);
  audit.bodyFontOK = audit.bodyFontSize >= 14;

  // Layout
  audit.usesFlexOrGrid = !!style.match(/display\s*:\s*(flex|grid)/);
  audit.hasMaxWidth = !!style.match(/max-width/);
  audit.hasResponsiveBreakpoints = !!style.match(/@media.*max-width|@media.*min-width|sm:|md:|lg:/);

  // Print & Accessibility
  audit.hasPrintStyles = !!style.match(/@media\s*print/);
  audit.hasPrintColorAdjust = !!style.match(/print-color-adjust/);
  audit.hasReducedMotion = !!style.match(/prefers-reduced-motion/);
  audit.hasAriaLabels = !!html.match(/aria-label/);
  audit.hasSemanticHTML = !!html.match(/<(header|main|nav|section|article|footer)/);

  // Animations
  audit.hasKeyframes = !!style.match(/@keyframes/);
  audit.hasTransitions = !!style.match(/transition\s*:/);

  // Performance
  audit.fileSizeKB = Math.round(new Blob([html]).size / 1024);
  audit.fileSizeOK = audit.fileSizeKB < 200;
  audit.noExternalImages = document.querySelectorAll('img[src^="http"]').length === 0;
  audit.htmlToImageLoaded = typeof htmlToImage !== 'undefined';

  // Summary: count only the boolean checks, not numeric measurements.
  const bools = Object.entries(audit).filter(([, v]) => typeof v === 'boolean');
  const passed = bools.filter(([, v]) => v).length;
  audit._passed = passed;
  audit._total = bools.length;
  audit._percent = Math.round(passed / bools.length * 100);
  audit._failures = bools.filter(([, v]) => !v).map(([k]) => k);

  console.table(audit);
  return audit;
})();

Visual scoring — 8 dimensions for visualizations:

| Dimension | Weight | 10 = | 6 = |
|-----------|--------|------|-----|
| D1 First Impression | 15% | Apple keynote quality | Generic template feel |
| D2 Typography | 15% | Perfect hierarchy, Inter font, fluid sizing | All same size, no hierarchy |
| D3 Color & Contrast | 10% | Harmonious, WCAG AA, both themes beautiful | Clashing, low contrast |
| D4 Layout & Spacing | 15% | Consistent rhythm, responsive, generous space | Cramped, broken at mobile |
| D5 Content Quality | 15% | Clear message in 5 seconds, zero filler | Confusing, placeholder text |
| D6 Interactivity | 10% | Menu + theme + download + print all flawless | Missing features, broken |
| D7 Technical | 10% | Zero errors, semantic, accessible, print-ready | Console errors, broken layout |
| D8 Shareability | 10% | Would tweet this unprompted | Worse than Canva |
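The overall score that feeds the quality gates is presumably the weighted average of these eight dimension scores. A minimal sketch of that arithmetic follows. The weights are taken from the table; the `overallScore` name, the `D1`-`D8` keys, and rounding to one decimal place are assumptions the source does not specify.

```javascript
// Weights (percent) from the table above, keyed by dimension ID.
const WEIGHTS = { D1: 15, D2: 15, D3: 10, D4: 15, D5: 15, D6: 10, D7: 10, D8: 10 };

// Weighted average on the 0-10 scale, rounded to one decimal place.
// Rounding to one decimal is an assumption; the source does not specify it.
function overallScore(scores, weights = WEIGHTS) {
  let weighted = 0;
  let totalWeight = 0;
  for (const [id, weight] of Object.entries(weights)) {
    if (typeof scores[id] !== 'number') {
      throw new Error(`missing score for ${id}`);
    }
    weighted += scores[id] * weight;
    totalWeight += weight;
  }
  // Dividing by the actual total keeps the result correct even if custom
  // weights do not sum to exactly 100.
  return Math.round((weighted / totalWeight) * 10) / 10;
}
```

Throwing on a missing score is deliberate: silently treating an unscored dimension as zero would drag the overall score down without any visible cause.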

For Code/Projects

Dimensions: Correctness, Readability, Architecture, Error Handling, Performance, Testing, Documentation, Security

For Documents

Dimensions: Clarity, Structure, Accuracy, Completeness, Tone, Formatting, Actionability, Brevity

For Agent Conversations

Dimensions: Helpfulness, Accuracy, Tone, Efficiency, Safety, Context Awareness, Tool Usage, Follow-through

Phase 3: Visual Report (via /visualize)

After scoring, generate the eval report as a beautiful HTML dashboard using the visualize skill:

Report Structure

  • Hero — artifact name, overall score (big number), quality gate badge

  • Radar Chart — all dimensions plotted on a radar/spider chart (Chart.js)

  • Dimension Cards — each dimension as a card with score, bar, key notes

  • Automated Audit — pass/fail checklist with percentages

  • Screenshots — key views embedded (if HTML artifact)

  • Fix List — prioritized fixes as a kanban-style layout (critical / high / medium / low)

  • Systemic Issues — patterns that affect all outputs (flagged for SKILL.md fixes)

  • History — if re-evaluating, show before/after score comparison chart
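For the radar chart, the dimension scores have to be turned into a Chart.js configuration object. The sketch below assumes Chart.js v3 or later, where the radial axis is configured under `scales.r`; the `radarConfig` name, the `{ name, score }` input shape, and the dataset label are illustrative choices, not part of the skill.

```javascript
// Builds a Chart.js radar configuration from scored dimensions.
// Assumes Chart.js v3 or later (radial axis options live under `scales.r`).
// The function name, input shape, and dataset label are illustrative.
function radarConfig(dimensions) {
  return {
    type: 'radar',
    data: {
      labels: dimensions.map((d) => d.name),
      datasets: [{ label: 'Score', data: dimensions.map((d) => d.score) }],
    },
    options: {
      // Pin the axis to the full 0-10 range so charts from different
      // evaluation runs stay visually comparable.
      scales: { r: { min: 0, max: 10, ticks: { stepSize: 2 } } },
    },
  };
}

// In the report page, with Chart.js loaded and a <canvas> element at hand:
//   new Chart(canvasElement, radarConfig(dimensions));
```

Fixing the axis range matters for the History section: without it, Chart.js scales each chart to its own data, and a before/after pair can look misleadingly similar.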

Report Filename

eval-report-[artifact-name]-[date].html
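The pattern leaves the format of both placeholders open. One possible reading is sketched here, with loud assumptions: the artifact name is reduced to a lowercase hyphen-separated slug, and `[date]` is an ISO date (`YYYY-MM-DD`, in UTC). Neither convention nor the `reportFilename` helper comes from the source.

```javascript
// Builds the report filename from the pattern above.
// Assumptions: the artifact name becomes a lowercase, hyphen-separated slug,
// and [date] is an ISO calendar date (YYYY-MM-DD) in UTC.
function reportFilename(artifactName, date = new Date()) {
  const slug = artifactName
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse anything non-alphanumeric into one hyphen
    .replace(/^-+|-+$/g, '');    // trim leading and trailing hyphens
  const day = date.toISOString().slice(0, 10);
  return `eval-report-${slug}-${day}.html`;
}
```

An ISO date keeps reports for the same artifact in chronological order when sorted by filename, which helps with before/after comparisons.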

The report itself must score ≥ 9.0 on the visualize eval criteria.

This is the ultimate dogfood test — our evaluation tool produces evaluations using our visualization tool.

The Improvement Loop

Generate artifact (any skill)
        ↓
/evaluate → Spec + Score + Visual Report
        ↓
Review report → identify fixes
        ↓
Fix (systemic → SKILL.md, local → artifact)
        ↓
/evaluate again → compare scores
        ↓
Ship when gate = SHIP or EXCEPTIONAL

Max 3 loops per artifact. If it can't reach SHIP in 3 loops, the problem is in the skill — update the skill's instructions, not the artifact.
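The loop and its three-round cap can be expressed as a small driver function. This is a sketch under assumptions: `evaluate` and `fix` are hypothetical caller-supplied callbacks standing in for the `/evaluate` run and the manual fixing step, and "3 loops" is read here as three evaluation rounds.

```javascript
// Runs evaluate → fix → re-evaluate for at most `maxLoops` evaluation rounds.
// `evaluate` and `fix` are caller-supplied callbacks (hypothetical here):
//   evaluate(artifact)      returns { gate, overall }
//   fix(artifact, result)   returns the revised artifact
function improvementLoop(artifact, evaluate, fix, maxLoops = 3) {
  const history = [];
  let current = artifact;
  for (let loop = 1; loop <= maxLoops; loop++) {
    const result = evaluate(current);
    history.push({ loop, ...result });
    if (result.gate === 'SHIP' || result.gate === 'EXCEPTIONAL') {
      return { shipped: true, artifact: current, history };
    }
    // Skip the fix after the last round: there is no evaluation left to check it.
    if (loop < maxLoops) current = fix(current, result);
  }
  // Out of rounds: per the rule above, the fix belongs in the skill's
  // instructions, not in this artifact.
  return { shipped: false, artifact: current, history };
}
```

The returned `history` holds one entry per round, which is the data the report's before/after comparison needs.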

Quick Start

Evaluate a visualization

/evaluate path/to/visualization.html

Evaluate with custom context

/evaluate path/to/code-project --type code

Re-evaluate after fixes (tracks improvement)

/evaluate path/to/visualization.html --loop 2

Generate specs only (no scoring)

/evaluate --specs-only --type dashboard

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
