eval-accuracy

Score an assistant response for factual accuracy on a strict 1–5 scale, then return strict JSON only, containing the score, a rationale, and improvement suggestions. Use when the user asks to evaluate accuracy, grade accuracy, or critique factual correctness.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "eval-accuracy" with this command: npx skills add whitespectre/ai-assistant-evals/whitespectre-ai-assistant-evals-eval-accuracy

Eval Accuracy

Use this skill to evaluate how factually accurate an assistant response is.

Inputs

Require:

  • The assistant response text to evaluate.

Internal Rubric (1–5)

5 = Factually correct, no misleading claims, no hallucinations, claims are well-supported or appropriately qualified
4 = Mostly correct, minor imprecision that does not materially affect meaning
3 = Partially correct, contains one significant inaccuracy or unsupported claim
2 = Multiple inaccuracies or misleading statements
1 = Fundamentally incorrect, fabricated, or contradicts known facts

Workflow

  1. Identify the factual claims in the response.
  2. Compare them against widely accepted knowledge.
  3. Assign an integer accuracy score from 1 to 5 using the rubric only.
  4. Write concise rationale tied directly to rubric criteria.
  5. Produce actionable suggestions that improve factual correctness.

Output Contract

Return JSON only. Do not include markdown, backticks, prose, or extra keys.

Use exactly this schema:

{
  "dimension": "accuracy",
  "score": 1,
  "rationale": "...",
  "improvement_suggestions": [ "..." ]
}
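
As an illustration of the schema, here is a minimal Python sketch that assembles a conforming object and serializes it. The names `AccuracyEval` and `render_accuracy_eval` are hypothetical helpers introduced for this example; they are not part of the skill or its repository.

```python
import json
from typing import List, TypedDict


class AccuracyEval(TypedDict):
    """Mirrors the four keys of the schema above (hypothetical type name)."""
    dimension: str
    score: int
    rationale: str
    improvement_suggestions: List[str]


def render_accuracy_eval(score: int, rationale: str, suggestions: List[str]) -> str:
    """Serialize one evaluation as a bare JSON object with no extra keys."""
    result: AccuracyEval = {
        "dimension": "accuracy",  # fixed value required by the contract
        "score": score,
        "rationale": rationale,
        "improvement_suggestions": list(suggestions),
    }
    # json.dumps emits only the object itself: no markdown, backticks, or prose.
    return json.dumps(result, ensure_ascii=False)


if __name__ == "__main__":
    print(render_accuracy_eval(
        2,
        "Two of the cited dates are wrong and one statistic is unsupported.",
        ["Correct the two dates.", "Cite a source for the statistic or remove it."],
    ))
```

Building the object in code and serializing it once is one way to avoid stray text around the JSON; a caller that consumes model output directly would still need to check it.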

Hard Rules

  • dimension must always equal "accuracy".
  • score must be an integer from 1 to 5.
  • rationale must be concise (max 3 sentences).
  • Do not include step-by-step reasoning.
  • improvement_suggestions must be a non-empty array of concrete edits.
  • Never output text outside the JSON object.
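
The rules above can be checked mechanically. The following Python function is a rough sketch of such a check, not something shipped with the skill: the function name `validate_accuracy_output`, the requirement that suggestions be strings, and the punctuation-based sentence count are all assumptions made for illustration. The "no step-by-step reasoning" and "concrete edits" rules are left unchecked because they need human or model judgment.

```python
import json
import re

ALLOWED_KEYS = {"dimension", "score", "rationale", "improvement_suggestions"}


def validate_accuracy_output(raw: str) -> list:
    """Return a list of rule violations; an empty list means the output passes."""
    try:
        # json.loads rejects code fences and any text before or after the object.
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["output is not a single JSON document"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]

    errors = []
    if set(data) != ALLOWED_KEYS:
        errors.append("keys must be exactly: " + ", ".join(sorted(ALLOWED_KEYS)))
    if data.get("dimension") != "accuracy":
        errors.append('dimension must equal "accuracy"')

    score = data.get("score")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(score, int) or isinstance(score, bool) or not 1 <= score <= 5:
        errors.append("score must be an integer from 1 to 5")

    rationale = data.get("rationale")
    if not isinstance(rationale, str):
        errors.append("rationale must be a string")
    # Assumption: approximate the sentence count by counting terminal punctuation.
    elif len(re.findall(r"[.!?]+(?:\s|$)", rationale.strip())) > 3:
        errors.append("rationale must be at most 3 sentences")

    suggestions = data.get("improvement_suggestions")
    # Assumption: each suggestion is a non-blank string, as in the schema example.
    if (not isinstance(suggestions, list) or not suggestions
            or not all(isinstance(s, str) and s.strip() for s in suggestions)):
        errors.append("improvement_suggestions must be a non-empty array of strings")
    return errors
```

A caller might run this on each model reply and retry or discard any reply for which the returned list is non-empty.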

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
