add-golden

Add to Golden Dataset

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "add-golden" with this command: npx skills add yonatangross/orchestkit/yonatangross-orchestkit-add-golden


Multi-agent curation workflow with quality score explanations, bias detection, and version tracking.

Quick Start

/add-golden https://example.com/article
/add-golden https://arxiv.org/abs/2312.xxxxx

Task Management (CC 2.1.16)

# Create main curation task
TaskCreate(
    subject="Add to golden dataset: {url}",
    description="Multi-agent curation with quality explanation",
    activeForm="Curating document",
)

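# Create subtasks for the 9-phase process, pairing each subject with
# the present-continuous activeForm shown while the task runs
phases = [
    ("Fetch content", "Fetching content"),
    ("Run quality analysis", "Running quality analysis"),
    ("Explain scores", "Explaining scores"),
    ("Check bias", "Checking bias"),
    ("Check diversity", "Checking diversity"),
    ("Validate", "Validating"),
    ("Get approval", "Getting approval"),
    ("Write to dataset", "Writing to dataset"),
    ("Update version", "Updating version"),
]
for subject, active_form in phases:
    TaskCreate(subject=subject, activeForm=active_form)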

Workflow Overview

| Phase | Activities | Output |
|---|---|---|
| 1. Input Collection | Get URL, detect content type | Document metadata |
| 2. Fetch and Extract | Parse document structure | Structured content |
| 3. Quality Analysis | 4 parallel agents evaluate the content | Raw scores |
| 4. Quality Explanation | Explain WHY each score was given | Score rationale |
| 5. Bias Detection | Check the content for bias | Bias report |
| 6. Diversity Check | Assess dataset balance | Diversity metrics |
| 7. Validation | Schema, duplicates, quality gates | Validation status |
| 8. Silver-to-Gold | Promote to gold or mark as silver | Classification |
| 9. Version Tracking | Track changes, enable rollback | Version entry |

Phase 1-2: Input and Extraction

Detect content type: article, tutorial, documentation, research_paper.

Extract: title, sections, code blocks, key terms, metadata (author, date).
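The skill does not spell out how the content type is detected. As a minimal sketch, assuming simple URL and keyword heuristics (the patterns in `CONTENT_TYPE_RULES` and the `detect_content_type` name are hypothetical), detection could check rules in order and fall back to `article` when nothing matches:

```python
import re

# Hypothetical heuristics, checked in order; the first match wins.
CONTENT_TYPE_RULES = [
    ("research_paper", re.compile(r"arxiv\.org|doi\.org|\babstract\b", re.I)),
    ("documentation", re.compile(r"/docs?/|\bapi reference\b", re.I)),
    ("tutorial", re.compile(r"\btutorial\b|\bstep \d+\b|\bhow to\b", re.I)),
]

def detect_content_type(url: str, text: str = "") -> str:
    """Return one of: article, tutorial, documentation, research_paper."""
    haystack = f"{url}\n{text}"
    for content_type, pattern in CONTENT_TYPE_RULES:
        if pattern.search(haystack):
            return content_type
    return "article"  # default when no rule matches
```

Ordering matters here: a paper hosted under a `/docs/` path would still be classified as `research_paper` because that rule is checked first.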

Phase 3: Parallel Quality Analysis (4 Agents)

Launch ALL four agents in ONE message with run_in_background=True so they run in parallel.

| Agent | Focus | Output |
|---|---|---|
| code-quality-reviewer | Accuracy, coherence, depth, relevance | Quality scores |
| workflow-architect | Keyword directness, paraphrase, reasoning | Difficulty level |
| data-pipeline-engineer | Primary/secondary domains, skill level | Tags |
| test-generator | Direct, paraphrased, multi-hop queries | Test queries |

See Quality Scoring for detailed criteria.
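In Claude Code the four agents are separate background tool calls, but the fan-out/fan-in shape can be illustrated in plain Python with `asyncio.gather`. In this sketch `run_agent` is a hypothetical stub that stands in for dispatching one agent and returns a placeholder result instead of real scores:

```python
import asyncio

AGENTS = [
    "code-quality-reviewer",
    "workflow-architect",
    "data-pipeline-engineer",
    "test-generator",
]

async def run_agent(name: str, content: str) -> tuple[str, dict]:
    """Hypothetical stub: a real agent would return scores, tags, or queries."""
    await asyncio.sleep(0)  # yield control, standing in for the agent's work
    return name, {"agent": name, "chars_reviewed": len(content)}

async def analyze(content: str) -> dict:
    # Start all four agents together, then wait until every one has finished.
    results = await asyncio.gather(*(run_agent(name, content) for name in AGENTS))
    return dict(results)
```

Calling `asyncio.run(analyze(text))` returns one entry per agent name, keyed by agent.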

Phase 4: Quality Explanation

Each dimension gets an explanation of WHY it received its score, plus a suggested improvement:

Accuracy: [N.NN]/1.0

Why this score:

  • [Specific reason with evidence]

What would improve it:

  • [Specific improvement]
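A small helper can render that template consistently for any dimension. `explain_score` is a hypothetical name, and the reasons and improvements are assumed to arrive from the reviewing agent as plain strings:

```python
def explain_score(dimension: str, score: float,
                  reasons: list[str], improvements: list[str]) -> str:
    """Render the Phase 4 'why this score' block for one dimension."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"{dimension} score must be in [0, 1], got {score}")
    lines = [f"{dimension}: {score:.2f}/1.0", "", "Why this score:"]
    lines += [f"  • {reason}" for reason in reasons]
    lines += ["", "What would improve it:"]
    lines += [f"  • {tip}" for tip in improvements]
    return "\n".join(lines)
```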

Phase 5: Bias Detection

See Bias Detection Guide for patterns.

Check for:

  • Technology bias (favors specific tools)

  • Recency bias (ignores LTS versions)

  • Complexity bias (assumed knowledge)

  • Vendor bias (promotes products)

  • Geographic/cultural bias

| Bias Score | Action |
|---|---|
| 0-2 | Proceed normally |
| 3-5 | Add disclaimer |
| 6-8 | Require user review |
| 9-10 | Recommend against adding |
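The bias-score table maps directly onto a threshold function. This sketch assumes integer scores from 0 to 10, as the table's ranges imply; the returned action labels are illustrative names, not identifiers from the skill:

```python
def bias_action(score: int) -> str:
    """Map a 0-10 bias score to the action from the bias-score table."""
    if not 0 <= score <= 10:
        raise ValueError(f"bias score must be between 0 and 10, got {score}")
    if score <= 2:
        return "proceed"
    if score <= 5:
        return "add_disclaimer"
    if score <= 8:
        return "require_user_review"
    return "recommend_against"
```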

Phase 6: Diversity Dashboard

Track dataset balance across:

  • Domain distribution (AI/ML, Backend, Frontend, DevOps, Security)

  • Difficulty distribution (trivial, easy, medium, hard, adversarial)

Impact assessment: does the new document improve or worsen diversity?
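The skill does not define its diversity metric. One common choice, used here as an assumption, is normalized Shannon entropy over the category counts: 1.0 means the documents are spread perfectly evenly, 0.0 means they all fall in one category. The impact of a new document is then the change in that value (`normalized_entropy` and `diversity_impact` are hypothetical names):

```python
import math
from collections import Counter

DOMAINS = ["AI/ML", "Backend", "Frontend", "DevOps", "Security"]

def normalized_entropy(labels: list[str], categories: list[str]) -> float:
    """Shannon entropy of the label distribution divided by log(#categories)."""
    counts = Counter(labels)
    total = sum(counts[c] for c in categories)
    if total == 0 or len(categories) < 2:
        return 0.0
    h = -sum((counts[c] / total) * math.log(counts[c] / total)
             for c in categories if counts[c])
    return h / math.log(len(categories))

def diversity_impact(existing: list[str], new_label: str,
                     categories: list[str] = DOMAINS) -> float:
    """Positive: the new document improves balance. Negative: it worsens it."""
    before = normalized_entropy(existing, categories)
    after = normalized_entropy(existing + [new_label], categories)
    return after - before
```

For a dataset that is four-fifths AI/ML, adding another AI/ML document yields a negative impact, while adding a Security document yields a positive one.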

Phase 7: Validation

  • URL validation (no placeholders)

  • Schema validation (required fields)

  • Duplicate check (flag >80% similarity to an existing document)

  • Quality gates (min sections, content length)
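A compact validator for these four checks might look like the following. Everything specific here is an assumption: the placeholder patterns, the required field set, the `min_sections`/`min_chars` thresholds, and the use of `difflib.SequenceMatcher` as the similarity measure (a production pipeline might compare embeddings instead). Only the 80% duplicate threshold comes from the list above.

```python
import re
from difflib import SequenceMatcher
from urllib.parse import urlparse

# Hypothetical placeholder patterns and schema; adjust to the real dataset format.
PLACEHOLDER = re.compile(r"example\.(com|org)|x{3,}|placeholder|localhost", re.I)
REQUIRED_FIELDS = {"url", "title", "content_type", "sections", "text"}

def validate(doc: dict, existing_texts: list[str],
             min_sections: int = 2, min_chars: int = 500) -> list[str]:
    """Return validation errors; an empty list means the document passes."""
    errors = []
    url = doc.get("url", "")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc or PLACEHOLDER.search(url):
        errors.append("url: missing, malformed, or placeholder")
    missing = REQUIRED_FIELDS.difference(doc)
    if missing:
        errors.append(f"schema: missing fields {sorted(missing)}")
    text = doc.get("text", "")
    for other in existing_texts:
        # autojunk=False so frequently repeated characters are not ignored
        if SequenceMatcher(None, text, other, autojunk=False).ratio() > 0.8:
            errors.append("duplicate: more than 80% similar to an existing document")
            break
    if len(doc.get("sections", [])) < min_sections or len(text) < min_chars:
        errors.append("quality gate: too few sections or content too short")
    return errors
```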

Phase 8: Silver-to-Gold Workflow

See Silver-Gold Promotion for criteria.

| Status | Criteria | Action |
|---|---|---|
| GOLD | Score >= 0.75, no bias | Add to main dataset |
| SILVER | Score >= 0.55 and < 0.75 | Add to silver tier and track |
| REJECT | Score < 0.55 | Do not add |

Promotion criteria: 7+ days in silver, quality >= 0.75, no negative feedback.
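The tier thresholds and promotion rule can be expressed as two small functions. The table does not say what happens to a document that scores 0.75 or higher but shows bias; this sketch assumes it lands in silver. `classify` and `can_promote` are illustrative names, and `negative_feedback` is assumed to be a simple count:

```python
from datetime import date, timedelta

GOLD_MIN = 0.75
SILVER_MIN = 0.55
SILVER_DWELL = timedelta(days=7)

def classify(score: float, has_bias: bool) -> str:
    """Assign GOLD, SILVER, or REJECT using the Phase 8 thresholds."""
    if score >= GOLD_MIN and not has_bias:
        return "GOLD"
    if score >= SILVER_MIN:
        # Assumption: high-scoring but biased documents wait in silver.
        return "SILVER"
    return "REJECT"

def can_promote(added_on: date, score: float, negative_feedback: int, today: date) -> bool:
    """Silver qualifies for gold after 7+ days, score >= 0.75, and no negative feedback."""
    return (today - added_on >= SILVER_DWELL
            and score >= GOLD_MIN
            and negative_feedback == 0)
```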

Phase 9: Version Tracking

{
  "version": "1.2.3",
  "change_type": "ADD|UPDATE|REMOVE|PROMOTE",
  "document_id": "doc-123",
  "quality_score": 0.82,
  "rollback_available": true
}

| Update Type | Version Bump |
|---|---|
| Add/Update document | Patch (0.0.X) |
| Remove document | Minor (0.X.0) |
| Schema change | Major (X.0.0) |
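Applying that bump table to a `MAJOR.MINOR.PATCH` string is mechanical. In this sketch `SCHEMA` is a hypothetical change key for schema changes (the version entry's `change_type` field lists only ADD, UPDATE, REMOVE, and PROMOTE), and PROMOTE is left out because the table does not say which component it bumps:

```python
# Which semver component each change type bumps, per the table above.
BUMP_PART = {"ADD": "patch", "UPDATE": "patch", "REMOVE": "minor", "SCHEMA": "major"}

def bump_version(version: str, change_type: str) -> str:
    """Return the next dataset version; raises KeyError for unmapped change types."""
    major, minor, patch = (int(part) for part in version.split("."))
    part = BUMP_PART[change_type]
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```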

Quality Scoring

| Dimension | Weight |
|---|---|
| Accuracy | 0.25 |
| Coherence | 0.20 |
| Depth | 0.25 |
| Relevance | 0.30 |

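Formula: quality_score = accuracy * 0.25 + coherence * 0.20 + depth * 0.25 + relevance * 0.30 (the weights sum to 1.0, so the result stays on the same 0-1 scale as the individual dimensions)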
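The weighted sum is straightforward to compute; this sketch also checks that all four dimensions are present. The `quality_score` function name and the example scores are illustrative:

```python
WEIGHTS = {"accuracy": 0.25, "coherence": 0.20, "depth": 0.25, "relevance": 0.30}

def quality_score(scores: dict[str, float]) -> float:
    """Weighted sum of the four dimension scores, each on a 0-1 scale."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    return sum(scores[dim] * weight for dim, weight in WEIGHTS.items())

example = {"accuracy": 0.9, "coherence": 0.8, "depth": 0.7, "relevance": 0.85}
```

With those example scores the result is 0.225 + 0.160 + 0.175 + 0.255 = 0.815, which clears the 0.75 GOLD threshold.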

Key Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Score explanation | Required | Transparency, actionable feedback |
| Bias detection | Dedicated agent | Prevent dataset contamination |
| Two-tier system | Silver + Gold | Allow docs time to mature |
| Version tracking | Semantic versioning | Clear history, safe rollbacks |

Related Skills

  • golden-dataset-validation: Validate existing datasets

  • llm-evaluation: LLM output evaluation patterns

  • test-data-management: Test data strategies

Version: 2.0.0 (January 2026)
