Add to Golden Dataset
Multi-agent curation workflow with quality score explanations, bias detection, and version tracking.
Quick Start
/add-golden https://example.com/article /add-golden https://arxiv.org/abs/2312.xxxxx
Task Management (CC 2.1.16)
Create main curation task
TaskCreate( subject="Add to golden dataset: {url}", description="Multi-agent curation with quality explanation", activeForm="Curating document" )
Create subtasks for 9-phase process
phases = ["Fetch content", "Run quality analysis", "Explain scores", "Check bias", "Check diversity", "Validate", "Get approval", "Write to dataset", "Update version"] for phase in phases: TaskCreate(subject=phase, activeForm=f"{phase}ing")
Workflow Overview
Phase Activities Output
-
Input Collection Get URL, detect content type Document metadata
-
Fetch and Extract Parse document structure Structured content
-
Quality Analysis 4 parallel agents evaluate Raw scores
-
Quality Explanation Explain WHY each score Score rationale
-
Bias Detection Check for bias in content Bias report
-
Diversity Check Assess dataset balance Diversity metrics
-
Validation Schema, duplicates, gates Validation status
-
Silver-to-Gold Promote or mark as silver Classification
-
Version Tracking Track changes, rollback Version entry
Phase 1-2: Input and Extraction
Detect content type: article, tutorial, documentation, research_paper.
Extract: title, sections, code blocks, key terms, metadata (author, date).
Phase 3: Parallel Quality Analysis (4 Agents)
Launch ALL agents in ONE message with run_in_background=True .
Agent Focus Output
code-quality-reviewer Accuracy, coherence, depth, relevance Quality scores
workflow-architect Keyword directness, paraphrase, reasoning Difficulty level
data-pipeline-engineer Primary/secondary domains, skill level Tags
test-generator Direct, paraphrased, multi-hop queries Test queries
See Quality Scoring for detailed criteria.
Phase 4: Quality Explanation
Each dimension gets WHY explanation:
Accuracy: [N.NN]/1.0
Why this score:
- [Specific reason with evidence] What would improve it:
- [Specific improvement]
Phase 5: Bias Detection
See Bias Detection Guide for patterns.
Check for:
-
Technology bias (favors specific tools)
-
Recency bias (ignores LTS versions)
-
Complexity bias (assumed knowledge)
-
Vendor bias (promotes products)
-
Geographic/cultural bias
Bias Score Action
0-2 Proceed normally
3-5 Add disclaimer
6-8 Require user review
9-10 Recommend against
Phase 6: Diversity Dashboard
Track dataset balance across:
-
Domain distribution (AI/ML, Backend, Frontend, DevOps, Security)
-
Difficulty distribution (trivial, easy, medium, hard, adversarial)
Impact assessment: Does new document improve or worsen diversity?
Phase 7: Validation
-
URL validation (no placeholders)
-
Schema validation (required fields)
-
Duplicate check (>80% similarity)
-
Quality gates (min sections, content length)
Phase 8: Silver-to-Gold Workflow
See Silver-Gold Promotion for criteria.
Status Criteria Action
GOLD Score >= 0.75, no bias Add to main dataset
SILVER Score 0.55-0.74 Add to silver, track
REJECT Score < 0.55 Do not add
Promotion criteria: 7+ days in silver, quality >= 0.75, no negative feedback.
Phase 9: Version Tracking
{ "version": "1.2.3", "change_type": "ADD|UPDATE|REMOVE|PROMOTE", "document_id": "doc-123", "quality_score": 0.82, "rollback_available": true }
Update Type Version Bump
Add/Update document Patch (0.0.X)
Remove document Minor (0.X.0)
Schema change Major (X.0.0)
Quality Scoring
Dimension Weight
Accuracy 0.25
Coherence 0.20
Depth 0.25
Relevance 0.30
Formula: quality_score = accuracy0.25 + coherence0.20 + depth0.25 + relevance0.30
Key Decisions
Decision Choice Rationale
Score explanation Required Transparency, actionable feedback
Bias detection Dedicated agent Prevent dataset contamination
Two-tier system Silver + Gold Allow docs time to mature
Version tracking Semantic versioning Clear history, safe rollbacks
Related Skills
-
golden-dataset-validation
-
Validate existing datasets
-
llm-evaluation
-
LLM output evaluation patterns
-
test-data-management
-
Test data strategies
Version: 2.0.0 (January 2026)