repo-rag


Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install the "repo-rag" skill with this command: npx skills add oimiragieo/agent-studio/oimiragieo-agent-studio-repo-rag

<usage_patterns>

  • Architecture Review: Run symbol searches on key interfaces to understand the dependency graph.

  • Plan Mode: Use this skill to populate the "Context" section of a Plan Mode artifact.

  • Refactoring: Identify all usages of a symbol before renaming or modifying it.

</usage_patterns>

<code_example>

Symbol Search:

symbols "UserAuthentication"

Semantic Search:

search "authentication middleware logic"

</code_example>

RAG Evaluation

Overview

Systematic evaluation of RAG quality using retrieval and end-to-end metrics. Based on Claude Cookbooks patterns.

Evaluation Metrics

Retrieval Metrics (from .claude/tools/repo-rag/metrics.py):

  • Precision: Proportion of retrieved chunks that are actually relevant
      • Formula: Precision = True Positives / Total Retrieved
      • High precision (0.8-1.0): the system retrieves mostly relevant items

  • Recall: Completeness of retrieval - how many of the relevant items were found
      • Formula: Recall = True Positives / Total Correct (i.e., all relevant items)
      • High recall (0.8-1.0): the system finds most relevant items

  • F1 Score: Harmonic mean of precision and recall
      • Formula: F1 = 2 × (Precision × Recall) / (Precision + Recall)
      • A balanced measure when both precision and recall matter

  • MRR (Mean Reciprocal Rank): Measures ranking quality
      • Formula: MRR = mean over queries of 1 / rank of the first correct item
      • High MRR (0.8-1.0): correct items are ranked at or near the top
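The retrieval metrics above can be sketched as follows. This is a minimal illustration of the definitions, not the actual metrics.py implementation; the function name and return shape are assumptions chosen to mirror the usage shown later in this document.

```python
def evaluate_retrieval(retrieved, correct):
    """Compute precision, recall, F1, and reciprocal rank for one query.

    `retrieved` is an ordered list of chunk IDs from the retriever;
    `correct` holds the relevant chunk IDs. A sketch, not metrics.py.
    """
    relevant = set(correct)
    true_positives = sum(1 for chunk in retrieved if chunk in relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Reciprocal rank: 1 / (1-indexed position of the first relevant item).
    rr = 0.0
    for rank, chunk in enumerate(retrieved, start=1):
        if chunk in relevant:
            rr = 1.0 / rank
            break
    return {"precision": precision, "recall": recall, "f1": f1, "mrr": rr}
```

Averaging the per-query reciprocal ranks over an evaluation dataset yields the MRR.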

End-to-End Metrics (from .claude/tools/repo-rag/evaluation.py):

  • Accuracy (LLM-as-Judge): Overall correctness, scored by a Claude evaluation
      • Compares the generated answer to the correct answer
      • Focuses on substance and meaning, not exact wording
      • Checks for completeness and the absence of contradictions
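The LLM-as-judge step can be sketched as a prompt builder plus a verdict parser. Both helpers here (`build_judge_prompt`, `parse_judge_reply`) are hypothetical, not the actual evaluation.py API, and the real call to Claude is omitted.

```python
def build_judge_prompt(query, generated, correct):
    """Build a grading prompt; the real evaluation.py prompt may differ."""
    return (
        "You are grading a RAG answer.\n"
        f"Question: {query}\n"
        f"Generated answer: {generated}\n"
        f"Reference answer: {correct}\n"
        "Judge on substance and meaning, not exact wording. Penalize "
        "missing key points or contradictions.\n"
        "Reply with 'VERDICT: CORRECT' or 'VERDICT: INCORRECT' on the "
        "first line, then a one-line explanation."
    )

def parse_judge_reply(reply):
    """Extract is_correct and explanation from the judge's reply."""
    lines = [ln.strip() for ln in reply.strip().splitlines() if ln.strip()]
    verdict = lines[0].upper()
    is_correct = verdict.endswith("CORRECT") and "INCORRECT" not in verdict
    return {"is_correct": is_correct, "explanation": " ".join(lines[1:])}
```

The prompt would be sent to Claude via your API client of choice; only the parsing is shown here because it is the deterministic part.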

Evaluation Process

Create Evaluation Dataset:

{
  "query": "How is user authentication implemented?",
  "correct_chunks": ["src/auth/middleware.ts", "src/auth/types.ts"],
  "correct_answer": "User authentication uses JWT tokens...",
  "category": "authentication"
}

Run Retrieval Evaluation:

Using Python directly

from .claude.tools.repo_rag.metrics import evaluate_retrieval

metrics = evaluate_retrieval(retrieved_chunks, correct_chunks)
print(f"Precision: {metrics['precision']}, Recall: {metrics['recall']}, F1: {metrics['f1']}, MRR: {metrics['mrr']}")

Run End-to-End Evaluation:

Using Python directly

from .claude.tools.repo_rag.evaluation import evaluate_end_to_end

result = evaluate_end_to_end(query, generated_answer, correct_answer)
print(f"Correct: {result['is_correct']}, Explanation: {result['explanation']}")

Expected Performance

Based on Claude Cookbooks results:

  • Basic RAG: Precision 0.43, Recall 0.66, F1 0.52, MRR 0.74, Accuracy 71%

  • With Re-ranking: Precision 0.44, Recall 0.69, F1 0.54, MRR 0.87, Accuracy 81%
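The re-ranking gain above typically comes from retrieving a larger candidate pool and re-scoring it with a finer-grained model before taking the top results. A minimal sketch of that pattern, where `score_fn` stands in for a cross-encoder or other expensive scorer (an assumption, not the skill's implementation):

```python
def rerank(candidates, score_fn, top_k=5):
    """Re-order a first-stage candidate pool by a finer relevance score.

    `candidates` come from the initial retriever; `score_fn` maps a
    candidate to a relevance score (higher = more relevant).
    """
    scored = sorted(candidates, key=score_fn, reverse=True)
    return scored[:top_k]
```

Because the pool is larger than the final cut, a relevant chunk that the first stage ranked low can still surface at the top, which is why MRR improves the most in the numbers above.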

Best Practices

  • Separate Evaluation: Evaluate retrieval and end-to-end separately

  • Create Comprehensive Datasets: Cover common and edge cases

  • Evaluate Regularly: Run evaluations after codebase changes

  • Track Metrics Over Time: Monitor improvements

  • Use Both Metrics: Precision/Recall for retrieval, Accuracy for end-to-end
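Tracking metrics over time can be as simple as appending each run to a JSON Lines history and diffing against the previous run. The file name here is hypothetical; pick whatever location suits the repo.

```python
import json
import time
from pathlib import Path

def log_metrics(metrics, path="eval_history.jsonl"):
    """Append one evaluation run to a JSONL history file.

    Returns the per-metric delta versus the previous run, or an
    empty dict on the first run. A sketch, not part of this skill.
    """
    history_file = Path(path)
    previous = None
    if history_file.exists():
        lines = history_file.read_text().strip().splitlines()
        if lines:
            previous = json.loads(lines[-1])["metrics"]
    record = {"timestamp": time.time(), "metrics": metrics}
    with history_file.open("a") as f:
        f.write(json.dumps(record) + "\n")
    if previous is None:
        return {}
    return {k: round(metrics[k] - previous[k], 4)
            for k in metrics if k in previous}
```

Running this after each codebase change gives a quick regression signal: a negative delta on F1 or MRR flags a retrieval regression before it shows up in end-to-end accuracy.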

References

  • RAG Patterns Guide - Implementation patterns

  • Retrieval Metrics - Metric calculations

  • End-to-End Evaluation - LLM-as-judge

  • Evaluation Guide - Comprehensive evaluation guide

Memory Protocol (MANDATORY)

Before starting: Read .claude/context/memory/learnings.md

After completing:

  • New pattern -> .claude/context/memory/learnings.md

  • Issue found -> .claude/context/memory/issues.md

  • Decision made -> .claude/context/memory/decisions.md

ASSUME INTERRUPTION: If it's not in memory, it didn't happen.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

  • filesystem (Automation)

  • slack-notifications (Automation)

  • chrome-browser (Automation)

  • text-to-sql (Automation)

No summaries are provided by the upstream source; each related skill is flagged "Needs Review".