RAG Observability and Evaluations
Run retrieval-augmented generation like a measurable production system, not a black box.
What to Measure
Retrieval Quality
- Recall@k and MRR for top-k chunks
- Citation coverage and source freshness
- Embedding drift and index staleness
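The two rank metrics above can be computed directly from per-query relevance labels. A minimal sketch, assuming each query has a set of gold-relevant chunk IDs and an ordered retrieval result (function and variable names here are illustrative, not from any specific library):

```python
def recall_at_k(relevant_ids, retrieved_ids, k):
    """Fraction of gold-relevant chunks that appear in the top-k results."""
    top_k = set(retrieved_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)

def mrr(relevant_ids, retrieved_ids):
    """Reciprocal rank of the first relevant chunk; 0.0 if none retrieved."""
    relevant = set(relevant_ids)
    for rank, chunk_id in enumerate(retrieved_ids, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0
```

Averaging these over the benchmark set gives the dashboard-level Recall@k and MRR figures.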
Generation Quality
- Groundedness score (answer supported by retrieved context)
- Hallucination rate by route/use case
- Instruction adherence and format validity
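Groundedness can be approximated cheaply before reaching for an LLM judge. The sketch below is a crude lexical proxy (assumed approach, not a standard algorithm): it counts an answer sentence as "supported" when most of its tokens appear in the retrieved context. Production systems typically replace this with an NLI model or LLM-as-judge.

```python
def groundedness_score(answer_sentences, context, support_threshold=0.6):
    """Fraction of answer sentences whose tokens mostly occur in the context.

    Lexical overlap only -- a cheap first-pass proxy, not a semantic check.
    """
    context_tokens = set(context.lower().split())
    if not answer_sentences:
        return 1.0
    supported = 0
    for sentence in answer_sentences:
        tokens = set(sentence.lower().split())
        if tokens and len(tokens & context_tokens) / len(tokens) >= support_threshold:
            supported += 1
    return supported / len(answer_sentences)
```

Tracking this per route makes the "hallucination rate by route/use case" metric actionable.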
Reliability and Cost
- p50/p95 latency split by retrieval vs generation
- Token usage per stage
- Cache hit rate and cost per successful answer
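Splitting latency percentiles by stage only requires tagging each timing span with its stage name. A minimal sketch using the standard library (the span dict shape here is an assumption, not a fixed schema):

```python
import statistics

def stage_latency_summary(spans):
    """spans: list of dicts like {"stage": "retrieval", "ms": 42.0}.

    Returns {stage: {"p50": ..., "p95": ...}} so retrieval and generation
    latency regressions can be alerted on independently.
    """
    by_stage = {}
    for span in spans:
        by_stage.setdefault(span["stage"], []).append(span["ms"])
    summary = {}
    for stage, values in by_stage.items():
        cuts = statistics.quantiles(values, n=100, method="inclusive")
        summary[stage] = {"p50": statistics.median(values), "p95": cuts[94]}
    return summary
```

In practice these numbers usually come from a tracing backend; the point is that retrieval and generation must be separate spans, not one combined request timer.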
Evaluation Pipeline
- Curate a benchmark set with gold answers and source docs.
- Run nightly offline evals for every retriever/model configuration.
- Execute online shadow evals on sampled production traffic.
- Gate releases on minimum quality + safety + latency thresholds.
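The release gate in the last step can be a small pure function run in CI. A sketch, assuming eval metrics arrive as a flat dict and each threshold declares whether it is a floor or a ceiling (the threshold format is illustrative):

```python
def gate_release(metrics, thresholds):
    """Compare eval metrics against thresholds; return (passed, failures).

    thresholds maps metric name -> ("min" | "max", limit). A "min" entry
    fails when the metric falls below the limit (quality floors); a "max"
    entry fails when it rises above it (latency/cost ceilings).
    """
    failures = []
    for name, (direction, limit) in thresholds.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: missing from eval results")
        elif direction == "min" and value < limit:
            failures.append(f"{name}: {value} < {limit}")
        elif direction == "max" and value > limit:
            failures.append(f"{name}: {value} > {limit}")
    return (len(failures) == 0, failures)
```

Treating a missing metric as a failure (rather than a pass) keeps a broken eval job from silently waving releases through.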
Alerting Strategy
Page on:
- sharp decline in groundedness,
- spike in unanswered or fallback responses,
- index freshness SLA breach,
- cost-per-answer anomaly.
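"Anomaly" pages like the cost-per-answer one are often implemented as a z-score against a recent baseline rather than a fixed threshold. A minimal sketch of that idea (window size and threshold are illustrative defaults):

```python
import statistics

def should_page(history, current, z_threshold=3.0):
    """Page when the current value sits more than z_threshold standard
    deviations above the mean of recent history (e.g. hourly cost per answer).
    """
    if len(history) < 2:
        return False  # not enough baseline to judge
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return current != mean  # flat baseline: any change is anomalous
    return (current - mean) / stdev > z_threshold
```

The same shape works for the groundedness-decline page with the comparison inverted.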
Practical Guardrails
- Force citations for high-risk domains.
- Return abstain/fallback when confidence is below threshold.
- Re-rank retrieved chunks before final generation.
- Use query rewriting only with strict regression tests.
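The abstain/fallback guardrail can be sketched as a thin wrapper around the generation step, assuming retrieval similarity scores are available per query (the threshold value and fallback text are illustrative):

```python
FALLBACK = "I don't have enough grounded context to answer that reliably."

def answer_or_abstain(answer, retrieval_scores, min_top_score=0.75):
    """Serve the generated answer only when the best retrieval score clears
    the confidence threshold; otherwise return a fixed fallback response.
    """
    if not retrieval_scores or max(retrieval_scores) < min_top_score:
        return FALLBACK
    return answer
```

Logging every fallback served also feeds the "spike in unanswered or fallback responses" alert directly.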
Incident Triage Checklist
- Did the embedding model change?
- Did chunking/indexing logic change?
- Did source corpus ingestion fail?
- Did the gateway route to an unintended model tier?
Related Skills
- rag-infrastructure - Deploy robust RAG backends
- agent-observability - Instrument requests, traces, and costs
- agent-evals - Build repeatable eval suites