rag-observability-evals

RAG Observability and Evaluations

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "rag-observability-evals" with this command: npx skills add bagelhole/devops-security-agent-skills/bagelhole-devops-security-agent-skills-rag-observability-evals

Run retrieval-augmented generation like a measurable production system, not a black box.

What to Measure

Retrieval Quality

  • Recall@k and MRR for top-k chunks

  • Citation coverage and source freshness

  • Embedding drift and index staleness
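
As a minimal sketch of the first bullet, Recall@k and MRR can be computed from ranked chunk IDs and a set of gold-relevant chunk IDs. The function names and the ID-based representation are assumptions made for illustration; they are not part of the upstream skill.

```python
from typing import Iterable, Sequence, Set, Tuple


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant chunk IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        return 0.0  # no gold chunks: report 0.0 rather than divide by zero
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant chunk, or 0.0 if none was retrieved."""
    for rank, chunk_id in enumerate(retrieved, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (retrieved, relevant) pairs, one per query."""
    scores = [reciprocal_rank(retrieved, relevant) for retrieved, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0
```

Tracking both is useful because Recall@k shows whether the right chunks are retrieved at all, while MRR shows how high the first useful chunk ranks.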

Generation Quality

  • Groundedness score (answer supported by retrieved context)

  • Hallucination rate by route/use case

  • Instruction adherence and format validity
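
To make the groundedness bullet concrete, the sketch below scores an answer by the share of its sentences whose tokens mostly appear in the retrieved context. This is a crude lexical proxy chosen for illustration only; the skill does not specify a scoring method, and a production system would more likely use an entailment model or an LLM judge. The name `groundedness_proxy` and the `min_overlap` default are assumptions.

```python
import re
from typing import List, Set


def _tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def groundedness_proxy(answer: str, contexts: List[str], min_overlap: float = 0.6) -> float:
    """Share of answer sentences whose token overlap with the context meets min_overlap.

    Assumption: lexical overlap stands in for "supported by retrieved context".
    It cannot detect paraphrase or negation, so treat the score as a rough signal.
    """
    context_tokens: Set[str] = set()
    for chunk in contexts:
        context_tokens |= _tokens(chunk)

    # Split on sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    if not sentences:
        return 0.0

    supported = 0
    for sentence in sentences:
        toks = _tokens(sentence)
        if toks and len(toks & context_tokens) / len(toks) >= min_overlap:
            supported += 1
    return supported / len(sentences)
```

The hallucination rate per route can then be approximated as the fraction of answers on that route whose score falls below a chosen cutoff.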

Reliability and Cost

  • p50/p95 latency split by retrieval vs generation

  • Token usage per stage

  • Cache hit rate and cost per successful answer
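
The latency and cost bullets can be sketched as follows, assuming per-request stage timings in milliseconds are already collected. The nearest-rank percentile method and the function names are illustrative choices, not something the skill prescribes.

```python
import math
from typing import Dict, Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def latency_summary(samples_ms: Sequence[float]) -> Dict[str, float]:
    """p50 and p95 for one pipeline stage (for example retrieval or generation)."""
    return {"p50": percentile(samples_ms, 50), "p95": percentile(samples_ms, 95)}


def cost_per_successful_answer(total_cost: float, successful_answers: int) -> float:
    """Total spend divided by answers that passed quality checks.

    Returns infinity when nothing succeeded, so the anomaly is visible
    instead of hidden behind a division error.
    """
    if successful_answers == 0:
        return float("inf")
    return total_cost / successful_answers
```

Calling `latency_summary` once on retrieval timings and once on generation timings gives the per-stage split, which shows which stage is responsible for a regression.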

Evaluation Pipeline

  • Curate a benchmark set with gold answers and source docs.

  • Run nightly offline evals for every retriever/model configuration.

  • Execute online shadow evals on sampled production traffic.

  • Gate releases on quality, safety, and latency thresholds.
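
The release gate in the last step might look like the sketch below. The metric names and every threshold value are placeholders invented for illustration; real limits should come from your own benchmark history.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Thresholds:
    # All defaults are hypothetical example values, not recommendations.
    min_groundedness: float = 0.85
    min_recall_at_k: float = 0.80
    max_unsafe_rate: float = 0.01
    max_p95_latency_ms: float = 2500.0


def gate_release(metrics: Dict[str, float], t: Thresholds = Thresholds()) -> List[str]:
    """Return a list of failure messages; an empty list means the release may proceed."""
    checks = [
        ("groundedness", ">=", t.min_groundedness),
        ("recall_at_k", ">=", t.min_recall_at_k),
        ("unsafe_rate", "<=", t.max_unsafe_rate),
        ("p95_latency_ms", "<=", t.max_p95_latency_ms),
    ]
    failures: List[str] = []
    for name, op, limit in checks:
        value = metrics.get(name)
        if value is None:
            # A missing metric blocks the release instead of passing silently.
            failures.append(f"{name}: missing")
        elif op == ">=" and value < limit:
            failures.append(f"{name}: {value} is below minimum {limit}")
        elif op == "<=" and value > limit:
            failures.append(f"{name}: {value} is above maximum {limit}")
    return failures
```

Returning every failure, instead of stopping at the first one, gives the release owner the full picture in a single eval run.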

Alerting Strategy

Page on:

  • sharp decline in groundedness,

  • spike in unanswered or fallback responses,

  • index freshness SLA breach,

  • cost-per-answer anomaly.
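
One simple way to detect a "sharp decline in groundedness" is to compare a recent window of scores against a baseline window. The sketch below does that with a relative-drop rule; the 10% default and the function names are assumptions, and a real alert would usually also require a minimum sample count.

```python
from statistics import mean
from typing import Sequence


def relative_drop(baseline: Sequence[float], recent: Sequence[float]) -> float:
    """Relative decline of the recent mean versus the baseline mean.

    Positive values mean the metric got worse. Both windows must be
    non-empty, otherwise statistics.mean raises StatisticsError.
    """
    base = mean(baseline)
    if base == 0:
        return 0.0
    return (base - mean(recent)) / base


def should_page(baseline: Sequence[float], recent: Sequence[float],
                max_relative_drop: float = 0.10) -> bool:
    """True when the recent window dropped by more than max_relative_drop."""
    return relative_drop(baseline, recent) > max_relative_drop
```

The same comparison works for the cost-per-answer alert if the sign is flipped so that an increase, not a decrease, triggers the page.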

Practical Guardrails

  • Force citations for high-risk domains.

  • Return an abstain or fallback response when confidence is below a set threshold.

  • Re-rank retrieved chunks before final generation.

  • Use query rewriting only with strict regression tests.
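
The first two guardrails can be combined in a small post-generation check. In this sketch the `Draft` type, the fallback message, and the `min_confidence` default are all hypothetical; how confidence is estimated is left to the surrounding system.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical fallback text; replace with whatever your product should say.
FALLBACK = "Not enough supporting information was found to answer this question."


@dataclass
class Draft:
    text: str
    confidence: float  # assumed to be in [0, 1], produced upstream
    citations: List[str] = field(default_factory=list)


def apply_guardrails(draft: Draft, high_risk: bool, min_confidence: float = 0.7) -> str:
    """Return the draft text, or the fallback when a guardrail trips."""
    if draft.confidence < min_confidence:
        return FALLBACK  # abstain when confidence is below the threshold
    if high_risk and not draft.citations:
        return FALLBACK  # high-risk domains must cite at least one source
    return draft.text
```

Counting how often this function returns the fallback also feeds the "spike in unanswered or fallback responses" alert.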

Incident Triage Checklist

  • Did the embedding model change?

  • Did the chunking or indexing logic change?

  • Did source corpus ingestion fail?

  • Did the gateway route traffic to an unintended model tier?
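
Several of these questions can be answered quickly if each deployment records its pipeline settings. The sketch below diffs the current settings against the last known-good snapshot; the config keys shown in the usage note are hypothetical examples, not fields defined by the skill.

```python
from typing import Any, Dict, Tuple


def diff_config(last_good: Dict[str, Any],
                current: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Map each changed key to (last_good_value, current_value).

    Keys present on only one side are reported with None for the missing side.
    """
    keys = sorted(set(last_good) | set(current))
    return {
        key: (last_good.get(key), current.get(key))
        for key in keys
        if last_good.get(key) != current.get(key)
    }
```

For example, comparing `{"embedding_model": "embed-v1", "chunk_size": 512}` with `{"embedding_model": "embed-v2", "chunk_size": 512}` reports only the `embedding_model` change, which points triage at the first checklist item.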

Related Skills

  • rag-infrastructure - Deploy robust RAG backends

  • agent-observability - Instrument requests, traces, and costs

  • agent-evals - Build repeatable eval suites
