zotero-rag

Library RAG: Semantic Search

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "zotero-rag" with this command: npx skills add nealcaren/sociology-skillset/nealcaren-sociology-skillset-zotero-rag

Semantic search over your local library of markdown-converted papers using sentence-transformers embeddings and ChromaDB.

Prerequisites

  • uv installed (standard in this project)

  • Papers ingested via ingest.py (which converts to markdown, organizes files, and adds metadata to references.bib)

  • references.bib with md_path fields linking citation keys to markdown files

Important: Only files registered in references.bib are indexed. Loose markdown files in library/markdown/ without a bib entry will be flagged as "unlinked" during indexing. Run ingest.py to register them.
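The linked/unlinked distinction can be pictured as a set comparison between the markdown directory and the md_path fields in references.bib. The following is a minimal sketch, not the check rag.py performs: it assumes md_path values are written as brace- or quote-delimited paths, and the helper names linked_markdown_names and unlinked_files are hypothetical.

```python
import re
from pathlib import Path

# Assumed field syntax: md_path = {some/path.md} or md_path = "some/path.md"
MD_PATH_RE = re.compile(r"md_path\s*=\s*[{\"]([^}\"]+)[}\"]")


def linked_markdown_names(bib_text: str) -> set[str]:
    """Collect the file names referenced by md_path fields in a .bib file."""
    return {Path(value).name for value in MD_PATH_RE.findall(bib_text)}


def unlinked_files(markdown_names: list[str], bib_text: str) -> list[str]:
    """Return markdown file names that no bib entry points to."""
    linked = linked_markdown_names(bib_text)
    return sorted(name for name in markdown_names if name not in linked)
```

In practice markdown_names would come from listing library/markdown/ and bib_text from reading references.bib.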

Quick Start

Index your library (first time or after adding papers)

uv run plugins/sociology-skillset/scripts/rag.py index

Search by meaning

uv run plugins/sociology-skillset/scripts/rag.py search "cultural capital and educational attainment"

Script Location

All commands use:

uv run plugins/sociology-skillset/scripts/rag.py <command>

Dependencies (sentence-transformers, chromadb) are auto-installed by uv on first run via PEP 723 inline metadata. No manual installation needed.

Commands

Index

Build or update the vector index from library/markdown/ files.

Index all markdown files (incremental — skips unchanged files)

uv run plugins/sociology-skillset/scripts/rag.py index

Index specific citation keys only

uv run plugins/sociology-skillset/scripts/rag.py index --keys Smith2020_Cultural Jones2019_Institutional

The index is stored at library/.rag-index/. First run downloads the all-MiniLM-L6-v2 embedding model (~80MB, cached by sentence-transformers).

Run this after adding new papers to keep the index current.
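The incremental behaviour can be pictured as a hash comparison. This is a minimal sketch, assuming SHA-256 over file contents and a plain dict of previously stored hashes. The hash function and storage layout that rag.py actually uses are not documented here, and files_to_reindex is a hypothetical name.

```python
import hashlib
from pathlib import Path


def content_hash(path: Path) -> str:
    """Hash a file's bytes; SHA-256 is an assumption, not confirmed by the source."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def files_to_reindex(markdown_dir: Path, stored_hashes: dict[str, str]) -> list[Path]:
    """Return markdown files that are new or whose content has changed.

    stored_hashes maps a file name to the hash recorded at the last index run.
    """
    changed = []
    for path in sorted(markdown_dir.glob("*.md")):
        if stored_hashes.get(path.name) != content_hash(path):
            changed.append(path)
    return changed
```

Files whose hash matches the stored value are skipped, which is why re-running index after a small addition can be cheap.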

Search

Semantic search across all indexed documents. Returns JSON lines ranked by similarity.

uv run plugins/sociology-skillset/scripts/rag.py search "social movements and collective identity"

uv run plugins/sociology-skillset/scripts/rag.py search "interview methodology" --top-k 5

uv run plugins/sociology-skillset/scripts/rag.py search "Bourdieu field theory" --min-score 0.3

Each result includes: chunk_id, citation_key, section_title, score, text (truncated), plus title, author, year from references.bib.
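Since the output is JSON lines, downstream handling can be sketched as below. The field names follow the list above, but the sample records are invented for illustration, the value types (for example whether year is a string) are assumptions, and parse_results and group_by_paper are hypothetical helpers.

```python
import json

# Hypothetical output lines; field names follow the list above, values are invented.
SAMPLE_OUTPUT = """\
{"chunk_id": "abc123def456", "citation_key": "Smith2020_Cultural", "section_title": "Introduction", "score": 0.62, "text": "...", "title": "...", "author": "...", "year": "2020"}
{"chunk_id": "fed654cba321", "citation_key": "Jones2019_Institutional", "section_title": "Methods", "score": 0.28, "text": "...", "title": "...", "author": "...", "year": "2019"}
"""


def parse_results(output: str, min_score: float = 0.0) -> list[dict]:
    """Parse JSON-lines search output, keeping hits at or above min_score."""
    results = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if record["score"] >= min_score:
            results.append(record)
    return results


def group_by_paper(results: list[dict]) -> dict[str, list[str]]:
    """Map each citation_key to the chunk_ids that matched."""
    grouped: dict[str, list[str]] = {}
    for record in results:
        grouped.setdefault(record["citation_key"], []).append(record["chunk_id"])
    return grouped
```

Grouping by citation_key gives a quick view of which papers dominate a query before reading individual chunks.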

Similar

Find passages similar to a given chunk (from search results).

uv run plugins/sociology-skillset/scripts/rag.py similar <chunk_id>

uv run plugins/sociology-skillset/scripts/rag.py similar abc123def456 --top-k 5

Use this to explore thematic connections: find a relevant passage via search, then use similar to discover related content across other papers.
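Conceptually, a similar-passage lookup ranks other chunks by how close their embeddings are to the target's. The sketch below assumes cosine similarity and uses toy two-dimensional vectors. The distance metric rag.py configures in ChromaDB is not stated here, real embeddings have 384 dimensions, and most_similar is a hypothetical helper.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(
    target_id: str, embeddings: dict[str, list[float]], top_k: int = 5
) -> list[tuple[str, float]]:
    """Rank every other chunk by similarity to the target chunk's embedding."""
    target = embeddings[target_id]
    scored = [
        (other_id, cosine_similarity(target, vector))
        for other_id, vector in embeddings.items()
        if other_id != target_id  # never return the query chunk itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

A vector store such as ChromaDB does the same ranking with an approximate nearest-neighbour index rather than a full scan.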

Context

Show the full context around a chunk — the target chunk plus surrounding chunks from the same document.

uv run plugins/sociology-skillset/scripts/rag.py context <chunk_id>

uv run plugins/sociology-skillset/scripts/rag.py context abc123def456 --window 3

Returns the target chunk and neighboring chunks (default: 2 on each side), so you can read the passage in its original context.
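The window logic amounts to a clamped slice over a document's ordered chunks. This is a minimal sketch, assuming chunks are held in document order in a list. The function context_window is hypothetical and says nothing about how rag.py stores chunk positions.

```python
def context_window(chunks: list[str], target_index: int, window: int = 2) -> list[str]:
    """Return the target chunk plus up to `window` neighbours on each side.

    The slice is clamped at the start and end of the document, so chunks near
    an edge simply return fewer neighbours.
    """
    if not 0 <= target_index < len(chunks):
        raise IndexError(f"target_index {target_index} is out of range")
    start = max(0, target_index - window)
    end = min(len(chunks), target_index + window + 1)
    return chunks[start:end]
```

With seven chunks and the default window, a hit on the fourth chunk returns the second through sixth; a hit on the first chunk returns only the first three.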

Status

Show index statistics: number of documents, chunks, and last modified time.

uv run plugins/sociology-skillset/scripts/rag.py status

List

List all indexed documents with chunk counts.

uv run plugins/sociology-skillset/scripts/rag.py list

Remove

Remove a document from the index by citation key.

uv run plugins/sociology-skillset/scripts/rag.py remove Smith2020_Cultural

Typical Workflows

First-time setup

  • Ensure papers are in library/markdown/ (run ingest.py for each PDF/EPUB)

  • Run uv run plugins/sociology-skillset/scripts/rag.py index to build the index

  • Search with uv run plugins/sociology-skillset/scripts/rag.py search "your topic"

Adding new papers

  • Ingest the paper: uv run plugins/sociology-skillset/scripts/ingest.py --file paper.pdf

  • Update the index: uv run plugins/sociology-skillset/scripts/rag.py index

Adding a PDF for a paper already in references.bib

  • Ingest with update: uv run plugins/sociology-skillset/scripts/ingest.py --file paper.pdf --citekey ExistingKey2022 --update

  • Update the index: uv run plugins/sociology-skillset/scripts/rag.py index

Deep exploration

  • Search for a topic: search "concept or question"

  • Read context of a promising hit: context <chunk_id>

  • Find similar passages across other papers: similar <chunk_id>

  • Read the full paper if needed: open the source_file path from results
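The exploration steps above can be scripted by building the command lines first and running them afterwards. The sketch below only constructs argument lists, using the script path, subcommands, and flags documented on this page. The helper names rag_command and exploration_commands are hypothetical.

```python
import shlex

RAG_SCRIPT = "plugins/sociology-skillset/scripts/rag.py"


def rag_command(subcommand: str, *args: str) -> list[str]:
    """Build the argv list for one rag.py invocation through uv."""
    return ["uv", "run", RAG_SCRIPT, subcommand, *args]


def exploration_commands(query: str, chunk_id: str) -> list[list[str]]:
    """Commands for the search, context, and similar steps, in that order."""
    return [
        rag_command("search", query, "--top-k", "5"),
        rag_command("context", chunk_id, "--window", "3"),
        rag_command("similar", chunk_id, "--top-k", "5"),
    ]


if __name__ == "__main__":
    # Print the commands in copy-pasteable form; nothing is executed here.
    for command in exploration_commands("concept or question", "abc123def456"):
        print(shlex.join(command))
```

Each list can be passed to subprocess.run(command, capture_output=True, text=True) from the repository root; in a real run the chunk_id would come from the search output, not a fixed value.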

When to Use RAG vs. Grep

Need | Tool

Conceptual/semantic search (find passages about a concept even if they don't use the exact words) | rag.py search

Exact keyword/phrase search (find specific terms, author names, method names) | grep library/markdown/

Metadata search (by author, year, journal) | grep references.bib

Both approaches complement each other. Use semantic search for exploratory discovery and grep for precise retrieval.

Technical Details

  • Embedding model: all-MiniLM-L6-v2 (384 dimensions, same as old Zotero RAG)

  • Vector store: ChromaDB with file-based persistence at library/.rag-index/

  • Chunking: Split by ## headers (section-level); fallback to ~512-token fixed chunks for headerless documents

  • Incremental indexing: Content hashes stored in metadata; unchanged files are skipped on re-index

  • Output format: JSON lines for easy parsing by Claude or other tools
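The chunking rule in the list above can be sketched as follows. This is an illustration under stated assumptions, not the code in rag.py: word count stands in for token count, text before the first header is kept as an untitled chunk, and chunk_markdown is a hypothetical name.

```python
import re

HEADER_RE = re.compile(r"^## +(.+)$", re.MULTILINE)


def chunk_markdown(text: str, max_words: int = 512) -> list[dict]:
    """Split on '## ' headers; fall back to fixed-size chunks when there are none.

    Words approximate tokens here, which is a simplification.
    """
    matches = list(HEADER_RE.finditer(text))
    if not matches:
        words = text.split()
        return [
            {"section_title": None, "text": " ".join(words[i:i + max_words])}
            for i in range(0, len(words), max_words)
        ]
    chunks = []
    preamble = text[:matches[0].start()].strip()
    if preamble:
        chunks.append({"section_title": None, "text": preamble})
    # Pair each header with the next one to find where its section ends.
    for current, following in zip(matches, matches[1:] + [None]):
        end = following.start() if following else len(text)
        body = text[current.end():end].strip()
        chunks.append({"section_title": current.group(1).strip(), "text": body})
    return chunks
```

Section-level chunks keep each hit tied to a meaningful section_title, while the fixed-size fallback keeps headerless documents searchable.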
