qmd - Quick Markdown Search

Full-text (BM25) and vector similarity search with query expansion and reranking.

Overview

qmd provides semantic search across collections of text files (markdown, JSONL, etc.) using:

BM25 full-text search - Fast keyword matching
Vector embeddings - Semantic similarity (embeddinggemma-300M)
Reranking - Quality filtering (qwen3-reranker-0.6b)
Query expansion - Automatic query refinement

Installation:

bun install -g https://github.com/tobi/qmd

Binary location: ~/.bun/bin/qmd

When to Use

Use qmd for:

Searching conversation history across Claude Code, claudesp, Clawdbot
Finding discussions about specific topics
Semantic similarity search (similar concepts, different words)
Documentation search
Any large text corpus search

Session Retention Policy:

Only keep sessions < 2 months old in qmd index
Older sessions: grep on demand from raw JSONL files
Keeps index size manageable, search fast

Core Commands

Search Commands

Combined search (BM25 + vector + reranking) - BEST

qmd query "{text}" -c <collection>

Full-text search only - FAST

qmd search "{text}" -c <collection>

Vector similarity only - SEMANTIC

qmd vsearch "{text}" -c <collection>

Collection Management

Create/index collection

qmd collection add <path> --name <name> --mask <pattern>

List all collections

qmd collection list

Remove collection

qmd collection remove <name>

Rename collection

qmd collection rename <old> <new>

Index Management

Update index (re-scan for changes)

qmd update

Update with git pull

qmd update --pull

Generate embeddings (required for vsearch)

qmd embed -f

Check index status

qmd status

Clean up and vacuum

qmd cleanup

File Operations

Get specific document

qmd get <file>[:line] [-l N] [--from N]

Get multiple documents by pattern

qmd multi-get <pattern> [-l N] [--max-bytes N]

List files in collection

qmd ls [collection[/path]]

Context Management

Add context for path

qmd context add [path] "text"

List all contexts

qmd context list

Remove context

qmd context rm <path>

Search Options

Limit results

-n <num> # Number of results (default: 5)

Scoring thresholds

--min-score <num> # Minimum similarity score
--all # Return all matches

Output formats

--full # Full document instead of snippet
--line-numbers # Add line numbers
--json # JSON output
--csv # CSV output
--md # Markdown output
--xml # XML output
--files # Output docid,score,filepath,context

Collection filtering

-c <name> # Filter to specific collection

Multi-get options

-l <num> # Maximum lines per file
--max-bytes <num> # Skip files larger than N bytes

Common Patterns

Search All Collections

qmd query "authentication" \
  -c claude-sessions \
  -c claudesp-sessions \
  -c clawdbot-sessions \
  --full -n 10

Search with Score Threshold

qmd query "deployment bug" --min-score 0.7 --json

Get Recent Files

qmd ls claude-sessions | head -20

Semantic Search

Find similar concepts (not just keywords)

qmd vsearch "how do we handle errors in the gateway"

Bulk Retrieval

Get all files matching pattern

qmd multi-get "2026-01-28*.jsonl" --json

Output Formats

Default (Snippet)

Result 1 (score: 0.85):
File: ~/.claude/sessions/abc123.jsonl:42
Snippet: ...relevant text around match...

Full Document

qmd query "text" --full --line-numbers

JSON

qmd query "text" --json | jq '.results[] | {score, file: .docid}'

Files Only

qmd query "text" --files

Output: docid,score,filepath,context
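
The comma-separated layout above lends itself to post-filtering in a pipeline. The following is a minimal sketch, assuming each line really is `docid,score,filepath,context` and that the first three fields contain no commas. The helper name `filter_files_by_score` is hypothetical and not part of qmd.

```shell
#!/bin/sh
# Sketch: keep only the file paths whose score meets a threshold.
# Assumes input lines look like "docid,score,filepath,context" and that the
# first three fields contain no commas (an assumption, not verified).

# filter_files_by_score [min_score]  (hypothetical helper; reads stdin)
filter_files_by_score() {
  # "$2 + 0" forces a numeric comparison; a non-numeric header row
  # evaluates to 0 and is dropped for any positive threshold.
  awk -F ',' -v min="${1:-0.5}" '$2 + 0 >= min + 0 { print $3 }'
}

# Usage (requires qmd on PATH):
#   qmd query "deployment bug" --files | filter_files_by_score 0.7
```

Filtering on the client side like this may be useful when the same result set is reused with several thresholds; for a single threshold, --min-score is the more direct option.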

MCP Server

qmd includes an MCP server for agent integration:

Start MCP server

qmd mcp

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "qmd": { "command": "qmd", "args": ["mcp"] }
  }
}

MCP tools exposed:

search - Full-text search
vsearch - Vector search
query - Combined search
get - Get document
multi-get - Get multiple documents
collection_* - Collection operations

Index Details

Location: ~/.cache/qmd/index.sqlite

Models (auto-downloaded from HuggingFace):

Embedding: embeddinggemma-300M-Q8_0
Reranking: qwen3-reranker-0.6b-q8_0
Generation: Qwen3-0.6B-Q8_0

Collection structure:

CREATE TABLE collections (
  name TEXT PRIMARY KEY,
  path TEXT,
  mask TEXT
);

CREATE TABLE documents (
  docid TEXT PRIMARY KEY,
  collection TEXT,
  path TEXT,
  hash TEXT,
  content TEXT
);

CREATE TABLE embeddings (
  hash TEXT PRIMARY KEY,
  embedding BLOB
);

Troubleshooting

"Collection not found"

qmd collection list # Check what exists
qmd collection add <path> --name <name> --mask "*.md"

"No embeddings found"

qmd embed -f # Generate embeddings

No results

Try broader search

qmd search "keyword" --all --min-score 0.3

Check collection has files

qmd ls <collection>

Re-index

qmd update

Large index

qmd cleanup # Vacuum DB

Examples

Example 1: Find Authentication Discussions

qmd query "authentication jwt middleware" \
  -c claude-sessions \
  -c clawdbot-sessions \
  --full --line-numbers -n 5

Example 2: Search Clawdbot Only

qmd search "gateway bug" -c clawdbot-sessions --files

Example 3: Semantic Search

Find conceptually similar content

qmd vsearch "deploying containers to production" \
  --full -n 3

Example 4: Get Session by ID

If you know the session ID

qmd get ~/.clawdbot/agents/main/sessions/abc-123.jsonl --full

Example 5: Search Recent Sessions

Find files, then search within them

find ~/.clawdbot/agents/main/sessions -name "*.jsonl" -mtime -7 |
xargs qmd multi-get --json |
jq -r '.[] | select(.content | contains("voice"))'

Related Search Tools

qmd specializes in local markdown/JSONL search. For external search:

| Tool | Specialty | Use When |
|---|---|---|
| qmd (this) | Local session/doc search (BM25 + vector) | Conversation history, markdown collections |
| lev-find | Unified local + external search | Cross-domain discovery, default choice |
| lev-research | Multi-perspective orchestration | Architecture analysis, research workflows |
| valyu | Recursive turn-based research | valyu research "query" --turns 5 |
| deep-research | Multi-query Tavily synthesis | deep-research "query" --deep |
| brave-search | Quick web search | brave-search "query" |
| tavily-search | AI-optimized snippets | tavily-search "query" |
| exa-plus | Neural search, GitHub, papers | exa search "query" |
| grok-research | Real-time X/Twitter | grok-research "query" |
| firecrawl | Web scraping | firecrawl scrape <url> |

QMD's unique capabilities:

✅ Local-only (no external API calls)
✅ BM25 full-text + vector embeddings + reranking
✅ Conversation history across Claude Code/claudesp/Clawdbot
✅ Fast markdown collection search
✅ MCP server for agent integration
❌ External web search (use brave/tavily/exa)
❌ Multi-perspective (use lev-research)

Integration pattern:

1. Search local history

qmd query "authentication discussion" -c claude-sessions --full

2. If not found locally, search external

valyu research "authentication patterns 2026" --turns 5

3. Or use unified search

lev get "authentication" --scope=all # Searches both local + external
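
The local-then-external pattern above can be wrapped in a small function. This is only a sketch: it assumes that an empty `--files` listing means "no local hits", which is an assumption about qmd's output and not documented behavior. The names `local_search`, `external_search`, and `search_with_fallback` are hypothetical.

```shell
#!/bin/sh
# Sketch of "local first, external second".
# Assumption: an empty --files listing means there were no local matches.

# Hypothetical wrappers around the commands shown above; redefine to taste.
local_search() { qmd query "$1" -c claude-sessions --files; }
external_search() { valyu research "$1" --turns 5; }

# search_with_fallback <query>
search_with_fallback() {
  hits=$(local_search "$1" 2>/dev/null)
  if [ -n "$hits" ]; then
    printf '%s\n' "$hits"      # local history had something: stop here
  else
    external_search "$1"       # nothing local: fall back to external research
  fi
}

# Usage: search_with_fallback "authentication discussion"
```

Keeping the two searches behind separate functions makes it easy to swap the external tool, or to stub both out when trying the wrapper without network access.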

Integration with Other Skills

lev-clwd

lev-clwd uses qmd for conversation history search across all 3 session stores.

lev-find

Future: lev-find will abstract qmd collections behind a unified interface.

See skill://lev-research for comprehensive research workflows.

Claudesp Variant (~/dcs)

The claudesp variant lives at ~/.claude-sneakpeek/claudesp/config/ with shortcut:

~/dcs → ~/.claude-sneakpeek/claudesp/config/

Directory Structure

~/.claude-sneakpeek/
└── claudesp/
    └── config/              # ← ~/dcs points here
        ├── CLAUDE.md        # Variant-specific instructions
        ├── .claude.json     # Variant settings + hooks
        ├── settings.json    # Variant hook configuration
        ├── commands/        # Commands (copies, allow variant edits)
        ├── skills/          # Skills (symlinked from ~/.claude/skills/)
        ├── hooks/           # Same hooks as ~/.claude/hooks/
        ├── plans/           # Session plans
        ├── history.jsonl    # Claudesp-specific command history
        ├── projects/        # Project session indexes
        └── session-env/     # Session environments

Session Collections

| Collection | Path | Files |
|---|---|---|
| claude-sessions | ~/.claude/transcripts/ | ~1558 |
| claudesp-sessions | ~/dcs/transcripts/ (or ~/.claude-sneakpeek/claudesp/config/transcripts/) | ~163 |
| clawdbot-sessions | ~/.clawdbot/agents/main/sessions/ | ~1165 |

Searching Claudesp History

Search claudesp sessions specifically

qmd query "entity dashboard" -c claudesp-sessions --full -n 5

Cross-variant search (all 3 session stores)

qmd query "lev cms" -c claude-sessions -c claudesp-sessions -c clawdbot-sessions -n 10
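
Since the cross-variant search always names the same three collections, it can be captured in a helper. This is a minimal sketch: `qmd_all_sessions` and the `QMD_BIN` override are hypothetical names introduced here, and the collection names are the ones listed under Session Collections.

```shell
#!/bin/sh
# Sketch: search all three session stores with one call.
# qmd_all_sessions and QMD_BIN are hypothetical; QMD_BIN defaults to qmd.

# qmd_all_sessions <query> [extra qmd options...]
qmd_all_sessions() {
  if [ "$#" -lt 1 ]; then
    echo "usage: qmd_all_sessions <query> [options]" >&2
    return 2
  fi
  query=$1
  shift
  # Extra options (for example -n 10 or --full) are passed through unchanged.
  "${QMD_BIN:-qmd}" query "$query" \
    -c claude-sessions -c claudesp-sessions -c clawdbot-sessions "$@"
}

# Usage: qmd_all_sessions "lev cms" -n 10
```

Setting `QMD_BIN=echo` prints the command that would run, which is a cheap way to check the flags before touching the index.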

Auto-Refresh (Staleness Detection)

How qmd Handles Incremental Updates

qmd tracks file hashes in the index. On qmd update:

New files → indexed and added
Changed files (hash differs) → re-indexed
Unchanged files → skipped (fast)
Deleted files → removed from index

This means qmd update is always safe and incremental.
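
To make the four cases concrete, here is an illustrative sketch of hash-based change detection against a plain-text manifest. It is not qmd's implementation: qmd stores hashes in its SQLite index, whereas the tab-separated manifest file, `hash_file`, and `classify_changes` below are all hypothetical stand-ins.

```shell
#!/bin/sh
# Illustration of hash-based change detection. This is NOT qmd's code; the
# manifest file and both function names are hypothetical.
# Manifest format: one "<sha256><TAB><path>" line per previously indexed file.

hash_file() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{ print $1 }'
  else
    shasum -a 256 "$1" | awk '{ print $1 }'   # fallback where sha256sum is absent
  fi
}

# classify_changes <dir> <manifest>: print "<state> <path>" for every file.
classify_changes() (
  dir=$1
  manifest=$2
  tab=$(printf '\t')

  # Files on disk: new, changed, or unchanged relative to the manifest.
  find "$dir" -type f | sort | while IFS= read -r path; do
    old=$(awk -F '\t' -v p="$path" '$2 == p { print $1 }' "$manifest")
    if [ -z "$old" ]; then
      echo "new $path"
    elif [ "$old" != "$(hash_file "$path")" ]; then
      echo "changed $path"
    else
      echo "unchanged $path"
    fi
  done

  # Manifest entries whose file no longer exists: deleted.
  while IFS=$tab read -r old_hash path; do
    [ -e "$path" ] || echo "deleted $path"
  done < "$manifest"
)
```

Only files reported as new or changed would need re-indexing, which is why a run over a mostly unchanged collection stays cheap.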

XDG Cache Staleness Check

Index lives at ~/.cache/qmd/index.sqlite (XDG-compliant).

Auto-refresh pattern for hooks/session start:

#!/bin/bash
# qmd-auto-refresh.sh - Run on SessionStart or as needed
# Checks if index is stale and refreshes incrementally

QMD_INDEX="$HOME/.cache/qmd/index.sqlite"
STALENESS_THRESHOLD=86400 # 1 day in seconds

if [ ! -f "$QMD_INDEX" ]; then
  echo "qmd index missing, creating..."
  qmd update
  exit 0
fi

# Check last modified time. Try GNU stat first: on Linux, `stat -f` prints
# filesystem info instead of failing cleanly, which would corrupt INDEX_MTIME.
INDEX_MTIME=$(stat -c %Y "$QMD_INDEX" 2>/dev/null || stat -f %m "$QMD_INDEX" 2>/dev/null)
NOW=$(date +%s)
AGE=$(( NOW - INDEX_MTIME ))

if [ "$AGE" -gt "$STALENESS_THRESHOLD" ]; then
  echo "qmd index stale (${AGE}s old), refreshing..."
  qmd update # Incremental: skips unchanged files via hash
else
  echo "qmd index fresh (${AGE}s old)"
fi

Hook Integration

Add to ~/.claude/settings.json SessionStart hooks:

{
  "hooks": {
    "SessionStart": [
      {
        "matcher": "",
        "hooks": [
          { "type": "command", "command": "~/.claude/hooks/qmd-auto-refresh.sh" }
        ]
      }
    ]
  }
}

Session Retention Policy

< 2 months old: Keep in qmd index (fast semantic search)
> 2 months old: Grep on demand from raw JSONL files
Cleanup: qmd cleanup removes orphaned data, vacuums DB
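
For sessions that have aged out of the index, "grep on demand" might look like the sketch below. The 60-day cutoff is an approximation of "2 months", and `grep_old_sessions` is a hypothetical helper name; only standard `find` and `grep` options are used.

```shell
#!/bin/sh
# Sketch of "grep on demand" for sessions older than the retention window.
# The 60-day default approximates "2 months"; the function name is hypothetical.

# grep_old_sessions <pattern> <dir> [days]
# Lists *.jsonl files under <dir> older than [days] that contain <pattern>.
grep_old_sessions() {
  find "$2" -type f -name '*.jsonl' -mtime +"${3:-60}" \
    -exec grep -l -e "$1" {} +
}

# Usage:
#   grep_old_sessions "gateway bug" ~/.clawdbot/agents/main/sessions
```

The output is a list of matching files, which can then be opened directly or passed to other tools.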

Collection-Level Staleness

Check which collections need refresh

qmd status | grep "updated" | awk '{print $1, $NF}'

Force refresh specific collection

qmd update -c claude-sessions
qmd update -c claudesp-sessions

Refresh all (incremental, safe)

qmd update

Maintenance

Daily Update

Add to jared cron or SessionStart hook:

Incremental update (skips unchanged files)

qmd update

Generate embeddings for new files

qmd embed -f
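
The two daily steps can be bundled so that embedding only runs after a successful re-scan. This is a sketch under assumptions: `qmd_daily` and the `QMD_BIN` override are hypothetical names, and the crontab line in the comments is an example schedule, not something the source prescribes.

```shell
#!/bin/sh
# Sketch of a daily maintenance wrapper. QMD_BIN is a hypothetical override
# (handy for dry runs); it defaults to the qmd binary on PATH.

qmd_daily() {
  "${QMD_BIN:-qmd}" update || return 1   # stop here if the re-scan fails
  "${QMD_BIN:-qmd}" embed -f             # then generate embeddings
}

# Dry run (prints the two subcommands without touching the index):
#   QMD_BIN=echo qmd_daily
#
# Hypothetical crontab entry, assuming this file is saved as
# ~/bin/qmd-daily.sh and ends with a call to qmd_daily:
#   0 6 * * * $HOME/bin/qmd-daily.sh
```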

Weekly Cleanup

qmd cleanup # Remove orphaned data, vacuum DB

Reference

Repository: https://github.com/tobi/qmd

Models: HuggingFace (auto-downloaded)

Index: ~/.cache/qmd/index.sqlite

Binary: ~/.bun/bin/qmd

Shortcut: ~/dcs → ~/.claude-sneakpeek/claudesp/config/

Technique Map

Role definition - Clarifies operating scope and prevents ambiguous execution.
Context enrichment - Captures required inputs before actions.
Output structuring - Standardizes deliverables for consistent reuse.
Step-by-step workflow - Reduces errors by making execution order explicit.
Edge-case handling - Documents safe fallbacks when assumptions fail.

Technique Notes

These techniques improve reliability by making intent, inputs, outputs, and fallback paths explicit. Keep this section concise and additive so existing domain guidance remains primary.

Prompt Architect Overlay

Role Definition

You are the prompt-architect-enhanced specialist for lev-find-qmd, responsible for deterministic execution of this skill's guidance while preserving existing workflow and constraints.

Input Contract

Required: clear user intent and relevant context for this skill.
Preferred: repository/project constraints, existing artifacts, and success criteria.
If context is missing, ask focused questions before proceeding.

Output Contract

Provide structured, actionable outputs aligned to this skill's existing format.
Include assumptions and next steps when appropriate.
Preserve compatibility with existing sections and related skills.

Edge Cases & Fallbacks

If prerequisites are missing, provide a minimal safe path and request missing inputs.
If scope is ambiguous, narrow to the highest-confidence sub-task.
If a requested action conflicts with existing constraints, explain and offer compliant alternatives.
