qmd - Quick Markdown Search
Full-text (BM25) and vector similarity search with query expansion and reranking.
Overview
qmd provides semantic search across collections of text files (markdown, JSONL, etc.) using:
- BM25 full-text search - Fast keyword matching
- Vector embeddings - Semantic similarity (embeddinggemma-300M)
- Reranking - Quality filtering (qwen3-reranker-0.6b)
- Query expansion - Automatic query refinement
Installation:
bun install -g https://github.com/tobi/qmd
Binary location: ~/.bun/bin/qmd
When to Use
Use qmd for:
- Searching conversation history across Claude Code, claudesp, Clawdbot
- Finding discussions about specific topics
- Semantic similarity search (similar concepts, different words)
- Documentation search
- Any large text corpus search
Session Retention Policy:
- Only keep sessions < 2 months old in qmd index
- Older sessions: grep on demand from raw JSONL files
- Keeps index size manageable, search fast
Core Commands
Search Commands
Combined search (BM25 + vector + reranking) - BEST
qmd query "{text}" -c <collection>
Full-text search only - FAST
qmd search "{text}" -c <collection>
Vector similarity only - SEMANTIC
qmd vsearch "{text}" -c <collection>
Collection Management
Create/index collection
qmd collection add <path> --name <name> --mask <pattern>
List all collections
qmd collection list
Remove collection
qmd collection remove <name>
Rename collection
qmd collection rename <old> <new>
Index Management
Update index (re-scan for changes)
qmd update
Update with git pull
qmd update --pull
Generate embeddings (required for vsearch)
qmd embed -f
Check index status
qmd status
Clean up and vacuum
qmd cleanup
File Operations
Get specific document
qmd get <file>[:line] [-l N] [--from N]
Get multiple documents by pattern
qmd multi-get <pattern> [-l N] [--max-bytes N]
List files in collection
qmd ls [collection[/path]]
Context Management
Add context for path
qmd context add [path] "text"
List all contexts
qmd context list
Remove context
qmd context rm <path>
Search Options
Limit results
-n <num> # Number of results (default: 5)
Scoring thresholds
--min-score <num>   # Minimum similarity score
--all               # Return all matches
Output formats
--full           # Full document instead of snippet
--line-numbers   # Add line numbers
--json           # JSON output
--csv            # CSV output
--md             # Markdown output
--xml            # XML output
--files          # Output docid,score,filepath,context
Collection filtering
-c <name> # Filter to specific collection
Multi-get options
-l <num>            # Maximum lines per file
--max-bytes <num>   # Skip files larger than N bytes
Common Patterns
Search All Collections
qmd query "authentication" \
  -c claude-sessions \
  -c claudesp-sessions \
  -c clawdbot-sessions \
  --full -n 10
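Repeating the three -c flags on every search gets tedious, so a small wrapper can build them. The following is a minimal sketch, assuming the three collection names used in this document are already registered. The function name qmd_search_all and the QMD_SESSION_COLLECTIONS variable are hypothetical; only the query subcommand and the -c flag come from qmd itself.

```shell
#!/bin/bash
# Hypothetical helper (not part of qmd): run `qmd query` across every
# session collection named in this document.
QMD_SESSION_COLLECTIONS="claude-sessions claudesp-sessions clawdbot-sessions"

qmd_search_all() {
  local text="$1"
  local name
  shift  # remaining arguments are passed through to qmd (e.g. --full -n 10)
  # Append one -c flag per collection after the caller's own flags.
  # $QMD_SESSION_COLLECTIONS is left unquoted on purpose so it splits on spaces.
  for name in $QMD_SESSION_COLLECTIONS; do
    set -- "$@" -c "$name"
  done
  qmd query "$text" "$@"
}
```

Usage: qmd_search_all "authentication" --full -n 10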
Search with Score Threshold
qmd query "deployment bug" --min-score 0.7 --json
Get Recent Files
qmd ls claude-sessions | head -20
Semantic Search
Find similar concepts (not just keywords)
qmd vsearch "how do we handle errors in the gateway"
Bulk Retrieval
Get all files matching pattern
qmd multi-get "2026-01-28*.jsonl" --json
Output Formats
Default (Snippet)
Result 1 (score: 0.85):
File: ~/.claude/sessions/abc123.jsonl:42
Snippet: ...relevant text around match...
Full Document
qmd query "text" --full --line-numbers
JSON
qmd query "text" --json | jq '.results[] | {score, file: .docid}'
Files Only
qmd query "text" --files
Output: docid,score,filepath,context
MCP Server
qmd includes an MCP server for agent integration:
Start MCP server
qmd mcp
Add to claude_desktop_config.json:
{
  "mcpServers": {
    "qmd": {
      "command": "qmd",
      "args": ["mcp"]
    }
  }
}
MCP tools exposed:
- search - Full-text search
- vsearch - Vector search
- query - Combined search
- get - Get document
- multi-get - Get multiple documents
- collection_* - Collection operations
Index Details
Location: ~/.cache/qmd/index.sqlite
Models (auto-downloaded from HuggingFace):
- Embedding: embeddinggemma-300M-Q8_0
- Reranking: qwen3-reranker-0.6b-q8_0
- Generation: Qwen3-0.6B-Q8_0
Collection structure:
CREATE TABLE collections ( name TEXT PRIMARY KEY, path TEXT, mask TEXT );
CREATE TABLE documents ( docid TEXT PRIMARY KEY, collection TEXT, path TEXT, hash TEXT, content TEXT );
CREATE TABLE embeddings ( hash TEXT PRIMARY KEY, embedding BLOB );
Troubleshooting
"Collection not found"
qmd collection list   # Check what exists
qmd collection add <path> --name <name> --mask "*.md"
"No embeddings found"
qmd embed -f # Generate embeddings
No results
Try broader search
qmd search "keyword" --all --min-score 0.3
Check collection has files
qmd ls <collection>
Re-index
qmd update
Large index
qmd cleanup # Vacuum DB
Examples
Example 1: Find Authentication Discussions
qmd query "authentication jwt middleware" \
  -c claude-sessions \
  -c clawdbot-sessions \
  --full --line-numbers -n 5
Example 2: Search Clawdbot Only
qmd search "gateway bug" -c clawdbot-sessions --files
Example 3: Semantic Search
Find conceptually similar content
qmd vsearch "deploying containers to production" \
  --full -n 3
Example 4: Get Session by ID
If you know the session ID
qmd get ~/.clawdbot/agents/main/sessions/abc-123.jsonl --full
Example 5: Search Recent Sessions
Find files, then search within them
find ~/.clawdbot/agents/main/sessions -name "*.jsonl" -mtime -7 |
xargs qmd multi-get --json |
jq -r '.[] | select(.content | contains("voice"))'
Related Search Tools
qmd specializes in local markdown/JSONL search. For external search:
| Tool | Specialty | Use When |
|------|-----------|----------|
| qmd (this) | Local session/doc search (BM25 + vector) | Conversation history, markdown collections |
| lev-find | Unified local + external search | Cross-domain discovery, default choice |
| lev-research | Multi-perspective orchestration | Architecture analysis, research workflows |
| valyu | Recursive turn-based research | valyu research "query" --turns 5 |
| deep-research | Multi-query Tavily synthesis | deep-research "query" --deep |
| brave-search | Quick web search | brave-search "query" |
| tavily-search | AI-optimized snippets | tavily-search "query" |
| exa-plus | Neural search, GitHub, papers | exa search "query" |
| grok-research | Real-time X/Twitter | grok-research "query" |
| firecrawl | Web scraping | firecrawl scrape <url> |
QMD's unique capabilities:
- ✅ Local-only (no external API calls)
- ✅ BM25 full-text + vector embeddings + reranking
- ✅ Conversation history across Claude Code/claudesp/Clawdbot
- ✅ Fast markdown collection search
- ✅ MCP server for agent integration
- ❌ External web search (use brave/tavily/exa)
- ❌ Multi-perspective (use lev-research)
Integration pattern:
1. Search local history
qmd query "authentication discussion" -c claude-sessions --full
2. If not found locally, search external
valyu research "authentication patterns 2026" --turns 5
3. Or use unified search
lev get "authentication" --scope=all # Searches both local + external
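Steps 1 and 2 above can be chained into a single local-first lookup. This is a rough sketch, not a definitive implementation: the function name search_local_first is hypothetical, and it assumes that qmd query with --files prints nothing when there are no matches, which should be checked against the installed qmd version.

```shell
#!/bin/bash
# Sketch: try local history first, fall back to external research.
# ASSUMPTION: `qmd query ... --files` produces empty output on no matches.
search_local_first() {
  local text="$1"
  local collection="${2:-claude-sessions}"
  local hits
  hits=$(qmd query "$text" -c "$collection" --files 2>/dev/null)
  if [ -n "$hits" ]; then
    # Local hits found: print docid,score,filepath,context lines
    printf '%s\n' "$hits"
  else
    echo "No local matches for \"$text\"; searching externally..." >&2
    valyu research "$text" --turns 5
  fi
}
```

Usage: search_local_first "authentication discussion" claude-sessions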
Integration with Other Skills
lev-clwd
lev-clwd uses qmd for conversation history search across all 3 session stores.
lev-find
Future: lev-find will abstract qmd collections with unified interface.
See skill://lev-research for comprehensive research workflows.
Claudesp Variant (~/dcs)
The claudesp variant lives at ~/.claude-sneakpeek/claudesp/config/ with shortcut:
~/dcs → ~/.claude-sneakpeek/claudesp/config/
Directory Structure
~/.claude-sneakpeek/
└── claudesp/
    └── config/              # ← ~/dcs points here
        ├── CLAUDE.md        # Variant-specific instructions
        ├── .claude.json     # Variant settings + hooks
        ├── settings.json    # Variant hook configuration
        ├── commands/        # Commands (copies, allow variant edits)
        ├── skills/          # Skills (symlinked from ~/.claude/skills/)
        ├── hooks/           # Same hooks as ~/.claude/hooks/
        ├── plans/           # Session plans
        ├── history.jsonl    # Claudesp-specific command history
        ├── projects/        # Project session indexes
        └── session-env/     # Session environments
Session Collections
| Collection | Path | Files |
|------------|------|-------|
| claude-sessions | ~/.claude/transcripts/ | ~1558 |
| claudesp-sessions | ~/dcs/transcripts/ (or ~/.claude-sneakpeek/claudesp/config/transcripts/) | ~163 |
| clawdbot-sessions | ~/.clawdbot/agents/main/sessions/ | ~1165 |
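The three collections can be registered with the collection add command from Core Commands. This is a minimal sketch under two assumptions: that session files end in .jsonl (so the mask is "*.jsonl"; a recursive pattern may be needed if transcripts sit in subdirectories), and that missing directories should be skipped instead of failing. The function name register_session_collections is hypothetical.

```shell
#!/bin/bash
# Sketch: register the session stores listed in this document.
# ASSUMPTION: session files match "*.jsonl" directly under each path.
register_session_collections() {
  local entry name path
  for entry in \
    "claude-sessions:$HOME/.claude/transcripts" \
    "claudesp-sessions:$HOME/dcs/transcripts" \
    "clawdbot-sessions:$HOME/.clawdbot/agents/main/sessions"
  do
    name="${entry%%:*}"   # text before the first colon
    path="${entry#*:}"    # text after the first colon
    if [ -d "$path" ]; then
      qmd collection add "$path" --name "$name" --mask "*.jsonl"
    else
      echo "skipping $name: $path not found" >&2
    fi
  done
}
```

After registering, qmd collection list should show each store that exists on this machine.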
Searching Claudesp History
Search claudesp sessions specifically
qmd query "entity dashboard" -c claudesp-sessions --full -n 5
Cross-variant search (all 3 session stores)
qmd query "lev cms" -c claude-sessions -c claudesp-sessions -c clawdbot-sessions -n 10
Auto-Refresh (Staleness Detection)
How qmd Handles Incremental Updates
qmd tracks file hashes in the index. On qmd update:
- New files → indexed and added
- Changed files (hash differs) → re-indexed
- Unchanged files → skipped (fast)
- Deleted files → removed from index
This means qmd update is always safe and incremental.
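To make the hash comparison concrete, here is an illustration of the general idea in shell. It is not qmd's code: the SHA-256 choice, the tab-separated manifest format, and the names file_hash and classify_file are all assumptions made for this sketch.

```shell
#!/bin/bash
# Illustration only: hash-based change detection, NOT qmd's implementation.
# ASSUMPTION: SHA-256 via sha256sum (Linux) or shasum (macOS).
file_hash() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'
  fi
}

# classify_file MANIFEST FILE
# MANIFEST holds "hash<TAB>path" lines from the previous scan
# (hypothetical format). Prints new, changed, or unchanged.
classify_file() {
  local manifest="$1"
  local file="$2"
  local old new
  new=$(file_hash "$file")
  old=$(awk -F'\t' -v p="$file" '$2 == p {print $1}' "$manifest")
  if [ -z "$old" ]; then
    echo "new"          # path never seen before: index it
  elif [ "$old" != "$new" ]; then
    echo "changed"      # hash differs: re-index
  else
    echo "unchanged"    # same hash: skip
  fi
}
```

Because unchanged files cost only a hash comparison, running the update repeatedly stays cheap.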
XDG Cache Staleness Check
Index lives at ~/.cache/qmd/index.sqlite (XDG-compliant).
Auto-refresh pattern for hooks/session start:
#!/bin/bash
# qmd-auto-refresh.sh - Run on SessionStart or as needed
# Checks if index is stale and refreshes incrementally

QMD_INDEX="$HOME/.cache/qmd/index.sqlite"
STALENESS_THRESHOLD=86400  # 1 day in seconds

if [ ! -f "$QMD_INDEX" ]; then
  echo "qmd index missing, creating..."
  qmd update
  exit 0
fi

# Check last modified time. Try GNU stat first, then BSD/macOS stat:
# on Linux, "stat -f" means filesystem info and would pollute the value.
INDEX_MTIME=$(stat -c %Y "$QMD_INDEX" 2>/dev/null || stat -f %m "$QMD_INDEX" 2>/dev/null)
NOW=$(date +%s)
AGE=$(( NOW - INDEX_MTIME ))

if [ "$AGE" -gt "$STALENESS_THRESHOLD" ]; then
  echo "qmd index stale (${AGE}s old), refreshing..."
  qmd update  # Incremental: skips unchanged files via hash
else
  echo "qmd index fresh (${AGE}s old)"
fi
Hook Integration
Add to ~/.claude/settings.json SessionStart hooks:
{
  "hooks": {
    "SessionStart": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "~/.claude/hooks/qmd-auto-refresh.sh"
          }
        ]
      }
    ]
  }
}
Session Retention Policy
- < 2 months old: Keep in qmd index (fast semantic search)
- > 2 months old: Grep on demand from raw JSONL files
- Cleanup: qmd cleanup removes orphaned data, vacuums DB
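For sessions that have aged out of the index, the on-demand grep might look like the sketch below. It assumes "2 months" can be approximated as 60 days of file modification time and that sessions are stored as .jsonl files. The function name grep_old_sessions is hypothetical; the default directory is the claude-sessions path from this document.

```shell
#!/bin/bash
# Sketch: grep raw JSONL sessions that are too old to be in the qmd index.
# ASSUMPTION: "2 months" is approximated as 60 days of file mtime.
grep_old_sessions() {
  local pattern="$1"
  local dir="${2:-$HOME/.claude/transcripts}"
  local days="${3:-60}"
  # -mtime +N matches files last modified more than N days ago;
  # grep -l prints only the names of files that contain the pattern.
  find "$dir" -type f -name '*.jsonl' -mtime "+$days" \
    -exec grep -l -- "$pattern" {} +
}
```

Usage: grep_old_sessions "deployment bug" ~/.clawdbot/agents/main/sessions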
Collection-Level Staleness
Check which collections need refresh
qmd status | grep "updated" | awk '{print $1, $NF}'
Force refresh specific collection
qmd update -c claude-sessions
qmd update -c claudesp-sessions
Refresh all (incremental, safe)
qmd update
Maintenance
Daily Update
Add to jared cron or SessionStart hook:
Incremental update (skips unchanged files)
qmd update
Generate embeddings for new files
qmd embed -f
Weekly Cleanup
qmd cleanup # Remove orphaned data, vacuum DB
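The daily and weekly tasks can share one entry point so that cron or a hook only needs a single script. This is a small sketch; the function name qmd_maintain is hypothetical, and it uses only the update, embed -f, and cleanup subcommands listed in this document.

```shell
#!/bin/bash
# Sketch: single entry point for the maintenance tasks above.
# Hypothetical wrapper; not part of qmd.
qmd_maintain() {
  case "$1" in
    daily)
      # Generate embeddings only if the incremental update succeeded
      qmd update && qmd embed -f
      ;;
    weekly)
      qmd cleanup
      ;;
    *)
      echo "usage: qmd_maintain daily|weekly" >&2
      return 2
      ;;
  esac
}
```

Usage: qmd_maintain daily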
Reference
Repository: https://github.com/tobi/qmd
Models: HuggingFace (auto-downloaded)
Index: ~/.cache/qmd/index.sqlite
Binary: ~/.bun/bin/qmd
Shortcut: ~/dcs → ~/.claude-sneakpeek/claudesp/config/
Technique Map
- Role definition - Clarifies operating scope and prevents ambiguous execution.
- Context enrichment - Captures required inputs before actions.
- Output structuring - Standardizes deliverables for consistent reuse.
- Step-by-step workflow - Reduces errors by making execution order explicit.
- Edge-case handling - Documents safe fallbacks when assumptions fail.
Technique Notes
These techniques improve reliability by making intent, inputs, outputs, and fallback paths explicit. Keep this section concise and additive so existing domain guidance remains primary.
Prompt Architect Overlay
Role Definition
You are the prompt-architect-enhanced specialist for lev-find-qmd, responsible for deterministic execution of this skill's guidance while preserving existing workflow and constraints.
Input Contract
- Required: clear user intent and relevant context for this skill.
- Preferred: repository/project constraints, existing artifacts, and success criteria.
- If context is missing, ask focused questions before proceeding.
Output Contract
- Provide structured, actionable outputs aligned to this skill's existing format.
- Include assumptions and next steps when appropriate.
- Preserve compatibility with existing sections and related skills.
Edge Cases & Fallbacks
- If prerequisites are missing, provide a minimal safe path and request missing inputs.
- If scope is ambiguous, narrow to the highest-confidence sub-task.
- If a requested action conflicts with existing constraints, explain and offer compliant alternatives.