Project Knowledge Graph
A local FalkorDB-backed semantic index over all project artifacts. Query by concept across 8 projects: "show me everything about FalkorDB replication" returns the matching skill, the recaps that mention it, and the architecture docs. Results are filtered with Cypher CONTAINS first, then ranked by TF-IDF relevance.
Architecture
knowledge index
│
▼
Scans project directories for .md artifacts
├── docs/recaps/, docs/daily-recaps/
├── docs/plans/
├── project memory file
├── docs/architecture/, docs/features/, docs/operations/, docs/pipeline/
├── TECHNICAL-DOCUMENTATION.md, FUNCTIONAL-SPECIFICATIONS.md
└── **/SKILL.md (Hermes skills)
│
▼
Chunks by heading-2 boundaries + paragraphs (max 2000 chars)
│
▼
MERGE into FalkorDB graph as :Chunk nodes
│
▼
knowledge query "concept"
│
▼
Stage 1: Cypher CONTAINS (fast primary filter)
Stage 2: TF-IDF re-ranking
│
▼
Ranked results with project, file, heading, snippet
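The chunking step in the diagram can be sketched as follows. This is a minimal illustration, not the script's code: it assumes sections start at `## ` lines, paragraphs are blank-line separated and packed into a chunk until the 2000-character limit, and a single oversized paragraph is hard-split. The `chunk_markdown` name and the `(heading, body)` tuple shape are hypothetical.

```python
import re

MAX_CHARS = 2000  # per-chunk limit stated in the architecture diagram


def chunk_markdown(text: str, max_chars: int = MAX_CHARS) -> list[tuple[str, str]]:
    """Hypothetical sketch: split markdown into (heading, body) chunks."""
    chunks: list[tuple[str, str]] = []
    # Zero-width split before every line that starts with '## ' (H2 boundary).
    for section in re.split(r"(?m)^(?=## )", text):
        if not section.strip():
            continue
        lines = section.splitlines()
        if lines[0].startswith("## "):
            heading, body = lines[0][3:].strip(), "\n".join(lines[1:])
        else:
            heading, body = "", section  # preamble before the first H2
        current = ""
        for para in (p.strip() for p in re.split(r"\n\s*\n", body)):
            if not para:
                continue
            # Hard-split oversized paragraphs so no chunk exceeds max_chars.
            for i in range(0, len(para), max_chars):
                piece = para[i:i + max_chars]
                candidate = f"{current}\n\n{piece}" if current else piece
                if len(candidate) <= max_chars:
                    current = candidate
                else:
                    chunks.append((heading, current))
                    current = piece
        if current:
            chunks.append((heading, current))
    return chunks
```

Packing paragraphs rather than cutting at fixed offsets keeps each chunk readable as a snippet, while the heading travels with every chunk so results can show which section they came from.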
Setup (one-time)
1. Start the FalkorDB container
docker run -d \
--restart=unless-stopped \
-p 127.0.0.1:16379:6379 \
-v knowledge-graph-data:/data \
--name knowledge-graph \
falkordb/falkordb:latest
🔒 Security: The port is bound to 127.0.0.1 (localhost only), so FalkorDB is not reachable from other machines on your network. No authentication is configured because the service is only accessible to local processes.
📦 Docker image note: falkordb/falkordb:latest is a mutable tag. To pin by digest (recommended for production), replace the tag with the current SHA256 after pulling:
docker pull falkordb/falkordb:latest
DIGEST=$(docker inspect --format='{{index .RepoDigests 0}}' falkordb/falkordb:latest)
echo "Replace :latest with @${DIGEST#*@}"
Then use falkordb/falkordb@sha256:... in the docker run command. Check for updates intentionally rather than relying on automatic pulls.
Auto-starts on Docker daemon start via --restart=unless-stopped. Data persists in the Docker volume.
ℹ️ Restart policy note: The container uses --restart=unless-stopped to survive Docker restarts, but it won't auto-restart after a manual docker stop. To disable automatic restarts entirely, omit --restart and start the container manually when needed. Indexed data still persists in the volume either way.
2. Install the Python dependency
pip install falkordb==1.6.1
3. Verify
python3 ~/.hermes/scripts/project-knowledge-index.py doctor
Should show every configured project root with its document count, plus a healthy FalkorDB connection.
Usage
Index all projects
# First index (~60s) — incremental thereafter (<1s)
python3 ~/.hermes/scripts/project-knowledge-index.py index
# Preview without writing
python3 ~/.hermes/scripts/project-knowledge-index.py index --dry-run
⚠️ Before the first real run: use --dry-run to preview which files would be indexed. This is especially important when configuring PROJECT_ROOTS for the first time; it shows exactly which files the scraper will read before any data is written to FalkorDB.
python3 ~/.hermes/scripts/project-knowledge-index.py index --dry-run
Query by concept
# Basic search
python3 ~/.hermes/scripts/project-knowledge-index.py query "FalkorDB replication"
# Filter by project
python3 ~/.hermes/scripts/project-knowledge-index.py query "batch writes" --project CI
# Filter by document type (recap, plan, skill, claude, architecture)
python3 ~/.hermes/scripts/project-knowledge-index.py query "soft delete" --type skill
Stats and health
python3 ~/.hermes/scripts/project-knowledge-index.py stats
python3 ~/.hermes/scripts/project-knowledge-index.py doctor
Query Algorithm
Two-stage ranking:
- Cypher CONTAINS: finds all chunks whose text contains the search terms. A fast, exact-match primary filter. Supports filtering by --project and --type at this stage.
- TF-IDF re-ranking: computes term frequency × inverse document frequency for each candidate against the query tokens. Scores are relative within the result set (best match = highest score).
CONTAINS alone returns an unordered set of matches; the TF-IDF stage adds relevance ordering without needing LLM embeddings or API calls.
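Both stages can be sketched in plain Python. This is an illustration under stated assumptions, not the script's implementation: chunks are assumed to be `:Chunk` nodes with `text`, `project`, and `type` properties, stage 1 is assumed to require every term, and stage 2 uses a smoothed log-IDF computed over the candidate set only. `build_contains_query`, `tokenize`, and `tfidf_rank` are hypothetical names.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    """Lower-cased word tokens (stage 2 ignores case, unlike CONTAINS)."""
    return [tok.lower() for tok in TOKEN_RE.findall(text)]


def build_contains_query(terms, project=None, doc_type=None):
    """Stage 1 (hypothetical): parameterized Cypher; every term must appear."""
    if not terms:
        raise ValueError("at least one search term is required")
    clauses = [f"c.text CONTAINS $t{i}" for i in range(len(terms))]
    params = {f"t{i}": term for i, term in enumerate(terms)}
    if project:
        clauses.append("c.project = $project")
        params["project"] = project
    if doc_type:
        clauses.append("c.type = $type")
        params["type"] = doc_type
    cypher = "MATCH (c:Chunk) WHERE " + " AND ".join(clauses) + " RETURN c"
    return cypher, params


def tfidf_rank(query: str, candidates: list[str]) -> list[tuple[float, str]]:
    """Stage 2 (hypothetical): score stage-1 candidates, best first."""
    q_tokens = set(tokenize(query))
    docs = [tokenize(text) for text in candidates]
    n = len(docs)
    # Document frequency over the candidate set only (an assumption), which is
    # why scores compare within one query but not across queries.
    df = Counter(tok for doc in docs for tok in set(doc) if tok in q_tokens)
    results = []
    for text, doc in zip(candidates, docs):
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty chunks
        score = 0.0
        for tok in q_tokens:
            tf = counts[tok] / length
            idf = math.log((n + 1) / (df[tok] + 1)) + 1  # smoothed IDF
            score += tf * idf
        results.append((score, text))
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results
```

With the falkordb Python client, the stage-1 pair would be sent as something like `graph.query(cypher, params)`, and the returned chunk texts fed to `tfidf_rank`. The smoothed IDF matters here: if stage 1 demands every term, every candidate contains every term, and an unsmoothed IDF would collapse all scores to zero.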
How to use this in a session
When working on a problem you've seen before:
# How did we handle PostgreSQL migrations?
python3 ~/.hermes/scripts/project-knowledge-index.py query "PostgreSQL migration"
# What patterns exist for batch operations across projects?
python3 ~/.hermes/scripts/project-knowledge-index.py query "batch" --type skill
# Did we ever document Ollama rate limiting?
python3 ~/.hermes/scripts/project-knowledge-index.py query "Ollama rate limit"
# What did we learn about soft deletes?
python3 ~/.hermes/scripts/project-knowledge-index.py query "soft delete" --project MyApp
What's indexed
| Project | Typical docs | Sample content |
|---|---|---|
| Any project with a project memory file | Recaps, plans, memory file | Session recaps, implementation plans, methodology rules |
| Any project with docs/ | Architecture, features, operations, pipeline | Topical docs about every component |
| Any project with TECHNICAL/FUNCTIONAL specs | Full technical specs | Architecture decisions, schema docs |
| Hermes skills (~/.hermes/skills/) | All SKILL.md files | Every installed skill: procedures, tips, workflows |
| Hermes Agent repo | Built-in skills, docs, plans | Stock skills, plugin docs |
Why FalkorDB over SQLite
| Dimension | SQLite (hash) | FalkorDB (graph — chosen) |
|---|---|---|
| Primary filter | Simhash distance (fuzzy) | Cypher CONTAINS (exact match) |
| Cross-project query | Manual JOINs | MATCH (c:Chunk {project:'CI'}) RETURN c |
| Data model flexibility | Fixed SQL schema | Add node types, edges on the fly |
| Infrastructure | File on disk | Single Docker container (localhost-only, 200MB idle) |
| Query speed | Sub-second | Sub-second |
The Docker container is a one-time setup. After that, --restart=unless-stopped brings it back whenever the Docker daemon starts (including at boot, if Docker itself starts on boot), and it uses <200MB RAM when idle.
Configuring Your Projects
Edit PROJECT_ROOTS in ~/.hermes/scripts/project-knowledge-index.py to point at your own project directories:
import os
PROJECT_ROOTS = {
"MyApp": os.path.expanduser("~/Projects/myapp"),
"Website": os.path.expanduser("~/Desktop/website"),
"Skills": os.path.expanduser("~/.hermes/skills"),
"Hermes": os.path.expanduser("~/.hermes/hermes-agent"),
}
The Skills and Hermes entries are optional but recommended: they index every Hermes skill you have (both installed and stock), so you can search across all known patterns.
Pro tip: Add your projects in priority order. The find_documents() function respects your glob patterns — it specifically scopes to docs/recaps/*.md, docs/plans/*.md, project memory files, docs/architecture/*.md, docs/features/*.md, SKILL.md, etc. It does NOT crawl node_modules/.
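The targeted-glob approach can be sketched like this. The pattern list mirrors the paths named in this document; the function body, `DOC_PATTERNS`, and `SKIP_DIRS` are assumptions, not the actual `find_documents()` source. Project memory files are left out because their filename varies per project.

```python
import os
from pathlib import Path

# Targeted patterns from this document, globbed relative to each project root.
DOC_PATTERNS = [
    "docs/recaps/*.md",
    "docs/daily-recaps/*.md",
    "docs/plans/*.md",
    "docs/architecture/*.md",
    "docs/features/*.md",
    "docs/operations/*.md",
    "docs/pipeline/*.md",
    "TECHNICAL-DOCUMENTATION.md",
    "FUNCTIONAL-SPECIFICATIONS.md",
]
SKIP_DIRS = {"node_modules", ".git"}  # assumption: directories never worth walking


def find_documents(root: str) -> list[Path]:
    """Hypothetical sketch: collect indexable .md files under one project root."""
    base = Path(root)
    found: set[Path] = set()
    for pattern in DOC_PATTERNS:
        found.update(p for p in base.glob(pattern) if p.is_file())
    # **/SKILL.md can sit at any depth, so walk with pruning (and without
    # following symlinks) instead of Path.rglob, which would descend into
    # node_modules.
    for dirpath, dirnames, filenames in os.walk(base, followlinks=False):
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        if "SKILL.md" in filenames:
            found.add(Path(dirpath) / "SKILL.md")
    return sorted(found)
```

Assigning to `dirnames[:]` in place is what tells `os.walk` not to descend into the pruned directories.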
Pitfalls
1. CONTAINS is case-sensitive
"FalkorDB" matches, "falkordb" does not. Use the casing you expect in documents. Document text preserves original casing from markdown files.
2. Short queries produce noisy results
"DB" or "pipeline" match too broadly. Mitigation: add --project or --type filters, or use more specific terms (e.g., "FalkorDB replication" not just "FalkorDB").
3. TF-IDF scores are relative within result set
A top score of 0.06 on a 10-result query doesn't mean a "6% match"; it only says that result ranked highest among that query's candidates. Use scores to rank within a single query, not to compare across queries.
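If a friendlier display scale helps, scores can be divided by the top score so the best match reads 1.0 and the rest read as fractions of it. This is a hypothetical post-processing step (`normalize_scores` is not part of the script), and it does not make scores comparable across queries: a 1.0 in one query says nothing about another.

```python
def normalize_scores(results: list[tuple[float, str]]) -> list[tuple[float, str]]:
    """Rescale (score, text) pairs so the best match reads 1.0 (hypothetical)."""
    if not results:
        return []
    top = max(score for score, _ in results)
    if top <= 0:
        return [(0.0, text) for _, text in results]  # nothing matched meaningfully
    return [(score / top, text) for score, text in results]
```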
4. FalkorDB container must be running
docker start knowledge-graph # if stopped
5. Massive node_modules directories hang rglob-based scans
The initial version used Path.rglob("*.md") to count files in doctor. Projects with large node_modules directories (Beacon-v2 had 775+ entries in one subdirectory with symlink loops) caused 30s+ hangs. Fix: never rglob a project root. Use targeted root.glob(pattern) with specific paths like docs/recaps/*.md, docs/plans/*.md. The find_documents function already does this correctly — the original doctor function was the only offender.
6. First index is slow (~60s)
Subsequent runs are <1s (only new/changed files are re-indexed). The indexer uses content hash + mtime for change detection.
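The hash-plus-mtime check can be sketched as a cheap `os.stat` comparison first, with a SHA-256 of the content computed only when the mtime has moved, so touching a file without editing it doesn't trigger re-indexing. The `needs_reindex` name and the `path -> (mtime, sha256)` state dict are assumptions; the document only says content hash and mtime are used.

```python
import hashlib
import os


def file_sha256(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def needs_reindex(path: str, state: dict) -> bool:
    """Hypothetical: state maps path -> (mtime, sha256) from the last run.

    Updates state in place whenever the file had to be hashed.
    """
    mtime = os.stat(path).st_mtime
    previous = state.get(path)
    if previous is not None and previous[0] == mtime:
        return False  # fast path: unchanged mtime, skip hashing entirely
    digest = file_sha256(path)
    state[path] = (mtime, digest)
    if previous is not None and previous[1] == digest:
        return False  # touched, but content is identical
    return True  # new file or changed content
```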
7. The skill indexes itself
Since the project-knowledge-graph SKILL.md is in ~/.hermes/skills/, it gets indexed. Query results may show matches from this skill's documentation — usually harmless.
Data Retention & Purge
Indexed content persists in the Docker volume until explicitly removed. This means cross-project knowledge is available across sessions without re-indexing, but stale or sensitive content remains searchable until purged.
Important: the indexer uses MERGE (upsert), not sync. When a source file is modified, its chunks are re-indexed on the next knowledge index run. However, if a file is deleted or a chunk disappears from a file, the old chunks remain in FalkorDB until you purge them (with the delete command or by removing the volume). The indexer does not track deletions; it only adds and updates.
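A hypothetical MERGE-based upsert shows why deletions go unnoticed. It keys each chunk on a deterministic id derived from project, path, and chunk position (an assumption; the script's actual key isn't documented). Re-running it updates matching nodes in place, but nothing in it removes a chunk whose source vanished. `stale_paths` sketches how you could detect leftovers by comparing indexed paths against files on disk; all names here are illustrative.

```python
import hashlib

# Hypothetical upsert: match on a stable id, then overwrite the properties.
MERGE_CHUNK = (
    "MERGE (c:Chunk {id: $id}) "
    "SET c.project = $project, c.path = $path, c.heading = $heading, "
    "c.text = $text, c.type = $type"
)


def chunk_id(project: str, path: str, position: int) -> str:
    """Deterministic key so re-indexing the same chunk hits the same node."""
    raw = f"{project}\x00{path}\x00{position}".encode()
    return hashlib.sha256(raw).hexdigest()


def merge_params(project, path, position, heading, text, doc_type) -> dict:
    return {
        "id": chunk_id(project, path, position),
        "project": project,
        "path": path,
        "heading": heading,
        "text": text,
        "type": doc_type,
    }


def stale_paths(indexed_paths, existing_paths) -> list[str]:
    """Paths still present in the graph whose source file no longer exists."""
    return sorted(set(indexed_paths) - set(existing_paths))

# With the falkordb client this would be sent as, e.g.:
#   graph.query(MERGE_CHUNK, merge_params("CI", "docs/plans/p.md", 0, ...))
```

Because the id includes the chunk position, a file that shrinks from five chunks to three leaves positions 3 and 4 behind, which is the "chunk removed" case described above.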
If you need to guarantee no stale content remains after removing source files:
# Full purge and rebuild
docker stop knowledge-graph && docker rm knowledge-graph && docker volume rm knowledge-graph-data
docker run -d --restart=unless-stopped -p 127.0.0.1:16379:6379 -v knowledge-graph-data:/data --name knowledge-graph falkordb/falkordb:latest
python3 ~/.hermes/scripts/project-knowledge-index.py index
Or use the delete command for targeted removal without stopping the container:
# Delete all chunks for a specific project
python3 ~/.hermes/scripts/project-knowledge-index.py delete --project CI
# Delete all chunks of a specific type across all projects
python3 ~/.hermes/scripts/project-knowledge-index.py delete --type recap
# Delete all chunks (rebuild from scratch)
python3 ~/.hermes/scripts/project-knowledge-index.py delete --all
python3 ~/.hermes/scripts/project-knowledge-index.py index
Stop the service (keep data)
docker stop knowledge-graph
To re-enable: docker start knowledge-graph
Clear all indexed data (stop + purge)
docker stop knowledge-graph && docker rm knowledge-graph && docker volume rm knowledge-graph-data
After purging, re-create the container with the docker run command from Setup, then re-run knowledge index (the first run takes ~60s to rebuild the index).
Selective re-indexing (update specific projects)
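The indexer uses content hashing, so unchanged documents are skipped automatically. To force a full re-index of all projects (e.g. after changing PROJECT_ROOTS), use Option A; Option B only picks up new or changed files and leaves chunks from removed roots in the graph.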
# Option A: Clear and re-index
docker stop knowledge-graph && docker rm knowledge-graph && docker volume rm knowledge-graph-data
docker run -d --restart=unless-stopped -p 127.0.0.1:16379:6379 -v knowledge-graph-data:/data --name knowledge-graph falkordb/falkordb:latest
python3 ~/.hermes/scripts/project-knowledge-index.py index
# Option B: Just re-index (updates changed files, retains unchanged)
python3 ~/.hermes/scripts/project-knowledge-index.py index
CLI Reference
usage: knowledge index|query|stats|doctor|delete
Commands:
index Scan and index all project documents
--dry-run Preview without indexing
query Search the knowledge graph by concept
terms Search terms (required)
--project, -p Filter by project name
--type, -t Filter by doc type (recap/plan/skill/claude/architecture)
--limit, -l Max results (default: 10)
stats Show corpus statistics (chunks per project/type)
doctor Check environment and connectivity
delete Delete chunks from the graph
--project, -p Delete all chunks for a project
--type, -t Delete all chunks of a type
--all Delete ALL chunks (equivalent to docker volume purge)
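The command surface above maps naturally onto `argparse` subparsers. This sketch mirrors only the commands and flags listed in the reference; it is not the script's actual source, and `build_parser` and `DOC_TYPES` are hypothetical names.

```python
import argparse

DOC_TYPES = ("recap", "plan", "skill", "claude", "architecture")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the CLI reference."""
    parser = argparse.ArgumentParser(prog="knowledge")
    sub = parser.add_subparsers(dest="command", required=True)

    index = sub.add_parser("index", help="Scan and index all project documents")
    index.add_argument("--dry-run", action="store_true", help="Preview without indexing")

    query = sub.add_parser("query", help="Search the knowledge graph by concept")
    query.add_argument("terms", nargs="+", help="Search terms")
    query.add_argument("--project", "-p", help="Filter by project name")
    query.add_argument("--type", "-t", choices=DOC_TYPES, help="Filter by doc type")
    query.add_argument("--limit", "-l", type=int, default=10, help="Max results")

    sub.add_parser("stats", help="Show corpus statistics")
    sub.add_parser("doctor", help="Check environment and connectivity")

    delete = sub.add_parser("delete", help="Delete chunks from the graph")
    delete.add_argument("--project", "-p", help="Delete all chunks for a project")
    delete.add_argument("--type", "-t", choices=DOC_TYPES, help="Delete all chunks of a type")
    delete.add_argument("--all", action="store_true", help="Delete ALL chunks")
    return parser
```

Usage: `build_parser().parse_args(["query", "soft delete", "--type", "skill"])` yields a namespace with `command == "query"` and `terms == ["soft delete"]`.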
Indexed Document Types
| Type | What's included |
|---|---|
| recap | docs/recaps/*.md, docs/daily-recaps/*.md |
| plan | docs/plans/*.md |
| claude | Project memory files (e.g. PROJECT.md) |
| architecture | docs/architecture/*.md, docs/features/*.md, docs/operations/*.md, docs/pipeline/*.md, TECHNICAL-DOCUMENTATION.md, FUNCTIONAL-SPECIFICATIONS.md |
| skill | **/SKILL.md anywhere in project or ~/.hermes/skills/ |
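The table can be read as a small path classifier. This is a hypothetical sketch built from the rows above: it takes root-relative paths, assumes the two spec files live at the project root, and uses `PROJECT.md` only because the table gives it as an example memory-file name; real projects may use something else.

```python
from pathlib import PurePosixPath
from typing import Optional

RECAP_DIRS = {"docs/recaps", "docs/daily-recaps"}
ARCH_DIRS = {"docs/architecture", "docs/features", "docs/operations", "docs/pipeline"}
ARCH_FILES = {"TECHNICAL-DOCUMENTATION.md", "FUNCTIONAL-SPECIFICATIONS.md"}
MEMORY_FILES = {"PROJECT.md"}  # example name from the table; adjust per project


def classify(rel_path: str) -> Optional[str]:
    """Map a root-relative POSIX path to its indexed doc type, or None."""
    path = PurePosixPath(rel_path)
    if path.suffix != ".md":
        return None
    parent = path.parent.as_posix()  # "." for files at the project root
    if path.name == "SKILL.md":
        return "skill"
    if parent in RECAP_DIRS:
        return "recap"
    if parent == "docs/plans":
        return "plan"
    if parent in ARCH_DIRS or (parent == "." and path.name in ARCH_FILES):
        return "architecture"
    if parent == "." and path.name in MEMORY_FILES:
        return "claude"
    return None
```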
Auto-projection (via session-wrapup)
New knowledge is projected into the graph automatically at session end. The session-wrapup skill runs knowledge index as one of its steps, which picks up:
- New recaps — every session recap gets indexed automatically
- Updated plans — plan status changes and criteria updates get indexed
- Modified project memory files — any "Today's state" updates get indexed
- New/changed skills — any skill edited or created gets indexed
Additionally, wrapup queries the graph with key terms from the session's work and surfaces cross-project connections as "Did you know?" findings in the wrap-up report.
No manual steps needed. The knowledge graph stays current as a side effect of the normal session lifecycle.
Verification
- knowledge doctor shows all project roots and a healthy FalkorDB connection
- knowledge index completes and reports the chunk count
- knowledge query "some concept" returns ranked results with project/file/snippet
- knowledge query "some concept" --project CI --type plan filters correctly
- Incremental: running knowledge index twice skips unchanged files
- Container auto-starts on Docker daemon restart
- Cross-project: query returns results from multiple projects