Notcrawl
Local-first Notion crawler. Pulls pages, databases, and blocks into a SQLite store and emits normalized Markdown alongside, so PaperBrain can absorb Notion content into the vault graph.
Requirements
- Notion internal integration token. Create at https://www.notion.so/profile/integrations → New integration → copy the secret (secret_… or ntn_…).
- Share each Notion page or database with the integration (Notion's … menu → Connections → add your integration). Notcrawl can only see what the integration is invited to.
- notcrawl binary on PATH (installed at ~/.local/bin/notcrawl).
Setup
export NOTION_API_KEY="ntn_…"
notcrawl init # create ~/.notcrawl/config.toml + db
notcrawl sync --full # initial pull of all shared content
notcrawl export-md --out ~/.notcrawl/md # dump normalized Markdown
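Because the integration secret starts with one of two known prefixes (secret_ or ntn_), a cheap sanity check can catch an unset or mistyped token before the first sync. The following is a minimal sketch, assuming the token lives in NOTION_API_KEY as in the commands above; check_notion_token is a hypothetical helper and not part of notcrawl.

```shell
# Hypothetical pre-flight helper; not part of notcrawl itself.
# Returns 0 if the argument starts with a known Notion secret prefix
# followed by at least one more character, 1 otherwise.
check_notion_token() {
  case "$1" in
    secret_?*|ntn_?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Report on the current environment without printing the secret.
if check_notion_token "${NOTION_API_KEY:-}"; then
  echo "NOTION_API_KEY looks plausible"
else
  echo "NOTION_API_KEY is missing or has an unexpected prefix" >&2
fi
```

This only checks the shape of the value. Whether the token is valid and which pages it can see is still decided by Notion when the sync runs.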
State
- Config: ~/.notcrawl/config.toml
- Database: ~/.notcrawl/notcrawl.db
- Markdown export: ~/.notcrawl/md/ (configurable)
Common Commands
notcrawl status --json
notcrawl sync --incremental
notcrawl pages list --json
notcrawl search "OKR" --json
notcrawl export-md --out <dir> # regenerate Markdown
notcrawl sql 'SELECT count(*) FROM pages'
Integration Notes
- Markdown export is the bridge to PaperVault: point --out at a vault folder (e.g. KNOWLEDGE/notion/) to fold Notion content into the graph.
- Schedule notcrawl sync --incremental and notcrawl export-md via PaperFang for hands-free mirroring.
- Diff-friendly: Markdown output is deterministic, so changes show up cleanly in git.
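The sync-then-export pairing can be wrapped in one function for a scheduler to call. This is a sketch under stated assumptions: mirror_notion and the NOTCRAWL override variable are hypothetical names introduced here, and how PaperFang invokes jobs is not covered. It uses only the two notcrawl commands listed in this document.

```shell
# Hypothetical wrapper for a scheduled job; not part of notcrawl.
# NOTCRAWL lets the binary path be overridden (an assumption for illustration).
NOTCRAWL="${NOTCRAWL:-notcrawl}"

# Run an incremental sync, then regenerate Markdown into the given folder.
# The steps are chained with &&, so the export is skipped when the sync
# fails and the function returns that failure to the caller.
mirror_notion() {
  out_dir="$1"
  mkdir -p "$out_dir" &&
    "$NOTCRAWL" sync --incremental &&
    "$NOTCRAWL" export-md --out "$out_dir"
}
```

A job would then call something like mirror_notion "$HOME/vault/KNOWLEDGE/notion", where the vault location is a placeholder. Because the export is skipped after a failed sync, a failed run does not rewrite the Markdown folder.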