duru-obsidian-kb

Build and maintain a personal Obsidian-based knowledge base from articles, papers, repositories, datasets, spreadsheets, and local files. Use when the user wants to collect source material into a local markdown wiki, incrementally compile notes into structured knowledge pages, ask questions against the knowledge base, export research outputs such as markdown briefs or Marp slides, or run health checks for consistency, coverage, and broken links.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "duru-obsidian-kb" with this command: npx skills add durugy/duru-obsidian-kb

Duru Obsidian KB

Overview

Use this skill to operate a local knowledge-base workflow inspired by "raw → compiled wiki → outputs". Keep the system markdown-first, Obsidian-friendly, incremental, and auditable. Prefer producing files in the knowledge base over chat-only answers when the user is doing research or building long-lived notes.

Current scope

Implement and use the current workflow:

  1. Ingest source material into raw/
  2. Normalize metadata into a manifest
  3. Extract web article text when available
  4. Download and ingest papers, including arXiv URLs and PDF-backed sources, when possible
  5. Route incoming content to the most appropriate configured KB repository using rules-first routing with a default fallback, and optionally a local model hook for low-confidence cases
  6. Clone and summarize repositories when requested or when the source is a GitHub repo
  7. Build or refresh source pages, concept pages, backlinks, and indexes in wiki/
  8. Generate output documents in outputs/
  9. Run lightweight lint checks over structure and metadata

Treat the system as an incremental scaffold with growing intelligence. Do not claim high-quality extraction or synthesis when the data is incomplete.

Directory layout

Use this layout inside the chosen knowledge-base root:

<kb-root>/
  raw/
    articles/
    papers/
    repos/
    files/
  assets/
  wiki/
    concepts/
    sources/
    indexes/
    _meta/
  outputs/
  logs/
  manifest.json
  config.json

If the user does not specify a KB root, default to a folder inside the workspace such as knowledge-bases/<name>. Keep generated files deterministic and easy to diff.

Workflow

1. Initialize a knowledge base

When starting a new KB:

  • Create the standard directory tree
  • Create config.json if missing
  • Create manifest.json if missing
  • Create starter index files in wiki/indexes/
  • Keep all paths relative to the KB root where practical

Use scripts/kb_init.py for initialization.

Example:

python3 scripts/kb_init.py --root /path/to/kb

2. Ingest content

When the user shares a URL or local file:

  • Detect whether it is a web article, PDF, repo URL, or local file
  • Copy or record the source into the correct raw/ subdirectory
  • Generate a slug
  • Append an entry to manifest.json
  • Preserve the original source URL/path in metadata
  • Prefer storing content as markdown when already available; otherwise store a stub entry with metadata and acquisition details

Prefer scripts/kb_add.py as the unified entrypoint when multiple KB repositories are configured. It performs route → ingest → build → summarize. Use scripts/kb_route.py separately when you want a route-only decision or explanation before ingestion.

Example:

python3 scripts/kb_ingest.py --root /path/to/kb --source "https://example.com/article"
python3 scripts/kb_ingest.py --root /path/to/kb --source "/path/to/local.pdf" --type paper

If the source is a normal web article, attempt deterministic extraction first and store the extracted markdown in raw/articles/. Run prompt-shield-lite over extracted text before trusting it. Use prompt-shield-lite as a security/injection scan, then use local heuristics for segment-level noise detection so the pipeline does not over-trigger on rate limits. If extraction fails, is partial, or is flagged as suspicious, record that clearly and preserve suspicious segments for review instead of hallucinating content. If the source is an arXiv abstract URL, normalize it to the PDF URL, try to pull title/abstract metadata from the abs page, download the PDF, and extract a text preview with the available local tools. If the source is a direct PDF or local PDF file, route it through the KB paper path and record both the preferred processor (vendor-anthropic/pdf) and the current fallback used locally. If the source is a local spreadsheet (.xlsx, .xlsm, .csv, .tsv), route it through the KB spreadsheet path and record both the preferred processor (vendor-anthropic/xlsx) and the current fallback used locally. If the source is a GitHub repo, clone it into raw/repos/<slug>/repo/ when possible and generate a repository summary stub.

3. Build the wiki

When the user asks to build or refresh the KB:

  • Read manifest.json
  • Create or refresh wiki/sources/<slug>.md pages from manifest entries
  • Create or refresh wiki/concepts/<tag>.md pages from tags
  • Add backlinks from concept pages to source pages and from source pages to concept pages
  • Refresh wiki/indexes/sources.md
  • Refresh wiki/indexes/tags.md
  • Refresh wiki/indexes/timeline.md
  • Refresh wiki/indexes/concepts.md
  • Keep generation incremental and idempotent

Use scripts/kb_build.py.

The build step should summarize known metadata, link raw items, and create deterministic concept pages from existing tags. Do not fabricate deep conceptual synthesis. If a source lacks extracted body text, say so explicitly in the generated page.

4. Ask questions and generate outputs

When the user asks a question against the KB:

  • Prefer creating a markdown deliverable in outputs/
  • Base the answer on manifest entries and existing wiki pages
  • Cite source pages by relative path when possible
  • Distinguish confirmed facts from hypotheses or gaps

Use scripts/kb_ask.py to generate a structured prompt/output scaffold for the agent.

In Phase 1, this script prepares a research brief shell from the current KB state. The agent may then refine it.

5. Lint the KB

When the user wants a health check:

  • Validate required directories and files
  • Detect manifest entries with missing fields
  • Detect source pages missing from wiki/sources/
  • Detect wiki pages whose manifest entry no longer exists
  • Detect duplicate slugs

Use scripts/kb_lint.py.

Output rules

Prefer these output forms:

  • Research brief: outputs/YYYY-MM-DD-topic.md
  • Marp deck draft: outputs/YYYY-MM-DD-topic.marp.md
  • Topic memo: wiki/concepts/<topic>.md only when the user wants the result filed back into the KB

Keep frontmatter lightweight. Recommended fields:

---
title: ...
slug: ...
source_type: article|paper|repo|file
source_url: ...
ingested_at: ...
tags: []
status: raw|stub|indexed|reviewed
---

Safety and quality

Never silently invent extracted content. If retrieval failed, store a stub and mark it clearly. If prompt-shield-lite flags extracted text or the cleaner detects suspicious/noisy segments, mark the source as suspicious and keep the flagged snippets visible for review. Prefer small deterministic updates over large rewrites. When improving generated wiki pages, preserve provenance and links back to raw sources. Keep the KB usable in Obsidian without requiring proprietary tooling.

Resources

scripts/

Use these scripts as the current backbone:

  • kb_init.py — create folder structure and starter files
  • kb_add.py — unified add flow: route to the best KB repo, ingest content, then optionally build and summarize
  • kb_route.py — route content to the best KB repo using rules-first scoring with default fallback and optional local-model hook
  • kb_ingest.py — register URLs or local files into a chosen KB root, attempt article extraction, ingest arXiv/PDF papers, ingest spreadsheets, and support repo ingest
  • kb_build.py — generate source pages, concept pages, backlinks, and indexes from the manifest
  • kb_summarize_concepts.py — generate first-pass topic memo scaffolds for concept pages
  • kb_ask.py — generate output scaffolds for research questions
  • kb_search.py — run lightweight local relevance search over manifest/wiki sources and return scored snippets
  • kb_chart.py — generate chart artifacts (png + markdown note) from CSV/XLSX data into outputs/ and optionally file back to concept pages
  • kb_lint.py — run structural checks and emit a report
  • kb_healthcheck.py — run lint checks across configured KB repositories from repos.json
  • kb_smoke.py — run end-to-end smoke tests (init/ingest/build/search/ask/lint/chart) in a temporary KB

references/

Read these references when refining the skill:

  • references/layout.md — canonical folder layout and file contracts
  • references/phase-plan.md — roadmap from MVP to richer wiki compilation
  • workspace knowledge-bases/config/repos.json — multi-repo routing configuration with default fallback and optional local-model hook

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Research

Gougoubi Arena Trade

Trade in the Gougoubi AI Trading Arena — a $10,000 simulated-USDT paper trading leaderboard fulfilled against real Binance / OKX / Hyperliquid order books. A...

Registry SourceRecently Updated
Research

Thinkdeep

Structured reasoning protocol for Claude — forces step-by-step analysis, self-critique, and confidence scoring before answering. Reduces wrong answers and ha...

Registry SourceRecently Updated
Research

股票实时行情分析器

A股/港股实时行情查询、基本面分析、深度报告生成与邮件发送一体化工具。触发场景:(1) 用户询问股票价格、市值、PE/PB等数据;(2) 用户要求分析某只或多只股票;(3) 用户要求生成股票分析报告;(4) 用户要求通过邮件发送股票报告。支持AkShare实时行情、聚宽基本面数据、QQ邮箱/Gmail发送。

Registry SourceRecently Updated
260Profile unavailable
Research

Keep 健康记录

Use when users are stating or logging their own health data to Keep or Keep App rather than asking for advice, analysis, or general chat, including weight, b...

Registry SourceRecently Updated
300Profile unavailable