Research Library Skill

A local-first multimedia research library for capturing, organizing, and searching hardware project knowledge.

What It Does

Store documents — Code, PDFs, CAD files, images, schematics
Extract automatically — Text from PDFs, EXIF from images, functions from code
Search intelligently — Full-text with material-type weighting (your work ranks higher than external research)
Project isolation — Arduino separate from CNC; no contamination
Cross-reference — Link knowledge: "this servo tuning applies to that project"
Async extraction — Searches never block while OCR runs
Backup daily — 30-day rolling snapshots

Installation

clawhub install research-library
# OR
pip install /path/to/research-library

Quick Start

# Initialize database
reslib status

# Add a project
reslib add ~/projects/arduino/servo.py --project arduino --material-type reference

# Search
reslib search "servo tuning"

# Link knowledge
reslib link 5 12 --type applies_to

Features

CLI Commands

reslib add — Import documents (auto-detect + extract)
reslib search — Full-text search with filters
reslib get — View document details
reslib archive / reslib unarchive — Manage documents
reslib export — Export as JSON/Markdown
reslib link — Create document relationships
reslib projects — Manage projects
reslib tags — Manage tags
reslib status — System overview
reslib backup / reslib restore — Snapshots
reslib smoke_test.sh — Quick validation

Technical

Storage: SQLite 3.45+ with FTS5 virtual table
Extraction: PDF (pdfplumber + OCR), images (EXIF + OCR), code (AST + regex)
Confidence Scoring: 0.0-1.0 based on quality + source
Material Weighting: Reference (1.0) vs Research (0.5)
Project Isolation: Scoped searches, no contamination
Async Workers: 2-4 configurable extraction workers
Catalog Separation: real_world vs openclaw projects
Backup: Daily snapshots, 30-day retention

Configuration

Copy reslib/config.json and customize:

{
  "db_path": "~/.openclaw/research/library.db",
  "num_workers": 2,
  "worker_timeout_sec": 300,
  "max_retries": 3,
  "backup_retention_days": 30,
  "backup_dir": "~/.openclaw/research/backups",
  "file_size_limit_mb": 200,
  "project_size_limit_gb": 2
}

Integration with War Room

Use RL1 protocol in war room DNA:

from reslib import ResearchDatabase, ResearchSearch

db = ResearchDatabase()
search = ResearchSearch(db)

# Before researching, check existing knowledge
prior = search.search("servo tuning", project="rc-quadcopter")
if prior:
    print(f"Found {len(prior)} prior items")
else:
    # New research needed...
    db.add_research(title="...", content="...", ...)

Performance

All targets exceeded:

Operation	Target	Actual
PDF extraction	<100ms	20.6ms
Search (50 docs)	<100ms	0.33ms
Worker throughput	>6/sec	414.69/sec

Testing

# Run all tests
pytest tests/

# Quick smoke test
bash reslib/smoke_test.sh

# Performance tests
pytest tests/test_integration.py -v -k stress

Known Limitations (Phase 2)

OCR quality varies on hand-drawn sketches
FTS5 designed for <10K documents (PostgreSQL path for scale)
No automatic web research gathering (manual only)
Vector embeddings ready but inactive
CAD file parsing is metadata-only

Documentation

See /docs/:

CLI-REFERENCE.md — All commands + examples
EXTRACTION-GUIDE.md — How extraction works
SEARCH-GUIDE.md — Ranking + weighting
WORKER-GUIDE.md — Async queue details
INTEGRATION.md — War room RL1 protocol

Phase 2 Roadmap

Real-world PDF calibration
FTS5 scaling tests (10K docs)
Auto-detection (reference vs research)
Web research enrichment
Vector embeddings (semantic search)
PostgreSQL upgrade path

Building From Source

cd research-library
pip install -e .
pytest tests/
python -m reslib status

Support

Issues? See TECHNICAL-NOTES.md for troubleshooting.

Production-ready MVP. 214 tests passing. 15K lines. Ready to use.

research-library

Safety Notice

Copy this and send it to your AI assistant to learn