Research Library Skill
A local-first multimedia research library for capturing, organizing, and searching hardware project knowledge.
What It Does
- Store documents — Code, PDFs, CAD files, images, schematics
- Extract automatically — Text from PDFs, EXIF from images, functions from code
- Search intelligently — Full-text with material-type weighting (your work ranks higher than external research)
- Project isolation — Arduino separate from CNC; no contamination
- Cross-reference — Link knowledge: "this servo tuning applies to that project"
- Async extraction — Searches never block while OCR runs
- Backup daily — 30-day rolling snapshots
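The daily backup with 30-day rolling retention can be pictured roughly as follows. This is a minimal sketch, not reslib's actual backup code: the function name snapshot, the library-YYYYMMDD.db naming scheme, and pruning by file modification time are all assumptions made for illustration.

```python
import sqlite3
import time
from pathlib import Path


def snapshot(db_path, backup_dir, retention_days=30, now=None):
    """Copy the live database to a dated file, then prune old snapshots.

    Hypothetical helper: the file naming and pruning rule are assumptions.
    """
    now = time.time() if now is None else now
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    stamp = time.strftime("%Y%m%d", time.gmtime(now))
    target = backup_dir / f"library-{stamp}.db"

    # sqlite3's online backup API copies a consistent image even while
    # other connections are using the source database.
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(target)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()

    # Rolling retention: delete snapshots older than the cutoff.
    cutoff = now - retention_days * 86400
    for old in backup_dir.glob("library-*.db"):
        if old != target and old.stat().st_mtime < cutoff:
            old.unlink()
    return target
```

Running this once a day (for example from cron) would keep roughly 30 snapshots on disk at any time.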
Installation
clawhub install research-library
# OR
pip install /path/to/research-library
Quick Start
# Initialize database
reslib status
# Add a document to a project
reslib add ~/projects/arduino/servo.py --project arduino --material-type reference
# Search
reslib search "servo tuning"
# Link knowledge
reslib link 5 12 --type applies_to
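The link command above records a typed relationship between two document IDs. How reslib stores that relationship is not described here; one plausible shape is a small link table, sketched below. The table name links, its columns, and the helper functions are hypothetical.

```python
import sqlite3


def open_links_db():
    """Create an in-memory database with a hypothetical links table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE links (
            source_id INTEGER NOT NULL,
            target_id INTEGER NOT NULL,
            link_type TEXT NOT NULL,
            PRIMARY KEY (source_id, target_id, link_type)
        )
        """
    )
    return conn


def add_link(conn, source_id, target_id, link_type):
    # INSERT OR IGNORE makes repeated linking of the same pair a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO links VALUES (?, ?, ?)",
        (source_id, target_id, link_type),
    )


def related(conn, doc_id):
    """Return (other_id, link_type, direction) for every link touching doc_id."""
    return conn.execute(
        "SELECT target_id, link_type, 'out' FROM links WHERE source_id = ? "
        "UNION ALL "
        "SELECT source_id, link_type, 'in' FROM links WHERE target_id = ? "
        "ORDER BY 1",
        (doc_id, doc_id),
    ).fetchall()
```

Storing the direction lets a lookup from either document find the other, which is what "this servo tuning applies to that project" needs.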
Features
CLI Commands
- reslib add: Import documents (auto-detect + extract)
- reslib search: Full-text search with filters
- reslib get: View document details
- reslib archive / reslib unarchive: Manage documents
- reslib export: Export as JSON/Markdown
- reslib link: Create document relationships
- reslib projects: Manage projects
- reslib tags: Manage tags
- reslib status: System overview
- reslib backup / reslib restore: Snapshots
- reslib/smoke_test.sh: Quick validation (run with bash)
Technical
- Storage: SQLite 3.45+ with FTS5 virtual table
- Extraction: PDF (pdfplumber + OCR), images (EXIF + OCR), code (AST + regex)
- Confidence Scoring: 0.0-1.0 based on quality + source
- Material Weighting: Reference (1.0) vs Research (0.5)
- Project Isolation: Scoped searches, no contamination
- Async Workers: 2-4 configurable extraction workers
- Catalog Separation: real_world vs openclaw projects
- Backup: Daily snapshots, 30-day retention
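To make the storage, weighting, and isolation points concrete, here is a minimal sketch using Python's built-in sqlite3 module (it requires an SQLite build with FTS5 enabled). The table name docs, its columns, and the choice to multiply the FTS5 bm25 score by the material weight are assumptions; the source only states the weights (1.0 and 0.5) and that searches are scoped per project.

```python
import sqlite3

# Weights taken from the list above; the lookup table itself is illustrative.
MATERIAL_WEIGHTS = {"reference": 1.0, "research": 0.5}


def build_index(rows):
    """Create an in-memory FTS5 index from (title, body, material_type, project)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE docs USING fts5("
        "title, body, material_type UNINDEXED, project UNINDEXED)"
    )
    conn.executemany("INSERT INTO docs VALUES (?, ?, ?, ?)", rows)
    return conn


def weighted_search(conn, query, project):
    """Full-text search scoped to one project, re-ranked by material type."""
    hits = conn.execute(
        "SELECT title, material_type, bm25(docs) FROM docs "
        "WHERE docs MATCH ? AND project = ?",
        (query, project),
    ).fetchall()
    # bm25() returns negative numbers where more negative means a better
    # match, so scaling by the weight and sorting ascending favors
    # reference material over research material.
    ranked = sorted(hits, key=lambda h: h[2] * MATERIAL_WEIGHTS.get(h[1], 0.5))
    return [(title, mtype) for title, mtype, _ in ranked]
```

With two equally relevant documents in the same project, the one marked reference sorts first, and documents from other projects never appear because the project filter is part of the query.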
Configuration
Copy reslib/config.json and customize:
{
"db_path": "~/.openclaw/research/library.db",
"num_workers": 2,
"worker_timeout_sec": 300,
"max_retries": 3,
"backup_retention_days": 30,
"backup_dir": "~/.openclaw/research/backups",
"file_size_limit_mb": 200,
"project_size_limit_gb": 2
}
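A customized file may list only the keys that differ from the defaults. The sketch below shows one way to merge such a file with the values above and validate it. load_config is a hypothetical helper, not part of the reslib API, and whether reslib itself expands ~ or rejects unknown keys is an assumption.

```python
import json
from pathlib import Path

# Defaults copied from the sample config above.
DEFAULTS = {
    "db_path": "~/.openclaw/research/library.db",
    "num_workers": 2,
    "worker_timeout_sec": 300,
    "max_retries": 3,
    "backup_retention_days": 30,
    "backup_dir": "~/.openclaw/research/backups",
    "file_size_limit_mb": 200,
    "project_size_limit_gb": 2,
}


def load_config(path):
    """Merge a user config file over DEFAULTS and validate it (hypothetical)."""
    cfg = dict(DEFAULTS)
    cfg.update(json.loads(Path(path).read_text()))

    # Catch misspelled keys early instead of silently ignoring them.
    unknown = set(cfg) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")

    # The Technical section describes 2-4 extraction workers.
    if not 2 <= cfg["num_workers"] <= 4:
        raise ValueError("num_workers must be between 2 and 4")

    # Expand "~" so the paths can be passed straight to sqlite3 or open().
    for key in ("db_path", "backup_dir"):
        cfg[key] = str(Path(cfg[key]).expanduser())
    return cfg
```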
Integration with War Room
Use the RL1 protocol in war room DNA:
from reslib import ResearchDatabase, ResearchSearch
db = ResearchDatabase()
search = ResearchSearch(db)
# Before researching, check existing knowledge
prior = search.search("servo tuning", project="rc-quadcopter")
if prior:
    print(f"Found {len(prior)} prior items")
else:
    # New research needed...
    db.add_research(title="...", content="...")  # plus any further fields
Performance
All targets exceeded:
| Operation | Target | Actual |
|---|---|---|
| PDF extraction | <100ms | 20.6ms |
| Search (50 docs) | <100ms | 0.33ms |
| Worker throughput | >6/sec | 414.69/sec |
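The figures above come from the project's own test runs. For readers who want to take a comparable throughput measurement on their hardware, the sketch below times a pool of worker threads. It is not the project's benchmark: fake_extract is a hypothetical stand-in for a real extraction job, and a thread pool is only one possible worker model.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_extract(doc_id):
    """Hypothetical stand-in for an extraction job (sleeps to simulate I/O)."""
    time.sleep(0.001)
    return doc_id


def measure_throughput(job, items, num_workers=2):
    """Run job over items on a thread pool; return (count, items per second)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(job, items))
    elapsed = time.perf_counter() - start
    return len(results), len(results) / elapsed
```

Calling measure_throughput(fake_extract, range(50), num_workers=2) returns the number of completed jobs and the observed rate; swapping in a real extraction function gives a number that can be compared against the >6/sec target.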
Testing
# Run all tests
pytest tests/
# Quick smoke test
bash reslib/smoke_test.sh
# Performance tests
pytest tests/test_integration.py -v -k stress
Known Limitations (Phase 2)
- OCR quality varies on hand-drawn sketches
- The FTS5 index is designed for fewer than 10K documents (a PostgreSQL upgrade path is planned for larger libraries)
- No automatic web research gathering (manual only)
- Vector embeddings ready but inactive
- CAD file parsing is metadata-only
Documentation
See /docs/:
- CLI-REFERENCE.md: All commands + examples
- EXTRACTION-GUIDE.md: How extraction works
- SEARCH-GUIDE.md: Ranking + weighting
- WORKER-GUIDE.md: Async queue details
- INTEGRATION.md: War room RL1 protocol
Phase 2 Roadmap
- Real-world PDF calibration
- FTS5 scaling tests (10K docs)
- Auto-detection (reference vs research)
- Web research enrichment
- Vector embeddings (semantic search)
- PostgreSQL upgrade path
Building From Source
cd research-library
pip install -e .
pytest tests/
python -m reslib status
Support
Issues? See TECHNICAL-NOTES.md for troubleshooting.
Production-ready MVP. 214 tests passing. 15K lines. Ready to use.