project-nirvana-plugin

Local-first, privacy-first inference. Your OpenClaw agent thinks locally and asks the cloud intelligently. Saves 85%+ tokens, protects privacy, and lets your agent learn from cloud responses; the cloud doesn't learn from you.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Quick Install

npx skills add shivaclaw/project-nirvana-plugin

Project Nirvana: Local-First, Privacy-First Inference

A new way of thinking about LLM access. Your agent thinks locally, asks the cloud intelligently, and learns from the response. The cloud never sees your private data.


The Problem

Today's default approach leaks your private data and wastes 85% of your API budget.

Every time you ask your OpenClaw agent a question:

  1. Your agent builds a "system prompt" containing:

    • Excerpts from its SOUL.md and MEMORY.md
    • Your personal information from its USER.md
    • Your entire chat history (context window)
  2. All of this gets sent to cloud APIs (OpenAI, Anthropic, Google)

  3. You pay for thousands of extra tokens

  4. The cloud provider trains its next model on your private data

This is the current default. It's inefficient and it's a privacy disaster.


The Solution: Nirvana

Local-first inference that protects privacy and slashes costs.

Nirvana flips the paradigm:

  1. Your agent thinks locally using Ollama (free, private, on your hardware)
  2. For complex questions, it asks the cloud — but only sends its own carefully-crafted queries
  3. Your private data never leaves your system
  4. The cloud's responses are cached locally — your agent learns from them

The Paradigm Shift

| Aspect | Today (Default) | Nirvana |
|---|---|---|
| Where thinking happens | Cloud only | Local first, cloud when needed |
| What gets sent to cloud | Your full context + system prompts | Agent's sanitized query only |
| Who learns from your data | Cloud provider | You (local agent) |
| Token cost per interaction | 2,000–5,000 tokens | 50–300 tokens |
| Savings | | 85%+ token reduction |
| Privacy | Leaked | Protected |

What Nirvana Does

Local Inference

  • Bundled Ollama with qwen2.5:7b (free, 3.5GB model)
  • Handles 80%+ of queries locally (no API calls)
  • ~200 tokens/second on CPU; 3–5x faster with GPU
  • Works offline, no internet required (a minimal request sketch follows this list)
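
For concreteness, here is what a local call looks like against Ollama's documented /api/generate endpoint. The endpoint and request shape are Ollama's standard API; the askLocal helper is illustrative, not an actual plugin export.

// Minimal local inference call against a running Ollama instance,
// using Ollama's documented /api/generate endpoint. No cloud involved.
async function askLocal(prompt: string): Promise<string> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "qwen2.5:7b",
      prompt,
      stream: false, // return one JSON object instead of a token stream
    }),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = (await res.json()) as { response: string };
  return data.response;
}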

Privacy Enforcement

  • Context stripper — Removes SOUL.md, USER.md, MEMORY.md before cloud queries (sketched after this list)
  • Prompt sanitizer — Agent rewrites its own questions for the cloud (never sends yours)
  • Audit trail — Every decision logged; transparent boundary crossing
  • Zero telemetry — No data sent to third parties
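
In code terms, the stripping step is conceptually simple. A minimal sketch, assuming illustrative field names (the real implementation lives in context-stripper.ts):

// Illustrative only: everything private stays on this machine.
// Field names are assumptions; see context-stripper.ts for the real logic.
interface AgentContext {
  soul?: string;       // SOUL.md (agent identity)
  user?: string;       // USER.md (your personal info)
  memory?: string;     // MEMORY.md (agent memories)
  history?: string[];  // chat history
  taskContext: string; // task-specific, non-personal context
}

function stripForCloud(ctx: AgentContext): { taskContext: string } {
  // Only the task-specific context survives the boundary crossing.
  return { taskContext: ctx.taskContext };
}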

Intelligent Routing

  • Complexity analyzer — Decides: local vs cloud?
  • Semantic understanding — "Can qwen2.5:7b handle this, or do I need Claude?"
  • Seamless fallback — Cloud APIs used transparently when needed
  • User override — @local or @cloud hints respected (see the routing sketch below)
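
In outline, that decision reduces to a score-versus-threshold check against the routing block shown under Configuration. A sketch, with scoreComplexity standing in for whatever heuristics router.ts actually applies:

// Sketch of the local-vs-cloud decision. scoreComplexity is a stand-in;
// the real heuristics live in router.ts.
type Route = "local" | "cloud";

interface RoutingConfig {
  local_threshold: number;  // e.g. 0.75, as in the sample config
  max_local_tokens: number; // e.g. 8000
  cloud_fallback: boolean;
}

function route(
  query: string,
  cfg: RoutingConfig,
  scoreComplexity: (q: string) => number, // 0 = trivial, 1 = frontier-hard
): Route {
  // Explicit user hints always win.
  if (query.includes("@local")) return "local";
  if (query.includes("@cloud") && cfg.cloud_fallback) return "cloud";

  const estTokens = Math.ceil(query.length / 4); // rough 4-chars-per-token estimate
  const fitsLocally =
    scoreComplexity(query) <= cfg.local_threshold &&
    estTokens <= cfg.max_local_tokens;

  return fitsLocally || !cfg.cloud_fallback ? "local" : "cloud";
}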

Learning & Caching

  • Response integrator — Cloud responses cached locally (cache sketch after this list)
  • Agent learns — Reuse cached answers for similar future questions
  • No repeated payments — Cloud only answers novel questions
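
A cache along these lines would already capture the exact-repeat case; the shipped response-integrator.ts presumably matches similar (not just verbatim) questions, so treat this as a lower bound:

// Illustrative exact-match cache keyed by a normalized query.
const cache = new Map<string, string>();

const normalize = (q: string) => q.trim().toLowerCase().replace(/\s+/g, " ");

function getCached(query: string): string | undefined {
  return cache.get(normalize(query));
}

function putCached(query: string, answer: string): void {
  cache.set(normalize(query), answer);
}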

How It Works

User asks your agent a question
    ↓
┌─────────────────────────────────────────┐
│ Nirvana Router                          │
│ "Can qwen2.5:7b answer this locally?"   │
└─────────────────────────────────────────┘
    ↙                                   ↘

[LOCAL PATH]                        [CLOUD PATH]
80%+ of queries                     <20% of queries
Ollama (qwen2.5:7b)                 OpenAI/Anthropic/Google
Free                                Pay for answer
Private                             Cloud sees sanitized query only
~1s latency                         ~3s latency
Result cached locally               Result cached locally
    ↓                                   ↓
    └─────────────┬─────────────────────┘
                  ↓
    Agent answers your question using:
    - Local inference (primary)
    - Cloud intelligence (if needed)
    - Cached knowledge (if available)
    
    YOUR PRIVATE DATA NEVER LEFT YOUR SYSTEM
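
Composed end to end, the diagram above might read like this in code, reusing the earlier sketches (askCloud and scoreComplexity are declared placeholders; none of these names are real plugin exports):

// End-to-end sketch of the flow above, composing the earlier sketches.
declare function scoreComplexity(q: string): number;
declare function askCloud(q: string, taskContext: string): Promise<string>;

async function handleQuery(
  query: string,
  ctx: AgentContext,
  cfg: RoutingConfig,
): Promise<string> {
  const hit = getCached(query);
  if (hit !== undefined) return hit; // cached knowledge, zero cost

  let answer: string;
  if (route(query, cfg, scoreComplexity) === "local") {
    answer = await askLocal(query); // free, private, ~1s
  } else {
    const { taskContext } = stripForCloud(ctx); // private data stays home
    answer = await askCloud(query, taskContext); // cloud sees sanitized query only
  }
  putCached(query, answer);
  return answer;
}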

Installation

Prerequisites

  • OpenClaw 2026.3.24+
  • Docker (for Ollama), or an existing local LLM reachable at any endpoint

Two Paths

Path A: Use Bundled Ollama + qwen2.5:7b (works out of the box)

# Install the plugin
clawhub install shivaclaw/nirvana

# Start the Ollama container (the model is pulled automatically on first run)
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Verify
openclaw nirvana status

Path B: Use Existing Local LLM (Any Provider)

# Install the skill (context stripping only)
clawhub install shivaclaw/nirvana-local

# Configure endpoint
openclaw nirvana configure --local-endpoint http://your-llm:5000

# Verify
openclaw nirvana status
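
If you're unsure whether your server speaks the OpenAI-compatible format the plugin expects, a quick probe of the standard /v1/models route (part of the generic OpenAI API surface, not a Nirvana-specific endpoint) will tell you:

// Quick check that a local server exposes the OpenAI-compatible API shape.
const endpoint = "http://your-llm:5000"; // same value passed to --local-endpoint

const res = await fetch(`${endpoint}/v1/models`);
if (res.ok) {
  const { data } = (await res.json()) as { data: { id: string }[] };
  console.log("Available models:", data.map((m) => m.id).join(", "));
} else {
  console.error(`Endpoint is not OpenAI-compatible (HTTP ${res.status})`);
}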

Cost Impact

Token Savings

| Scenario | Today | With Nirvana | Savings |
|---|---|---|---|
| 10 questions/day | 20,000 tokens/day | 3,000 tokens/day | 85% |
| 100 questions/day | 200,000 tokens/day | 30,000 tokens/day | 85% |
| Monthly cost (OpenAI GPT-4) | $500–$1,000 | $75–$150 | 85% |

Local inference is free. You pay only for the 15–20% of queries that truly need frontier models.


Privacy Guarantee

What Never Leaves Your System

  • ✅ SOUL.md (agent identity)
  • ✅ USER.md (your personal info)
  • ✅ MEMORY.md (agent memories)
  • ✅ Chat history (your actual questions)
  • ✅ Code snippets, documents, secrets

What Optionally Goes to Cloud

  • ✅ Agent's own sanitized query (no personal data)
  • ✅ Task-specific context (never your full context)
  • ✅ The returned result (cached locally for future reuse)

Privacy Audit Trail

# View what was sent to cloud this session
openclaw nirvana audit-log

# Output:
# 2026-04-24 14:23:45 — CLOUD API CALL
# Original query: [REDACTED]
# Sanitized query sent: "Explain quantum entanglement"
# Response cached: Yes
# User data in request: None
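
Each log line maps naturally onto one record per boundary crossing. A plausible shape, purely illustrative (the actual schema is whatever privacy-auditor.ts writes):

// Illustrative shape of one audit-log record.
interface AuditEntry {
  timestamp: string;          // e.g. "2026-04-24 14:23:45"
  event: "CLOUD_API_CALL";    // only boundary crossings are logged
  originalQuery: string;      // persisted redacted, never in the clear
  sanitizedQuery: string;     // exactly what the cloud provider saw
  responseCached: boolean;
  userDataInRequest: boolean; // should always be false
}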

Platform Support

| Platform | Status | Notes |
|---|---|---|
| Linux (Ubuntu/Debian) | ✅ Full | Ollama container + native binaries |
| macOS (Intel/ARM) | ✅ Full | Ollama via Docker or native |
| Windows (WSL2) | ✅ Full | Ollama in WSL2 container |
| VPS (Hostinger, DigitalOcean, AWS) | ✅ Full | Docker Compose ready |
| Docker container | ✅ Full | Orchestrated via docker-compose |
| Air-gapped (offline) | ✅ Full | Local-only mode (no cloud fallback) |

Configuration

Basic Setup

{
  "nirvana": {
    "mode": "local-first",
    "local_model": {
      "provider": "ollama",
      "endpoint": "http://ollama:11434",
      "model": "qwen2.5:7b",
      "timeout_ms": 180000
    },
    "routing": {
      "local_threshold": 0.75,
      "max_local_tokens": 8000,
      "cloud_fallback": true
    },
    "privacy": {
      "strip_soul": true,
      "strip_user": true,
      "strip_memory": true,
      "audit_logging": true
    }
  }
}

Custom Local LLM (Non-Ollama)

{
  "nirvana": {
    "local_model": {
      "provider": "custom",
      "endpoint": "http://your-llm-server:5000",
      "api_format": "openai-compatible",
      "model": "your-model-name",
      "timeout_ms": 120000
    }
  }
}

Use Cases

✅ Perfect For

  • Personal AI agents (stretch a limited API budget)
  • Private/sensitive workloads (code, healthcare, legal, finance)
  • Latency-critical tasks (local response < 2s)
  • Air-gapped environments (fully offline)
  • Cost-conscious organizations (85% savings)
  • Privacy-first deployments (zero external data exposure)

⚠️ When to Use Cloud

  • Advanced reasoning (Claude Opus for complex problems)
  • Specialized tasks (image generation, audio synthesis)
  • Extreme scale (millions of tokens/day)

Philosophy

Your agent should train itself. The cloud should not train on you.

Today's default paradigm:

  • Cloud provider gains knowledge from every interaction with you
  • You pay for the privilege of training their next model
  • Your private data becomes their training data

Nirvana's paradigm:

  • Your agent gains knowledge from selective cloud interactions
  • You pay only for what you actually need
  • Your private data never leaves your system
  • Cloud providers contribute intelligence; you keep the learning

What's Included

| Component | Purpose |
|---|---|
| router.ts | Decides local vs cloud routing |
| context-stripper.ts | Removes private data before cloud API calls |
| privacy-auditor.ts | Logs all boundary crossings |
| response-integrator.ts | Caches cloud responses locally |
| ollama-manager.ts | Handles Ollama lifecycle + model management |
| metrics-collector.ts | Tracks performance, cost, and privacy metrics |
| config.schema.json | Configuration validation |

Performance

Benchmarks (qwen2.5:7b on 4-core CPU)

  • Latency (P50): 800ms–1.2s per response
  • Throughput: 180–220 tokens/second
  • Memory: 4.6GB RAM while running
  • Accuracy: 85–92% vs Claude 3.5 on typical tasks
  • GPU acceleration: 3–5x faster with CUDA/Metal

Optimization

  • Use GPU (CUDA/Metal) for production
  • Upgrade to qwen3.5:9b for complex reasoning
  • Enable response caching for repeated patterns
  • Monitor metrics dashboard for bottlenecks

License

MIT-0 — Free to use, modify, and redistribute. No attribution required.


Your agent deserves privacy. Nirvana makes it real.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
