ClawText Ingest

Multi-source memory ingestion with Discord support, automatic deduplication, and agent-ready patterns

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "ClawText Ingest" with this command: npx skills add ragesaq/clawtext-ingest

ClawText Ingest — Production-Ready Memory Ingestion

Version: 1.3.0 | License: MIT | Status: Production ✅
Author: ragesaq | Category: Memory & Knowledge Management
GitHub: https://github.com/ragesaq/clawtext-ingest


🎯 What It Does

ClawText Ingest transforms external data (Discord forums, files, URLs, JSON, text) into structured, deduplicated memories for AI agents.

The Problem It Solves

  • Manual ingestion — Tedious, error-prone, no metadata
  • Duplicate memories — Same data ingested multiple times
  • Unstructured data — No hierarchy, no context preservation
  • One-time imports — No recurring/scheduled ingestion
  • Discord-specific gaps — Can't preserve forum post↔reply structure

The Solution

  • One command imports from Discord, files, URLs, or JSON
  • 100% idempotent — Run 1000x, zero duplicates
  • Automatic metadata — YAML frontmatter with date, project, type, entities
  • 6 agent patterns — Autonomous workflows documented and ready
  • Discord-native — Forum hierarchy preserved, progress bars, auto-batch mode
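
As a rough illustration of the automatic metadata, the sketch below renders a memory as Markdown with YAML frontmatter. The field names (date, project, type, entities) come from the list above. The helper name `toFrontmatter` and the exact layout are assumptions made for this example, not the skill's actual output format.

```javascript
// Hypothetical helper: render a memory as Markdown with YAML frontmatter.
// Field names follow the feature list; the layout itself is an assumption.
function toFrontmatter(content, { date, project, type, entities = [] }) {
  // JSON strings and arrays are valid YAML, so JSON.stringify keeps values safely quoted.
  const lines = [
    '---',
    `date: ${JSON.stringify(date)}`,
    `project: ${JSON.stringify(project)}`,
    `type: ${JSON.stringify(type)}`,
    `entities: ${JSON.stringify(entities)}`,
    '---',
    '',
    content,
  ];
  return lines.join('\n');
}

console.log(
  toFrontmatter('Use SHA1 hashes for dedup.', {
    date: '2025-01-15',
    project: 'docs',
    type: 'fact',
    entities: ['SHA1', 'dedup'],
  })
);
```

Quoting every value through `JSON.stringify` is a deliberate simplification: it avoids YAML edge cases such as colons inside project names.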


✨ Key Features

🎯 Discord Integration (New in v1.3.0)

  • Forum + Channel + Thread support
  • Hierarchy preservation — Post↔reply structure in metadata
  • Real-time progress — Live feedback for large ingestions
  • Auto-batch mode — <500 posts: full, ≥500 posts: streaming
  • One-command setup — 5-minute bot creation
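
The auto-batch rule above can be pictured with a small sketch. The 500-post threshold is taken from the list. The helper names `chooseMode` and `batches` are hypothetical, and the default batch size of 100 simply mirrors the `--batch-size 100` CLI example; none of this is confirmed to match the skill's internals.

```javascript
// Threshold from the feature list: <500 posts use full mode, >=500 use streaming.
const STREAMING_THRESHOLD = 500;

// Hypothetical helper: pick an ingestion mode from the post count.
function chooseMode(postCount) {
  return postCount < STREAMING_THRESHOLD ? 'full' : 'streaming';
}

// Hypothetical helper: yield fixed-size batches so a large forum is never held in one chunk.
function* batches(items, size = 100) {
  for (let i = 0; i < items.length; i += size) {
    yield items.slice(i, i + size);
  }
}

console.log(chooseMode(120)); // full
console.log(chooseMode(500)); // streaming
console.log([...batches([1, 2, 3, 4, 5], 2)].length); // 3
```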

📁 Multi-Source Ingestion

  • Files — Glob patterns (Markdown, text, etc.)
  • URLs — Single or bulk URL ingestion
  • JSON — Chat exports, API responses
  • Raw text — Quick knowledge capture
  • Batch operations — Unified ingestion from multiple sources

🔄 Deduplication & Safety

  • SHA1-based — Cryptographic hash matching
  • 100% idempotent — Safe for repeated runs
  • Configurable — checkDedupe: true/false per operation
  • Zero data loss — Failed items tracked, fallback per-item ingestion
  • Hash persistence — .ingest_hashes.json for cross-session tracking
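
To make the deduplication idea concrete, here is a minimal sketch of SHA-1 content hashing with a persisted hash set. The class name `HashStore` and the JSON-array file layout are assumptions for illustration; the skill's real `.ingest_hashes.json` format may differ. CommonJS `require` is used so the snippet runs as a standalone script.

```javascript
const { createHash } = require('node:crypto');
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical store: keeps SHA-1 hashes of ingested content in a JSON file
// so that duplicates are recognized across sessions.
class HashStore {
  constructor(file) {
    this.file = file;
    this.seen = new Set(
      fs.existsSync(file) ? JSON.parse(fs.readFileSync(file, 'utf8')) : []
    );
  }

  // Returns true (and records the hash) if the content is new, false if it is a duplicate.
  addIfNew(content) {
    const hash = createHash('sha1').update(content, 'utf8').digest('hex');
    if (this.seen.has(hash)) return false;
    this.seen.add(hash);
    return true;
  }

  // Persist the hash set so the next session starts with the same state.
  save() {
    fs.writeFileSync(this.file, JSON.stringify([...this.seen], null, 2));
  }
}

// Demo against a throwaway file rather than a real .ingest_hashes.json.
const demoFile = path.join(os.tmpdir(), `ingest_hashes_demo_${process.pid}.json`);
const store = new HashStore(demoFile);
console.log(store.addIfNew('Finding: X is better than Y')); // true
console.log(store.addIfNew('Finding: X is better than Y')); // false
store.save();
fs.rmSync(demoFile, { force: true });
```

Because the check depends only on the content hash, re-running the same ingestion adds nothing new, which is the property the idempotency claim relies on.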

🤖 Agent-Ready

  • 6 documented patterns — Direct API, Discord Agent, CLI, Cron, Batch, Thread
  • Working code examples — Copy-paste ready
  • Real-world patterns — GitHub sync, Discord monitoring, team decisions
  • Error handling — Comprehensive error recovery
  • Progress callbacks — Track ingestion in real-time
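
Error recovery and progress callbacks might combine along the lines of the sketch below. The function `ingestWithProgress` and its `ingestOne` parameter are hypothetical stand-ins, not part of the documented API. The `percent` field on the progress object follows the `onProgress` usage shown in the Discord monitoring example.

```javascript
// Hypothetical helper: ingest items one by one, recording failures instead of aborting,
// and reporting progress after every item.
// `ingestOne` stands in for whatever actually stores a single item.
async function ingestWithProgress(items, ingestOne, onProgress = () => {}) {
  const failed = [];
  let done = 0;
  for (const item of items) {
    try {
      await ingestOne(item);
    } catch (err) {
      failed.push({ item, error: err.message });
    }
    done += 1;
    onProgress({
      done,
      total: items.length,
      percent: Math.round((done / items.length) * 100),
    });
  }
  return { ingested: items.length - failed.length, failed };
}

// Usage: the empty string fails, the other two items succeed.
ingestWithProgress(
  ['first note', '', 'third note'],
  async (text) => {
    if (!text) throw new Error('empty item');
  },
  (p) => console.log(`${p.percent}% complete...`)
).then((result) => console.log(result));
```

Processing items individually is slower than one bulk call, but a single bad item cannot take down the whole run, and the `failed` list can be retried later.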

🛠️ Developer-Friendly

  • CLI tool — clawtext-ingest + clawtext-ingest-discord commands
  • Node.js API — Simple imports for programmatic use
  • TypeScript-ready — Clear method signatures
  • Extensible — Custom transforms, field mapping
  • Well-documented — 11 guides, 20+ examples

🔗 ClawText Integration

  • Automatic cluster indexing — New memories indexed after rebuild
  • RAG injection — Relevant context injected into agent prompts
  • Project routing — Organize memories by project/source
  • Entity linking — Auto-extract and link related entities

🚀 Quick Start

Installation

# Via npm
npm install clawtext-ingest

# Via OpenClaw
openclaw install clawtext-ingest

Discord Ingestion (5 minutes)

# 1. Set up Discord bot (see DISCORD_BOT_SETUP.md)
# 2. Get bot token, set DISCORD_TOKEN env var

# 3. Inspect forum
clawtext-ingest-discord describe-forum --forum-id FORUM_ID --verbose

# 4. Ingest with progress
DISCORD_TOKEN=xxx clawtext-ingest-discord fetch-discord --forum-id FORUM_ID

# 5. Rebuild ClawText clusters
clawtext-ingest rebuild

File Ingestion

clawtext-ingest ingest-files --input="docs/*.md" --project="docs"

Node.js API

import { ClawTextIngest } from 'clawtext-ingest';

const ingest = new ClawTextIngest();

// Ingest files
await ingest.fromFiles(['docs/**/*.md'], { project: 'docs', type: 'fact' });

// Ingest JSON
await ingest.fromJSON(chatArray, { project: 'team' }, {
  keyMap: { contentKey: 'message', dateKey: 'timestamp', authorKey: 'user' }
});

// Rebuild clusters for RAG injection
await ingest.rebuildClusters();
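
The `keyMap` option in the JSON example tells the ingester which fields of each record hold the content, date, and author. A sketch of how such a mapping could be applied is shown here. The function `applyKeyMap`, the output field names, and the rule of skipping records without content are assumptions, not the library's documented behavior.

```javascript
// Hypothetical normalizer: apply a keyMap like the one above to raw records.
function applyKeyMap(records, { contentKey, dateKey, authorKey }) {
  return records
    // Skip records with no usable content (an assumption for this sketch).
    .filter((r) => typeof r[contentKey] === 'string' && r[contentKey].trim() !== '')
    .map((r) => ({
      content: r[contentKey],
      date: r[dateKey] ?? null,
      author: r[authorKey] ?? null,
    }));
}

const chatArray = [
  { message: 'Ship v1.3.0 on Friday', timestamp: '2025-01-15T10:00:00Z', user: 'alice' },
  { message: '', timestamp: '2025-01-15T10:01:00Z', user: 'bob' }, // skipped: no content
];

console.log(
  applyKeyMap(chatArray, { contentKey: 'message', dateKey: 'timestamp', authorKey: 'user' })
);
```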

🤖 Agent Integration (6 Patterns)

Pattern 1: Direct API

For: In-agent code
Use when: Agents need to ingest as part of workflow

const ingest = new ClawTextIngest();
await ingest.fromFiles(['docs/**/*.md'], { project: 'docs' });

Pattern 2: Discord Agent

For: Autonomous Discord ingestion
Use when: Agents need to fetch Discord forums

const runner = new DiscordIngestionRunner(ingest);
await runner.ingestForumAutonomous({
  forumId, mode: 'batch', token: process.env.DISCORD_TOKEN
});

Pattern 3: CLI Subprocess

For: Agents executing commands
Use when: Simpler CLI-based execution needed

await execAsync('clawtext-ingest-discord fetch-discord --forum-id ID');
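
The one-liner above assumes an `execAsync` helper is already in scope. One common way to get it is to wrap Node's `child_process.exec` with `util.promisify`, as sketched here. The helpers `buildFetchCommand` and `fetchForum` are hypothetical additions for this example. CommonJS `require` is used so the snippet runs standalone.

```javascript
const { exec } = require('node:child_process');
const { promisify } = require('node:util');

// promisify(exec) resolves with { stdout, stderr } and rejects on a non-zero exit code.
const execAsync = promisify(exec);

// Hypothetical helper: build the command string, rejecting anything that is not
// a numeric Discord ID so untrusted input never reaches the shell.
function buildFetchCommand(forumId) {
  if (!/^\d+$/.test(String(forumId))) {
    throw new Error(`Invalid forum ID: ${forumId}`);
  }
  return `clawtext-ingest-discord fetch-discord --forum-id ${forumId}`;
}

// Hypothetical wrapper: run the CLI and return its output.
// DISCORD_TOKEN is passed through from the parent environment.
async function fetchForum(forumId) {
  const { stdout } = await execAsync(buildFetchCommand(forumId), {
    env: process.env,
  });
  return stdout;
}

console.log(buildFetchCommand('123456789012345678'));
```

Validating the ID before building the command matters because `exec` runs its argument through a shell.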

Pattern 4: Cron/Scheduled

For: Recurring tasks
Use when: Daily/hourly ingestion needed

// '0 * * * *' fires at minute 0 of every hour (node-cron style API assumed)
cron.schedule('0 * * * *', () => agentIngest());

Pattern 5: Batch Multi-Source

For: Unified ingestion
Use when: Multiple sources in one operation

await ingest.ingestAll([
  { type: 'files', data: ['docs/**/*.md'], metadata: {...} },
  { type: 'json', data: chatExport, metadata: {...} }
]);

Pattern 6: Discord Thread

For: Thread-specific ingestion
Use when: Single thread fetch needed

await runner.ingestThread(threadId);

→ See AGENT_GUIDE.md for complete examples


📊 Real-World Examples

Example 1: Daily Documentation Sync

async function syncDocsDaily() {
  const ingest = new ClawTextIngest();
  const result = await ingest.ingestAll([
    { type: 'files', data: ['docs/**/*.md'], metadata: { project: 'docs' } },
    { type: 'urls', data: ['https://docs.example.com/api'], metadata: { project: 'api-docs' } }
  ]);
  await ingest.rebuildClusters();
  return result;
}

Example 2: Discord Forum Monitoring

async function monitorDiscordForum(forumId) {
  const ingest = new ClawTextIngest();
  const runner = new DiscordIngestionRunner(ingest);
  
  const result = await runner.ingestForumAutonomous({
    forumId,
    mode: 'batch',
    token: process.env.DISCORD_TOKEN,
    onProgress: (p) => console.log(`${p.percent}% complete...`)
  });
  
  return result;
}

Example 3: Team Decisions Ingestion

async function ingestTeamDecisions() {
  const ingest = new ClawTextIngest();
  
  const result = await ingest.ingestAll([
    { type: 'files', data: ['decisions/adr/**/*.md'], metadata: { type: 'adr' } },
    { type: 'json', data: slackThread, metadata: { type: 'decision', source: 'slack' } }
  ]);
  
  await ingest.rebuildClusters();
  return result;
}

🛒 CLI Commands

clawtext-ingest — File/URL/JSON/Text Ingestion

clawtext-ingest ingest-files --input="docs/*.md" --project="docs" --verbose
clawtext-ingest ingest-urls --input="https://example.com" --project="research"
clawtext-ingest ingest-json --input=messages.json --source="slack"
clawtext-ingest ingest-text --input="Finding: X is better than Y" --project="findings"
clawtext-ingest batch --config=sources.json
clawtext-ingest rebuild
clawtext-ingest status

clawtext-ingest-discord — Discord Integration

# Inspect forum
clawtext-ingest-discord describe-forum --forum-id FORUM_ID --verbose

# Fetch & ingest
DISCORD_TOKEN=xxx clawtext-ingest-discord fetch-discord \
  --forum-id FORUM_ID \
  --mode batch \
  --batch-size 100 \
  --verbose

📚 Documentation

| Document | Purpose | Read Time |
|---|---|---|
| README.md | Overview + quick start | 5 min |
| QUICKSTART.md | 5-minute setup | 5 min |
| AGENT_GUIDE.md | 6 autonomous patterns | 10 min |
| API_REFERENCE.md | Complete API docs | 15 min |
| PHASE2_CLI_GUIDE.md | CLI commands | 10 min |
| DISCORD_BOT_SETUP.md | Bot creation | 5 min |
| CLAYHUB_GUIDE.md | Publication | 5 min |
| INDEX.md | Documentation index | 2 min |

🎯 Who Should Use This

  • AI/Agent developers — Building knowledge-aware agents
  • RAG engineers — Populating memory for context injection
  • Teams using Discord — Leveraging Discord as knowledge base
  • DevOps/MLOps — Automated knowledge ingestion pipelines
  • Researchers — Structuring unstructured data sources

⚡ Performance

| Operation | Speed | Notes |
|---|---|---|
| Ingest 100 files | ~5 sec | With SHA1 dedup check |
| Ingest 1000 JSON items | ~15 sec | Batch processing |
| Small forum (<100 msgs) | ~10 sec | Full mode |
| Large forum (1000+ msgs) | ~2 min | Auto-batch, streaming |
| Rebuild clusters | ~5-30 sec | Depends on total memories |

✅ Quality Metrics

| Metric | Value |
|---|---|
| Tests | 22/22 passing ✅ |
| Code | 1,254 production lines |
| Documentation | 92 KB across 11 guides |
| Examples | 20+ working examples |
| Coverage | 100% critical paths |

🔗 Integration with ClawText

  1. Ingest data → Creates memories with YAML metadata
  2. Rebuild clusters → ClawText indexes new memories
  3. RAG layer → Relevant context injected on next prompt
  4. Agent response — Enhanced with contextual information

# Complete workflow
clawtext-ingest-discord fetch-discord --forum-id ID  # Step 1
clawtext-ingest rebuild                               # Step 2
# Step 3-4 automatic (ClawText + Agent)

🆘 Support


📦 Installation & Requirements

Requirements:

  • Node.js ≥ 18.0.0
  • OpenClaw (for agent patterns)
  • ClawText ≥ 1.2.0 (for RAG integration)

Installation:

npm install clawtext-ingest
# or
openclaw install clawtext-ingest

Binaries:

  • clawtext-ingest — File/URL/JSON ingestion
  • clawtext-ingest-discord — Discord integration

🚀 Why This Over Alternatives

Feature | ClawText-Ingest | Manual | Generic Importer | API Tool
Discord native
Deduplication (Partial)
Agent patterns
Metadata auto (Partial)
ClawText integration
Idempotent (Partial)

📄 License

MIT — Use freely, open source, community supported


🙌 Contributing

Contributions welcome! See GitHub issues for current priorities.


Ready to ingest? Start with QUICKSTART.md (5 min) or AGENT_GUIDE.md if you're building agents.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
