ragflow-runbook Skill

A practical runbook for deploying, operating, troubleshooting, and calling RAGFlow (Retrieval-Augmented Generation).

Goal: any agent should be able to bring RAGFlow up, diagnose failures, and call the API safely even without knowing the deployment details up front.

1) When To Use

Deploy RAGFlow (Docker / Windows / Linux / WSL2).
Troubleshoot failures: startup issues, unhealthy backend services, port conflicts, performance problems.
Use the API for operations purposes: validate liveness/readiness, verify auth, and check system endpoints.
Run health checks, automate smoke tests, or prepare backup/restore.

2) What The Agent Must Ask First (Minimum Inputs)

Before running any commands, confirm the following (missing any of these often leads to wrong assumptions):

Deployment environment: Windows / WSL2 / Linux / macOS (client only)
Install directory (the directory that contains docker-compose.yml)
Access method:
- RAGFLOW_BASE_URL (e.g. http://localhost:9380 or an internal/Tailscale address)
- Whether there is an Nginx/reverse proxy in front (and whether Web UI uses port 80/8080)
Whether an API key already exists (do NOT paste secrets into chat; use env vars / secret manager)
Current symptom:
- "does not start" vs "starts but UI/API errors" vs "retrieval quality is poor"

Security: never store or share API keys / DB passwords in plaintext (docs, repo, or chat).

3) Canonical Environment Variables (Recommended)

Use environment variables so all agents can run the same commands:

RAGFLOW_BASE_URL: prefer an internal/Tailscale URL, e.g. http://100.x.y.z:9380
RAGFLOW_API_KEY: Bearer token (created in the RAGFlow Web UI)

Quick verification (separate liveness / readiness / auth; tolerate path differences across versions):

Liveness (usually no auth; try in order, any 200 is OK):
- GET $RAGFLOW_BASE_URL/openapi.json
- GET $RAGFLOW_BASE_URL/api/v1/openapi.json
- GET $RAGFLOW_BASE_URL/v1/system/ping
Readiness (often requires auth; try in order):
- GET $RAGFLOW_BASE_URL/v1/system/status
- GET $RAGFLOW_BASE_URL/v1/system/ping

If these do not match your deployment: treat the returned openapi.json as the source of truth.

This skill ships with its own ops helpers under scripts/:

scripts/ragflow_ping.py: liveness + readiness
scripts/ragflow_smoke.py: auth + API smoke (system-level only)
scripts/ragflow_status.py: compact status summary
scripts/ragflow_alert.py: send an ops alert via OpenClaw messaging

This skill is intentionally decoupled from any workspace-specific application content. It focuses only on RAGFlow runtime operations.

4) Bootstrap (Fresh Install; Windows/WSL2 + Linux)

This section targets a brand-new machine. Goal: get to a working UI + API quickly: clone upstream docker bundle -> start -> create API key in UI -> validate via curl/scripts.

4.1 Choose Install Mode (Default)

Primary path (best for most desktop / Windows users): Windows + WSL2
Alternate path: a Linux server (Ubuntu/Debian/CentOS/etc.)

4.1.1 Fresh Install: Copy/Paste (WSL2 / Linux)

WSL2 (recommended: store files on a Windows drive like D:; run commands inside WSL2):

# WSL2
cd /mnt/d

git clone https://github.com/infiniflow/ragflow.git
cd ragflow/docker

# Common requirement for some document engine profiles
sudo sysctl -w vm.max_map_count=262144 || true

# Default .env = elasticsearch + cpu
# To change ports/passwords/image versions: edit docker/.env

docker compose up -d

docker compose ps

Linux:

# Linux
sudo mkdir -p /opt && cd /opt
sudo chown -R "$USER" /opt

git clone https://github.com/infiniflow/ragflow.git
cd ragflow/docker

sudo sysctl -w vm.max_map_count=262144 || true

docker compose up -d

docker compose ps

Next: open Web UI (default http://<host>:80), finish initialization, create an API key, then validate using ## 3 + ## 8.

4.2 Get The Official Docker Compose Bundle (Robust; Verified Against Upstream)

To avoid missing files or mismatched versions, use git clone and run from the upstream docker/ directory:

git clone https://github.com/infiniflow/ragflow.git
cd ragflow/docker

# Optional: pin to a tag/commit for production
# git checkout <tag-or-commit>

The upstream docker/ folder typically includes:

docker-compose.yml (often include: ./docker-compose-base.yml)
docker-compose-base.yml (backend services: database + cache + object storage + document engine)
.env (default ports/passwords; change for production)
service_conf.yaml.template (used to generate service_conf.yaml at container startup)
entrypoint.sh (commonly started with flags like --enable-adminserver / --enable-mcpserver)
nginx/ (for built-in Web UI / reverse proxy)
README.md (docker-specific docs)

Note: upstream explicitly warns that some compose variants (e.g. docker-compose-macos.yml) are not actively maintained. Do not use them unless you know why.

4.3 First Bring-Up (Upstream `COMPOSE_PROFILES`)

Upstream .env defaults:

COMPOSE_PROFILES is derived from selected backend profiles (e.g. document engine + compute device)

So you typically do not need to pass --profile manually. docker compose up -d will pick profiles from .env.

Before starting (Linux/WSL2, for some document engine profiles):

cat /proc/sys/vm/max_map_count || true
sudo sysctl -w vm.max_map_count=262144 || true

Start:

# In ragflow/docker
# Optional: explicit profiles if you do not want to rely on COMPOSE_PROFILES
# docker compose --profile elasticsearch --profile cpu up -d

docker compose up -d

docker compose ps

Switch CPU/GPU (examples):

# Option 1: edit docker/.env
# DEVICE=gpu

# Option 2: override temporarily (do not modify files)
DEVICE=gpu docker compose up -d

Enable embeddings service (TEI): upstream suggests adding a tei profile to COMPOSE_PROFILES:

# Example:
# COMPOSE_PROFILES=${COMPOSE_PROFILES},tei-cpu
# or:
# COMPOSE_PROFILES=${COMPOSE_PROFILES},tei-gpu

docker compose up -d

Validation: wait for key services to be running/healthy in docker compose ps, then run liveness/readiness (## 3) and API prefix detection (## 8).

4.4 First-Time Setup Checklist (Aligned With Upstream `.env`)

In upstream docker/.env (main branch), exposed ports typically mean:

Web UI / WebServer: SVR_WEB_HTTP_PORT (default 80), SVR_WEB_HTTPS_PORT (default 443)
API (RAGFlow HTTP): SVR_HTTP_PORT (default 9380)
Admin Server: ADMIN_SVR_HTTP_PORT (default 9381)
MCP: SVR_MCP_PORT (default 9382)

Shortest path to a usable setup:

Open Web UI: http://<host>:${SVR_WEB_HTTP_PORT} (default http://<host>:80)
Complete initialization (admin/org setup depending on version)
Create an API key (usually under Settings/System/API Keys)
Set on the client side (recommended env vars):

RAGFLOW_BASE_URL=http://<host>:${SVR_HTTP_PORT} (default http://<host>:9380)
RAGFLOW_API_KEY=ragflow-... (Bearer token; do not paste secrets into chat)

Then validate with liveness/readiness in ## 3.

Production warning: upstream .env explicitly warns against using default passwords. At minimum change ELASTIC_PASSWORD, MYSQL_PASSWORD, MINIO_PASSWORD, and REDIS_PASSWORD.

5) Quick Start (Ops Checklist)

5.1 Prerequisites

Docker Engine + Docker Compose v2 (docker compose ...)
Resources (rule of thumb):
- CPU >= 4 cores, RAM >= 16GB (32GB recommended)
- Disk >= 50GB (depends on document volume + vector index size)
Linux/WSL2: vm.max_map_count >= 262144 (required by some document engine profiles)

Checks:

docker --version
docker compose version

# Linux/WSL2 only
cat /proc/sys/vm/max_map_count

Temporary fix (Linux/WSL2):

sudo sysctl -w vm.max_map_count=262144

5.2 Bring-Up (Docker Compose)

Prereq: you are in the directory that contains docker-compose.yml.

docker compose up -d
docker compose ps

Tail logs:

docker compose logs -f

6) Service Management (Day-2 Operations)

# Status
docker compose ps

# Start/stop
docker compose up -d
docker compose down

# Restart
docker compose restart

# Logs (all / last N lines / last 1h)
docker compose logs
docker compose logs --tail=200
docker compose logs --since=1h

# Resource usage
docker stats

Note: service names differ across compose versions. If you see "no such service", run docker compose ps and use the actual service name.

7) Health Checks (Common Root Causes)

7.1 Document engine unhealthy / crash loop

Common causes: vm.max_map_count too small, low RAM, disk full.

# Linux/WSL2
cat /proc/sys/vm/max_map_count
sudo sysctl -w vm.max_map_count=262144

docker compose ps
docker compose logs --tail=200 <es-service>

7.2 Database connection failures

docker compose ps
docker compose logs --tail=200 <mysql-service>

7.3 Port conflicts

Symptom: containers fail to start or port mapping fails.
Fix: find the process using the port, or change port mappings in .env / compose and restart.

8) API Usage (Agent-Safe Patterns)

8.1 Base URL + Auth

RAGFlow often exposes:

Web UI (port 80/8080)
API (commonly 9380, or a reverse-proxied path)

Recommended convention:

RAGFLOW_BASE_URL points to the API root, e.g. http://localhost:9380
Auth header: Authorization: Bearer $RAGFLOW_API_KEY

8.2 Minimal curl examples (Prefix Auto-Detect; Prefer `v1`)

Across versions/deployments, RAGFlow may have two API prefixes:

v1/... (often system/user/token)
api/v1/... (often application endpoints)

Use this template to auto-detect the prefix (prefer v1, fallback to api/v1).

# 0) Ensure you are hitting the API port/host (not the UI port)
#    Any 200 is OK
curl -sS -o /dev/null -w "%{http_code}\n" "$RAGFLOW_BASE_URL/openapi.json"
curl -sS -o /dev/null -w "%{http_code}\n" "$RAGFLOW_BASE_URL/api/v1/openapi.json"

# 1) Auto-detect prefix (prefer v1)
RAGFLOW_API_PREFIX=""
if curl -sS -o /dev/null -w "%{http_code}" "$RAGFLOW_BASE_URL/v1/system/ping" | grep -q "200"; then
  RAGFLOW_API_PREFIX="v1"
elif curl -sS -o /dev/null -w "%{http_code}" "$RAGFLOW_BASE_URL/api/v1/openapi.json" | grep -q "200"; then
  RAGFLOW_API_PREFIX="api/v1"
else
  echo "Cannot detect API prefix. Check base URL / reverse proxy / firewall."
  exit 1
fi

echo "Detected prefix: $RAGFLOW_API_PREFIX"

Ops-only examples (no application-level endpoints):

Example 1: system ping (no secrets in output)

curl -sS -o /dev/null -w "%{http_code}\n" "$RAGFLOW_BASE_URL/v1/system/ping"

Example 2: system status (auth)

curl -sS -X GET "$RAGFLOW_BASE_URL/v1/system/status" \
  -H "Authorization: Bearer $RAGFLOW_API_KEY" | head

Example 3: fetch openapi schema (liveness)

curl -sS "$RAGFLOW_BASE_URL/openapi.json" | head

Note: If your deployment uses different paths, openapi.json is the source of truth. Avoid calling application-level endpoints from ops runbooks.

8.3 Agent Guidance

Always fetch openapi.json first to confirm real paths/fields/version differences.
If you get 404: suspect base URL (hitting UI port), reverse proxy misconfig, or a different path prefix.
If you get 401/403: suspect missing/expired key or missing Bearer prefix.

9) Backup / Restore (Practical)

Principle: stop services first, then back up volumes, then back up compose configs.

Backup (example; volume names depend on your environment):

mkdir -p backup

docker run --rm \
  -v <mysql_volume>:/source \
  -v "$PWD/backup":/backup \
  alpine tar czf /backup/mysql-data.tar.gz -C /source .

Restore:

docker compose down

docker run --rm \
  -v <mysql_volume>:/target \
  -v "$PWD/backup":/backup \
  alpine tar xzf /backup/mysql-data.tar.gz -C /target

docker compose up -d

10) Security Baseline (Minimum)

Do not use latest in production; pin image versions.
API keys must be stored in env vars / secret manager only.
Minimize exposure: allow only internal/Tailscale ranges to access API ports.
If public access is required: add TLS + auth at reverse proxy, and restrict source IP ranges.

11) Troubleshooting Flow (Agent Playbook)

When a user says "RAGFlow is not working", use this order to reduce back-and-forth:

docker compose ps (which containers are unhealthy/exited)
docker compose logs --tail=200 <unhealthy-service> (capture the first actionable errors)
Resources: docker stats, disk, vm.max_map_count (Linux/WSL2)
Network: from the caller machine curl $RAGFLOW_BASE_URL/openapi.json
Auth: ragflow_ping.py or GET /v1/system/status (with Bearer)

12) OpenClaw Ops Integration

This section documents an end-to-end operations workflow for running RAGFlow with OpenClaw. It is intentionally decoupled from any application-layer usage and focuses only on RAGFlow runtime operations.

12.1 Scope

Included:

Host/service health verification (liveness/readiness)
Basic auth verification (API key works)
Smoke checks (API reachable, key endpoints respond)
Alerting hooks (what to check, what to report)
Scheduling (daily/periodic checks)

Excluded (by design):

Any application-layer conventions
Content parsing/chunking/index strategy
Retrieval quality evaluation

12.2 Standard Environment Contract

On the machine running OpenClaw, set:

RAGFLOW_BASE_URL (prefer an internal/Tailscale address)
RAGFLOW_API_KEY (Bearer token; never commit; do not paste into chat)

Recommended ops endpoints:

Liveness (no auth): GET $RAGFLOW_BASE_URL/openapi.json
Readiness (auth): GET $RAGFLOW_BASE_URL/v1/system/status

If paths differ in your deployment, use openapi.json as the source of truth.

12.3 Local Workspace Helpers (Preferred)

This skill includes built-in helpers under scripts/. They are designed to be:

non-interactive
safe to run repeatedly
machine-readable (short output + non-zero exit code on failure)

Helpers:

scripts/ragflow_ping.py
- Purpose: liveness + readiness checks.
scripts/ragflow_smoke.py
- Purpose: auth verification + minimal API smoke calls.
- Note: it only uses system endpoints. It does not depend on any application-level data.
scripts/ragflow_status.py
- Purpose: fetch /v1/system/status and print a compact key summary.
scripts/ragflow_alert.py
- Purpose: send an ops alert to Telegram via the openclaw message send CLI.

(Prefer the skill-local scripts so the runbook works in any environment.)

12.3.1 Script Contracts (Inputs / Outputs / Exit Codes)

scripts/ragflow_ping.py

Required env:
- RAGFLOW_BASE_URL
Optional env:
- RAGFLOW_API_KEY (if set, readiness check is performed)
Network calls:
- GET {base_url}/openapi.json (no auth)
- GET {base_url}/v1/system/status (Bearer auth)
Output (examples):
- OK_LIVE (no api key set)
- OK_READY keys=...
- LIVENESS_FAIL ...
- READINESS_FAIL ...
Exit codes:
- 0 OK
- 2 liveness failed
- 3 readiness failed

scripts/ragflow_smoke.py

Required env:
- RAGFLOW_BASE_URL
- RAGFLOW_API_KEY
Network calls:
- GET {base_url}/v1/system/status (auth)
- GET {base_url}/v1/system/ping (auth or no-auth depending on deployment)
Output (examples):
- OK smoke
- FAIL system/status ...
- FAIL system/ping ...
Exit codes:
- 0 OK
- 2 system/status failed
- 3 system/ping failed

scripts/ragflow_status.py

Required env:
- RAGFLOW_BASE_URL
- RAGFLOW_API_KEY
Network calls:
- GET {base_url}/v1/system/status
Output (example):
- OK keys=key1,key2,... (compact, no secrets)
Exit codes:
- 0 OK
- 2 HTTP failure
- 3 invalid JSON

scripts/ragflow_alert.py

Purpose: notify humans when ping/smoke fails.
Inputs:
- CLI flags: --title (required), --details (optional)
- Optional env: OPENCLAW_PRIMARY_CHAT_ID (default target)
Behavior:
- Sends a Telegram message via openclaw message send ....
Notes:
- Do not include secrets in --details.

12.4 Ops Workflow (Suggested)

Connectivity
- Confirm OpenClaw host can reach RAGFLOW_BASE_URL over the network.
Liveness
- Check openapi.json responds with HTTP 200.
Readiness
- Check v1/system/status responds with HTTP 200 when authenticated.
Smoke
- Run the skill-local smoke helper: scripts/ragflow_smoke.py (system endpoints only).
Escalation artifacts
- Collect:
  - docker compose ps
  - docker compose logs --tail=200 <ragflow-service>
  - the exact endpoint + HTTP code observed from the OpenClaw side

12.5 Scheduling (Copy/Paste Examples)

Goal: provide copy/paste recipes. An agent can create these tasks when needed.

12.5.1 cron (Linux)

Ping every 10 minutes and alert on failure:

*/10 * * * * RAGFLOW_BASE_URL="http://127.0.0.1:9380" RAGFLOW_API_KEY="${RAGFLOW_API_KEY}" /usr/bin/python3 /path/to/skills/ragflow-runbook/scripts/ragflow_ping.py || /usr/bin/python3 /path/to/skills/ragflow-runbook/scripts/ragflow_alert.py --title "ping failed" --details "ragflow_ping.py exit=$?"

Smoke once per day at 06:05 and alert on failure:

5 6 * * * RAGFLOW_BASE_URL="http://127.0.0.1:9380" RAGFLOW_API_KEY="${RAGFLOW_API_KEY}" /usr/bin/python3 /path/to/skills/ragflow-runbook/scripts/ragflow_smoke.py || /usr/bin/python3 /path/to/skills/ragflow-runbook/scripts/ragflow_alert.py --title "smoke failed" --details "ragflow_smoke.py exit=$?"

Notes:

Replace /path/to/skills/... with the real absolute path.
Prefer sourcing secrets from a root-owned env file, or use your secret manager. Avoid putting API keys directly into crontab.

12.5.2 launchd (macOS)

Create two plist files (one for ping, one for smoke) and load them with launchctl.

Ping (every 10 minutes):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
  <dict>
    <key>Label</key>
    <string>ai.openclaw.ragflow.ping</string>

    <key>ProgramArguments</key>
    <array>
      <string>/usr/bin/python3</string>
      <string>/ABS/PATH/skills/ragflow-runbook/scripts/ragflow_ping.py</string>
    </array>

    <key>StartInterval</key>
    <integer>600</integer>

    <key>EnvironmentVariables</key>
    <dict>
      <key>RAGFLOW_BASE_URL</key>
      <string>http://127.0.0.1:9380</string>
      <key>RAGFLOW_API_KEY</key>
      <string>${RAGFLOW_API_KEY}</string>
    </dict>

    <key>StandardOutPath</key>
    <string>/tmp/ragflow-ping.out</string>
    <key>StandardErrorPath</key>
    <string>/tmp/ragflow-ping.err</string>

    <key>RunAtLoad</key>
    <true/>
  </dict>
</plist>

Smoke (daily at 06:05):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
  <dict>
    <key>Label</key>
    <string>ai.openclaw.ragflow.smoke</string>

    <key>ProgramArguments</key>
    <array>
      <string>/usr/bin/python3</string>
      <string>/ABS/PATH/skills/ragflow-runbook/scripts/ragflow_smoke.py</string>
    </array>

    <key>StartCalendarInterval</key>
    <dict>
      <key>Hour</key>
      <integer>6</integer>
      <key>Minute</key>
      <integer>5</integer>
    </dict>

    <key>EnvironmentVariables</key>
    <dict>
      <key>RAGFLOW_BASE_URL</key>
      <string>http://127.0.0.1:9380</string>
      <key>RAGFLOW_API_KEY</key>
      <string>${RAGFLOW_API_KEY}</string>
    </dict>

    <key>StandardOutPath</key>
    <string>/tmp/ragflow-smoke.out</string>
    <key>StandardErrorPath</key>
    <string>/tmp/ragflow-smoke.err</string>

    <key>RunAtLoad</key>
    <true/>
  </dict>
</plist>