ragflow-runbook Skill
A practical runbook for deploying, operating, troubleshooting, and calling RAGFlow (Retrieval-Augmented Generation).
Goal: any agent should be able to bring RAGFlow up, diagnose failures, and call the API safely even without knowing the deployment details up front.
1) When To Use
- Deploy RAGFlow (Docker / Windows / Linux / WSL2).
- Troubleshoot failures: startup issues, unhealthy backend services, port conflicts, performance problems.
- Use the API for operations purposes: validate liveness/readiness, verify auth, and check system endpoints.
- Run health checks, automate smoke tests, or prepare backup/restore.
2) What The Agent Must Ask First (Minimum Inputs)
Before running any commands, confirm the following (missing any of these often leads to wrong assumptions):
- Deployment environment:
Windows / WSL2 / Linux / macOS (client only) - Install directory (the directory that contains
docker-compose.yml) - Access method:
RAGFLOW_BASE_URL(e.g.http://localhost:9380or an internal/Tailscale address)- Whether there is an Nginx/reverse proxy in front (and whether Web UI uses port 80/8080)
- Whether an API key already exists (do NOT paste secrets into chat; use env vars / secret manager)
- Current symptom:
- "does not start" vs "starts but UI/API errors" vs "retrieval quality is poor"
Security: never store or share API keys / DB passwords in plaintext (docs, repo, or chat).
3) Canonical Environment Variables (Recommended)
Use environment variables so all agents can run the same commands:
RAGFLOW_BASE_URL: prefer an internal/Tailscale URL, e.g.http://100.x.y.z:9380RAGFLOW_API_KEY: Bearer token (created in the RAGFlow Web UI)
Quick verification (separate liveness / readiness / auth; tolerate path differences across versions):
- Liveness (usually no auth; try in order, any 200 is OK):
GET $RAGFLOW_BASE_URL/openapi.jsonGET $RAGFLOW_BASE_URL/api/v1/openapi.jsonGET $RAGFLOW_BASE_URL/v1/system/ping
- Readiness (often requires auth; try in order):
GET $RAGFLOW_BASE_URL/v1/system/statusGET $RAGFLOW_BASE_URL/v1/system/ping
If these do not match your deployment: treat the returned openapi.json as the source of truth.
This skill ships with its own ops helpers under scripts/:
scripts/ragflow_ping.py: liveness + readinessscripts/ragflow_smoke.py: auth + API smoke (system-level only)scripts/ragflow_status.py: compact status summaryscripts/ragflow_alert.py: send an ops alert via OpenClaw messaging
This skill is intentionally decoupled from any workspace-specific application content. It focuses only on RAGFlow runtime operations.
4) Bootstrap (Fresh Install; Windows/WSL2 + Linux)
This section targets a brand-new machine. Goal: get to a working UI + API quickly: clone upstream docker bundle -> start -> create API key in UI -> validate via curl/scripts.
4.1 Choose Install Mode (Default)
- Primary path (best for most desktop / Windows users): Windows + WSL2
- Alternate path: a Linux server (Ubuntu/Debian/CentOS/etc.)
4.1.1 Fresh Install: Copy/Paste (WSL2 / Linux)
WSL2 (recommended: store files on a Windows drive like D:; run commands inside WSL2):
# WSL2
cd /mnt/d
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/docker
# Common requirement for some document engine profiles
sudo sysctl -w vm.max_map_count=262144 || true
# Default .env = elasticsearch + cpu
# To change ports/passwords/image versions: edit docker/.env
docker compose up -d
docker compose ps
Linux:
# Linux
sudo mkdir -p /opt && cd /opt
sudo chown -R "$USER" /opt
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/docker
sudo sysctl -w vm.max_map_count=262144 || true
docker compose up -d
docker compose ps
Next: open Web UI (default http://<host>:80), finish initialization, create an API key, then validate using ## 3 + ## 8.
4.2 Get The Official Docker Compose Bundle (Robust; Verified Against Upstream)
To avoid missing files or mismatched versions, use git clone and run from the upstream docker/ directory:
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/docker
# Optional: pin to a tag/commit for production
# git checkout <tag-or-commit>
The upstream docker/ folder typically includes:
docker-compose.yml(ofteninclude: ./docker-compose-base.yml)docker-compose-base.yml(backend services: database + cache + object storage + document engine).env(default ports/passwords; change for production)service_conf.yaml.template(used to generateservice_conf.yamlat container startup)entrypoint.sh(commonly started with flags like--enable-adminserver/--enable-mcpserver)nginx/(for built-in Web UI / reverse proxy)README.md(docker-specific docs)
Note: upstream explicitly warns that some compose variants (e.g. docker-compose-macos.yml) are not actively maintained. Do not use them unless you know why.
4.3 First Bring-Up (Upstream COMPOSE_PROFILES)
Upstream .env defaults:
COMPOSE_PROFILESis derived from selected backend profiles (e.g. document engine + compute device)
So you typically do not need to pass --profile manually. docker compose up -d will pick profiles from .env.
Before starting (Linux/WSL2, for some document engine profiles):
cat /proc/sys/vm/max_map_count || true
sudo sysctl -w vm.max_map_count=262144 || true
Start:
# In ragflow/docker
# Optional: explicit profiles if you do not want to rely on COMPOSE_PROFILES
# docker compose --profile elasticsearch --profile cpu up -d
docker compose up -d
docker compose ps
Switch CPU/GPU (examples):
# Option 1: edit docker/.env
# DEVICE=gpu
# Option 2: override temporarily (do not modify files)
DEVICE=gpu docker compose up -d
Enable embeddings service (TEI): upstream suggests adding a tei profile to COMPOSE_PROFILES:
# Example:
# COMPOSE_PROFILES=${COMPOSE_PROFILES},tei-cpu
# or:
# COMPOSE_PROFILES=${COMPOSE_PROFILES},tei-gpu
docker compose up -d
Validation: wait for key services to be running/healthy in docker compose ps, then run liveness/readiness (## 3) and API prefix detection (## 8).
4.4 First-Time Setup Checklist (Aligned With Upstream .env)
In upstream docker/.env (main branch), exposed ports typically mean:
- Web UI / WebServer:
SVR_WEB_HTTP_PORT(default 80),SVR_WEB_HTTPS_PORT(default 443) - API (RAGFlow HTTP):
SVR_HTTP_PORT(default 9380) - Admin Server:
ADMIN_SVR_HTTP_PORT(default 9381) - MCP:
SVR_MCP_PORT(default 9382)
Shortest path to a usable setup:
- Open Web UI:
http://<host>:${SVR_WEB_HTTP_PORT}(defaulthttp://<host>:80) - Complete initialization (admin/org setup depending on version)
- Create an API key (usually under Settings/System/API Keys)
- Set on the client side (recommended env vars):
RAGFLOW_BASE_URL=http://<host>:${SVR_HTTP_PORT}(defaulthttp://<host>:9380)RAGFLOW_API_KEY=ragflow-...(Bearer token; do not paste secrets into chat)
Then validate with liveness/readiness in ## 3.
Production warning: upstream
.envexplicitly warns against using default passwords. At minimum changeELASTIC_PASSWORD,MYSQL_PASSWORD,MINIO_PASSWORD, andREDIS_PASSWORD.
5) Quick Start (Ops Checklist)
5.1 Prerequisites
- Docker Engine + Docker Compose v2 (
docker compose ...) - Resources (rule of thumb):
- CPU >= 4 cores, RAM >= 16GB (32GB recommended)
- Disk >= 50GB (depends on document volume + vector index size)
- Linux/WSL2:
vm.max_map_count >= 262144(required by some document engine profiles)
Checks:
docker --version
docker compose version
# Linux/WSL2 only
cat /proc/sys/vm/max_map_count
Temporary fix (Linux/WSL2):
sudo sysctl -w vm.max_map_count=262144
5.2 Bring-Up (Docker Compose)
Prereq: you are in the directory that contains docker-compose.yml.
docker compose up -d
docker compose ps
Tail logs:
docker compose logs -f
6) Service Management (Day-2 Operations)
# Status
docker compose ps
# Start/stop
docker compose up -d
docker compose down
# Restart
docker compose restart
# Logs (all / last N lines / last 1h)
docker compose logs
docker compose logs --tail=200
docker compose logs --since=1h
# Resource usage
docker stats
Note: service names differ across compose versions. If you see "no such service", run docker compose ps and use the actual service name.
7) Health Checks (Common Root Causes)
7.1 Document engine unhealthy / crash loop
Common causes: vm.max_map_count too small, low RAM, disk full.
# Linux/WSL2
cat /proc/sys/vm/max_map_count
sudo sysctl -w vm.max_map_count=262144
docker compose ps
docker compose logs --tail=200 <es-service>
7.2 Database connection failures
docker compose ps
docker compose logs --tail=200 <mysql-service>
7.3 Port conflicts
- Symptom: containers fail to start or port mapping fails.
- Fix: find the process using the port, or change port mappings in
.env/ compose and restart.
8) API Usage (Agent-Safe Patterns)
8.1 Base URL + Auth
RAGFlow often exposes:
- Web UI (port 80/8080)
- API (commonly 9380, or a reverse-proxied path)
Recommended convention:
RAGFLOW_BASE_URLpoints to the API root, e.g.http://localhost:9380- Auth header:
Authorization: Bearer $RAGFLOW_API_KEY
8.2 Minimal curl examples (Prefix Auto-Detect; Prefer v1)
Across versions/deployments, RAGFlow may have two API prefixes:
v1/...(often system/user/token)api/v1/...(often application endpoints)
Use this template to auto-detect the prefix (prefer v1, fallback to api/v1).
# 0) Ensure you are hitting the API port/host (not the UI port)
# Any 200 is OK
curl -sS -o /dev/null -w "%{http_code}\n" "$RAGFLOW_BASE_URL/openapi.json"
curl -sS -o /dev/null -w "%{http_code}\n" "$RAGFLOW_BASE_URL/api/v1/openapi.json"
# 1) Auto-detect prefix (prefer v1)
RAGFLOW_API_PREFIX=""
if curl -sS -o /dev/null -w "%{http_code}" "$RAGFLOW_BASE_URL/v1/system/ping" | grep -q "200"; then
RAGFLOW_API_PREFIX="v1"
elif curl -sS -o /dev/null -w "%{http_code}" "$RAGFLOW_BASE_URL/api/v1/openapi.json" | grep -q "200"; then
RAGFLOW_API_PREFIX="api/v1"
else
echo "Cannot detect API prefix. Check base URL / reverse proxy / firewall."
exit 1
fi
echo "Detected prefix: $RAGFLOW_API_PREFIX"
Ops-only examples (no application-level endpoints):
Example 1: system ping (no secrets in output)
curl -sS -o /dev/null -w "%{http_code}\n" "$RAGFLOW_BASE_URL/v1/system/ping"
Example 2: system status (auth)
curl -sS -X GET "$RAGFLOW_BASE_URL/v1/system/status" \
-H "Authorization: Bearer $RAGFLOW_API_KEY" | head
Example 3: fetch openapi schema (liveness)
curl -sS "$RAGFLOW_BASE_URL/openapi.json" | head
Note: If your deployment uses different paths, openapi.json is the source of truth. Avoid calling application-level endpoints from ops runbooks.
8.3 Agent Guidance
- Always fetch
openapi.jsonfirst to confirm real paths/fields/version differences. - If you get 404: suspect base URL (hitting UI port), reverse proxy misconfig, or a different path prefix.
- If you get 401/403: suspect missing/expired key or missing
Bearerprefix.
9) Backup / Restore (Practical)
Principle: stop services first, then back up volumes, then back up compose configs.
Backup (example; volume names depend on your environment):
mkdir -p backup
docker run --rm \
-v <mysql_volume>:/source \
-v "$PWD/backup":/backup \
alpine tar czf /backup/mysql-data.tar.gz -C /source .
Restore:
docker compose down
docker run --rm \
-v <mysql_volume>:/target \
-v "$PWD/backup":/backup \
alpine tar xzf /backup/mysql-data.tar.gz -C /target
docker compose up -d
10) Security Baseline (Minimum)
- Do not use
latestin production; pin image versions. - API keys must be stored in env vars / secret manager only.
- Minimize exposure: allow only internal/Tailscale ranges to access API ports.
- If public access is required: add TLS + auth at reverse proxy, and restrict source IP ranges.
11) Troubleshooting Flow (Agent Playbook)
When a user says "RAGFlow is not working", use this order to reduce back-and-forth:
docker compose ps(which containers are unhealthy/exited)docker compose logs --tail=200 <unhealthy-service>(capture the first actionable errors)- Resources:
docker stats, disk,vm.max_map_count(Linux/WSL2) - Network: from the caller machine
curl $RAGFLOW_BASE_URL/openapi.json - Auth:
ragflow_ping.pyorGET /v1/system/status(with Bearer)
12) OpenClaw Ops Integration
This section documents an end-to-end operations workflow for running RAGFlow with OpenClaw. It is intentionally decoupled from any application-layer usage and focuses only on RAGFlow runtime operations.
12.1 Scope
Included:
- Host/service health verification (liveness/readiness)
- Basic auth verification (API key works)
- Smoke checks (API reachable, key endpoints respond)
- Alerting hooks (what to check, what to report)
- Scheduling (daily/periodic checks)
Excluded (by design):
- Any application-layer conventions
- Content parsing/chunking/index strategy
- Retrieval quality evaluation
12.2 Standard Environment Contract
On the machine running OpenClaw, set:
RAGFLOW_BASE_URL(prefer an internal/Tailscale address)RAGFLOW_API_KEY(Bearer token; never commit; do not paste into chat)
Recommended ops endpoints:
- Liveness (no auth):
GET $RAGFLOW_BASE_URL/openapi.json - Readiness (auth):
GET $RAGFLOW_BASE_URL/v1/system/status
If paths differ in your deployment, use openapi.json as the source of truth.
12.3 Local Workspace Helpers (Preferred)
This skill includes built-in helpers under scripts/.
They are designed to be:
- non-interactive
- safe to run repeatedly
- machine-readable (short output + non-zero exit code on failure)
Helpers:
scripts/ragflow_ping.py- Purpose: liveness + readiness checks.
scripts/ragflow_smoke.py- Purpose: auth verification + minimal API smoke calls.
- Note: it only uses system endpoints. It does not depend on any application-level data.
scripts/ragflow_status.py- Purpose: fetch
/v1/system/statusand print a compact key summary.
- Purpose: fetch
scripts/ragflow_alert.py- Purpose: send an ops alert to Telegram via the
openclaw message sendCLI.
- Purpose: send an ops alert to Telegram via the
(Prefer the skill-local scripts so the runbook works in any environment.)
12.3.1 Script Contracts (Inputs / Outputs / Exit Codes)
scripts/ragflow_ping.py
- Required env:
RAGFLOW_BASE_URL
- Optional env:
RAGFLOW_API_KEY(if set, readiness check is performed)
- Network calls:
GET {base_url}/openapi.json(no auth)GET {base_url}/v1/system/status(Bearer auth)
- Output (examples):
OK_LIVE (no api key set)OK_READY keys=...LIVENESS_FAIL ...READINESS_FAIL ...
- Exit codes:
0OK2liveness failed3readiness failed
scripts/ragflow_smoke.py
- Required env:
RAGFLOW_BASE_URLRAGFLOW_API_KEY
- Network calls:
GET {base_url}/v1/system/status(auth)GET {base_url}/v1/system/ping(auth or no-auth depending on deployment)
- Output (examples):
OK smokeFAIL system/status ...FAIL system/ping ...
- Exit codes:
0OK2system/status failed3system/ping failed
scripts/ragflow_status.py
- Required env:
RAGFLOW_BASE_URLRAGFLOW_API_KEY
- Network calls:
GET {base_url}/v1/system/status
- Output (example):
OK keys=key1,key2,...(compact, no secrets)
- Exit codes:
0OK2HTTP failure3invalid JSON
scripts/ragflow_alert.py
- Purpose: notify humans when ping/smoke fails.
- Inputs:
- CLI flags:
--title(required),--details(optional) - Optional env:
OPENCLAW_PRIMARY_CHAT_ID(default target)
- CLI flags:
- Behavior:
- Sends a Telegram message via
openclaw message send ....
- Sends a Telegram message via
- Notes:
- Do not include secrets in
--details.
- Do not include secrets in
12.4 Ops Workflow (Suggested)
-
Connectivity
- Confirm OpenClaw host can reach
RAGFLOW_BASE_URLover the network.
- Confirm OpenClaw host can reach
-
Liveness
- Check
openapi.jsonresponds with HTTP 200.
- Check
-
Readiness
- Check
v1/system/statusresponds with HTTP 200 when authenticated.
- Check
-
Smoke
- Run the skill-local smoke helper:
scripts/ragflow_smoke.py(system endpoints only).
- Run the skill-local smoke helper:
-
Escalation artifacts
- Collect:
docker compose psdocker compose logs --tail=200 <ragflow-service>- the exact endpoint + HTTP code observed from the OpenClaw side
- Collect:
12.5 Scheduling (Copy/Paste Examples)
Goal: provide copy/paste recipes. An agent can create these tasks when needed.
12.5.1 cron (Linux)
Ping every 10 minutes and alert on failure:
*/10 * * * * RAGFLOW_BASE_URL="http://127.0.0.1:9380" RAGFLOW_API_KEY="${RAGFLOW_API_KEY}" /usr/bin/python3 /path/to/skills/ragflow-runbook/scripts/ragflow_ping.py || /usr/bin/python3 /path/to/skills/ragflow-runbook/scripts/ragflow_alert.py --title "ping failed" --details "ragflow_ping.py exit=$?"
Smoke once per day at 06:05 and alert on failure:
5 6 * * * RAGFLOW_BASE_URL="http://127.0.0.1:9380" RAGFLOW_API_KEY="${RAGFLOW_API_KEY}" /usr/bin/python3 /path/to/skills/ragflow-runbook/scripts/ragflow_smoke.py || /usr/bin/python3 /path/to/skills/ragflow-runbook/scripts/ragflow_alert.py --title "smoke failed" --details "ragflow_smoke.py exit=$?"
Notes:
- Replace
/path/to/skills/...with the real absolute path. - Prefer sourcing secrets from a root-owned env file, or use your secret manager. Avoid putting API keys directly into crontab.
12.5.2 launchd (macOS)
Create two plist files (one for ping, one for smoke) and load them with launchctl.
Ping (every 10 minutes):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>ai.openclaw.ragflow.ping</string>
<key>ProgramArguments</key>
<array>
<string>/usr/bin/python3</string>
<string>/ABS/PATH/skills/ragflow-runbook/scripts/ragflow_ping.py</string>
</array>
<key>StartInterval</key>
<integer>600</integer>
<key>EnvironmentVariables</key>
<dict>
<key>RAGFLOW_BASE_URL</key>
<string>http://127.0.0.1:9380</string>
<key>RAGFLOW_API_KEY</key>
<string>${RAGFLOW_API_KEY}</string>
</dict>
<key>StandardOutPath</key>
<string>/tmp/ragflow-ping.out</string>
<key>StandardErrorPath</key>
<string>/tmp/ragflow-ping.err</string>
<key>RunAtLoad</key>
<true/>
</dict>
</plist>
Smoke (daily at 06:05):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>ai.openclaw.ragflow.smoke</string>
<key>ProgramArguments</key>
<array>
<string>/usr/bin/python3</string>
<string>/ABS/PATH/skills/ragflow-runbook/scripts/ragflow_smoke.py</string>
</array>
<key>StartCalendarInterval</key>
<dict>
<key>Hour</key>
<integer>6</integer>
<key>Minute</key>
<integer>5</integer>
</dict>
<key>EnvironmentVariables</key>
<dict>
<key>RAGFLOW_BASE_URL</key>
<string>http://127.0.0.1:9380</string>
<key>RAGFLOW_API_KEY</key>
<string>${RAGFLOW_API_KEY}</string>
</dict>
<key>StandardOutPath</key>
<string>/tmp/ragflow-smoke.out</string>
<key>StandardErrorPath</key>
<string>/tmp/ragflow-smoke.err</string>
<key>RunAtLoad</key>
<true/>
</dict>
</plist>
Notes:
- Replace
/ABS/PATH/...with the real absolute path. - Prefer storing secrets outside the plist and injecting them safely; do not commit plists with secrets.
12.6 Security Notes
- Do not expose
SVR_HTTP_PORTto the public internet. - Prefer allowlisting internal/Tailscale ranges.
- Store
RAGFLOW_API_KEYin env/secret manager only.
References
- RAGFlow Docs: https://ragflow.io/docs/
- GitHub: https://github.com/infiniflow/ragflow