Caveman Token Optimizer
Skill by ara.so — Daily 2026 Skills collection.
A Claude Code skill and Codex plugin that makes AI agents respond in compressed caveman-speak — cutting ~65% of output tokens on average (up to 87%) while keeping full technical accuracy. No pleasantries. No filler. Just answer.
What It Does
Caveman mode strips:
- Pleasantries: "Sure, I'd be happy to help!" → gone
- Hedging: "It might be worth considering" → gone
- Articles (a, an, the) → gone
- Verbose transitions → gone
Caveman keeps:
- All code blocks (written normally)
- Technical terms (exact: useMemo, polymorphism, middleware)
- Error messages (quoted exactly)
- Git commits and PR descriptions (normal)
Same fix. 75% less word. Brain still big.
Install
Claude Code (npx)
npx skills add JuliusBrussee/caveman
Claude Code (plugin system)
claude plugin marketplace add JuliusBrussee/caveman
claude plugin install caveman@caveman
Codex
- Clone the repo
- Open Codex inside the repo
- Run /plugins
- Search Caveman
- Install plugin
Install once. Works in all sessions after that.
Manual / Local
git clone https://github.com/JuliusBrussee/caveman.git
cd caveman
pip install -e .
Usage — Trigger Commands
Claude Code
/caveman # enable default (full) caveman mode
/caveman lite # professional brevity, grammar intact
/caveman full # default — drop articles, use fragments
/caveman ultra # maximum compression, telegraphic
Codex
$caveman
$caveman lite
$caveman full
$caveman ultra
Natural language triggers
Any of these phrases activate caveman mode:
- "talk like caveman"
- "caveman mode"
- "less tokens please"
- "be concise"
Disable
/caveman off
# or say: "stop caveman" / "normal mode"
Level sticks until changed or session ends.
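The sticky-level behavior can be pictured as a tiny state machine. This is a minimal sketch for illustration only, not the plugin's actual implementation: `apply_command` and `LEVELS` are hypothetical names, and treating `$caveman off` and unknown arguments the way shown here is an assumption.

```python
from typing import Optional

LEVELS = ("lite", "full", "ultra")


def apply_command(current: Optional[str], line: str) -> Optional[str]:
    """Return the caveman level after processing one input line.

    None means caveman mode is off. Lines that are not caveman
    commands leave the level unchanged, so the level "sticks".
    """
    parts = line.strip().split()
    if not parts or parts[0] not in ("/caveman", "$caveman"):
        return current      # ordinary message: level sticks
    if len(parts) == 1:
        return "full"       # bare command enables the default level
    arg = parts[1].lower()
    if arg == "off":
        return None
    if arg in LEVELS:
        return arg
    return current          # unknown argument: assumed to be ignored


level = None
for line in ["/caveman", "Why is my build slow?", "/caveman ultra", "/caveman off"]:
    level = apply_command(level, line)
    print(f"{line!r:28} -> {level}")
```

The ordinary question in the middle does not change the level, which is the point of "sticks until changed".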
Intensity Levels
| Level | Trigger | Style | Example |
|---|---|---|---|
| Lite | /caveman lite | Drop filler, keep grammar | "Component re-renders because inline object prop creates new reference each cycle. Wrap in useMemo." |
| Full | /caveman full | Drop articles, use fragments | "New object ref each render. Inline prop = new ref = re-render. Wrap in useMemo." |
| Ultra | /caveman ultra | Telegraphic, abbreviate everything | "Inline obj prop → new ref → re-render. useMemo." |
Benchmark Results
Real token counts from Claude API (reproducible via benchmarks/ directory):
| Task | Normal | Caveman | Saved |
|---|---|---|---|
| React re-render bug | 1180 | 159 | 87% |
| Auth middleware fix | 704 | 121 | 83% |
| PostgreSQL pool setup | 2347 | 380 | 84% |
| Git rebase vs merge | 702 | 292 | 58% |
| Async/await refactor | 387 | 301 | 22% |
| Docker multi-stage build | 1042 | 290 | 72% |
| Average | 1214 | 294 | 65% |
Important: Caveman only affects output tokens. Thinking/reasoning tokens are untouched. Caveman make mouth smaller, not brain.
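The Saved column follows from the two token counts as (normal - caveman) / normal. A short sketch that recomputes the per-task percentages from the rows above (`saved_percent` is a hypothetical helper name; the Average row is left as reported and not recomputed here):

```python
# (task, normal_tokens, caveman_tokens) copied from the table above
rows = [
    ("React re-render bug", 1180, 159),
    ("Auth middleware fix", 704, 121),
    ("PostgreSQL pool setup", 2347, 380),
    ("Git rebase vs merge", 702, 292),
    ("Async/await refactor", 387, 301),
    ("Docker multi-stage build", 1042, 290),
]


def saved_percent(normal: int, caveman: int) -> int:
    """Share of output tokens removed, rounded to a whole percent."""
    return round(100 * (normal - caveman) / normal)


for task, normal, caveman in rows:
    print(f"{task:26} {saved_percent(normal, caveman):3d}%")
```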
Reproducing Benchmarks
git clone https://github.com/JuliusBrussee/caveman.git
cd caveman/benchmarks
# Set your Anthropic API key
export ANTHROPIC_API_KEY=your_key_here
# Run benchmark suite
python run_benchmarks.py
# Compare normal vs caveman responses
python compare.py --task react-rerender
python compare.py --task auth-middleware
python compare.py --all
Code Examples — What Caveman Mode Changes
Before (normal, 69 tokens)
The reason your React component is re-rendering is likely because
you're creating a new object reference on each render cycle. When
you pass an inline object as a prop, React's shallow comparison
sees it as a different object every time, which triggers a
re-render. I'd recommend using useMemo to memoize the object.
After (caveman full, 19 tokens)
New object ref each render. Inline object prop = new ref = re-render.
Wrap in `useMemo`.
Code blocks stay normal — caveman not stupid
# Caveman explains in grunt, but code stays clean:
# "Token expiry check broken. Fix:"
import time
import jwt  # PyJWT
def verify_token(token: str) -> bool:
    # SECRET_KEY is assumed to be defined elsewhere in the application
    payload = jwt.decode(token, SECRET_KEY, algorithms=["HS256"])
    # Was: payload["exp"] < time.time()
    # Fix:
    return payload["exp"] >= time.time()
What Caveman Preserves vs. Removes
# Tokens caveman REMOVES (waste):
filler_phrases = [
    "I'd be happy to help you with that",       # 8 tokens gone
    "The reason this is happening is because",  # 7 tokens gone
    "I would recommend that you consider",      # 7 tokens gone
    "Sure, let me take a look at that",         # 8 tokens gone
    "Great question!",                          # 2 tokens gone
    "Certainly!",                               # 1 token gone
]
# Things caveman KEEPS (substance):
preserved = [
    "code blocks",       # always normal
    "technical_terms",   # exact spelling preserved
    "error_messages",    # quoted verbatim
    "variable_names",    # exact
    "git_commits",       # normal prose
    "pr_descriptions",   # normal prose
]
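The skill itself appears to work through prompt instructions (SKILL.md) rather than post-processing, so the following is only a toy illustration of the remove/keep split: it deletes a few filler phrases from prose and leaves fenced code untouched. `strip_filler` and `FILLER_PHRASES` are hypothetical names that do not come from the repo.

```python
import re

FENCE = "`" * 3  # markdown code fence, built this way so the example nests cleanly

FILLER_PHRASES = [
    "I'd be happy to help you with that",
    "Sure, let me take a look at that",
    "Great question!",
    "Certainly!",
]


def strip_filler(text: str) -> str:
    """Remove filler phrases from prose; leave fenced code blocks as-is."""
    # Splitting on a capturing group keeps the code blocks at odd indices.
    pattern = re.compile(f"({FENCE}.*?{FENCE})", re.DOTALL)
    out = []
    for i, part in enumerate(pattern.split(text)):
        if i % 2 == 1:
            out.append(part)        # code block: preserved verbatim
            continue
        for phrase in FILLER_PHRASES:
            part = part.replace(phrase, "")
        out.append(re.sub(r"[ \t]{2,}", " ", part))  # tidy leftover spaces
    return "".join(out).strip()


reply = (
    "Great question! Token expiry check broken. Fix:\n"
    f"{FENCE}python\nmsg = 'Certainly!'\n{FENCE}"
)
print(strip_filler(reply))
```

The string 'Certainly!' inside the code block survives, while the same kind of phrase in prose is dropped.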
Integration Pattern — Using in a Project
If you want caveman-style compression in your own Claude API calls:
import anthropic
client = anthropic.Anthropic()  # uses ANTHROPIC_API_KEY env var
# Load the caveman SKILL.md as a system prompt addition
with open("path/to/caveman/SKILL.md", "r") as f:
    caveman_skill = f.read()
response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    system=f"{caveman_skill}\n\nRespond in caveman mode: full intensity.",
    messages=[
        {"role": "user", "content": "Why is my React component re-rendering?"}
    ],
)
print(response.content[0].text)
# → "New object ref each render. Inline prop = new ref = re-render. useMemo fix."
print(f"Tokens used: {response.usage.output_tokens}")  # ~19 vs ~69
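To switch intensity per call, the system prompt from the example above can be built by a small helper. This is a sketch under assumptions: `build_system_prompt` is a hypothetical function, and it simply reuses the "Respond in caveman mode: ... intensity." wording from the example, which may not be the exact phrasing the skill expects.

```python
VALID_LEVELS = ("lite", "full", "ultra")


def build_system_prompt(skill_text: str, level: str = "full") -> str:
    """Append an intensity instruction to the loaded SKILL.md text."""
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown caveman level: {level!r}")
    return f"{skill_text}\n\nRespond in caveman mode: {level} intensity."


# Placeholder text stands in for the real SKILL.md contents.
print(build_system_prompt("<contents of SKILL.md>", "ultra"))
```

The returned string would be passed as the `system=` argument of `client.messages.create(...)`.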
Session Workflow
# Start session with caveman
/caveman full
# Ask technical questions normally — agent responds in caveman
> Why does my Docker build take so long?
→ "Layer cache miss. COPY before RUN npm install. Fix order:"
[code block shown normally]
# Switch intensity mid-session
/caveman lite
# Turn off for PR description writing
/caveman off
> Write a PR description for this auth fix
→ [normal, professional prose]
# Back to caveman
/caveman
Troubleshooting
Caveman mode not activating:
# Verify plugin installed
claude plugin list | grep caveman
# Reinstall
claude plugin remove caveman
claude plugin install caveman@caveman
Savings lower than expected:
- Caveman only compresses output tokens — input tokens unchanged
- Tasks whose output is mostly code see smaller savings, since code is preserved verbatim (the 22% async/await refactor row in the benchmark table is likely an example)
- Reasoning/thinking tokens not affected — savings show in visible response only
- Ultra mode gets maximum compression; switch if full mode feels verbose
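A rough way to reason about the code-heavy case: if code is kept verbatim and only prose is compressed, the overall saving is about (1 - code share) × prose saving. The sketch below uses made-up inputs, not measured data, and `expected_saving` is a hypothetical helper.

```python
def expected_saving(code_fraction: float, prose_saving: float) -> float:
    """Estimate overall output-token saving when code is kept verbatim.

    code_fraction: share of the normal response's tokens inside code blocks (0..1)
    prose_saving:  share of prose tokens that caveman mode removes (0..1)
    """
    if not (0 <= code_fraction <= 1 and 0 <= prose_saving <= 1):
        raise ValueError("both arguments must be between 0 and 1")
    return (1 - code_fraction) * prose_saving


# Illustrative only: assume prose shrinks by 80% at every code share.
for code_fraction in (0.0, 0.25, 0.5, 0.75):
    saving = expected_saving(code_fraction, 0.8)
    print(f"code {code_fraction:.0%}: overall saving {saving:.0%}")
```

Under these assumptions, a response that is three-quarters code saves only about 20% overall, even though its prose shrank by 80%.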
Need normal mode for specific output:
/caveman off # for PR descriptions, user-facing docs, formal reports
/caveman # re-enable after
Benchmarking your own tasks:
cd benchmarks/
export ANTHROPIC_API_KEY=your_key_here
python run_benchmarks.py --task "your custom task description"
Why It Works
Backed by a March 2026 paper "Brevity Constraints Reverse Performance Hierarchies in Language Models": constraining large models to brief responses improved accuracy by 26 percentage points on certain benchmarks. Verbose not always better.
TOKENS SAVED ████████ 65% avg (up to 87%)
TECHNICAL ACCURACY ████████ 100%
RESPONSE SPEED ████████ faster (less to generate)
READABILITY ████████ better (no wall of text)
Key Files
caveman/
├── SKILL.md # the skill definition loaded by Claude Code
├── benchmarks/
│ ├── run_benchmarks.py # reproduce token count results
│ └── compare.py # side-by-side comparison tool
├── plugin.json # Codex plugin manifest
└── README.md
Links
- Repo: https://github.com/JuliusBrussee/caveman
- Homepage: https://juliusbrussee.github.io/caveman/
- Also by Julius: Blueprint — spec-driven dev for Claude Code
One rock. That it. 🪨