If you have been using Claude Code for a while, you have probably already noticed how the bills add up. Not because Claude is doing anything wrong. It is just very chatty. Ask it to fix a bug and it will say “Certainly! I’d be happy to help you with that. The issue you’re experiencing is most likely caused by…” and three sentences later it finally tells you what you already suspected. Those sentences cost tokens. Every single one.
There is a fix. It is called Caveman, it was built in April 2026 by a developer named Julius Brussee, and it tells Claude to stop being so polite and just get to the point. If you already read the Claude Token Saver post, you know why it works. This article is just the install guide. I will show you exactly how to install caveman in Claude Code and get it running, step by step.
I tried the two-command plugin install the first time and it just… hung. Nothing happened for about 40 seconds and I thought I had broken something. Turned out the marketplace request was just slow. I will flag those kinds of things as we go.
What is the Caveman Plugin for Claude Code?
So, the caveman plugin for Claude is basically a set of instructions that gets dropped into your existing agent setup. It tells the model to compress its output. Drop the articles. Drop the pleasantries. Drop the hedging. Keep the code, keep the technical terms, keep the accuracy. Just cut everything else.
The project’s own benchmark suite shows a 65% mean output-token reduction across 10 tasks. The GitHub repo hit 54,000 stars in under three weeks after launch, which tells you developers were feeling this pain. And there is a 2026 paper on arXiv (2604.00025) called “Brevity Constraints Reverse Performance Hierarchies in Language Models” that found constraining models to brief responses improved accuracy by 26 percentage points on certain benchmarks. So the responses are not just shorter. They might actually be more correct.
Now, I want to be upfront about something because a lot of posts oversell this. That 65% reduction applies to output tokens only. The thinking and reasoning tokens are untouched. And in a real session, prose responses are only a small fraction of your total token usage. Most tokens come from the input side: conversation history, file contents Claude reads, your system prompt. In a typical 100,000-token session, prose responses account for roughly 6,000 tokens. Caveman compresses those by about 65%, saving maybe 4,000 tokens. That is around 4% of your total session.
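Here is that back-of-the-envelope math written out. The 6,000-token prose figure is the rough estimate from the paragraph above, not a measurement, so plug in your own numbers:

```latex
\begin{aligned}
\text{prose output per session} &\approx 6{,}000 \text{ tokens} \\
\text{tokens saved} &\approx 0.65 \times 6{,}000 = 3{,}900 \approx 4{,}000 \\
\text{share of a } 100{,}000\text{-token session} &\approx \tfrac{3{,}900}{100{,}000} = 3.9\% \approx 4\%
\end{aligned}
```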
Not life-changing on its own. But it is free, takes two minutes to install, and once you combine it with /caveman-compress on your CLAUDE.md file (which cuts input tokens too), the savings start to add up more. That part I will cover later.
Prerequisites: What You Need Before Starting
Keep this short because there really is not much. You need:
Claude Code CLI installed and working. Run claude --version in your terminal. If that gives you an error, sort out the Claude Code install before touching caveman.
Node.js 18 or higher. The installer needs Node 18 minimum. Check yours: node --version. If you are on something older, update via nvm: nvm install 20 && nvm use 20.
An active Anthropic API key. Run echo $ANTHROPIC_API_KEY in your terminal. If it prints nothing, set it: export ANTHROPIC_API_KEY="your-key-here". Get or check keys at platform.anthropic.com.
A working terminal. macOS Terminal, Linux bash, or WSL2 on Windows. The native Windows PowerShell path works too and I will cover that separately.
That is it. No Python required, no special build tools, nothing else.
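If you want to check all of that in one go, here is a minimal POSIX shell sketch. It is not part of caveman: the script and its function names (node_major, node_version_ok, preflight) are my own, and it assumes node --version prints something like v20.11.1. It checks that the API key is set without printing it.

```shell
#!/bin/sh
# Preflight sketch for the prerequisites above: Claude Code CLI, Node 18+, API key.
# Assumption: `node --version` prints something like "v20.11.1".

MIN_NODE_MAJOR=18

# Print the major version from a string such as "v20.11.1" -> "20".
node_major() {
  v=${1#v}           # drop a leading "v" if present
  echo "${v%%.*}"    # keep everything before the first dot
}

# Succeed if the given Node version string is at least MIN_NODE_MAJOR.
node_version_ok() {
  major=$(node_major "$1")
  case $major in
    '' | *[!0-9]*) return 1 ;;   # empty or non-numeric: treat as not OK
  esac
  [ "$major" -ge "$MIN_NODE_MAJOR" ]
}

preflight() {
  status=0

  if command -v claude >/dev/null 2>&1; then
    echo "ok: claude CLI found"
  else
    echo "missing: claude CLI (fix the Claude Code install first)"; status=1
  fi

  if command -v node >/dev/null 2>&1; then
    nv=$(node --version)
    if node_version_ok "$nv"; then
      echo "ok: node $nv"
    else
      echo "too old: node $nv (try: nvm install 20 && nvm use 20)"; status=1
    fi
  else
    echo "missing: node"; status=1
  fi

  # Check the key is set without echoing the secret to the screen.
  if [ -n "${ANTHROPIC_API_KEY:-}" ]; then
    echo "ok: ANTHROPIC_API_KEY is set"
  else
    echo "missing: ANTHROPIC_API_KEY"; status=1
  fi

  return "$status"
}

preflight || echo "Fix the items marked above, then rerun."
```

Save it as something like preflight.sh and run sh preflight.sh before you start the install.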
Step 1: How to Install Caveman in Claude Code
This is the section you are actually here for. To install Caveman in Claude Code the recommended way, open your terminal and run these two commands:
claude plugin marketplace add JuliusBrussee/caveman
claude plugin install caveman@caveman
Done. Two commands. The first adds caveman to your local marketplace registry. The second installs it as a plugin and wires up all the hooks automatically.
As I mentioned, that first command sometimes hangs for 30–40 seconds. Do not Ctrl+C it. The marketplace request goes to GitHub, and on a slow connection it just looks like nothing is happening. It will finish.
If you are on Windows without WSL2, use PowerShell instead:
irm https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.ps1 | iex
If you want the universal installer that auto-detects all your agents at once (Claude Code, Cursor, Codex, Gemini CLI, and 30+ others), this one command handles everything:
# macOS / Linux / WSL
curl -fsSL https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.sh | bash
That wires Claude Code hooks, a statusline badge, and the caveman-shrink MCP middleware. It skips anything you do not have installed and is safe to re-run. Takes about 30 seconds.
After installation, restart Claude Code. You will see a small badge in the statusline confirming the plugin loaded.
Step 2: Setting Your Intensity Level (Lite, Full, or Ultra)
This is the part most install guides skip, and it is actually worth understanding before you just blast everything with /caveman and wonder why Claude sounds like it is annoyed at you.
Caveman has six intensity levels. You switch between them with slash commands inside your Claude Code session:
/caveman lite # drops filler words, grammar stays readable
/caveman # default full compression, standard caveman style
/caveman full # same as above, explicit
/caveman ultra # telegraphic, heavy abbreviations, very terse
/caveman wenyan # classical Chinese compression patterns, maximum brevity
/caveman off # turns it off for the session
For most solo developers, /caveman or /caveman full is the right starting point. Lite is good if you sometimes share sessions with teammates who are not used to terse output. Ultra is fine for mechanical tasks like "add a null check here" but can get jarring on anything that involves explanation.
The wenyan mode is a curiosity. Classical Chinese is apparently one of the most token-efficient written forms humans ever developed because its grammar omits articles, copulas, and subjects aggressively. It works, but unless your team reads classical Chinese, keep it off shared codebases. I tried it once out of curiosity and sent a response to a colleague who had no idea what they were looking at.
The levels persist for the whole session until you change them or say “normal mode.” You can also trigger caveman with natural language if you forget the slash command: just type “caveman mode” or “be brief” or “less tokens” and it activates.
There is also /caveman-commit for terse git commit messages (keeps subjects under 50 characters, focuses on why over what) and /caveman-compress which rewrites your CLAUDE.md memory file into caveman style to cut input tokens. That second one is worth running on any project where your CLAUDE.md has grown big.
Step 3: Triggering Caveman Mode and Checking Stats
Start Claude Code in any project:
claude
Then inside the session, type /caveman. You will see: caveman mode on. brain big. mouth small.
Ask it something normal. Like:
why does my docker build take so long
Without caveman you would get a paragraph explaining layer caching, then another paragraph about best practices, then the fix. With caveman you get:
Layer cache miss. COPY before RUN npm install. Fix order:
[corrected Dockerfile]
Same answer. The long version did not teach you anything extra. It was politeness. Padding.
To see your actual token savings, use the stats command:
/caveman-stats
This reads your Claude Code session log, counts tokens saved, and writes the number to your statusline. It also gives you a lifetime savings number and an estimated USD figure. You can share it with --share if you want a tweetable line. I checked mine after a week of real use and it had saved around 38,000 output tokens. Not massive, but not nothing.
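To turn a token count into dollars yourself, here is a tiny shell sketch. The $15-per-million output-token rate is an assumption (the Sonnet-class list price as far as I know, but check Anthropic's pricing page for your model), and tokens_to_usd is just a helper name I made up:

```shell
#!/bin/sh
# Rough dollar value of saved output tokens.
# Assumption: $15 per million output tokens; pass your model's actual rate instead.

# tokens_to_usd <tokens> <usd_per_million_tokens>: print the cost with two decimals.
tokens_to_usd() {
  awk -v t="$1" -v p="$2" 'BEGIN { printf "%.2f\n", t * p / 1000000 }'
}

# The ~38,000 output tokens from a week of use, at the assumed rate:
tokens_to_usd 38000 15
```

Swap in your own numbers, for example tokens_to_usd 120000 15.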
One thing worth knowing: beginners should not use this. The tokens caveman removes are not filler for someone still building their mental model. They are the explanation. If you are learning a new framework or debugging something you genuinely do not understand yet, turn caveman off for that conversation.
How to Save Tokens in Claude Code: Before vs. After
Let me show you actual numbers. Same prompt, same task, same model (Sonnet 4.6). Asking Claude to explain why a React component is re-rendering:
Setup | Response Length | Tokens Used | Approx Cost
Normal Claude Code | 69 words | 1,252 tokens | ~$0.0019
With Caveman | 19 words | 410 tokens | ~$0.00063
Same fix. Same answer. One version spent roughly 840 extra tokens (1,252 vs. 410) on explanation text and polite framing.
Multiply that across a full workday, say 40 prompts at similar verbosity, and you save something real, especially if you are on a heavy API plan. At team scale it adds up faster.
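Here is the arithmetic behind that, using the numbers from the table and assuming, hypothetically, that all 40 prompts look like this one:

```latex
\begin{aligned}
\text{saved per prompt} &= 1{,}252 - 410 = 842 \text{ tokens} \quad (842 / 1{,}252 \approx 67\%) \\
\text{saved per day} &= 40 \times 842 = 33{,}680 \text{ tokens} \\
\text{cost saved per day} &= 40 \times (\$0.0019 - \$0.00063) = 40 \times \$0.00127 \approx \$0.05
\end{aligned}
```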
The bigger win, honestly, is running /caveman-compress on your CLAUDE.md. That rewrites your natural-language memory file into caveman format, cutting its token count by about 46%, and that saving applies to every session for the life of the project. Input tokens are a bigger share of your total cost than output tokens, so this is where it actually starts to move the needle.
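To get a feel for the CLAUDE.md savings, take a hypothetical 2,000-token CLAUDE.md and 60 sessions a month (both numbers are made up for illustration), and count the file once per session, which is conservative if it gets resent with every request:

```latex
\begin{aligned}
\text{tokens saved per session} &= 0.46 \times 2{,}000 = 920 \\
\text{tokens saved per month} &= 60 \times 920 = 55{,}200
\end{aligned}
```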
If you are spending $50 a month on Claude output tokens, you are looking at roughly $33 back. That is the 65% headline applied to your output spend only, not your whole bill, but it is real savings on top of a free two-minute install.
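The $33 is just the 65% output-token reduction applied to that hypothetical $50:

```latex
0.65 \times \$50 = \$32.50 \approx \$33
```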
The Smart Safety Override: When Caveman Auto-Pauses
This is a feature I did not know about until I accidentally ran a database migration task with caveman on and expected to see a terse one-liner confirm the action. What I got was a full normal-English explanation of what was about to happen and a clear confirmation prompt.
Caveman automatically drops back to normal prose for certain situations. Specifically: security warnings, irreversible action confirmations, and multi-step sequences where fragments could be misread. So if Claude is about to do something destructive, like dropping a table or removing files, it pauses caveman mode, writes the warning clearly, and resumes terse mode after.
This is defined in the skill file itself and preserved across SKILL.md edits. You can rely on it.
It also pauses if it detects you are confused or repeating a question. If Claude notices you asked basically the same thing twice, it will temporarily switch back to a fuller explanation, then resume caveman. The first time this happened to me I thought caveman had turned off. It had not. It just sensed I was lost.
So you do not need to babysit it for dangerous commands. But it is good to know why you will sometimes see a full-prose warning even mid-session.
Troubleshooting Common Installation Errors
Plugin install hangs or gives “marketplace not found.” This is usually the Claude Code marketplace registry being slow or the first command not completing. Try just the first command again: claude plugin marketplace add JuliusBrussee/caveman. If it keeps failing, use the curl installer. It does the same thing but goes directly to GitHub and tends to be more reliable on slow connections.
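If you are scripting the install (say, for a team), here is a rough POSIX shell sketch of that advice: retry the marketplace step, then fall back to the curl installer. The retry helper, the attempt count, and the RETRY_DELAY variable are my own choices, not part of caveman. It only helps when the command actually exits with an error; a request that is merely slow still just needs patience.

```shell
#!/bin/sh
# Sketch: retry the marketplace step, then fall back to the curl installer.
# retry(), the attempt count, and RETRY_DELAY (seconds, default 5) are hypothetical choices.

# retry <attempts> <command...>: rerun the command until it succeeds or attempts run out.
retry() {
  attempts=$1
  shift
  n=1
  while [ "$n" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $n/$attempts failed: $*" >&2
    if [ "$n" -lt "$attempts" ]; then
      sleep "${RETRY_DELAY:-5}"
    fi
    n=$((n + 1))
  done
  return 1
}

install_caveman() {
  if retry 3 claude plugin marketplace add JuliusBrussee/caveman; then
    claude plugin install caveman@caveman
  else
    echo "marketplace add kept failing; using the curl installer instead" >&2
    curl -fsSL https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.sh | bash
  fi
}

# Only try to install when the claude CLI is on PATH.
if command -v claude >/dev/null 2>&1; then
  install_caveman
else
  echo "claude CLI not found; install Claude Code first" >&2
fi
```

Remember to restart Claude Code afterwards either way, since the plugin loads on startup.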
“/caveman not recognized” after install. You probably did not restart Claude Code. The plugin loads on startup. Close and reopen your session, then try /caveman again. If still not working, run claude plugin list | grep caveman to check if it actually registered. If it is not there, remove and reinstall: claude plugin remove caveman then claude plugin install caveman@caveman.
Node.js errors during install. The installer needs Node 18 or higher. If you see “SyntaxError: Unexpected token” or “engine unsupported,” your Node is too old. Run node --version to check. Update to Node 20 via nvm and retry.
API key errors after enabling caveman. Caveman does not touch authentication at all, so auth errors are a separate issue. Set your key with claude config, or export it directly: export ANTHROPIC_API_KEY="your-key-here" (no extra spaces inside the quotes). Get keys at platform.anthropic.com. You can also run /logout and then /login inside Claude Code to reset the session token.
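Since echo $ANTHROPIC_API_KEY prints your secret to the screen (and into your scrollback), here is a small shell sketch that only checks the key is non-empty and free of stray whitespace. The key_looks_clean name is hypothetical, and the check never contacts Anthropic, so a well-formed but revoked key still passes:

```shell
#!/bin/sh
# Sketch: sanity-check ANTHROPIC_API_KEY without printing the secret itself.
# Only checks "non-empty, no whitespace"; it does not validate the key with Anthropic.

# Returns 0 if the value is non-empty and contains no spaces, tabs, or newlines.
key_looks_clean() {
  case $1 in
    '') return 1 ;;
    *[[:space:]]*) return 1 ;;
  esac
  return 0
}

if key_looks_clean "${ANTHROPIC_API_KEY:-}"; then
  echo "ANTHROPIC_API_KEY is set (${#ANTHROPIC_API_KEY} characters, no whitespace)"
else
  echo "ANTHROPIC_API_KEY is empty or contains whitespace; re-export it, e.g.:"
  echo '  export ANTHROPIC_API_KEY="your-key-here"'
fi
```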
Windows: symlinks fail during npx install. Add the --copy flag: npx skills add JuliusBrussee/caveman --copy. That skips symlinks entirely and copies the files directly. Works every time.
Final Thoughts: Shrink the Agent, Save Your Wallet
Caveman is not magic. The real session savings are more like 4–8% when you just run the default mode. The headline 65% is output-token reduction on prose, which is a fraction of your total usage.
But stack it right and you can get real savings: run /caveman full for day-to-day tasks, use /caveman-compress to shrink your CLAUDE.md, and combine both with prompt caching on your memory files. The Prompt Shelf benchmarked it at $33 back per $50 spent on output tokens, which is nothing to ignore.
And there is that accuracy thing. Verbose preambles may actually hurt reasoning: the brevity paper found that constraining models to short answers improved accuracy on some benchmarks. Forcing Claude to be brief may remove the part where it talks itself into a wrong answer. That alone might be worth it for you even before counting tokens.
For advanced strategies and the full breakdown of how much you can actually save across a real dev session, go check the Claude Token Saver guide. That post goes much deeper on the input-token side and when to combine caveman with other cost-reduction approaches.