Cut Claude Code Token Costs 60% With CLAUDE.md and .claudeignore

Cut Claude Code Token Costs 60% With CLAUDE.md and .claudeignore

I was sitting in a code review last month, watching a senior dev push a PR, and I noticed something weird. Along with the usual code changes there was this agents.md file sitting in the diff. I thought, okay, what is this, some kind of documentation thing? I almost asked him to explain it but then I figured I'd just watch what happened over the next few sessions.

And what I saw made sense once I understood what was actually going on under the hood.

The problem with any AI coding tool Claude Code, Cursor, whatever is that every new session starts from zero. Claude doesn’t remember what you decided last Tuesday about the architecture. It doesn’t know you’re on Postgres 16 with a specific migration setup, or that you spent an hour two weeks ago discussing and rejecting a particular approach. So what does it do? It re-reads everything. Your codebase, your folder structure, related files as much as it can fit in the context window, every single time. That costs tokens. A lot of them. And once I understood that, the agents.md file made complete sense.

But that file is just one piece. There are actually several things you can do mostly just creating a few extra files and learning a couple of built-in commands that will save you a lot of tokens without losing anything in terms of code quality.

First, Understand What You’re Actually Paying For

Most people think token costs in Claude Code are mainly about what Claude generates the code it writes, the explanation it gives. That’s a normal assumption. It’s also mostly wrong.

The bigger cost is input tokens, and specifically, the way conversations grow. Every message you send in a session re-sends your entire conversation history. So message number 50 in a long session is paying for messages 1 through 49 as input, plus whatever files Claude is reading, plus the system prompt running in the background. This is why a session can feel cheap for the first hour and then suddenly starts burning through your quota in the last twenty minutes.

There’s also something called context rot, which is a separate but related problem. As the context window fills up, the quality of Claude’s responses degrades. It starts forgetting decisions you made earlier in the session. It suggests approaches you already ruled out. It loses track of the overall goal and starts optimising for local problems. So token management is not just a cost issue it’s a quality issue too.

The fixes for all of this fall into two categories: setup things you do once per project, and session habits you develop over time. The setup stuff has the bigger payoff.

CLAUDE.md (or agents.md): The Main One

Anthropic officially calls this CLAUDE.md. Some teams call it agents.md. It's the same idea a markdown file that Claude reads automatically at the start of every session, before it reads anything else.

The point is to give Claude a pre-built mental model of your project so it doesn’t have to build one from scratch by reading dozens of files. Architecture decisions, naming conventions, folder structure, which services talk to which, what patterns you’re using all of it in one place. Claude reads it, knows the shape of the project, and gets straight to work.

I tried this after watching that PR. I wrote maybe 200 lines for a mid-size Node.js service basically a brain dump of everything a new developer would need to understand the project. Things like “we use Zod for all API validation, never do manual type checks,” and “the auth flow goes through the middleware in /src/middleware/auth.ts, don't touch this unless you really have to," and "we're not using any ORM, raw SQL queries through the db helper in /src/lib/db.ts."

The next session, Claude stopped asking redundant questions. It stopped suggesting things we’d already decided against. It was noticeably more focused.

There’s a size issue here though, and this is where most people get it wrong. If your CLAUDE.md is 1,000 lines, every message you send triggers a full reload of all 1,000 lines. Even if you just say “hi.” The file loads on every single turn, not just at session start. So a 5,000-token CLAUDE.md is 5,000 tokens of baseline cost you’re paying constantly, for the entire session. A 5,000-token CLAUDE.md costs 5,000 tokens before you’ve typed a word. Every turn. Every session.

The sweet spot is under 200 lines. Every token in CLAUDE.md is a token you can’t use for conversation. Document only what Claude needs on every session, not everything about your project. Five rules and three file pointers, not a full architecture document.

You can also have multiple CLAUDE.md files in a monorepo one in the root, and one inside each service subfolder. Claude reads the closest one to the file it's working on. So each service gets its own focused context without the root file getting bloated.

One more thing: when establishing new patterns or making architectural decisions mid-session, you can explicitly tell Claude to preserve them “Document this pattern in CLAUDE.md so we maintain consistency.” This turns ephemeral session knowledge into persistent context. I’ve started doing this whenever we settle on something important during a session. Otherwise it’s gone the moment you close the terminal.

.claudeignore: The Boring One That Actually Matters

This one gets less attention but it’s probably the second most impactful thing on this list.

.claudeignore works exactly like .gitignore you list the files and folders you don't want Claude to touch. Without it, Claude might explore your node_modules, your build output folder, your generated code, your test fixtures with large JSON blobs. None of this helps Claude write better code. It just eats tokens.

A basic .claudeignore for a Node project looks something like this:

node_modules/
dist/
build/
.next/
coverage/
*.log
*.lock

I’ve also seen people add their migration history files here. If you have 300 SQL migration files going back three years, Claude doesn’t need to read them to help you write a new one. Add the migrations folder to .claudeignore and just include the schema file instead.

The amount of context this can cut in a large project is actually significant. One team I know was inadvertently including their entire public/ folder (full of images and compiled assets) in Claude's exploration. That alone was adding thousands of tokens per session.

Decisions and Architecture Files

This is less Claude-specific and more just good engineering practice, but it helps Claude a lot keeping a decisions.md or an adr/ folder (architecture decision records). Instead of Claude inferring why the code is structured a certain way by reading through it, you've written it down. "We chose Redis over Memcached in October 2023 because we needed pub/sub for the notification service." Claude doesn't have to guess. It reads it and moves on.

The difference between CLAUDE.md and decisions.md is that CLAUDE.md is for the what (current state of the project) and decisions.md is for the why (history of choices and the reasoning behind them). Both are useful but for different purposes.

Some teams also keep a todo.md or a current-work.md with work-in-progress notes. This sounds like overkill but it's actually saved a bunch of confusion. Without it, Claude sometimes tries to solve problems that are already being handled in another branch, or suggests changes that conflict with something in progress. A simple "auth refactor in progress — don't modify /src/auth/ for now" prevents that whole situation.

Session Habits That Save Tokens While You Work

The files handle the setup side. But you also burn a lot of tokens through session habits how you work inside Claude Code, not just how you configure it.

Plan mode before anything big. You can toggle this with Shift+Tab in Claude Code. In plan mode, Claude outputs a step-by-step plan without making any changes. You review the plan, cut anything unnecessary, then switch back to normal mode. This eliminates the biggest source of token waste: trial-and-error execution, where Claude tries things, hits errors, and iterates with each iteration costing tokens. I’ve started using this as a default before any task that involves more than two or three files. If Claude is about to read six files when I only need it to touch two, I’d rather catch that in plan mode than pay for six file reads.

Be specific with file paths. The more vague the task, the more likely Claude is to spend tokens opening several files, exploring dead ends, and reconstructing context you could have handed it directly. “Fix the auth bug” will cause Claude to explore. “Fix the session expiry check in src/auth/session.ts around line 45" will not. Same outcome, fraction of the tokens.

Use /compact before the session falls apart. This is a built-in command that compresses your conversation history into a structured summary, then starts a fresh session with that summary loaded. The key thing most people miss: run /compact at 60% context capacity, not 95%. By the time you’re nearly out of context, you’ve already paid a premium for an overstuffed conversation. You can also tell Claude what to preserve when you run it: /compact preserve the list of modified files and the auth refactor decision we made earlier. Anything you don't specify might get dropped. And you can put standing compaction instructions in your CLAUDE.md under a # Compact instructions heading so you don't have to type them every time.

Use /clear between unrelated tasks. Every message from previous tasks is still in context. It’s dead weight. Claude is processing information about feature A when all you need is help with bug B. A fresh session for a fresh task is almost always better than carrying old context that isn’t relevant anymore.

Use /btw for quick questions. For quick questions that don’t need to stay in context, use /btw. The answer appears in a dismissible overlay and never enters conversation history, so you can check a detail without growing context. I didn’t know this existed for a while and was using regular messages for things like “what’s the syntax for X again” — completely unnecessary context pollution.

Subagents: Delegate the Heavy Reading

This one is a bit more advanced but it’s genuinely useful once you understand the idea.

When Claude needs to research something read through a bunch of files, check how the auth system works, look at how tests are structured all of that reading happens in your main session’s context. It eats into the window that you need for actual implementation work.

Subagents run in separate context windows and report back summaries. You can delegate research with “use subagents to investigate X” they explore in a separate context, keeping your main conversation clean for implementation.

So instead of asking “how does our authentication handle token refresh?” and having Claude read ten files in your main session, you ask it to spin up a subagent to investigate that and bring back a summary. The subagent does the heavy reading in its own context. You get a clean summary back in your main session. The exploration tokens never touch your main window.

You can also create permanent subagents by creating an MD file at .claude/agents/investigator.md. After saving, you can trigger the workflow with /investigator "auth tests are failing" — Claude will run the investigation as a subagent automatically. This is useful for tasks you do repeatedly like debugging failing tests or checking for breaking changes where you want a consistent investigation workflow without typing it out every time.

The catch with subagents is that they don’t share context with each other or with your main session. So if you need the subagent’s findings to feed into another subagent’s work, you have to pass the information explicitly. I ran into this when I tried to chain two subagents the second one had no idea what the first one had found. A bit annoying, but it’s still better than running everything in one session.

The Bigger Problem With Messy Context

There’s something I didn’t fully appreciate until I started paying attention to this properly. Token costs and code quality are actually connected in a way that’s not obvious at first.

Every new message you send in Claude Code re-sends your entire conversation history as input tokens. That 200-message session you’ve been running for two hours? Message 201 costs as much in input tokens as messages 1 through 200 combined. This is why your credits seem fine for the first hour and then evaporate in the last fifteen minutes.

But the quality problem is separate. As the context window fills up, there’s a “lost in the middle” effect where the model pays most attention to the beginning and end of the session and loses track of things in the middle. So all those decisions you made at message 80 and message 120? Those are the ones most likely to get forgotten or contradicted by the time you’re at message 180.

The CLAUDE.md file and the session habits together address this. The file makes sure the foundational context survives across sessions. The habits — /compact, /clear, focused prompts — make sure individual sessions don’t balloon into a context mess.

I’ve seen reported savings of 40% to 70% on focused tasks from combining these approaches, though honestly I haven’t tracked my own numbers carefully enough to give a precise figure. What I can say is that the sessions feel different. Less drift. Less re-explanation. Less of that frustrating thing where Claude undoes something it did an hour ago because it forgot why it did it.

Putting It Together

If you’re just starting, this is the order I’d do things:

Start with .claudeignore. Takes five minutes, cuts noise immediately. Add your node_modules, build folders, generated files, anything Claude doesn't need to read to do its job.

Then write a CLAUDE.md. Keep it under 200 lines. Focus on things that aren’t obvious from reading the code your conventions, the non-obvious architecture decisions, the things Claude gets wrong if it doesn’t know them. If you find yourself writing something that’s just restating what’s already in the code, delete it.

Then start using /compact proactively, not reactively. Run it when you feel a session getting long, not when it hits the limit.

The subagents and the session habits come after that. They have real value but the setup files give you the biggest return for the least effort.

Writing agents.md forces something useful on you as well, separate from the Claude benefit. You have to actually articulate your architecture in clear language. I've sat down to write these files and realised I couldn't explain in plain terms why something was structured the way it was. That's usually a sign the structure itself is unclear, not just the documentation. So the file ends up being useful even when Claude isn't reading it.

The senior dev who introduced this to our team has since added it to the onboarding docs. New developers now read the CLAUDE.md files before they read the actual code. That wasn't the original plan. 

But it turned out to be a pretty good side effect of doing the AI stuff right.

Post a Comment

Previous Post Next Post