Claude Code Codebase Rewrite Review 2026

The project had been sitting in a messy state for a while: a Node.js backend of 40-something files, old authentication logic copy-pasted across six different routes, and tests that barely covered anything. It was the kind of codebase where you know where the problems are and just keep finding reasons not to fix them. The idea was simple enough: let Claude Code handle a proper rewrite of the auth system and a few other parts that had been collecting dust, then see what happens.

This is not a success story with a perfect ending. It’s more like a field report from the middle of something that’s still changing very fast.

What Claude Code Actually Does When You Hand It a Real Codebase

The first thing that surprises people is how it starts. Claude Code doesn’t ask for a brief or a summary. You type a prompt and it starts reading: not just the file you mentioned, but related files, imports, and the stuff three directories away that you forgot was connected. In January 2026, Anthropic published data from roughly 400,000 Claude Code sessions showing that users now average 20 hours per week on the tool. That number felt unbelievable until about day three of actually using it, when it became clear that sessions run longer than you expect because you’re not switching contexts every ten minutes.

The rewrite started with authentication. The prompt was something like: “Read the auth middleware and every route that calls it. Tell me what would break if we changed the interface.” That reading step before any changes is not optional. It took longer than expected just for Claude to map the dependencies, and that’s fine. The alternative is letting it guess.

But here’s the part nobody really talks about: what Claude Code does depends enormously on when you ask it. Between January and March 2026, Anthropic made a few internal changes that quietly broke a lot of developer workflows. A researcher analyzed 6,852 session files and found that the model’s read-to-edit ratio dropped from 21.8 reads per edit in late January to 1.6 by mid-March. Basically, it stopped doing its homework before touching files. The number of files read before each edit fell from 6.6 to 2.0. That’s not a small regression. That’s a model that went from careful to careless.
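Both metrics in that analysis are easy to compute if you have a log of tool calls. The following is a minimal sketch, assuming a hypothetical `ToolEvent` shape with a `tool` and a `file` field. The real session file format is not described here and may well differ, so treat the types and the sample data as illustrative only.

```typescript
// Hypothetical event shape: one entry per tool call in a session log.
// This is an assumption for illustration, not the real session file format.
type ToolEvent = { tool: "read" | "edit" | "other"; file: string };

// Total reads divided by total edits. Returns null when there are no edits,
// because the ratio is undefined in that case.
function readToEditRatio(events: ToolEvent[]): number | null {
  const reads = events.filter((e) => e.tool === "read").length;
  const edits = events.filter((e) => e.tool === "edit").length;
  return edits === 0 ? null : reads / edits;
}

// Average number of distinct files read since the previous edit.
function filesReadPerEdit(events: ToolEvent[]): number | null {
  const counts: number[] = [];
  let seen: Record<string, true> = {};
  for (const e of events) {
    if (e.tool === "read") seen[e.file] = true;
    if (e.tool === "edit") {
      counts.push(Object.keys(seen).length);
      seen = {}; // start counting again for the next edit
    }
  }
  if (counts.length === 0) return null;
  return counts.reduce((a, b) => a + b, 0) / counts.length;
}

// Made-up sample: three reads (two distinct files), then two edits back to back.
const sampleSession: ToolEvent[] = [
  { tool: "read", file: "auth.ts" },
  { tool: "read", file: "routes/users.ts" },
  { tool: "read", file: "auth.ts" },
  { tool: "edit", file: "auth.ts" },
  { tool: "edit", file: "routes/users.ts" },
];
```

Tracking these two numbers over your own sessions is one way to notice a regression like this before it shows up as broken code.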

So timing matters. What Claude Code can do for you in June 2026 is not what it could do in February 2026, and probably not what it’ll be able to do in September. That’s the reality of building on something that’s still being actively worked on.

The Rewrite: What Went Right

The auth refactor went reasonably well. Claude read the six files that had copy-pasted middleware, identified the pattern, proposed a plan before writing anything, and the output was clean enough that it only needed minor adjustments. More specifically, it replaced 340-odd lines scattered across routes with a single middleware file and updated all the imports automatically. That part was genuinely fast. Doing it by hand would have taken most of an afternoon just to find all the import chains.
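For readers who have not done this kind of consolidation, here is a rough sketch of what a single shared middleware can look like. It is not the project's actual code. The request and response types are declared inline so the example is self-contained, and `requireAuth`, `VerifyToken`, and `demoVerify` are hypothetical names introduced for illustration.

```typescript
// Minimal Express-like types, declared inline so the sketch is self-contained.
// A real project would use the types from its web framework instead.
type Req = { headers: Record<string, string | undefined>; userId?: string };
type Res = { statusCode: number; body?: unknown };
type Next = () => void;
type Middleware = (req: Req, res: Res, next: Next) => void;

// Hypothetical token verifier, injected so the middleware stays testable.
// Returns a user id for a valid token, or null otherwise.
type VerifyToken = (token: string) => string | null;

// One shared factory replaces the per-route copies of the same check.
function requireAuth(verify: VerifyToken): Middleware {
  return (req, res, next) => {
    const header = req.headers["authorization"] ?? "";
    const prefix = "Bearer ";
    const token = header.indexOf(prefix) === 0 ? header.slice(prefix.length) : "";
    const userId = token ? verify(token) : null;
    if (userId === null) {
      res.statusCode = 401;
      res.body = { error: "unauthorized" };
      return; // stop the chain; the route handler never runs
    }
    req.userId = userId;
    next();
  };
}

// Stand-in verifier for the example only; this is not a real token check.
const demoVerify: VerifyToken = (token) => (token === "valid-token" ? "user-1" : null);
const authMiddleware = requireAuth(demoVerify);
```

Each route then imports the one middleware instead of carrying its own copy, which is why a change to the interface touches one file plus the import lines.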

What Claude Code is good at — actually good at, not just in demos — is this kind of repetitive structural work. Renaming functions across 20 files, extracting validation logic into a separate module, updating tests to match a changed interface. Work where the pattern is clear and the main challenge is just doing it consistently across a lot of files. A developer who revisited Claude Code after a year of not using it noted that when they asked it to refactor authentication, it looked at existing middleware patterns first and matched the project style. Generic templates would have clashed with the codebase.

The plan mode helped with the bigger changes. You invoke /plan and instead of immediately editing, it explores the codebase using the Agent tool, designs the architecture, and produces a structured plan — covering data structures, function signatures, and the order of changes. One real working session described on Medium involved rewriting a reporting engine, which resulted in 817 insertions and 260 deletions across four files after a Claude-designed plan was approved and executed. The plan step is where you catch the bad ideas before they're already in the code.

So for isolated refactoring work with clear scope, Claude Code earns its place.

The Rewrite: What Went Wrong

The context problem is real, and it will hit you if you’re doing anything that spans more than 90 minutes of session time. A fresh session consumes roughly 20,000 tokens for the system prompt, tool definitions, and CLAUDE.md before you type a single thing. Quality starts degrading once usage reaches somewhere between 20 and 40 percent of the 200,000-token context window. The attention mechanism gives earlier instructions less weight as the window fills up.
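The arithmetic behind those figures is worth spelling out. This small sketch uses only the numbers from the paragraph above, which are rough estimates and not measured constants. The function and constant names are made up for the example.

```typescript
// Figures taken from the paragraph above; treat them as rough estimates.
const CONTEXT_WINDOW_TOKENS = 200_000;
const STARTUP_OVERHEAD_TOKENS = 20_000; // system prompt, tool definitions, CLAUDE.md
const DEGRADATION_START = 0.2; // lower bound of the 20 to 40 percent range
const DEGRADATION_END = 0.4; // upper bound of the same range

// Tokens of actual work available before usage reaches a given fraction of the window.
function workingBudget(fraction: number): number {
  const threshold = Math.round(CONTEXT_WINDOW_TOKENS * fraction);
  return Math.max(0, threshold - STARTUP_OVERHEAD_TOKENS);
}

const budgetAtLowerBound = workingBudget(DEGRADATION_START); // 40,000 - 20,000 = 20,000
const budgetAtUpperBound = workingBudget(DEGRADATION_END); // 80,000 - 20,000 = 60,000
```

On these assumptions, a session has somewhere between 20,000 and 60,000 tokens of working room before quality is expected to slip, which is far less than the headline window suggests.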

One developer lost three hours of refactoring work when the auto-compaction feature fired mid-session and erased all knowledge of decisions made earlier. Compaction retains only about 20 to 30 percent of details. You come back to your terminal and the model has basically forgotten what approach you agreed on.

The multi-session situation is even messier. If you run four Claude Code sessions against the same codebase — which is something developers do, because parallel agents are a real feature — each session compacts independently. You end up with four diverging summaries of what happened, and they contradict each other in ways that don’t surface until something breaks downstream.

There was also a period in early 2026 where the full-file write pattern became much more common. Instead of making surgical edits (change this line, update this import), the model started rewriting entire files. That is faster for the model and worse for code review: you’re staring at a diff that shows 400 lines changed when you asked for one function to be moved. The cause was the February 2026 thinking redaction change (technically called redact-thinking-2026-02-12). Intermediate thinking steps were hidden to reduce latency, and without them the model stopped checking surrounding context before editing.

The March 2026 source code leak added to the general chaos. A packaging error in the npm release accidentally bundled a 59.7 MB source map, which exposed 1,884 TypeScript files and 64,464 lines of Claude Code’s internal code. Community analysis found 250,000 failed API calls per day buried in internal telemetry. Not great for confidence in the tool’s reliability, though the fix for the transcript chain breaks landed in version 2.1.91 by early April.

The CLAUDE.md Problem Nobody Warned About

The CLAUDE.md file is the thing that makes Claude Code feel like it actually knows your project. You run /init, Claude analyzes the codebase, generates a starter file with build commands, test frameworks, and code patterns, and from that point on every session starts with Claude reading that file first.

Here’s the thing though: CLAUDE.md has limits that the documentation doesn’t shout about. Frontier models reliably follow around 150 to 200 instructions total. Claude Code’s own system prompt already consumes roughly 50 of them before you add anything. So your actual budget for CLAUDE.md rules is smaller than it looks. Go past a couple hundred lines and you get context rot — the rules that matter quietly stop being applied.

The project’s CLAUDE.md grew to about 280 lines over a week of iteration, and naming conventions started being ignored again. Variable names like buf and cnt reappeared despite explicit rules against abbreviations. The model wasn’t broken. It had just stopped reading the bottom third of the file with full attention. Cutting the file back to under 150 lines and moving detailed rules into separate files under .claude/rules/ fixed most of it.
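A cheap guard against this kind of drift is a check that fails when the rules file grows past a budget. The sketch below assumes a 150-line limit, which is only the figure that happened to work for this project, and it counts non-blank lines, which is a choice rather than a rule. `checkRulesFile` is a hypothetical helper, not part of Claude Code.

```typescript
// Assumed limit, based on the 150-line figure that worked for this project.
const MAX_RULE_LINES = 150;

type RulesReport = { lineCount: number; overBudget: boolean; excess: number };

// Counts non-blank lines in a rules file and reports whether it exceeds the budget.
function checkRulesFile(contents: string, maxLines: number = MAX_RULE_LINES): RulesReport {
  const lineCount = contents
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "").length;
  const excess = Math.max(0, lineCount - maxLines);
  return { lineCount, overBudget: excess > 0, excess };
}
```

In a Node project the `contents` string would come from reading CLAUDE.md, and the check could run in CI or a pre-commit hook so the file cannot quietly creep back up to 280 lines.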

The other thing CLAUDE.md can’t fix: if your context window is getting full, rules from early in the session get pushed out. The anchor that’s supposed to keep Claude consistent across a long session gradually loses grip. Some developers have moved to externalizing checkpoint logic entirely — writing structured checkpoint files on session stop, then injecting only the relevant delta at the start of the next session, not the full accumulated history.
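The checkpoint-and-delta idea can be sketched in a few lines. Everything here is an assumption made for illustration: the `Checkpoint` shape, the field names, and the text format of the rendered delta. Writing the checkpoint to disk and hooking it into session start and stop are left out.

```typescript
// Hypothetical checkpoint shape; field names are assumptions for illustration.
type Checkpoint = {
  sessionId: string;
  decisions: Record<string, string>; // topic -> agreed approach
};

// Returns only the decisions that are new or changed since the previous checkpoint.
function checkpointDelta(
  previous: Checkpoint | null,
  current: Checkpoint
): Record<string, string> {
  const delta: Record<string, string> = {};
  for (const topic of Object.keys(current.decisions)) {
    const changed =
      previous === null || previous.decisions[topic] !== current.decisions[topic];
    if (changed) delta[topic] = current.decisions[topic];
  }
  return delta;
}

// Renders the delta as a short block of text to inject at the start of a session.
function renderDelta(delta: Record<string, string>): string {
  return Object.keys(delta)
    .sort()
    .map((topic) => `- ${topic}: ${delta[topic]}`)
    .join("\n");
}
```

The point of the design is that the next session receives a handful of lines describing what changed, not the full accumulated history that filled the window in the first place.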

What Anthropic Says vs. What Actually Happens

The official guidance is that you should describe outcomes in plain language, Claude handles the rest, and developers can focus on architecture and product thinking. At Anthropic internally, the majority of code is now written by Claude Code, with engineers focused on orchestration and direction rather than line-by-line implementation.

That’s probably true for Anthropic’s internal setup, where the infrastructure around Claude Code is tightly controlled and the engineers are expert Claude Code users. The experience is different when you’re a developer who started using the tool three weeks ago, doesn’t have a CLAUDE.md yet, and just typed “refactor the auth system.”

Anthropic’s own internal testing found that unguided attempts succeed about 33 percent of the time. The gap between that and the higher success rates experienced users report comes down to one thing: structure built before execution. The difference isn’t prompt quality. It’s the CLAUDE.md, the rule files, the plan step, the commit discipline — all of it combined.

So the marketing framing is not exactly wrong, it’s just missing context. You can hand Claude Code a project and describe what you want. But the results will be substantially different depending on whether you’ve spent thirty minutes setting up the scaffolding first.

What the Numbers Actually Show

The Anthropic research report from April 2026 covers data from roughly 235,000 people across 400,000 sessions between October 2025 and April 2026. A few findings that don’t get mentioned enough:

The share of sessions spent fixing broken code fell from 33 percent to 19 percent over those seven months. The share of sessions operating software grew from 14 to 21 percent. Writing and data analysis roughly doubled, from 10 to 20 percent. The estimated economic value of the average session rose 27 percent over that same period.

But also: the more expertise a developer brings, the more work Claude does per instruction. The gap between novice sessions and intermediate sessions is large. The gap between intermediate and expert is smaller, but still meaningful. What this means practically is that non-developers using Claude Code to build things from scratch — which is something Anthropic actively markets — are working with the tool at its hardest difficulty setting.

For the rewrite project, success rate on individual tasks was high when the tasks were specific. “Read these three files and extract the validation logic into a new file called validators.ts with these exact exports” — that worked almost every time. “Clean up the auth system” — that’s where things got inconsistent.
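To make the contrast concrete, this is roughly what the target of a well-scoped extraction prompt looks like: one file, a fixed list of exports, nothing else. The contents below are hypothetical, since the project's real exports are not shown here, and the validation rules are deliberately simple placeholders.

```typescript
// validators.ts (hypothetical contents; the original project's exports are not shown)
export type ValidationResult = { ok: true } | { ok: false; error: string };

// Deliberately simple shape check, not a full RFC 5322 validator.
export function validateEmail(value: string): ValidationResult {
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(value)
    ? { ok: true }
    : { ok: false, error: "invalid email" };
}

// Placeholder rule: the minimum length of 12 is an assumption for the example.
export function validatePassword(value: string): ValidationResult {
  return value.length >= 12
    ? { ok: true }
    : { ok: false, error: "password must be at least 12 characters" };
}
```

Naming the file and the exports up front gives the model a checkable finish line, which “clean up the auth system” does not.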

Things That Were Not Expected Going In

The context window marketing is misleading. Claude Opus 4.6 advertises a 1 million token context window. In practice, based on a detailed bug report from a developer doing heavy Claude Code sessions, the model’s performance degraded well before hitting 50 percent of that window. At 20 percent usage: circular reasoning and forgotten decisions. At 40 percent: context compression kicked in. At 48 percent: the model itself recommended starting a fresh session. If the effective high-quality context is somewhere around 400,000 tokens (40 percent of the advertised figure), that gap is worth understanding before you plan a major project around it.

The read-before-edit discipline is not automatic. Early sessions with no CLAUDE.md instruction produced a pattern where the tool jumped to editing without reading dependencies first. One community guide puts it clearly: “When a developer types ‘refactor this file’ without constraints, Claude Code can rewrite the target file, update imports, and rename exports in a single action before the developer has reviewed the dependency graph.” The manual refactoring process builds a mental model first. Without explicit instruction, Claude Code skips that step.

And the parallel agents feature, while real, requires more coordination infrastructure than it initially appears. Four independent agents working on the same codebase, each with their own compacting context, can produce changes that are internally consistent but mutually incompatible. The patterns that seem to work require an external coordination layer that owns the session lifecycle rather than letting each session manage its own state.
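One small piece of such a coordination layer is a registry of which agent may edit which files. The sketch below is a hypothetical in-memory version, written under the assumption that all agents are started by one orchestrating process. Agents running as separate processes would need shared storage, such as a lock file or a database row, and `FileClaims` is an invented name.

```typescript
// Hypothetical in-memory claim registry. A real setup with separate processes
// would need shared storage so agents can see each other's claims.
class FileClaims {
  // Created without a prototype so file names never collide with built-in keys.
  private owners: Record<string, string> = Object.create(null);

  // Claims every requested file or none, so two agents never hold overlapping sets.
  claim(agentId: string, files: string[]): boolean {
    for (const file of files) {
      const owner = this.owners[file];
      if (owner !== undefined && owner !== agentId) return false;
    }
    for (const file of files) this.owners[file] = agentId;
    return true;
  }

  // Releases all files held by an agent once its changes are merged or discarded.
  release(agentId: string): void {
    for (const file of Object.keys(this.owners)) {
      if (this.owners[file] === agentId) delete this.owners[file];
    }
  }
}
```

This does nothing about diverging summaries, but it does prevent the most direct failure: two agents rewriting the same file from different assumptions.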

Where It Actually Makes Sense

For the project that started this whole thing: the auth rewrite got done, the tests got updated, the repeated middleware got consolidated. Total time was probably a day and a half including all the back-and-forth and the session restart after context got too large. Doing it manually would have taken three or four days and probably would have missed a few import chains.

The sweet spot for Claude Code is work where the pattern is clear, the scope is bounded, and you have a CLAUDE.md that sets the rules upfront. The gap between what it can do in those conditions and what it can do without them is large enough that it’s almost like two different tools.

The part that still doesn’t work reliably is anything involving deeply coupled logic where a change in one place has ripple effects the model can’t fully trace before acting. Security logic, anything with subtle side effects, anything where wrong is worse than slow. Those still need a human who understands the codebase at the architectural level — and probably always will.

The developer who built a full RTS game using Claude Code exclusively put it this way: he eventually scrapped it not because the code was bad, but because he had “cognitively offloaded all his work to AI and therefore lost complete touch with the underlying code.” He couldn’t fix it himself when something broke. That tradeoff is worth thinking about before you start any project that Claude Code will be doing most of the work on.

So What Now

Claude Code as of June 2026 is a real tool with real limitations, most of them documented in GitHub issue threads that you have to go looking for. The March regression issues are mostly patched. The context management situation has improved with compaction updates. The CLAUDE.md practices have matured as the community has had more time to develop them.

The experience of doing a meaningful project rewrite with it is neither as smooth as the product page suggests nor as broken as the frustrated GitHub issues make it sound. It’s a tool that rewards preparation, punishes vague prompts, and needs a developer who understands the code well enough to review what it produces — which is maybe exactly how it should work.

The permission system still asks before modifying files or running commands. Decisions about what code ships still stay with the developer. That part hasn’t changed.