Anthropic has spent years building a reputation as the AI lab that takes safety seriously. More seriously, they’ll remind you, than anyone else. So there is something genuinely funny — in a grimace-and-laugh kind of way — about the fact that their most powerful model yet, one that Anthropic itself warns poses “unprecedented cybersecurity risks,” was revealed to the world through a basic configuration mistake.
This is the story of Claude Mythos: a model that may reshape the AI landscape, a data leak that no self-respecting cybersecurity firm would brag about, and a market that reacted with the kind of panic you’d expect if someone shouted “fire” in a very crowded, very well-funded room.
The irony is hard to miss. A company whose leaked draft blog post warned that its new model is “currently far ahead of any other AI model in cyber capabilities” inadvertently stored that very warning, along with nearly 3,000 other unpublished assets, in a publicly searchable data store. No password required. No authentication needed. Just open to anyone who knew where to look. You genuinely cannot make this up.

How the Leak Happened
The mechanics were embarrassingly mundane. Anthropic uses an off-the-shelf content management system to publish its blog. By default, that system sets uploaded assets to public and assigns them a reachable URL unless a user actively changes the setting. Someone didn’t change the setting. Close to 3,000 assets, including images, PDFs, audio files, and draft blog posts, ended up sitting in an unsecured, publicly searchable data store that anyone with the right query could browse.
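None of which would have been hard to catch from the outside. As a rough illustration, here is a minimal sketch, in Python, of the simplest external check: probe your own published asset URLs with no credentials attached and flag anything the server happily serves. The URLs below are hypothetical placeholders, and the assumption that assets are reachable over plain HTTPS is mine; nothing here reflects Anthropic’s actual infrastructure.

```python
import urllib.error
import urllib.request

# Hypothetical asset URLs. In a real audit these would be enumerated from
# the CMS's own asset inventory rather than hardcoded.
CANDIDATE_ASSETS = [
    "https://cdn.example.com/assets/draft-announcement.pdf",
    "https://cdn.example.com/assets/retreat-invite.pdf",
]

def is_publicly_readable(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL serves content to a client with no credentials."""
    # HEAD avoids downloading the asset body; some servers reject HEAD,
    # in which case a ranged GET would be the fallback.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # A 401 or 403 here is the good outcome: auth is being enforced.
        return False
    except urllib.error.URLError:
        # DNS failure, refused connection, timeout: not publicly readable.
        return False

if __name__ == "__main__":
    for url in CANDIDATE_ASSETS:
        status = "EXPOSED" if is_publicly_readable(url) else "ok"
        print(f"{status:8} {url}")
```

The point of the sketch isn’t the dozen lines of code; it’s that a default-public setting is a failure mode you have to scan for continuously, because the CMS itself will never warn you.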
Roy Paz, a senior AI security researcher at LayerX Security, and Alexandre Pauwels, a cybersecurity researcher at the University of Cambridge, independently discovered the exposed material. Fortune reviewed the documents and notified Anthropic on Thursday, March 26, 2026. By that evening, Anthropic had restricted public access to the data store and begun assessing how long the material had been exposed and what exactly was out there.
The company attributed the situation to “human error” in its CMS configuration, describing the exposed drafts as “early drafts of content considered for publication.” Not a hack. Not a sophisticated supply-chain attack. A misconfigured content management system, exactly the kind of thing a routine cloud security posture review catches. That the exposure happened at a company whose core business is AI safety, and whose leaked document specifically warned about unprecedented cybersecurity risk, will not be lost on anyone who has spent time in the security industry.
The drafts themselves, though, were worth reading carefully.
What Mythos Actually Is
The leaked document described a model internally called Claude Mythos, operating under the planned product name “Capybara.” Both names appear to refer to the same underlying system — Anthropic used Mythos as the internal codename and Capybara as the external product label. By the time Fortune published its reporting, both were circulating widely enough that the distinction barely mattered.
Here’s what matters structurally. Right now, Anthropic sells its models in three tiers: Haiku at the bottom (smallest, fastest, cheapest), Sonnet in the middle, and Opus at the top (largest, most capable, most expensive). Those tiers have been stable for a while, and the market has built expectations around them. Capybara would add a fourth tier above all three: bigger than Opus, more expensive to run, and, by the company’s own accounting, dramatically more capable. The draft stated that the model is not yet ready for broad general release, which alone tells you something about both the computational demands and the risk profile Anthropic is managing.
As for what it can actually do, the leaked draft stated that Capybara scores “dramatically higher” than Claude Opus 4.6, the current top model, on tests of software coding, academic reasoning, and cybersecurity tasks. Anthropic confirmed this in a statement to Fortune, describing the model as “the most capable we’ve built to date” and using the phrase “step change” to characterize the performance gap. That language was clearly chosen deliberately. “Step change” carries more weight than “significant improvement” and less hysteria than “breakthrough” or “revolutionary.”
It’s worth being a little skeptical of that framing, though. OpenAI’s GPT-5, released in August 2025, fell noticeably short of the expectations the company had spent months carefully cultivating. The language labs use to describe their own models before release is almost always aspirational. That said, Anthropic’s decision to describe this model as carrying “unprecedented” cybersecurity risk in a draft document they presumably didn’t intend to publish suggests the capability claims aren’t purely marketing. When a company worries in private about what its own model can do, that’s a different kind of signal than a press release.
The name “Capybara,” for what it’s worth, breaks from Anthropic’s established naming scheme: Haiku, Sonnet, and Opus are all poetic forms, and Claude itself was apparently named partly as a nod to Claude Shannon, the mathematician who founded information theory. Capybara is the world’s largest rodent, which is either a deliberate joke or a complete coincidence. Probably a joke.
The Cybersecurity Warning Hidden in the Draft
This is where the story gets genuinely complicated. Sit with it for a moment.
The leaked blog post wasn’t just a product announcement. It included a frank, detailed warning from Anthropic about what this model can do in the hands of bad actors — and the company didn’t pull its punches. The draft described Capybara as a system that “presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders.” It acknowledged that the model poses risks “even beyond what we learn in our own testing.”
That’s a striking thing to say about your own product before releasing it. Most companies, even ones that genuinely worry about dual-use risk, keep that language internal or couch it in careful hedges. The fact that Anthropic was drafting this kind of candid warning for a blog post, something intended for public consumption, suggests the internal assessment of what Capybara can do is serious enough that they felt obligated to say it plainly.
The draft also stated that the model is “currently far ahead of any other AI model in cyber capabilities.” That’s a comparative claim, which means Anthropic’s internal evaluations include benchmarking against OpenAI, Google, and others. If that assessment holds, it suggests the capability gap at the frontier isn’t narrowing the way some observers assumed — it’s widening, at least in the cybersecurity domain, at least for now.

OpenAI crossed a related threshold in February 2026, when GPT-5.3-Codex became the first model the company classified as “high capability” for cybersecurity-related tasks under its Preparedness Framework, and the first model OpenAI had directly trained to surface software vulnerabilities in production code. Anthropic’s Opus 4.6, released around the same time, demonstrated an ability to find previously unknown vulnerabilities in real codebases. Both companies acknowledged the dual-use character of that capability openly: the same model that helps a defender locate a vulnerability in their code can help an attacker find the same vulnerability and exploit it faster.
What makes Mythos different, based on the leaked materials, is the scale of that gap. Previous models could assist with cybersecurity tasks. Capybara, by Anthropic’s own internal account, is so capable in this domain that it changes the threat environment rather than simply participating in it. That’s a different claim entirely.
The draft’s proposed rollout strategy reflects that concern directly. Rather than a standard general release or even a staged enterprise rollout, Anthropic described releasing Capybara in early access specifically to organizations working on the defensive side — giving them the ability to harden their codebases before the model becomes more broadly available. That framing is telling. It suggests Anthropic believes the window between a model’s release and its exploitation by malicious actors has compressed to the point where getting defenders armed first is an operational necessity, not a nice-to-have.
This isn’t theoretical, either. In late 2025, Anthropic discovered that a Chinese state-sponsored group had been running a coordinated campaign using Claude Code — a currently available tool — to infiltrate roughly 30 organizations, including tech companies, financial institutions, and government agencies, before Anthropic detected the pattern. The company spent ten days investigating the full scope of the operation, banned the accounts involved, and notified affected organizations. That attack happened with models Anthropic considered relatively safe for general release. Mythos, by the company’s own account, is a different animal entirely.
Why Cybersecurity Stocks Dropped So Hard
Markets react to the story they think they’re hearing, and the story investors heard after Fortune published its reporting was not subtle: Anthropic has a new AI model that dramatically outperforms everything in cybersecurity, the company itself is warning it will outpace defenders, and it hasn’t even been released yet.
The sell-off was fast, wide, and not particularly discriminating. CrowdStrike fell roughly 7%. Palo Alto Networks dropped about 6%. Zscaler was down around 4.5%. SentinelOne tumbled more than 6%. Tenable, which focuses on vulnerability and exposure management, dropped around 9%, arguably the most logical target given that a model which can autonomously find vulnerabilities competes most directly with what Tenable sells. The iShares Cybersecurity ETF lost 4.5% on the day. Okta and Netskope each fell more than 7%.
Raymond James analyst Adam Tindle was among the first to articulate the thesis driving the sell-off: compression of traditional defensive advantages, increased attack complexity that favors well-resourced attackers over stretched defenders, and a potential reallocation of enterprise security spending toward AI-native tools and away from incumbents. That last point is the one that should concern cybersecurity company boards most. If enterprises start routing their security budgets toward AI platforms that can both identify threats and generate defensive code, the traditional security software stack starts looking thinner.
The underlying fear isn’t entirely new, but it’s intensifying. For years, cybersecurity vendors have warned that attackers would eventually get access to AI tools that make phishing more convincing, malware more adaptive, and reconnaissance faster. That’s been true for a while. What’s different about the Mythos moment is the suggestion that AI models are now capable enough to change the economics of the attacker-defender balance — not just making attacks marginally easier, but potentially making certain classes of defense structurally harder. That’s a different kind of threat to a company like CrowdStrike’s business model.
To be fair, the sell-off also reflects some market overreaction. CrowdStrike and Palo Alto are not going away because Anthropic built a more capable model. Regulated industries, government contractors, and enterprises with existing vendor relationships don’t pivot overnight. The security market has absorbed disruptive technologies before and restructured around them. But “this will be fine eventually” is cold comfort for investors watching 7% evaporate in a single session.
This isn’t the first time Anthropic triggered this kind of cascade. In February 2026, the launch of Claude Cowork — a workplace automation product targeting contract review and compliance workflows — triggered a broad sell-off across software and professional services companies, erasing roughly $285 billion in combined market value as investors reassessed the long-term implications of AI agents displacing established enterprise software categories. The pattern is consistent enough now that it’s worth naming: Anthropic announces something, markets reprice an entire adjacent sector, and then analysts spend two weeks arguing about whether the reaction was proportionate.
What a “Deliberate” Rollout Actually Means
One detail buried in the leaked draft deserves more attention than it’s received. The rollout strategy isn’t cautious because the model isn’t technically ready. It’s cautious because Anthropic has said — in plain language in a document they didn’t intend to publish — that they haven’t fully mapped the risks yet.
That’s an unusual admission. Most AI labs are deliberate about their release cadence for commercial reasons: staged rollouts reduce infrastructure strain, build user feedback loops, and generate press. Anthropic’s framing here was different. The draft described releasing Capybara to a small group of early-access organizations specifically to help those organizations prepare for what the model’s broad release would enable — a kind of safety-informed pre-arming of defenders. That’s not a typical product launch strategy. It looks more like a risk management framework with a product launch attached.
How long that caution holds under competitive pressure is genuinely uncertain. Google DeepMind is not standing still. OpenAI has its own frontier models in development and has shown a willingness to ship aggressively when it feels the competitive need. The history of AI lab safety commitments and competitive timing is not encouraging. Labs that announce careful rollout strategies have a tendency to accelerate those timelines when a rival ships something comparable. Whether Anthropic maintains its stated pace — or quietly compresses it when Google or OpenAI announces their own next-generation model — is something worth watching closely.
There’s also a commercial reality that the draft dances around but never quite addresses head-on. Capybara is described as expensive to run: expensive enough that it won’t be priced for general access, at least initially. That positions it as an enterprise-only offering, which changes the distribution math considerably. Enterprise rollouts move slowly, involve procurement cycles and security reviews, and typically reach a relatively small number of organizations before broad availability follows. If that’s the model, the “defenders first” framing makes more practical sense: the early-access window isn’t just a safety measure; it’s also the normal speed at which large enterprise deals close. The question of whether Anthropic is being genuinely cautious or commercially realistic (or both) is probably not answerable from the outside.
The Operational Embarrassment Nobody Is Talking About Enough
The data leak itself, setting the model’s capabilities completely aside, is worth its own moment of reflection.
Anthropic’s content management system set uploaded files to public by default. Nearly 3,000 assets — ranging from draft blog posts and internal PDFs to images and audio files — sat in an unsecured, publicly searchable location. The company found out not because its own security team caught it, but because two independent researchers discovered the exposed data and a journalist called Anthropic to ask about it. After being informed, the company removed public access to the data store and attributed the situation to “human error.”
That’s a painful sequence for a company in Anthropic’s position. The technical failure itself is unremarkable — CMS misconfiguration is one of the most common cloud security gaps, and plenty of organizations with sophisticated engineering teams have made equivalent mistakes. What makes it uncomfortable for Anthropic specifically is the context. This is a company whose entire brand positioning rests on the argument that it thinks harder about safety and risk than anyone else in the industry. The leaked document was itself a warning about unprecedented cybersecurity risk. The exposure was discovered by external researchers, not internal monitoring. And the material stayed publicly accessible long enough that multiple independent parties were able to find and review it before Anthropic knew it was out there.
None of that is disqualifying. Companies fix configuration errors. Anthropic acted quickly once notified. But the gap between the company’s public posture and this particular operational stumble is real, and in an industry where trust is the actual product, that gap matters.
The other detail worth noting from the leaked files: alongside the Mythos draft, there was a PDF describing an upcoming invite-only retreat for European CEOs, to be held at an 18th-century English countryside manor with Dario Amodei in attendance. Anthropic confirmed the event is part of an ongoing series. It’s a small thing. But it does complete a certain portrait: a company developing a model it’s worried about releasing, warning about cybersecurity risks it hasn’t fully mapped, and simultaneously hosting high-end diplomatic gatherings for European business leaders at a countryside manor, all of it accidentally published to the open internet via a settings checkbox that nobody ticked.

Where Things Actually Stand
Claude Mythos may well be what Anthropic says it is. A genuine step forward in reasoning and coding. A model with cybersecurity capabilities serious enough to require a different kind of release strategy. Something that changes the math on what AI can and can’t do in the security domain.
What’s harder to assess from the outside is whether the frameworks Anthropic is building to manage that risk are actually keeping pace with what the models can do. The company has released responsible scaling policies, has published research on evaluating dangerous capabilities, and has been more transparent than most of its peers about the risks it’s identified internally. Those are real things. They’re also, almost by definition, incomplete — because the point of the leaked draft’s warning is that Anthropic itself acknowledges it doesn’t fully understand what Capybara can do in adversarial conditions.
The honest version of where we are isn’t catastrophic and it isn’t comforting either. Frontier AI models have crossed into territory where their capabilities in cybersecurity — both offensive and defensive — are genuinely significant. Markets are pricing that reality even before products are released. Defenders are scrambling to get access to the same tools that attackers will use. And the companies building these models are telling us, in draft blog posts they didn’t mean to publish, that they’re moving carefully because they have to, not because they’re confident they’ve got it figured out.
That’s a harder message to sit with than either the doom narrative or the hype. But it’s the accurate one. The problem is hard. It’s getting harder. And the people working on it — at Anthropic, at the labs they compete with, and at the security firms watching their stock prices fall — are doing so in real time, without a safety net, and occasionally forgetting to check their CMS settings.
That last part will be fine. The rest of it is going to take a while.