CVE-2026-31431 Copy Fail Linux Root Vulnerability Explained

A 732-byte Python script. That’s all it takes to go from a regular unprivileged user to full root on pretty much every major Linux distribution running on servers right now: Ubuntu, Amazon Linux, RHEL, SUSE. No race condition needed. No kernel version checks. No compiled payload. Just a short Python script using only the standard library, and you are root.

The vulnerability is called Copy Fail (CVE-2026-31431, CVSS 7.8). It had been sitting in the Linux kernel since 2017. Researchers at security firm Theori, working with their AI scanning tool Xint Code, disclosed it publicly on April 29, 2026, and patches started landing within hours across major distros.

So the bug was already in your Linux servers long before you read the first headline about it. And the scary part is not that someone found it. It’s how long it sat there, ignored, in a part of the kernel that most people never look at.

What the Bug Actually Does

Linux has something called the page cache. When the kernel loads a file — any file, including executables — it keeps a copy of that file’s contents in memory. This is the page cache. Every time a program reads a file or runs a binary, the kernel is reading from this in-memory copy, not going to disk each time.

Copy Fail lets an unprivileged user write 4 controlled bytes anywhere into the page cache of any file they can read. That sounds limited at first. Four bytes? So what?

Here is the thing — /usr/bin/su is a file every user can read. And it is a setuid binary, which means when you run it, it runs as root. So if you can write 4 bytes of shellcode into the right spot in its page cache entry, and then execute it, you get a root shell. The kernel loads the binary from the page cache. The corrupted in-memory version runs as root.
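To make the "setuid and readable by everyone" condition concrete, here is a minimal Python sketch that checks a file's mode bits. The helper names are made up for illustration, and /usr/bin/su is simply the path named above; the location and permissions of su can differ between systems.

```python
import os
import stat


def is_setuid_and_world_readable(mode: int) -> bool:
    """Return True if an st_mode value has the setuid bit set and is readable by 'other'."""
    return bool(mode & stat.S_ISUID) and bool(mode & stat.S_IROTH)


def check_path(path: str) -> bool:
    """Apply the mode check to a real file on disk (hypothetical helper)."""
    return is_setuid_and_world_readable(os.stat(path).st_mode)


if __name__ == "__main__":
    target = "/usr/bin/su"  # the path named in the article; may differ on your system
    try:
        print(target, "setuid and world-readable:", check_path(target))
    except OSError as exc:
        print(f"could not stat {target}: {exc}")
```

Any binary for which this returns True is the kind of target described here: readable by an unprivileged user, and executed with its owner's privileges.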

And then there is the stealthy part: the kernel never marks the changed page as “dirty.” The file on disk is completely unchanged. If someone runs a checksum tool on /usr/bin/su, it looks totally fine. The corruption exists only in memory. Standard file integrity monitoring tools completely miss it.

The page cache is also shared across all processes on a system — and across container boundaries. So this is not just a local privilege escalation. Theori says Part 2 of their writeup will cover the Kubernetes container escape angle. That part is still coming, and it is probably going to be bad for cloud infrastructure.

How the Bug Works Technically

The root cause is a logic error that lives at the intersection of three different kernel subsystems, none of which had a problem on its own.

AF_ALG is a socket type that exposes the kernel’s cryptographic operations to unprivileged userspace processes. You open an AF_ALG socket, bind it to a crypto algorithm, send data, and get encrypted or decrypted output. No root needed, anyone can use it.
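For a feel of how ordinary the AF_ALG interface is, here is a small Python sketch of the open, bind, send, receive flow. It deliberately uses the harmless "hash" socket type with SHA-256 rather than the AEAD type involved in the bug, and the function name is just an illustration. It assumes a Linux build of Python with AF_ALG support and returns None where the interface is missing or blocked.

```python
import hashlib
import socket


def kernel_sha256(data: bytes):
    """Hash a small buffer with the kernel's SHA-256 via AF_ALG; None if unavailable."""
    if not hasattr(socket, "AF_ALG"):
        return None  # non-Linux platform, or Python built without AF_ALG support
    try:
        with socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0) as alg:
            alg.bind(("hash", "sha256"))  # choose algorithm type and name
            op, _ = alg.accept()          # operation socket for this request
            with op:
                op.sendall(data)          # small inputs only: one send, one digest
                return op.recv(32)        # SHA-256 digests are 32 bytes
    except OSError:
        return None  # AF_ALG blocked or unsupported (seccomp, missing module, ...)


if __name__ == "__main__":
    digest = kernel_sha256(b"copy fail")
    if digest is None:
        print("AF_ALG is not available here")
    else:
        print("kernel matches hashlib:", digest == hashlib.sha256(b"copy fail").digest())
```

Nothing in it needs root, which is the point: the kernel crypto API is reachable from any unprivileged process unless something like seccomp says otherwise.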

splice() is a system call that moves data between file descriptors without copying. If you splice a file into a pipe and then into an AF_ALG socket, the socket’s input list holds direct references to that file’s page cache pages — not a copy of them. The same physical memory pages.
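Here is a harmless sketch of splice() on its own, moving a few bytes from a file into a pipe and reading them back. It assumes Python 3.10 or newer on Linux, where os.splice exists; the helper name is made up, and it stops at the pipe rather than going anywhere near a crypto socket.

```python
import os
import tempfile


def splice_file_to_pipe(path: str, count: int):
    """Splice up to `count` bytes from the start of a file into a pipe, then read them.

    Returns None if os.splice is missing or the kernel refuses the call.
    """
    if not hasattr(os, "splice"):
        return None  # needs Python 3.10+ on Linux
    fd = os.open(path, os.O_RDONLY)
    read_end, write_end = os.pipe()
    try:
        # The kernel hands the pipe references to the file's pages; the data
        # is not copied through a userspace buffer on the way in.
        moved = os.splice(fd, write_end, count, offset_src=0)
        return os.read(read_end, moved) if moved else b""
    except OSError:
        return None
    finally:
        for descriptor in (fd, read_end, write_end):
            os.close(descriptor)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello page cache")
    try:
        print(splice_file_to_pipe(tmp.name, 5))
    finally:
        os.unlink(tmp.name)
```

The only step that touches userspace memory is the final os.read(); up to that point the pipe is holding references to the file's cached pages, which is the property the bug leans on.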

authencesn is a cryptographic wrapper inside the kernel used for IPsec with 64-bit Extended Sequence Numbers. Since 2011, this wrapper has been using the caller’s destination buffer as scratch space during a decryption operation. It writes 4 bytes at a position just past the output boundary — dst[assoclen + cryptlen] — to temporarily rearrange some bytes during HMAC computation. It never restores the original content at that offset.

Now here is what happened in 2017. A developer added an optimization to algif_aead.c that made AEAD operations run "in-place", meaning the input and output scatterlists were the same object. To do this, the code copied the AAD and ciphertext from the input buffer into the output buffer, but chained the authentication tag pages by reference using sg_chain(). So the tag region of the output scatterlist was still pointing directly at the page cache pages.

When authencesn runs its scratch write on dst[assoclen + cryptlen], it now walks right into those chained page cache pages and writes into them.

The HMAC fails because the ciphertext was fake. The kernel returns an error from recvmsg(). But the 4-byte write already happened. The page cache is already corrupted.
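The sequence above can be modeled in a few lines of pure Python. This is a toy, not kernel code: bytearrays stand in for pages, a list of memoryviews stands in for a scatterlist, and the scratch value, buffer sizes, and function names are all invented for illustration. What it shows is the difference between chaining the tag region by reference and copying it.

```python
SCRATCH = b"\xde\xad\xbe\xef"  # stand-in for the 4 scratch bytes (value is arbitrary)


def write_at(segments, offset, data):
    """Write `data` at a logical offset across a list of writable memoryviews."""
    for seg in segments:
        if offset >= len(seg):
            offset -= len(seg)  # the target lies in a later segment
            continue
        n = min(len(data), len(seg) - offset)
        seg[offset:offset + n] = data[:n]
        data = data[n:]
        offset = 0
        if not data:
            return
    raise IndexError("write past end of scatterlist")


def toy_decrypt(dst, assoclen, cryptlen):
    """Model of the pattern: scratch write first, authentication failure afterwards."""
    write_at(dst, assoclen + cryptlen, SCRATCH)
    raise ValueError("authentication failed")  # the write above is not undone


def simulate(chain_by_reference: bool) -> bytes:
    """Return the toy 'page cache' contents after one failed decrypt."""
    page_cache = bytearray(b"ORIGINAL-FILE-CONTENTS")
    assoclen, cryptlen, taglen = 8, 16, 4
    own_buffer = bytearray(assoclen + cryptlen)  # AAD + ciphertext, copied
    if chain_by_reference:
        tag = memoryview(page_cache)[:taglen]             # same memory as the "file"
    else:
        tag = memoryview(bytearray(page_cache[:taglen]))  # private copy
    dst = [memoryview(own_buffer), tag]
    try:
        toy_decrypt(dst, assoclen, cryptlen)
    except ValueError:
        pass  # the caller only sees an error
    return bytes(page_cache)


if __name__ == "__main__":
    print("by reference:", simulate(True))
    print("copied:      ", simulate(False))
```

In the by-reference case the first four bytes of the toy page cache change even though the operation reports failure; in the copied case the original bytes survive. The upstream fix described later in this article takes the second path by going back to out-of-place operation.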

Each individual change — authencesn’s scratch write in 2011, AEAD support in AF_ALG in 2015, the in-place optimization in 2017 — was reasonable in isolation. Nobody ever connected all three to see what happened when you combined them. The bug was silently exploitable for nearly a decade.

What Makes This Worse Than Other Linux LPE Bugs

Linux has seen bad local privilege escalation bugs before. Dirty COW (CVE-2016-5195) and Dirty Pipe (CVE-2022-0847) were both serious. But Copy Fail has a combination of properties that those didn’t have at the same time.

Dirty COW required winning a race condition: you had to time your attack carefully, and sometimes it crashed the kernel if you lost the race. Dirty Pipe was version-specific; it only worked on kernels from version 5.8 onward. Copy Fail requires neither timing nor a specific kernel version. It is a straight-line logic bug. You run the script, it works. The exploit tested cleanly on kernels 6.12, 6.17, and 6.18 across four major distros.

Xint.io’s statement to The Hacker News put it well: the vulnerability has four properties that almost never appear together. Portable — the same script works on every tested distribution. Tiny — the entire exploit is under a kilobyte of Python. Stealthy — file integrity tools miss it because the on-disk file is unchanged. And cross-container — the page cache is shared at the host level, so it can escape container isolation.

David Brumley at Bugcrowd explained the technical mechanism this way: the 2017 in-place optimization in algif_aead allows a page cache page to end up in the kernel’s writable destination scatterlist for an AEAD operation submitted over an AF_ALG socket. An unprivileged process can then drive splice() into that socket and complete a small, targeted write into the page cache of a file it does not own.

That last bit — “a file it does not own” — is the whole problem. A normal user should never be able to write to /usr/bin/su. The page cache is supposed to be read-only from the user's perspective. Copy Fail punches through that.

The Limitations — and Why Some People Are Less Worried

It is not remotely exploitable on its own. Copy Fail is a local privilege escalation, which means the attacker needs a shell on the machine first. They need to already be logged in as a low-privilege user before this does anything.

For a typical desktop Linux system, this matters less than it sounds. If someone is already logged into your personal laptop, you have bigger problems. Red Hat’s initial response was actually to defer the fix — they rated the local requirement as reducing urgency enough to postpone patching. They reversed that position later and aligned with the other distros after the public reaction, but their initial read was: local-only bugs, even reliable ones, are not our top priority.

The concern goes up a lot for multi-tenant systems. Shared Linux servers where multiple users log in, CI/CD runners that execute code from pull requests, VPS environments, Kubernetes nodes — anywhere untrusted code runs under a low-privilege account is genuinely at risk. Cloud environments where thousands of tenants share the same kernel are the worst case here.

And the container escape angle, which Theori has promised to document in Part 2, is probably the part that will matter most to enterprise security teams. That writeup was not published as of April 30. Nobody knows the full scope of that yet.

The Fix and What to Do Right Now

The patch reverts algif_aead.c to out-of-place operation, removing the 2017 in-place optimization. The commit message from Linus Torvalds' tree reads: "There is no benefit in operating in-place in algif_aead since the source and destination come from different mappings."

Debian, Ubuntu, SUSE, Amazon Linux, and Red Hat all have patches out or in progress as of April 30, 2026. The fix is in the upstream kernel as commit a664bf3d603d.

For immediate mitigation without a kernel update, the researchers recommend blocking AF_ALG socket creation via seccomp, or disabling the vulnerable module entirely:

echo "install algif_aead /bin/false" > /etc/modprobe.d/disable-algif-aead.conf
rmmod algif_aead 2>/dev/null

This turns off the userspace AEAD interface that algif_aead provides over AF_ALG, which most systems don’t actually use. For most servers that is a fine trade until the kernel update is applied.
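After applying the mitigation, it is worth confirming that the module is really gone. The sketch below parses /proc/modules and looks for algif_aead; the function names are made up for illustration. One caveat: it only sees loadable modules, so a kernel with the code built in will not show up here, and a clean result on such a kernel means nothing.

```python
def loaded_modules(proc_modules_text: str) -> set:
    """Extract module names (first column) from /proc/modules-style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def algif_aead_loaded(proc_modules_text: str) -> bool:
    """True if algif_aead appears as a loaded module in the given text."""
    return "algif_aead" in loaded_modules(proc_modules_text)


if __name__ == "__main__":
    try:
        with open("/proc/modules") as fh:
            text = fh.read()
    except OSError as exc:
        print(f"could not read /proc/modules: {exc}")
    else:
        if algif_aead_loaded(text):
            print("algif_aead is loaded: mitigation not in effect")
        else:
            print("algif_aead is not loaded as a module")
```

Run it again after a reboot to make sure the modprobe.d entry keeps the module from coming back.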

How It Was Found

The disclosure came from Theori researcher Taeyang Lee, who had done earlier work on AF_ALG attack surface as part of kernelCTF research. He had a hypothesis — that splice() delivering page cache pages into the crypto subsystem might be an underexplored source of bugs — and used Xint Code, an AI-assisted security scanning tool, to run the idea across the entire Linux crypto subsystem.

The scan took about an hour. Copy Fail was the highest-severity result. The team also found other vulnerabilities, including at least one more privilege escalation bug that is still in responsible disclosure and not yet public.

This is also part of a wider trend. Dustin Childs, who runs threat awareness for Trend Micro’s Zero Day Initiative, wrote earlier this month that the number of vulnerability reports has surged recently, and the most likely explanation is that security teams are now using AI tools to find bugs faster. The Internet Bug Bounty program actually suspended new awards temporarily in April 2026 because they couldn’t keep up with the volume of incoming reports from AI-assisted research.

So Copy Fail is probably not the last one. Taeyang Lee’s team explicitly said the crypto subsystem scan found multiple high-severity issues. The others are coming.

Should You Panic?

No. But you should patch.

For homelabs and personal servers: update your kernel when the distro pushes the fix, and you’re done. For production servers, especially anything running CI/CD or hosting multiple users, the mitigation commands above are worth running today while you wait for the update to roll out through your deployment pipeline.

The real problem is not this specific bug. It is the nine years it spent undetected. There is no reason to think this is the only logic error sitting quietly at the intersection of three subsystems that nobody thought to look at together. The Linux kernel is enormous — millions of lines, decades of accumulated changes, and a crypto subsystem that most kernel developers rarely touch.

Copy Fail is the bug that got found. The question nobody has answered yet is how many Copy Fails are still waiting.
