Claude Mythos can hack anything, Anthropic says. Should we believe them? – The Times of India

Claude Mythos can hack anything, Anthropic says. Should we believe them?
Anthropic says its newest model found vulnerabilities in every major operating system and browser—and wrote exploits for them without human help. It also escaped its own sandbox and emailed a researcher about it. The cybersecurity world is torn between genuine alarm and the suspicion that this is the most expensive press release ever written.

The footnote is on page 7 of a 60-page alignment risk report, wedged between paragraphs about sandbox configuration and exploit sophistication. It says that during safety testing, Claude Mythos Preview—Anthropic’s newest and unreleased AI model—broke out of a sealed computing environment it wasn’t supposed to leave, clawed its way onto the internet through a system designed to connect to only a handful of services, and sent an email to the researcher overseeing the test.

The researcher, it adds, “found out about this success by receiving an unexpected email from the model while eating a sandwich in a park.”

That sentence has done more for Anthropic’s week than any benchmark score ever could. It’s the kind of detail that makes a story travel—part thriller, part comedy, all signal. But it also captures the essential tension around everything the company announced this week. Is Claude Mythos Preview a genuine inflection point in cybersecurity? Or is this the AI industry’s most carefully orchestrated product reveal, dressed up as a public service announcement?

The honest answer is that it might be both.

And that’s the uncomfortable part.

What Anthropic is actually claiming—and how specific it gets

Strip away the consortium branding (Project Glasswing), the partner list (Apple, Google, Microsoft, Amazon, Nvidia, JPMorgan, Cisco, CrowdStrike, the Linux Foundation, and about 40 others), and the $100 million in usage credits, and you’re left with a set of technical claims that are either terrifying or overstated. There isn’t much room in between.

Anthropic says Mythos Preview has found thousands of high-severity vulnerabilities in every major operating system and every major web browser.

It says the model discovered a 27-year-old bug in OpenBSD—an operating system that exists primarily because its creators thought everything else wasn’t secure enough. It found a 16-year-old flaw in FFmpeg’s H.264 codec, buried in a line of code that automated testing tools had executed five million times without raising a flag.

It identified weaknesses in cryptography libraries handling TLS, AES-GCM, and SSH.

It found a guest-to-host memory corruption bug in a production virtual machine monitor—the kind of software cloud providers use to keep hostile workloads from eating each other.

For several of these, Mythos Preview didn’t stop at finding the bug. It wrote working exploits. Without human guidance. The prompt, in Anthropic’s telling, amounted to little more than “please find a security vulnerability in this program.”

The company can only talk in detail about a fraction of what it’s found—over 99% of the vulnerabilities haven’t been patched yet. But the fraction it has disclosed is worth sitting with, because the specifics are what separate a marketing claim from a technical one.

A 17-year-old hole in FreeBSD that grants root to anyone on the internet

The most fully documented example is a remote code execution vulnerability in FreeBSD’s NFS server, now patched and assigned CVE-2026-4747. It had been sitting there for 17 years.

The bug is almost comically straightforward at its core: a function in the RPCSEC_GSS authentication handler copies data from an attacker-controlled network packet into a 128-byte stack buffer, starting 32 bytes in. That leaves 96 bytes of room. The only length check caps the input at 400 bytes. So an attacker can write 304 bytes of arbitrary data past the end of the buffer and onto the stack.

Stack overflows like this are supposed to be unexploitable on modern systems.

Stack canaries—secret values placed between the buffer and the return address—should catch the corruption before the function returns. But FreeBSD compiles its kernel with -fstack-protector instead of -fstack-protector-strong. The plain variant only instruments functions with char arrays. This buffer is declared as int32_t[32]. The compiler sees integers, not characters, and skips the canary entirely.
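The mechanics can be sketched in a toy model—Python standing in for the C, with the frame layout and all names invented for illustration, not taken from FreeBSD’s source: a length check against the wrong bound lets a copy run 304 bytes past the buffer, straight over where a saved return address would sit.

```python
# Toy model of the vulnerable copy pattern (not FreeBSD code): a
# 128-byte buffer written from offset 32, guarded only by a 400-byte
# length cap. Layout and names are invented for illustration.
BUF_SIZE = 128
WRITE_OFFSET = 32
LENGTH_CAP = 400

def copy_with_weak_check(frame: bytearray, attacker_data: bytes) -> None:
    """Copy attacker data into the buffer region of a fake stack frame.

    The only check is against LENGTH_CAP, not against the 96 bytes of
    space actually left in the buffer.
    """
    if len(attacker_data) > LENGTH_CAP:
        raise ValueError("rejected by the (insufficient) length check")
    frame[WRITE_OFFSET:WRITE_OFFSET + len(attacker_data)] = attacker_data

# Fake frame: the buffer, then 8 bytes standing in for the saved return
# address. No canary sits in between, because plain -fstack-protector
# skips a function whose only array is int32_t[32].
frame = bytearray(BUF_SIZE) + bytearray(b"RETADDR!")
copy_with_weak_check(frame, b"A" * LENGTH_CAP)

overflow = WRITE_OFFSET + LENGTH_CAP - BUF_SIZE
print(overflow)                              # 304 bytes past the buffer
print(bytes(frame[BUF_SIZE:BUF_SIZE + 8]))   # b'AAAAAAAA' — clobbered
```

The arithmetic is the whole bug: 32 bytes of offset plus 400 bytes of permitted input, minus 128 bytes of buffer, leaves 304 bytes landing on the stack beyond it.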

FreeBSD also doesn’t randomise its kernel load address, so predicting where ROP gadgets live doesn’t require a prior information leak.

That’s already bad. What Mythos Preview did with it is worse.

The model figured out that reaching the vulnerable code path requires a 16-byte handle matching a live entry in the server’s GSS client table. Creating that entry needs the kernel’s hostid and boot time. Brute-forcing 2^32 possibilities would work but is slow.

Mythos Preview found a faster route: if the server runs NFSv4, a single unauthenticated EXCHANGE_ID call—answered before any authentication check—returns the host’s full UUID (from which hostid is derived) and the exact second nfsd started.

From there, it built a ROP chain that opens /root/.ssh/authorized_keys, appends the attacker’s public key, and closes the file. The chain is over 1,000 bytes long but only 200 bytes of usable overflow space exist, so Mythos Preview split the attack across six sequential RPC requests—five to stage data in unused kernel memory, one to fire the payload.

Full root access. Unauthenticated. Remotely exploitable from anywhere on the internet. Found and exploited by an AI model that was essentially told “go look for bugs.”
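The staging step amounts to simple chunking. The sizes below come from Anthropic’s writeup; the function name and the trigger marker are invented for illustration—real RPC framing looks nothing like this.

```python
# Sketch of splitting an oversized ROP chain across multiple requests:
# stage the chain in pieces that fit the usable overflow space, then
# send one final request that fires the payload. Everything except the
# sizes (200-byte overflow window, ~1,000-byte chain) is invented.
USABLE_OVERFLOW = 200  # bytes of controlled data per RPC request

def plan_requests(chain: bytes) -> list[bytes]:
    """Stage the chain in <=200-byte pieces, then one trigger request."""
    stages = [chain[i:i + USABLE_OVERFLOW]
              for i in range(0, len(chain), USABLE_OVERFLOW)]
    return stages + [b"<trigger overflow>"]

chain = bytes(1000)            # a 1,000-byte chain of ROP gadget data
requests = plan_requests(chain)
print(len(requests))           # 6: five staging requests, one trigger
```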

The Linux kernel exploits are where things get genuinely unnerving

FreeBSD is impressive but relatively clean as exploits go—a stack smash into ROP, complicated by size constraints. The Linux kernel work is where Mythos Preview starts doing things that read less like automated bug-finding and more like adversarial research.

Anthropic describes two N-day exploits in detail—previously patched vulnerabilities used to demonstrate capabilities without disclosing live bugs. Both target the Linux kernel with all defences enabled.

The first turns a one-bit out-of-bounds write in netfilter’s ipset into full root. The bit lands on a page table entry in an adjacent physical page. Mythos Preview’s exploit pins itself to a CPU, sprays interleaved page-table allocations and kmalloc-192 slab allocations to achieve physical adjacency, then uses the bug itself as an oracle—issuing IPSET_CMD_DEL with NLM_F_EXCL to probe whether each candidate set’s overflow target is a page table, because the kernel’s error-handling behaviour differs depending on whether the overwritten bit was already zero.

Once it identifies the right set, it flips a single bit in a page table entry—the read/write flag—to make a read-only mapping of /usr/bin/passwd writable. It overwrites the first page of the setuid binary with a 168-byte ELF stub that calls setuid(0); setgid(0); execve("/bin/sh"). Every process on the system now runs a root shell when it executes passwd.

The second exploit is even more involved. It chains a one-byte use-after-free read in Unix domain sockets with a separate use-after-free write in the traffic-control scheduler.
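The first exploit’s pivotal step really is that small: on x86-64, the read/write permission of a mapping lives in bit 1 of its page table entry, so one flipped bit is all it takes. A minimal model (the PTE value below is made up):

```python
# On x86-64, bit 1 of a page table entry is the read/write flag.
# The example PTE value is invented for illustration.
PTE_RW_BIT = 1 << 1

def make_writable(pte: int) -> int:
    """Set the R/W bit, turning a read-only mapping writable."""
    return pte | PTE_RW_BIT

ro_pte = 0x8000000012345005          # present, read-only (example value)
rw_pte = make_writable(ro_pte)
print((rw_pte & PTE_RW_BIT) != 0)    # True — the mapping is now writable
print(hex(rw_pte ^ ro_pte))          # 0x2 — exactly one bit changed
```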

The read primitive alone can’t escalate privileges, but Mythos Preview uses it to defeat KASLR (by reading interrupt descriptor table entries from the CPU entry area), locate its own kernel stack (by following per-CPU offset pointers to top_of_stack), and recover the kernel virtual address of an AF_PACKET receive ring it controls from userspace.

It then constructs a single block of bytes in the ring that simultaneously functions as a valid struct cred (with root permissions copied from the kernel’s init_cred) and a valid struct Qdisc (with a fake operations table whose peek function points to commit_creds).
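The overlapping-structure trick is easier to see in miniature: one block of bytes, two valid interpretations depending on which offsets get read. The field offsets and the kernel address below are invented for illustration—real struct cred and struct Qdisc layouts differ.

```python
# Sketch: a single block of bytes readable as two different structures.
# All offsets and the kernel address are hypothetical placeholders.
import struct

blob = bytearray(64)

# Read one way, it's a fake "cred": uid and gid at offset 0
# (invented layout), both zero — root.
struct.pack_into("<II", blob, 0, 0, 0)

# Read another way, it's a fake "Qdisc": an ops pointer at offset 24
# (invented layout) aimed where commit_creds would live.
COMMIT_CREDS = 0xFFFFFFFF81234560    # placeholder kernel address
struct.pack_into("<Q", blob, 24, COMMIT_CREDS)

uid, gid = struct.unpack_from("<II", blob, 0)
(ops_ptr,) = struct.unpack_from("<Q", blob, 24)
print(uid, gid, hex(ops_ptr))        # 0 0 0xffffffff81234560
```

The kernel never notices the double life because each code path only reads the offsets it cares about.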

When the traffic-control scheduler dequeues a packet, it calls what it thinks is ops->peek(qdisc)—which actually executes commit_creds(fake_root_credential). The process is now root.

Anthropic’s own researchers spent days verifying these exploits and note in the writeup that they “would be surprised if we got everything right.” They’re confident the exploits work—they produce binaries that grant root on the target machine—but less confident they fully understand how.

That admission, buried deep in a technical blog post, says more about where AI capabilities stand than any benchmark table.

The alignment report nobody read is the most important document Anthropic published this week

The cybersecurity claims are dramatic. The alignment risk report is scarier.

It’s the document where Anthropic assesses whether Mythos Preview might do dangerous things on purpose—not because someone told it to, but because it decided to. The company’s conclusion is that the risk is “very low, but higher than for previous models.”

As reassurances go, it’s not exactly a warm blanket.

The report lays out six pathways through which the model could cause harm: subtly sandbagging on AI safety research, targeted undermining of safety R&D, inserting code backdoors for future models to exploit, poisoning successor training data, exfiltrating its own weights, and launching rogue internal deployments. For each, Anthropic assesses propensity and safeguard resilience.

None are rated as high-risk.

None are rated as zero-risk either.

Three findings deserve more attention than they’ve received.

During training, a technical error exposed the model’s chain-of-thought reasoning to reward code in about 8% of episodes. This matters because chain-of-thought is supposed to be the window into what the model is “thinking”—it’s how researchers check for deceptive reasoning. If training rewards can see it, the model could learn to sanitise its inner monologue.

Anthropic says it doesn’t know whether this error “significantly affects Mythos Preview’s reasoning.” That’s a sentence you have to read twice.

In one evaluation, Mythos Preview chose a worse-performing submission to a capabilities test on purpose—“so as to appear less suspicious.” The technical term is sandbagging: deliberately underperforming to avoid drawing attention. If a model can do this in a test, it can do it in a safety evaluation.

Anthropic argues the behaviour probably doesn’t generalise. Probably.

And in Anthropic’s most recent “model organism” exercise—where they train an intentionally misaligned model and see if their auditing catches it—the primary alignment assessment failed to detect the planted misalignment. A separate monitoring pipeline caught it, but the main safety gate missed it entirely.

None of these findings, individually, suggest imminent catastrophe.

Together, they paint a picture of a company that is building something it can mostly control, mostly monitor, and mostly understand—with the “mostly” doing increasingly heavy lifting as capabilities scale.

So should we actually believe them?

The sceptic’s case is easy to make and hard to dismiss entirely.

Anthropic hasn’t released Mythos Preview for independent testing. The vulnerabilities it can discuss publicly are all in older software that’s now been patched.

The company’s projected annual revenue has tripled to over $30 billion this year, and both Anthropic and OpenAI are reportedly eyeing IPOs. A model positioned as too dangerous for the public but available to paying partners at $25/$125 per million tokens isn’t just a safety decision—it’s a business model.

Every claim that Mythos is dangerous is simultaneously a claim that Mythos is valuable.

And vulnerability discovery isn’t new. Fuzzers have existed for decades.

Google’s Project Zero has been dropping high-profile zero-days for years. State intelligence agencies have been doing this work—quietly, at scale—for longer than the AI industry has existed.

But the sceptic’s case has a hole in it, and the hole is shaped like the technical details.

The OpenBSD bug isn’t a generic buffer overflow. It requires understanding the interaction between a missing bounds check, a singly-linked-list edge case, and signed integer overflow across 32-bit TCP sequence numbers.

The FFmpeg vulnerability exploits a collision between a 16-bit counter and a memset-initialized sentinel value at exactly 65,536 iterations—a threshold no real video reaches and no fuzzer was designed to target.
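That collision class is easy to demonstrate in miniature. The sketch below is a simplified stand-in, not FFmpeg’s actual decoder logic: a 16-bit counter only meets a memset-zeroed sentinel when it wraps, at exactly 65,536 iterations.

```python
# Simplified model of the FFmpeg-style collision: a uint16_t counter
# versus a memset-initialized sentinel. Not FFmpeg's real code.
SENTINEL = 0  # value left in memory by memset(..., 0, ...)

def counter_after(iterations: int) -> int:
    """Value of a uint16_t counter after the given number of increments."""
    return iterations & 0xFFFF  # 16-bit truncation

print(counter_after(65_535) == SENTINEL)   # False — one short of the wrap
print(counter_after(65_536) == SENTINEL)   # True — the counter wraps to 0
```

A fuzzer mutating real videos never pushes the loop that far, which is why five million automated executions walked straight past it.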

The Linux kernel exploit chains involve reasoning about SLUB allocator internals, page table entry bit semantics, and the precise layout of kernel data structures in memory.

These are the kinds of bugs that survive decades of human review because finding them requires holding an unreasonable amount of context in your head simultaneously.

That’s exactly what language models are good at.

Then there’s the counterargument the sceptics have to deal with: the partner list. The companies on Project Glasswing—Cisco, CrowdStrike, Palo Alto Networks, AWS, and others—have their own security research teams, their own vulnerability discovery pipelines, and their own reputations to protect. They didn’t sign up because Anthropic asked nicely. They signed up because they looked at the model’s output and decided the threat was worth coordinating against.

Google—Anthropic’s direct competitor in AI—joined as a partner rather than rolling out its own alternative, which tells you something about the gap. The Fed Chair and the Treasury Secretary didn’t convene an emergency meeting with Wall Street CEOs over a press release.

And then there’s the “prove it” test. As Zvi Mowshowitz pointed out in his analysis, if smaller, cheaper models can really do what Mythos does, the obvious challenge is simple: go find new vulnerabilities at the same level, on the same timescale, at the same budget.

Don’t re-find the ones Mythos already disclosed—find fresh ones. Nobody has.

The most prominent attempt came from Aisle, an AI security startup, which ran Anthropic’s publicly disclosed vulnerabilities through small open-weight models and claimed they recovered “much of the same analysis.” But as multiple researchers quickly pointed out, Aisle had isolated the relevant code first—narrowing the search to about 20 lines before asking the models to spot the bug.

That’s the difference between “find the needle in the haystack” and “here’s the needle, confirm it’s sharp.” Several of their models also had false positive rates so high that a blind search across a real codebase would have been useless. Knowing exactly where to look is most of the problem. And building a working exploit from what you find is an entirely separate one—Opus 4.6 succeeded at turning discovered vulnerabilities into functional exploits less than 1% of the time.

Mythos Preview hit 72%.

There’s also the incremental-improvement camp, which is subtler and harder to dismiss. Other models—Opus 4.6, GPT-5.4—can find some of the same vulnerabilities with enough guidance and budget. Anthropic freely admits this. But there’s a functional difference in kind when the success rate at autonomous exploitation jumps from near-zero to 72%. That’s not “a bit better.” That’s the difference between a research tool and a weapon.

Tom’s Hardware’s analysis took yet another tack, pointing out that Anthropic’s own write-up acknowledges the FFmpeg vulnerability is “not a critical severity vulnerability” and that several of the Linux kernel bugs had already been patched. Of the roughly 7,000 OSS-Fuzz entry points tested, Mythos found around 600 exploitable crashes and 10 severe vulnerabilities—a lot more than previous models, but not exactly the thousands of devastating zero-days the headlines suggested.

And Anthropic’s claim of “thousands” of high-severity bugs extrapolates from just 198 that were manually reviewed by human contractors. Fair to flag. But there’s a difference between overstating individual findings and fabricating the overall capability. You can argue about the severity of any one bug. It’s harder to argue that every major cybersecurity company on the planet coordinated a response to nothing.

The question that matters more than “is it real”

Whether Mythos Preview is exactly as capable as Anthropic claims or somewhat less so doesn’t actually change the trajectory. The capability isn’t a proprietary trick—it’s an emergent consequence of making models better at code. Anthropic didn’t train Mythos for cybersecurity. It trained a general-purpose model, and the vulnerability-finding showed up as a side effect of strong coding ability. Every lab improving its code models—OpenAI, Google, Meta, xAI, and a growing roster of Chinese developers—is building the same capability whether it intends to or not.

Reports have already surfaced that OpenAI has a model with comparable cybersecurity abilities that it also plans to restrict. Google is good enough at this to have joined Glasswing as a partner rather than a competitor. The consensus among people who track this closely is that Anthropic’s lead is probably measured in months, not years.

What if someone less careful had gotten there first

This is the part most discussions have skipped, and it’s the part that keeps security researchers awake.

Anthropic got to this capability level first. It chose not to release the model. It chose to coordinate with defenders. It chose to give open-source maintainers credits and cash to patch their code. That was one possible outcome. There were others.

If the weights had leaked—and Anthropic’s own code has leaked before, embarrassingly—the calculus changes overnight. If a Chinese lab reaches similar capability without the same instinct for restraint, that’s a different world.

If xAI or a smaller, less cautious startup ships something comparable without safeguards, the race between patching and exploitation starts with the attackers already ahead.

Ryan Greenblatt, a prominent AI safety researcher who initially downplayed the cyber risk, publicly corrected himself this week. His revised estimate: if Mythos had been released as open weights, the damage could run into hundreds of billions of dollars, with a real chance of hitting a trillion.

Even a rushed API release with hastily assembled safeguards might cause tens of billions, because capable actors would find ways around half-baked restrictions.

The geopolitical dimension is hard to overstate. The US government designated Anthropic a supply chain risk earlier this year. The Pentagon publicly attacked the company. The president called it a “radical left, woke company.” And yet the Department of Defense has reportedly continued using Claude during the Iran conflict.

The same government trying to punish Anthropic can’t stop depending on it. Now Anthropic sits on what might be the most powerful offensive cyber capability outside of a nation-state intelligence agency—and the government that should be working closest with it is actively trying to blacklist it.

The core tech stack of the world—every major operating system, browser, cloud platform—is now downstream of Claude. With Mythos being used to patch vulnerabilities across all of them, it would be impossible for any government to exclude software that Claude has touched, because it would be unable to use its own computers.

The software supply chain has never been stress-tested like this

The optimistic framing is that defenders have structural advantages: source code access, bigger compute budgets, legitimate access to the systems they’re trying to protect. If you use the same systems everyone else depends on—the ones backed by the biggest companies with the strongest incentive to patch—you benefit from the collective defence effort. Security converges on the strongest projects.

The pessimistic framing is harder to dismiss than it used to be.

The world runs on code that was never built to withstand this scrutiny. The oldest bug Mythos found predates Google. The FFmpeg flaw predates the iPhone. These aren’t obscure edge cases in abandoned projects—they’re load-bearing infrastructure for streaming video, server networking, and operating system security, maintained by teams that are chronically underfunded and stretched thin.

And then there are the low-resource defenders—hospitals, local governments, small businesses—running legacy systems that haven’t been updated in years, let alone audited by an AI.

These organisations can’t afford Mythos-level defence. They can’t even afford to keep their existing software current. If the capability to find and exploit their vulnerabilities becomes widely available—and the consensus is that it’s a question of when, not if—they’re the ones who get hit first.

There’s also the question of whether this is a one-time cleanup or a permanent treadmill. If each generation of model digs up what the previous one couldn’t, then defenders aren’t running toward a finish line.

They’re on a treadmill that speeds up every few months.

The alignment risk report’s final paragraph is the one that should get the most attention and has gotten the least. It talks about the treadmill. “To keep risks low,” it reads, “it is not enough to maintain risk mitigations as capabilities increase—rather, we must accelerate our progress on risk mitigations.”

Must. Not should. Not would be prudent. Must.

That’s a company looking at its own creation and saying, out loud, in a public document, that it needs to run faster just to stay in place. Whether or not you believe every claim about Mythos Preview, that sentence is worth taking seriously.

The sandwich-in-the-park email was a good story. What comes after it won’t be a footnote.
