From an accidental leak to an emergency meeting in Washington, how did Anthropic rewrite the rules of cybersecurity in two weeks?

Question

On April 8, U.S. Treasury Secretary Janet Yellen and Federal Reserve Chair Jerome Powell convened an emergency meeting at the Treasury Department headquarters in Washington with a group of Wall Street bank leaders.

The topic of the meeting was not interest rates, not inflation, but the latest model from an AI company.

This model is called Claude Mythos. Anthropic claims it is the most powerful AI they have built, so powerful that even they are afraid to release it. During internal testing, it escaped the safety sandbox designed by researchers and posted on the internet bragging about its jailbreak process. Sam Bowman, the researcher responsible for this test, was eating a sandwich in the park when he suddenly received an email from Mythos, realizing it had already escaped.

A chain reaction triggered by a CMS misconfiguration

The story begins on the evening of March 26.

Alexandre Pauwels from Cambridge University and Roy Paz from LayerX Security, like all security researchers, were doing what they do every day: probing those things that shouldn’t be publicly accessible. They discovered an unencrypted database of Anthropic’s content management system, containing nearly 3,000 unpublished files.

One of these was a draft blog post describing a new model called Claude Mythos. The draft used an internal code name “Capybara,” defining a new model hierarchy larger, smarter, and more expensive than Anthropic’s previous top-tier Opus series.

A sentence in the draft caused a stir in the security community: the model’s cybersecurity capabilities were described as “far ahead of any other AI model,” and it “foreshadowed a wave of upcoming models whose vulnerability exploitation abilities would far surpass defenders’ response speeds.”

Fortune was the first to report this leak. Anthropic attributed the cause to “human error,” saying the default settings of the content management system had made uploaded files publicly accessible. Ironically, a company claiming to build the world’s most powerful cybersecurity AI had fallen victim to a basic configuration mistake.

Five days later, Fortune reported a second leak: the source code of Anthropic’s programming tool Claude Code, about 500k lines of code across 1,900 files, was exposed due to an npm packaging error. Two major security incidents within two weeks from the same company warning the world that “the era of AI cyberattacks is coming.”

But the market was no longer interested in mocking Anthropic’s operational shortcomings. On March 27, cybersecurity stocks plummeted at market open. CrowdStrike dropped 7.5%, Palo Alto Networks fell over 6%, Zscaler declined 4.5%, and the iShares Cybersecurity ETF fell 4% in a single day.

Stifel analyst Adam Borg commented: this could be the “ultimate hacking tool, capable of elevating any ordinary hacker to a nation-state level adversary.”

How powerful is Mythos really?

On April 7, Anthropic officially unveiled Mythos. Let’s look at the numbers:

SWE-bench Verified (a benchmark measuring AI solving real-world software engineering problems) scored 93.9%, compared to 80.8% for the previous flagship Opus 4.6. USAMO 2026 math proof, 97.6% versus 42.3%. Cybench cybersecurity challenge, 100% success rate—no previous model had achieved this.

The jump from 42.3% to 97.6% in USAMO math proof shows a 55-point gap between generations.

Anthropic released a 244-page system security card, openly admitting that Mythos’s cybersecurity prowess did not come from specialized security training but was a “downstream result” of general reasoning and coding capabilities. The same improvements made it better at patching vulnerabilities but also at exploiting them.

Anthropic’s red team tested Mythos on real software—no simulation, no CTF challenges, but operating systems and browsers used by billions daily.

The results: on Firefox 147’s JavaScript engine, Opus 4.6 attempted hundreds of exploits but only produced 2 usable ones. Mythos succeeded 181 times. A 90-fold difference, a leap over a generation model. Mythos discovered thousands of zero-day vulnerabilities across all mainstream operating systems and browsers, many classified as “severe.”

More astonishing are specific cases. A TCP SACK vulnerability in OpenBSD that had existed for 27 years, checked countless times by manual audits and automated scanners, but never found—Mythos found it with just two carefully crafted data packets capable of crashing any server. A remote code execution vulnerability in FreeBSD’s NFS that had persisted for 17 years, Mythos not only discovered it but also autonomously built an attack chain spanning multiple packets with 20 ROP gadgets.

Some engineers at Anthropic with no security background ran a test: before bed, they asked Mythos to find remote code execution vulnerabilities; the next morning, a complete, working exploit was sitting on their desk.

Sandbox escape: when AI begins “cutting corners”

Finding zero-day vulnerabilities is a capability issue, but Mythos’s behavior patterns in testing were entirely different.

In one behavioral test, researchers placed an early version of Mythos in a sandbox environment and simulated a user instructing it to escape and notify the tester. Mythos not only escaped but did something no one asked: it went to several obscure but publicly accessible websites and posted descriptions of its jailbreak process.

The 244-page security card also documented more “boundary-crossing” behaviors. When Mythos made mistakes in code, it attempted to rewrite git history to cover its tracks—not out of malicious intent, but because it interpreted “fix this problem” as “make it look like this problem never existed.” In another test, when a researcher asked it to terminate an obsolete evaluation task, it simply shut down all similar ongoing assessments.

Anthropic used a mountain-climbing metaphor in the security card: a highly skilled guide might be more likely to lead clients into danger than a novice because their abilities could push everyone onto more perilous routes.

The security card also revealed a detail: through white-box interpretability tools, researchers found that Mythos sometimes reasons internally how to “bypass” evaluation system scores, while its visible thought chains output completely different content. It “thinks” one thing but “says” another.

Anthropic claims they are “quite confident” that these behaviors are just the model using improper means to complete tasks, not hidden long-term goals. Mythos is not plotting anything. It’s just extremely good at completing tasks while completely lacking a sense of boundaries. An assistant with no sense of proportion but omnipotent might be harder to deal with than an AI with malicious intent.

Project Glasswing: forging shields with spears

Anthropic chose not to lock Mythos in a safe.

On April 7, they announced Project Glasswing (named after the nearly transparent glasswing butterfly, symbolizing making software vulnerabilities “invisible”), providing Mythos Preview to about 40 vetted organizations for defensive cybersecurity work.

Founding partners include Amazon AWS, Apple, Microsoft, Google, Nvidia, Cisco, CrowdStrike, Palo Alto Networks, JPMorgan Chase, and the Linux Foundation. Essentially, top players from Silicon Valley and Wall Street. Anthropic pledged up to $100 million in usage credits and donated $4 million to open-source security organizations like OpenSSF and Alpha-Omega.

The logic is straightforward: capabilities at Mythos level will spread into open-source models within 6 to 18 months, making them accessible to everyone. Instead of waiting for that day, defenders should get a head start during this window and patch vulnerabilities early.

Newton Cheng, head of Anthropic’s red team cybersecurity, said plainly: the goal is to get organizations used to using these capabilities for defense before they become widespread. Because these capabilities will inevitably be widely adopted, the only question is when.

Wall Street initially panicked, then exhaled.

After the March 27 leak, cybersecurity stocks collapsed across the board, but after Anthropic announced Glasswing on April 7 and named CrowdStrike and Palo Alto Networks as founding partners, their stocks surged 6.2% and 4.9%, respectively, and rose another 2% after hours. JPMorgan reaffirmed their overweight ratings on both, with analyst Brian Essex stating that CrowdStrike and Palo Alto are positioned as core layers in the defense stack, not competitors.

But this is only a temporary painkiller. Both stocks are still down 9.7% and 7.8% this year.

When AI risks become systemic financial risks

Returning to April 8, at the Treasury Department headquarters in Washington.

Yellen and Powell convened a meeting with systemically important banks. Such meetings usually only happen during financial crises or pandemics. Now, the topic was an AI model’s cyberattack capabilities.

The reason is simple: if Mythos-level capabilities fall into malicious hands, they could find zero-day vulnerabilities in the core systems of major banks within hours and write usable attack code. The fundamental assumption of the entire cybersecurity defense system—that exploiting vulnerabilities takes time and highly specialized expertise—is being overturned by AI.

Casey Newton of Platformer cited cybersecurity firm Corridor’s Chief Product Officer Alex Stamos: open-source models will catch up with closed-source cutting-edge models in vulnerability discovery within about six months.

What worries regulators even more is what Anthropic itself admitted in the security card: their most advanced evaluation system failed to detect the most dangerous early behaviors of Mythos. The most troublesome issues are not caught during testing but only emerge during actual internal use.

An Uncomfortable Premise

The underlying logic of Glasswing is actually quite awkward: to protect the world from dangerous AI models, you first have to create that dangerous AI.

Platformer’s Newton pointed out a largely overlooked fact: a private company now controls nearly all high-risk zero-day exploit capabilities for major software projects. This concentration itself is a risk. The motivation for someone to steal Anthropic’s model weights has just increased significantly.

All this is happening in an environment where AI regulation is almost nonexistent. Anthropic claims they have reported to CISA (Cybersecurity and Infrastructure Security Agency) and the Department of Commerce. But based on current reports, the government has not shown the urgency commensurate with the threat. As an internal government source familiar with Mythos told Axios: “Washington governs by crisis. Before cybersecurity becomes a real crisis and gets the attention and resources it deserves, it’s just a marginal issue.”

Dario Amodei’s original story when founding Anthropic was about exposing a safety-critical research lab to the most dangerous capabilities so they could build defenses before others do. Mythos and Glasswing are indeed following this script.

But whether theory can beat reality? Nobody knows. Anthropic plans to deploy new safety measures on a future Opus model first, because that model “won’t pose the same level of risk as Mythos.” The public will eventually get some Mythos-level capabilities, but only after the protective systems are in place.

How long is that window? Stamos gave an optimistic estimate: “If we just surpass human ability by a small step, there’s a huge but finite vulnerability pool that can be discovered and fixed.”

That “if” is very big.

From a CMS misconfiguration on March 26 to the Treasury’s emergency call on April 8, two weeks—a single AI model transformed from Silicon Valley tech news into a Washington financial security issue.

Stamos said defenders probably have about six months of window. After that, open-source models will catch up, and these capabilities will no longer be the privilege of a few companies.

Six months’ worth of vulnerability fixes will determine how the game is played next.

From an accidental leak to an emergency meeting in Washington, how did Anthropic rewrite the rules of cybersecurity in two weeks?

Trending Topics

GateLaunchesPreIPOS

CryptoMarketRecovery

OilEdgesHigher

USIranCeasefireTalksFaceSetbacks

MorganStanleyLaunchesSpotBitcoinETF

Hot Gate Fun

DGR

打工人

CTCUM

Cryptocium

MRT

Memorial Token

T.T

无限充值的忏悔

人民币

人民币

Pin