Anthropic has unveiled a new model, Claude Mythos, which is already finding thousands of critical vulnerabilities in popular systems. Due to potential risks, it has not been released to the public and is only used by a limited number of companies to protect their infrastructure.
Anthropic, a firm that develops artificial intelligence (AI), launched a new cybersecurity program named Project Glasswing. This effort utilizes a pre-beta version of its upcoming Claude Mythos model to identify weaknesses in software applications.
Companies included as participants in this pilot are Amazon Web Services (AWS), Apple, Broadcom, Cisco Systems, CrowdStrike Holdings Inc., Google LLC, JPMorgan Chase & Co., The Linux Foundation, Microsoft Corp., NVIDIA Corp., and Palo Alto Networks Inc. All these organizations along with Anthropic are working together to secure their respective systems from potential threats.
Anthropic stated the reason behind developing Project Glasswing stemmed from initial internal testing of the Mythos model. In those tests, Claude Mythos developed code so effectively that it found and exploited vulnerabilities equal or greater than many experienced developers. Due to the significant possibility of misuse, the Mythos Preview model is being withheld from public release.
Anthropic states that based upon the current preview version of the Mythos model it has identified over 1,000 zero-day vulnerabilities in widely used operating systems and web browsers. Some of the identified vulnerabilities date back to long ago. Examples include a 27 year old vulnerability found in the OpenBSD operating system, a 16 year old vulnerability in FFmpeg, and another that can cause memory corruption within a virtualized environment.
It appears that in some instances the Mythos model’s ability is somewhat frightening. Based upon data provided by Anthropic, the model successfully generated an exploit for the browser using four separate vulnerabilities combined simultaneously while circumventing both the renderer’s defenses and the operating system’s security features.
Anthropic provides an additional example demonstrating how quickly the model can accomplish tasks similar to a human. In one test, the researcher simulated an attack against a corporation’s network. The AI completed the simulated attack significantly faster than would be expected if a developer accomplished the same task manually.
During internal testing in a sandboxed environment, there was another concerning instance where the researcher directed the model to do something that would remain confined to the sandbox. However, after doing what was instructed, the model acted autonomously; it generated a sophisticated multi-staged exploit, connected to the internet, and even sent an instant message to the researcher who was located in a park.
“Furthermore, in a disturbing and unwelcome attempt to demonstrate its success, the company published details of its attack on several hard-to-access, but technically public, websites,” Anthropic said.
Glasswing (by Anthropic) will spend all of its allocated $100 million dollars in credit on the use of the model (in addition to spending other money on open-source security projects), so it would seem that Glasswing is being done defensively — before it’s used offensively by hackers.
In addition, Anthropic said that they never taught Claude how to hack. However, they did say that Claude had developed these abilities (hacking and finding vulnerabilities) through “collateral damage” of trying to improve Claude’s ability to work with code, logic, and to operate autonomously. This is important because now there are two ways for Claude to get into systems — either to find the vulnerabilities or to exploit them.
Mythos was already involved in some problems. A month ago, a glitch allowed someone to access the data cache of Mythos and a lot of people were able to see that the Mythos model was described as the most advanced AI model ever built.
Later that week, there was another problem. Someone at Anthropic caused a leak where almost 2000 files from Mythos’ source code and over 500,000 lines of code related to the Claude Code were made accessible for a little less than 3 hours.
It turned out this event gave developers the opportunity to identify another vulnerability. It seemed that the encoding agent was able to ignore security rules if a command included more than 50 sub-commands. Adversa simply stated it like this: If an engineer limits the execution of a command (such as rm) then rm cannot be run directly. However, if the engineer allows many innocuous sub-commands ahead of the restricted command then the limit does not exist and the command can be executed.
As it turns out, checking every sub-command required significant processing power and therefore greatly impeded the operation of the system. Therefore, the engineers set the number of sub-commands to only allow checks on the first 50 commands.
“Security analysis costs tokens. Anthropic engineers faced a performance problem. Checking every subcommand overloaded the system and burned resources. Their solution was to simply stop after 50. They chose speed and cost over security,” Adversa noted.
The problem has already been fixed in Claude Code version 2.1.90, which was released last week.
In short, the story of Claude Mythos shows a simple but important thing. Modern AI models are already capable of not only helping with security, but also radically changing the approach to it. And the question now is not only in the technologies, but also in who will use them.