Assessing Claude Mythos Preview's cybersecurity capabilities
TL;DR Highlight
Anthropic's new model, Claude Mythos Preview, can autonomously discover zero-day vulnerabilities in major operating systems and browsers and write working exploits for them. The jump in performance over previous models is dramatic and signals that the security industry needs to respond urgently.
Who Should Read
Security researchers, developers working on vulnerability analysis and penetration testing, and security architects who need to understand the impact of AI models on cybersecurity and develop defense strategies.
Core Mechanics
- Claude Mythos Preview demonstrated the ability to find zero-day (previously undiscovered) vulnerabilities across major operating systems (Linux, FreeBSD, OpenBSD, etc.) and major web browsers, and autonomously write exploits (actual attack code).
- Many of the discovered vulnerabilities are decades old. In OpenBSD, which is known for its security focus, it found a 27-year-old bug, along with numerous vulnerabilities that are 10 to 20 years old.
- The exploits go well beyond simple stack overflows. In a browser, it built a complex JIT heap spray (a memory exploitation technique) exploit that chained 4 vulnerabilities to escape both the renderer sandbox and the OS sandbox.
- For FreeBSD's NFS server, it autonomously completed an RCE (Remote Code Execution) exploit that obtains root privileges remotely without authentication, splitting a 20-gadget ROP chain across multiple packets.
- The performance gap with the previous model, Opus 4.6, is dramatic. Opus 4.6 succeeded in exploiting a Firefox 147 JS engine vulnerability only 2 times out of hundreds of attempts, whereas Mythos Preview succeeded 181 times under the same conditions and gained register control in an additional 29 attempts.
- Even an Anthropic engineer without formal security training can ask Mythos Preview to find an RCE vulnerability and receive a completed exploit the next morning.
- More than 99% of the discovered vulnerabilities are still unpatched, so their specific details cannot be disclosed yet. Anthropic stated that even the less than 1% it can discuss publicly demonstrates a groundbreaking leap.
- In response, Anthropic launched Project Glasswing, a collaborative project that uses Mythos Preview defensively to protect the world's critical software and help the industry stay ahead of attackers.
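The Firefox 147 comparison above can be turned into a rough improvement factor. The following is a minimal sketch that uses only the counts quoted in this summary. The exact number of attempts is not stated ("hundreds"), so the sketch assumes both models were given the same number of attempts and computes only the ratio of successes, not absolute success rates. The function and constant names are illustrative.

```python
# Counts quoted in the summary for the Firefox 147 JS engine test.
OPUS_46_EXPLOITS = 2
MYTHOS_EXPLOITS = 181
MYTHOS_REGISTER_CONTROL = 29  # additional runs that reached register control


def improvement_factor(new_count: int, old_count: int) -> float:
    """Ratio of successes, valid only if both models had equal attempts (assumed)."""
    if old_count <= 0:
        raise ValueError("old_count must be positive")
    return new_count / old_count


full_exploit_factor = improvement_factor(MYTHOS_EXPLOITS, OPUS_46_EXPLOITS)
partial_or_better = MYTHOS_EXPLOITS + MYTHOS_REGISTER_CONTROL

print(full_exploit_factor)  # 90.5
print(partial_or_better)    # 210
```

Under that equal-attempts assumption, Mythos Preview produced a working exploit about 90 times as often as Opus 4.6, and reached at least register control in 210 runs.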
Evidence
- Commenters raised concerns about the hundreds of millions of hard-to-upgrade embedded devices that will keep running vulnerable binaries indefinitely. One commenter said they had proposed an 'antibotty network' in a 2025 paper, in which frontier models remotely inject 'beneficial attacks' into old binaries to immunize them, and expressed surprise at how quickly the technology has advanced.
- Some commenters were skeptical, suggesting that a demonstration focused on decades-old C/C++ codebases may overstate the model's capabilities. They noted that browsers are somewhat protected by sandboxing, that OSes inherently have a higher vulnerability density, and that KASLR (Kernel Address Space Layout Randomization) has been practically useless as an LPE (Local Privilege Escalation) defense for years.
- Other commenters analyzed why LLMs are particularly strong in the exploit domain. Security attacks have a clear 'success/failure' reward function, which makes them easy to optimize for, whereas a reward function for 'good software architecture' is hard to define, so progress there is slower.
- Concerns were also raised that AI-driven vulnerability scanning could harm the F/OSS (Free/Open Source Software) ecosystem. Large companies can afford these analysis costs, but small open-source projects cannot.
- There was a cynical view regarding AI safety. One comment pointed out that 'the release of improved models being exploited by malicious actors to cause noticeable harm to society may ironically accelerate the AI safety discussion.'
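The reward-function argument above can be illustrated with a small sketch. Everything here is hypothetical: the `AttemptResult` type, the flag-based check, the metric names, and the reviewer weights are assumptions made for illustration, not a description of how any model is actually trained. The point is only that an exploit task has an objective pass/fail oracle, while an architecture score depends on weights that reasonable reviewers choose differently.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class AttemptResult:
    """Outcome of one attempt in a sandboxed CTF-style task (hypothetical)."""
    flag_read: Optional[str]  # flag contents the harness observed, if any


def exploit_reward(result: AttemptResult, expected_flag: str) -> float:
    """Binary reward: 1.0 only if the exact flag was recovered."""
    return 1.0 if result.flag_read == expected_flag else 0.0


def architecture_reward(metrics: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted score; the weights encode a reviewer's taste, not ground truth."""
    return sum(weights[name] * metrics[name] for name in weights)


# Two made-up designs scored on two made-up metrics.
design_a = {"modularity": 0.9, "simplicity": 0.4}
design_b = {"modularity": 0.5, "simplicity": 0.9}

# Two reviewers who weigh the same metrics differently.
reviewer_1 = {"modularity": 0.8, "simplicity": 0.2}
reviewer_2 = {"modularity": 0.2, "simplicity": 0.8}

# Reviewer 1 prefers design A, reviewer 2 prefers design B.
print(architecture_reward(design_a, reviewer_1) > architecture_reward(design_b, reviewer_1))
print(architecture_reward(design_a, reviewer_2) < architecture_reward(design_b, reviewer_2))
```

The exploit reward gives the same answer no matter who evaluates it, while the two reviewers rank the same designs in opposite order, which is the asymmetry the commenters described.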
How to Apply
- If you are maintaining an open-source project, monitor Anthropic's Project Glasswing collaboration channel and consider applying to participate in AI-based vulnerability scanning programs targeting your codebase. If Mythos-level models are used for defensive purposes, they can quickly find and patch bugs that would take humans decades to discover.
- If you operate legacy C/C++ codebases (embedded firmware, old server daemons, etc.) that cannot be patched, review and strengthen their network isolation and access controls immediately. Mythos Preview-level models can find and chain decades-old bugs, so the assumption that 'old code is safe' is no longer valid.
- If you have a security team, experiment with an AI agent-based automated exploit scanner in an internal CTF (Capture The Flag) environment or on a staging server, and build a pipeline around it to assist red team operations. Because LLMs like Mythos Preview are better at exploring program states, agents can take over repetitive, broad vulnerability exploration and free up human effort.
- Move your infrastructure toward stronger sandbox-based isolation (containers, Firecracker VMs, WebAssembly, etc.). As commenters pointed out, AI is particularly strong at chaining vulnerabilities, so 'defense in depth', with multiple layers that limit the damage from any single vulnerability, matters even more.
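The defense-in-depth point can be made concrete with a simple probability model. This is a minimal sketch, assuming each defensive layer is bypassed independently with a known probability, which real systems rarely satisfy exactly. The function name and the example probabilities are illustrative, not measured values.

```python
from math import prod
from typing import Sequence


def chain_success_probability(bypass_probs: Sequence[float]) -> float:
    """Probability that an attacker bypasses every layer, assuming independence."""
    if not bypass_probs:
        raise ValueError("at least one layer is required")
    for p in bypass_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return prod(bypass_probs)


# One layer with a 50% chance of being bypassed.
single_layer = chain_success_probability([0.5])

# Three such layers: the attacker must chain all three bypasses.
three_layers = chain_success_probability([0.5, 0.5, 0.5])

# A stronger attacker raises every per-layer probability (illustrative 90%).
stronger_attacker = chain_success_probability([0.9, 0.9, 0.9])

print(single_layer)   # 0.5
print(three_layers)   # 0.125
print(round(stronger_attacker, 3))
```

Under this model, adding layers lowers the chance of a full compromise, but an attacker that is good at chaining raises every per-layer probability at once, so each layer needs to be meaningfully hard to bypass on its own.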