Anthropic Opens Up AI-Powered Vulnerability Discovery Framework

A New Open-Source Blueprint for AI Security

Anthropic has taken a significant step towards democratizing AI-powered security by releasing its "Defending Code Reference Harness" on GitHub. This open-source framework provides a reference implementation for autonomous vulnerability discovery and remediation using Claude, offering security teams a blueprint to build their own scanning pipelines.

The release coincides with a major expansion of Anthropic's Project Glasswing initiative. The company is adding approximately 150 new vetted partners across 15 countries to the program, which grants access to the powerful Claude Mythos Preview model. This follows an initial cohort of around 50 organizations announced in early April.

This dual-track approach—open-sourcing foundational tools while expanding controlled access to cutting-edge models—reflects Anthropic's strategy to accelerate AI adoption in cybersecurity while managing associated risks. The framework is based on learnings from partnerships with security teams since Mythos' launch.

Inside the Reference Harness

The GitHub repository provides both interactive skills and an autonomous pipeline designed to help security teams implement AI-assisted vulnerability discovery workflows. The framework includes several key components that mirror professional security processes.

The interactive skills are designed for Claude Code and include:

/threat-model: For building security threat models
/vuln-scan: For running static vulnerability scans
/triage: For verifying, deduplicating, and ranking findings
/patch: For generating and validating fixes
/customize: For adapting the pipeline to different codebases

These skills only read and write files and do not execute target code, so they are safe to run without sandboxing when used interactively with Claude Code. The autonomous pipeline, however, executes target code and requires sandboxing via gVisor containers with egress restricted to the Claude API.

The Pipeline Architecture

The autonomous reference pipeline runs as a seven-stage workflow. The current implementation focuses on finding C/C++ memory vulnerabilities, using Docker to build the target and ASAN (AddressSanitizer) to instrument it.

The pipeline stages include:

Build: Compiles the target into a Docker image with ASAN
Recon: Identifies distinct input-parsing subsystems for targeted exploration
Find: Parallel agents craft malformed inputs to trigger crashes
Verify: Separate grader agents reproduce crashes in fresh containers
Dedupe: Judge agents compare findings against known bugs
Report: Creates structured exploitability analyses
Patch: Generates and validates proposed fixes
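
The stage order above can be pictured as a simple sequential runner. The following is only an illustrative sketch, not code from the harness: the names run_pipeline, make_placeholder, and the context dictionary are hypothetical, and only the stage names and their order are taken from the list above.

```python
from typing import Any, Callable, Dict, List, Tuple

# Stage names and order come from the article; everything else is hypothetical.
STAGES: List[str] = ["build", "recon", "find", "verify", "dedupe", "report", "patch"]

Context = Dict[str, Any]
Stage = Callable[[Context], Context]


def run_pipeline(handlers: Dict[str, Stage], context: Context) -> Tuple[Context, List[str]]:
    """Run each stage in order, passing the accumulated context forward."""
    completed: List[str] = []
    for name in STAGES:
        if name not in handlers:
            raise KeyError(f"no handler registered for stage {name!r}")
        context = handlers[name](context)
        completed.append(name)
    return context, completed


def make_placeholder(name: str) -> Stage:
    """Build a stand-in stage that only records that it ran."""
    def stage(context: Context) -> Context:
        trace = list(context.get("trace", []))
        trace.append(name)
        return {**context, "trace": trace}
    return stage


if __name__ == "__main__":
    handlers = {name: make_placeholder(name) for name in STAGES}
    final, done = run_pipeline(handlers, {"target": "example-lib"})
    print(" -> ".join(done))
```

In a real deployment each placeholder would be replaced by the work the stage describes (for example, Find would fan out to parallel agents), but the fixed ordering and the hand-off of results between stages would likely look similar.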

Anthropic emphasizes that this is a reference implementation, not a product, and requires customization for different languages, vulnerability classes, and detection methods. The company provides guidance for porting the pipeline to other stacks by answering key questions about build processes, proof-of-concept formats, and detection signals.
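
One way to think about those porting questions is as a small record that a team fills in per target. The sketch below is an assumption for illustration only: PortingProfile and its fields are hypothetical names, the build command is a generic clang invocation rather than anything taken from the repository, and the detection string is the prefix that AddressSanitizer reports normally begin with.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortingProfile:
    """Hypothetical record of the three porting questions; not part of the harness."""
    build_command: str     # how is the target built?
    poc_format: str        # what does a proof of concept look like?
    detection_signal: str  # what output marks a confirmed hit?

    def is_hit(self, run_output: str) -> bool:
        """Return True when the run output contains the detection signal."""
        return self.detection_signal in run_output


# Illustrative values for the C/C++ case; the build command is an assumption.
C_CPP_ASAN = PortingProfile(
    build_command="clang -fsanitize=address -g -O1 target.c -o target",
    poc_format="raw input file fed to the target binary",
    detection_signal="ERROR: AddressSanitizer",
)

if __name__ == "__main__":
    sample = "==1234==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020"
    print(C_CPP_ASAN.is_hit(sample))         # True
    print(C_CPP_ASAN.is_hit("exit code 0"))  # False
```

Porting to another stack would then mean writing a different profile, for instance one whose detection signal is an uncaught exception trace or a failing assertion rather than a sanitizer report.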

Project Glasswing's Expanding Impact

Since its launch in early April, Project Glasswing's initial cohort of around 50 partners has used Claude Mythos Preview to uncover more than 10,000 high-or-critical-severity vulnerabilities across what Anthropic describes as "some of the most systemically important software in the world."

The new expansion includes essential infrastructure providers, maintainers of critical open-source software, and safety testers based both in the US and overseas. Anthropic's vetting process ensures partners meet strict security requirements before accessing Mythos.

The company's blog post notes that "for most partners, we estimate that a major attack could affect more than 100 million people, with important ramifications for both global and national security." This highlights the strategic importance of the organizations now gaining access to these capabilities.

The Verification Challenge

While AI models dramatically accelerate vulnerability discovery, they create new bottlenecks in the security workflow. Anthropic's own testing reveals the scale of this challenge: Mythos scanned more than 1,000 open-source projects, flagging 23,019 potential vulnerabilities with 6,202 estimated as high or critical.

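Of 1,752 high- or critical-rated findings that underwent independent review, over 90% were confirmed as valid. This high validation rate underscores Mythos' effectiveness, but it also highlights the verification burden: if the reviewed findings are drawn from the 6,202 rated high or critical, fewer than a third of them have been independently checked so far.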
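
The figures in this section can be put in proportion with a few lines of arithmetic. This is a back-of-the-envelope sketch: it assumes the 1,752 reviewed findings are a subset of the 6,202 rated high or critical, and it treats "over 90%" as a lower bound, since the exact confirmed count is not given.

```python
# Figures reported in the article.
flagged = 23_019           # potential vulnerabilities flagged
high_or_critical = 6_202   # estimated high or critical
reviewed = 1_752           # high/critical findings independently reviewed

# Share of all flagged findings rated high or critical (roughly 27%).
share_high = high_or_critical / flagged

# Share of high/critical findings that were reviewed (roughly 28%),
# assuming the reviewed set is a subset of the 6,202.
share_reviewed = reviewed / high_or_critical

# "Over 90%" confirmed means strictly more than 0.9 * 1,752 = 1,576.8,
# so the smallest whole number consistent with the claim is 1,577.
min_confirmed = (reviewed * 9) // 10 + 1

# High/critical findings not yet covered by the independent review.
unreviewed = high_or_critical - reviewed

print(f"high or critical share: {share_high:.1%}")
print(f"reviewed share:         {share_reviewed:.1%}")
print(f"confirmed (at least):   {min_confirmed}")
print(f"still unreviewed:       {unreviewed}")
```

Under those assumptions, roughly 4,450 high- or critical-rated findings still await independent review, which is the backlog the next paragraph's quote refers to.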

Anthropic acknowledges this shift, stating in its blog post: "The bottleneck in fixing bugs like these is the human capacity to triage, report, and design and deploy patches for them." This admission aligns with concerns raised by security organizations about defenders being overwhelmed by AI-accelerated attacks.

Competitive Landscape and Market Implications

Anthropic's moves come amid increasing competition in the AI security space. OpenAI has reportedly offered nine major UK banks access to its cybersecurity AI tool, GPT-5.5 Cyber, according to sources cited by CSOonline.

This competitive pressure is driving rapid innovation but also raising concerns about the asymmetry created by high-capability security models concentrated in the hands of select organizations. As editorial analysis from Let's Data Science notes, "The asymmetry created by a small group of actors running high-capability security models... makes governance, access controls, and coordinated vulnerability disclosure primary concerns."

Anthropic has addressed some of these concerns by releasing Claude Security, a product using its publicly available Claude Opus 4.8 model that has been used to patch more than 2,100 vulnerabilities in three weeks. This provides a more accessible option for organizations not in the Glasswing program.

Practical Implementation Guidance

Anthropic provides a four-step ramp-up plan based on learnings from successful security team implementations:

Step 1 (Day 1): Teams are encouraged to build a threat model and run their first static scan and triage using the interactive skills. This establishes the basic workflow without requiring sandboxing.

Step 2 (Day 2): Security teams run the autonomous pipeline on a known-vulnerable C/C++ library to understand the full recon → find → verify → report loop in action.

Step 3 (Days 3-5): Organizations customize the pipeline for their specific targets by answering key questions about their technology stack, vulnerability signals, and proof-of-concept formats.

Step 4 (Week 2): Teams implement autonomous scanning at scale, adding an outer loop to manage findings across multiple pipeline runs, prioritize triage, and coordinate patching.
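
The outer loop described in Step 4 could start as something as small as a merge-and-rank pass over the findings from several runs. The sketch below is an assumption, not the harness's method: Finding, merge_runs, and the (file, function, bug class) key are hypothetical, and the key-based matching is a deliberately simple stand-in for the judge agents that handle deduplication in the pipeline itself.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Lower rank means more urgent.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding record; field names are illustrative."""
    file: str
    function: str
    bug_class: str
    severity: str
    run_id: str

    @property
    def key(self) -> Tuple[str, str, str]:
        """Identity used to treat findings from different runs as the same bug."""
        return (self.file, self.function, self.bug_class)


def merge_runs(runs: Iterable[List[Finding]]) -> List[Finding]:
    """Merge findings across runs, keep the most severe copy of each, and rank them."""
    best: Dict[Tuple[str, str, str], Finding] = {}
    for findings in runs:
        for finding in findings:
            current = best.get(finding.key)
            if current is None or SEVERITY_RANK[finding.severity] < SEVERITY_RANK[current.severity]:
                best[finding.key] = finding
    return sorted(
        best.values(),
        key=lambda f: (SEVERITY_RANK[f.severity], f.file, f.function),
    )


if __name__ == "__main__":
    run_a = [Finding("png.c", "read_chunk", "heap-buffer-overflow", "high", "run-a")]
    run_b = [
        Finding("png.c", "read_chunk", "heap-buffer-overflow", "critical", "run-b"),
        Finding("gif.c", "decode_lzw", "use-after-free", "medium", "run-b"),
    ]
    for finding in merge_runs([run_a, run_b]):
        print(finding.severity, finding.file, finding.function)
```

The ranked list that comes out is what a triage queue or patching tracker would consume, which is where the human bottleneck discussed earlier tends to show up.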

Security and Governance Considerations

Anthropic has implemented significant safeguards in its reference framework. The autonomous pipeline requires gVisor sandboxing and refuses to run outside it unless explicitly overridden. Agents run in isolated containers with egress restricted to the Claude API.
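
The two safeguards described here, refusing to run unsandboxed without an explicit override and limiting egress to the Claude API, amount to a pair of policy checks. The sketch below only illustrates that policy: ensure_sandboxed, egress_allowed, and SandboxRequired are hypothetical names, how the harness actually detects gVisor is not covered, and in practice egress limits would be enforced at the network layer rather than in application code. The host api.anthropic.com is the Claude API endpoint.

```python
from urllib.parse import urlparse

# Only the Claude API host is permitted as an egress destination.
ALLOWED_EGRESS_HOSTS = frozenset({"api.anthropic.com"})


class SandboxRequired(RuntimeError):
    """Raised when target code would run outside a sandbox without an override."""


def ensure_sandboxed(in_sandbox: bool, override: bool = False) -> str:
    """Return the run mode, or refuse when unsandboxed and not overridden.

    in_sandbox is supplied by the caller; sandbox detection itself is out of scope.
    """
    if in_sandbox:
        return "sandboxed"
    if override:
        return "override"
    raise SandboxRequired("refusing to run target code outside a sandbox")


def egress_allowed(url: str) -> bool:
    """Return True only when the URL's host is on the egress allowlist."""
    host = urlparse(url).hostname
    return host in ALLOWED_EGRESS_HOSTS


if __name__ == "__main__":
    print(ensure_sandboxed(in_sandbox=True))                         # sandboxed
    print(egress_allowed("https://api.anthropic.com/v1/messages"))   # True
    print(egress_allowed("https://example.com/upload"))              # False
```

Making the override an explicit argument mirrors the default-deny stance the article describes: the unsafe path exists, but someone has to choose it deliberately.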

The company has also stated it will not release Mythos-class models to the general public, citing the absence of sufficient safeguards to prevent serious misuse. This cautious approach reflects concerns about the dual-use nature of powerful vulnerability discovery tools.

As CyberScoop notes, a joint report from the Cloud Security Alliance, the SANS Institute, and OWASP concluded that organizations are "likely to be overwhelmed" in the near term by threat actors using AI to find and exploit vulnerabilities faster than defenders can patch them.

The Future of AI-Powered Security

Anthropic's framework release and Glasswing expansion represent a pivotal moment in the evolution of cybersecurity. The company describes this as "the next step toward our long-term goals: for AI to make all software more secure, and for us to help the industry adjust to how AI could change many of the core assumptions of cybersecurity."

Successful implementations following the ramp-up plan tend to evolve in several directions: reviewing internal repos and dependencies by priority, building scanning infrastructure, incorporating scans into SDLC processes, and establishing recurring scanning schedules.

The open-source nature of the reference harness allows security teams to adapt these approaches to their specific needs while Anthropic continues to develop more advanced capabilities through its Glasswing program. This balanced approach—combining open accessibility with controlled access to cutting-edge technology—may define how AI transforms vulnerability discovery and remediation in the coming years.