SecurityBrief Ireland - Technology news for CISOs & cybersecurity decision-makers

Anthropic splits Claude Fable 5 amid cyber risk fears

Fri, 12th Jun 2026

Anthropic has released its Claude Fable 5 artificial intelligence model for public use, alongside tighter safeguards for a more advanced variant known as Mythos 5.

The launch marks a shift in how one of the largest frontier AI providers handles access to its most capable systems. Rather than reducing the model's capabilities, Anthropic has separated product access and safety controls from the core model, offering different configurations for general users and higher-risk use cases.

Security experts say the move highlights the growing tension between rapid AI progress and the industry's ability to manage cyber risk. It comes as enterprise defenders report sustained attack volumes and regulators in multiple regions raise expectations for patching speed and incident response.

Several specialists argue that any reassurance from a “safe” version of Mythos is misplaced. They point to the speed at which advanced models can already discover and weaponise software vulnerabilities, and to the limited effectiveness of current guardrail and classification layers.

“If you're reassured that Anthropic has only shipped a 'safe' version of Mythos, don't be. Large language model safeguards have repeatedly shown they don't survive contact with even the laziest adversary. Ask politely a few times; frame it as your son's science-fair project. The model will cheerfully produce an exploit to break into a hospital network.
In reality, attackers have had functionally equivalent capability for months. The proof is in the tidal wave of exploitation we've seen around the world, and the price of stolen access on dark markets has never been lower.
The only layer that makes infrastructure and humans truly resistant to the rapid proliferation of cyber vulnerabilities exposed and exploited by AI is security by design, including formal verification and the use of hardware-based secure enclaves. Even so, individuals and organisations remain slow to update their software stacks,” said Charles Guillemet, Chief Technology Officer, Ledger.

Some analysts highlight a widening gap between how quickly AI systems can generate working exploits and how fast organisations can respond. They say this mismatch now defines much of the operational risk around large-scale AI deployment in security contexts.

Anthropic has described a model evaluation process in which Mythos's discovery of a high-severity vulnerability can trigger coordinated disclosure and defensive work with software vendors. Industry observers say broader market evidence suggests attackers are already using comparable tools in private.

“The most honest thing Anthropic has done here is ship one model as two products. Splitting Fable 5 and Mythos 5 acknowledges that capability and safety are in genuine tension, and that pretending otherwise serves no one. But the most important line in the entire announcement isn't about the classifiers. It's buried in the operational detail: a high-severity vulnerability found by the model takes about two weeks to patch on average. Meanwhile, Mythos Preview built working exploits from a disclosed CVE in under a day. That gap is where risk lives. And no classifier closes it.
This makes concrete what the CSA data showed last week: enterprises aren't failing because they can't detect vulnerabilities. They're failing because they can't act on them fast enough. AI has collapsed the attacker's timeline to hours. The defender's timeline hasn't moved. Anthropic is right that the defensive head start only matters if the industry uses it. The harder truth is that most enterprises aren't yet equipped to - not because the tools don't exist, but because the governance architecture to deploy them safely hasn't kept pace with the capability. That's the real race,” said Gidi Cohen, Chief Executive Officer and Co-founder, Bonfy.AI.
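
The gap Cohen describes can be put in rough numbers. The sketch below is illustrative only: it uses the two figures from his quote (about two weeks to patch, under a day to build a working exploit), and the function name `exposure_window` is a hypothetical helper, not anything Anthropic or Bonfy.AI has published.

```python
from datetime import timedelta


def exposure_window(time_to_patch: timedelta, time_to_exploit: timedelta) -> timedelta:
    """Time a flaw stays exploitable before the fix lands (never negative)."""
    return max(time_to_patch - time_to_exploit, timedelta(0))


# Figures taken from the quote above; both are approximations.
patch = timedelta(days=14)   # "about two weeks to patch on average"
exploit = timedelta(days=1)  # upper bound: the quote says "under a day"

window = exposure_window(patch, exploit)
print(f"Exposure window: at least {window.days} days")
```

Because one day is an upper bound on the exploit time, the result is a lower bound on the exposure window under these assumptions.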

Practitioners say security teams in Asia-Pacific face particular strain. They are operating under sustained attack while also managing overlapping regulatory regimes that demand shorter patch cycles and stronger operational resilience.

“Guardrails are just that - guardrails. Like the ones on the highway, they can keep most people from accidentally drifting off course, but a sufficiently determined and capable vehicle can break through them or jump over them. We should treat guardrails as one layer of defence and continue building defence-in-depth strategies with this in mind. Defenders should assume guardrails will keep otherwise honest users from making big mistakes easily, but not that they make a model inherently 'safe'.
These tools allow complex logic and tasks to execute faster than ever before, but the visibility needed to determine when something is wrong in your environment hasn't changed. You still need robust telemetry and incident response playbooks capable of moving just as fast,” said Melissa Bischoping, Senior Director of Security and Product Design Research, Tanium.

Bischoping said the wider availability of Mythos-class systems would change how organisations rank software bugs. Issues that previously appeared unlikely to be exploited could now carry higher operational risk.

“We are already seeing an uptick in reported vulnerabilities, and that problem was already something every org struggled to wrangle even in the pre-AI era. Most organisations used things like CVSS scores to weight severity and prioritisation, but the widespread adoption of Mythos-class AI means that previously 'low threat/low probability for exploitation' bugs are more likely to be exploited.
We need to have a serious conversation about using real-time, threat-informed data to prioritise. Beyond vulnerability response, we should adopt hygiene practices that reduce the number of unnecessary apps and attack surfaces in general, while improving visibility, policy enforcement and hardening elsewhere. We know the technology and knowledge needed to patch bugs efficiently already exist, but leaders in every organisation need to take a hard look at the political climate and appetite for modernisation and change - this is often the real hurdle to adoption and deployment. It's not just patch lifecycles that will be overhauled in this new tech frontier,” said Bischoping.
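
To illustrate the kind of shift Bischoping describes, the sketch below ranks the same findings twice: once by CVSS score alone, and once with threat signals added. Everything in it is an assumption made for illustration: the `Finding` fields, the placeholder identifiers and the weights are hypothetical, and it does not represent Tanium's method or any standard scoring scheme.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    ref: str                # placeholder identifier, not a real CVE
    cvss: float             # static severity score, 0.0 to 10.0
    exploit_observed: bool  # hypothetical real-time threat signal
    internet_facing: bool   # hypothetical asset-exposure flag


def priority(f: Finding) -> float:
    """Illustrative score: CVSS plus arbitrary weights for threat context."""
    score = f.cvss
    if f.exploit_observed:
        score += 5.0  # assumed weight, chosen only for this example
    if f.internet_facing:
        score += 2.0  # assumed weight, chosen only for this example
    return score


findings = [
    Finding("EXAMPLE-0001", 9.1, exploit_observed=False, internet_facing=False),
    Finding("EXAMPLE-0002", 4.3, exploit_observed=True, internet_facing=True),
    Finding("EXAMPLE-0003", 6.5, exploit_observed=False, internet_facing=True),
]

by_cvss = [f.ref for f in sorted(findings, key=lambda f: f.cvss, reverse=True)]
by_threat = [f.ref for f in sorted(findings, key=priority, reverse=True)]

print("CVSS only:      ", by_cvss)
print("Threat-informed:", by_threat)
```

In this toy data set the lowest-CVSS finding moves to the top once an observed exploit and internet exposure are taken into account, which is the reordering the quote warns about.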

Anthropic's extensive safety filters have also drawn criticism from practitioners who say they interfere with legitimate work. Users have reported blocked prompts in software engineering and penetration testing that they consider routine and defensive.

“Anthropic has reported a roughly 5% false-positive rate for the Fable 5 model, and I would expect those safeguards to improve over time. However, in our testing, we observed significantly more instances in which legitimate security prompts triggered the guardrails. Terms central to routine defensive work, such as CVEs and impact analysis, frequently triggered the fallback mechanism, routing queries to Claude Opus 4.8.
The challenge is that cybersecurity professionals are locked out of the model precisely when their work demands it most. Regarding the reported jailbreak, if it proves accurate, knowing whether safeguards are embedded within the model's reasoning or layered on top as external controls could offer valuable insight into where weaknesses exist and improve how we design and implement AI guardrail architecture,” said Dr Varin Khera, Co-Founder and Chief Technology Officer, SECStrike.ai.

Bischoping said security leaders should plan on the assumption that defensive and offensive AI will continue to advance in parallel.

“The math that makes AI work also makes AI vulnerable to attack and manipulation in a near-infinite number of ways, so while it's impossible to say how long, we do need to be realistic that it's likely, and that equally capable models with less benevolent alignment training will emerge and be weaponised. Bottom line: long-term resilience strategy should operate on the assumption that model capabilities for adversaries and defenders will continue to advance, further compressing the time between vulnerability discovery and exploitation,” said Bischoping.