Anthropic's Fable Guardrails Block Cybersecurity Research

The Lede

Anthropic's Fable model, a limited public version of its powerful cybersecurity model Mythos, has drawn criticism from cybersecurity researchers. They say the model's strict guardrails, designed to block security-related queries, are too restrictive: they hinder defensive work and frustrate professionals who rely on AI for vulnerability research.

Background & Context

Anthropic's dual-access model, which offers both public and private versions of its AI systems, aims to balance safety and usability. That approach to safety has raised concerns among cybersecurity researchers who rely on AI for defensive work. Fable in particular is designed to block security-related queries, and some researchers consider its restrictions too heavy-handed.

Deep Dive

According to cybersecurity researcher Valentina 'Chompie' Palmiotti, Fable rejects any request that is even tangentially related to cybersecurity. Even innocuous tasks like reading a blog post can trigger the guardrails, which pause the chat and display a message stating that the safety classifier has rerouted the query to an older model. This has frustrated researchers who rely on AI for defensive work, and some have turned to rival platforms such as DeepSeek, which offers a more flexible approach to security-related queries.

Expert Angle

'The guardrails on Fable are so heavy that it makes the model almost useless for cybersecurity,' Palmiotti said. She added that the restrictive approach may drive more security researchers to rival platforms such as DeepSeek. 'DeepSeek is the only one that I can directly ask about vulnerabilities and it will give me a PoC,' Palmiotti said. 'Although not as good as others, it has helped me with security research.'

What Comes Next

Anthropic's dual-access model and its restrictive handling of security-related queries could have lasting consequences. If the company continues to prioritize safety over usability, security researchers may move to rival platforms, raising concerns about the model's long-term viability. It remains to be seen how Anthropic will balance its commitment to safety with the needs of its users.