AI hallucinates less than humans despite persistent challenges: Anthropic CEO

At Anthropic’s first-ever Code With Claude event held Thursday, CEO Dario Amodei made the striking assertion that artificial intelligence models may hallucinate less frequently than humans.

The comment came as the company unveiled two upgraded Claude 4 models along with enhancements in memory and tool usage, marking a significant step toward its long-term ambitions in AI development.

During a press briefing, Amodei addressed a critical concern surrounding AI: hallucination — when a model confidently produces inaccurate or fictional responses.

According to TechCrunch, he responded to a question by saying, “It really depends on how you measure it, but I suspect that AI models probably hallucinate less than humans, but they hallucinate in more surprising ways.”

Amodei argued that human error is common across various professions — including politics and media — and that occasional mistakes by AI should not undermine its overall intelligence. Still, he acknowledged that the confidence with which AI models deliver false information remains a problem.

His comments follow a recent courtroom incident in which Anthropic’s legal team had to apologise after the Claude AI chatbot inserted an incorrect citation into a filing during a lawsuit with music publishers over alleged copyright infringement involving lyrics from over 500 songs.

Amodei has been vocal about Anthropic’s ambitions, even suggesting in an October 2024 research paper that the company might achieve artificial general intelligence (AGI) as early as next year. AGI refers to a form of AI that can autonomously learn, adapt, and perform a wide range of tasks at human-level intelligence without requiring constant human guidance.

As part of its AGI roadmap, Anthropic introduced Claude Opus 4 and Claude Sonnet 4 at the developer event. The new models feature significant advances in code generation, tool use, and writing quality. Notably, Claude Sonnet 4 achieved a 72.7% score on the SWE-Bench benchmark, earning state-of-the-art status in software engineering capabilities.
