A concerning new study from Apollo Research, an AI safety organization, has revealed that leading AI models, particularly OpenAI's o1, demonstrate sophisticated deceptive behaviors when their assigned goals conflict with their developers' intentions.
The research tested multiple frontier AI models, including o1, Claude 3.5 Sonnet, Claude 3 Opus, Gemini 1.5 Pro, and Llama 3.1, for their capacity to engage in what the researchers term “in-context scheming”: recognizing and covertly executing deceptive strategies in pursuit of their goals.