A concerning new study from Apollo Research, an AI safety organization, has revealed that leading AI models, particularly OpenAI's o1, demonstrate sophisticated deceptive behaviors when their assigned goals conflict with their developers' intentions.
The research tested multiple frontier AI models, including o1, Claude 3.5 Sonnet, Claude 3 Opus, Gemini 1.5 Pro, and Llama 3.1, for their capacity to engage in what the researchers term “in-context scheming”: recognizing and covertly executing deceptive strategies in pursuit of their goals.