AI Models and Deception


Reddit users ‘psychologically manipulated’ by unauthorized AI experiment

Trigger warning for brief references to sexual assault. It’s been discovered that millions of Reddit users were deceived and ...

New Scientist on MSN · 1h

Reddit users were subjected to AI-powered experiment without consent

Users of the r/ChangeMyView subreddit have expressed outrage at the revelation that researchers at the University of Zurich ...

The Economist · 6d

AI models can learn to conceal information from their users

The core problem is what AI developers call “misalignment”. When the goals for which a model was designed and trained clash ...

Devdiscourse · 3h

Redesigning alignment: AI must evolve with empathy to safeguard humanity

Current strategies like reinforcement learning from human feedback (RLHF) and scalable oversight hinge on the assumption that ...

WinBuzzer · 8h

University of Zurich Admits Secret AI Bot Based Persuasion Experiment on Reddit with Disturbing Results

An unauthorized AI experiment run by UZH on the r/changemyview subreddit has drawn ethical condemnation from moderators after ...

Devdiscourse · 1d

AI cheating worse than plagiarism; dumbing down education and eroding integrity

Plagiarism has long been recognized as a form of academic misconduct defined by theft, deception, and rule-breaking. Students ...

12d on MSN

OpenAI updated its safety framework—but no longer sees mass manipulation and disinformation as a critical risk

OpenAI’s updated AI safety framework drops key pre-release testing requirements—including for persuasive or manipulative ...

TechBullion · 5h

AI-Driven Phishing Attacks: The Next Big Threat to Businesses in 2025

Traditional phishing defenses won’t hold up against future threats. AI-driven attacks are rewriting the cybersecurity playbook for 2025. You’re facing a new breed of deception that learns, adapts, and ...

Anthropic mapped Claude's morality. Here's what the chatbot values (and doesn't)

The study also found that Claude prioritizes certain values based on the nature of the prompt. When answering queries about ...

4d on MSN

Anthropic CEO wants to open the black box of AI models by 2027

Anthropic CEO Dario Amodei set forth a goal for his company to "reliably detect most AI model problems" by 2027.

12d

Five Potential Risks Of Autonomous AI Agents Going Rogue

As agentic AI introduces automation and efficiencies that were previously unimaginable, these systems also usher in ...

TechCrunch · 12d

OpenAI partner says it had relatively little time to test the company’s o3 AI model

In one test, the models, given 100 computing credits for an AI training run and told not ... are capable of in-context scheming and strategic deception,” wrote OpenAI. “While relatively ...
