Value Alignment Evaluation

Guidelines for Word Alignment Evaluation and Manual Alignment

The purpose of this paper is to provide guidelines for building a word alignment evaluation scheme. The notion of word alignment quality depends on the application: here we review standard scoring ...

MediaNama

OpenAI, Anthropic Reveal Findings from AI Models Safety Tests for Misuse, Sycophancy

On August 27, 2025, Anthropic and OpenAI jointly released findings from their pilot alignment evaluation exercise, marking a significant collaboration between the two AI research organisations. In ...
