OpenAI found features in AI models that correspond to different ‘personas’

By looking at an AI model's internal representations — the numbers that dictate how an AI model responds, which often seem completely incoherent to humans — OpenAI researchers were able to find patterns that lit up when a model misbehaved.

Jun 18, 2025 - 23:10