Evan Hubinger On Inner Alignment - Detailed Overview & Context

Host Jeremie Harris and the latest guest on the podcast

AI systems are increasingly embedded in our workplaces and our homes. They judge our skills, our values, and sometimes our ...

Part 1 of a series of talks in which researcher

“We purposely build or discover situations where models might be behaving in misaligned ways.”

Part 3 of a series of talks from researcher

If an AI system learned a deceptive strategy, could we detect it and remove it using current state-of-the-art safety training ...

Most of us have encountered situations where someone appears to share our views or values, but is in fact only pretending to do ...

The 'model organisms of misalignment' line of research creates AI models that exhibit various types of misalignment, and studies ...

Evan Hubinger at BASIS - Alignment Faking in Large Language Models

Part 2 of a series of talks from researcher

Evan Hubinger - The Inner Alignment Problem
Evan Hubinger on Inner Alignment, Outer Alignment, and Proposals for Building Safe Advanced AI
15 When Alignment Resembles Coercion: An open letter to Evan Hubinger
1:AGI Safety: Evan Hubinger 2023
Evan Hubinger – Alignment Stress-Testing at Anthropic [Alignment Workshop]
Evan Hubinger: Auditing Language Models for Hidden Objectives
3:How Likely is Deceptive Alignment?: Evan Hubinger 2023
Evan Hubinger (Anthropic)—Deception, Sleeper Agents, Responsible Scaling
EA Global Bay Area: 2024 | Sleeper Agents | Evan Hubinger
Evan Hubinger – Deceptive Instrumental Alignment
4 - Risks from Learned Optimization with Evan Hubinger
Alignment faking in large language models