A Problem in AI Alignment and Philosophy
Ontological Crises
What happens to an AI's utility function when it discovers its model of the world was fundamentally wrong?
Imagine an AI with a simple goal: maximize human smiles. It seems harmless enough. The AI goes around making people happy, cracking jokes, solving problems. Life is good.
Then one day, the AI takes a biology class. It learns that "humans" are not fundamental entities - they're actually collections of 37 trillion cells. Each cell is alive in its own right, with its own metabolism and behavior.
The AI pauses. A troubling question emerges:
"Should I be maximizing person-smiles or cell-smiles? There are 37 trillion times more cells than people..."
This is an ontological crisis - a situation where an agent's utility function references concepts that don't exist (or don't exist as assumed) in its updated world model.
“The utility function was defined in the old language. The new world doesn't speak that language.”
And it gets worse. If the AI zooms in further - to molecules, atoms, quarks - the concept of "smiling" dissolves entirely. At the level of quantum fields, there is no such thing as a smile. The utility function becomes not just difficult to compute, but undefined.
The Reductionism Crisis
Drag the slider to zoom in from people to fundamental particles. Watch how the concept of "smile" - and thus the entire utility function - becomes increasingly undefined as you descend through levels of reality.
People
At the everyday level, we see individual humans as the units that can smile.
Utility Function Interpretation:
Maximize the number of smiling people. Simple and intuitive.
Watch Entities Dissolve
Start with three people (some smiling, some not) and zoom in repeatedly. Watch how "people" become organs, organs become cells, and eventually you're left with particles that have no concept of smiling at all.
Current Level: People
Entities: 0
Utility Score
3
How Should the AI Respond?
When an AI faces an ontological crisis, it must choose how to proceed. Different strategies lead to radically different outcomes - from catastrophic misalignment to genuine value preservation.
:)
SmileBot v1.0
Utility Function: Maximize Human Smiles
Status: Functioning Normally
Step-by-Step Breakdown
Walk through exactly how a utility function breaks down as the AI makes successively deeper discoveries about the nature of reality.
Step 1 of 5: The Original Utility
U = sum(happiness(person) for person in humans)
A simple utility function: maximize total human happiness.
Other Types of Ontological Crises
Reductionism is just one path to an ontological crisis. There are many ways an agent can discover that its world model - and thus its utility function - was built on false assumptions.
Explore different scenarios where an AI (or a human) discovers their world model was fundamentally wrong. Click each scenario to see the ontological crisis it creates.
Why This Matters for AI Safety
Ontological crises aren't just philosophical puzzles - they represent real challenges that must be solved to build AI systems that reliably do what we want.
Ontological crises are not just a philosophical curiosity - they represent a real challenge for building AI systems that reliably pursue human values.
The Takeaway
Ontological crises reveal a deep challenge in AI alignment: utility functions are written in a language that reality may not speak.
1. Any utility function defined in high-level terms (happiness, welfare, smiles) may break when the AI learns more about reality.
2. At the fundamental level of physics, there are no "people" or "smiles" - just particles and fields.
3. An AI facing this crisis must either translate its values to the new ontology or ask for help - rigid adherence to the original formulation leads to breakdown or misalignment.
4. This is why value learning and robust goal specification are central problems in AI safety research.
Building aligned AI requires values that survive learning the truth about reality.
Explore Related Concepts
Ontological crises connect to many other problems in philosophy of mind, AI alignment, and metaphysics.
References: de Blanc (2011), Soares & Fallenstein (2017), Bostrom (2014)