A Decision Theory Paradox
Counterfactual Mugging
A perfect predictor says: “I flipped heads. But if tails, I'd have given you $10,000 if I predicted you'd pay me $100 now. Pay up?”
A perfect predictor approaches you with a proposition:
“I flipped a fair coin. It landed heads. Now I'm asking you to give me $100. You receive nothing in return.
However, if the coin had landed tails, I would have given you $10,000 if and only if I had predicted that you would pay me in the heads scenario.
I made my prediction before the coin flip.”

What do you do?
The coin already landed heads. Paying $100 gets you nothing.
But here's the twist: the predictor's prediction was made before the flip. If you're the type of agent who would pay, the predictor knew that. And in the counterfactual world where the coin landed tails, you would have received $10,000.
Should you pay for a counterfactual?
Play the Game
Experience the counterfactual mugging yourself. The predictor will learn your decision algorithm over multiple rounds. Notice how your pattern of choices affects the outcomes.
A perfect predictor presents you with a strange scenario. Your decision will reveal what decision algorithm you run.
Play multiple rounds. The predictor adapts to your decision pattern.
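If you would rather see the mechanics in code than in the widget, here is a minimal sketch of such a game loop. The function name, the 50% prediction threshold, and the frequency-based learning rule are illustrative assumptions, not the page's actual implementation:

```python
import random

# Sketch of the game: a predictor estimates how likely you are to pay from your
# past choices, then the payouts follow the counterfactual mugging's rules.
def play(rounds: int = 10) -> None:
    history: list[bool] = []   # True = you paid when asked
    balance = 0
    for _ in range(rounds):
        # The predictor's guess, made "before the flip", based on your record so far.
        predicted_pay = bool(history) and sum(history) / len(history) > 0.5
        if random.random() < 0.5:
            # Heads: you are asked for $100 and get nothing in return.
            paid = input("Heads. Pay $100? [y/n] ").strip().lower() == "y"
            history.append(paid)
            if paid:
                balance -= 100
        else:
            # Tails: you receive $10,000 iff the predictor expected you to pay.
            if predicted_pay:
                balance += 10_000
        print(f"Balance: ${balance}")

play()
```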
Why You Shouldn't Pay
Causal Decision Theory says: Don't pay. The reasoning is simple:
1. **The coin already landed.** It's heads. The counterfactual (tails) didn't happen and won't happen.
2. **The prediction is already made.** Nothing you do now can change what the predictor predicted yesterday.
3. **You're paying for nothing.** The $100 leaves your pocket and you get $0 in return. That's strictly worse than refusing.
4. **Dominance.** Refusing is better (or equal) in every state of the world (see the sketch below).
CDT adherents argue: “You can't cause the past. The prediction is fixed. Giving away money for no causal benefit is just being a sucker.”
“I'm not going to be mugged by a counterfactual. The coin landed heads. End of story.”
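Here is a minimal sketch of that dominance argument as CDT frames it: the prediction is a fixed fact about the past and the coin already shows heads, so refusing is at least as good in every state the agent still considers possible. The state labels and payoff function are illustrative assumptions:

```python
# From CDT's standpoint the only remaining cash flow is the $100 you hand over,
# whatever the predictor happened to predict yesterday.
def payoff(action: str, past_prediction: str) -> int:
    return -100 if action == "pay" else 0

for past_prediction in ("predicted_pay", "predicted_refuse"):
    assert payoff("refuse", past_prediction) >= payoff("pay", past_prediction)
print("Refusing weakly dominates paying in every state CDT considers possible.")
```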
Why You Should Pay
Functional Decision Theory says: Pay. The reasoning is subtle but powerful:
1. **You are a decision algorithm.** The predictor analyzed your algorithm, not just this moment. Your decision reveals what algorithm you run.
2. **Think from before the coin flip.** Which decision algorithm has the higher expected value: "always pay" or "never pay"?
3. **Calculate the expected value** (checked in the sketch below):
   - "Always pay": 50% × (-$100) + 50% × $10,000 = $4,950
   - "Never pay": 50% × $0 + 50% × $0 = $0

Agents running a "pay" algorithm therefore come out $4,950 ahead in expectation.
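A quick check of that arithmetic, assuming a fair coin and a perfect predictor (the function and argument names are just labels for this sketch):

```python
# Ex-ante expected value of each policy, evaluated from before the coin flip.
def expected_value(always_pay: bool) -> float:
    heads = -100 if always_pay else 0      # heads: you are asked to pay $100
    tails = 10_000 if always_pay else 0    # tails: $10,000 iff predicted to pay
    return 0.5 * heads + 0.5 * tails

print(expected_value(True))   # 4950.0
print(expected_value(False))  # 0.0
```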
FDT adherents argue: “The question isn't 'what should I do now?' It's 'what decision algorithm should I be running?' The predictor can see my algorithm. If I'm the type who pays, I get $10,000 in the counterfactual.”
“I'm not paying for this outcome. I'm paying because agents like me who pay get $10,000 in the other branch.”
Which Agent Wins?
Run a simulation comparing CDT agents (who never pay) against FDT agents (who always pay). Over many rounds, the expected value difference becomes clear.
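If you want to run that comparison yourself, here is a rough sketch under the same assumptions (fair coin, perfect predictor, agents that follow their theory's recommendation every round); the function and agent labels are illustrative:

```python
import random

def average_winnings(agent_pays: bool, rounds: int = 100_000, seed: int = 0) -> float:
    rng = random.Random(seed)
    total = 0
    for _ in range(rounds):
        if rng.random() < 0.5:
            total += -100 if agent_pays else 0     # heads: the mugging scenario
        else:
            total += 10_000 if agent_pays else 0   # tails: payout iff predicted to pay
    return total / rounds

print("FDT-style agent (always pays):", average_winnings(True))   # ~4950 per round
print("CDT-style agent (never pays): ", average_winnings(False))  # 0 per round
```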
Pre-commitment vs Actual Choice
Here's a key insight: Would you pre-commit to paying before you knew the coin result? Most people say yes. But when the coin actually lands heads, they want to refuse.
This inconsistency is the heart of the paradox. If you would pre-commit to paying, but then refuse when the moment comes, what does that say about your decision algorithm?
First, before the scenario begins, ask yourself: what would you pre-commit to doing?
The predictor doesn't just predict your action. It predicts your algorithm.
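One way to picture that distinction is a predictor that evaluates your whole policy (represented here as a function) rather than a single observed action. Everything named below is a hypothetical illustration, not a prescribed implementation:

```python
from typing import Callable

# The predictor doesn't watch one choice; it runs your decision procedure on the
# hypothetical heads scenario and bases the tails payout on the result.
Policy = Callable[[str, bool], bool]

def predictor_expects_payment(policy: Policy) -> bool:
    return policy("heads", True)   # would this algorithm pay if asked?

def always_pay(coin: str, asked_to_pay: bool) -> bool:
    return asked_to_pay

def never_pay(coin: str, asked_to_pay: bool) -> bool:
    return False

print(predictor_expects_payment(always_pay))  # True  -> $10,000 in the tails branch
print(predictor_expects_payment(never_pay))   # False -> $0 in the tails branch
```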
Decision Theories Compared
The Counterfactual Mugging is a key thought experiment that separates different decision theories.
| Theory | Says | Ex-ante EV |
|---|---|---|
| Causal Decision Theory (CDT) | Refuse to pay | $0 |
| Evidential Decision Theory (EDT) | Refuse to pay (once you know the coin landed heads, paying is no evidence of the $10,000) | $0 |
| Functional Decision Theory (FDT) | Pay | $4,950 |
| Updateless Decision Theory (UDT) | Pay | $4,950 |
Why This Matters for AI
The Counterfactual Mugging isn't just a philosophical puzzle. It has direct implications for how we build and align artificial intelligence.
AI Decision Procedures
What decision procedure should an AI run? If we build AIs that use CDT, they'll refuse cooperation in situations where cooperation is beneficial. FDT-like reasoning may be necessary for AI alignment.
Predictable AI
As AI systems become more transparent and predictable, Newcomb-like situations become more common. An AI that can be perfectly predicted faces counterfactual muggings constantly.
Acausal Trade
Advanced AI systems might engage in "acausal trade" - coordinating with other decision-makers across counterfactual branches. This requires reasoning beyond simple causality.
Corrigibility
Should an AI commit to being corrigible even when it can predict that humans will make bad decisions? This is structurally similar to the counterfactual mugging.
The Deep Question
The counterfactual mugging forces us to ask: What is the right way to make decisions when your decision procedure itself is part of the environment?
In a world of predictors, simulators, and copies, your decision algorithm doesn't just determine your actions - it determines your outcomes across all branches where your algorithm runs.
What decision algorithm do you want to be running?
Explore More Decision Theory
The Counterfactual Mugging is closely related to Newcomb's Paradox. Explore both to fully understand the landscape of decision theory.
References: Yudkowsky (2010); Soares & Fallenstein (2017)