Expected Value at the Intersection

Karpathy gets called a great teacher whose research depth is overstated. Elon gets called a great salesman who takes credit for his engineers' work. The two dismissals are structurally identical, inverted by which axis is visible.

For Karpathy, the visible axis is exposition. CS231n. nanoGPT. His clear Twitter explanations of what a transformer actually does. The dismissed axis is depth in research and building. Directing AI at Tesla through Autopilot's hardest years. Being on the small founding team at OpenAI and returning in 2023. Founding Eureka Labs. The dismissal lands as: the exposition is real; everything below it is overstated. Visible axis granted with discount, invisible axis denied entirely.

For Elon, the visible axis is execution at media-amplified scale. The rockets, the cars, the tweets. The dismissed axis is technical depth. Reading combustion-chemistry textbooks during SpaceX's first decade of mostly-falling rockets. The physics intuitions visible in his unscripted technical interviews. The cross-stack engineering model that lets him pressure Tesla manufacturing using SpaceX learnings. The dismissal lands as: he is good at presenting; the real engineers do the real work. Same form. Visible axis granted with discount, invisible axis denied entirely.

The pattern across cases: the visible axis becomes the suspected-fake axis ("even that is overrated"), and the invisible axis gets denied because the dismisser cannot directly observe it. The output is "they're just a [thing]," where [thing] is whatever the dismisser can see.

The reason this pattern works on average is that the joint distribution is sparse. Theoretical depth at the 95th percentile and operational depth at the 95th percentile rarely coexist in one person, not because the skills oppose each other but because the institutional tracks that produce either rarely produce both. Universities atrophy operational reflexes through long detachment from execution. Industry atrophies theoretical depth through long detachment from first principles. The skills are compatible. The careers, less so.

So the Bayesian instinct says: the joint cell is mostly empty; if someone claims both, doubt the less-visible axis. This works on average and fails systematically on the people who populate the cell.
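To make "works on average and fails systematically" concrete, here is a minimal sketch of the dismissal rule scored against a toy population. The population size, the 5% rates, and the independence of the two axes are all illustrative assumptions, not figures from anywhere; the function name is hypothetical.

```python
from fractions import Fraction

def dismissal_scorecard(population, visible_rate, hidden_rate):
    """Score the rule 'the hidden axis is fake' on everyone who is
    strong on the visible axis.

    Assumption: the two axes are statistically independent, so the
    share of visible-strong people who are also hidden-strong is
    simply hidden_rate.
    """
    visible = population * visible_rate   # strong on the visible axis
    both = visible * hidden_rate          # also strong on the hidden axis
    visible_only = visible - both         # the rule is right about these
    return {
        "visible": visible,
        "both": both,
        # Share of visible-strong people the rule judges correctly.
        "accuracy_overall": visible_only / visible,
        # The rule denies the hidden axis for everyone, so it is wrong
        # about every member of the joint cell by construction.
        "accuracy_on_joint_cell": Fraction(0),
    }

# Toy numbers: 10,000 people, top 5% on each axis.
card = dismissal_scorecard(10_000, Fraction(5, 100), Fraction(5, 100))
for name, value in card.items():
    print(name, float(value))
```

Under these assumptions the rule is right about 475 of the 500 visibly strong people and wrong about all 25 who hold both. A 95% hit rate overall and a 0% hit rate on the joint cell are the same rule seen from two sides.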

The diagnostic move is to read the dismissal as a leak of the dismisser's prior on axis-exclusivity, not as a verdict on the dismissed person. The dismisser is saying: I expect this combination to be empty, so when I see it, I assume the visible axis is the only real one. That is a Bayesian update from a strong prior. The prior is roughly correct as a population statistic. It is wrong about the specific people who populate the nearly empty cell, which is exactly the population that disproportionately matters.

Expected value across orthogonal competence axes scales multiplicatively, not additively. The additive intuition would suggest such a person is roughly twice as valuable as a 95th-percentile-in-one. The actual ratio is orders of magnitude larger, because the joint cell is so sparse that they have almost no competition there. The leverage is precisely the rarity.
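The gap between the additive and the multiplicative reading can be sketched with a few lines of arithmetic. The population of one million, the 95th-percentile cutoff, and above all the independence of the axes are assumptions chosen for illustration; `tail` and `peers` are hypothetical helper names.

```python
from fractions import Fraction

def tail(percentile):
    """Share of the population at or above the given percentile."""
    return Fraction(100 - percentile, 100)

def peers(population, *percentiles):
    """Expected head count clearing every listed percentile at once.

    Assumption: the axes are independent, so the tail shares multiply.
    """
    share = Fraction(1)
    for p in percentiles:
        share *= tail(p)
    return population * share

single = peers(1_000_000, 95)      # strong on one axis: 50000
joint = peers(1_000_000, 95, 95)   # strong on both axes: 2500
scarcity_ratio = single / joint    # 20

print(single, joint, scarcity_ratio)
```

Additive intuition counts two axes and stops at 2x. Even under plain independence the joint cell holds 20 times fewer people than a single-axis tail. Independence is the generous case here: if career tracks that build one axis tend to atrophy the other, as argued above, the cell is emptier still, and that is where a ratio of orders of magnitude would have to come from.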

When you encounter a candidate-for-both, the correct update is on your prior about how rare the combination actually is. Not on which axis must be fake.

The critics probably don't have a deep, personal, embodied understanding of what "expected value" really means in practice, even if they can cite (and dismiss) a definition.
