22 April 2008

Gelman's Paradox (or, The Probabilistic Backwards Reasoning Fallacy)

Andy Gelman posted this forwarded item regarding an apparent fallacy with averages and the misunderstanding of uncertainty. Essentially, it boils down to this reversal:

a) 100 students take a class, and 50 pass.
b) Given that next time, 50 students pass the (identical) class, how many students, on average, were enrolled?

The "fallacy" is in assuming that the expected number of original enrollees is 100, when it must necessarily be greater than 100 because the pass rate is itself only an uncertain estimate. The article points out that it's ignorance of the prior distribution of passing students that's at fault for the "fallacy" - I argue that it's the prior distribution of one student passing a test that's the cause of the paradox.

Break the problem in two:
a) 100 students take a class, and 50 pass.

Assume for the moment that a student passes or fails the class independently of their peers (which is a reasonable assumption for the initial problem, dealing with the failure rate of vehicles). Let's assume the standard noninformative prior case, that "half a student" passes and "half a student" fails (the Jeffreys prior, Beta(0.5,0.5)), and that students are basically identical. Then the posterior distribution of the probability p of passing the test is a Beta(50.5,50.5) distribution.
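As a minimal sketch of the update in part (a), assuming the conjugate Beta-binomial model described above: the posterior parameters are just the prior's half-counts plus the observed passes and failures. The function name jeffreys_posterior is my own label for illustration, not anything from the original article.

```python
def jeffreys_posterior(passed, failed):
    """Return the Beta(a, b) posterior parameters for the pass probability.

    Assumes independent, identical students and a Jeffreys Beta(0.5, 0.5)
    prior, i.e. "half a student" passing and "half a student" failing.
    """
    return passed + 0.5, failed + 0.5


a, b = jeffreys_posterior(passed=50, failed=50)
posterior_mean = a / (a + b)  # mean of a Beta(a, b) distribution

print(a, b, posterior_mean)
```

With 50 passes and 50 failures the posterior is symmetric, so its mean sits at one half; the interesting part is what happens when we invert p, not what happens to p itself.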

b) Given 50 students passed, on average how many enrolled?
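The number of students enrolled in the class for each one who passed is then 1/p - but the mean of 1/p is necessarily greater than 1/(the mean of p), because 1/p is a convex function of p (Jensen's inequality). For a Beta(a,b) distribution with a > 1, the mean of 1/p is (a+b-1)/(a-1); here that is 100/49.5, or about 2.02, against 1/(the mean of p) = 2. So the expected class size must be greater than 100 under these assumptions: roughly 50 x 2.02 = 101 students enrolled.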
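Here is a small check of that arithmetic, offered as a sketch rather than a definitive treatment. It uses the closed form (a+b-1)/(a-1) for the mean of 1/p under a Beta(a,b) distribution (valid only for a > 1), and compares it with a plain Monte Carlo average using the standard library's random.betavariate. The function name, seed, and number of draws are arbitrary choices of mine.

```python
import random


def mean_inverse_p(a, b):
    """E[1/p] for p ~ Beta(a, b); the expectation is finite only when a > 1."""
    if a <= 1:
        raise ValueError("E[1/p] is infinite unless a > 1")
    return (a + b - 1) / (a - 1)


a, b = 50.5, 50.5                 # posterior from part (a)
exact = mean_inverse_p(a, b)      # 100 / 49.5
naive = 1 / (a / (a + b))         # 1 / (mean of p)
expected_enrolled = 50 * exact    # enrollees per pass, times 50 passes

# Monte Carlo check: average 1/p over posterior draws (fixed seed).
rng = random.Random(0)
n_draws = 100_000
monte_carlo = sum(1 / rng.betavariate(a, b) for _ in range(n_draws)) / n_draws

print(round(exact, 4), naive, round(expected_enrolled, 2), round(monte_carlo, 3))
```

The gap between the two quantities is small here because a posterior built from 100 observations is tightly concentrated around one half; it widens quickly as the posterior gets more diffuse.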

The original authors, however, make a profound overestimation of the average number of starting students, choosing a "posterior" distribution that yields a class size of 150. To get an expectation this big with this prior information, we would need a posterior of Beta(2.0,2.0), for which the mean of 1/p is 3 - or, 1.5 students passing and 1.5 failing! Putting this in perspective, the most likely way I can see this happening is that students pooled their talents and produced 3 distinct final papers: one good, one bad, and one just good enough to get the professor to flip a coin.
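To see where Beta(2.0,2.0) comes from, one can solve (2a-1)/(a-1) = r for a symmetric Beta(a,a) posterior, where r is the target number of enrollees per passing student; this gives a = (r-1)/(r-2), which only makes sense for r > 2. The sketch below does that for r = 150/50, then subtracts the Jeffreys half-count to recover the implied observations. The function name is hypothetical, and the symmetric posterior is my assumption to match the 50/50 split in the example.

```python
def symmetric_beta_for_ratio(ratio):
    """Solve (2a - 1) / (a - 1) = ratio for a, so that p ~ Beta(a, a)
    has E[1/p] equal to ratio. Requires ratio > 2."""
    if ratio <= 2:
        raise ValueError("a symmetric Beta always has E[1/p] > 2")
    return (ratio - 1) / (ratio - 2)


ratio = 150 / 50                       # enrollees per passing student
a = symmetric_beta_for_ratio(ratio)    # posterior is Beta(a, a)
implied_passes = a - 0.5               # remove the Jeffreys half-count
implied_failures = a - 0.5

print(a, implied_passes, implied_failures)
```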

It does, however, seem to explain why Harvard classrooms always seem to overflow chaotically at the beginning of each term.

P.S. The original authors call this the "backwards reasoning fallacy", even though Google says the name is better applied to startling schoolchildren deterministically rather than failing them stochastically. Resolving the namespace collision here, does this problem go by another name, or shall we go via Stigler and call it Gelman's paradox?

Posted by Andrew C. Thomas at April 22, 2008 10:33 AM
