19 October 2006

Simpson’s Paradox

Jim Greiner

As a lawyer, I have to be interested not just in what quantitative principles are true, but also in how to present “truth” to people without quantitative training. To that end, HELP! One of the maddening things about statistics is Simpson’s Paradox. The quantitative concept, undoubtedly well-known to most readers of this blog, is that the correlation between two variables can change sign and magnitude, depending on what is conditioned on. That is, Corr(A, B | C) might be positive, while Corr(A, B | C, D) might be negative, while Corr(A, B | C, D, E) might be positive again. At bottom, this is what’s going on when regression coefficients become (or cease to be) significant as one adds additional variables to the right-hand side. Because regression currently enjoys a stranglehold on expert witness analyses in court cases (I’ll be ranting on that in the future), communicating Simpson’s Paradox is a matter of real concern for someone like me who cares about what juries see, hear, and think. Any ideas on how to get this concept across?
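
To make the sign change concrete, here is a minimal sketch in Python. The six data points and the two-level conditioning variable D are made up purely for illustration; nothing here comes from a real case. It computes Corr(A, B) on the pooled data and then separately within each level of D.

```python
from math import sqrt

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical rows of (A, B, D); D is the variable we may condition on.
rows = [
    (1, 3, 0), (2, 2, 0), (3, 1, 0),
    (6, 8, 1), (7, 7, 1), (8, 6, 1),
]

def corr_given(rows, d=None):
    """Corr(A, B) over all rows, or only over rows with D == d."""
    kept = [(a, b) for a, b, dd in rows if d is None or dd == d]
    a_vals, b_vals = zip(*kept)
    return pearson(a_vals, b_vals)

pooled = corr_given(rows)        # ignores D
within0 = corr_given(rows, d=0)  # conditions on D = 0
within1 = corr_given(rows, d=1)  # conditions on D = 1

print(f"pooled:  {pooled:+.2f}")
print(f"D == 0:  {within0:+.2f}")
print(f"D == 1:  {within1:+.2f}")
```

In this toy data set the pooled correlation is positive, yet within each level of D the correlation is negative: the D = 1 group sits higher on both A and B, and that between-group gap swamps the downward trend inside each group.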

Posted by James Greiner at October 19, 2006 11:13 AM

Comments

James,

In class, I use the Jeter vs. Justice batting average example from _A Mathematician at the Ballpark_ by Ken Ross (2004), pp. 12-14.

It might not go over so well with Red Sox fans.
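
For anyone who wants to check the arithmetic, here is a rough sketch in Python. The hit and at-bat counts are the ones commonly quoted for the 1995 and 1996 seasons; I have not verified them against the pages cited above, so treat the specific numbers as an assumption.

```python
from fractions import Fraction

# (hits, at-bats) by season; commonly quoted figures, assumed for illustration.
records = {
    "Jeter":   {1995: (12, 48),   1996: (183, 582)},
    "Justice": {1995: (104, 411), 1996: (45, 140)},
}

def average(player, seasons):
    """Batting average over the given seasons: total hits / total at-bats."""
    hits = sum(records[player][s][0] for s in seasons)
    at_bats = sum(records[player][s][1] for s in seasons)
    return Fraction(hits, at_bats)

for seasons in ([1995], [1996], [1995, 1996]):
    jeter = average("Jeter", seasons)
    justice = average("Justice", seasons)
    leader = "Jeter" if jeter > justice else "Justice"
    print(seasons, f"Jeter {float(jeter):.3f}",
          f"Justice {float(justice):.3f}", "->", leader)
```

With these counts Justice has the higher average in each season taken separately, but Jeter has the higher average once the two seasons are combined, because most of Jeter's at-bats fall in the season where both players hit better.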

Another good one is the Berkeley graduate school admissions example from Freedman et al.'s _Statistics_, pp. 17-20, 3rd edition.

For the latter, it's easier to explain the lack of controls driving the results.

Eric

Posted by: Eric at October 19, 2006 11:46 AM

Of course, Simpson's Paradox is an example of Stigler's Law of Eponymy ("no scientific discovery is ever named after its original discoverer"), having been identified as early as 1899 (and possibly earlier). This website has lots of fun facts about the earliest use of various terms in mathematics and statistics.

Posted by: Mike Kellermann at October 19, 2006 1:54 PM

For categorical data, the circumstances under which collapsing won't reverse the direction are given in Agresti (Categorical Data Analysis), in sections 5.3.2 and 5.5.4.2. I hope that will be of some help.

Posted by: Barry at October 19, 2006 3:05 PM

James,

Is that the correct definition of Simpson's Paradox? The site mentioned by Mike claims that:

SIMPSON'S PARADOX is the name given to a result in conditional probability by C. R. Blyth: "The paradox is the possibility of P{A | B}

At the very least, I think that it is possible to have regression coefficients change with the addition of more variables (the situation which worries you) even if the data is not an example of Simpson's Paradox as defined above.
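
A small sketch of what I mean, in Python, with made-up numbers and reading Simpson's Paradox as requiring an actual reversal of direction. It compares the slope on A from regressing B on A alone with the slope on A once a group dummy D is added. For a dummy regressor, the second slope equals the slope after subtracting group means from both A and B (the Frisch-Waugh-Lovell result), which is what the code uses instead of a full regression routine.

```python
def slope(xs, ys):
    """OLS slope from regressing ys on xs (with an intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def demean_by_group(values, groups):
    """Subtract each observation's group mean from its value."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]

# Hypothetical data: A and B rise together inside each group, and the
# D = 1 group is shifted upward on both variables.
a = [1, 2, 3, 4, 5, 6]
b = [1, 2, 3, 7, 8, 9]
d = [0, 0, 0, 1, 1, 1]

slope_without_d = slope(a, b)  # B ~ A
slope_with_d = slope(demean_by_group(a, d),
                     demean_by_group(b, d))  # coefficient on A in B ~ A + D

print(f"B ~ A:      {slope_without_d:.3f}")
print(f"B ~ A + D:  {slope_with_d:.3f}")
```

Here the coefficient on A shrinks noticeably when D is added but stays positive, so the estimate changes without any reversal of sign.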

Or perhaps I am missing something? Simpson's Paradox has always confused me . . .

Posted by: David Kane at October 20, 2006 8:48 PM

I would definitely use a concrete example to explain it - as simple an example as you can find. Nancy Cartwright has some discussion of the Berkeley grad school case in her famous paper "Causal Laws and Effective Strategies" which you might find useful. I think it's pretty understandable.

Whichever way you think will be easiest to explain it, why not try explaining it to a child that way and see if they get it?

Posted by: E. at October 23, 2006 12:36 AM

As far as an example of Simpson's paradox goes, I took the example (with explanation) from my math book and put it up on a webpage. See http://people.hmdc.harvard.edu/~jgill/simpson/node1.html

Posted by: jgill at October 26, 2006 3:47 PM
