19 October 2006
Jim Greiner
As a lawyer, I have to be interested not just in what quantitative principles are true, but also in how to present “truth” to people without quantitative training. To that end, HELP! One of the maddening things about statistics is Simpson’s paradox. The quantitative concept, undoubtedly well-known to most readers of this blog, is that the correlation between two variables can change sign and magnitude, depending on what is conditioned on. That is, Corr(A, B | C) might be positive, while Corr(A, B | C, D) might be negative, while Corr(A, B | C, D, E) might be positive again. At bottom, this is what’s going on when regression coefficients become (or cease to be) significant as one adds additional variables to the right-hand side. Because regression currently enjoys a stranglehold on expert witness analyses in court cases (I’ll be ranting on that in the future), communicating Simpson’s paradox is a matter of real concern for someone like me who cares about what juries see, hear, and think. Any ideas on how to get this concept across?
Posted by James Greiner at October 19, 2006 11:13 AM
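The sign flip described in the post can be sketched with a few made-up numbers. This is only an illustration under stated assumptions: the data, the group labels, and the helper name `pearson` are all hypothetical and do not come from any real case. Within each group the two variables move in opposite directions, yet pooling the groups produces a positive correlation.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: (A values, B values) for each level of a third variable D.
groups = {
    "D=0": ([1, 2, 3, 4], [8, 7, 6, 5]),
    "D=1": ([6, 7, 8, 9], [13, 12, 11, 10]),
}

# Correlation of A and B conditional on D (computed separately per group).
within = {name: pearson(a, b) for name, (a, b) in groups.items()}

# Correlation of A and B ignoring D (all observations pooled).
pooled_a = [x for a, _ in groups.values() for x in a]
pooled_b = [y for _, b in groups.values() for y in b]
pooled = pearson(pooled_a, pooled_b)

for name, r in within.items():
    print(f"Corr(A, B | {name}) = {r:+.2f}")
print(f"Corr(A, B) pooled     = {pooled:+.2f}")
```

In this toy data the group with larger A values also has larger B values overall, so the between-group difference swamps the negative within-group relationship once D is ignored.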
James,
In class, I use the Jeter vs. Justice batting average example from _A Mathematician at the Ballpark_ by Ken Ross (2004), pp. 12-14.
It might not go over so well with Red Sox fans.
Another good one is the Berkeley graduate school admissions example from Freedman et al.'s _Statistics_, pp. 17-20, 3rd edition.
For the latter, it's easier to explain the lack of controls driving the results.
Eric
Posted by: Eric at October 19, 2006 11:46 AM
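For readers who want to see the mechanics behind an admissions-style example, here is a minimal sketch. The counts are invented for illustration and are not the Berkeley figures; the department names and the helper `pooled_rate` are likewise hypothetical. It shows how one group can have the higher admission rate in every department and still have the lower rate overall, because the groups apply to the departments in very different proportions.

```python
# Hypothetical counts, NOT the real Berkeley data: (admitted, applicants).
data = {
    "dept_easy": {"men": (80, 100), "women": (18, 20)},
    "dept_hard": {"men": (4, 20), "women": (30, 100)},
}

def rate(admitted, applicants):
    """Admission rate as a fraction between 0 and 1."""
    return admitted / applicants

def pooled_rate(sex):
    """Admission rate for one group with departments collapsed together."""
    admitted = sum(dept[sex][0] for dept in data.values())
    applicants = sum(dept[sex][1] for dept in data.values())
    return rate(admitted, applicants)

# Rates conditional on department.
for name, dept in data.items():
    print(f"{name}: men {rate(*dept['men']):.0%}, women {rate(*dept['women']):.0%}")

# Rates with department ignored.
print(f"overall: men {pooled_rate('men'):.0%}, women {pooled_rate('women'):.0%}")
```

With these made-up numbers, women are admitted at a higher rate than men in both departments, but the pooled rates are 70% for men and 40% for women, since most women apply to the department that rejects most applicants.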
Of course, Simpson's Paradox is an example of Stigler's Law of Eponymy ("no scientific discovery is ever named after its original discoverer"), having been identified as early as 1899 (and possibly earlier). This website has lots of fun facts about the earliest use of various terms in mathematics and statistics.
Posted by: Mike Kellermann at October 19, 2006 1:54 PM
For categorical data, the circumstances under which collapsing won't reverse the direction are given in Agresti (Categorical Data Analysis), in sections 5.3.2 and 5.5.4.2. I hope that will be of some help.
Posted by: Barry at October 19, 2006 3:05 PM
James,
Is that the correct definition of Simpson's Paradox? The site mentioned by Mike claims that:
SIMPSON'S PARADOX is the name given to a result in conditional probability by C. R. Blyth: "The paradox is the possibility of P{A | B} [...]
At the very least, I think that it is possible to have regression coefficients change with the addition of more variables (the situation which worries you) even if the data is not an example of Simpson's Paradox as defined above.
Or perhaps I am missing something? Simpson's Paradox has always confused me . . .
Posted by: David Kane at October 20, 2006 8:48 PM
I would definitely use a concrete example to explain it - as simple an example as you can find. Nancy Cartwright has some discussion of the Berkeley grad school case in her famous paper "Causal Laws and Effective Strategies" which you might find useful. I think it's pretty understandable.
Whichever way you think will be easiest, why don't you try explaining it to a child that way and see if they get it?
Posted by: E. at October 23, 2006 12:36 AM
As far as an example of Simpson's paradox goes, I took the example (with explanation) from my math book and put it up on a webpage. See http://people.hmdc.harvard.edu/~jgill/simpson/node1.html
Posted by: jgill at October 26, 2006 3:47 PM