26 September 2006

Publication bias, really?!?

Amy Perfors

I'm a little late to the game with this, but it's interesting enough that I'll post anyway. Several folks have commented on (and linked to) this paper by Gerber and Malhotra about publication bias in political science. G&M looked at how many articles were published with significant (p<0.05) vs. non-significant results and found -- not surprisingly -- that there were more papers with significant results than would be predicted by chance and, secondly, that many of the significant results were suspiciously close to 0.05.
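One way to make "suspiciously close to 0.05" concrete is to count test statistics that fall just above versus just below the critical value and ask whether the split is lopsided. The sketch below is an illustration of that idea only, not a reproduction of the paper's analysis: the function name `caliper_test`, the band width of 0.2, and the example z-statistics are all assumptions made up for this example.

```python
from math import comb

def caliper_test(z_scores, critical=1.96, width=0.2):
    """Count |z| just above vs. just below `critical` and test for imbalance."""
    over = sum(1 for z in z_scores if critical <= abs(z) < critical + width)
    under = sum(1 for z in z_scores if critical - width <= abs(z) < critical)
    n = over + under
    # One-sided binomial tail: P(X >= over) if landing on either side of the
    # threshold were equally likely within the narrow band.
    p_value = sum(comb(n, k) for k in range(over, n + 1)) / 2 ** n
    return over, under, p_value

# Made-up z-statistics for illustration only (not data from the paper).
example_z = [1.97, 2.01, 1.99, 2.10, 2.05, 1.98, 2.12, 2.03,
             1.90, 1.85, 3.40, 0.70]
over, under, p = caliper_test(example_z)
print(over, under, round(p, 4))
```

Statistics far from the threshold (3.40 and 0.70 here) are ignored; only the narrow band around the critical value enters the comparison, and a small tail probability suggests the pile-up just above the threshold is unlikely to be chance.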

I guess this is indeed "publication bias" in the sense of "there is something causing articles with different statistical significance to be published differentially." But I just can't see this as something to be worried about. Why?

Well, first of all, there's plenty of good reason to be wary of publishing null results. I can't speak for political science, but in psychology, a result can be non-significant for many, many more boring reasons than that there is genuinely no effect. (And I can't imagine why this would be different in poli sci.) For instance, suppose you want to prove that there is no relation between 12-month-olds' abilities in task A and task B. It's not sufficient to show a null result. Maybe your sample size wasn't large enough. Maybe you're not actually succeeding in measuring their abilities in either or both of the tasks (this is notoriously difficult with babies, but it's no picnic with adults either). Maybe A and B are related, but the relation is mediated by some other factor that you happen to have controlled for. Et cetera. Now, this is not to say that no null results are meaningful or that null results should never be published, but a researcher -- quite rightly -- needs to do a lot more work to make a null result pass the smell test. And so it's a good thing, not a bad thing, that there are fewer null results published.
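The "sample size wasn't large enough" point can be put in numbers with a quick power calculation. This is a minimal sketch assuming a two-sided one-sample z-test with known unit variance and a true standardized effect of 0.5; the function name `z_test_power`, the effect size, and the sample sizes are illustrative assumptions, not figures from any study discussed here.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Probability that a two-sided z-test rejects when the true
    standardized effect is `effect_size` and the sample size is `n`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n)  # mean of the z-statistic under the alternative
    # Reject if z > z_crit or z < -z_crit.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

# Hypothetical effect size of 0.5 at a few sample sizes.
for n in (12, 24, 48, 96):
    print(n, round(z_test_power(0.5, n), 2))
```

Under these assumptions a study with N=12 detects a real, moderate effect well under half the time, which is why a null result from a small sample says little about whether the effect exists.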

Secondly, I'm not even worried about the large number of studies that fall just past the significance threshold. Maybe I'm young and naive, but I think it's probably less an indication of fudging data than a reflection of (quite reasonable) resource allocation. Take those same 12-month-old babies. If I get significant results with N=12, then I'm not going to run more babies in order to get more significant results. Since, rightly or wrongly, the gold standard is the p<0.05 value (which is another debate entirely), it makes little sense to waste time and other resources running superfluous subjects. Similarly, if I've run, say, 16 babies and my result is just short of p<0.05, I'm not going to stop; I'll run 4 more. Obviously there is an upper limit on the number of subjects, but -- given the essential arbitrariness of the 0.05 value -- I can't see this as a bad thing either.
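To get a feel for what the "run 16, and if it's close, run 4 more" rule does to the false-positive rate, here is a small simulation under the null hypothesis of no effect. It is a sketch under stated assumptions: a one-sample z-test with known unit variance, and "almost significant" taken to mean 0.05 <= p < 0.10. Neither choice comes from the post, and the function names are hypothetical.

```python
import random
from math import erfc, sqrt

def two_sided_p(sample):
    """Two-sided p-value for H0: mean = 0, assuming known variance 1."""
    z = sum(sample) / sqrt(len(sample))
    return erfc(abs(z) / sqrt(2))

def false_positive_rate(n_sims=40000, n_first=16, n_extra=4,
                        almost=0.10, alpha=0.05, seed=0):
    """Share of null experiments declared significant when subjects are
    added only if the first look is 'almost' significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_first)]
        p = two_sided_p(sample)
        if alpha <= p < almost:
            # Close but not significant: run the extra subjects and retest.
            sample += [rng.gauss(0.0, 1.0) for _ in range(n_extra)]
            p = two_sided_p(sample)
        hits += p < alpha
    return hits / n_sims

print("with the extra look:", false_positive_rate())
print("fixed N=16 only:    ", false_positive_rate(almost=0.05))
```

Setting `almost=0.05` switches the second look off, so the two printed rates can be compared directly. With a single extra look the rate rises above the nominal 5%, but only modestly; the inflation grows if the "almost" window is widened or the peeking is repeated.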


Posted by Amy Perfors at September 26, 2006 11:00 PM

Comments

I am in fact deeply troubled (though not much surprised) by the Gerber and Malhotra report. I grant that your benign explanations all could be true. But I just don't believe that they account for much. Short of flat-out lying about their z-scores ("hey, what's 1.94 against 1.98?"), I rather doubt that many political scientists - or any social scientists - are going out to collect more data; it's just not the thing we can do most of the time. Much more likely, they are practicing self-deception via ill-applied regression diagnostics: hunting down outliers that influence the results whether or not these cases actually belong in the analysis; eliminating small subsets of cases that make a difference because no reviewer will know; omitting one or two control variables; trimming a year from a long panel; messing with missingness; using some seemingly nifty but deep down inappropriate way of "adjusting" standard errors for imagined offenses; etc.

Posted by: Felix Elwert at September 27, 2006 12:51 PM

In general, I think that the publication bias problem is far more significant (if you'll pardon the expression) in largely observational sciences than it is in the experimental context that you are describing. Leaving aside the question of the appropriateness of hypothesis testing in general (something that I suspect we will hear more about on this blog in the future), in the experimental sciences there is the possibility of true replication (as in generating new data, not just successfully reanalyzing existing data) to weed out the cases where the experimenter got lucky. We usually don't have the same good fortune in observational sciences - we only get to run the world once, so it becomes difficult to sort out the "lucky" results from the rest.

Posted by: Mike Kellermann at September 27, 2006 3:03 PM

Hmm, I can definitely see what you mean: if one doesn't (by and large) run experiments, then my possible explanation for why there are so many studies just over significance doesn't apply. The null result explanation, though, still seems valid to me (though it might occur in the context of other, more undesirable, reasons for having fewer null results).

I guess the authors argue that even in the case where you've run an experiment and then added a few more subjects in order to "push it over" to p<0.05, this is unfortunate because -- in combination with the "file drawer" problem -- it makes doing meta-analyses problematic (you aren't truly "randomly sampling" from results space). That's true, but I think meta-analyses are often problematic for many other reasons also. :)

Posted by: Amy at September 27, 2006 4:55 PM

On reported statistical tests being too close to significance due to rounding, see the paper by Diaconis and Freedman, "On Rounding Percentages," JASA, vol. 74, 359-364.

Posted by: Bruce McCullough at October 4, 2006 10:58 AM