October 2005
Sun Mon Tue Wed Thu Fri Sat
            1
2 3 4 5 6 7 8
9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29
30 31          

Authors' Committee

Chair:

Matt Blackwell (Gov)

Members:

Martin Andersen (HealthPol)
Kevin Bartz (Stats)
Deirdre Bloome (Social Policy)
Andy Eggers (Gov)
John Graves (HealthPol)
Rich Nielsen (Gov)
Maya Sen (Gov)
Gary King (Gov)

Weekly Research Workshop Sponsors

Alberto Abadie, Lee Fleming, Adam Glynn, Guido Imbens, Gary King, Arthur Spirling, Jamie Robins, Don Rubin, Chris Winship

Recent Comments

Recent Entries

Categories

Blogroll

Brad DeLong
Cognitive Daily
Complexity & Social Networks
Developing Intelligence
EconLog
The Education Wonks
Empirical Legal Studies
Free Exchange
Freakonomics
Health Care Economist
Junk Charts
Language Log
Law & Econ Prof Blog
Machine Learning (Theory)
Marginal Revolution
Mixing Memory
Mystery Pollster
New Economist
Political Arithmetik
Political Science Methods
Pure Pedantry
Science & Law Blog
Simon Jackman
Social Science++
Statistical modeling, causal inference, and social science

Archives

Notification

Powered by
Movable Type 4.24-en


« October 26, 2005 | Main | October 28, 2005 »

27 October 2005

Don't Use Hypothesis Tests for Balance

Gary King

Jens' last two blog posts constitute an excellent statement of where the literature on matching is, but I think almost all of the literature has this point wrong. Hypothesis tests for checking balance in matching are in fact (1) unhelpful at best and (2) usually harmful.

Suppose you had a control group and a treatment group that are identical (exactly matched) except for one person, or except for a bunch of people in one very minor way. Suppose hypothesis tests indicate no difference between the groups, and so you'd be in the situation of reporting balance was great and no further adjustment was needed. (We might think of this as a real experiment where the outcome variable hasn't been collected but is expensive to do so.) If you were given the chance of dropping the one or few people that caused the two groups to differ and replacing them with others that exactly matched, would you do so? Since the dimension on which the inexact match or matches occurred might be the one that has a huge effect on your outcome variable, the bias due to not switching could be huge. So you'd undoubtedly make the switch, despite the fact that the hypothesis test indicated that there was no problem. Hence (1) the tests are unhelpful: passing the test does not necessarily protect one from bias more than failing the test.

Now suppose you have data that don't match very well by all hypothesis tests and you randomly (rather than systematically to improve matching) drop observations, in a bad application of matching. what will happen? Your t-tests or ks-tests or any other hypothesis tests will lose power and so will indicate that balance is getting better and better. Yet, bias is not changing at all, and efficency is dropping fast. The tests are telling you to discard data! Hence (2) hypothesis tests to evaluate balance are harmful, quite seriously so.

The fact is that there is no superpopulation to which we need to infer features of the explanatory variables; all analysis models we regularly use after matching are conditional on X. Balance should be assessed on the observed data, and not be the subject of inference or hypothesis tests.

This message rehearses an argument in a to-be-revised version of our matching paper by Ho, Imai, King, and Stuart that we hope to be finished with and post in a couple of weeks.

Posted by Gary King at 4:40 AM