12 May 2007

Does Multiple Imputation Make Up Data?

This is a common question, commonly misunderstood. It certainly does seem like MI makes up data, since if you look at the 5 or so imputed data sets, the missing values are indeed filled in. But in fact, the point of MI has nothing to do with making up data, and everything to do merely with putting the data in a more convenient format.

The fact is that the vast majority of our statistical techniques require rectangular data sets, so it is really hard to do anything sensible directly with data that look like Swiss cheese. Listwise deletion, where you excise horizontal slices out of the cheese wherever you see holes, discards a lot of cheese! What MI does instead is fill in the holes using all available information from the rest of the data set (thus moving some information around) and add uncertainty to these imputations in the form of variation in the values across the different imputed data sets (thus taking back any assertion of knowledge where a missing value is not predictable from the rest of the data, and correcting for the duplication of the same information in different places in the data). If done properly, MI merely puts the data in a convenient rectangular format and enables the user (with some simple combining rules) to apply statistical techniques as if the data were fully observed. MI standard errors are then not too small, which they would be if data were being made up.
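To make the "fill in the holes, then take back the overconfidence" idea concrete, here is a minimal sketch in Python. It assumes one continuous variable y with missing values and one fully observed covariate x. The function names (impute_once, rubin_combine, mi_mean) are hypothetical, and the imputation model is a deliberately simple stand-in: a linear regression refit on a bootstrap resample for each imputed data set, which approximates drawing the regression parameters from their posterior. It is not the model used by any particular MI package. The combining step follows the standard rules: the pooled estimate is the average across imputed data sets, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance.

```python
import numpy as np

def impute_once(x, y, rng):
    """Return a copy of y with NaNs filled by a noisy regression prediction.

    Assumption: y is roughly linear in x. The regression is fit to a bootstrap
    resample of the observed rows so that parameter uncertainty carries over
    into the imputations.
    """
    obs = ~np.isnan(y)
    idx = rng.choice(np.flatnonzero(obs), size=obs.sum(), replace=True)
    slope, intercept = np.polyfit(x[idx], y[idx], 1)
    resid_sd = np.std(y[idx] - (intercept + slope * x[idx]), ddof=2)
    filled = y.copy()
    n_mis = (~obs).sum()
    # Prediction plus random noise: the noise is what keeps MI from
    # pretending the missing values are known exactly.
    filled[~obs] = intercept + slope * x[~obs] + rng.normal(0.0, resid_sd, n_mis)
    return filled

def rubin_combine(estimates, variances):
    """Pool m estimates and their squared standard errors across imputations."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    q_bar = estimates.mean()         # pooled point estimate
    within = variances.mean()        # average within-imputation variance
    between = estimates.var(ddof=1)  # variance across imputed data sets
    total = within + (1 + 1 / m) * between
    return q_bar, total

def mi_mean(x, y, m=5, seed=0):
    """Estimate the mean of y from m imputed data sets."""
    rng = np.random.default_rng(seed)
    ests, variances = [], []
    for _ in range(m):
        filled = impute_once(x, y, rng)
        ests.append(filled.mean())
        variances.append(filled.var(ddof=1) / len(filled))
    return rubin_combine(ests, variances)

# Simulated example: about 30% of y is missing completely at random.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2.0 + 1.5 * x + rng.normal(size=200)
y[rng.random(200) < 0.3] = np.nan
estimate, total_var = mi_mean(x, y)
print("MI estimate of mean(y):", estimate, "SE:", total_var ** 0.5)
```

The between-imputation term is the part that "takes back" knowledge: if the imputed values varied little across data sets, it would shrink toward zero, and if the missing values were poorly predicted, it would inflate the standard error accordingly.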

The particular models for imputation can be used incorrectly or inappropriately (and so should be used with priors when additional information is available; see, e.g., "What to do About Missing Values in Time Series Cross-Section Data"), but proper usage of MI makes up no information beyond what is genuinely available.

Posted by Gary King at May 12, 2007 4:01 PM

Comments

Interesting thoughts and ideas.

Posted by: anonymous at May 13, 2007 10:48 PM

Nice article! In my opinion, synthetic data can provide more disclosure protection.

Posted by: bob at May 18, 2007 6:04 AM

You've provided a very clear basic explanation of MI. Swiss cheese is a great metaphor for missing data!

Posted by: Abbie at May 21, 2007 2:18 PM
