September 2008
Sun Mon Tue Wed Thu Fri Sat
  1 2 3 4 5 6
7 8 9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30        

Authors' Committee

Chair:

Matt Blackwell (Gov)

Members:

Martin Andersen (HealthPol)
Kevin Bartz (Stats)
Deirdre Bloome (Social Policy)
Andy Eggers (Gov)
John Graves (HealthPol)
Rich Nielsen (Gov)
Maya Sen (Gov)
Gary King (Gov)

Weekly Research Workshop Sponsors

Alberto Abadie, Lee Fleming, Adam Glynn, Guido Imbens, Gary King, Arthur Spirling, Jamie Robins, Don Rubin, Chris Winship

Recent Comments

Recent Entries

Categories

Blogroll

Brad DeLong
Cognitive Daily
Complexity & Social Networks
Developing Intelligence
EconLog
The Education Wonks
Empirical Legal Studies
Free Exchange
Freakonomics
Health Care Economist
Junk Charts
Language Log
Law & Econ Prof Blog
Machine Learning (Theory)
Marginal Revolution
Mixing Memory
Mystery Pollster
New Economist
Political Arithmetik
Political Science Methods
Pure Pedantry
Science & Law Blog
Simon Jackman
Social Science++
Statistical modeling, causal inference, and social science

Archives

Notification

Powered by
Movable Type 4.24-en


« September 18, 2008 | Main | September 24, 2008 »

23 September 2008

A Handy Trick for Multiple Imputation of Categorical Data

As an applied researcher, I've often come across missing data problems where my data are categorical. This can raise issues because most standard multiple imputation packages assume the multivariate normal (MVN) distribution, which may not hold for certain types of categorical and binary data.

The standard shortcut for overcoming this problem is to just impute under the MVN assumption, then use rounding to finish out the imputation. But as Yucel Recai, Yulei He, and Alan Zaslavsky point out in their May 2008 article in The American Statistician, naive rounding can bias estimates, particularly when the underlying data are asymmetric or multimodal.

So what should the applied researcher do when multiply imputing categorical data? The authors propose a method of calibration whereby one duplicates the original data but sets the observed values for the variable of interest to missing in the duplicated data. The original data and the duplicated data are then stacked and imputation is carried out on the stacked dataset. By comparing the fraction of 1's among the originally observed (but imputed) observations in the duplicated data (Y_obs(dup)) with the fraction of 1's in the original observed data (Y_obs), one can find the appropriate cutoff (c) and assign 0's and 1's using that.

This is a neat technique which benefits from the fact that it's very easy to implement in practice. In any case, check out the entire paper for more details on the method.


Posted by John Graves at 9:01 PM

Benjamin Fry and Data Visualization

Please join us tomorrow (Wednesday, 9/24) when we welcome Ben Fry to the applied statistics workshop. Ben's research explores data visualization--more details can be found here -- including details of his recently completed book "Data Visualization" and samples from his previous work .

The workshop will meet at 12 noon in room K-354, CGIS-Knafel (1737 Cambridge St) with a light lunch served. The presentation will begin at 1215 and usually ends around 130 pm. All are welcome--

Posted by Justin Grimmer at 10:39 AM