23 September 2008
As an applied researcher, I've often come across missing data problems where my data are categorical. This can raise issues because most standard multiple imputation packages assume the multivariate normal (MVN) distribution, which may not hold for certain types of categorical and binary data.
The standard shortcut for overcoming this problem is to just impute under the MVN assumption, then round the imputed values to finish out the imputation. But as Recai Yucel, Yulei He, and Alan Zaslavsky point out in their May 2008 article in The American Statistician, naive rounding can bias estimates, particularly when the underlying data are asymmetric or multimodal.
So what should the applied researcher do when multiply imputing categorical data? The authors propose a method of calibration whereby one duplicates the original data but sets the observed values for the variable of interest to missing in the duplicated data. The original data and the duplicated data are then stacked and imputation is carried out on the stacked dataset. By comparing the fraction of 1's among the originally observed (but now imputed) observations in the duplicated data (Y_obs(dup)) with the fraction of 1's in the original observed data (Y_obs), one can find the cutoff (c) at which the two fractions match, then round the imputed values to 0's and 1's at that cutoff.
This is a neat technique which benefits from the fact that it's very easy to implement in practice. In any case, check out the entire paper for more details on the method.
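To make the calibration steps described above concrete, here is a minimal sketch in plain Python. It is not the authors' code: the function name `calibrated_round`, the single covariate `x`, and the stand-in imputation step (a linear regression of Y on x plus normal noise, used in place of a full MVN multiple imputation routine) are all assumptions made for illustration, and it produces one imputation rather than several.

```python
import random
import statistics


def calibrated_round(y, x, seed=0):
    """Impute a binary y (None = missing) using a calibrated rounding cutoff.

    Hypothetical helper for illustration; assumes x is fully observed.
    """
    rng = random.Random(seed)
    n = len(y)
    obs = [v is not None for v in y]

    # Step 1: stack the original data on top of a duplicate in which
    # every value of y has been set to missing.
    y_stacked = list(y) + [None] * n
    x_stacked = list(x) + list(x)

    # Step 2: impute on the stacked data as if y were continuous.
    # ASSUMPTION: simple regression of y on x with Gaussian noise stands in
    # for a real MVN imputation package.
    xs = [xi for xi, yi in zip(x_stacked, y_stacked) if yi is not None]
    ys = [yi for yi in y_stacked if yi is not None]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((xi - mx) ** 2 for xi in xs)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(xs, ys))
    sigma = (rss / (len(xs) - 2)) ** 0.5
    draws = [intercept + slope * xi + rng.gauss(0, sigma) for xi in x_stacked]

    # Step 3: choose the cutoff c so that the fraction of 1's among the
    # duplicated (originally observed, now imputed) rows matches the
    # fraction of 1's in the observed data.
    dup_obs = sorted(draws[n + i] for i in range(n) if obs[i])
    k = sum(1 for v in ys if v == 1)  # number of observed 1's
    m = len(dup_obs)
    if k == 0:
        c = float("inf")
    elif k == m:
        c = float("-inf")
    else:
        # Midpoint between the k-th largest draw and the next one down.
        c = (dup_obs[m - k - 1] + dup_obs[m - k]) / 2

    # Step 4: round the imputations for the truly missing rows at c.
    y_out = [yi if yi is not None else (1 if draws[i] > c else 0)
             for i, yi in enumerate(y)]
    return y_out, c


if __name__ == "__main__":
    gen = random.Random(42)
    x = [gen.gauss(0, 1) for _ in range(200)]
    y = [1 if xi + gen.gauss(0, 1) > 1 else 0 for xi in x]   # asymmetric binary outcome
    y = [None if gen.random() < 0.3 else yi for yi in y]      # knock out about 30%
    completed, cutoff = calibrated_round(y, x)
    print("calibrated cutoff:", cutoff)
```

With an asymmetric outcome like the one simulated here, the calibrated cutoff will generally differ from the naive 0.5, which is the point of the method. In a real analysis the imputation step would be repeated across multiple imputed datasets using a proper imputation package.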
Posted by John Graves at 9:01 PM
Please join us tomorrow (Wednesday, 9/24) when we welcome Ben Fry to the applied statistics workshop. Ben's research explores data visualization. More details can be found here, including details of his recently completed book "Data Visualization" and samples from his previous work.
The workshop will meet at 12 noon in room K-354, CGIS-Knafel (1737 Cambridge St) with a light lunch served. The presentation will begin at 12:15 pm and usually ends around 1:30 pm. All are welcome.
Posted by Justin Grimmer at 10:39 AM