27 February 2006

Resources for Multiple Imputation

Jens Hainmueller

As applied researchers, we all know this situation all too well. Like the alcoholic standing in front of the bar that is just about to open, you have just downloaded (or somehow compiled) a new dataset. You open your preferred statistical software and begin to investigate the data. And there again you are struck by lightning: Holy cow, I have missing data!! So what do you do about it? Listwise deletion as usual? In the back of your mind you recall your stats teacher saying that listwise deletion is unlikely to result in valid estimates, but hitherto you have simply ignored these caveats. Don't be a fool, you can do better: use multiple imputation (MI).

As is well known in the statistical literature on the missing data problem, MI is not the silver bullet for dealing with missing values. In some cases, better (primarily more efficient) estimates can be obtained using weighted estimation procedures or specialized numerical methods (EM, etc.). Yet these methods are often complicated and problem-specific, and thus not for the faint-of-heart applied researcher. MI, in contrast, is relatively easy to implement and works well in most instances. Want to know how to MI? I suggest you take a look at www.multiple-imputation.com, a website that brings together various resources regarding the method, software, and literature citations that will help you add MI to your toolkit. A nice (non-technical) introduction is also provided on Joseph Schafer's multiple imputation FAQ page. Gary and co-authors have also written extensively on this subject, offering lots of practical advice for applied researchers. Last but not least, I recommend searching for "multiple imputation" on Andrew Gelman's blog; you will find many interesting entries on the topic. Good luck!
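To make the "relatively easy to implement" claim concrete, here is a minimal sketch of the MI workflow in plain Python: impute the missing values several times, analyze each completed dataset, and combine the results with Rubin's rules. This is not any particular package's implementation. The function names (`pool_rubin`, `fit_line`, `impute_once`, `multiply_impute_mean`) and the toy data are hypothetical, and the imputation step is simple stochastic regression imputation of y given x, which skips the parameter draws a proper MI procedure would make and so tends to understate uncertainty. For real work, use one of the packages listed at the resources above.

```python
import random
import statistics


def pool_rubin(estimates, variances):
    """Combine m >= 2 completed-data estimates with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)        # pooled point estimate
    within = statistics.fmean(variances)       # mean within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between     # total variance of the pooled estimate
    return q_bar, total


def fit_line(xs, ys):
    """Least-squares fit of y on x; returns intercept, slope, residual SD."""
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, (rss / (n - 2)) ** 0.5


def impute_once(xs, ys, rng):
    """Fill each missing y (None) with a regression prediction plus random noise.

    Assumption: the regression coefficients are treated as known, which is a
    simplification of proper multiple imputation.
    """
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b, sd = fit_line([x for x, _ in observed], [y for _, y in observed])
    return [y if y is not None else a + b * x + rng.gauss(0, sd)
            for x, y in zip(xs, ys)]


def multiply_impute_mean(xs, ys, m=5, seed=0):
    """Estimate the mean of y from m imputed datasets and pool the results."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        completed = impute_once(xs, ys, rng)
        estimates.append(statistics.fmean(completed))
        variances.append(statistics.variance(completed) / len(completed))
    return pool_rubin(estimates, variances)


if __name__ == "__main__":
    # Hypothetical toy data: y depends linearly on x, and roughly 30% of y
    # is deleted completely at random.
    rng = random.Random(42)
    xs = [rng.uniform(0, 10) for _ in range(200)]
    ys = [1.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]
    ys = [None if rng.random() < 0.3 else y for y in ys]
    estimate, variance = multiply_impute_mean(xs, ys, m=5, seed=1)
    print(f"pooled mean of y: {estimate:.2f} (SE {variance ** 0.5:.2f})")
```

The pooling step is the part that carries over unchanged to real applications: whatever software produces the imputed datasets, the final estimate is the average of the per-dataset estimates, and its variance adds the between-imputation spread to the average within-imputation variance.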

Posted by Jens Hainmueller at February 27, 2006 6:00 AM

Comments

I am using the EM option available in SPSS 13.0 to impute some user-missing data ("I don't know" answers). EM, however, gives some values that are either negative or too high. This is a problem, given that my survey questions allow only for values ranging between 0 and 6.

My initial guess is that EM is simply an algorithm, and therefore negative values are just as 'legitimate' as any other numbers.

SPSS, as far as I can see, does not allow for placing a limit in the computation (i.e. asking it to produce values only within the 0 to 6 range).

Does anyone know a way around this? Especially one that is available in SPSS, since learning another statistical application package is a bit of a luxury I can't afford right now.

ES

Posted by: Emmanuel at March 12, 2006 2:34 PM

Hi Emmanuel,

I don't know much about SPSS, but your problem sounds somewhat different from the usual missing data problem for which MI works. In your case you know that people responded "I don't know", so it's not that these data points are missing.

Best,
Jens

Posted by: Jens at March 13, 2006 11:48 AM

Thanks Jens. I see your point, but the "I don't know" answers are treated as (user-)missing values in SPSS. Furthermore, the respondents who answered "I don't know" differ with respect to the means of certain dependent and independent variables. Doesn't this imply the missing data are MAR (at best) or non-ignorable? Isn't imputation then 'necessary'?

ES

Posted by: Emmanuel Sigalas at March 14, 2006 9:32 AM

SPSS has special software (unfortunately available only as an additional purchase) called SPSS Categories that allows for customized scaling and replacement. With this software, imputation for "I don't know" wouldn't be necessary.

There's also a nice website at Ohio State that deals with multiple imputation of categories.

Posted by: kca at April 1, 2006 9:14 PM
