
15 March 2006

Incompatibility: Are You Worried?

Jim Greiner

I’m a teaching fellow for a course in missing data this semester, and one topic keeps coming up peripherally in the course, even though we haven’t tackled it head-on just yet. That topic is incompatible conditional distributions. And here’s my question for blog readers: how much does it bother you?

Reduced to its essence, here’s the issue. Suppose I have a dataset with three variables, A, B, and C. There are multiple missing data patterns, and suppose (although it’s not essential to the problem) that I want to use multiple imputation to create six or seven complete analysis datasets. Suppose also that it’s very difficult to conceive of a minimally plausible joint distribution p(A, B, C). Perhaps A is semi-continuous (e.g., income), B is categorical with 5 possible values, and C has support only over the negative integers. What (as I understand it) is often done in this case is to assume conditional distributions, for example, p*(A|B, C), p*(B|A, C), and p*(C|A, B). The idea is that one does a “Gibbs” with these three conditional distributions, as follows. Find starting values for the missing Bs and Cs. Draw missing As from p*(A|B, C). Then draw new Bs from p*(B|A, C) using the newly drawn As and the starting Cs. Continue as though you were doing a real “Gibbs.” Stop after a certain number of iterations and call the result one of your multiply imputed datasets.
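For readers who prefer code, here is a minimal sketch of the procedure just described. Everything in it is an assumption made for illustration: the function name `pseudo_gibbs`, the three toy samplers, and their made-up parameters stand in for whatever conditional models p*(A|B, C), p*(B|A, C), and p*(C|A, B) one would actually fit. Nothing guarantees that these three conditionals share a joint distribution.

```python
import numpy as np

def sample_A(others, rng):
    # Hypothetical p*(A | B, C): semi-continuous, a point mass at 0 plus a lognormal part.
    n = len(others["B"])
    is_zero = rng.random(n) < 0.3
    positive = rng.lognormal(mean=0.1 * others["B"], sigma=1.0)
    return np.where(is_zero, 0.0, positive)

def sample_B(others, rng):
    # Hypothetical p*(B | A, C): categorical on {0, ..., 4}, weights depend on A.
    logits = -0.5 * (np.arange(5)[None, :] - np.log1p(others["A"])[:, None]) ** 2
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    u = rng.random(len(others["A"]))
    draws = (probs.cumsum(axis=1) < u[:, None]).sum(axis=1)
    return np.minimum(draws, 4).astype(float)  # guard against rounding in cumsum

def sample_C(others, rng):
    # Hypothetical p*(C | A, B): support on the negative integers -1, -2, ...
    return -(1.0 + rng.poisson(1.0 + others["B"]))

SAMPLERS = {"A": sample_A, "B": sample_B, "C": sample_C}

def pseudo_gibbs(data, samplers, n_iter=20, seed=0):
    """data maps variable name -> 1-D float array, with np.nan marking missing values."""
    rng = np.random.default_rng(seed)
    missing = {k: np.isnan(v) for k, v in data.items()}
    filled = {}
    for k, v in data.items():
        v = v.copy()
        # Starting values: resample from the observed values of the same variable.
        v[missing[k]] = rng.choice(v[~missing[k]], size=missing[k].sum())
        filled[k] = v
    for _ in range(n_iter):
        for k in data:  # cycle through A, B, C, always using the latest draws
            others = {j: filled[j] for j in filled if j != k}
            draws = samplers[k](others, rng)
            filled[k][missing[k]] = draws[missing[k]]  # observed values are never touched
    return filled
```

Calling `pseudo_gibbs(data, SAMPLERS, seed=s)` once for each of six or seven seeds would give the multiply imputed datasets, with each call stopped after `n_iter` sweeps.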

The incompatibility problem is that there may be no joint distribution that has conditional distributions p*(A|B, C), p*(B|A, C), and p*(C|A, B). Remember, (proper) joint distributions determine conditional distributions, but conditional distributions do not determine joint distributions, and in some cases, one can actually prove mathematically that no joint distribution has a particular set of conditionals. If you ran your “Gibbs” long enough, your draws might eventually wander off to infinity or become absorbed into a boundary of the parameter space. In that case, your computer would complain; exactly how it would complain depends on how you programmed it.
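To make incompatibility concrete, here is a small check for the simplest case I can think of: two discrete variables with strictly positive conditional tables. It is a sketch under those assumptions, not a general test, and the function name `compatible` and the example tables are invented for illustration. The logic is that if a joint f(a, b) exists, then p(a|b) / p(b|a) = f_A(a) / f_B(b), so the table of ratios must factor into a row term times a column term. Equivalently, the log ratios must have no row-by-column interaction.

```python
import numpy as np

def compatible(p_a_given_b, p_b_given_a, tol=1e-9):
    """Both tables are indexed [a, b]; columns of the first and rows of the second sum to 1."""
    P = np.asarray(p_a_given_b, dtype=float)
    Q = np.asarray(p_b_given_a, dtype=float)
    if np.any(P <= 0) or np.any(Q <= 0):
        raise ValueError("this sketch only handles strictly positive tables")
    log_ratio = np.log(P / Q)
    # Double-center: what remains is the interaction, which must vanish if a joint exists.
    resid = (log_ratio
             - log_ratio.mean(axis=1, keepdims=True)
             - log_ratio.mean(axis=0, keepdims=True)
             + log_ratio.mean())
    return bool(np.max(np.abs(resid)) < tol)

# Conditionals derived from an actual joint are compatible by construction.
joint = np.array([[0.1, 0.2],
                  [0.3, 0.4]])
P_ok = joint / joint.sum(axis=0, keepdims=True)  # p(a | b)
Q_ok = joint / joint.sum(axis=1, keepdims=True)  # p(b | a)

# Here A depends on B under P_ok, but Q_bad says B ignores A: no joint can do both.
Q_bad = np.array([[0.9, 0.1],
                  [0.9, 0.1]])

print(compatible(P_ok, Q_ok))
print(compatible(P_ok, Q_bad))
```

The three-variable problem in this post, with mixed variable types, is much harder to check than this toy case, which is part of why the issue tends to get waved away in practice.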

I confess this incompatibility problem bothers me more than it appears to bother some of my mentors. If the conditional distributions are incompatible, then I KNOW that the "model" I’m fitting could not have generated the data I see. It seems like even highly improbable models are better than impossible ones. On the other hand, I am sympathetic to the idea of doing the best one can, and what else is there to do in (say) large datasets with multiple, complicated missing data patterns and unusual variable types?

How much does incompatibility bother you?

Posted by James Greiner at 6:00 AM