| Sun | Mon | Tue | Wed | Thu | Fri | Sat |
|---|---|---|---|---|---|---|
| 1 | 2 | 3 | ||||
| 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| 11 | 12 | 13 | 14 | 15 | 16 | 17 |
| 18 | 19 | 20 | 21 | 22 | 23 | 24 |
| 25 | 26 | 27 | 28 | 29 | 30 | 31 |
« So You Want to Do a Survey? | Main | Are people Bayesian? (and what does that mean?) Part II »
15 March 2006
Jim Greiner
I’m a teaching fellow for a course in missing data this semester, and one topic keeps coming up peripherally in the course, even though we haven’t tackled it head-on just yet. That topic is incompatible conditional distributions. And here’s my question for blog readers: how much does it bother you?
Reduced to its essence, here’s the issue. Supposed I have a dataset with three variables, A, B, and C. There are multiple missing data patterns, and suppose (although it’s not essential to the problem) that I want to use multiple imputation to create six or seven complete analysis datasets. Suppose also that it’s very difficult to conceive of a minimally plausible joint distribution p(A, B, C). Perhaps A is semi-continuous (e.g., income), B is categorical with 5 possible values, and C has support only over the negative integers. What (as I understand it) is often done in this case is to assume conditional distributions, for example, p*(A|B, C), p*(B|A, C), and p*(C|A, B). The idea is that one does a “Gibbs� with these three conditional distributions, as follows. Find starting values for the missing Bs and Cs. Draw missing As from p*(A|B, C). Then draw new Bs from p*(B|A, C) using the newly drawn As and the starting Cs. Continue as though you were doing a real “Gibbs.� Stop after a certain number of iterations and call the result one of your multiply imputed datasets.
The incompatibility problem is that there may be no joint distribution that has conditional distributions p*(A|B, C), p*(B|A,C), and p*(C|A, B). Remember, (proper) joint distributions determine conditional distributions, but conditional distributions do not determine joint distributions, and in some cases, one can actually prove mathematically that no joint distribution has a particular set of conditionals. If you ran your “Gibbs� long enough, eventually your draws would wander off to infinity or become absorbed into a boundary of the parameter space. In other words, your computer would complain; exactly how it would complain depends on how you programmed it.
I confess this incompatibility problem bothers me more than it appears to bother some of my mentors. If the conditional distributions are incompatible, then I KNOW that the "model" I’m fitting could not have generated the data I see. It seems like even highly improbable models are better than impossible ones. On the other hand, I am sympathetic to the idea of doing the best one can, and what else is there to do in (say) large datasets with multiple, complicated missing data patterns and unusual variable types?
How much does incompatibility bother you?
Posted by James Greiner at March 15, 2006 6:00 AM
Haven't the copula people already tangled
with this problem? (Copulae are currently
quite popular in some areas of finance.)
Posted by: Bruce McCullough at March 15, 2006 1:06 PM
Professor McCullough, sorry, but I'm unfamiliar with the copula people? Who are they, and where do they hide out?
Posted by: Jim at March 15, 2006 3:39 PM
I don't know that the copula people have solved
this problem, but I imagine they have encountered
it (though I might be wrong). Try asking
Andrew Patton of LSE, whose dissertation
was entitled:
"Applications of Copula Theory in Financial Econometrics, June 2002."
The URL for his homepage is
http://fmg.lse.ac.uk/~patton/
Posted by: Bruce McCullough at March 15, 2006 6:39 PM