8 May 2006

Coarsened at Random

Jim Greiner

I’m the “teaching fellow” (the “teaching assistant” everywhere but Harvard, which has to have its lovely little quirks: “Spring” semester beginning in February, anyone?) for a course in missing data this semester, and in a recent lecture, an interesting concept came up: coarsened at random.

Suppose you have a dataset in which you know or suspect that some of your data values are rounded. For example, ages of youngsters might be given to the nearest year or half-year. Or perhaps in a survey, you’ve gotten some respondents’ incomes only within certain ranges. Then the data have been “coarsened” in the sense that you know that the true value is within a certain range, but you don’t know where within that range.
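To make the idea concrete, here is a minimal Python sketch of how coarsened values might be represented: each one is stored as the interval of true values consistent with what was recorded. The names `Coarse`, `round_to_nearest`, `bracket`, and `INCOME_EDGES` are hypothetical, as are the bracket boundaries; none of them come from the course or from any particular library.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Coarse:
    """A value known only to lie in the half-open interval [low, high)."""
    low: float
    high: float

    def contains(self, x: float) -> bool:
        return self.low <= x < self.high


def round_to_nearest(x: float, step: float) -> Coarse:
    """Interval of true values that round (half up) to the same grid point as x."""
    center = math.floor(x / step + 0.5) * step
    return Coarse(center - step / 2, center + step / 2)


def bracket(x: float, edges: list) -> Coarse:
    """Bracket [edges[i], edges[i + 1]) that contains x."""
    for low, high in zip(edges, edges[1:]):
        if low <= x < high:
            return Coarse(low, high)
    raise ValueError(f"{x} falls outside every bracket")


# Hypothetical survey brackets, for illustration only.
INCOME_EDGES = [0, 25_000, 50_000, 100_000, float("inf")]

age_half_year = round_to_nearest(7.3, 0.5)   # age recorded as 7.5
age_full_year = round_to_nearest(7.3, 1.0)   # age recorded as 7
income = bracket(42_000, INCOME_EDGES)       # income recorded as a range

print(age_half_year, age_full_year, income)
```

In each case the analyst sees only the interval, never the 7.3 or the 42,000 that produced it.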

Happily, techniques have been developed to handle this sort of situation. In many ways, the game is the same as that in the missing data setting. Just as in the missing data context good things happen when the data are missing at random, so also in this context good things happen when the data are coarsened at random. Thus, to begin with, you have to consider (among other things) whether you think the probability that you will observe only a range of possible data values, as opposed to the specific true value, depends on something you don’t observe (such as that specific true value). A good place to start on all this is Heitjan & Rubin, “Inference from Coarse Data via Multiple Imputation with Application to Age Heaping,” 85 JASA 410 (1990).
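A small simulation may help show why that question matters. What follows is a toy sketch, not the method in Heitjan & Rubin: the true value takes one of three values (0, 1, or 2), and some respondents whose value is 1 or 2 report only "at least one". The probabilities and the function names `simulate` and `ignorable_mle` are assumptions made up for illustration. The estimator ignores the coarsening mechanism, which is safe when the chance of a coarse report is the same for true values 1 and 2, and unsafe when it differs.

```python
import numpy as np


def simulate(n, p, q, rng):
    """Draw true values in {0, 1, 2} and coarsen some of them.

    p[x] is the probability of true value x.
    q[x] is the probability that a unit with true value x reports only the
    coarse set {1, 2}; q[0] is 0 because a 0 is always reported exactly.
    Returns the counts (n0, n1, n2, n_coarse).
    """
    x = rng.choice(3, size=n, p=p)
    coarse = rng.random(n) < np.asarray(q)[x]
    n0 = int(np.sum(x == 0))
    n1 = int(np.sum((x == 1) & ~coarse))
    n2 = int(np.sum((x == 2) & ~coarse))
    return n0, n1, n2, int(np.sum(coarse))


def ignorable_mle(n0, n1, n2, n_coarse):
    """Maximize p0^n0 * p1^n1 * p2^n2 * (p1 + p2)^n_coarse in closed form.

    This likelihood treats a coarse report as "the value is 1 or 2" and
    nothing more, i.e. it ignores how the coarsening came about.
    """
    n = n0 + n1 + n2 + n_coarse
    p0 = n0 / n
    share1 = n1 / (n1 + n2)  # split of the non-zero mass between 1 and 2
    return np.array([p0, (1 - p0) * share1, (1 - p0) * (1 - share1)])


rng = np.random.default_rng(0)
p_true = [0.3, 0.5, 0.2]

# Coarsened at random: same chance of a coarse report for true values 1 and 2.
car_est = ignorable_mle(*simulate(200_000, p_true, [0.0, 0.4, 0.4], rng))

# Not coarsened at random: true value 2 is far more likely to be coarsened.
not_car_est = ignorable_mle(*simulate(200_000, p_true, [0.0, 0.1, 0.7], rng))

print("truth:  ", p_true)
print("CAR:    ", car_est.round(3))
print("not CAR:", not_car_est.round(3))
```

With these made-up numbers, the first estimate should land close to the truth, while the second overstates the share of 1s: in large samples it tends to 0.7 × 0.45 / 0.51, or about 0.62 rather than 0.5, because the exactly reported 2s are underrepresented among the exact reports.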

One final point: you might think that coarsened at random is a specific case of missing at random. Actually, it’s the other way around. Data can be (and often are assumed to be) coarsened at random but not missing at random. Think and you’ll see why.
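For readers who would rather see it written out, here is one common way to formalize the point. The notation is an assumption of this sketch rather than something taken from the lecture, and precise definitions vary somewhat across the literature. The key observation is that a missing value is the extreme case of coarsening, in which the "range" you observe is every value the variable could take.

```latex
% Assumed notation (not from the post): X is the true value, taking values in
% a sample space \mathcal{X}; Y is the observed set, a subset of \mathcal{X}
% that contains X.
\[
\text{CAR:}\qquad \Pr(Y = y \mid X = x) = \Pr(Y = y \mid X = x')
\quad \text{for all } x, x' \in y .
\]
% Missing data as coarsening: split x = (x_{\mathrm{obs}}, x_{\mathrm{mis}})
% according to a missingness pattern R = r. The observed set is then
\[
y = \{x_{\mathrm{obs}}\} \times \mathcal{X}_{\mathrm{mis}},
\qquad
\Pr(Y = y \mid X = x) = \Pr(R = r \mid x_{\mathrm{obs}}, x_{\mathrm{mis}}) .
\]
% Applying the CAR condition to sets of this form gives
\[
\Pr(R = r \mid x_{\mathrm{obs}}, x_{\mathrm{mis}}) = \Pr(R = r \mid x_{\mathrm{obs}})
\quad \text{for all } x_{\mathrm{mis}} \in \mathcal{X}_{\mathrm{mis}},
\]
% which is the usual missing-at-random condition.
```

So missing at random is what coarsened at random reduces to when the only coarse sets allowed are "fully observed" and "could be anything". Rounded ages and bracketed incomes involve other kinds of sets, which is why such data can be coarsened at random without being a missing data problem at all.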

Posted by James Greiner at May 8, 2006 6:00 AM