8 May 2006

Coarsened at Random

Jim Greiner

I’m the “teaching fellow” (the “teaching assistant” everywhere but Harvard, which has to have its lovely little quirks: “Spring” semester beginning in February, anyone?) for a course in missing data this semester, and in a recent lecture, an interesting concept came up: coarsened at random.

Suppose you have a dataset in which you know or suspect that some of your data values are rounded. For example, ages of youngsters might be given to the nearest year or half-year. Or perhaps in a survey, you’ve gotten some respondents’ incomes only within certain ranges. Then the data have been “coarsened,” in the sense that you know the true value lies within a certain range, but you don’t know where within that range.
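One way to picture a coarsened observation is as the statement “the true value lies in this interval.” The sketch below shows that view for the two examples above. It is a minimal illustration under assumptions of my own: rounding goes to the nearest multiple of a width with ties rounding up, survey brackets are half-open `[lo, hi)`, and the function names and income cutpoints are hypothetical.

```python
import bisect
import math


def rounded_report(true_value, width):
    """Interval of true values consistent with a report rounded to the
    nearest multiple of `width` (assumption: ties round up)."""
    reported = width * math.floor(true_value / width + 0.5)
    return (reported - width / 2, reported + width / 2)


def bracket_report(true_value, cutpoints):
    """Half-open survey bracket [lo, hi) containing `true_value`.
    `cutpoints` are sorted bracket boundaries; the ends are open-ended."""
    i = bisect.bisect_right(cutpoints, true_value)
    lo = cutpoints[i - 1] if i > 0 else -math.inf
    hi = cutpoints[i] if i < len(cutpoints) else math.inf
    return (lo, hi)


cuts = [0, 25000, 50000, 100000]  # hypothetical income brackets
print(rounded_report(3.7, 1.0))     # (3.5, 4.5): age to the nearest year
print(rounded_report(3.7, 0.5))     # (3.25, 3.75): age to the nearest half-year
print(bracket_report(37000, cuts))  # (25000, 50000)
print(bracket_report(150000, cuts)) # (100000, inf)
```

The analyst sees only the interval, never `true_value`; everything that follows is about when it is safe to treat that interval as the whole story.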

Happily, techniques have been developed to handle this sort of situation. In many ways, the game is the same as in the missing data setting. Just as good things happen in the missing data context when the data are missing at random, so too good things happen in this context when the data are coarsened at random. Thus, to begin with, you have to consider (among other things) whether the probability that you observe only a range of possible values, as opposed to the specific true value, depends on something you don’t observe (such as that specific true value). A good place to start on all this is Heitjan & Rubin, “Inference from Coarse Data via Multiple Imputation with Application to Age Heaping,” 85 JASA 410 (1990).
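To make the “depends on something you don’t observe” condition concrete, here is a small simulation. It is a sketch under hypothetical assumptions: a true value uniform on {0, 1, 2, 3}, and reports that are either exact or coarsened to a pair such as {0, 1}. The distribution is estimated with the self-consistency (EM) algorithm for the nonparametric maximum likelihood estimate, which treats each report as “the value is somewhere in this set” and ignores why it was coarsened. This is not Heitjan & Rubin’s multiple-imputation procedure, only an illustration of why the CAR condition matters.

```python
import random
from collections import Counter

VALUES = (0, 1, 2, 3)
# Hypothetical coarse reports: each value can only be coarsened to its pair.
PAIR_OF = {0: (0, 1), 1: (0, 1), 2: (2, 3), 3: (2, 3)}


def simulate(n, coarsen_prob, seed=0):
    """Draw true values uniformly from VALUES. Each is reported exactly
    (a 1-element tuple) or, with probability coarsen_prob[x], as its pair."""
    rng = random.Random(seed)
    reports = []
    for _ in range(n):
        x = rng.choice(VALUES)
        if rng.random() < coarsen_prob[x]:
            reports.append(PAIR_OF[x])
        else:
            reports.append((x,))
    return reports


def em_ignoring_mechanism(reports, iters=200):
    """Self-consistency (EM) estimate of P(X = v) that uses only the
    reported sets and ignores the coarsening mechanism."""
    counts = Counter(reports)
    n = len(reports)
    p = {v: 1 / len(VALUES) for v in VALUES}
    for _ in range(iters):
        new_p = dict.fromkeys(VALUES, 0.0)
        for s, c in counts.items():
            total = sum(p[v] for v in s)
            for v in s:  # split each report's weight across its set
                new_p[v] += c * p[v] / total
        p = {v: new_p[v] / n for v in VALUES}
    return p


# CAR: the chance of a coarse report is the same for every value in the pair.
car_est = em_ignoring_mechanism(simulate(100_000, {v: 0.5 for v in VALUES}))
# Not CAR: a true 1 (or 3) is far more likely than a true 0 (or 2) to be coarsened.
ncar_est = em_ignoring_mechanism(simulate(100_000, {0: 0.2, 1: 0.8, 2: 0.2, 3: 0.8}))
print("CAR:    ", {v: round(p, 3) for v, p in car_est.items()})
print("not CAR:", {v: round(p, 3) for v, p in ncar_est.items()})
```

Under the first mechanism the estimates should all sit close to the true 0.25. Under the second, the probability of seeing {0, 1} depends on which of the two values is true, and in large samples the estimates drift toward about 0.4 for values 0 and 2 and 0.1 for values 1 and 3, even though the same estimator was used.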

One final point: you might think that coarsened at random is a special case of missing at random. Actually, it’s the other way around. Data can be (and often are assumed to be) coarsened at random but not missing at random. Think and you’ll see why.

Posted by James Greiner at May 8, 2006 6:00 AM

Comments

Unless I am missing the point -- and sadly that occurs too often -- rounding is the same thing as adding a uniformly distributed "mismeasurement" term to the variable. Further, because with rounding the exact distribution of the mismeasurement term is known, one can correct for the bias in the estimates. For example, in an OLS regression the bias is a reduction in the absolute value of the coefficients (other than the constant, whose variable values are presumably not mismeasured).

I quit reading JASA over 20 years ago, so I am not going to try to get the referenced article and see what is being offered. I find it hard to believe that (again, for instance) in regression, treating the rounding as a missing data problem will improve on treating it as the actual variable plus a mismeasurement term whose distribution is known. But if that is the case, I would be interested in how and why -- maybe enough to travel to an academic library :-).

Posted by: Martin Ringo at May 12, 2006 5:43 PM

If you encounter this sort of problem regularly, it might be worth the trip to the library. I won't try to go through the paper on the blog, but sometimes you need heavier artillery than a mismeasurement term. Suppose, for example, that different people have different coarsening (rounding) patterns: some might round to the nearest half-year, while others round to the nearest year. If you think this might be going on, you'll need to model what kind of person is giving you the information before applying a rounding adjustment.

Posted by: Jim at May 12, 2006 6:35 PM
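The first comment's argument, that rounding acts like adding a uniform error of known distribution and that the resulting OLS attenuation can therefore be undone, can be checked with a quick simulation. This is a sketch under stated assumptions: a regressor drawn from N(0, 1), a true slope of 2, everyone rounding to the nearest whole unit (h = 1), and a rounding error treated as Uniform(-h/2, h/2) and independent of the true value. The helper names are hypothetical, and the correction used is essentially Sheppard's correction for grouped data (subtracting h²/12 from the observed variance).

```python
import math
import random


def round_to(x, h):
    """Round x to the nearest multiple of h (ties round up)."""
    return h * math.floor(x / h + 0.5)


def mean(values):
    return sum(values) / len(values)


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs, fitted with an intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def variance(values):
    """Sample variance (n - 1 denominator)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


# Hypothetical setup: X ~ N(0, 1), Y = 1 + 2X + noise, but X is only
# observed after rounding to the nearest whole unit (width h = 1).
rng = random.Random(0)
h, beta, n = 1.0, 2.0, 200_000
x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [1.0 + beta * x + rng.gauss(0.0, 0.5) for x in x_true]
x_rounded = [round_to(x, h) for x in x_true]

naive = ols_slope(x_rounded, y)

# Treat the rounding error as Uniform(-h/2, h/2), independent of X, so it adds
# h^2/12 to the variance of the observed regressor (Sheppard's correction).
# Dividing by the estimated reliability ratio undoes the attenuation.
var_obs = variance(x_rounded)
reliability = (var_obs - h ** 2 / 12) / var_obs
corrected = naive / reliability

print(f"naive {naive:.3f}  corrected {corrected:.3f}  true {beta}")
```

With these settings the naive slope should land near 2 × 12/13 ≈ 1.85 and the corrected slope near 2. As the reply above points out, the correction leans on everyone sharing one known rounding width; if some respondents round to the half-year and others to the year, a single h²/12 no longer describes the error.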
