30 January 2006

Ecological Inference in the Law, Part III

Jim Greiner

In two previous posts here and here, I discussed the ecological inference problem as it relates to the legal question of racially polarized voting in litigation under Section 2 of the Voting Rights Act. In the latter of these two posts, I suggested that this field needed greater research into the case of R x C, as opposed to 2 x 2, tables.

Here's another suggestion from the courtroom: we need an individual level story.

The fundamental problem of ecological inference is that we do not observe data at the individual level; instead, we observe row and column totals for a set of aggregate units (precincts, in the voting context). This fact has led to some debate about whether a model or a story or an explanation about individual level behavior is necessary to make ecological inferences reliable, or at least as reliable as they can be. On the one hand, Achen & Shively, in their book Cross-Level Inference, have argued that an individual level story is always necessary to assure the coherence of the aggregate model and to assess its implications. On the other hand, Gary King, in his book A Solution to the Ecological Inference Problem, has argued that because we never observe the process by which ecological data are aggregated from individual to group counts, we need not consider individual level processes, so long as the row counts (or percentages) are uncorrelated with model parameters.

From a social science point of view, this question is debatable. From a legal point of view, we need an individual level story, regardless of whether such a story produces better statistical results. When judges and litigators encounter statistical methods in a litigation setting, they need to understand (or, at least, to feel that they understand) something about those methods. They know they will not comprehend everything, or perhaps even most things, and they have no interest in the gritty details. But they will not credit an expert witness who says, in effect, "I ran some numbers. Trust me." What can quantitative expert witnesses offer in an ecological inference setting? The easiest and best thing is some kind of individual level story that leads to the ecological model being used.

Posted by James Greiner at January 30, 2006 6:01 AM

Comments

Jim, I think you're conflating two different types of stories. One is the individual stories of the rational choice modeling type that political scientists like to pose. These involve assumptions about how individual voters make decisions. The other involves how these individual voters get aggregated into precincts or districts. On the first, we know from much theory and empirical evidence that individuals base their decisions on candidates, issues (or ideology), party, and a few other factors; as one piece of evidence, with this knowledge we do a very good job of predicting how people will vote. On the second, little reliable or predictive knowledge exists. Precincts get created out of administrative convenience by whoever happens to be in charge, rather than by rationally calculating politicians pursuing some known goal we can use to develop good models. (For those few situations where we have no precinct-level data, there's a better case to be made that districts are created by some predictable process, but anyone who's ever been involved in redistricting will know the enormous number of idiosyncratic issues involved even at that level.)

It may make lawyers and judges feel better to have a good "individual level story, regardless of whether such a story produces better statistical results," but it shouldn't make those of us who live by their decisions feel good. Even if the story that gives them a warm feeling inside predicted all individual votes perfectly, the lack of an accurate theory of how precincts are formed means that the inferences about individuals from aggregate data will still rely on assumptions not known to be true. Giving them an individual level story, which is of no use in this instance, is exactly the same as saying "I ran some numbers. Trust me." except that in addition there's a smoke screen there to confuse everyone. I know it is difficult to figure out how to convey complicated information to people without the background, but it's probably better to figure out how to explain the key assumptions (which in this case are about aggregation processes and not individuals) so the judges and litigators will understand.

Posted by: Gary King [TypeKey Profile Page] at January 30, 2006 2:59 PM

Gary, I take your point about it being possible to divide the data-generating process into two phases. I wonder, however, whether a story that finesses one or the other, or combines the two in some sense, might still be useful. Suppose, for example, I want to articulate a model in which, within each precinct (and thus each precinct contingency table), the distribution of the unobserved cell counts for each row (race) is a multinomial with a probability vector characteristic of that racial group in that precinct. I can tell the following individual-level story: For a particular election, take the precinct boundaries as fixed and given. Within each precinct, each member of racial group A has the same probability of voting for candidate 1, voting for candidate 2, etc., and of not voting. Members of racial group B also have their own (single) set of probabilities, which (again) depend only on that member's (i) precinct, and (ii) racial group. Every voter makes independent decisions.

This story leads to the desired row-level model because the sum of n independent multinomials of size one with the same probability vector is a multinomial of size n with that probability vector. (For more on this model, by the way, attend my presentation to the Applied Statistics Workshop the day after tomorrow.) We know that the independence assumption is probably wrong, but using it to tell the story helps communicate the hazards of working with ecological data. I have explained this much to lawyers lacking formal quantitative training before, and they have understood it.
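A minimal sketch of this individual-level story, offered only as an illustration: each voter in one racial group in one precinct independently draws a single outcome from the same probability vector, and the row counts are the tally of those draws. The function name, the outcome labels, and the probabilities are all hypothetical, chosen for the example rather than taken from any real election.

```python
import random
from collections import Counter


def simulate_precinct_row(n_voters, probs, rng):
    """Tally independent one-voter draws into the counts for one table row.

    probs maps each outcome (a candidate, or not voting) to the probability
    shared by every member of this racial group in this precinct.
    """
    outcomes = list(probs)
    weights = [probs[o] for o in outcomes]
    # Each voter is one multinomial draw of size one; all draws are independent.
    picks = rng.choices(outcomes, weights=weights, k=n_voters)
    tally = Counter(picks)
    return {o: tally.get(o, 0) for o in outcomes}


# Hypothetical probabilities for group A in a single precinct.
probs_group_a = {"candidate_1": 0.5, "candidate_2": 0.3, "no_vote": 0.2}
rng = random.Random(0)  # fixed seed so the run is repeatable
row_counts = simulate_precinct_row(1000, probs_group_a, rng)
print(row_counts)
```

The tallied row is one draw from a multinomial of size 1000 with the same probability vector, which is the row-level model. Only the row total would be observed in real ecological data; the individual cell counts printed here are exactly what stays hidden.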

If you don't buy this, it might help if you share with readers (including me) how it is that you explain King's EI method to judicial audiences. In legal briefs I've helped write, lawyers have described the method as a combination of ecological regression and the method of bounds. I am not comfortable with this characterization, as it suppresses the distributional (truncated bivariate normal) and lack-of-aggregation-bias assumptions. How have you explained your method? (Given the length of our two comments, perhaps if you have time to answer, make it an independent post?)

As always, thanks very much for the instructive comment.

Posted by: Jim Greiner at January 30, 2006 4:02 PM

Let's leave how EI is presented in court for a different post, and focus here on your model, as you describe it in your post.

Most aspects of this model do not apply in reality and so they would not be of use in court or elsewhere so far as I can see. For one example, you assume that the district boundaries are fixed and then people decide how to vote. The problem of redistricting is precisely the reverse: voters first have certain partisan propensities (which you want to build a model to explain) and redistricters use these to draw district or precinct boundaries. Even if the individual-level model predicts voter preferences perfectly, the districting can induce aggregation bias that violates the assumptions of any ecological inference model. (It's true that the boundaries will then have subsequent effects on voter preferences, but this is a third-order effect and normally assumed away.) That would seem to make the individual-level model superfluous. I don't see how we benefit from a model of individuals in this type of model.

Posted by: Gary King [TypeKey Profile Page] at February 4, 2006 11:06 PM

Gary, thanks again for your comment. I'm not sure I made my individual-level story clear in my previous response. The story depends on the assertion that PRECINCT boundaries are fixed, not DISTRICT boundaries. I think we can all agree voter propensities may precede the drawing of district lines; that is the essence of gerrymandering. But, do "redistricters use [partisan propensities of voters] to draw . . . precinct boundaries", as you say above? From your 1997 book, page 57: "Fortunately, the equivalent of precincts in most countries are not often the subject of intentional gerrymandering and are smaller." Same page, footnote 11: "I know of only a few attempts to gerrymander precinct boundaries (all told to me in confidence)."

As always, many thanks for the comment.

Posted by: Jim Greiner at February 5, 2006 10:11 PM

I understand your point, but it doesn't change the fact that if you have a model of individuals alone and then aggregate via areal units of any kind, created in any way, intentional or unintentional, before or after the voter preferences are determined, you may be left with aggregation bias. If you don't know anything about how the precinct lines are drawn, then you can't reasonably assume that they are somehow drawn in a manner unrelated to the variables of interest, although that might happen by chance of course.
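A small constructed example may make the aggregation-bias concern concrete. It is a sketch under loudly stated assumptions, not anyone's actual method: five equal-sized precincts, a group B support rate fixed at 0.2, and a group A support rate that rises with group A's share of the precinct (0.3 + 0.4 times the share). All numbers and names are invented for illustration. The estimator is the classic ecological (Goodman) regression of the precinct vote share on the group share, fitted by ordinary least squares.

```python
def ols_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical precincts: group A's share of each (equal-sized) precinct.
group_a_share = [0.1, 0.3, 0.5, 0.7, 0.9]
# Assumed truth: group A's support depends on how the precinct is composed.
support_a = [0.3 + 0.4 * x for x in group_a_share]
support_b = 0.2  # assumed constant across precincts

# What is actually observed: the overall vote share in each precinct.
observed_share = [
    x * a + (1 - x) * support_b for x, a in zip(group_a_share, support_a)
]

intercept, slope = ols_fit(group_a_share, observed_share)
goodman_a = intercept + slope  # regression line evaluated at share = 1
goodman_b = intercept          # regression line evaluated at share = 0

# True overall support among group A members, weighting precincts by A's size.
true_a = sum(x * a for x, a in zip(group_a_share, support_a)) / sum(group_a_share)

print(round(goodman_a, 3), round(true_a, 3), round(goodman_b, 3))
```

In this constructed case the regression puts group A's support at about 0.63 when the true figure is about 0.56, and group B's at about 0.13 when the true figure is 0.20. The individual-level behavior is specified exactly; the error comes entirely from how voters are sorted into precincts.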

The key info needed to make accurate ecological inferences is the process by which the individuals get aggregated into groups. If you don't have that information, and as you point out via quotes from my book this is hard to come by, then you're left making unverified assumptions about that process. Your independence assumption (if I understand what you're doing) is an example of such an unverified assumption. The bounds can help considerably in this situation of course, but whatever uncertainty is left is still an assumption.

Adding the assumptions of your model to a method of ecological inference will not obviously make it more accurate, unless the assumptions were known to be correct. However, your model does offer one way to convey how the process of aggregation into precincts can bias inferences.

Posted by: Gary King [TypeKey Profile Page] at February 5, 2006 11:28 PM

Gary, as always, this has been helpful. Here's my final thought; you can have the last word if you'd like.

My intention in suggesting that we need an individual-level story was not to imply that such a story could obviate the necessity for unverifiable assumptions in the ecological inference context. In fact, the quantitative community largely agrees that unverifiable assumptions are an unfortunate part of ecological inference (Wakefield, 2004; Gelman et al., 2001). My intent, rather, was to suggest that an individual-level story can be a valuable tool for (a) communicating the model to an audience, quantitatively untrained or quantitatively sophisticated, and (b) articulating the unverifiable assumptions as clearly as possible so that potential users can assess them. I continue to believe that both are important goals.

Finally, I agree that aggregation bias, difficult if not impossible to observe, can cause real problems for ecological inference. In my view, our only protections against it are (a) modeling it correctly, which is hard to do (and itself impossible to verify), and (b) making sure that our predictions always obey the bounds, even if the precinct-level tables are larger than 2 x 2. The latter is a focus of my current research.
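For readers unfamiliar with the bounds, here is a minimal sketch of the deterministic bounds in the simplest 2 x 2 case only (two groups, one candidate's vote total). It does not cover the larger R x C tables discussed above. The function name and the numbers are hypothetical; the logic is just the arithmetic constraint that an unobserved cell count cannot exceed its row total or its column total, and cannot be so small that the other group would have to supply more votes than it has members.

```python
def cell_bounds(group_size, other_size, votes_for_candidate):
    """Bounds on how many members of one group voted for the candidate.

    group_size:          number of voters in the group of interest
    other_size:          number of voters in the other group
    votes_for_candidate: the candidate's total in the precinct
    """
    total = group_size + other_size
    if not 0 <= votes_for_candidate <= total:
        raise ValueError("vote total must lie between 0 and precinct size")
    # Even if every member of the other group backed the candidate,
    # the remainder must have come from the group of interest.
    lower = max(0, votes_for_candidate - other_size)
    # The count cannot exceed the group's size or the candidate's total.
    upper = min(group_size, votes_for_candidate)
    return lower, upper


# Hypothetical precinct: 300 group A voters, 700 group B voters,
# and 800 votes for the candidate.
low, high = cell_bounds(300, 700, 800)
print(low, high)
```

In the hypothetical precinct, between 100 and 300 of the 300 group A voters must have supported the candidate, so the support rate lies between one third and one. Any model-based prediction falling outside such an interval contradicts the observed totals, whatever assumptions produced it.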

Thanks again for the exchange.


Posted by: Jim Greiner at February 6, 2006 9:29 PM
