Authors' Committee

Chair:

Matt Blackwell (Gov)

Members:

Martin Andersen (HealthPol)
Kevin Bartz (Stats)
Deirdre Bloome (Social Policy)
Andy Eggers (Gov)
John Graves (HealthPol)
Rich Nielsen (Gov)
Maya Sen (Gov)
Gary King (Gov)

Weekly Research Workshop Sponsors

Alberto Abadie, Lee Fleming, Adam Glynn, Guido Imbens, Gary King, Arthur Spirling, Jamie Robins, Don Rubin, Chris Winship

May 2, 2007

Is There a Statistics/Economics Divide?

OK, so now that I have a job, I feel like I can stick my foot in something smelly to see what happens. When I was on the market this past year, I was often asked about the difference (lawyers are always careful to ask about "the difference, if any") between a degree in statistics and a degree in something more "traditional" for a law scholar, such as economics or political science or sociology. Because of the prevalence and power of the Law & Economics movement in legal scholarship, there was particular interest in the difference between statistics and economics/econometrics. I had a certain amount of trouble answering the question. It was easy to point out that the best quantitative empiricists move within all fields and are able to read all literatures. As an aspiring statistician, it was also easy to give the statistical version of things, which is that statisticians invent data analysis techniques and methods that, after ten to twenty-five to forty years, filter into or are reinvented by other fields (whenever I said this, I clarified that this story was a caricature).

So what is the difference between an empirical, data-centered economist and an applied statistician? The stereotypes I've internalized from hanging out in an East Coast statistics department are that economists tend to focus more on parameter estimation, asymptotics, unbiasedness, and paper-and-pencil solutions to problems (which can then be implemented via canned software like Stata), whereas applied statisticians lean more towards imputation and predictive inference, Bayesian thinking, and computational solutions to problems (which require programming in packages such as R). Anyone care to disabuse me of these notions?

Posted by James Greiner at 12:07 PM

April 18, 2007

Appellate Cases and SUTVA Violations

Around a month ago, I blogged about the dangers of using appellate case outcomes as datapoints. The basic idea is that most models or inference structures assume some kind of independence among the units, perhaps independence given covariates (in which case the residuals are assumed to be i.i.d.), or perhaps the "Stable Unit Treatment Value Assumption" in the causal inference context. When applied to appellate cases in the United States legal system, these analyses assume away precedent. The instincts I developed as a practicing litigator tell me not to believe a study that assumes away precedent.

One solution to this problem previously proposed in the causal inference literature is to match "treated" and "control" appellate cases that are very close in time to each other (whatever "treated" and "control" are here). After a conversation I had with Mike Kellermann a week or so ago, I think this cure may be worse than the disease. The idea behind comparing cases very close in time to one another is that the general state of the law (in part defined by precedent) for the two cases will be similar. That's right, but recent developments in the law are more on the minds of judges.

Suppose Case A got treatment, and Case B got control. If the matching algorithm has worked, Case A and Case B will be similar in all ways except the treatment. If Case A and Case B are also close in time to one another, how plausible is it that the judges who decide both will decide them without regard to each other?

Posted by James Greiner at 4:48 PM

April 4, 2007

Trial-Level Criminal Outcomes

With a coauthor, I am involved in a project which in part attempts to assess the effect of assigning judge A versus judge B to outcomes at the trial level in criminal cases. I've begun a literature search on this, and it seems like most attention thus far has focused on the sentencing stage (particularly relating to the controversy over the federal sentencing guidelines), and that few authors have used what one might call modern or cutting edge causal inference thinking. Can anyone out there help here? Am I missing important studies?

(Feel free to email me off-blog if you'd prefer.)

Posted by James Greiner at 3:24 PM

March 20, 2007

Judicial Decisions as Data Points

Empirical, particularly quantitative empirical, scholarship is all the rage these days in law schools. (By the way, as a quantitative legal empiricist, that makes me really nervous. If there's one constant in legal academia, it's that things go in and out of style as fast in law schools as they do in Milan fashion shows.)

One thing that has been bothering me lately about this next phase, new wave, dance craze aspect of legal scholarship is the use of appellate cases as datapoints. It's tempting to think that one can code appellate decisions or judicial opinions pursuant to some neutral criteria, then look for trends, tease out inferences of causation, etc. Here's a note of caution: they're not i.i.d. They're probably not i.i.d. given X (whatever X is). Precedent matters. In our legal system, the fact that a previous appellate case (with a published opinion) was decided a certain way is a reason to decide a subsequent, facially similar appellate case the same way, even if the first decision might have been (arguably) wrong. Folks will argue over how much precedent matters; all I can say is that as a law clerk to an appellate judge, I participated in numerous conversations that resulted in the sentiment, "I might/would have decided the present case differently had Smith v. Jones not been on the books, but I see no grounds for departing from the reasoning of Smith v. Jones here." I.i.d. models, or analyses that assume non-interference among units, should be viewed with great caution in this setting.

Posted by James Greiner at 4:40 PM

March 7, 2007

More on Cheating

In my last post, I solicited comments on ways to cheat when using a design-before-analysis framework for analyzing observational studies. My claim was that if one does the hard work of distinguishing intermediate outcomes from covariates (followed usually by discarding the former) and of balancing the covariates (often done by discarding non-comparable observations) without access to the outcome variable, it should be hard(er) to cheat. Felix suggested one way that should work but that should also be fairly easy to spot: temporarily substitute in a "good" (meaning highly predictive of the outcome variable) covariate as the outcome and find a design that achieves the desired result, then use this design with the "real" outcome. In a comment, Mike suggested another way: do honest observational studies, but don't tell anyone about those that don't come to desired results.

Here's my thought: in many observational settings, we have a strong prior that there is either an effect in a particular direction or no effect at all. In an anti-discrimination lawsuit, for example, the issue is whether the plaintiff class is suffering from discrimination. There is usually little chance (or worry) that the plaintiff class is in fact benefiting from discrimination. Thus, the key issue is whether the estimated causal effect is statistically (and practically/legally) significant. With that in mind, it seems like a researcher might be able to manipulate the distance metric essential to any balancing process. When balancing, we have to define (a) a usually one-dimensional distance metric to decide how close observations are to one another, and (b) a cutoff point beyond which we say observations are too far from one another to risk inference, in which case we discard the offending observations. If one side of a debate (e.g., the defendant) has an interest in results that are not statistically significant, that side can insist on distance metrics and cutoff points that result in discarding (as too far away from their peers) a great many observations. A smaller number of observations generally means less precision and a lower likelihood of a significant result. The other side can, of course, do the opposite.
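To see how much a cutoff alone can move a significance test, here is a rough sketch in Python. Everything in it is hypothetical: the matched-pair distances, the cutoffs, the common standard deviation (0.5), and the fixed effect estimate (0.1) are invented, and the standard error formula treats the retained treated and control groups as independent samples of equal size. A tighter cutoff can also legitimately improve balance, which this toy calculation ignores.

```python
import math

def retained_pairs(distances, cutoff):
    """Keep only matched pairs whose distance is within the cutoff."""
    return [d for d in distances if d <= cutoff]

def se_diff_means(n_pairs, sigma):
    """Standard error of a difference in means, assuming two independent
    groups of size n_pairs with a common standard deviation sigma."""
    return sigma * math.sqrt(2.0 / n_pairs)

# Hypothetical distances between each treated unit and its matched control.
distances = list(range(1, 201))
effect_estimate = 0.1   # held fixed purely for illustration
sigma = 0.5             # assumed common standard deviation

for cutoff in (200, 50):
    n = len(retained_pairs(distances, cutoff))
    se = se_diff_means(n, sigma)
    print(f"cutoff {cutoff}: {n} pairs kept, SE {se:.3f}, z {effect_estimate / se:.2f}")
```

With these made-up numbers, the same estimated effect clears the conventional 1.96 threshold with the loose cutoff and falls well short of it with the strict one.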

I still think we're way better off in this world than in the model-snooping of regression. What do people think?

Posted by James Greiner at 4:53 PM

February 21, 2007

How do I cheat with potential outcomes?

As some folks know, I'm on the legal academic job market this year. My job talk paper is on the application of the potential outcomes framework for causation to legal matters, particularly anti-discrimination issues that arise in litigation. As I've presented the framework, I've highlighted one of its advantages as being the fact that much of the hard work of separating covariates from intermediate outcomes and balancing covariates can (and should) be done without access to the outcome variable. The idea is that without access to the outcome variable, it is harder for a researcher (or, God forbid, an expert witness) to model-snoop, i.e., to fit model after model until finding one that "proves" a pet theory.

In a few schools, reaction to the claim of increased objectivity has been chilly. Skeptics have said, in essence, "I don't know enough about statistics to argue with you, but I'm REALLY SURE that your method is just as manipulable as, say, regression, even if you don't have access to the outcome variable when you do the hard work." The skeptics have then asked me to tell them how to manipulate the potential outcomes framework (i.e., to tell them why they are right and I am wrong), assuming no access to the outcome variable.

Any ideas on this? I'm able to think of one way it can be done (although the results of "my" way would not be nearly as bad as those from model-snooping), but I'd prefer not to stifle any comments folks might have by putting forth my own thoughts.

Posted by James Greiner at 3:33 PM

February 7, 2007

Timing Is Everything

Jim Greiner

Per previous blog posts, I'm giving today's presentation at CGIS on causal inference and immutable characteristics. I've previewed some of the ideas from this research in blog posts. Basically, the idea is that if we shift our thinking from "actual" immutable characteristics (e.g., race), a concept I find poorly defined in some situations, to perceived immutable characteristics, then the potential outcomes framework of causation can sometimes be usefully applied to things like race, gender, and ethnicity.

A key point here is the timing of treatment assignment. If treatment is conceptualized in terms of perceptions, then a natural point at which to consider treatment applied is the moment the decision maker whose conduct is being studied first perceives a unit's race, gender, ethnicity, whatever. This works well only if we're willing to exonerate the decision maker from responsibility for whatever happened before that moment of first perception. In the law, sometimes we're willing to do so. Sometimes, we're not.

Take the employment discrimination context. Typically, we don't hold an employer responsible for the discrimination of someone else, particularly when it occurred (say) prior to a job application, even if that prior discrimination means that some groups (e.g., minorities) have less attractive covariates (e.g., educational achievement levels) than others (e.g., whites). Perhaps potential outcomes could work here; a study of the employer's hiring can safely condition on educational achievement levels (i.e., take them as given, balance on them, etc.) and other covariates. More covariates means that the ignorability assumption required for most causal inference is more plausible.

Contrast the employment discrimination setting to certain standards applying to education institutions. For example, we may not want to allow a university to justify allocating fewer resources to female sports teams on the grounds that its female students show less interest in sports (even if we believed the university to be telling the truth). Here, we might consider that the preferences of the female students were probably shaped by prior stereotyping, and we might want to force the university to take steps to combat those stereotypes and change the female students' preferences. If so, we are unwilling to take the previous social pressure as "given," so we cannot balance on it. The result is fewer covariates and greater pressure on the ignorability assumption.

My thanks to Professor Roderick Hills of NYU law school, whose insightful question during a job talk I recently gave there helped solidify the above Title IX example.

Posted by James Greiner at 4:00 PM

January 24, 2007

The Goal of Causal Inference

Jim Greiner

I’ll be giving the talk at the Gov 3009 seminar in early February, and I’ll be presenting a paper I’m writing with Don Rubin on applying the potential outcomes framework of causation to what lawyers call “immutable characteristics” (race, gender, and national origin, for example). I’ll be previewing some of the ideas from this paper on the blog.

One key point from this paper is the recognition that in law (specifically, in an anti-discrimination setting), the goal of causal inference may be different from that in a more traditional social science setting. A sociologist, for example, might study the effect of tax breaks for married couples on marriage rates; the obvious goal of the study is to see whether a contemplated intervention (tax breaks) has a desired effect. An economist might evaluate a job training program for a similar reason. In anti-discrimination law, however, we study the effect of units’ perceived races (or genders or whatever) on some outcome (e.g., hiring or promotion), but we have no interest in intervening to change these perceptions. Rather, we’re contemplating action that would mitigate the effects we find. The “intervention” we’re considering might be compensating the victim of discrimination, as is true in an employment discrimination suit. Or it might be ceasing a certain type of government action, such as the death penalty. But we’re not interested in implementing a policy promoting or effectuating the treatment that we’re studying.

Posted by James Greiner at 1:14 PM

November 30, 2006

Remembering the Baldus Study, Part II

Jim Greiner

In a previous post, I summarized the Baldus Study of the role of race in Georgia’s system of capital sentencing during the 1970s. To review, the Study concluded that the race of the victim, but not that of the defendant, was an important factor in determining whether a capital defendant received the death penalty. The Study was a pioneering effort to apply what were then cutting-edge statistical techniques (logistic regression) to questions of race discrimination, and it came within a single justice of rendering Georgia’s capital sentencing system constitutionally invalid.

As part of my dissertation research, which focuses on applying a potential outcomes understanding of causation to perceptions of immutable characteristics, I am reexamining the Baldus Study data. With the benefit of 25+ years of hindsight, I have reluctantly concluded that the Study’s findings are questionable (which is different from wrong). The Study authors collected no data on cases resulting in acquittals or convictions of crimes of severity less than voluntary manslaughter, or indeed on cases that were initially charged as murders but in which charges were reduced prior to trial. The sampling scheme, a complicated one involving some stratification on the outcome variable (imposition of the death penalty), renders calculation of standard errors difficult, and the method the Study authors used to address this problem depends on asymptotics despite (in some cases) a small number of units.

Which leads me to my big questions. Assume for the moment that I’m right about the idea that modern thinking suggests that the conclusions of the Baldus Study are questionable. What if the Supreme Court had accepted the Study and struck down Georgia’s capital sentencing system? Would we now think such a decision was based on questionable science? Or should courts accept the best statistical evidence available at the time, even if later researchers believe it questionable, because there are also costs to inaction? (After all, the most I can “prove” is that Baldus et al. did not prove their case, not that their conclusions were wrong.) And what will happen to my own “conclusions” in 25 years?

Posted by James Greiner at 1:53 PM

November 29, 2006

Remembering the Baldus Study, Part I

Jim Greiner

One of my current research interests is the application of a potential outcomes framework of causation to perceptions of what lawyers call “immutable characteristics” like race, gender, or national origin. In that vein, I’d like to pay tribute to one of the early greats in the area of quantitative analysis of race in the legal setting: the so-called “Baldus Study” of the role of race in imposition of the death penalty in Georgia. The Study authors, David Baldus, George C. Woodworth, and Charles A. Pulaski, Jr., gathered data on over 1000 Georgia homicides from 1973-1979. Although the Study attempted to tackle a variety of questions, the most publicized was whether recent reforms to Georgia’s sentencing process (enacted in response to the Supreme Court’s decision in Furman v. Georgia) had succeeded in removing the relevance of race in the state’s capital sentencing system. The Study’s primary conclusion on this point was that the race of the victim, but not the race of the defendant, played a significant role in deciding whether death was imposed.

The Study was highly publicized, and it led to its own Supreme Court case. In McCleskey v. Kemp, four justices thought that the conclusions of the Baldus Study were sufficient to render Georgia’s capital sentencing system unconstitutional. Five justices disagreed; they thought that the capital defendant in the case had to show that race had played a role in HIS trial, not that race generally played a role in the set of capital trials.

More on the Baldus Study in my next post.

Posted by James Greiner at 1:53 PM

November 15, 2006

Gender as a Personal Choice

Jim Greiner

Greetings from the job market for legal academics, which combines the worst aspects of the job markets of all other fields. Apologies for being slow to bring this up, but an article in last week’s New York Times (Tuesday, November 7, 2006, page A1, by Damien Cave) is worth a look. The subject area is recording gender in New York City records. The City’s Board of Health is considering a proposal to allow persons born in the City to change the sex as documented on their birth certificates upon providing certain documentation (e.g., affidavits from doctors and mental health professionals) asserting that the proposed gender change would be permanent. Previously, the City required more physical manifestations of a sex change before it would change its records.

Question: are we moving toward a world in which sex, like race, becomes a personal choice, at least as recorded in official records? Note that in the race context, the law can’t seem to make up its mind on this. The Census Bureau records self-reports only, and many modern social scientists consider race a social construct only, with no relevant biological component. But some existing statutes still define race in terms of biology (e.g., 18 U.S.C. § 1093(6)).

Second question: suppose we are moving toward such a world; what will it do to our efforts to enforce anti-discrimination laws?

Posted by James Greiner at 1:51 PM

November 1, 2006

An Individual-Level Story and Ecological Inference

Jim Greiner

I blogged some last year (see here) on whether an individual-level story is necessary, or useful, to ecological inference. For a review of what ecological inference is, and what I mean by an individual-level story, see the end of this entry. Last year, I stated that such a story was helpful in explaining an ecological inference technique, even if it might not be strictly necessary for modeling. Gary disagreed that such a story was at all helpful, and we had a little debate on the subject, which you can access here. Lately, though, I’ve been thinking that an individual-level story really is necessary for good modeling, not just for communication of a model. In particular, it seems like an individual-level model is required to incorporate survey information into an ecological inference model. Survey data is, after all, data collected at the level of the individual, and with only an aggregate-level model, it’s hard to see how one could incorporate it. Any thoughts from anyone out there?

To review: ecological inference is the effort to predict the values of the internal cells of contingency tables (usually assumed to be exchangeable) when only the margins are observed. A classic example is in voting, where one observes how many (say) black, white, and Hispanic potential voters there are in each precinct, and one also observes how many votes were cast for Democratic and Republican candidates. What one wants to know is, say, how many blacks voted Democrat. By an individual-level story, I mean a model of voting behavior at the level of the individual voter and a mathematical theory of how to aggregate up to the precinct-level counts.
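For readers who want to see why the margins alone leave the internal cells underdetermined, here is a small Python sketch of the deterministic bounds the margins imply for one precinct (the classic "method of bounds"). It is an illustration of the problem, not any of the models discussed in these posts. The precinct counts are invented, and the sketch assumes every potential voter cast a vote for one of the two parties, which real data would not satisfy.

```python
def cell_bounds(group_size, total_voters, votes_for_party):
    """Smallest and largest number of a group's members who could have
    voted for the party, given only the precinct margins."""
    others = total_voters - group_size
    lower = max(0, votes_for_party - others)   # votes the other groups cannot absorb
    upper = min(group_size, votes_for_party)   # cannot exceed group size or party total
    return lower, upper

# Hypothetical precinct: row margins (group sizes) and one column margin.
precinct = {"black": 300, "white": 500, "hispanic": 200}
dem_votes = 800
total = sum(precinct.values())

for group, size in precinct.items():
    low, high = cell_bounds(size, total, dem_votes)
    print(f"{group}: between {low} and {high} Democratic votes")
```

Anything between the bounds is consistent with the observed margins; narrowing that range is where a model, and perhaps an individual-level story, has to do the work.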

Posted by James Greiner at 12:00 PM

October 25, 2006

Unconscious Bias & Expert Witnesses

Jim Greiner

Quantitative expert witnesses are essential to modern litigation. But why do they disagree so often?

An excerpt from an article by Professor Franklin Fisher appears below. It’s a tad long, but it’s really worth reading. Does it ring a familiar bell with anyone out there?

“It is not, however, always easy to avoid becoming a ‘hired gun’ . . . The danger is sometimes a subtle one, stemming from a growing involvement in the case and friendship with the attorneys. For the serious professional, concerned about preserving his or her standards, the problem is not that one is always being asked to step across a well-defined line by unscrupulous lawyers. Rather, it is that one becomes caught up in the adversary proceeding itself and acquires the desire to win. . . . Particularly because lawyers play by rules that go beyond those of academic fair play, it becomes insidiously easy to see only the apparent unfairness of the other side while overlooking that of one’s own.”

Franklin M. Fisher, Statisticians, Econometricians, and Adversary Proceedings, 81 J. AM. STAT. ASS’N. 277, 285 (1986)

Posted by James Greiner at 12:00 PM

October 19, 2006

Simpson’s Paradox

Jim Greiner

As a lawyer, I have to be interested not just in what quantitative principles are true, but also in how to present “truth” to people without quantitative training. To that end, HELP! One of the maddening things about statistics is Simpson’s paradox. The quantitative concept, undoubtedly well-known to most readers of this blog, is that the correlation between two variables can change sign and magnitude, depending on what is conditioned on. That is, Corr(A, B | C) might be positive, while Corr(A, B | C, D) might be negative, while Corr(A, B | C, D, E) might be positive again. At bottom, this is what’s going on when regression coefficients become (or cease to be) significant as one adds additional variables to the right-hand side. Because regression currently enjoys a stranglehold on expert witness analyses in court cases (I’ll be ranting on that in the future), communicating Simpson’s paradox is a matter of real concern for someone like me who cares about what juries see, hear, and think. Any ideas on how to get this concept across?
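One way to make the sign flip tangible is a tiny numerical example. The Python sketch below uses made-up (A, B) pairs split by a single conditioning variable C; it shows only the simplest case, Corr(A, B) versus Corr(A, B | C), and the data and helper names are invented for illustration.

```python
import math

def corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def corr_of_pairs(pairs):
    """Correlation between the first and second coordinates of (A, B) pairs."""
    a, b = zip(*pairs)
    return corr(a, b)

# Made-up (A, B) observations, split by a third variable C.
group_c0 = [(1, 10), (2, 11), (3, 12)]
group_c1 = [(6, 1), (7, 2), (8, 3)]

pooled = corr_of_pairs(group_c0 + group_c1)
within_c0 = corr_of_pairs(group_c0)
within_c1 = corr_of_pairs(group_c1)

print(f"ignoring C: {pooled:.2f}")
print(f"within C=0: {within_c0:.2f}, within C=1: {within_c1:.2f}")
```

Within each level of C, A and B rise together; pooled across C, the correlation is negative, because the group with high A values happens to have low B values.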

Posted by James Greiner at 11:13 AM

October 6, 2006

Causation and Manipulation, V

Jim Greiner

Fair warning: This entry includes a plug for one of my papers

Anti-discrimination laws require lawyers to figure out the causal effect of race (gender, ethnicity) on certain decision making. Previous posts have been exploring the often-tossed-around idea of considering the treatment to be perceived race, as opposed to "actual" (whatever that means) or self-identified race, to answer the no-causation-without-manipulation objection. This feels like a good idea, but it really only works in some cases and not others. It works when we can identify a specific actor (or an institution) whose behavior we want to study. Capital sentencing juries and a defendant firm in an employment discrimination lawsuit are two that work. We can think about changing these specific actors' perceptions of particular units (capital defendants, potential employees), and we can think about WHEN it makes sense to think of treatment (the perception) as being applied: at the moment the actor first perceives the unit's race (or gender or whatever). In contrast, "the public" or "the set of all employers in the United States" are two examples of actors that don't work. The timing of treatment assignment no longer makes sense, the counterfactuals are too hard to imagine, and the usual non-interference-among-units assumption becomes hard to think about.

What does all this buy us? A fair amount. First, this line of thinking identifies cases in which rigorous causal inference based on the potential outcomes framework remains beyond our reach. Figuring out the causal effect of gender on salaries nationwide is one example; another is the causal effect of candidate race on election outcomes. Second, in those cases in which we can identify a specific actor, we get a coherent conceptualization of the timing of treatment assignment, which allows us to distinguish pre- from post-treatment variables. This is a big deal. Entire lawsuits sometimes turn on it.

All this has important implications for civil rights litigation, as I discuss in my paper, "Causal Inference in Civil Rights Litigation." You can get a draft (pdf) of this paper from my website, which you can access by clicking on my name to the left. I'd appreciate any reader reactions/suggestions.

Posted by James Greiner at 10:19 PM

September 28, 2006

Causation and Manipulation

Jim Greiner

In a 1986 JASA article, Paul Holland reported that he and Don Rubin had once made up the motto, “NO CAUSATION WITHOUT MANIPULATION.” The idea is that even in an observational study, causal inference cannot proceed unless and until the quantitative analyst identifies an intervention that hypothetically could be implemented (although Professor Holland accepts the idea that the manipulation may never be carried out for physical or ethical reasons). The idea of studying the causal effect of things that we as human beings could never influence is incoherent because such things could never be the subject of a randomized experiment.

My question: do we really adhere to this principle? Take the one causal link established via observational studies that pretty much everyone (even Professor Freedman, see below) agrees on: smoking causes lung cancer. Has anyone ever bothered to imagine what manipulation to make people smoke is contemplated? Aren’t we pretty sure it wouldn’t matter how we intervened, i.e., however it happens that people smoke, those who smoke get lung cancer at a higher rate? (It might matter what they smoke, how much they smoke, perhaps even where and when, but what got them started and what keeps them at it?) If folks agree with me on this, what’s left of Professor Holland’s maxim?

Paul W. Holland, Statistics and Causal Inference, 81 J. Am. Stat. Ass’n 945, 959 (1986)

David Freedman, From Association to Causation: Some Remarks on the History of Statistics, 14 Stat. Sci. 243, 253 (1999)

Posted by James Greiner at 11:00 PM

May 23, 2006

Inheritance Laws

Jason Anastasopoulos, guest blogger

Question: Many political philosophers who focused on questions of property (including Plato) believed that equality of conditions was necessary for the development of a virtuous citizenry and virtuous leaders. The key to creating this equality of conditions, they argued, was the implementation of strict inheritance laws limiting the transfer of wealth from one generation to the next. Does anyone know of any quantitative models or empirical studies that examine the interaction between social stratification and inheritance laws? If you do, email me at janastasopoulo@law.harvard.edu.

Posted by James Greiner at 6:00 AM

May 16, 2006

Communication, Anyone?

Jim Greiner

The course I co-taught this semester on Quantitative Social Science & Law has come to an end. There were a lot of “lessons learned” in the class, both for the students (at least, I hope so) and for the teaching staff (more definitely). Of all of these lessons, one sticks in my head: we ought to focus on teaching quantitative students how to communicate with folks without formal statistical training.

Some quantitative folks will graduate and spend the rest of their lives talking to and working with only quantitative people. Some, but not many. Most of us will be talking and working with people who have few or no statistics classes under their belts. But do we ever teach the communication skills needed to function effectively with the proles? I’ve never seen or heard of a class that focuses on these skills. Not one. Does that strike anyone besides me as odd?

Posted by James Greiner at 6:00 AM

May 8, 2006

Coarsened at Random

Jim Greiner

I’m the “teaching fellow” (the “teaching assistant” everywhere but Harvard, which has to have its lovely little quirks: “Spring” semester beginning in February, anyone?) for a course in missing data this semester, and in a recent lecture, an interesting concept came up: coarsened at random.

Suppose you have a dataset in which you know or suspect that some of your data values are rounded. For example, ages of youngsters might be given to the nearest year or half-year. Or perhaps in a survey, you’ve gotten some respondents’ incomes only within certain ranges. Then the data has been “coarsened” in the sense that you know that the true value is within a certain range, but you don’t know where within that range.

Happily, techniques have been developed to handle this sort of situation. In many ways, the game is the same as that in the missing data setting. Just as in the missing data context good things happen when the data are missing at random, so also in this context good things happen when the data are coarsened at random. Thus, to begin with, you have to consider (among other things) whether you think the probability that you will observe only a range of possible data values, as opposed to the specific true value, depends on something you don’t observe (such as that specific true value). A good place to start on all this is Heitjan & Rubin, “Inference from Coarse Data via Multiple Imputation with Application to Age Heaping,” 85 JASA 410 (1990).
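As a rough illustration of what "good things happen" can mean in practice: if the coarsening is at random, one can ignore the coarsening mechanism and let each coarsened value contribute the probability of its observed range to the likelihood, while exactly observed values contribute the usual density. The Python sketch below does this for a normal model. The ages, the intervals, and the choice of a normal model are all assumptions made up for the example; this is not the multiple imputation approach of the Heitjan & Rubin paper.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a Normal(mu, sigma^2) distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of Normal(mu, sigma^2) at x."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def log_likelihood(mu, sigma, exact, intervals):
    """Log-likelihood that ignores the coarsening mechanism (valid under
    coarsening at random): densities for exact values, interval
    probabilities for coarsened ones."""
    ll = sum(math.log(normal_pdf(x, mu, sigma)) for x in exact)
    ll += sum(
        math.log(normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma))
        for lo, hi in intervals
    )
    return ll

# Hypothetical ages: three recorded exactly, two rounded to the nearest year.
exact_ages = [4.2, 5.1, 6.3]
rounded_ages = [(4.5, 5.5), (5.5, 6.5)]   # reported as "5" and "6"

for mu in (5.4, 10.0):
    print(f"mu = {mu}: log-likelihood {log_likelihood(mu, 1.0, exact_ages, rounded_ages):.2f}")
```

If the chance of rounding depended on the unobserved true age itself, this likelihood would no longer be the right one, which is the question the paragraph above asks you to consider first.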

One final point: you might think that coarsened at random is a specific case of missing at random. Actually, it’s the other way around. Data can be (and often is assumed to be) coarsened at random but not missing at random. Think and you’ll see why.

Posted by James Greiner at 6:00 AM

May 2, 2006

The 80% Rule, Part II

Jim Greiner

In my last post, I introduced the so-called 80% rule in employment discrimination cases. In this post, I discuss some of the reasons why it stinks. For the sake of illustration, pretend I’m interested in knowing whether a company discriminates against women in hiring, and recall that the 80% rule says that I should see whether the hiring rate for women is less than 80% of the hiring rate for men.

The first issue with the 80% rule is that it means different things depending on the hiring rate for men. Suppose 90% of men who apply for a job are hired. 80% of 90% is 72%, so the difference between men and women is 18 percentage points; that might seem like something worth investigating. But suppose the company at issue is very exclusive, so it only hires 5% of men who apply; 80% of 5% is 4%. Is this 1-point difference something to worry about? Perhaps it is, perhaps it isn’t, but it sure is different from the 18-point difference in the previous example.
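The arithmetic in the two hypotheticals can be checked with a few lines. This is only a sketch; the function name is made up for illustration.

```python
def eighty_percent_threshold(men_rate, ratio=0.8):
    """Hiring rate for women below which the 80% rule flags a disparity."""
    return ratio * men_rate

for men_rate in (0.90, 0.05):
    threshold = eighty_percent_threshold(men_rate)
    gap_in_points = 100 * (men_rate - threshold)
    print(f"men hired: {men_rate:.0%}  threshold for women: {threshold:.1%}  "
          f"gap: {gap_in_points:.1f} points")
```

The same ratio corresponds to very different absolute gaps, which is the whole complaint.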

A second issue with the 80% rule is that it varies depending on whether we’re talking about success rates or failure rates ("success" means getting hired here, "failure" means not getting hired). In one of my hypotheticals above, a company hired 90% of the men who applied. So the success rate is 90%, and the failure rate is 10%. If we apply the 80% rule to the success rate, we should worry if the hiring rate for women is below 72%. But what happens if we apply the reasoning of the rule to the failure rate for men? By analogy to the 80% rule’s reasoning, it seems like we should worry if the failure rate for women is greater than, say, 120% (100% + 20%), or perhaps 125% (1/.8 = 1.25), of the failure rate for men. Take the 125% for the sake of argument, and return to our hypothetical in which the failure rate for men was 10%. 125% of 10% is 12.5%, so we should worry if the failure rate for women is greater than 12.5%. But a failure rate for women of greater than 12.5% corresponds to a success rate for women of less than 87.5%, and we just said that we’re supposed to worry if the success rate was less than 72%. So which is it, 87.5% or 72%?
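Here is the same inconsistency as a short sketch, using the 125% (that is, 1/0.8) version of the failure-rate analogy. The function names are hypothetical.

```python
def success_threshold(men_success, ratio=0.8):
    """Worry if the women's success rate falls below this (rule applied to successes)."""
    return ratio * men_success

def success_threshold_from_failures(men_success, ratio=0.8):
    """Worry if the women's success rate falls below this (rule applied to failures)."""
    men_failure = 1.0 - men_success
    women_failure_limit = men_failure / ratio  # 1/0.8 = 125% of the men's failure rate
    return 1.0 - women_failure_limit

men_success = 0.90
print(success_threshold(men_success))                # success-rate framing
print(success_threshold_from_failures(men_success))  # failure-rate framing
```

Same data, same "rule," two different lines in the sand.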

A final criticism (for the purposes of this post; I could go on and on here): is any of this significant in the statistical sense? P-values, anyone? Significance tests? Posterior intervals? Anything at all?
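As one illustration of what a significance calculation might look like, here is a sketch of a standard pooled two-proportion z-test. The applicant counts are invented for illustration; both hypothetical companies sit exactly at the 80% line (90% vs. 72%, and 5% vs. 4%).

```python
import math

def two_proportion_z_test(hired_a, applied_a, hired_b, applied_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    rate_a = hired_a / applied_a
    rate_b = hired_b / applied_b
    pooled = (hired_a + hired_b) / (applied_a + applied_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / applied_a + 1 / applied_b))
    z = (rate_a - rate_b) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Hypothetical company 1: 45 of 50 men hired (90%), 36 of 50 women hired (72%).
z1, p1 = two_proportion_z_test(45, 50, 36, 50)
# Hypothetical company 2: 5 of 100 men hired (5%), 4 of 100 women hired (4%).
z2, p2 = two_proportion_z_test(5, 100, 4, 100)
print(round(p1, 3), round(p2, 3))
```

With these made-up counts, the two companies get the same answer from the 80% rule and very different answers from the test. A normal approximation is itself questionable with counts as small as 4 and 5, so treat this as a sketch rather than a recommended analysis.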

Next time you hear someone applying the 80% rule in an employment discrimination case, invite the speaker to join us on this planet.

Posted by James Greiner at 6:00 AM

April 24, 2006

The 80% Rule, Part I

Jim Greiner

I’ve blogged previously about the course in statistics and law I’m co-teaching this semester (see, for example, here). The course is now in its second simulation, which deals with employment discrimination. In a recent class, the 80% rule came up. I wish it hadn’t. In fact, I wish the “rule” had never seen the light of day. In this post, I’ll explain what the 80% rule is. In a subsequent post, I’ll explain why it stinks.

Suppose we’re interested in figuring out whether members of a protected class (say, women) are being hired, promoted, fired, disciplined, whatever at a different rate from a comparison group (say, men, and for the sake of discussion, let’s say we’re interested in hiring). Long ago, the Equal Employment Opportunity Commission (“EEOC”) released a statement saying that it would ordinarily regard as suspect a situation in which the hiring rate for women was less than 80% of the hiring rate for men. Note that the EEOC has the authority to bring suit in the name of the United States against a defendant that has violated federal employment discrimination laws.

It would be bad enough if the EEOC used the 80% rule for the purpose it gave, i.e., a statement about how the agency would exercise its investigative and prosecutorial discretion. Alas, courts, perhaps desperate for guidance on quantitative principles, have picked up on the idea, and some now use it as an indicator of which disparities are legally significant. Courts do so despite the outcry of those in the quantitative community interested in such things. More on that outcry in my next post.

Posted by James Greiner at 6:00 AM

April 18, 2006

Censoring or Truncation Due to "Death": What’s the Question? (Part II)

Jim Greiner

In my last post, I pointed out that when presented with a causal inference situation of treatment, intermediate outcome, and final outcome, we have to be careful to define a sharp question of interest. Sometimes, we’re interested in the ITT, or the effect of the treatment on the final outcome. At other times, we’re interested in the effect of the intermediate outcome on the final outcome, and the treatment is our best way of manipulating the intermediate outcome so as to draw causal inferences.

In my view, these principles are important in the legal context. Take race in capital sentencing, for example.

To begin, it’s a big step to draw causal inferences about race in a potential outcomes framework; the maxim "no causation without manipulation" (due, I believe, to Paul Holland) explains why. I believe that step can be taken, but that’s another subject. Suppose we take it, i.e., we decide to apply a potential outcomes framework to an immutable characteristic. The treatment (applied to the capital defendant) is being African-American, the intermediate outcome is whether the defendant is convicted, and the final outcome is whether a convicted defendant is sentenced to die. (Note that, in an instance of fairly macabre irony, if one applies the language of censoring or truncation due to death here, "death" is an acquittal on the capital charge.)

What causal question do we care about? If all we want to study is the relationship between race and the death penalty, then we don’t care whether a defendant avoids a death sentence via acquittal or avoids a death sentence after a conviction by being sentenced to life. If, on the other hand, what we want to study is fairness in sentencing proceedings, then we need principal stratification; we need to isolate a set of defendants who would be convicted of the capital charge if African-American and convicted of the capital charge if not African-American. Both are potentially interesting causal questions. Let’s just make sure we know which we’re asking.

Posted by James Greiner at 6:00 AM

April 12, 2006

Censoring or Truncation Due to “Death”: What’s the Question?

Jim Greiner

A few weeks ago, Felix Elwert gave a bang-up presentation at the Wednesday seminar series on the effect of cohabitation on divorce rates (see here). One of the most interesting points I took away from the discussion was the following: in some social science situations in which a treatment is followed by an intermediate outcome, then by a final outcome, we might be interested in different causal questions. One causal question is the effect of the treatment on the final outcome; this is commonly called the intention-to-treat effect (ITT). The name comes from, I believe, an encouragement design context; the treatment is an encouragement to, say, get a vaccine, the intermediate outcome is whether a test subject gets a vaccine, the final outcome is whether the test subject gets a disease, and the ITT is the effect of encouragement on disease rates.

A second causal question different from the ITT is the effect of the intermediate outcome on the final outcome; in the vaccine example above, the question here would be the effect of the vaccine on disease rates.

Felix’s point was that if we think of cohabitation as the treatment, marriage as the intermediate outcome, and divorce as the final outcome, there are different causal questions we might want to ask. Those of us steeped in a principal stratification and a truncation due to “death” way of looking at things might jump to the conclusion that the idea of divorce makes no sense for people who don’t get married. Thus, the only “right” way to look at this situation, we might say, is to isolate the set of people who would get married regardless of cohabitation (the treatment). Not so. If what we’re really interested in is avoiding divorce per se (maybe because divorce is stigmatizing, more stigmatizing than not ever having been married), then perhaps we don’t care whether people avoid divorce by not getting married or avoid divorce by getting married and staying that way. In that case, what we’re after is the ITT. If, however, what we want is stable marriages, then we need to do the principal stratification and truncation due to death bit.
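In potential outcomes notation, the two questions might be written as follows. This is a sketch; the symbols Z, M, and Y are introduced here for illustration and do not come from Felix's talk.

```latex
% Z = 1 if the couple cohabits, 0 otherwise (the treatment)
% M(z) = 1 if the couple would marry under treatment z (the intermediate outcome)
% Y(z) = 1 if the couple would divorce under treatment z (the final outcome)

% Intention-to-treat effect: everyone counts, and a couple that never
% marries is simply recorded as "no divorce", i.e. Y(z) = 0.
\mathrm{ITT} = E\left[\, Y(1) - Y(0) \,\right]

% Principal-stratum effect: restrict attention to couples who would
% marry whether or not they cohabit.
\tau_{\text{always-marry}} = E\left[\, Y(1) - Y(0) \mid M(0) = M(1) = 1 \,\right]
```

The first quantity answers the "avoiding divorce per se" question; the second answers the "stable marriages" question.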

I think Felix’s insight has some applicability to the legal context. More on that in a subsequent post.

Posted by James Greiner at 6:00 AM

April 4, 2006

Academic Ego

Jim Greiner

In a previous post, I brought up the subject of how we quantitative analysts can abuse the trust decision makers (judges, government officials, members of the public) put in us, when they are inclined to trust us at all. Decision makers should be able to depend on us to give them not just a (clearly and understandably stated) summary of inferences we believe are plausible, but also a (clearly and understandably stated) statement of the weak points of those inferences. “No kidding,” you might say. OK. If it’s that obvious, how come none of us is able to do it?

Here’s an exercise, again, something that’s come out of my experience in teaching a class on statistical expert witnesses in litigation. Next time you think you’ve “got it,” that you’ve done the right thing with a dataset and have drawn some solid inferences, step back and ask: “Suppose I was paid $____/hour to convince people that the work I’ve just done is not worthy of credence. What would I say?” If all you can come up with are criticisms that make you laugh (because they’re so silly) or ideas that you can dismiss as unscrupulous babbling motivated by a desire for fees, then you might suffer from a mutilating and disfiguring disease: AE.

In the litigation and expert witnesses class, we’re giving students datasets and assigning them positions (plaintiffs or defendants). One of the refreshing things about this exercise has been that it is forcing the student-experts to think about where attacks on their reports will come from. Perhaps even more importantly, because the sources of those attacks are their friends and peers (i.e., people they respect), students begin to remember something they knew before the academic environment tried to make them forget it: there are weaknesses in everything they do.

I don’t know if all academics suffer from AE. Perhaps I’ve been unlucky in meeting a great many who suffer from especially severe cases. Who knows? Perhaps I’m a carrier myself? (Nah . . .)

Posted by James Greiner at 6:00 AM

March 20, 2006

Making Diagnostics Mandatory

Jim Greiner

Teaching a class (see here) on the interaction between lawyers, most of whom lack quantitative training, and quantitative analysts has me thinking about the danger statistical techniques pose. As is true of those who study any branch of specialized knowledge, statisticians can abuse the trust decision makers (judges, government officials, members of the public) put in us all too easily, and often with impunity. (Of course, “we” all know that “we” would never do any such thing, even though “we” know that “everyone else” does it all the time. Gee.)

If it’s of interest (or perhaps more accurately, unless a barrage of comments tells me I’m being boring), I’ll be blogging about ways “everyone else” abuses trust, and ways “we” can try to stop it. Here’s my first suggestion: make diagnostics mandatory.

Here’s what I mean. I’ve previously blogged (see here) on the double-edged sword posed by the recent trend towards academics’ writing free software to fit models they’ve developed. One way for software-writers to lessen the danger that their models will be abused is to write diagnostics into their programming . . . and make those diagnostics hard to turn off. Suppose, for example, that some analysts are writing code to implement a new model, and the fitting process requires fancy MCMC techniques. These analysts should write MCMC convergence diagnostics into the software, and should set their defaults so that the fitting process produces these diagnostics unless it’s told not to. Perhaps, the analysts should even make it a little tough to turn off the diagnostics. That way, even if the user doesn’t look at the diagnostics, someone else (perhaps an opposing expert in a court case?) might have easier access to them.
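What "on by default and a little tough to turn off" could look like is sketched below. Everything here is hypothetical: the `fit` function, its deliberately awkward opt-out argument, and the toy random-walk Metropolis sampler standing in for the fancy MCMC. The diagnostic is the Gelman-Rubin potential scale reduction factor, chosen only as a familiar example.

```python
import math
import random
import statistics

def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for equal-length chains."""
    n = len(chains[0])
    within = statistics.fmean([statistics.variance(chain) for chain in chains])
    between = n * statistics.variance([statistics.fmean(chain) for chain in chains])
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)

def metropolis_chain(log_density, start, n_draws, step, rng):
    """Random-walk Metropolis; a stand-in for whatever sampler the model needs."""
    draws, current = [], start
    for _ in range(n_draws):
        proposal = current + rng.gauss(0.0, step)
        accept_prob = math.exp(min(0.0, log_density(proposal) - log_density(current)))
        if rng.random() < accept_prob:
            current = proposal
        draws.append(current)
    return draws

def fit(log_density, n_chains=4, n_draws=2000, step=1.0, seed=0,
        yes_really_skip_convergence_diagnostics=False):
    """Run several chains and report R-hat unless the user insists otherwise."""
    rng = random.Random(seed)
    chains = []
    for _ in range(n_chains):
        start = rng.uniform(-10.0, 10.0)      # overdispersed starting points
        draws = metropolis_chain(log_density, start, n_draws, step, rng)
        chains.append(draws[n_draws // 2:])   # discard the first half as burn-in
    result = {"chains": chains}
    if not yes_really_skip_convergence_diagnostics:
        result["r_hat"] = gelman_rubin(chains)
    return result

result = fit(lambda x: -0.5 * x * x)  # toy target: standard Normal
print(round(result["r_hat"], 3))
```

Because the diagnostic travels with the fitted object, anyone who later gets hold of the output can inspect it, whether or not the original user ever did.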

The worry, of course, is that the output from all new software will end up looking like it came out of SAS (a package I wouldn’t wish on my worst enemy). Still, as our cognitive psychologist could probably tell us, people are incredibly lazy. Even if a user of software just has to go to a drop-down menu to look at a diagnostic, chances are he/she won’t bother.

Posted by James Greiner at 6:00 AM

March 15, 2006

Incompatibility: Are You Worried?

Jim Greiner

I’m a teaching fellow for a course in missing data this semester, and one topic keeps coming up peripherally in the course, even though we haven’t tackled it head-on just yet. That topic is incompatible conditional distributions. And here’s my question for blog readers: how much does it bother you?

Reduced to its essence, here’s the issue. Suppose I have a dataset with three variables, A, B, and C. There are multiple missing data patterns, and suppose (although it’s not essential to the problem) that I want to use multiple imputation to create six or seven complete analysis datasets. Suppose also that it’s very difficult to conceive of a minimally plausible joint distribution p(A, B, C). Perhaps A is semi-continuous (e.g., income), B is categorical with 5 possible values, and C has support only over the negative integers. What (as I understand it) is often done in this case is to assume conditional distributions, for example, p*(A|B, C), p*(B|A, C), and p*(C|A, B). The idea is that one does a “Gibbs” with these three conditional distributions, as follows. Find starting values for the missing Bs and Cs. Draw missing As from p*(A|B, C). Then draw new Bs from p*(B|A, C) using the newly drawn As and the starting Cs. Continue as though you were doing a real “Gibbs.” Stop after a certain number of iterations and call the result one of your multiply imputed datasets.
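The cycling procedure just described can be sketched as a short loop. This is only a skeleton under loud assumptions: the conditional "models" below are Normal draws with made-up, fixed coefficients (a real application would estimate or draw the coefficients, and would use variable types like those described above), the data are invented, and `chained_draws` is a hypothetical name.

```python
import random

def chained_draws(rows, conditionals, n_iter, rng):
    """Cycle through the conditionals, redrawing only the originally missing cells.

    rows: list of dicts, with None marking a missing value.
    conditionals: dict mapping a variable name to a function (row, rng) -> draw.
    """
    missing = [[name for name in conditionals if row[name] is None] for row in rows]
    completed = [dict(row) for row in rows]
    # Starting values: the mean of the observed values for each variable.
    for name in conditionals:
        observed = [row[name] for row in rows if row[name] is not None]
        start = sum(observed) / len(observed)
        for row, gaps in zip(completed, missing):
            if name in gaps:
                row[name] = start
    # The "Gibbs" sweeps: A given B and C, then B given A and C, then C given A and B.
    for _ in range(n_iter):
        for name, draw in conditionals.items():
            for row, gaps in zip(completed, missing):
                if name in gaps:
                    row[name] = draw(row, rng)
    return completed

# Hypothetical conditionals p*(A|B,C), p*(B|A,C), p*(C|A,B) with made-up coefficients.
conditionals = {
    "A": lambda row, rng: rng.gauss(0.3 * row["B"] + 0.3 * row["C"], 1.0),
    "B": lambda row, rng: rng.gauss(0.3 * row["A"] + 0.3 * row["C"], 1.0),
    "C": lambda row, rng: rng.gauss(0.3 * row["A"] + 0.3 * row["B"], 1.0),
}
rows = [
    {"A": 1.2, "B": None, "C": -0.4},
    {"A": None, "B": 0.7, "C": None},
    {"A": 0.1, "B": -1.0, "C": 0.5},
    {"A": None, "B": 0.3, "C": 1.1},
]
imputed = chained_draws(rows, conditionals, n_iter=20, rng=random.Random(1))
print(imputed[1])
```

Running the function several times with different seeds would give the several completed datasets that multiple imputation calls for.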

The incompatibility problem is that there may be no joint distribution that has conditional distributions p*(A|B, C), p*(B|A, C), and p*(C|A, B). Remember, (proper) joint distributions determine conditional distributions, but conditional distributions do not determine joint distributions, and in some cases, one can actually prove mathematically that no joint distribution has a particular set of conditionals. If you ran your “Gibbs” long enough, eventually your draws would wander off to infinity or become absorbed into a boundary of the parameter space. In other words, your computer would complain; exactly how it would complain depends on how you programmed it.
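For a feel of what incompatibility means, consider a far simpler setting than the three-variable one above: two binary variables with strictly positive conditional tables. There, a joint distribution with the given conditionals exists exactly when the two tables imply the same odds ratio, because the odds ratio of a joint table survives conditioning in either direction. The sketch below checks that condition; the tables and function names are made up for illustration.

```python
import math

def odds_ratio(table):
    """Odds ratio of a 2 x 2 table indexed as table[a][b]."""
    (p11, p12), (p21, p22) = table
    return (p11 * p22) / (p12 * p21)

def compatible_2x2(a_given_b, b_given_a, rel_tol=1e-9):
    """True if some joint has both conditionals (all entries strictly positive).

    a_given_b[a][b] = P(A=a | B=b), so each column sums to 1.
    b_given_a[a][b] = P(B=b | A=a), so each row sums to 1.
    """
    return math.isclose(odds_ratio(a_given_b), odds_ratio(b_given_a), rel_tol=rel_tol)

# Conditionals derived from the joint table [[0.1, 0.2], [0.3, 0.4]].
a_given_b = [[0.1 / 0.4, 0.2 / 0.6], [0.3 / 0.4, 0.4 / 0.6]]
b_given_a = [[0.1 / 0.3, 0.2 / 0.3], [0.3 / 0.7, 0.4 / 0.7]]
print(compatible_2x2(a_given_b, b_given_a))

# Conditionals written down separately, with no joint in mind.
made_up_a_given_b = [[0.9, 0.1], [0.1, 0.9]]
made_up_b_given_a = [[0.1, 0.9], [0.9, 0.1]]
print(compatible_2x2(made_up_a_given_b, made_up_b_given_a))
```

No such simple check is available for the mixed variable types in the example above, which is part of why the problem is easy to overlook.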

I confess this incompatibility problem bothers me more than it appears to bother some of my mentors. If the conditional distributions are incompatible, then I KNOW that the "model" I’m fitting could not have generated the data I see. It seems like even highly improbable models are better than impossible ones. On the other hand, I am sympathetic to the idea of doing the best one can, and what else is there to do in (say) large datasets with multiple, complicated missing data patterns and unusual variable types?

How much does incompatibility bother you?

Posted by James Greiner at 6:00 AM

March 8, 2006

EM And Multi-level Models

Jim Greiner

One of the purposes of this blog is to allow us to share quantitative problems we’re currently considering. Here’s one that arose in my research, and I’d love any comments and suggestions readers might have: can one apply the EM algorithm to help with missing data in multi-level models?

Schematically, the problem I ran into is as follows: A_ij | B_i follows some distribution, call it p1_i, and I had n_i observations of A_ij. A_ij was a random vector, and some parts of some observations were missing. B_i | C follows some other distribution, call it p2. Suppose I’m a frequentist, and I want to make inferences about C. The problem I kept running into was that I couldn’t figure out how to use EM without integrating the B_i’s out of the likelihood, a mathematical task that exceeded my skills. I ended up switching to a Bayesian framework and using a Gibbs sampler, i.e., drawing from the distribution of the missing data given the current value of the parameters, then from the distribution of the parameters given the now-complete data. But I couldn’t help wondering, are hardnosed frequentists just screwed in this situation, do they have to resort to something like Newton-Raphson, or is there an obvious way to use EM that I just missed?
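Written out, the quantity a frequentist would want to maximize looks something like the following. This is a sketch of the setup as described, assuming independence across i (and across j given B_i) and ignorable missingness; the superscripts obs and mis are notation introduced here.

```latex
% Observed-data likelihood for C, with the B_i integrated out.
% A_{ij}^{obs} is the observed part of A_{ij}; A_{ij}^{mis} is the missing part.
L(C) = \prod_{i} \int p_2(B_i \mid C)
       \left[ \prod_{j=1}^{n_i} \int p_{1,i}\!\left(A_{ij}^{obs}, A_{ij}^{mis} \mid B_i\right)
       dA_{ij}^{mis} \right] dB_i
```

The outer integral over each B_i is the step that resisted a closed form.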

Posted by James Greiner at 6:00 AM

March 2, 2006

Freaks And "Parameter"

Jim Greiner

In a previous post, I briefly described the joint Law School/Department of Statistics course I’m currently co-teaching in which law students act as lawyers and quantitative students act as experts in simulated litigation. I’ll be writing about some of the lessons learned from this course in blog entries, especially lessons about what is quickly becoming the course’s central challenge for the students: communication between those with quantitative training and those without. Here’s my first lesson for the quantitatively adept: avoid the word “parameter.”

Of course it isn’t the word “parameter” so much as it is any of the jargon that we in the quantitative social science business use every day. And everyone knows that if you’re speaking to persons from another field, you have to speak in regular English (if that’s what you’re speaking). The hard part is remembering what regular English sounds like. We in quantitative social science don’t realize what freaks we become.

Here’s the vignette. In a recent session of the class, a student sought to explain to some lawyers how simulation can be used to test whether a model is doing what it’s supposed to do. She got as far as explaining how one could use a computer to simulate data, but when she began to explain checking to see whether an interval produced by the model covered the known truth, she used the word “parameter.” The change in expression on the law students’ faces resembled air going out of a balloon.
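For the quantitatively inclined, the kind of check the student was describing can be sketched in a few lines. The setup is invented for illustration (a Normal model with a known standard deviation and a textbook 95% interval for the mean); it is not the model used in the class.

```python
import math
import random

def interval_for_mean(sample, known_sd, z=1.96):
    """Textbook 95% interval for a mean when the standard deviation is known."""
    center = sum(sample) / len(sample)
    half_width = z * known_sd / math.sqrt(len(sample))
    return center - half_width, center + half_width

def coverage(true_mean, known_sd, sample_size, n_sims, seed=0):
    """Fraction of simulated datasets whose interval covers the known truth."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        sample = [rng.gauss(true_mean, known_sd) for _ in range(sample_size)]
        low, high = interval_for_mean(sample, known_sd)
        hits += low <= true_mean <= high
    return hits / n_sims

print(coverage(true_mean=2.0, known_sd=1.0, sample_size=30, n_sims=2000))
```

If the method is doing what it is supposed to do, the printed fraction should land near 0.95.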

Of course, every first year statistics undergraduate knows what a “parameter” is, and as far as jargon goes, “parameter” is a lot less threatening than some other terms. But it was enough to cause the lawyers in the room to give up on following her. The recovery period was longer than it might otherwise have been because this episode occurred early in the class, when the lawyers and experts were still getting a feel for each other. The lesson for us is, when communicating with the rest of the world, even the most seemingly innocuous words can make a difference. We have to recognize that we’ve become freaks.

Posted by James Greiner at 6:00 AM

March 1, 2006

Thoughts on SUTVA (Part II)

Alexis Diamond, guest blogger

In part I (yesterday), I introduced the subject of SUTVA (the stable unit treatment value assumption), an assumption associated with Rubin's causal model. Well, why have SUTVA in the first place? What work is it actually doing? What does it require? "The two most common ways in which SUTVA can be violated appear to occur when (a) there are versions of each treatment varying in effectiveness or (b) there exists interference between units" (Rubin 1990, p. 282).* But this two-step SUTVA shorthand is frequently implausible in the context of many important and interesting causal questions.

SUTVA allows for a precise definition of causal effects for each unit. When SUTVA obtains, the inference under investigation relates to the difference between what would have been observed in a world in which units received the treatment and what would have been observed in a world in which treatment did not exist. SUTVA makes the inference, the causal question under investigation, crystal clear.

But SUTVA is not necessary to perform inference in the context of Rubin's causal model--what is necessary is to precisely define causal effects of interest in terms of potential outcomes and to adhere to the principle that for every set of allowable treatment allocations across units, there is a corresponding set of fixed (non-stochastic) potential outcomes that would be observed. In my peacekeeping analysis, I define units as country-episodes; each unit is an episode during which a country experienced civil war and was either treated/not-treated by a UN peacekeeping mission.

I define my causal effects precisely: I am interested in causal effects for treated units, and I define the causal effect for each treated unit as the difference between the observed outcome and what would have been observed had that unit's treatment been turned-off and peacekeeping had not occurred. There are many other potential outcomes one could contemplate and utilize to make other causal inferences; these others are beyond the scope of my investigation. I don't need SUTVA or other exclusion restrictions to exclude them. I exclude them in the way I pose my causal question.

I am not claiming that all peacekeeping missions are exactly the same—that would be silly. I also do not claim non-interference across units—after all, how could this be true, or even approximately true? History matters. Peacekeeping missions affect subsequent facts on the ground within and across countries. So SUTVA is going to be violated. But what allows me to proceed with an attempt at analysis is that my causal question is, nevertheless, well-defined. Clearly, I mean only one thing when referring to the "estimated effect of peacekeeping": the difference between the observed outcome for each and every treated unit and what would have been observed for each unit under the control regime of no-peacekeeping. I define the average effect for the treated (ATT), my ultimate estimand of interest, to be the average of these estimated unit-level effects.
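In symbols, the estimand just described might be written as follows. The notation is introduced here for illustration: T_i = 1 marks a treated country-episode, N_T is the number of treated units, and Y_i(0) is the outcome that would have been observed had unit i's own treatment been turned off.

```latex
% Unit-level effect for a treated unit i: observed outcome minus the outcome
% under no peacekeeping for that unit.
\tau_i = Y_i^{obs} - Y_i(0), \qquad i \in \{\, i : T_i = 1 \,\}

% Average effect for the treated.
\mathrm{ATT} = \frac{1}{N_T} \sum_{i \,:\, T_i = 1} \tau_i
```

Since Y_i(0) is never observed for a treated unit, each \tau_i has to be estimated.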

Three caveats apply: (1) I am not claiming this ATT represents what it does under SUTVA, namely the average difference in potential outcomes that would have been observed given all selected units experiencing treatment vs. all experiencing control; (2) I must assume there is only one version of the control intervention; (3) estimation will require additional assumptions, and if estimating treatment effects under exogeneity (e.g., via matching), one must still make the case for ignorable assignment. This last caveat is very different from, and subsequent to, the others, in the sense that estimation and analysis via matching (or any other method) only makes sense if the first two caveats obtain and the causal question is well-defined.

As social science moves increasingly toward adoption of the Rubin causal model, I predict that political scientists (and social scientists more generally) will frame their SUTVA-like assumptions and inferential questions in this way. I think this is consistent with what Gary King and his coauthors were doing in Epstein et al. (2005)**, when they asked about the effect of war on Supreme Court decision-making. They were not claiming that occurrences of treatment (war) had no effect on subsequent Supreme Court decisions; they were asking about what would have happened if each episode of treatment had been turned off, one at a time. And in many cases, this is the only kind of question there is any hope of answering—the only kind of question close enough to the data to allow for plausible inference. As long as these causal questions themselves are interesting, this general approach seems to me to be a coherent and sensible way forward.

*Rubin, Donald B. Formal Modes of Statistical Inference For Causal Effects. Journal of Statistical Planning and Inference. 25 (1990), 279-292.

** Epstein, Lee; Daniel E. Ho; Gary King; and Jeffrey A. Segal. The Supreme Court During Crisis: How War Affects only Non-War Cases, New York University Law Review, Vol. 80, No. 1 (April, 2005): 1-116.

Posted by James Greiner at 6:00 AM

February 28, 2006

Thoughts on SUTVA (Part I)

Alexis Diamond, guest blogger

I gave a talk on Wed, Feb 8 at the IQSS methods workshop where I described my efforts to estimate the effects of UN intervention and UN peacekeeping on peacebuilding success following civil war. One of my goals was to demonstrate how matching-based methods and the Rubin model of causal inference can be helpful for answering questions in political science, particularly in fields like comparative politics and international relations.

An important issue in this context relates to Rubin's SUTVA, the stable-unit-treatment-value assumption typically assumed whenever matching-based methods are performed. SUTVA requires that the potential outcome for any particular unit i following treatment t is stable, "in the sense that it would take the same value for all other treatment allocations such that unit i receives treatment t" (Rubin 1990, p. 282). This is a stronger form of a basic assumption at the heart of the Rubin causal model: that for every set of allowable treatment allocations across units, there is a corresponding set of fixed (non-stochastic) potential outcomes that would be observed.
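One way to write the stability requirement in symbols is sketched below. The vector notation is introduced here for illustration and is not taken from Rubin (1990).

```latex
% \mathbf{t} = (t_1, \dots, t_N) is a full allocation of treatments to the N units.
% Stability: unit i's potential outcome depends on the allocation only through t_i.
Y_i(\mathbf{t}) = Y_i(\mathbf{t}') \quad \text{whenever } t_i = t_i'
```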

Rubin (1990) goes on to say that "The two most common ways in which SUTVA can be violated appear to occur when (a) there are versions of each treatment varying in effectiveness or (b) there exists interference between units" (ibid., p. 282).* But how exactly do "versions" and "interference" cause violations, and what are the consequences? Don't these violations occur frequently in political science and the other social sciences? In my research agenda, for example, treatment is peacekeeping, and peacekeeping is going to vary in effectiveness from country to country. Moreover, it is ridiculous to suppose a country's potential outcomes are independent of what is happening (or has already happened) to its neighbors, especially in the context of war and political conflict involving refugees, cross-border skirmishes, etc... (although this kind of independence is typically claimed—at least implicitly—whenever regression-based approaches are used.)

Why do multiple versions of treatment pose SUTVA problems? Because SUTVA posits, for each unit and treatment, a single fixed potential outcome, not a distribution of potential outcomes. Thus, if there is a potential outcome for the weak version of treatment A and a different potential outcome for the strong version of treatment A, then one cannot speak of the potential outcome that would have been observed following treatment A: there are in fact two treatments. Note that a causal question framed in terms of a single type of treatment A (e.g., "What is the effect of treatment A-strong version?") does not present these problems. Similarly, as long as there is a single version of the control intervention, one could still coherently define causal effects for each unit in terms of the difference between (observed) potential outcomes under heterogeneous treatment interventions and (unobserved) potential outcomes under control. One might wonder if these causal effects are substantively interesting, and if and how they could be reliably estimated…these critically important issues are separate from and subsequent to the question of whether the inferential investigation is well-defined.

The problem posed by interference across units is very similar; if unit i's potential outcome under treatment A depends upon another unit j's assignment status, then there are really multiple (compound) treatments involving A for unit i, each of which involves a different assignment for unit j. Each of these multiple treatments is associated with a corresponding potential outcome. Note that this kind of interference across units does not necessarily present a problem for defining the effect of a single one of these compound treatment As. It just means that asking "What is the effect of treatment A?" makes no sense---it is not a well-posed causal question.

Because SUTVA is so frequently discussed in the context of matching-based methods, people often assume that the two are inextricably linked: that whatever SUTVA is useful for, it is useful only for matching-based analyses. A crucial point often missed is that SUTVA is useful for the discipline it imposes on study design. Prior to the choice of analytical methodology (e.g., regression, matching, etc.), SUTVA works to nail down the precise question under investigation.

Given these issues, can the peacekeeping question be addressed within Rubin's causal model? I return to this question in post II of this series.

*Rubin, Donald B. Formal Modes of Statistical Inference For Causal Effects. Journal of Statistical Planning and Inference. 25 (1990), 279-292.

Posted by James Greiner at 6:00 AM

February 7, 2006

Do People Think like Stolper-Samuelson? Part II

Jens Hainmueller and Michael Hiscox

Last week, we introduced the question of whether the Stolper-Samuelson theorem, i.e., that more educated people favour trade because it will increase their factor returns, accurately reflects the way people think. We also introduced our recent paper on this subject, “Learning to Love Globalization: Education and Individual Attitudes Toward International Trade”, in which we examine the alternative theory that more educated respondents tend to be more exposed to economic ideas about the overall efficiency gains for the national economy associated with greater trade openness, and tend to be less prone to nationalist and anti-foreigner sentiments often linked with protectionism.

Which of the very different interpretations of the education-pro trade link is more correct? We re-examine the available survey data on individual attitudes toward trade, conducting a simple test of the effects of education on support for trade that distinguishes clearly between the Stolper-Samuelson interpretation of this relationship and alternative ideational and cultural accounts. We find that the impact of education on attitudes toward trade is almost identical among respondents currently in the active labor force and among those who are not (even those who are retired). That the effects of education on trade policy preferences are not mediated by whether individuals are actually being paid for the employment of their skills strongly suggests that it is not primarily a product of distributional concerns.

The analysis also reveals clear non-linearities in the relationship between education and trade preferences: while individuals who have been exposed to college or university education are far more likely to favor trade openness than those who have not, other types of educational attainment have no significant effects on attitudes and some even reduce the likelihood that individuals support trade even though they clearly contribute to skill acquisition. These findings indicate that the particular ideational and/or cultural effects associated with college education, and not the gradual accumulation of skills, are critical in shaping individual attitudes toward trade.

We conclude that the impact of education on how voters think about trade and globalization has more to do with exposure to economic ideas, and information about the aggregate and varied effects of these economic phenomena, than it does with individual calculations about how trade affects personal income or job security. This is not to say that the latter types of calculations are not important in shaping individuals’ views of trade – just that they are not being manifest in the simple association between education and support for trade openness. As we discuss in the concluding section, we think it is likely that concerns about the effects of trade on personal income and job security might actually hinge on the particular impact of trade openness in specific industries. One of the key implications of our findings is that future empirical tests of the determinants of individual trade preferences need to be substantially refined to identify the impact of distributional concerns on attitudes towards trade and globalization and distinguish these from the impact of ideational and cultural factors.

Posted by James Greiner at 6:00 AM

January 30, 2006

Ecological Inference in the Law, Part III

Jim Greiner

In two previous posts here and here, I discussed the ecological inference problem as it relates to the legal question of racially polarized voting in litigation under Section 2 of the Voting Rights Act. In the latter of these two posts, I suggested that this field needed greater research into the case of R x C, as opposed to 2 x 2, tables.

Here's another suggestion from the courtroom: we need an individual level story.

The fundamental problem of ecological inference is that we do not observe data at the individual level; instead, we observe row and column totals for a set of aggregate units (precincts, in the voting context). This fact has led to some debate about whether a model or a story or an explanation about individual level behavior is necessary to make ecological inferences reliable, or at least as reliable as they can be. On the one hand, Achen & Shively, in their book Cross-Level Inference, have argued that an individual level story is always necessary to assure the coherence of the aggregate model and to assess its implications. On the other hand, Gary King, in his book A Solution to the Ecological Inference Problem, has argued that because we never observe the process by which ecological data are aggregated from individual to group counts, we need not consider individual level processes, so long as the row counts (or percentages) are uncorrelated with model parameters.

From a social science point of view, this question is debatable. From a legal point of view, we need an individual level story, regardless of whether such a story produces better statistical results. When judges and litigators encounter statistical methods in a litigation setting, they need to understand (or, at least, to feel that they understand) something about those methods. They know they will not comprehend everything, or perhaps even most things, and they have no interest in the gritty details. But they will not credit an expert witness who says, in effect, "I ran some numbers. Trust me." What can quantitative expert witnesses offer in an ecological inference setting? The easiest and best thing is some kind of individual level story that leads to the ecological model being used.

Posted by James Greiner at 6:01 AM

January 25, 2006

Is This a First?

Jim Greiner

This Spring, Harvard will be the site of something that has never been attempted before . . . I think. Matthew Stephenson of the Harvard Law School, Don Rubin of the Harvard Department of Statistics, and I will teach a seminar entitled Quantitative Social Science, Law, Expert Witnesses, and Litigation. The course will be offered jointly in the Law School and the Statistics Department and will, we hope, include students from both places as well as other Departments in the Graduate School of Arts & Sciences (Government, Sociology, Economics, etc.).

In the course, the quantitatively trained students will act as expert witnesses by analyzing datasets relating to a given fact scenario. The experts will draft expert reports and testify at depositions, which will be taken by the law students acting as (what else?) lawyers. The lawyers will then use the transcripts and expert reports to draft cross motions for summary judgment and responses to those motions. By the way: A very big thanks to New England Court Reporting Institute for agreeing to provide court reporters free of charge to assist the course!

Our hope is that by forcing law students and quantitatively trained students to communicate effectively under the pressure-cooker conditions of pre-trial litigation, we can teach them something about the critical process of communicating with one another generally. In my view, this communication process is underemphasized in both law schools and quantitative departments around the nation. For example, how often does the average law student have to communicate with a person with greater knowledge of another field (anything from construction to exporting fruit)? How often are students trained in quantitative fields required to explain methods and conclusions to those not so trained?

When I began putting together this course a year ago, I searched for analogs in academic websites around the country but found none. My question: are there other for-credit classes like this one out there? By "like this one" I mean courses in which quantitative and law students are in the same classroom, forced to work with each other effectively?

Either way, I'll be sharing some of the lessons learned from this effort throughout the upcoming semester.

Posted by James Greiner at 6:00 AM

January 13, 2006

Ecological Inference in the Law, Part II

Jim Greiner

In a previous post, I introduced a definition of the ecological inference problem as applied to the legal difficulty of drawing inferences about racial voting patterns from precinct-level data on candidate support and racial makeup of the voting-age-population. As I mentioned in that post, very few lawyers and judges have ever contributed to the expansive literature on this question, despite the fact that ecological inference models are often used in high-profile courtroom cases.

Here's an initial contribution from the courtroom: forget about two by two tables.

The overwhelming majority of publications on the ecological inference problem concern methods for sets of two by two contingency tables. In the Voting Rights Act context, a two by two table problem might correspond to a jurisdiction in which almost every potential voter is African-American or Caucasian, and all we care about is who votes, not who the voters supported. In that case, the rows of each table are black and white, while the columns are vote and no-vote. For each precinct, we need only predict one internal cell count, and the others are determined.

This two by two case is of almost no interest in the law. The reason is that in jurisdictions in this country, the voters have three options in any electoral contest of interest: Democrat, Republican, and not voting. That means we have a minimum of three columns. In most jurisdictions of interest these days, we also have more than two rows. Hispanics constitute an increasingly important set of voters in the United States, and their voting patterns are rarely similar enough to those of African-Americans or Caucasians to allow an expert witness to combine Hispanics with one of these other groups.

Thus far, scant research exists into the R x C problem. Before a few years ago, one had two options: (i) run a set of C-1 linear models, a solution that often led to logically inconsistent predictions (such as 115 percent of Hispanic voters supported the Democrat), or (ii) pick a two by two model that includes information from the precinct-level bounds, and also available statistical information, and apply it in some way to the problem set of R x C tables at hand, perhaps by collapsing cell counts down to a two by two shape, perhaps by applying the two by two method repeatedly to draw inferences about the R x C problem at hand. Neither approach is very appealing.
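To see how option (i) can produce an impossible answer, here is a minimal sketch of a Goodman-style ecological regression in the simplest setting (one group versus everyone else, one outcome). The precinct figures are invented for illustration, the function names are hypothetical, and the constant-rate assumption is the textbook one rather than anything taken from a particular case.

```python
def ols_line(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def goodman_rates(group_share, support_share):
    """Estimated support rate inside the group and outside it.

    Assumes both rates are the same in every precinct, so that
    support = rate_out + (rate_in - rate_out) * group_share.
    """
    intercept, slope = ols_line(group_share, support_share)
    return intercept + slope, intercept


# Invented precinct data: Hispanic share of voters and Democratic vote share.
hispanic_share = [0.10, 0.20, 0.30, 0.40, 0.50]
democratic_share = [0.30, 0.42, 0.50, 0.62, 0.70]

rate_hispanic, rate_other = goodman_rates(hispanic_share, democratic_share)
# The estimated rate for Hispanic voters exceeds 1.0 (more than 100 percent).
print(round(rate_hispanic, 3), round(rate_other, 3))
```

Every precinct in this toy example is individually feasible, yet extrapolating the fitted line to a hypothetical all-Hispanic precinct gives a rate above 100 percent, which is the kind of logical inconsistency described above.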

A few years ago, Rosen et al. proposed a variant of a Dirichlet-Multinomial model. This model was and is a large step forward in the analysis of R x C ecological inference tables. Nevertheless, there is always room for improvement. The model does not respect the bounds deterministically, and it does not allow a great deal of flexibility in modeling intra-row and inter-row correlations. On the latter point, an example may clarify: Suppose we are analyzing a primary in which four candidates are running, two African-American and two Caucasian. Would we expect, among (say) black voters, the vote counts or fractions (by precinct) for the two African-American candidates to be positively correlated?

I look forward to contributing to this research soon.

Posted by James Greiner at 5:57 AM

January 6, 2006

Ecological Inference in the Law, Part I

Jim Greiner

Alchemists' gold. The perpetual motion machine. One might also think of cold fusion and warm superconductors. These are some of the great mythical aims of the so-called "hard" sciences. A few of these concepts have also been compared to attempts at ecological inference, the search for accurate predictions about the internal cell counts of a set of contingency tables (such as one for each precinct) when only the row and column totals of each table are observed. The fundamental problem of ecological inference is, of course, that radically different internal cell counts can lead to identical row and column totals, and because we only get to see the row and column totals, we cannot distinguish among these different sets of counts. Another way of saying this is that the problem is impossible to solve deterministically (since the relationship between the cell entries and row and column marginals is not one-to-one), causing some to label ecological inference an "ill-posed inverse problem". In fact, without making some statistical assumptions, the estimation problem would not be identified, although it would be bounded because some values for the cell entries are ruled out for each precinct's contingency table by the observed column and row totals (these are called "the bounds").
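The bounds are easy to compute for a single two by two precinct table. The sketch below is only an illustration: the function name and the precinct counts are made up, and it covers just the two-row, two-column case.

```python
def cell_bounds(row_total, other_row_total, col_total):
    """Smallest and largest feasible count for one internal cell.

    row_total:       people in the row of interest (e.g. black voting-age population)
    other_row_total: people in the other row (e.g. white voting-age population)
    col_total:       people in the column of interest (e.g. ballots cast)
    """
    # Even if everyone in the other row is in this column, the remainder
    # of the column must come from the row of interest.
    lower = max(0, col_total - other_row_total)
    # The cell can exceed neither its row total nor its column total.
    upper = min(row_total, col_total)
    return lower, upper


# Hypothetical precinct: 300 black and 700 white residents of voting age.
high_turnout = cell_bounds(300, 700, 800)   # 800 ballots cast
low_turnout = cell_bounds(300, 700, 200)    # 200 ballots cast
print(high_turnout, low_turnout)
```

With 800 ballots cast, at least 100 of the 300 black residents must have voted, so the bounds carry real information. With 200 ballots cast, anywhere from 0 to 200 could have, which is why the bounds alone rarely settle the question.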

Ecological inference arises in the legal setting in cases litigated under the Voting Rights Act. Section 2 of the VRA prohibits a state or municipality from depriving a citizen, on account of race or ethnicity, of an equal opportunity to participate in the political process and to elect candidates of his/her choice. The Delphic statute has been interpreted to disallow districting schemes that have the effect of diluting minority voting strength. In practice, to succeed in a vote dilution claim, a plaintiff must almost always prove that voting in the relevant jurisdiction is racially polarized, meaning that whites vote differently from blacks who vote differently from Hispanics. Because the secret ballot prevents direct observation of voting patterns, expert witnesses are forced to attempt the dangerous task of drawing inferences about racial voting patterns from precinct-level candidate support counts (column totals) and precinct-level racial voting-age-populations (row totals).

A large literature exists on the ecological inference problem. Bizarrely, one constituency has rarely if ever contributed to this debate: the lawyers and judges who consume a great deal of what the literature produces. I'll be attempting to start to fill this gap in subsequent entries.

Posted by James Greiner at 5:56 AM

January 4, 2006

Experts and Trials IV: Why?

John Friedman

In my previous posts on this subject (see here for the most recent), I have explored our legal system's reliance on expert witnesses from game-theoretic and personal perspectives. In this post, I take an entirely different approach, and ask the question: why is our system structured so?

The first question by many might be: what are the alternatives? The traditional example is the French system, known as the Civil Law system (as opposed to the British-based Common Law system). In France, a government judge acts as would the lawyers, judge, and jury in the American system. This judge calls witnesses suggested by the parties (plus some others of his choosing), questions them himself, and then decides upon the proper course of action. Trials often finish in one day; justice is summarily, if crudely, dispensed.

So why did these two systems develop differently, separated by less than 100 miles of the English Channel? Though many answers surely exist in the historical literature, I offer one theory presented by Edward Glaeser and Andrei Shleifer, both in the Harvard Economics Department. They place the roots of the two legal systems in the political circumstances in England and France in the 12th and 13th centuries, when the first characteristics of these procedures emerged.

The key element of a legal system, argue Glaeser and Shleifer, is its ability to limit the influence of corruption and coercion. Viewed from this perspective, the strengths and weaknesses of juries versus government (then royal) judges become clear. Juries, composed mostly of local commoners, would be subject to much coercion by local feudal lords. Royal magistrates, on the other hand, would be far less susceptible to such forceful persuasion, but would be far more easily bribed by the king. A country's choice between these two systems should depend on which problem is more dire: The threat of regional "bullies" or of royal domination.

Glaeser and Shleifer survey the historical record to argue that exactly this difference existed between England and France in the late middle ages. England, recently conquered by and still under the rule of the Normans, had a much stronger monarchy, which imposed order on the countryside. The smaller lords, with whom King John negotiated the Magna Carta, feared royal domination far more than they feared each other, and were willing to accept the possibility of local bias in juries so that the king would not interfere. France, on the other hand, was far more violent, torn between many competing barons. These dukes feared each other most of all, and knew that any jury would quickly fall under the sway of the local ruler; thus, they were willing to cede control of the legal system to the king.

I am not an historian, and so I cannot know whether these arguments accurately reflect the genesis of our legal system. But even if the true explanation lies elsewhere, surely it will have the same historical feel. These institutions have great inertia, and so it does not surprise me that factors so long ago have explanatory power. Nonetheless, is this the best we can do? Does our legal system reduce to an historical anachronism?

Posted by James Greiner at 3:07 AM

December 21, 2005

End-of-Year Hiatus

Jim Greiner

With universities out of session and many students away from their offices, the Social Science Statistics Blog will reduce the frequency of its postings. We will resume our at-least-one-per-day schedule in early January. Until then, check back periodically for the occasional entry.

Happy New Year!

Posted by James Greiner at 4:36 AM

December 19, 2005

Beyond Standard Errors, Part II: What Makes an Inference Prone to Survive Rosenbaum-Type Sensitivity Tests?

Jens Hainmueller

Continuing from my previous post on this subject, sensitivity tests are still used somewhat rarely (though increasingly) in applied research. This is unfortunate, I think, because, at least according to my own tests on several datasets, observational studies do vary considerably in their sensitivity to hidden bias. Some results go away once you allow for only a tiny amount of hidden bias; others are rock solid, weathering even the strongest hidden bias. One should always give the reader this information, I think.
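For readers who have not seen one of these tests, here is a minimal sketch of the simplest version I know of: the sign test for matched pairs. It assumes that hidden bias can shift the odds of treatment within a pair by at most a factor gamma, in which case the chance that the treated unit has the higher outcome in a pair is at most gamma / (1 + gamma) under the null of no effect. The function names and the two example "studies" are hypothetical, and this is not necessarily the test used in the analyses mentioned above.

```python
from math import comb


def worst_case_p_value(n_pairs, n_treated_higher, gamma):
    """Upper bound on the one-sided sign-test p-value under bias gamma."""
    p_plus = gamma / (1.0 + gamma)
    return sum(
        comb(n_pairs, k) * p_plus ** k * (1.0 - p_plus) ** (n_pairs - k)
        for k in range(n_treated_higher, n_pairs + 1)
    )


def critical_gamma(n_pairs, n_treated_higher, alpha=0.05, step=0.05, max_gamma=10.0):
    """Smallest gamma on a grid at which the result stops being significant."""
    n_steps = int(round((max_gamma - 1.0) / step))
    for i in range(n_steps + 1):
        gamma = 1.0 + i * step
        if worst_case_p_value(n_pairs, n_treated_higher, gamma) > alpha:
            return gamma
    return None  # still significant at every gamma tried


# Two made-up matched studies with 100 untied pairs each.
fragile = critical_gamma(100, 60)   # treated unit higher in 60 pairs
sturdy = critical_gamma(100, 85)    # treated unit higher in 85 pairs
print(fragile, sturdy)
```

Setting gamma to 1 reproduces the ordinary sign test. The larger the gamma needed to push the worst-case p-value above the significance level, the less sensitive the inference is to hidden bias.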

One (and maybe not the most important) reason why these tests are infrequently used is that they take time and effort to compute. So I was thinking, instead of computing the sensitivity tests each time, maybe it would be good to have some quick rules of thumb to judge whether a study is insensitive to hidden bias.

Imagine you have two studies with identical estimated effect sizes and standard errors. Now, which one would you trust more regarding its insensitivity to hidden bias? In other words, are there particular features of the data that make an inference drawn from them excel on Rosenbaum-type sensitivity tests? The literature I have read thus far provides little guidance on this issue.

We have a few ideas about this (which are still underdeveloped). For example, ceteris paribus, one could think that it’s better to have a rather imbalanced vector of treatment assignments (like only a few treated or only a few control). Another idea is that, ceteris paribus, inferences obtained from a smaller (matched) dataset should be less prone to get knocked over by hidden bias tests. Or, in the case of propensity score methods, one would like covariates that strongly predict treatment assignment so that an omitted variable cannot tweak the results much.

This is very much still work in progress; comments and feedback are highly appreciated.

Posted by James Greiner at 6:06 AM

December 16, 2005

Redistricting and Electoral Competition: Part II

John Friedman and guest blogger Richard Holden

Yesterday, we blogged about whether gerrymandering or something else is a principal cause of low turnover in the House of Representatives and other elected bodies. We continue that discussion today.

How can we determine whether gerrymandering is the culprit, given that any number of reasons could account for the increase in the incumbent reelection rate? The key is that redistricting usually happens only once each decade (at least until the recent controversies in Texas). Other factors, such as money or electoral polarization, tend to change more smoothly over time. One can tease these factors apart with a "regression discontinuity" approach, separating the time series into 1) a smooth function and 2) jumps at the time of gerrymandering.

In a recent paper (available here), we find that redistricting has actually slightly reduced incumbent reelection rates over time. We also look to see if there are systematic differences between "bipartisan" gerrymanders, designed to protect incumbents from both parties, and "partisan" gerrymanders, in which one party attempts to leverage its support into more representation in the state's Congressional delegation. There is no evidence that the incumbent reelection rate responds differently after any of these forms of redistricting.
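The idea of splitting a series into a smooth part and a jump can be sketched in a few lines. This is a deliberately stripped-down illustration, not the specification in the paper: it assumes one series, a linear trend as the "smooth function", and a single redistricting date, and the numbers are invented so that the true jump is known.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_trend_and_jump(times, outcomes, redistricting_time):
    """Least-squares fit of outcome = intercept + slope * t + jump * after(t)."""
    rows = [[1.0, float(t), 1.0 if t >= redistricting_time else 0.0] for t in times]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcomes)) for i in range(k)]
    intercept, slope, jump = solve_linear_system(xtx, xty)
    return intercept, slope, jump


# Invented reelection rates for ten elections, with redistricting before election 5.
times = list(range(10))
rates = [0.90 + 0.004 * t + (-0.02 if t >= 5 else 0.0) for t in times]
intercept, slope, jump = fit_trend_and_jump(times, rates, 5)
print(round(slope, 4), round(jump, 4))
```

The coefficient on the step variable is the estimated jump at redistricting, net of the trend. In a real application one would want many jurisdictions, several redistricting dates, and a more flexible smooth term.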

This research suggests that factors other than redistricting are the more important culprits in today's lack of electoral competition. In some sense, this isn't all that surprising. While the technology available has become more advanced, so have the constraints on gerrymanderers. Supreme Court decisions interpreting the 14th amendment and the Voting Rights Act have consistently narrowed the bounds within which redistricting must occur.

There may, of course, be other reasons to support independent commissions. For instance, they tend to create more geographically compact districts. Neutral bodies also help to avoid the most extreme cases of partisan gerrymandering, in which the neighborhood of an incumbent is grouped with distant voters in a tortuously shaped district. Perhaps most importantly, independent commissions may be able to ensure minority representation - though the Voting Rights Act also plays a fundamental role in this area.

The basic premise of supporters of non-partisan commissions - that political competition is important - is a sound one. But the evidence suggests that these advocates are focused in the wrong place. The redistricting process is far from the only cause of limited competition.

To increase competition in elections for Congress and state legislatures, we must pay more attention to other potential causes of the increase in the incumbent reelection rate. We must better understand how factors such as money, television, and candidate quality impact elections. But if we can direct towards these aspects of democracy the same spirit of reform that now supports the drive towards independent redistricting commissions, new and more promising solutions can't be far away.

Posted by James Greiner at 2:53 AM

December 14, 2005

Consumer Demand for Labor Standards, Part III

Michael Hiscox and Nicholas Smyth, guest bloggers

Continuing our discussion begun yesterday and the day before on labor standards labeling, perhaps the most important comments we received at the workshop had to do with how we might design our next set of experiments. It is very difficult to do anything fancy when it comes to in-store experiments. It could never be practical (and ABC would never give permission) for us to randomize treatments to individual items or brands on, say, a daily basis. The manner in which products are displayed (grouped by brand), the costs associated with altering labels and prices, and the potential problems for sales staff (and the potential complaints from frequent customers) impose severe constraints. Several workshop participants suggested that we conduct the next set of experiments through an online retailer. That way we might be able to randomly assign labels (and prices) to customers when they view product information on a website and decide whether or not to make a purchase. There would still be plenty of difficulties to iron out, as was quickly noted (e.g., making allowances for customers who attempt to return to the same product page at a later point in time, and for customers who "comparative shop" for products at multiple retailers). But this seems like the way to proceed in the future.
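One way an online retailer might handle returning customers is to make the random assignment a deterministic function of the customer, so that the same person always sees the same version of a product page. The sketch below is only an illustration of that idea: the identifiers, arm names, and salt are all hypothetical, and it presumes the retailer can recognize a returning customer (by login or cookie) in the first place.

```python
import hashlib


def assign_treatment(customer_id, product_id,
                     arms=("label", "no_label"),
                     salt="labeling-experiment"):
    """Pseudo-randomly but repeatably assign a customer-product pair to an arm."""
    key = f"{salt}:{customer_id}:{product_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Interpret the first 8 bytes of the hash as an integer and reduce it
    # to an index into the list of arms.
    index = int.from_bytes(digest[:8], "big") % len(arms)
    return arms[index]


first_visit = assign_treatment("customer-42", "towel-set-7")
return_visit = assign_treatment("customer-42", "towel-set-7")
print(first_visit, return_visit)
```

The same approach extends to price treatments by adding more arms. It does nothing about customers who comparison shop at other retailers, which would need to be dealt with separately.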

On a related theme, we noted that Ezaria.com, an online retailer run by Harvard students, is already planning to track a variety of economic data on its customers. Ezaria has a mission which involves providing markets for independent artisans from the developing world and donating 25% of profits to charity. At a minimum, looking at data on whether a customer is more likely to make a purchase after being shown the company's "mission" page (that explains their policies) would provide some measure of consumer demand for companies that source from high-standard producers. Perhaps we can persuade Ezaria to cooperate with us in a future experimental project. Or perhaps we can arrange the experiment with an even larger online retailer, with customers who are not so obviously self-selected as socially conscious.

Posted by James Greiner at 3:49 AM

December 13, 2005

Consumer Demand for Labor Standards, Part II

Michael Hiscox and Nicholas Smyth, guest bloggers

We continue yesterday's entry discussing questions that arose during our recent presentation of our paper on consumer demand and labor standards labeling.

Another excellent question that was raised in the discussions concerned the evidence that sales of our labeled items actually rose (relative to sales of unlabeled control products) when their prices were raised. We have been interpreting this as evidence that consumers regarded the label as more credible when the product was more expensive relative to alternatives, since they expect to pay more for higher labor standards. One question was whether relative sales would have risen with price increases for any good (labeled or unlabeled) just because higher prices can signal better quality. Since we did not raise the price of unlabeled items, we cannot address this concern directly. It is not critical to one of our main findings: sales of labeled items increased markedly relative to sales of unlabeled alternatives when the labels were put in place (before prices were adjusted). But we will try to track down the research on the price-quality issue in the literature on consumer psychology. Our basic assumption is that the existing (equilibrium) product prices and sales levels at ABC (in the "baseline" period) accurately reflected the relative quality of treatment and control products.

Other questions raised concerned the evidence we discussed in the paper about the marked increase in sales of Fair Trade Certified coffee. It was pointed out that, to the extent that retailers like Starbucks are marketing only fair trade coffee as the brewed "coffee of the day," this seems more like a general CSR strategy by the firm and not a sign of demand for improved standards. We were really talking about sales of certified coffee beans, rather than brewed coffee. The labeled beans are sold in direct competition with similar (unlabeled) beans at both Starbucks and Peet's. But it is important that we check the data and see if we can discriminate clearly between sales in different categories.

In general, we felt we have to do better in accounting for seasonal patterns in demand for home furnishings at ABC and how they might bear on our findings. This is obviously not a problem for our core results that hinge on the ratio of sales of labeled brands to unlabeled brands during each phase of the experiment. But for measuring price elasticities using changes in absolute sales of labeled items over time we would like to allow for the fact that sales of home furnishings were expected to dip during the summer months. To do this, we will probably need to estimate weekly sales for each brand using all the data we have from ABC prior to the start of our experiment (covering sales in 2004 and the first half of 2005). The relevant covariates would probably include recorded levels of total foot traffic in the store, total sales of other store products, some national or regional measures of economic activity and consumer confidence, variables accounting for any special sales and promotional campaigns, and seasonal dummies. We can then compare actual (absolute) sales of labeled brands with out-of-sample predictions based upon the estimations and thereby gauge the impact of our experimental treatments.
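The comparison of actual sales with out-of-sample predictions can be sketched in its most stripped-down form, using seasonal dummies alone. With month dummies as the only regressors, the least-squares prediction for a week is just the pre-experiment average for that month. The sales figures below are made up (they are not data from ABC), the function names are hypothetical, and a serious version would add the other covariates listed above.

```python
from collections import defaultdict


def seasonal_means(baseline_weeks):
    """Average weekly sales by month in the pre-experiment data."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for month, sales in baseline_weeks:
        totals[month] += sales
        counts[month] += 1
    return {month: totals[month] / counts[month] for month in totals}


def actual_minus_predicted(baseline_weeks, experiment_weeks):
    """Gap between observed sales and the seasonal prediction, week by week."""
    predicted = seasonal_means(baseline_weeks)
    return [sales - predicted[month] for month, sales in experiment_weeks]


# Made-up weekly unit sales for one labeled brand.
baseline = [("June", 100), ("June", 110), ("July", 80), ("July", 90)]
experiment = [("July", 95), ("July", 99)]
gaps = actual_minus_predicted(baseline, experiment)
print(gaps)
```

A positive gap means the labeled brand sold more than the seasonal pattern alone would predict. Whether that gap can be attributed to the treatment depends on how well the baseline model captures everything else that changed.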

We will conclude our discussion in tomorrow's post.

Posted by James Greiner at 4:46 AM

December 12, 2005

Consumer Demand for Labor Standards, Part I

Michael Hiscox and Nicholas Smyth, guest bloggers

We are very grateful to all the members of the Applied Statistics Workshop for inviting us to present our paper (abstract here) in the workshop this week. Thanks, especially, to Mike Kellerman for organizing everything and playing host. This was the first time we have presented the results from our experiments, and we received some very valuable feedback and suggestions for future work on this topic. One important question that was raised was why we do not simply assume that firms already know how much consumer demand there is for good labor standards? That is, if firms could make a buck doing this sort of thing, why not assume they would already be doing it? We think there are probably a couple of answers to this question. As we noted at the workshop (and in the paper), credible labeling would require cooperation from, and coordination with, independent non-profit organizations that could certify labor standards in factories abroad. So part of the issue here for firms is the uncertainty surrounding whether such organizations would be willing and able to take on such a role. The uncertainty about establishing a credible labeling scheme with cooperation from independent groups, on top of the uncertainty about consumer demand itself, may explain why firms are not doing as much research in this area as (we think) is warranted.

The other answer, or part of the answer, is that many firms may consider it too risky to do market research on labor standards labeling. We talked a little about how many firms refused to participate in our labeling experiments because they could not vouch for labor standards in all the factories from which they source and they were anxious about negative publicity if consumers or activist groups became curious about unlabeled items in their stores. Note that this is not evidence that labeling strategies must also be too risky for firms to ever contemplate. The risks of doing research on this issue are not identical to the risks attached to actually adopting a labeling strategy (which depend on what the research can tell us about consumer demand, and on whether a firm decides to switch to selling only labeled products or some combination of labeled and unlabeled products, etc).

More on our paper and the questions that arose in the presentation tomorrow.

Posted by James Greiner at 2:38 AM

December 9, 2005

What Did (and Do We Still) Learn from the LaLonde Dataset (Part II)?

Jens Hainmueller

I ended yesterday's post about the famous LaLonde dataset with the following two questions: (1) What have we learned from the LaLonde debate? (2) Does it make sense to beat this dataset any further, or have we essentially exhausted the information that can be extracted from this data and need to move on to new datasets?

On the first point, VERY bluntly summarized, the comic strip history goes somewhat like this. First, LaLonde showed that regression and IV do not get it right. Next, Heckman's research group released a string of papers in the late 80s and 90s trying to defend conventional regression and selection-based methods. Enter Dehejia and Wahba (1999). They showed that, apparently, propensity score methods (sub-classification and matching) get it right if one controls for more than one year of pre-intervention earnings. Smith and Todd (2002, 2004) are next in line, claiming that propensity score methods do not get it right: once one slightly tweaks the propensity score specification, the results are again all over the place. The ensuing debate spawned more than five papers as Rajeev Dehejia replied to the Smith and Todd findings (all papers of this debate can be found here). Then, last but not least, Diamond and Sekhon (2005) argue that matching does get it right if it’s done properly, namely if one achieves a really high standard of balance (we’ve already had quite a controversy about balance on this very blog; see, for example, here).

So what does this leave applied researchers with? What do we take away from the LaLonde debate? Does anyone still think that regression (or maximum likelihood methods more generally) and/or 2-stage least squares IV produce reliable causal inferences in real-world observational studies? In all seriousness, where is the validation? This is the million-dollar question, because MLE and IV methods represent the great majority of what is taught and published across the social sciences. Also, can we trust propensity score methods? How about other matching methods? Or is there little hope for causal inference from observational data in any case (in which case I fear we are all out of a job, and the philosophers get the last laugh)? This is not necessarily my personal opinion, but I would be interested to hear people’s opinions. [The evidence is of course not limited to LaLonde; there is ample evidence from other studies with similar findings. For example, see Friedlander and Robins (1995), Fraker and Maynard (1987), Agodini and Dynarski (2004), Wilde and Hollister (2002), and various Rubin papers, to name just a few.]

On the second point, let me play the devil’s advocate again and ask: What can we still learn from the LaLonde data? After all, it’s just one single dataset, the standard errors even for the experimental dataset are large, and once we match in the observational data, why would we even expect to get it right? There is obviously a strong case to be made for selection on unobservables in the case of the job training experiment. So even if we manage to adjust for observed differences, why in the world should we get the estimate right? [Again, this is not my personal opinion, but I have heard a similar contention both at a recent conference and in Stat 214.] Maybe instead of a job training experiment, we should first use experimental and observational data on something like plants or frogs, where hidden bias may (!) be less of a problem (given this is actually the case)? Finally, what alternatives do we have? How would we know what the right answer was if we were not working with a LaLonde-esque framework? Again, I would be interested in everybody’s opinion on this point.

Posted by James Greiner at 6:14 AM

December 5, 2005

Anchoring Vignettes (II)

Sebastian Bauhoff

In my last post I mentioned how differences in expectations and norms could affect self-rated responses in surveys. One fix is to use anchoring vignettes that let the interviewer control the context against which ratings are made.

For example, in a 2002 paper on the use of vignettes in health research, Salomon, Tandon and Murray ask respondents to rank their own difficulty in mobility on a scale from 'no difficulty' to 'extreme difficulty'. Then they let respondents apply the same scale to some hypothetical persons using descriptions like these:

"Paul is an active athlete who runs long distances of 20km twice a week and plays soccer with no problems."

"Mary has no problems walking, running or using her hands, arms, and legs. She jogs 4km twice a week."

Using the difference in how people assess these controlled scenarios, one can adjust the rating of people's own health. Doing this across or within various populations then allows one to examine systematic differences across groups. These vignettes have been used in recent World Health Surveys in a number of countries.

King, Murray, Salomon and Tandon introduced the vignettes approach and used the measured differences to correct responses to self-rated questions on political efficacy. The idea is that applying the vignettes to a sub-sample is cheap and sufficient to understand systematic differences in self-reports. Their methods are laid out in the paper, but the results show how much difference the vignettes method can make: instead of suggesting that there is a higher level of political efficacy in China than in Mexico (as self-reports would indicate), the vignette method shows the exact opposite because the Chinese have lower standards for efficacy and thus understand the scale differently.
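To make the adjustment idea concrete, here is a minimal sketch of one simple way to recode a self-rating relative to the same respondent's vignette ratings. It is an illustration only, not necessarily the procedure used in the paper: the function name `relative_rank`, the 1-to-5 scale, and the two respondents are all made up, and the sketch assumes each respondent rates the vignettes in a strictly increasing order.

```python
def relative_rank(self_rating, vignette_ratings):
    """Recode a self-rating relative to one respondent's own vignette ratings.

    vignette_ratings is listed from the mildest to the most severe vignette and
    must be strictly increasing in this sketch (no ties, no order violations).
    With J vignettes the result runs from 1 (below every vignette) to 2J + 1
    (above every vignette); even values mean "tied with a vignette".
    """
    if any(a >= b for a, b in zip(vignette_ratings, vignette_ratings[1:])):
        raise ValueError("sketch assumes strictly increasing vignette ratings")
    rank = 1
    for z in vignette_ratings:
        if self_rating < z:
            return rank
        if self_rating == z:
            return rank + 1
        rank += 2
    return rank


# Two hypothetical respondents give the same raw self-rating of 3 on a 1-5 scale
# but use the scale differently when rating the same two vignettes.
respondent_a = relative_rank(3, [2, 4])  # places self between the two vignettes
respondent_b = relative_rank(3, [4, 5])  # places self below both vignettes
print(respondent_a, respondent_b)
```

The point of the recoding is that identical raw answers can land in different places once each person's own use of the scale is taken into account.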

Intuitively that's what we do all the time: once you have talked to enough Europeans and Americans about their (and other people's) well-being, you use your mental model to adjust responses and stop taking the European's minor complaints too seriously. Using this insight in survey-based research can make a huge difference too.

Posted by James Greiner at 6:41 AM

December 2, 2005

Questions about Free Software

Jim Greiner

This past spring at Harvard, a group of students from a variety of academic disciplines agitated for a course in C, C++, and R focusing on implementing iterative statistical algorithms such as EM, Gibbs sampling, and Metropolis-Hastings. The result was an informal summer class sponsored by IQSS and taught by recent Department of Statistics graduate Gopi Goswami. Professor Goswami created (from scratch) class notes, problem sets, and sample programs, and compiled lists of web links and other useful materials. Course participants came from, among other places, Statistics, Biostatistics, Government, Japanese Studies, the Medical School, the Kennedy School, and Health Policy. For those interested in the lecture slides and other materials Professor Goswami compiled, the link is here. Principal among the subjects taught in the course was how to marry R's data-processing and display capabilities to an iterative inferential engine (try saying that phrase quickly three times) such as an EM or a Gibbs, with the latter written in C or C++ so as to increase (vastly) the speed of runs. In other words, we learned how to have R do the front end (data manipulation, data formatting) and back end (analysis of results, graphics) of an analysis while letting a faster language do the hard work in the middle.

The course both demonstrates and facilitates a growing trend in the quantitative social sciences toward making open-source software stemming from scholarly publications freely available to the academic community. Two examples from the ever-expanding field of ecological inference are Gary King's EI program, based on a truncated bivariate normal model and implemented in GAUSS, and Kosuke Imai and Ying Lu's Dirichlet-process-based model, implemented with an R-C interface.

The trend toward freely available, model-specific software has obvious potential upsides. Previously written code can save the time of a user interested in applying the model. Moreover, if the code is used often enough and potential bugs are reported and fixed, the software may become better than what a potential user could write on his or her own. After all, few of us interested in answers to real-world issues want to spend the rest of our lives coding in C.

Nevertheless, I confess to a certain amount of apprehension. For me at least, freely available, model-specific software provides a temptation to use models I do not fully understand. Relatedly, I often think that I do understand a model fully, that I grasp all of its strengths and weaknesses, only to discover otherwise when I sit down to program it. Finally, oversight, hubris, or a desire to make accompanying documentation readable may cause the author of the software not to describe fully details of implementation or compromises made therein. Thus, while I am excited by the possibilities freely available social science software holds, I worry about the potential for misuse as well.

Posted by James Greiner at 6:00 AM

November 23, 2005

AIDS And African Economies

Eric Werker (guest author)

I enjoyed the chance to present a work in progress that attempts to measure the impact of AIDS on the economies and populations in Africa at the Applied Statistics Workshop on Wednesday, November 9. Given the possibility for some omitted variable to influence both the national AIDS rate and economic performance or some other outcome variable, I chose to pursue an instrumental variable strategy using variations in the male circumcision rate (which the bulk of the medical literature on this subject believes to have a causal impact on the spread of HIV/AIDS). Comments from the audience were useful and illuminating, and the debate was most interesting around potential violations of the exclusion restriction as well as the use of 2SLS in a small sample setting.

(Blogger's note: For more on this talk, see here and here.)

Posted by James Greiner at 5:47 AM

November 22, 2005

Experts and Trials III: More Noise

John Friedman

In my previous two posts here and here, I discussed some of the game-theoretic reasons why lawyers' choice of experts in cases might only add noise to the process. In this post, I will draw on my own experience on a jury, evaluating expert witnesses, to speak to further pitfalls in our system.

First, some background on my case: I was on a jury for a medical malpractice trial, essentially deciding whether a tumor, which later killed the patient, should have been spotted on an earlier X-ray. The "standard of care" to which we were to hold the doctors in question was a completely relative metric: Did the doctors provide the level of care "expected" from the "ordinary" practicing radiologist? Predictably, radiologists testified for both the plaintiff and the defense, each claiming that it was obvious that the defendants violated/met the relevant standard of care.

My position, as might be expected given my earlier posts, was that these two experts, on net, provided very little information on the culpability of the defendants. For all I knew, 99% of qualified doctors could have believed these doctors were negligent, or not negligent - how would I ever know? Since my prior was uninformative in this case, I had no choice but to find for the defendants for lack of evidence in either direction.

My fellow jurors, however, had far stronger opinions. Many tended to believe or disbelieve an expert witness for irrelevant reasons. For instance, the physical attractiveness, speech pattern, and general "likeability" played a great role. Furthermore, the experts usually made or lost ground on their ability to explain the basics of the science underlying the issue at hand - the mechanics of an X-ray, for instance - to the jury. Of course, these basics were not in dispute by any party in the case. And, as any student at Harvard University knows, a witness's ability to clearly and succinctly explain the basics need not be related at all to her expertise in the field! That these facts influence juries should be of no surprise to anyone familiar with trials; the existence of an entire industry of "jury consultants," the legal equivalent of marketing professionals, should be evidence enough that these issues of presentation matter a great deal.

Finally, even after the experts presented their cases, the priors of some jurors seemed to greatly affect their opinions of the case. Though jurors are screened for such biases, the test cannot be perfect. I often found jurors relating personal experiences with radiologists as evidence for one side or another. Given my arguments above about the lack of information from experts, perhaps it is not surprising that priors mattered as they did, but this seemed to further add noise into the process.

In the end, I supported my jury's decision in this case. But I could not help feeling that it was simply by random chance, by a peculiar confluence of misinterpretation and biases, that we had reached the right decision.

Posted by James Greiner at 4:03 AM

November 16, 2005

Experts and Trials II: True Opinion & Slant

John Friedman

I ended my last post by showing, in the context of the brief model I sketched, what the optimal outcome would look like. In practice, though, the court suffers from two problems.

First, it cannot conduct a broad survey, but must instead rely on those testimonies presented in court. Each side will offer an expert whose "true opinion" is as supportive of their argument as possible, regardless of whether that expert is at all representative of commonly accepted views in the field. Second, the court cannot distinguish between an expert's true opinion and her "slant." Experts probably suffer some cost for slanting their views away from their true opinions, so one should not expect most slants to be large. But the legal parties will look to pick experts who suffer as little a cost from slanting as possible, so that, in equilibrium, the slants could be quite large.

Given these strategies from the legal parties, what does the court see? Each side presents an expert (or slate of experts) with the most favorable combination of "true opinion" and "slant." Even if the court could disentangle the two components of testimony, the court would only see the endpoints of the distribution of "true opinions" among the potential pool of experts. But since they cannot even distinguish the slant, the court actually sees only a noisy signal of the extremes of the distribution.

Finally, I have already argued that the experts chosen will be those most able (or willing) to slant their opinions, so that the ratio of signal to noise – or of "true opinion" to "slant" - for the experts will be very low, in expectation. When the court performs the required signal extraction problem, very little signal remains. Because of the optimizing action of each party, the court will draw very little inference from any of the witnesses in many cases, ironically nullifying the effect of the efforts of the experts. No one deviates from this strategy, though; if one side presented a more representative expert, while the other played the old strategy, the evidence would appear lopsided.

I noted in my last post that the "first-best," or socially optimal solution, would be for the court to collect a representative sample of the opinions of experts for their decision. Even when the parties present their own experts, each side would be better off if they could somehow commit not to use "slant" in their expert's opinions, since the decision in the case would be less noisy. But the structure of the problem makes such an agreement impossible.

Jim is correct when he remarks that, given the adversarial nature of the legal system, expert testimony could not happen any other way. We should not celebrate this fact, though; rather, we should mourn it. We are stuck in a terrible equilibrium.

Posted by James Greiner at 4:59 AM

November 15, 2005

Spatial Error

Sebastian Bauhoff

This entry follows up on earlier ones here and here on spatial statistics and spatial lag, and discusses another consequence of spatial dependence. Spatial error autocorrelation arises if error terms are correlated across observations, i.e., the error of an observation affects the errors of its neighbors. It is similar to serial correlation in time series analysis and leaves OLS coefficients unbiased but renders them inefficient. Because it's such a bothersome problem, spatial error is also called "nuisance dependence in the error."

There are a number of instances in which spatial error can arise. For example, similar to what can happen in time series, a source of correlation may come from unmeasured variables that are related through space. Correlation can also arise from aggregation of spatially correlated variables and systematic measurement error.

So what to do if there is good reason to believe that there is spatial error? Maybe the most famous test is Moran's I, which is based on the regression residuals and is also related to Moran's scatterplot of residuals, which can be used to spot the problem graphically. There are other statistics like Lagrange multiplier and likelihood ratio tests, and each of them has a different way of getting at the same problem. If there is good reason to believe that spatial error is a problem, then the way forward is either to model the error directly or to use autoregressive methods.
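For readers who want to see what Moran's I actually computes, here is a minimal sketch in plain Python using the standard textbook formula: the weighted cross-products of mean-centered residuals between neighbors, scaled by the residual variance. The helper names (`morans_i`, `chain_weights`) and the four-unit "line of neighbors" example are made up for illustration, and the sketch gives only the statistic, not its sampling distribution or a test.

```python
def morans_i(residuals, weights):
    """Moran's I for residuals and an n x n spatial weights matrix.

    weights[i][j] > 0 means units i and j are treated as neighbors.
    """
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    s0 = sum(sum(row) for row in weights)  # total weight
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    return (n / s0) * cross / sum(d * d for d in dev)


def chain_weights(n):
    """Hypothetical layout: units sit on a line and adjacent units are neighbors."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


w = chain_weights(4)
clustered = morans_i([1.0, 1.0, -1.0, -1.0], w)    # similar residuals sit together
alternating = morans_i([1.0, -1.0, 1.0, -1.0], w)  # neighbors have opposite signs
print(clustered, alternating)
```

Positive values indicate that neighboring residuals tend to look alike, which is the pattern spatial error would produce; negative values indicate that neighbors tend to differ.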

In any case it's probably a good idea to assess whether spatial error might apply to your research problem. Because of its effect on OLS, there might be a better way to estimate the quantity you are interested in, and the results might improve quite a bit.

Posted by James Greiner at 3:54 AM

November 10, 2005

Experts and Trials I: Game Theory

John Friedman

No sooner had the recent posts on this blog by Jim Greiner about the use of statistics and expert witnesses in trials (see here and here, as well as yesterday's post) piqued my curiosity than I was empanelled on a jury for a 5-day medical malpractice trial. This gave me ample time to think through some of the issues of statistics and the law. I will spend my next posts discussing these issues from three different perspectives: the game-theoretic, the experiential, and the historical.

I first approach this problem from a game-theoretic framework. In Jim's second post, he spoke about how, in our adversarial legal system, an expert for one side tends to interpret the facts in the way most favorable for that side, without compromising her "academic integrity." He then listed several reasons why this might actually be best for the system. I tend to disagree on this final point; instead, I believe the adversarial nature of the system pushes us into a very bad situation.

To give my argument focus, we must first pin down the concept of "equilibrium." An equilibrium of a game is a strategy for each player such that, given the other players' strategies, the player is maximizing her return from the game. In this case, the game is relatively simple: Two parties to a lawsuit are the players, each with a set of expert testimonials interpreting the relevant statistics in the case (which makes up the strategy). We can represent the net message from the expert testimony for each side as a number on the real line: The more positive the number, the more pro-plaintiff the testimony.

We must make some simplifying assumptions to analyze this problem. Let us assume that the testimony for each side comprises two components: the "true opinion" and the "slant." When added together, "true opinion" + "slant" = testimony. (For simplicity, let us assume that these numbers are the actual impact of the testimony. Thus, if a testimony seems too biased and is discounted, the true number would not lie far from zero). In an ideal world, the court (either judge or jury) would survey the "true opinions" of many experts in the field; if enough opinions were positive, the case would go for the plaintiff. Economists often refer to such a case as the "first-best," the socially optimal outcome.

Many games do not yield the socially optimal outcome, though. Both parties can even be worse off playing the equilibrium strategies than if each played some other strategy, despite the fact that each party maximizes her payoff given the other player's strategy. A classic example of such a situation is the "Prisoner's Dilemma." In my next post, I will explore how, in this legal setting, exactly this tragedy occurs.
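For readers who have not seen the Prisoner's Dilemma worked through, the definition of equilibrium above can be checked mechanically. The sketch below searches a two-player game for pure-strategy equilibria by testing whether either player could gain from a unilateral deviation. The payoff numbers are illustrative assumptions (the usual textbook pattern), and the function name `pure_nash_equilibria` is made up for this example.

```python
from itertools import product


def pure_nash_equilibria(payoffs):
    """Return all pure-strategy equilibria of a two-player game.

    payoffs[(a, b)] = (payoff to player 1, payoff to player 2) when
    player 1 plays a and player 2 plays b.
    """
    actions1 = sorted({a for a, _ in payoffs})
    actions2 = sorted({b for _, b in payoffs})
    equilibria = []
    for a, b in product(actions1, actions2):
        u1, u2 = payoffs[(a, b)]
        # Each player must be doing at least as well as under any deviation,
        # holding the other player's strategy fixed.
        best1 = all(payoffs[(alt, b)][0] <= u1 for alt in actions1)
        best2 = all(payoffs[(a, alt)][1] <= u2 for alt in actions2)
        if best1 and best2:
            equilibria.append((a, b))
    return equilibria


# Assumed payoffs: higher is better for the player receiving it.
PRISONERS_DILEMMA = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"): (-3, 0),
    ("defect", "cooperate"): (0, -3),
    ("defect", "defect"): (-2, -2),
}

equilibria = pure_nash_equilibria(PRISONERS_DILEMMA)
print(equilibria)
```

With these payoffs the only equilibrium has both players defecting, even though both would be better off if both cooperated, which is the sense in which equilibrium play can leave everyone worse off.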

Posted by James Greiner at 5:50 AM

November 7, 2005

Evolutionary Thoughts on Evolutionary Monte Carlo

Gopi Goswami

Thanks a lot to Mike Kellerman for inviting me over for the talk on Oct 26, 05 at the IQSS (see here for details). I really enjoyed giving the talk and getting interesting comments and questions from the audience. In particular, Prof. Donald Rubin, Prof. Gary King and others made important contributions which I really appreciate. Prof. Kevin Quinn gave me some excellent suggestions on how to improve the structure of the talk which I think will turn out to be very helpful in the near future when I prepare for the job market. In fact, along those lines, if anyone may have any inputs/suggestions/comments on the presentation please feel free to send them to me at goswami@stat.harvard.edu.

Here are some afterthoughts on the talk. The PBC (Population Based Clustering) moves I presented, namely, SCSC:TWO-NEW, SCSC:ONE-NEW and SCRC, are new, and they are very specific to the sampling-based clustering problem (which lives on a discrete space). I haven't been successful in devising similar moves for general sampling problems on a continuous space. In the Evolutionary Monte Carlo (EMC) literature these types of moves are also called "cross-over" moves because they take two chromosomes (or states of two chains), which are called the two "parents," and implement some cross-over type operation on the parents to produce two chromosomes (or proposed states of two chains), which are called "children."

The main motivation behind devising the above mentioned moves, as I mentioned in the talk, is that we were looking for moves which propose to update "more than one coordinate but not too many" at a time. The Gibbs sampler proposes to update one coordinate at a time. This is the main reason why Jain and Neal ("A Split-Merge Markov Chain Monte Carlo Procedure for the Dirichlet Process Mixture Model," Journal of Computational and Graphical Statistics, vol. 13, no. 1, pp. 158-182, 2004) proposed their sampler, which updates more than one coordinate at a time, but it does so for too many of them. To counter this problem we proposed the above mentioned PBC moves, which are a kind of middle ground between the Gibbs sampler and the Jain-Neal sampler.

The other main issue addressed by the two moves, namely, SCSC:ONE-NEW and SCRC, is that "they produce only one new child" after "cross-over." To expand on this, we note that since all the PBC moves, the mentioned ones included, are Metropolis-Hastings type moves, two "children" have to be produced to replace the parents so as to maintain reversibility or detailed balance. But the children produced by two good parents are usually not good enough, and one does not want to throw away some good parent by chance. Thus, it has long been desired to design some moves that both can take advantage of the "cross-over" strategy and can keep some good parent. Our new moves are the first such in the literature.

Lastly, some members of the audience in the talk were worried about the temperature placement problem in the parallel tempering set up. Prof. Jun Liu and I proposed a first cut solution to the problem which solves the problem in two steps. First, we determine the highest temperature to be used in the ladder, namely, $t_1 = \tau_{max}$. Next, we look at the length and the structure of the ladder, i.e., the placement of the intermediate temperatures within the range $(\tau_{min}, \tau_{max})$. You can find the details in the paper at my website by clicking on "On Real-Parameter Evolutionary Monte Carlo Algorithm (ps file) [submitted]".

Posted by James Greiner at 5:42 AM

November 3, 2005

Expansion of Economics

John Friedman

In my last post, I wrote about the methodological identity of economics and some of the corresponding advantages. But perhaps the greatest benefit to economists from this definition of the discipline is the great range of subjects on which one can work.

There are, of course, areas of inquiry traditionally dominated by economists – monetary policy, or the profit-maximizing activities of companies, to name a few – and most people connect economics, as a field, to these subjects. Increasingly, though, economists are venturing further afield. Steven Levitt’s best-selling book, Freakonomics, exemplifies this trend, using the tools of economics to investigate corruption in sumo wrestling, cheating in Chicago schools, and ethnic names, to name a few. While Levitt currently sits farther from the mainstream than most economists, his work appears to be not a randomly scattered shot but rather the vanguard of a new generation of scholars.

What are the consequences of this expansion of economics across the social sciences? The increasing incidence of economists working on problems traditionally associated with other fields will, no doubt, create some conflict in the coming years. No local baron, ruling a fiefdom of land or knowledge, savors a challenge over his turf. And the “imperial” economists, many of whom view other fields as weak and primed for colonization, will surely disrespect the vast contributions of non-economists to date. But despite the inevitable (but still unfortunate) conflicts of ego, the majority of these interactions should be not only of great benefit to the world but also a wondrous sight to see. Nothing in academia is quite so spectacular as the collision of two great points of view, obliterating long-held dogmas and, in the heat of debate, forging new paradigms for generations to come.

As a young economist, I look forward to following (and even contributing to) these great arguments to come. And I hope that those of us writing this blog, viewing the questions in social science from diverse perspectives, can give you a look at the current state of these debates.

Posted by James Greiner at 4:00 AM

November 2, 2005

Human Statistical Learning

Amy Perfors

If it's of interest, I will be blogging every so often about the numerous ways that humans seem to be remarkably adept statistical learners. This is a big question in cognitive science for two reasons. First, statistical learning looks like a promising approach to help answer the open question of how people learn as well and as quickly as they do. Second, better understanding how humans use statistical learning may be a good way to improve our statistical models in general, or at least investigate in what ways they might be applied to real data.

One of the more impressive demonstrations of human statistical learning is in the area usually called "implicit grammar learning." In this paradigm, people are presented with strings of nonsense syllables like "bo ti lo fa" in a continuous stream for a minute or two. One of the first examples of this paradigm, by Saffran et al., studied word segmentation -- for example, being able to tell that "the" and "bird" are two separate words, rather than guessing it is "thebird" or "theb" and "ird." If you ever listen to a foreign language, you realize that word boundaries aren't signaled by pauses, which is a huge problem if you're trying to learn the words. Anyway, in the study, syllables occurred in groups of three, thus making "words" like botifa or gikare. As in natural language, there was no pause between words; the only cues to word segmentation were the different transition probabilities between syllables -- that is, "ti" might be always followed by "fa" but "fa" could be followed by any of the first syllables of any other words. Surprisingly, people can pick up on these subtleties: adults who first heard a continuous stream of this "speech" were then able to identify which three-syllable items they heard were "words" or "nonwords" in the "language" they had just heard. That is, the people could correctly say that "botifa" was a word, but "fagika" wasn't, at an above chance level. Since the only cues to this information were in the transition probabilities, people must have been calculating those probabilities implicitly (none had the conscious sense they were doing much of anything). Most surprisingly of all, the same researchers demonstrated in a follow-up study that even 8-month-old infants can use these transitional probabilities as cues to word segmentation. Work like this has led many to believe that statistical learning might be one of the most powerful resources infants use during the difficult problem of language learning.

From the modeling perspective, this result can be captured by Markov models in which the learner keeps track of the string of syllables and the transition probabilities between them, updating the transition probabilities as they hear more data. More recent work has begun to investigate whether humans are capable of statistical learning that cannot be captured by a Markov model -- that is, learning nonadjacent dependencies (dependencies between syllables that do not directly follow each other) in a stream of speech. For instance, papers by Gomez et al. and Onnis et al. provide evidence that discovering even nonadjacent dependencies is possible through statistical learning, as long as the variability of the intervening items is low or high enough. This has obvious implications for how statistical learning might help in acquiring grammar (in which many dependencies are nonadjacent), but it also opens up new modeling issues, since simple Markov models are no longer applicable. What more sophisticated statistical and computational tools are necessary in order to capture our own unconscious, amazing abilities?
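The simple (adjacent) Markov bookkeeping described above can be sketched in a few lines. This is only an illustration of counting syllable-to-syllable transitions, not the model from any of the cited papers: the three-word lexicon, the stream length, and the function names are assumptions made up for the example.

```python
import random
from collections import Counter, defaultdict


def transition_probabilities(syllables):
    """Estimate P(next syllable | current syllable) from adjacent pairs."""
    pair_counts = defaultdict(Counter)
    for current, following in zip(syllables, syllables[1:]):
        pair_counts[current][following] += 1
    return {
        s: {t: c / sum(nexts.values()) for t, c in nexts.items()}
        for s, nexts in pair_counts.items()
    }


def make_stream(words, n_words, seed=0):
    """Concatenate randomly chosen words with no pauses between them."""
    rng = random.Random(seed)
    stream = []
    for _ in range(n_words):
        stream.extend(rng.choice(words))
    return stream


# Made-up lexicon of three-syllable "words".
WORDS = [("bo", "ti", "fa"), ("gi", "ka", "re"), ("du", "lo", "mi")]
stream = make_stream(WORDS, 300)
probs = transition_probabilities(stream)

print(probs["bo"])  # within a word: the next syllable is fully predictable
print(probs["fa"])  # at a word boundary: spread over the possible word onsets
```

Dips in transition probability mark likely word boundaries. A learner tracking nonadjacent dependencies would need to count something other than adjacent pairs, which is where this sketch stops being adequate.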

Posted by James Greiner at 4:20 AM

November 1, 2005

Judge Alito & Statistics

Jim Greiner

Social science statistics is everywhere. So is law. And both are tangled up with each other. I was forcefully reminded of these facts when my wife pointed out an article on Salon.com about an opinion Samuel Alito (as of yesterday, a nominee to the Supreme Court) wrote while a judge on the United States Court of Appeals for the Third Circuit in a case called Riley v. Taylor. The facts of the specific case, which concerned the potential use of race in peremptory challenges in a death penalty trial, are less important than Judge Alito's approach to statistics and the burden of proof.

Schematically, the facts of the case follow this pattern: Party A has the burden of proof on an issue concerning race. Party A produces some numbers that look funny, meaning instinctively unlikely in a race-neutral world, but conducts no significance test or other formal statistical analysis. The opposing side, Party B, doesn't respond at all, or if it does respond, it simply points out that a million different factors could explain the funny-looking numbers. Party B does not attempt to show that such innocent factors actually do explain the observed numbers, just that they could, and that Party A has failed to eliminate all such alternative explanations.

Such fact patterns occur over and over again in cases involving employment discrimination, housing discrimination, peremptory challenges, and racial profiling, just to name a few. When discussing them, judges inevitably lament the fact that one side or the other did not conduct a multiple regression analysis, as if that technique would provide all the answers (Judge Alito's Riley opinion is no exception here).

The point is, of course, that how a judge views such cases has almost nothing to do with the facts at bar and everything to do with a judge's priors on the role of race in modern society. For judges who believe that race has little relevance in the thought processes of modern decision makers (employers, landlords, prosecutors, cops), Party A in the above situation must eliminate all potential explanatory factors via (alas) multiple regression in order to meet its burden of production. For judges who believe that race still matters, Party B must respond in the above situation or lose the case. Judge Alito's Riley opinion demonstrates where he stands here.

Is there a middle way? Perhaps. In the above situation, what about requiring some sort of significance test from Party A, but not one that eliminates alternative explanations? In the specific facts of Riley, the number-crunching necessary for "some sort of significance test" is the statistical equivalent of riding a tricycle: a two-by-two hypergeometric with row totals of 71 whites and 8 blacks, column totals of 31 strikes and 48 non-strikes, and an observed value of 8 black strikes yields a p-value of 0.
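For anyone who wants to ride the tricycle, here is a minimal sketch of the calculation using only the Python standard library. It takes the table margins quoted above and computes the one-sided hypergeometric tail probability of seeing at least the observed number of black strikes if strikes were allocated without regard to race. The function and argument names are made up for the example, and the one-sided tail is an assumption about which test is meant.

```python
from math import comb


def hypergeom_upper_tail(black_struck, blacks, whites, strikes):
    """P(at least `black_struck` of the strikes fall on black panel members)
    when `strikes` people are chosen at random from the whole panel."""
    total = blacks + whites
    max_k = min(blacks, strikes)
    favorable = sum(
        comb(blacks, k) * comb(whites, strikes - k)
        for k in range(black_struck, max_k + 1)
    )
    return favorable / comb(total, strikes)


# Margins as quoted in the post: 71 whites, 8 blacks, 31 strikes, 8 black strikes.
p_value = hypergeom_upper_tail(black_struck=8, blacks=8, whites=71, strikes=31)
print(p_value)
```

Because all 8 black panel members were struck, the tail contains a single table, so the p-value is just the probability of that one table.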

Posted by James Greiner at 3:58 AM

October 28, 2005

More Questions About Balance (And No Answers)

Jim Greiner

The recent posts on achieving good balance within matching have stimulated a certain amount of interest. To this debate I offer more questions and, alas, no answers, which are what I'd really like to know. (For what it's worth, I am not doing research in this area. All of my questions are genuine, not rhetorical.)

As I understand it, the genetic algorithm that Diamond and Sekhon favor searches for matches that minimize p-values from hypothesis tests. The subject of the hypothesis tests are the covariates, taken one at a time, and the two-way interactions, also taken one at a time.

My questions:
Is the objective in matching treated and control units to find sets of observations with the same JOINT distribution of the covariates, which is what one would have in a randomized experiment?

If so, do we expect achieving balance in all univariate (i.e. marginal) and two-way distributions to accomplish this goal, given that the marginal distributions of any multidimensional random vector do not determine the joint? On the other hand, if two sets of random vectors have the same joint distribution, would we expect hypothesis tests applied to individual (univariate) covariates or their interactions to achieve p-values of .15 or greater?

Does the dimension of the vector (i.e. the number of covariates) play a role here, in that if we had 20 covariates, we would expect a comparison of individual covariates marginally to produce a few p-values of below .15? Perhaps more broadly, what theory tells us that the genetic algorithm search is actually attempting to do the right thing - and what is it?
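To make the worry about marginals versus joints concrete, here is a small constructed example (a toy, not data from any study). Two groups of units have three binary covariates each; every single covariate and every two-way interaction has exactly the same mean in both groups, yet the joint distributions do not overlap at all. The group labels and the helper name `moments` are made up for the illustration.

```python
from itertools import combinations, product
from math import prod
from statistics import mean


def moments(units, order):
    """Mean of every product of `order` distinct covariates across the units."""
    k = len(units[0])
    return {
        idx: mean(prod(u[i] for i in idx) for u in units)
        for idx in combinations(range(k), order)
    }


cube = list(product([0, 1], repeat=3))
treated = [u for u in cube if sum(u) % 2 == 0]  # even number of ones
control = [u for u in cube if sum(u) % 2 == 1]  # odd number of ones

for order in (1, 2, 3):
    print(order, moments(treated, order) == moments(control, order))
```

Checks on the one-way and two-way moments would report perfect balance here, while the three-way interaction (and the joint distribution itself) differs between the groups.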

A propensity score method has answers to some of these questions, though it raises others. On the plus side, the theorems say that observations with the same propensity score have the same joint (not merely marginal) distribution of the covariates. Thus, if the goal is to replicate a randomized experiment's much-valued ability to produce observations with the same joint covariate distribution, conditioning on the true propensity score will do that. That's the theory that tells us what propensity score matching is attempting to do is the right thing. The problem is, of course, that in any case that matters, we don't know the true propensity scores, and estimation of them raises profound questions about model fit and adequacy. One can check disparities in marginal distributions, but for the reasons stated above, such checks are not really enough. A question for advocates of propensity scores is the following: if propensity score matching is designed to reduce dependence on the substantive model that relates outcomes to covariates, does it do so only by inducing dependence on proper specification of the propensity score model?

For those who would eschew hypothesis tests in assessing balance (see yesterday's post), how does one assess balance? True, one can always reduce the power of any test to reject a null by discarding observations (I have heard that K-S in particular has low power), but any comparison of distributions rests on some set of criteria. Looking at t-scores is a hypothesis test (how else would one decide when the set of scores is too big or too small?). Are hypothesis tests the worst method of assessing balance, except for all of the others?

I have only one suggestion on this subject: whatever method one uses to create matched sets of treated and control groups, after all ordinary checking of marginal distributions is complete, throw something completely wild at the results. For both groups, calculate a fifth moment of covariate one, interact it with a third moment of covariate two and a second moment of covariate three. Do a test and see what happens. If the two groups have the same joint distribution of their covariates . . . .

Posted by James Greiner at 3:19 AM

October 26, 2005

Did You Achieve Balance?! Part II

Jens Hainmueller

Continuing from yesterday's post, another popular way to test balance is to examine standardized differences (SDIFF) between groups (Rosenbaum and Rubin 1985). SDIFF captures the difference in means in the matched samples, scaled by the square root of the average variance in the unmatched groups. This test has been criticized for the lack of formal criteria for judging the size of the standardized bias. Moreover, it may be open to manipulation as one can add observations to the control group in order to decrease variance in the denominator (Smith and Todd 2005).
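For concreteness, here is a minimal Python sketch of a standardized difference for a single covariate, following the verbal definition above. The function name, the scaling by 100, and the age data are hypothetical choices of mine for illustration, and the sketch assumes every treated unit found a match.

```python
from math import sqrt
from statistics import mean, variance

def standardized_difference(treated, comparison, all_controls):
    """Percent standardized difference in means for one covariate.

    Numerator: mean of the treated units minus mean of the comparison group.
    Denominator: square root of the average of the sample variances in the
    original (unmatched) treated and control groups.
    """
    scale = sqrt((variance(treated) + variance(all_controls)) / 2)
    return 100 * (mean(treated) - mean(comparison)) / scale

# Hypothetical ages (illustrative data, not from any study).
treated = [30, 34, 38, 42, 46]
all_controls = [31, 35, 39, 43, 47, 50, 54, 58, 62, 66]
matched_controls = [31, 35, 39, 43, 47]

before = standardized_difference(treated, all_controls, all_controls)
after = standardized_difference(treated, matched_controls, all_controls)
print(f"before matching: {before:.1f}, after matching: {after:.1f}")
```

Because the denominator uses the unmatched groups, it is the same before and after matching, so any change in the statistic comes from the difference in means alone. It also shows where the manipulation worry comes from: padding the control pool changes the denominator without touching the matched comparison.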

Staying in the realm of univariate balance tests, some claim that difference in means tests are insufficient and that Kolmogorov-Smirnov (KS) tests are needed to non-parametrically test for the equality of distributions (Diamond and Sekhon 2005). These KS tests need to be bootstrapped, by the way, to yield correct coverage in the presence of point masses in the distributions of the covariates (Abadie 2002). Again, these tests would substantially increase the balance hurdle. Are they necessary for reliable causal inference?
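To make the bootstrapped KS idea concrete, here is a rough Python sketch. It computes the two-sample KS statistic directly from the empirical CDFs and approximates a p-value by resampling both groups from the pooled data. This is a simplified pooled bootstrap of my own and not necessarily the exact procedure in Abadie (2002); the function names, the number of replications, and the data are all assumptions.

```python
import random

def ks_statistic(x, y):
    """Largest absolute gap between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    i = j = 0
    d = 0.0
    for v in sorted(set(xs) | set(ys)):
        # Advance past every observation <= v, so ties are handled together.
        while i < len(xs) and xs[i] <= v:
            i += 1
        while j < len(ys) and ys[j] <= v:
            j += 1
        d = max(d, abs(i / len(xs) - j / len(ys)))
    return d

def bootstrap_ks_pvalue(x, y, n_boot=2000, seed=0):
    """Share of pooled-bootstrap KS statistics at least as large as observed."""
    rng = random.Random(seed)
    observed = ks_statistic(x, y)
    pooled = list(x) + list(y)
    exceed = 0
    for _ in range(n_boot):
        # Under the null both groups come from one distribution, so both
        # resamples are drawn with replacement from the pooled sample.
        bx = [rng.choice(pooled) for _ in x]
        by = [rng.choice(pooled) for _ in y]
        if ks_statistic(bx, by) >= observed:
            exceed += 1
    return observed, exceed / n_boot

# Hypothetical covariate with a point mass at zero (e.g. prior earnings).
treated = [0] * 10 + [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
control = [0] * 3 + list(range(1, 18))
d, p = bootstrap_ks_pvalue(treated, control)
print(f"KS statistic: {d:.3f}, bootstrap p-value: {p:.3f}")
```

The resampling step is where the point mass is respected: bootstrap samples inherit the ties at zero, which the usual continuous-distribution critical values ignore.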

Apart from univariate tests there are also some multivariate balance tests floating around in the literature such as the Hotelling T^2 test of the joint null of equal means of all covariates, multivariate (bootstrapped) Kolmogorov-Smirnov (KS) and Chi-Square null deviance tests based on the estimated assignment probabilities, as well as various regression-based tests for joint insignificance, etc. Which of these tests is preferable in what situation? What is the relationship between uni- and multivariate balance?

Last but not least, there is the thorny question of significance levels. Is a p-value of 0.10, let's say against the null of equality of means, high enough for satisfactory balance? Is .05 permissible? There is evidence that conventional significance standards are too lenient to obtain reliable causal inference in the canonical LaLonde data set (Diamond and Sekhon 2005).

That is a lot of questions, and I do not know the answers to them. The current lack of a scholarly standard for covariate balance strikes me as troubling, because balance affects the quality of the causal inferences we draw. I think it is important to bring the balance issue to the forefront of the matching debate. That is why Jas Sekhon and I are currently working on a paper on this topic. Suppose you are reviewing a matching article. What does it take to convince you that the authors "achieved balance"? Please feel cordially invited to join the debate.

Posted by James Greiner at 4:08 AM

October 25, 2005

Did You Achieve Balance?! Part I

Jens Hainmueller

There exists a growing consensus in the causal inference literature that when it comes to bias adjustment under selection on observables, matching methods dominate ordinary regression (esp. when discrepancies between groups are large). But how do we judge the quality of a matching? My professors tell me: "We want good balance." Sounds great, so I thought at first. Reading more matching articles, however, I soon became somewhat startled by the scholarly disagreement about what actually constitutes "good" balance in observational studies. Despite the fact that matching methods are now widely used all across the social sciences, we still lack shared standards for covariate balance: Which tests should be used in what type of data? What are their statistical properties and how do they compare to each other? And how much balance is good enough?

From reading this literature (sincere apologies if I have missed something relevant), it seems to me that most people agree that paired t-tests for differences in means are obligatory. T-tests are useful because matching by construction produces matched pairs. But should we test by comparing whole groups (treated vs. matched-untreated) or within propensity score ("PS") subclasses? A problem with the latter may be that the choice of intervals can be arbitrary, which is critical as interval width affects the power of the test (Smith and Todd 2005).

Moreover, which covariates should we t-test balance on? At least all that are included in the matching (right?), but how about other moments, the full set of interactions and higher-order terms, etc? The latter seems helpful to minimize bias but is done once in a blue moon (at least in the papers that I encountered). Most authors avoid these additional tests since they exacerbate common support problems and substantially raise the hurdle for obtaining balance.

Finally, should we t-test balance on the PS and/or the covariates orthogonalized to the PS? How do we deal with the estimation uncertainty in these variables? And what does it mean -- as happens sometimes in practice -- to have remaining imbalance on the PS while all covariates are balanced?

Stand by for part II of this post tomorrow.

Posted by James Greiner at 5:00 AM

October 19, 2005

Social Science and Litigation, Part II

Jim Greiner

Professor Kousser’s 1984 article on objectivity in expert testimony, which I first introduced to the blog here, raises fundamental questions about the role of expert witnesses in litigation. Among those questions is the following: when presenting conclusions to a court, how much are expert witnesses entitled to rely on the adversarial process that is the foundation of lawsuits? Some experts appear to believe that their job is to present the best statistical, engineering, chemical, or whatever, case for their sides. Of course, they would not perjure themselves. Still, such witnesses do not attempt to provide a balanced look at the factual information to be evaluated; rather, they focus on demonstrating how the relevant data can be interpreted in favor of the parties retaining them. After all, the opposing side has its own lawyers and, ordinarily, its own experts who (surely) are doing the same thing.

To make matters more concrete, I provide the following simplified example. My colleagues and I retained a quantitative expert in a redistricting case to measure the partisan bias of several proposed redistricting plans. We used a measure of bias that assigned a score to each plan; a score of zero meant no bias, while a score of two meant roughly that the plan would give one party two “extra” seats. The (litigation) difficulty we ran into was that the scores did not appear to distinguish the plan we favored from the one the other side proposed. The bias in our plan was, say, .03, while that of the other side was something like .15. Thus, the difference in bias between the two plans was approximately one tenth of one seat. But our expert, at our prompting, presented the results differently: he emphasized the other side’s plan was five times more biased than our own.
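The two framings come from the same pair of numbers. A short Python illustration, using the approximate scores quoted above (the variable names are mine):

```python
# Approximate bias scores from the example above, measured in seats.
our_plan_bias = 0.03
their_plan_bias = 0.15

# Absolute framing: how many "extra" seats separate the two plans?
difference_in_seats = their_plan_bias - our_plan_bias

# Relative framing: how many times more biased is the other side's plan?
ratio = their_plan_bias / our_plan_bias

print(f"difference: {difference_in_seats:.2f} seats")
print(f"ratio: {ratio:.1f} times")
```

The difference answers a question about seats; the ratio answers a question about the scores themselves, and it can be large even when both scores are close to zero.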

Before dismissing this story, and the view of the expert as an extension of trial counsel, with a snort and a shake of the head about the lack of ethics in modern society, consider how the structure of the litigation process favors such choices. At trial, an expert (just like any other witness) is not allowed to relate his or her views directly to the court. Rather, the expert speaks to the judge or jury only in response to questions from lawyers under the duty to advocate their respective clients' cases, that is, the duty NOT to be neutral. Before trial, an expert who has consulted with a party to litigation may not be retained by the opposing party. And trial counsel, not the witness, decides whether the expert speaks to the court at all.

There are good reasons for all of these rules. The rule requiring testimony to come in response to questions from an attorney prevents witnesses from testifying about subjects deemed inadmissible (opposing counsel can object between question and answer). With respect to the prohibition on consulting with one side and then working for the other, experts who have consulted for Side A learn about Side A’s case in a way that Side B might pay handsomely to discover. But if, as many in the legal profession appear to believe, expert witnesses really are whores, could it be otherwise as litigation is presently structured?

Posted by James Greiner at 6:04 AM

October 18, 2005

A Social Science of Architecture

Gary King

After eight years of learning something about architecture (from Harry Cobb and his team) and extensive programmatic planning, the Institute for Quantitative Social Science this semester moves into the new Center for Government and International Studies buildings. Our official address is the Third Floor of 1737 Cambridge Street (the design is vaguely reminiscent of the bridge of the Starship Enterprise), although we also occupy some of the other floors and some of the building across the street. It is not really finished yet, but it is a terrific facility, with floor to ceiling windows in most offices, a wonderful seminar room for our Applied Statistics Workshop, and many other useful features. Perhaps even more remarkably, everyone seems to love it (Congratulations Harry!).

One thing I learned during this long process is that the field of architecture has the best science, engineering, and art, but very little modern social scientific analysis. Yet, social science, quantitative social science in particular, could greatly help architecture achieve its goals, I think. Ultimately the goal of this particular $100M-plus building, and of most buildings built by universities, is not only to create beautiful surroundings but also to increase the amount of knowledge created, disseminated, and preserved (my summary of the purpose of modern research universities). So do not limit yourself to asking how a building makes you feel, what architectural critics might think, how it fits in with the style of other buildings on campus, or whether your office is to your liking. Ask instead, or in addition, whether the building increases the units of knowledge created, disseminated, and preserved more than some other building or some other potential use for the money. This strikes me as the central question to be answered by those who decide what buildings to build, and yet the systematic scientific basis for this decision is almost nonexistent.

As such, some systematic data collection could have a considerable impact on this field. Do corridors or suites make the faculty and students produce and learn more? Does vertical circulation work as well as horizontal? Should we put faculty in close proximity to others working on the same projects or should we maximize interdisciplinary adjacencies? Which types of floor plans increase interaction? Which types of interaction produce the most knowledge created, disseminated, and preserved? Do we want to build buildings that encourage doors to be kept open, so as to make the faculty seem approachable, or should we try to keep doors closed so that they can get work done? In this field as in most others, a great deal can be learned by directly measuring the relevant outcome variable; in architecture, quite remarkably, this has only rarely been attempted.

Of course it is done all the time via qualitative judgments, but in almost every field of science where a sufficient fraction of information can be quantified, statistical analysis beats human judgment. There is no reason to think that the same kind of statistical science wouldn't also create enormous advances here too.

I have heard of a couple of isolated academic works on this subject, but we're talking about some of the most important and expensive decisions universities make (and among the biggest decisions businesses and many other institutions make, too). There should be an entire subfield devoted to the subject. All it would take is some data collection and analysis. Outcome measures could include, for example, faculty citation rates, publications, awards, grants, and departmental rankings, along with student recruitment, retention, graduation, and placement rates. The key treatment variables would include various information on the types of buildings and architectural design. Random assignment seems infeasible, but relatively exogenous features might include departmental moves or city and town building restrictions. Universities that allow faculty the choice of buildings could also provide useful revealed preference measures. I would think that a few enterprising scholars on this path could have an enormous impact both in creating a new academic subfield and in improving a vitally important set of university (and societal) decisions.

In the interim, we'll enjoy the new buildings and hope they have a positive impact.

Posted by James Greiner at 5:28 AM

October 17, 2005

Ideal Points

Michael Kellermann

One of the goals of this blog is to promote dialog between people working in different social science disciplines. As part of that, we have been posting reports from the Political Methodology conference in Tallahassee. Of course, even though we may all speak the same statistical language, we often speak it with distinct accents; similar concepts and methods often go by different names in different fields. For example, it turns out that estimating the ideal points of political actors is similar in many ways to the problem of estimating the difficulty of questions on standardized tests, a commonality that has only been exploited in the last few years.

First things first, however; what exactly is an ideal point? People have long thought about politics in spatial terms: "left" and "right" have been used to describe political preferences since at least the French Revolution, when royalists sat on the right and radicals on the left in the Legislative Assembly. Ideal point models attempt to estimate the position of each legislator on the left-right or other dimensions using the votes that they cast on legislation. Basically, the models assume that a legislator will vote in favor of a motion if it moves policy outcomes closer to their most preferred policy. The resulting estimates from these models provide a descriptive summary of the distribution of preferences within a legislature. They are also important parameters in many formal models of legislative behavior.

Much of the recent work in the area of ideal point estimation has drawn on earlier research by education scholars. Item response theory studies the relationship between the ability (and other characteristics) of test subjects and the answers they give to particular test questions. The general idea is that every test question has an associated ability cutpoint; those with ability above the cutpoint will answer correctly on average. In a typical testing situation, the authors will attempt to include questions with an array of cutpoints in order to estimate the ability of the test takers.
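The cutpoint idea can be sketched in a few lines of Python. The logistic response curve below is one common functional form, chosen here as an assumption for illustration; the function name, the discrimination parameter, and the numbers are hypothetical and not taken from any of the papers mentioned in this post.

```python
from math import exp

def prob_correct(ability, cutpoint, discrimination=1.0):
    """Logistic item response curve.

    The probability of a correct answer is 0.5 when ability equals the
    question's cutpoint and rises as ability moves above it.
    """
    return 1.0 / (1.0 + exp(-discrimination * (ability - cutpoint)))

# Hypothetical test: three questions with easy, medium, and hard cutpoints.
cutpoints = [-1.0, 0.0, 1.5]
for ability in (-2.0, 0.0, 2.0):
    probs = [prob_correct(ability, c) for c in cutpoints]
    print(ability, [round(p, 2) for p in probs])
```

Reading `ability` as a legislator's ideal point and `cutpoint` as the dividing line associated with a roll call turns the same curve into a probability of voting in favor, which is the sense in which the two estimation problems resemble each other.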

The analogy between ability estimation and ideal point estimation is close; votes in the legislature correspond to questions on the test. One difference is that, in the item response context, the researcher will typically know the correct answer and can therefore associate those responses with higher estimated ability. In the ideal point context, it is not always clear whether a proposal moves policy left or right. Several recent articles have addressed this and other problems in translating item response models to the political context, including work by Harvard's own Kevin Quinn with Andrew Martin (Martin and Quinn 2002), Clinton, Jackman, and Rivers (2004), and Bafumi, Gelman, Park, and Kaplan (2005). Dan Hopkins described some recent work on ideal point estimation in an earlier post.

Posted by James Greiner at 4:19 AM

October 13, 2005

Dangerous Statistics: Estimating Civilian Losses in Afghanistan

Felix Elwert

There are tougher tasks than appeasing the human subject review board. A few weeks ago, I met Aldo Benini at the American Sociological Association annual meeting in Philadelphia. Benini has worked for various humanitarian organizations over the past decades and specializes in what strikes me as the most dangerous subfield of social science statistics: he collects, analyzes, and models data on the direct and indirect casualties of war.

I had come across Benini before when I saw a presentation on his work with the Global Landmine Survey, which involved building quantitative models to assist the ongoing mine cleanup in Vietnam. Recently, Benini has been working on estimating the number of civilian victims during the first nine months of Operation Enduring Freedom in Afghanistan following 9/11/01. There, field staff visited all 600 communities directly affected by fighting (both airstrikes and ground combat). This survey improves on previous estimates in the news – not least by being a virtual census of the affected communities, employing trained interviewers, and using standardized questionnaires. It’s hard for me to imagine more dangerous conditions of data collection (but, wait, Benini currently works on a similar project in Iraq).

The resulting study establishes a number of important findings. It’s also methodologically interesting. All told, 5,576 residents were killed violently between 9/11/01 and June 2002. Another 5,194 were injured. These numbers are considerably higher than previous estimates. I’m not going to rehash their entire analysis* here. But with respect to the methodological focus of this blog, I’d like to highlight the authors' conclusion that there's evidence that modern war apparently facilitates considerable underreporting of civilian losses.

*Including an interesting zero-inflated Poisson model for the concurrent and historical factors affecting the distribution of civilian victims in Afghanistan.

Posted by James Greiner at 4:50 AM

October 11, 2005

Spatial Lag

Sebastian Bauhoff

In my last blog entry (here), I wrote that associations like space can mess up the assumptions underlying standard estimation techniques. This entry is about the first problem I mentioned, spatial lag: when neighboring observations affect one another. Such dependencies can lead to inconsistent and biased estimates in an OLS model. And even if you don't care about "space" in a geographic sense, you might be interested in related topics like technology diffusion among farmers, network effects, countries that share the same membership in international organizations (an idea picked up in Beck, Gleditsch and Beardsley; see below) etc. The point is that spatial lag is pervasive in many contexts and though it might be called different names, the basic problem remains the same.

Spatial lag models are similar to lagged dependent variable autoregression models in time series analysis, but the spatial autocorrelation coefficient cannot be estimated as easily. Estimating it requires a spatial weights matrix, and it is often not clear what that matrix should look like, i.e., what the actual spatial relation is.
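As a small illustration of what a spatial weights matrix does, the Python sketch below builds a row-standardized contiguity matrix W for five hypothetical areas and computes the spatially lagged outcome Wy, the term that enters a spatial lag model of the form y = rho * W y + X beta + e. The adjacency pattern, the house values, and the function names are all made up for illustration, and choosing simple contiguity is exactly the kind of assumption about the spatial relation that the paragraph above says is hard to justify.

```python
def row_standardize(adjacency):
    """Scale each row of a 0/1 neighbor matrix so that it sums to one."""
    weights = []
    for row in adjacency:
        total = sum(row)
        weights.append([v / total if total else 0.0 for v in row])
    return weights

def spatial_lag(weights, y):
    """Compute Wy: each unit's weighted average of its neighbors' outcomes."""
    return [sum(w * v for w, v in zip(row, y)) for row in weights]

# Hypothetical: five areas along a road, each adjacent to the next one.
adjacency = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
house_values = [100.0, 120.0, 150.0, 130.0, 90.0]

W = row_standardize(adjacency)
lagged_values = spatial_lag(W, house_values)
print(lagged_values)
```

A different W (distance-based, shared membership in an organization, and so on) gives a different Wy and therefore a different estimate of the spatial coefficient.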

So how much can it matter? James LeSage (in an excellent guide to spatial econometrics and his MATLAB functions, also below) provides an example of OLS and spatial lag estimations of the determinants of house values. The idea is that -- apart from the influence of the independent variables like county population density or unemployment rates -- areas with high house values might be adjacent to other high value areas, and therefore there is a spatial trend in the outcome variable. The example shows that an interesting variable like population density can become statistically insignificant when spatial dependence is taken into account, and that coefficients of other variables can change in magnitude. In addition, taking spatial lag into account also improves the model fit.

So one should really take space into account if it matters. How would you know if it does? There are a number of tests to check for spatial lag, but for the most part just starting to think about it helps.

For some more information on spatial lag, take a look at the sources mentioned:

-- James LeSage's Econometrics Toolbox (www.spatial-econometrics.com), which has an excellent workbook discussing spatial econometrics and examples for the MATLAB functions provided on the same site; and
-- Beck, Gleditsch and Beardsley (draft of April 14, 2005) "Space is more than Geography: Using Spatial Econometrics in the Study of Political Economy" (http://www.nyu.edu/gsas/dept/politics/faculty/beck/becketal.pdf).

Posted by James Greiner at 5:51 AM

October 7, 2005

Social Science and Litigation, Part I

Jim Greiner

Over twenty years ago, J. Morgan Kousser wrote an article with the provocative title, “Are Expert Witnesses Whores? Reflections on Objectivity in Scholarship and Expert Witnessing” (6 The Public Historian 5 (1984)). In answering the rhetorical question largely in the negative, Professor Kousser recounted his own experience as an expert in litigation under the Voting Rights Act, an experience which, according to him, “afforded me the opportunity to tell the truth and do good at the same time.”

As a historian of southern politics specializing in the post-Reconstruction and Progressive eras, Professor Kousser had concluded that at-large voting systems had a racially discriminatory impact upon disfavored minority groups, and that such systems were adopted for exactly that purpose. Having written on the subject, he was “‘discovered’” by a civil rights attorney, retained, and stood ready to provide “window-dressing” in Section 2 cases challenging at-large systems when the Supreme Court decided Mobile v. Bolden, 446 U.S. 55 (1980). Without delving into legal technicalities, and oversimplifying somewhat, Mobile compelled Section 2 plaintiffs to produce evidence regarding the motives of those who adopted the voting schemes under challenge. In doing so, Mobile “made historians . . . necessary participants in voting rights cases” (at least until Congress removed the intent requirement by amending Section 2 in 1982), and so Professor Kousser ended up testifying in several pieces of litigation regarding the motives of those who adopted at-large voting systems and the effectiveness of such systems in achieving their framers’ desires. After examining various meanings of bias and objectivity, and the threats to the latter in both expert witnessing and researching, Professor Kousser concludes his article with the statement, “Testifying and scholaring are about equally objective pursuits.”

As a former litigator of employment discrimination and voting rights cases, I believe that Professor Kousser’s vision of an expert witness is one few lawyers would recognize. As a budding statistician interested in application of social science to the litigation setting, I assert (admittedly with slightly less certainty) that Professor Kousser’s narrative would be unfamiliar to most expert witnesses as well. Few attorneys discover expert witnesses who have spent years studying a question critical in a case they are litigating, fewer still an expert who has reached the “right” answer. It is rare that scholars, having reached conclusions after years of study and research for academic purposes, suddenly discover that the law has evolved in a way to make those conclusions relevant to pending (and, in Professor Kousser’s case, high-profile) litigation.

I’ll be using Professor Kousser’s article as a springboard for a discussion on the relationship among courts, litigators, and expert witnesses in several blog posts. As is true of all members of the Content Committee of this blog, I remain eager for responses and comments.

(It should go without saying that I do not intend in any way to question Professor Kousser’s honesty or integrity, either in the testimony he gave or in his 1984 article. In case it does not go without saying . . .).

Posted by James Greiner at 6:02 AM

October 6, 2005

A Bit on Human "Irrationality"

Amy Perfors

One of the key applications of cognitive science to the other social sciences can lie in testing some of the assumptions made about human psychology in other fields. A classic example of this is in economics: as I understand it, for a long time economists envisioned people as rational actors who act to increase their utility (usually measured by money) as much as they can. The classic results of Kahneman & Tversky, which earned the Nobel Prize, were among the first to show that, contrary to this assumption, in many spheres people act "irrationally." I am putting the word "irrational" in quotes because it's not that we act completely randomly or without motivation, simply that we do not always exist to maximize our utility: we use cognitive heuristics to calculate the value of things, we value money not as an absolute but with respect to many other factors (such as how much we already have, how things are phrased and sold to us, etc), and our attitudes towards money and maximizing are influenced by culture and the social situation. This means that models of human economic or group behavior are often only as good as the assumptions made about the people in them.

One researcher who studies these problems is Dan Ariely at MIT. In a recent line of research, he looks at what he calls two separate markets, the monetary and the social. The idea is that if people perceive themselves to be in a monetary market (one involving money), they are highly sensitive to the quantity of compensation, and will do less work if they receive less compensation. If, on the other hand, they perceive themselves to be in a social market (one in which no money is exchanged), they will not be concerned with the quantity of "social" compensation, such as the value of any gifts received.

I really liked this article, in part because (unusual for academic articles) it is kind of funny in places. For instance, their methodology consisted of having the participants do a really boring task and measuring how well their effort correlated to how much they were paid, in either a monetary or social market. The task is really grim: repeatedly dragging a computerized ball to a specific location on the screen. As the authors dryly state, "pretesting and post-experiment debriefing showed that our implementation continues in the grandest tradition of tasks that participants view as being utterly uninteresting and without any redeeming value." (I do not envy that debriefer!)

Funny parts aside, the point this research makes is really interesting: people approach the same task differently depending on what they think it is. When they are not compensated or compensated with a gift (a "social" exchange) they will expend a high amount of effort regardless of the value of the gift. When compensated with money or a gift whose monetary value they are told of, effort is proportional to the value of the compensation. Methodologically, this makes an important point -- if we want to model all sorts of aspects of the market or even social behavior, it's good to understand how our behavior changes as a function of how we conceptualize what is going on. From the cognitive science side, the question is why our behavior changes in this way, and in what instances this is so.

And the message for all of us? If we have a task we need help on, the authors suggest "asking friends and offering them dinner. Just do not tell them how much the dinner costs."

Posted by James Greiner at 6:24 AM

October 5, 2005

Economics as Methodology

John Friedman

Most disciplines define themselves through their field of inquiry; historians study events of the past and the evolving stories of those events, psychologists study the working of the mind, and political scientists study the interaction of governments and people. Economists take a different approach, though, identifying themselves not through subject matter but instead through methodology.

What are these tenets of methodology? While the precise delineation of one’s field is always a tricky matter, I believe most economists would agree on three basic principles: Preferences, Optimization, and Equilibrium. In essence, economics operates under the assumption that people know what they want and then do their best (given limited means) to get it. Given these foundations, mathematics helps to formalize our intuition, since choosing the best alternative can be rewritten as the maximization of a function, often named “utility.” In many cases, of course, people will fail miserably to achieve these goals. The problem might be a lack of information, or unforeseen costs, or any number of other obstacles; but, in economics, it cannot be that people simply do not want something that is better for them.

To many, this definition of economics will seem extraordinarily narrow, disallowing the study of a great many human phenomena. No doubt, in many cases, this observation is correct. But I believe it is exactly this methodological focus that has laid the foundations for the great success of economics in the past 70 years. As a foundation, the framework is straightforward and intuitive; why would someone not want something that, by definition, they prefer? Furthermore, the mathematical expression of economic ideas – a direct result of the assumption of optimization – has helped to lay bare the assumptions lurking behind arguments with great speed. And while I freely admit that economics cannot capture all relevant aspects of human behavior, it would seem a fool’s errand to find a research design that could.

(A brief aside: Mathematics, in economics, is no more than a language for expressing ideas. It is extremely helpful in many situations, as is much jargon, for discussions among experts within the field. But, far too often, economists allow this language to become a barrier between them and the world. I suggest you hold all economists you meet to this standard: If they cannot explain the intuition behind an economic idea, using only standard English words, in five minutes, it is their fault and not yours!)

Posted by James Greiner at 5:58 AM

October 4, 2005

"On The Fringe": The Probability of God, An Initial Look

Drew Thomas

Stephen D. Unwin made headlines - at least, in the Odds and Ends section – two years ago, with the publication of his book "The Probability of God". His idea was to determine, using some numerical method, whether conditions on earth would be enough to predict whether the Judeo-Christian construction of God does indeed exist.

Thankfully, the book is classified as humor. The actual problem being solved is somewhat irrelevant to the greater community, since matters of faith are conducted in the absence of fact. But this does represent the fringe of our discipline, and how numbers are perceived in the real world.

In this "real world," there are too many examples of numbers distorted for the sake of an agenda. For example, that 4 out of 5 dentists choose a particular toothpaste to endorse tells us nothing about the sample size (or about a possible line of dentists they tossed beforehand). Sports statistics are mangled and mishandled all the time without a mention of sample size concerns or actual relevance. (The misuse of numbers in society is a favorite theme of mine; keep looking for it in my entries.)

At least Dr. Unwin has not only a clearly stated agenda behind his work, but also a clearly stated method and an acknowledgement of subjectivity. Unwin’s calculation puts the probability of God’s existence at 67%; Richard Dawkins, the famed atheist, used the same method and obtained a result of 2% -- about 2% higher than Dawkins would otherwise be willing to admit.

Most of this information came from a radio interview with the good Dr. Unwin. Stay tuned for the book review and a look at his technique.

Posted by James Greiner at 6:07 AM

September 30, 2005

Use of Averaged Data; Mature Cohort Size as an Instrument for Inequality

Jong-Sung You

In my paper with S. Khagram entitled "A comparative study of inequality and corruption" (ASR 2005, vol.70:136-157), we demonstrated that data averaged for a long period (say, 1971-1996) instead of single-year data can be useful for both reducing measurement error and capturing a long-term effect.

In previous empirical studies of causes of corruption, income inequality was found insignificant. We suspected this lack of significance might be due to attenuation bias because income inequality was poorly measured. We found that using averaged data for inequality and other control variables increased the coefficient for inequality and made it significant.
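The attenuation argument can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the paper: each country's true inequality is fixed over the period, every yearly measurement adds independent noise, and the function names and parameter values (`simulate_attenuation`, 2000 countries, 25 years) are invented for illustration.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_attenuation(n_countries=2000, n_years=25, true_slope=1.0, seed=0):
    """Compare slopes using one noisy year versus the multi-year average."""
    rng = random.Random(seed)
    # Assumption: each country's true inequality is constant over the period.
    true_x = [rng.gauss(0, 1) for _ in range(n_countries)]
    outcome = [true_slope * x + rng.gauss(0, 0.5) for x in true_x]
    # Assumption: each yearly measurement adds independent noise with sd 1.
    yearly = [[x + rng.gauss(0, 1) for _ in range(n_years)] for x in true_x]
    single_year = [obs[0] for obs in yearly]
    averaged = [statistics.fmean(obs) for obs in yearly]
    return ols_slope(single_year, outcome), ols_slope(averaged, outcome)


slope_single, slope_averaged = simulate_attenuation()
print(f"single-year slope: {slope_single:.2f}")
print(f"averaged slope:    {slope_averaged:.2f}")
```

Under these assumptions the single-year slope is shrunk toward zero by the factor var(x) / (var(x) + var(noise)) = 1/2, while averaging 25 years cuts the noise variance to 1/25 and brings the slope back close to its true value of 1.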

Another result from this paper used "mature cohort size" (ratio of population 40 to 59 years old to the population 15 to 69 years old) as an instrument for inequality in IV regressions; again, inequality was found significant. Higgins and Williamson (1999) have previously studied the effect of cohort size on inequality. Because fat cohorts tend to get low rewards, when these fat cohorts lie at the top of the age-earnings curve, earnings inequality is reduced. When the fat cohorts are old or young adults, earnings inequality is augmented. Indeed, the mature cohort size is a powerful predictor of inequality across countries.

Note that by "fat cohorts" and "slim cohorts" I mean the relative size of the cohorts. When the mature cohort is fat, that is, when its relative size is large, the earnings differential (the earnings gap between the mature cohort and the others) is reduced, and hence earnings inequality is reduced.

You can view my paper here.

Posted by James Greiner at 7:00 AM

September 29, 2005

Near, Far, Wherever You Are

Sebastian Bauhoff

Tobler's First Law of Geography states that "everything is related to everything else, but near things are more related than distant things." Obviously there are many examples -- an infection is more likely to spread to a nearby person than to a far away one, a new highway might depress house prices for people living right next to it, and so on. The point is that there can be important dependencies and heterogeneities that vary with space, among other associations. And in those cases the usual assumptions that observations or errors are independently distributed don't hold. Urgh. Welcome to the world of spatial statistics.

As an estimation problem this is often addressed through clustering methods. Households in a village with some infected persons are at higher risk than households in neighboring villages. Or are they really? Clustering works when the locations are relatively homogeneous and separated. What if there is no good way to classify observations into clusters, for example, if an area is evenly populated? Or if the infected household lives right at the end of the village road, and some neighbors are in the other village? The administrative boundaries commonly used for clustering (village name) might not properly account for the actual proximity or whatever defines the space between the observations. If a transmitting mosquito wouldn't care much about the village name when deciding whom to bite next, why should an analyst rely on it?

Using clustering may often be a good approximation but in some cases it's not good enough and there can be substantial spatial lags (observations are spatially dependent), spatial errors (error terms are related) and spatial heterogeneity (model parameters vary across space). Those can lead to biased estimates, inefficient ones, or both. The bad news is that those effects can matter a lot. The good news is that there are methods to test for spatial dependence and correlation, and estimation techniques to deal with them.
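One widely used test for spatial dependence is Moran's I, which is positive when neighboring observations tend to have similar values. The sketch below is illustrative only: the 10 x 10 grid, the "rook" neighbor definition, and the helper names `morans_i` and `grid_neighbors` are my own assumptions, not taken from any of the texts listed in this post.

```python
import random


def morans_i(values, neighbors):
    """Moran's I with binary weights; neighbors[i] lists the neighbors of i."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = sum(dev[i] * dev[j] for i in range(n) for j in neighbors[i])
    total_weight = sum(len(nb) for nb in neighbors)
    return (n / total_weight) * cross / sum(d * d for d in dev)


def grid_neighbors(rows, cols):
    """Rook neighbors (up, down, left, right) on a rows x cols grid."""
    neighbors = []
    for r in range(rows):
        for c in range(cols):
            cell = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cell.append(rr * cols + cc)
            neighbors.append(cell)
    return neighbors


rows, cols = 10, 10
neighbors = grid_neighbors(rows, cols)
# A smooth gradient: nearby cells have similar values.
gradient = [r + c for r in range(rows) for c in range(cols)]
# The same values scattered at random: no spatial structure.
shuffled = gradient[:]
random.Random(0).shuffle(shuffled)

print(f"Moran's I, gradient: {morans_i(gradient, neighbors):.2f}")
print(f"Moran's I, shuffled: {morans_i(shuffled, neighbors):.2f}")
```

The gradient gives a value close to 1, while the shuffled values give a value near 0. In applied work the neighbor structure would come from actual distances rather than a grid, which is exactly where administrative boundaries can mislead.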

Of course the underlying interactions we are trying to better capture can be anything from linear to more complicated relations. It is unlikely that they are perfectly well described by any abstract spatial model, so we will still need to make assumptions. But at least there are some methods that can handle cases where the usual assumptions fail, and they can make an important difference to the analysis. I will write more about them in later blog entries. Meanwhile you might be interested in the following texts:

-- James LeSage's Econometrics Toolbox (www.spatial-econometrics.com) has an excellent workbook discussing spatial econometrics and examples for the MATLAB functions provided on the same site
-- Anselin (2002) "Under the Hood: Issues in the Specification and Interpretation of Spatial Regression Models" Agricultural Economics 27: 247-267 provides a quick overview of the issues
-- Anselin (1988) Spatial Econometrics: Methods and Models is the classic and widely quoted reference for spatial statistics

Posted by James Greiner at 6:00 AM

September 28, 2005

Extreme Values

Michael Kellermann

Every year, the host university of the Political Methodology conference invites a local scholar from some other discipline to share his or her research with the political science methods community. This year's special presentation, by James Elsner of the Florida State University Department of Geography, was sadly prescient. Professor Elsner's talk, "Bayesian Inference of Extremes: An Application in Modeling Coastal Hurricane Winds," applied extreme value theory in a Bayesian context to estimate the frequency with which hurricanes above a given strength make landfall in the United States. The devastating impact of Hurricane Katrina amply illustrates the importance of estimating maximum intensities; news reports suggest that as little as a foot or two of water overtopping the levees and eroding them from below may have caused the breaches that flooded New Orleans.

Extreme value theory provides a way to estimate the distribution of the maximum or minimum of a set of independent events. While this could be done directly if the distribution of the underlying events was known, in practice it is preferable to use the extremal types theorem to estimate the distribution of the maximum or minimum directly from data. The theorem states that, with appropriate transformations, the distribution of extreme values converges in the limit to one of three classes of distribution - Gumbel, Frechet, or Weibull - regardless of the shape of the underlying distribution.
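A quick simulation can make the convergence claim concrete. This sketch is not from Professor Elsner's talk; it uses the textbook case in which the underlying events are exponential with rate 1, where the maximum of n draws minus log(n) approaches a Gumbel distribution. The block size, number of blocks, and function names are arbitrary illustrative choices.

```python
import math
import random


def centered_block_maxima(n_blocks=3000, block_size=200, seed=0):
    """Maximum of each block of exponential draws, shifted by log(block_size)."""
    rng = random.Random(seed)
    shift = math.log(block_size)
    return [
        max(rng.expovariate(1.0) for _ in range(block_size)) - shift
        for _ in range(n_blocks)
    ]


def gumbel_cdf(x):
    """CDF of the standard Gumbel distribution."""
    return math.exp(-math.exp(-x))


maxima = centered_block_maxima()
for x in (-1.0, 0.0, 1.0, 2.0):
    empirical = sum(m <= x for m in maxima) / len(maxima)
    print(f"x={x:+.1f}  empirical={empirical:.3f}  Gumbel={gumbel_cdf(x):.3f}")
```

The empirical and limiting CDFs should agree closely. With real data the underlying distribution is unknown, so one would fit the limiting family to observed maxima rather than simulate from a known model.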

There are several challenges in estimating the distribution of extreme values. The three classes of limit distributions for extreme values have different behavior in the extreme tail: one family has a finite limit, while the other two have no limit but decay at different rates. To the extent that we are interested in "extreme" extremes, these differences could have substantive implications. Compounding this problem, observations in the extreme tail are likely to be sparse. Finally, one might expect that the quality of data is lower when extreme maxima or minima are occurring. Consider Katrina: most of the instrumentation for recording wind speeds, storm surge, and rainfall rates was knocked out well before the height of the storm. (Nor is this just a problem with weather phenomena; imagine trying to measure daily price changes precisely during a period of hyperinflation.) The Bayesian approach pursued in this work seems promising, as it allows the uncertainty in both the data itself and in the functional form to be modeled explicitly.

In talking with other grad students after the presentation, I think the consensus was that, while interesting methodologically and sobering substantively, it was hard to see how we would apply these methods in our own work. A quick Google search suggests that this approach is (not surprisingly) well established in financial economics, but not much else from the social sciences. With a little more time to reflect, however, I think that this may be more due to a lack of theoretical creativity on our part. Coming from the formal side of political science, I could see how thinking about extreme values might provide some insight into how political systems are knocked out of equilibrium, much like the levees in New Orleans.

Posted by James Greiner at 6:00 AM

September 26, 2005

A multilevel analysis of the WVS/EVS data

Jong-Sung You

In my draft paper on the "correlates of social trust" (presented at the ASA conference, August 2005), I argued that fairness of a society such as freedom from corruption (fair administration of rules) and distributive fairness (relatively equal and unskewed distributions) affects the society's level of social trust more than its homogeneity does. Based on a multilevel analysis of data from the World Values Surveys (WVS, 1995-97, 2000-01) and the European Values Study (EVS, 1999), I found that corruption and inequality are significantly negatively associated with social trust controlling for individual-level factors and other country-level factors, while ethnic diversity loses significance once corruption or inequality is accounted for. Also, I found that the inequality effect is primarily due to the skewness of income rather than its simple heterogeneity, and that the negative effect of minority status is greater in more unequal and undemocratic societies.

The WVS and the EVS have been conducted in close cooperation with (almost) identical questions. The WVS (1995-97) covers 50 countries, and the WVS/EVS (1999-2001) covers 66 countries in all continents of the world. By pooling the 1995-97 data and the 1999-2001 data, I was able to increase the number of countries to 80. My literature review has unearthed few articles employing multilevel modeling in the comparative politics or sociology literatures. I suspect the scarcity of adequate multilevel data is one reason for this. Schofer and Fourcade-Gourinchas (2001) used the 1991 WVS in a multilevel analysis of the "structural contexts of civic engagement," but the country coverage was just 32. Although they had a lot of observations at the individual level, the relatively small N at the country level prevented them from including many explanatory variables at the country level. Now, with a relatively large number of countries, the WVS/EVS data seems to be an ideal dataset for which many interesting multilevel analyses can be conducted.

Since my draft is rough, I will welcome any comments, either methodological or substantive. You can find a draft here.

Posted by James Greiner at 7:00 AM

September 23, 2005

Cog Sci Conf

Amy Perfors

The annual Conference of the Cognitive Science Society took place in late July. Amid a slew of interesting debates and symposia, one paper stood out as having particularly interesting implications from the methodological perspective. The paper, by Navarro et al., is called "Modeling individual differences with Dirichlet processes" (pdf found here).

The basic idea is that many questions in cognitive and social science hinge on identifying which items (subjects, features, datapoints) belong to which groups. The individual difference literature is replete with famous psychological theories along these lines: the factors contributing to IQ, the different "personality types", the styles of thought on this or that problem. In cognitive science specifically, the process of classification and categorization - arguably one of the more fundamental of the mind's capabilities - is basically equivalent to figuring out which items belong to which groups. Many existing approaches can capture different ways to assign subjects to groups, but in almost all of them the number of groups must be prespecified - an obvious (and large) limitation.

A Dirichlet process is a "rich-get-richer" process: as new items are seen, each is assigned to an existing group with probability proportional to the size of that group, or forms a new group with a nonzero probability governed by the parameter alpha. This naturally results in a power-law (Zipfian) distribution of items, which parallels the natural distribution of many things in the world. It also often seems to form groups that match human intuitions about the "best" way to split things up. Dirichlet process models, often used in Bayesian statistics, have been around in machine learning and some niches of cognitive science for at least a few years. However, the Navarro article is one of the first I'm aware of that (i) examines their potential in modeling individual differences, and (ii) attempts to make them more widely known to a general cognitive science audience.
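The assignment rule is often described as a "Chinese restaurant process," and a few lines of code show how it behaves. This is a minimal sketch, not the model from the Navarro paper: the function name, the choice of 1000 items, and the two alpha values are assumptions made purely for illustration.

```python
import random


def chinese_restaurant_process(n_items, alpha, seed=0):
    """Sequentially assign items to groups; returns the list of group sizes."""
    rng = random.Random(seed)
    group_sizes = []
    for i in range(n_items):
        # Item i joins an existing group with weight equal to its size,
        # or starts a new group with weight alpha (total weight i + alpha).
        r = rng.uniform(0, i + alpha)
        cumulative = 0.0
        for g, size in enumerate(group_sizes):
            cumulative += size
            if r < cumulative:
                group_sizes[g] += 1
                break
        else:
            group_sizes.append(1)
    return group_sizes


for alpha in (0.5, 5.0):
    sizes = sorted(chinese_restaurant_process(1000, alpha), reverse=True)
    print(f"alpha={alpha}: {len(sizes)} groups, largest five: {sizes[:5]}")
```

A few groups end up very large and many stay small, and the number of groups grows with alpha. The number of groups is not fixed in advance, but it is still sensitive to this one parameter.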

It's exciting to see more advanced Bayesian statistical models of this sort poke their way into cognitive science. As I think about how useful these can be, I have some questions. For instance, Navarro et al.'s model gives a more principled mechanism for figuring out how many groups best fit a set of data, but the exact number of groups identified is still dependent on the alpha parameter. Is this a severe limitation? Also, the "rich-get-richer" process is intuitive and natural in many cases, but not all groups follow power-law distributions. How might we use models with other processes (e.g., Gaussian process models) to assign items to an unspecified number of groups in ways that don't yield power-law distributions? I think we've only started to scratch the surface of the uses of this type of model, and I'm eager to see what happens next.

Posted by James Greiner at 7:00 AM

September 22, 2005

The Two Levels of Cognitive Science

Amy Perfors

Our job as social scientists is to learn how to take data that reflects various aspects of how people and societies work, and then use that data to form abstract theories or models about the world. Different fields in social science look at different data, but we all share common methods and (I imagine) some common general questions. This blog is set up to allow our different disciplines to discuss our commonalities of method and approach, sharing insights from our respective fields.

Cognitive science is a bit unusual because the questions of method and approach are simultaneously relevant on two levels rather than one. In cognitive science, the object of study (the brain) must solve the same questions as the scientists themselves. In other words, just as the job of the cognitive scientist is to figure out how best to take data in the world and form models about the world, the job of the brain is to figure out how to take data in the world and form a model about the world. As a result, the issues that crop up again and again for scientists—which quantitative approaches "compress" data most effectively and fastest, when statistical or symbolic models capture the world best, and how much needs to be built into our models from the beginning—are the very issues the brain needs to solve as it is learning about the world. They are thus issues that the cognitive science world continually debates about on both levels: not only what works for us as scientists (and when), but what works for the brain itself (and when).

When I post here, therefore, I'll be constantly playing with these levels: I'll be talking about quantitative methods in social science not just from the perspective of the scientist (as will everyone else here), but also from the perspective of the mind (which I'm guessing most other people won't). In short, the questions we all struggle with in terms of methodology are the same questions cognitive scientists struggle with in terms of content. It's my hope that playing with these questions on two levels at once will be edifying, entertaining, and lots of fun. I think it will be.

Posted by James Greiner at 7:00 AM

September 21, 2005

More on Affirmative Action

Felix Elwert

It's well known that African American college students on average (repeat: on average) have lower SAT scores than white students (see Bowen and Bok's book The Shape of the River). Now here's something that annoys me: Every now and then, I run into somebody who takes this observation as evidence that affirmative action dilutes academic standards. Hello? Differences in mean SATs among accepted students have little or nothing to do with affirmative action!!

Consider this: SAT scores are roughly normally distributed among both blacks and whites but the distribution for blacks is shifted a bit to the left (lower mean). Now consider a college that will admit every candidate above a certain cut-off point (same cut-off for everybody). Under these circumstances the average SAT score of accepted black students would be lower than the average SAT score among accepted white students, even though the college has applied a uniform, race-blind admission standard. Why? Because the tail area of the white SAT distribution extends farther to the right of the cut-off point than the tail area of the distribution for blacks, whatever the reason. Upshot: racial differences in test scores in a student body don't reveal whether a school practices affirmative action and by themselves certainly don't betray "diluted standards." In addition, more or less the only way to create a student body where black and white students have the same average SAT score, given these race specific SAT distributions, would be to set drastically higher admissions standards for blacks than for whites - i.e. to discriminate against blacks. Surely, that wasn't the point?
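The claim about a uniform cut-off can be checked with the formula for the mean of a normal distribution truncated from below. The numbers here are invented for illustration and are not actual SAT statistics: two groups with the same standard deviation (200), means of 1000 and 900, and a common cut-off of 1100.

```python
from statistics import NormalDist


def mean_above_cutoff(mean, sd, cutoff):
    """Mean of a normal(mean, sd) variable, given that it exceeds `cutoff`."""
    standard = NormalDist()
    z = (cutoff - mean) / sd
    # Standard result: E[X | X > c] = mean + sd * pdf(z) / (1 - cdf(z)).
    return mean + sd * standard.pdf(z) / (1.0 - standard.cdf(z))


cutoff = 1100  # hypothetical race-blind admission threshold
for label, group_mean in (("higher-mean group", 1000), ("lower-mean group", 900)):
    admitted = mean_above_cutoff(group_mean, 200, cutoff)
    print(f"{label}: mean score among admitted = {admitted:.0f}")
```

Both admitted groups clear the same threshold, yet the admitted members of the higher-mean group still have the higher average, which is the pattern described above.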

(This observation comes to me via friends of UCLA's Thomas Kane. Kane is now moving to Harvard - thus moving this blog closer to the source.)

Posted by James Greiner at 7:00 AM

September 20, 2005

Misreading Racial Disparities - Beware Of Ratios of Percentages

Felix Elwert

It's fascinating how far you can get by taking a second look at the simplest statistics - in this case percentages and ratios. Case in point, James Scanlan's clever and unjustly ignored observation that African Americans will necessarily appear to be losing ground relative to whites even as their standing improves in absolute terms. (Actually, the argument holds for any inter-group comparison, not just race.) Scanlan shows that this is an artifact of measuring progress by focusing exclusively on ratios of percentages from dissimilar distributions. This insight raises the question of how best to measure progress. Here are some of Scanlan's examples.

Black-white differences in infant mortality: In 1983, 19.2 black infants but only 9.7 white infants died per 1000 births in each group. The resulting black-white ratio was 1.98. In 1997, infant mortality had decreased quite a bit, to 14.2 for blacks and 6.0 for whites. Note that in absolute terms, infant mortality had improved more for blacks than for whites (by 5.0 versus 3.7 deaths per 1000 births). That should be good news, no? But, lo, now look at the black-white ratio in 1997 - it increased from 1.98 to 2.4. How can infant mortality have improved more for blacks than for whites in absolute terms at the same time as the relative position of blacks to whites has worsened?
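The arithmetic behind this puzzle is easy to reproduce. The sketch below uses only the four rates quoted above; the helper names are my own.

```python
# Infant deaths per 1000 births, as quoted in the text.
rates = {
    1983: {"black": 19.2, "white": 9.7},
    1997: {"black": 14.2, "white": 6.0},
}


def ratio(year):
    """Black-white ratio of infant mortality rates in a given year."""
    return rates[year]["black"] / rates[year]["white"]


def absolute_drop(group):
    """Decline in deaths per 1000 births between 1983 and 1997."""
    return rates[1983][group] - rates[1997][group]


def proportional_drop(group):
    """Decline as a share of the group's 1983 rate."""
    return absolute_drop(group) / rates[1983][group]


for year in (1983, 1997):
    print(f"{year}: black-white ratio = {ratio(year):.2f}")
for group in ("black", "white"):
    print(
        f"{group}: absolute drop = {absolute_drop(group):.1f}, "
        f"proportional drop = {proportional_drop(group):.0%}"
    )
```

The absolute drop is larger for blacks, but the proportional drop is larger for whites, and the ratio of rates responds only to the proportional changes. That is why the ratio can rise while the absolute gap narrows.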

Here's another example for the same underlying statistical phenomenon: Moving the income distributions of blacks and whites up by the same dollar amount relative to the poverty threshold would increase the racial disparity in poverty (because relatively more blacks suffer extreme poverty than whites)! Except for extreme circumstances, this will be true even if we boost black real incomes more than white real incomes. How can it be that helping blacks more than whites in absolute terms would worsen blacks' relative economic position?

Here's my favorite example - racial disparities in college acceptance rates. Suppose that college admissions are solely a function of SAT scores (as I'm told they essentially are for some large, selective state schools) and that the SAT distribution of black test takers equals that of whites except it's shifted to the left (as it is). Let the cut-off point for college acceptance be the same for blacks and whites (i.e. no affirmative action). Lowering the admission standard (for everybody) would then reduce the racial disparity in admission rates. That's good, no? But at the same time - and necessarily so - the lowering of admission standards would increase the racial disparity in rejection rates. That's bad, no? Huh?
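The admission and rejection ratios can be computed directly from two shifted normal distributions. This is a sketch with made-up parameters, not real test-score data: both groups have standard deviation 1, the means differ by half a standard deviation, and the cut-off is lowered from 1.0 to 0.0.

```python
from statistics import NormalDist

# Hypothetical score distributions: same shape, one shifted to the left.
group_a = NormalDist(mu=0.0, sigma=1.0)
group_b = NormalDist(mu=-0.5, sigma=1.0)


def disparities(cutoff):
    """Return (admission-rate ratio A/B, rejection-rate ratio B/A)."""
    admit_a, admit_b = 1 - group_a.cdf(cutoff), 1 - group_b.cdf(cutoff)
    reject_a, reject_b = group_a.cdf(cutoff), group_b.cdf(cutoff)
    return admit_a / admit_b, reject_b / reject_a


for cutoff in (1.0, 0.0):
    admit_ratio, reject_ratio = disparities(cutoff)
    print(
        f"cutoff={cutoff:.1f}: admission ratio={admit_ratio:.2f}, "
        f"rejection ratio={reject_ratio:.2f}"
    )
```

Lowering the cut-off shrinks the ratio of admission rates and enlarges the ratio of rejection rates, so the same policy change looks like progress or regress depending on which ratio is reported.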

It turns out that seemingly straightforward comparisons of ratios of percentages may hide more than they tell (in these examples, with important policy implications). Interestingly, all three examples draw on the same statistical phenomenon. The secret lies in the funny shape of cdf-ratios from density functions that are shifted against each other. I plan to provide an intuitive explanation for this point once we've figured out how to post graphics on this blog. Until then, read James P. Scanlan's "Race and Mortality" in the Jan/Feb 2000 issue of Society.

Posted by James Greiner at 7:00 AM

September 19, 2005

Applied Stats Seminar

Michael Kellermann

The Research Workshop in Applied Statistics brings together the statistical community at Harvard for a lively exchange of ideas. It is a forum for graduate students, faculty, and visiting scholars to present and discuss their work. We advertise the workshop as "a tour of Harvard's statistical innovations and applications," with weekly stops in different disciplines such as economics, epidemiology, medicine, political science, psychology, public policy, public health, sociology and statistics. The topics of papers presented in recent years include matching estimators, missing data, Bayesian simulation, sample selection, detecting biological attacks, imaging the Earth's interior, incumbency in primary elections, the effects of marriage on crime, and revealed preference rankings of universities.

One of the strengths of the workshop is its diverse group of faculty sponsors. This year's sponsors include Alberto Abadie (Kennedy School), Garrett Fitzmaurice (School of Public Health), Lee Fleming (Business School), Guido Imbens (Economics), Gary King (Government), Kevin Quinn (Government), James Robins (School of Public Health), Donald Rubin (Statistics), and Christopher Winship (Sociology). The workshop provides an excellent opportunity for informal interaction between graduate students and faculty.

The workshop meets Wednesdays during the academic year; lunch is provided. If you are interested, come to our organizational meeting on Wednesday, September 21 at noon in Room N354 at the Institute for Quantitative Social Science (IQSS is located on the 3rd Floor of CGIS North, 1737 Cambridge St., located behind the Design School). Course credit is available for students as an upper-level class in either Government or Sociology.

For more information, check out our website here. There you will find contact information, the schedule of presentations, and links to papers from previous presentations. We'll also be using this blog to announce speakers and to post reports from the workshop, so check back here often. We hope to see many of you there. If you have any questions, feel free to e-mail me at kellerm@fas.harvard.edu.

Posted by James Greiner at 12:53 PM

Censoring Due to Death, cont'd, & A Visit To Harvard

Censoring, cont'd
John F. Friedman

Continuing from the most recent post, for the economist, perhaps a more interesting instance of this statistical problem is not researchers making this error within the literature but consumers making misjudgments in the marketplace. (Since most people approach problems in their lives with less rigor than a statistician, perhaps this is not surprising.) In particular, once consumers make these inference mistakes, economic theory suggests that firms will take advantage. Edward Glaeser wrote at length on this phenomenon in 2003 in "Psychology and the Market."

One classic example of this phenomenon - as specifically related to censoring by death - is the mutual fund industry. Most brochures for management companies aggressively tout the high past returns that have accumulated in their funds. Consumers then extrapolate these historical earnings into the future, usually choosing managers based on past performance. Of course, their reasoning is tainted by the same statistical problem; companies will shut down those mutual funds which have poor past performance, leaving only their winners for customers to admire. (Another problem with this line of reasoning is that there is virtually no evidence that strong past performance predicts strong future performance. In this sense, perhaps the greater error is to pay attention to past returns at all!) This problem is compounded in the market by the fact that any firm which attempts to educate consumers about their mistakes is unlikely to capture the value-added from that effort. The now-savvy consumers have no reason to invest at the firm that provided the information, and, even if they did, these firms make the most money from naive consumers rather than the smart ones, who would now make up the clientele. See David Laibson and Xavier Gabaix (2004) for more on this phenomenon. Since no firm has an incentive to educate the public, the entire industry becomes geared towards taking advantage of naive consumers, obfuscating costs, and selectively presenting information.
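The mutual fund story can be mimicked with a toy simulation in which no manager has any skill at all. Everything here is an assumption made for illustration: 5000 funds, yearly returns drawn independently from a normal distribution with mean 5% and standard deviation 15%, and a rule that closes the bottom half of funds after five years.

```python
import random
import statistics


def simulate_funds(n_funds=5000, n_years=5, seed=0):
    """Return (mean past return of all funds, of survivors, survivors' future)."""
    rng = random.Random(seed)

    def average_return():
        # Assumption: every fund draws from the same distribution (no skill).
        return statistics.fmean([rng.gauss(0.05, 0.15) for _ in range(n_years)])

    past = [average_return() for _ in range(n_funds)]
    future = [average_return() for _ in range(n_funds)]
    # Funds with below-median past performance are shut down.
    threshold = statistics.median(past)
    survivors = [i for i in range(n_funds) if past[i] >= threshold]
    return (
        statistics.fmean(past),
        statistics.fmean([past[i] for i in survivors]),
        statistics.fmean([future[i] for i in survivors]),
    )


all_past, survivor_past, survivor_future = simulate_funds()
print(f"all funds, past:       {all_past:.3f}")
print(f"survivors, past:       {survivor_past:.3f}")
print(f"survivors, next years: {survivor_future:.3f}")
```

The surviving funds show a past record well above the true 5% average, yet their subsequent returns fall back to it. In this toy world the brochure numbers reflect selection rather than skill.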


A Visit To Harvard

Anton Westveld (Visiting from University of Washington Statistics Department)

This past week I had the opportunity to visit with Kevin Quinn, one of my main Ph.D. advisors, at the Center for Government and International Studies at Harvard. Kevin and Gary King asked if I would provide a brief description of my recent visit.

I was fortunate enough to arrive in time to work in the new buildings for the Center. The new space has a modern design that is quite beautiful and utilitarian.

Currently we are working on developing statistical methodology for longitudinal social network data. Social network data consist of measured relations occurring from interactions within a set of actors. This type of data allows for the empirical investigation of the interconnectivity of the actors, which is a cornerstone of social science theory. The methodology focuses on data generated from the repeated interaction of pairs of actors, including temporal dyadic data resulting in an outcome for each actor at each time point (e.g. the level of exports from Canada to Japan in a given year). The methodology incorporates structure to account for correlation resulting from interactions as well as the repeated nature of the data. In particular, a random effects model is employed which accounts for five different types of network dependencies. These five dependencies are then correlated over time through the assumption that the random effects follow a weakly stationary process.

Kevin and I spent the last few days discussing appropriate methodology and writing C++ code. We also spent some time discussing the relationship between social network models and statistical game theory models, both of which seek to gain an understanding of social phenomena by examining social interaction data. Due to the Center’s collegial environment, I also had opportunities to discuss my work with Gary King and Jake Bowers.

Posted by James Greiner at 7:00 AM

September 16, 2005

Censoring Due to Death, cont'd

John F. Friedman

The problem of "censoring by death" also surfaces in a number of economic contexts. For instance, firms that go bankrupt as a result of poor corporate policies will not appear in many datasets, making any analysis of the impact of other financial events biased upwards. This problem has particularly plagued the literature on the impacts of corporate restructuring and leveraged buyouts (LBOs) of distressed firms. Since these firms are at high risk of failure by nature of their inclusion in the study in the first place, such firms exit the sample at high frequency, and the benefits of restructuring and LBOs may be overstated.

One can theoretically correct for this problem by modeling the ways in which the sample selection occurs, but these approaches have performed poorly in many economic settings due to the sensitivity of the results to the parametric assumptions of the econometric model. For instance, the "Heckman selection correction" - brought into Economics by Nobel laureate James Heckman in 1979 - models the death process as a first stage Probit based on observable characteristics. By estimating this first stage, one can correct for the lost observations. Bob LaLonde (1986) later tested this model by comparing the results from a job training study with random assignment to the results one would have gotten had one used Heckman's method on the treated group. Though the selection correction performed better than many alternative methods, such as matching or differences-in-differences, the estimates were rather imprecise and confidence intervals mismeasured. In this case, the problem is the joint assumption of normality and selection entirely on observables. Though more flexible models have come into Economics in recent years - the Propensity Score, for instance – these too have proven sensitive to the particular model properties in many applications.

Though perhaps an old-fashioned solution, the studies in economics that best avoid this problem have simply endeavored to correct for the sample selection problem by collecting otherwise unavailable data on firm deaths in the sample. These samples are often smaller, permitting less broad analysis, but effectively mitigate the selection by death problem.

Posted by James Greiner at 7:00 AM

September 15, 2005

Censoring Due to Death

D. James Greiner

I'm interested in the problem of "censoring due to death" within the framework of the Rubin Causal Model ("RCM").

As readers will know, the RCM is a framework for studying the effects of causes in which the science is represented via a set of potential outcomes for each unit. (A potential outcome is the value the dependent variable would take on if the treatment variable had a certain value, whether or not the treatment variable actually had that value). An assignment mechanism decides what treatment (e.g., active treatment or control) a unit receives and thus which potential outcome will be observed. Unit-level causal effects are defined as the difference in the potential outcomes of some quantity of interest. The fundamental problem of causal inference is that we can observe at most one potential outcome for each unit. Unobserved potential outcomes are treated as missing data. Observational studies are analyzed as "broken" randomized experiments, broken in the sense that the assignment mechanism was not recorded and therefore must be reconstructed in some approximate way. For a more complete discussion, see Holland, P.W. (1986). Statistics and Causal Inference. Journal of the American Statistical Association 81: 945--960.

Censoring or truncation due to death occurs when some units' failure to comply with a post-treatment condition renders their values of the quantity of interest undefined. Consider for example a medical study designed to assess the effect of a new cancer treatment on the percentage of patients who survive cancer-free for ten years. Suppose some individuals die from car accidents or drug overdoses or other causes clearly unrelated to cancer before the ten-year time period has elapsed. Such individuals do not have a value for ten-year cancer-free survival, so their values of the quantity of interest are undefined. (The problem here is not that these individuals' values for cancer-free survival are missing data; rather, the problem is that they have no such values.) Under such circumstances, some quantitative analysts simply remove such individuals from the study and analyze the remainder. This course of action can bias results in several different ways. To illustrate one such way, it could be that individuals who die from non-cancer related causes might smoke, have less healthy diets, refuse to wear seat belts, or otherwise engage in more risky behavior than many of the other individuals in the study. If the treatment is effective in warding off cancer, there could be more deaths unrelated to cancer in the treated group than the control group, because some treated group members survive cancer that would otherwise have killed them long enough to be felled by, for example, car accidents, before ten years are up. This difference could render comparison of the units remaining in the treated and control groups an inappropriate method of assessing the effect of the treatment.

The key is to realize that a comparison of ten-year cancer-free survival rates only makes sense for units who would not die from causes unrelated to cancer if assigned treatment AND who would not die from causes unrelated to cancer if assigned control. Thus, removing individuals who died from causes unrelated to cancer is not enough.

The remaining group actually assigned control may include some units who would have died from non-cancer causes if they had been assigned treatment, and the remaining group actually assigned treatment may have some units who would have died from non-cancer causes had they been assigned control. The researcher must take appropriate steps to remove both sets of people from the study, so as to isolate the set of individuals who would not die from causes unrelated to cancer regardless of treatment assignment. Junni Zhang (Peking University) and Don Rubin (Harvard University) discuss these issues in "Estimation of Causal Effects Via Principal Stratification when Some Outcomes Are Truncated by 'Death,'" (2003). Journal of Educational and Behavioral Statistics 28:353-368. They extend them in a forthcoming paper with Fabrizia Mealli (University of Florence) currently entitled "Evaluating Causal Effects in the Presence of 'Truncation by Death' -Likelihood-based Analysis via Principal Stratification."
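To make the argument above concrete, here is a minimal sketch in Python. It uses a toy population in which both potential outcomes are written down for every unit, something real data never provide, so it only illustrates why the naive comparison misleads; it is not the estimation method of Zhang and Rubin. All counts and function names are invented for illustration.

```python
# Toy population with BOTH potential outcomes recorded (impossible in real data).
# Each unit: (dies_noncancer_if_control, dies_noncancer_if_treated, y0, y1)
# y0 / y1 = 1 if cancer-free at ten years under control / treatment, 0 if not,
# and None if the outcome is undefined (truncated by death).
# All counts are made up for illustration.

def make_population():
    units = []
    units += [(False, False, 0, 1)] * 40      # always-survivors helped by treatment
    units += [(False, False, 1, 1)] * 40      # always-survivors, cancer-free either way
    units += [(False, False, 0, 0)] * 20      # always-survivors, not cancer-free either way
    units += [(False, True, 0, None)] * 30    # treatment averts cancer death, then an accident
    units += [(True, True, None, None)] * 10  # die of unrelated causes either way
    return units

def mean(values):
    return sum(values) / len(values)

def naive_difference(units):
    # Drop units whose outcome is undefined in their own arm, then compare arms.
    # Each arm is a full copy of the population (randomization in expectation).
    treated_arm = [y1 for (_, d1, _, y1) in units if not d1]
    control_arm = [y0 for (d0, _, y0, _) in units if not d0]
    return mean(treated_arm) - mean(control_arm)

def always_survivor_effect(units):
    # Effect among units who would avoid non-cancer death under BOTH assignments.
    effects = [y1 - y0 for (d0, d1, y0, y1) in units if not d0 and not d1]
    return mean(effects)

population = make_population()
print(round(naive_difference(population), 3))        # 0.492
print(round(always_survivor_effect(population), 3))  # 0.4
```

In this toy setup the naive comparison overstates the effect because the remaining control group still contains the 30 units who would have died of unrelated causes under treatment.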

Posted by James Greiner at 7:00 AM

September 14, 2005

Math Camp

Michael Kellermann

When I was an undergrad, the first political science class that I took was taught by the late A.F.K. Organski. At one point, someone asked him what advice he would give to freshmen interested in political science as a major. "Take as many math courses as you can," he said with his inimitable accent. I'm pretty sure that this was not the advice that most people wanted to hear, and that it was honored more in the breach than the observance, but it was sound advice nonetheless.

In keeping with this idea, several Harvard programs offer short math refresher courses for incoming graduate students, including Government, Economics, and the Kennedy School. The Gov Department's "math (p)re-fresher" is held during the first two weeks of September. We cover calculus, probability, linear algebra, and a bit of optimization theory, along with an introduction to some of the software (R, Xemacs, and LaTeX) that we use in the department's methods courses. All told, it is a quick review of about five semesters' worth of undergraduate math courses in the span of ten days. As you might imagine, there is considerable variation in the amount of "pre-freshing" versus "re-freshing" that goes on in the course.

I'm curious about the prevalence of these kinds of "math camp" courses in the social sciences. I only know of a few others in political science, but I get the sense that they are more common in economics. Are there any sociology math camps out there? Psychology? Public health? If you have a math camp, I'd be interested in taking a look at your syllabus. Comments should be enabled.

Posted by James Greiner at 7:00 AM

September 13, 2005

Pol Meth Conf IV

Dan Hopkins, G4, Government (guest author)

Continuing with the discussion of papers presented at the recent Political Methodology Conference, Kevin Quinn and Arthur Spirling's paper begins with the problem of identifying legislators' preferences in conditions of strict party discipline. To tackle this challenge, they applied a Dirichlet process mixture model and presented some interesting results about the intra-party groups observed in the British House of Commons. They backed up the groupings recovered from the model with significant qualitative work, and showed how qualitative and quantitative work of this kind can go hand in hand. At the same time, the discussant, Andrew Martin, raised a valuable question: how does this method relate to other analyses of grouping/clustering? I am curious about this question as well.

James Honaker's paper tackled a question of substantive importance: what is the role of economic conditions in triggering sectarian violence? Honaker analyzed all available data, far more than anyone had previously, and used a creative combination of ecological inference and multiple imputation to estimate the impact of the Protestant-Catholic unemployment ratio on a monthly basis. His substantive result was that this ratio matters: as the gap between Protestant and Catholic employment grows, so too does the risk of violence. One questioner suggested that we might want to instrument for unemployment, since unemployment could be endogenous to violence. Honaker responded that unemployment in Northern Ireland tracks unemployment in comparable cities elsewhere. This paper struck me as, among other things, a powerful (if implicit) rebuttal to those who argue that one should never attempt ecological inferences. The question Honaker addressed is one scholars have already tried to answer - sometimes with counter-intuitive results - suggesting that we may not be able to simply wait for perfect, individual-level data.

Kosuke Imai presented co-authored work on an Internet experiment in Japan. As with the Jackman et al. paper, this work presented a single Bayesian model that dealt with 1) the problem of non-compliance; 2) the problem of non-response; and 3) the estimation of causal effects. The methods were compelling, although the data were less cooperative: almost no statistically significant treatment effects emerged. That result seems to fit with our priors: the experiment directed Japanese Internet users, presumably a relatively well-informed group, to click on a webpage containing party manifestos during the Upper House election. The fact that we are selecting our sample based on a set of covariates might help explain why the covariates are (at least individually) relatively helpless in predicting compliance. As with the Bowers and Hansen paper, I hope that the authors make their statistical code public and easily adapted to other applications, as these tools are well-suited to analyzing a wide range of randomized experiments.

David Epstein presented a joint paper with Sharyn O'Halloran that argued for using higher-dimension Markov models (that is, Markov models with more than two states) to model transitions to and from autocracy/democracy. The substantive argument: adding a third category of "partial democracy" helps us see that economic growth matters both for transitioning to democracy and for staying there. Discussant Jeff Gill and others questioned the appropriateness of the basic Markovian assumption (that the probability of a transition, conditional on the current state, does not depend on any earlier states) and suggested exploring higher-order Markov models (that is, models that allow previous states to influence present transition probabilities). I agree with their suggestion, but my question is more basic: if we have polity scores that are continuous on an interval, how much information is thrown away by transforming these scores into three discrete states? I have not seen the data, so I also wonder if these three states emerge naturally from it. In other words, how much would this analysis change if we redefined autocracy or democracy by a few polity points?
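The cutoff question can be checked mechanically. The sketch below, in Python, discretizes a score series into three states and estimates first-order transition probabilities by counting consecutive pairs. The scores and cutoffs are invented for illustration and are not from the Epstein and O'Halloran data; the function names are my own.

```python
LABELS = ["autocracy", "partial", "democracy"]

def discretize(score, autocracy_max, democracy_min):
    # Map a score to one of three states using two (hypothetical) cutoffs.
    if score <= autocracy_max:
        return "autocracy"
    if score >= democracy_min:
        return "democracy"
    return "partial"

def transition_matrix(states):
    # First-order Markov estimate: count moves between consecutive periods only.
    counts = {a: {b: 0 for b in LABELS} for a in LABELS}
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    probs = {}
    for a in LABELS:
        total = sum(counts[a].values())
        probs[a] = {b: (counts[a][b] / total if total else 0.0) for b in LABELS}
    return probs

# Made-up yearly scores for one country on a -10..10 scale.
scores = [-8, -7, -6, 1, 2, 6, 7, 8, 5, 7]

base = transition_matrix([discretize(s, -6, 6) for s in scores])
shifted = transition_matrix([discretize(s, -6, 5) for s in scores])

print(round(base["democracy"]["partial"], 3))     # 0.333
print(round(shifted["democracy"]["partial"], 3))  # 0.0
```

In this toy series, moving the democracy cutoff by a single point makes the one apparent episode of backsliding disappear, which is the kind of sensitivity I have in mind.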

Posted by James Greiner at 7:00 AM

September 12, 2005

Pol Meth Conf III, & GOV 2000

Pol Meth Conf III
Dan Hopkins, G4, Government (guest author)

Continuing the discussion of the recent Political Methodology Conference, throughout its first two days the notion of the conference as the "Second Annual Conference on Matching" was a running joke, and definitely a fair joke, although the two matching papers were, well, matched by two ideal point papers. So on to ideal points. Michael Bailey's paper tackled an important problem: because major figures across the different institutions of the federal government are faced with different policy decisions, it is hard to make statements about how their preferences relate. Is the Supreme Court to the left of Congress? How would today's court rule on famous decisions from the past? Bailey's paper sought to extend ideal points across institutions, using such things as public statements and the court briefs of the Solicitor General to compare the ideal points of not just justices but of members of all three branches of the federal government. Bailey argued, for example, that if the first Bush administration filed a brief in support of a certain side in a court case, we could use that filing to put Bush in the same space as Chief Justice Rehnquist. Bailey used the same sort of logic to extend ideal points back in time, focusing on statements about preferences - for instance, Clarence Thomas's statement that Roe was wrongly decided - to allow figures from different time periods to be placed on the same scale. Especially impressive was the data collection effort this project entails, as the author tracked down public statements from a wide range of figures.

One of the challenges of making these kinds of cross-institutional inferences, though, is that we need to implicitly assume non-strategic behavior. Needing to build a majority of five, justices in the Supreme Court face a task distinct from that of the President—or from that of the average member of the House. These strategic contexts will undoubtedly affect politicians' decisions: Presidents have little incentive to make public statements that put them at odds with the majority of Americans, even if those statements reflect their preferences. Also, if Presidents (or others in the system) are selective about the subjects of their commentary, we might wind up with a biased idea of where they actually stand. Still, Bailey provided quite a neat paper, one that provides useful tools for tracking inter-institutional dynamics. The substantive results were also very interesting, with the median ideal point of the Court almost always between that of the House and the Senate.

The next ideal point paper came from Simon Jackman, Matthew Levendusky, and Jeremy Pope. Here, the goal was to estimate the baseline propensity of a Congressional district to support Democratic or Republican candidates—although much of the Q&A was taken up by questions about whether this was best thought of as the "natural vote" or something else. The authors emphasized that measurement and structural modeling go hand-in-hand because inaccurate measurement may well bias the structural estimate of quantities like the incumbency advantage. They also pointed out that in this field we are content with rough proxies of district tendencies despite the fact that in other areas we demand much more precision in our measurements. Jackman, Levendusky, and Pope's model was a Bayesian hierarchical ideal point model that draws on information about both Congressional and Presidential results to make inferences about districts' underlying partisan preferences.

For me, one provocative result from this paper was that the discrimination parameter - that is, the impact of the covariates on the estimated vote share - increased over the decades. In other words, demographic characteristics are becoming increasingly effective predictors of districts' preferences. I would love to see the authors try to get at exactly why that is. One possibility, which Levendusky mentioned in making his presentation, is redistricting: politicians get better at picking their constituents, districts become more homogeneous, and so district-level demographics become better predictors of aggregate vote choices. To test this theory, one might re-estimate the model without the least populous states (because such states have less potential for gerrymandering. Consider Wyoming: no gerrymandering there). Another possibility is that the electorate is sorting itself into more politically homogeneous groups, something one might test in a preliminary way by running the model separately for high-mobility and low-mobility districts. The Census gives data on how many people have lived in the same house for their entire lives, data that could help with these questions.


GOV 2000
Kevin Quinn

This fall I am teaching GOV 2000 Quantitative Methods for Political Science I. This course is also offered for credit through Harvard's distance learning program as GOVT E-2000. GOV 2000 is the first course in the Department of Government's methodology sequence and it is designed to introduce students to statistical modeling with emphasis on least squares linear regression. Although we will not ignore the theory underlying the linear model, much of the course will focus on practical issues that arise when working with regression models. Topics covered in the course include: data visualization, statistical inference for the linear model, assessing model adequacy, when is a regression model a causal model, dealing with leverage points and outliers, robust regression, and methods for capturing nonlinearities. We will also be working with real social science datasets throughout the course. For more information, please visit the course website here.

Posted by James Greiner at 7:00 AM

September 9, 2005

Pol Meth Conf II

Dan Hopkins, G4, Government (guest author)

Continuing with the matching theme on which I ended the post of two days ago, Alexis Diamond and Jas Sekhon presented a paper on genetic matching that claimed to be a significant improvement on past approaches. One of the challenges of matching is to weight each of the covariates so as to produce the optimal set of matches. Genetic matching uses a genetic algorithm to search across the set of possible weight matrices to find the weight matrix that minimizes some loss function. Of course, what exactly that loss function should be is debatable. In Rawlsian fashion, Diamond and Sekhon argued that it should be to maximize the p-value of the most unbalanced covariate, and Sekhon's software (link here) does exactly that. In some applications, one could certainly imagine other loss functions; seeking the best possible balance on the most unbalanced covariate could jeopardize the overall balance, a libertarian sort of rebuttal. The discussion of the paper also raised the question of whether using a p-value is the right criterion. If the algorithm is comparing p-values from samples with different sizes, for instance, it could disproportionately favor a smaller sample.

Despite the questions, I buy Diamond and Sekhon's argument. Genetic matching makes effective use of computing power to search across a high-dimensional space for the most balanced sample that the data can provide. In cases where there is insufficient overlap on covariates, data analysts will know this quickly rather than devoting weeks to Holy Grail-style quests for optimal matches. And in cases where there is sufficient overlap on the covariates to make causal inferences, data analysts will be far more certain that they have attained the best possible balance—again, subject to the constraints about the loss function.
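For readers who want to see the shape of the search, here is a rough sketch in Python. It is emphatically not Sekhon's software: a plain random search stands in for the genetic algorithm, the weights are a simple diagonal rather than a full weight matrix, the p-value comes from a normal approximation to a difference in means, and the data are simulated. Every function name here is hypothetical.

```python
import math
import random

def weighted_distance(a, b, weights):
    # Squared distance between two covariate vectors, one weight per covariate.
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))

def match(treated, controls, weights):
    # Nearest-neighbor matching with replacement under the given weights.
    return [min(controls, key=lambda c: weighted_distance(t, c, weights))
            for t in treated]

def p_value(xs, ys):
    # Two-sided p-value for a difference in means (normal approximation).
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    vx = sum((x - mx) ** 2 for x in xs) / (len(xs) - 1)
    vy = sum((y - my) ** 2 for y in ys) / (len(ys) - 1)
    se = math.sqrt(vx / len(xs) + vy / len(ys))
    if se == 0:
        return 1.0 if mx == my else 0.0
    return math.erfc(abs(mx - my) / se / math.sqrt(2))

def worst_balance(treated, controls, weights):
    # The criterion from the text: the p-value of the most unbalanced covariate.
    matched = match(treated, controls, weights)
    return min(p_value([t[k] for t in treated], [m[k] for m in matched])
               for k in range(len(treated[0])))

def search_weights(treated, controls, n_draws=200, seed=0):
    # Random search (a stand-in for the genetic algorithm), starting from
    # equal weights and keeping any draw that improves the worst p-value.
    rng = random.Random(seed)
    n_cov = len(treated[0])
    best_w = [1.0] * n_cov
    best_score = worst_balance(treated, controls, best_w)
    for _ in range(n_draws):
        w = [rng.random() for _ in range(n_cov)]
        score = worst_balance(treated, controls, w)
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score

# Simulated data: treated units differ from controls on the first covariate.
data_rng = random.Random(1)
treated = [(data_rng.gauss(1, 1), data_rng.gauss(0, 1)) for _ in range(20)]
controls = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(60)]

start = worst_balance(treated, controls, [1.0, 1.0])
weights, best = search_weights(treated, controls)
```

Because the search starts from equal weights and only accepts improvements, `best` can never be lower than `start`. Swapping in a different loss function, as the discussion suggested, would mean replacing `worst_balance` and nothing else.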

Posted by James Greiner at 5:37 PM

September 7, 2005

Pol Meth Conf I

The 14 papers presented at the 2005 Conference of the Society for Political Methodology, held July 21st-July 23rd in heavily air-conditioned rooms at Florida State, provided plenty of good fodder for discussion. I will focus on several I found especially provocative--and on which I could reasonably comment--in blog posts over the next few days.

Starting off the conference, Gary King's presentation of his paper "Death by Survey: Estimating Adult Mortality without Selection Bias" with Emmanuela Gakidou argued that we need to take a new approach to estimating death rates in the many countries that do not have vital registration systems. The dominant approach at present assumes that larger families do not have differing mortality rates, but given the uneven pace of development in so many countries, that seems a heroic assumption, and their paper shows it is completely wrong empirically. King and Gakidou's approach involves two fixes: first, weighting to deal with the over-sampling of families with more surviving children during the observation period (since samples are drawn in proportion to survivors rather than those alive at the start of the period), and second, extrapolation to deal with the fact that families with no surviving children are entirely excluded from the sample. The first problem is fixed exactly by weighting; the second requires assumptions beyond the range of the data. Some discussion focused on one of the main challenges to this second fix: that it involves extrapolation, extrapolation based on a small number of data points, and extrapolation based on a quadratic term. The paper deals with the danger of extrapolation through repetition in different data: By showing that the relationship between mortality and the number of siblings is constant in its shape across a wide range of countries, King and Gakidou argue that we can be reasonably confident about the fit of the curve from which we are extrapolating. The authors are now gathering survey data to replicate this approach in cases where we know the answer--that is, where we also have accurate, non-survey data on mortality rates. That is especially critical since families without any surviving children might be disproportionately the victims of wars or other violence, for instance, making it challenging to use data about families with surviving children to make inferences about families without any surviving children.
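The logic of the first fix can be shown with a toy example. This Python sketch assumes that every surviving sibling is surveyed and reports on the whole sibship, so a family with s survivors shows up s times; weighting each report by 1/s undoes that. The families are invented, and this is my reading of the weighting idea rather than King and Gakidou's actual estimator.

```python
# Hypothetical families: (number of siblings, number who died). Each family
# has at least one survivor, so it can appear in a survey of survivors.
families = [(4, 0), (4, 2), (2, 1), (3, 0)]

def respondents(families):
    # One survey row per surviving sibling, each reporting on the whole sibship.
    rows = []
    for size, deaths in families:
        survivors = size - deaths
        rows += [(size, deaths, survivors)] * survivors
    return rows

def death_share(rows, weighted):
    # Share of siblings who died, optionally weighting each report by 1/survivors.
    total_deaths = 0.0
    total_siblings = 0.0
    for size, deaths, survivors in rows:
        w = 1.0 / survivors if weighted else 1.0
        total_deaths += w * deaths
        total_siblings += w * size
    return total_deaths / total_siblings

rows = respondents(families)
truth = sum(d for _, d in families) / sum(n for n, _ in families)

print(round(truth, 3))                      # 0.231
print(round(death_share(rows, False), 3))   # 0.143
print(round(death_share(rows, True), 3))    # 0.231
```

The unweighted figure is too low because low-mortality families contribute more respondents. Note that a family with no survivors would contribute no rows at all, which is the second problem and one that no reweighting of observed rows can fix.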

Another early paper came from Kevin Clarke, who argued that political science as a discipline has become too worried about omitted variable bias. Clarke took another look at the familiar theoretical omitted variable bias result and pointed out that contrary to conventional wisdom, including additional variables can, under certain circumstances, exacerbate problems of omitted variable bias. The circumstances are that something else is wrong: that is, omitting a variable that is causally prior to and correlated with the treatment variable and affects the outcome variable will bias inferences in a predictable direction, and including it will reduce bias -- but only when other modeling assumptions are correct. If you have five things wrong with your model and you fix four, it is at least possible that you can make things worse.
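A small worked example shows how fixing one problem can make things worse. The Python sketch below works with population moments rather than simulated data, and the configuration (two omitted confounders whose biases cancel) is one I chose for illustration; I do not claim it is the case analyzed in Clarke's paper. All variables have unit variance and the two confounders are uncorrelated with each other.

```python
def coef_x_only(b, g1, g2, r1, r2):
    # True model: y = b*x + g1*z1 + g2*z2 + noise, with corr(x, z1) = r1,
    # corr(x, z2) = r2, corr(z1, z2) = 0, and unit variances throughout.
    # Regressing y on x alone gives cov(x, y) / var(x).
    return b + g1 * r1 + g2 * r2

def coef_x_with_z1(b, g1, g2, r1, r2):
    # Regress y on x and z1 (z2 still omitted): solve the 2x2 normal equations
    # [[1, r1], [r1, 1]] @ [beta_x, beta_z1] = [cov(x, y), cov(z1, y)].
    cov_xy = b + g1 * r1 + g2 * r2
    cov_z1y = b * r1 + g1
    return (cov_xy - r1 * cov_z1y) / (1 - r1 ** 2)

# Illustrative values: the two omitted-variable biases exactly offset.
b, g1, g2, r1, r2 = 1.0, 1.0, -1.0, 0.5, 0.5

print(coef_x_only(b, g1, g2, r1, r2))               # 1.0
print(round(coef_x_with_z1(b, g1, g2, r1, r2), 3))  # 0.333
```

With both confounders omitted the estimate happens to equal the true coefficient of 1.0; adding z1, a genuine confounder, moves it to about 0.33 because z2 is still missing.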

In my view, the sociology of the discipline embedded in Clarke's presentation was right on. In substantive presentations, it is incredibly common for presenters to be barraged with questions that are of this form: "did you account for [insert favorite variable]? What about [insert second favorite variable]? Or maybe [insert random variable that no one before or since has ever heard of]?" Reviewers, too, seem to find this an easy way to respond to articles. One way to deal with this problem--sensitivity tests--was highlighted during the discussion. Our models are almost never perfectly specified, so there will always be omitted variables, and knowing how those variables would need to look to overturn a result is a good (if incomplete) start to deal with this problem. One example of how these kinds of sensitivity analyses might work, by the way, is David Harding's 2003 "Counterfactual Models of Neighborhood Effects: The Effect of Neighborhood Poverty on Dropping Out and Teenage Pregnancy." (American Journal of Sociology 109(3): 676-719). Another is Paul Rosenbaum and Don Rubin's 1983 "Assessing Sensitivity to an Unobserved Binary Covariate in an Observational Study with Binary Outcome." (Journal of the Royal Statistical Society, Series B 45: 212-218).

One other case against overly-saturated models, one that did not come up in the discussion but that is probably familiar to many, is the challenge of thinking in terms of conditional effects as the number of variables increases. For instance, if we think about vote choices as our dependent variable, I understand what it means to talk about the impact of income conditional on race, but it is much harder to know what it means to say the impact of income conditional on ten other, inter-correlated variables. This problem becomes all the more difficult when we remember that we are conditioning not just on the inclusion of certain variables but also on the functional form specified for them.

Because I am a Harvard graduate student, I should also play to type and say something briefly about how matching (which these days is well-represented in hallway conversations at IQSS) relates to omitted variables. Obviously, it is no panacea, as unobserved confounders can be just as troublesome as in the case of more conventional models. But there is one way in which matching adds value here. In cases where we are matching observations of units for which we have information not quantified in our dataset, looking at the list of matched pairs can help identify the omitted variable. If, say, we are studying countries, and see that our observed variables wind up pairing Ethiopia and Greenland, we can use that pairing to think through what kinds of unobserved variables might be potential confounders.

Dan Hopkins, G4, Government (guest author)

Posted by James Greiner at 5:11 PM