Authors' Committee

Chair:

Matt Blackwell (Gov)

Members:

Martin Andersen (HealthPol)
Kevin Bartz (Stats)
Deirdre Bloome (Social Policy)
Andy Eggers (Gov)
John Graves (HealthPol)
Rich Nielsen (Gov)
Maya Sen (Gov)
Gary King (Gov)

Weekly Research Workshop Sponsors

Alberto Abadie, Lee Fleming, Adam Glynn, Guido Imbens, Gary King, Arthur Spirling, Jamie Robins, Don Rubin, Chris Winship

25 October 2005

New IR Data Set with 10 Million Dyadic Events

I thought you might be interested in a newly updated dataset of almost 10 million individually coded international events (1990-2004). Each event is summarized in the data as "Actor A does something to Actor B", with Actors A and B coded for about 450 countries (and other actors) and "does something to" coded in an ontology of about 200 types of actions. The data are coded by a computer "reading" millions of Reuters news reports. Will Lowe and I wrote an article* that evaluated the software system (produced by VRA) that performs this task. We found that, for the numbers of events it was possible to convince humans (trained Harvard undergraduates) to code by hand, the machine did as well as the humans. However, in part because there is only so much pizza you can feed undergraduates, the machine clearly dominates for larger numbers of events. We previously released a dataset with 3.5 million events; this one is bigger, more accurate (since the software has been improved), and covers a longer time period.

Most international relations data are limited to analyses aggregated to the year or month. Yet, as we say in the article, when the Palestinians launch a mortar attack into Israel, the Israeli army does not wait until the end of the calendar year to react. We think there is much to be learned about international relations from data like these. For the data, documentation, and our article, see this site.
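To make the aggregation point concrete, here is a minimal sketch in Python. The `Event` record, its field names, the actor codes "A" and "B", and the action labels are all made up for illustration; they are not the actual coding scheme or file format of the dataset. The sketch only shows how counting the same dyadic events by year hides the ordering that daily counts preserve.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Event:
    """One hypothetical dyadic event: source does action to target on a day."""
    day: date
    source: str   # "Actor A" (placeholder code)
    action: str   # "does something to" (placeholder label)
    target: str   # "Actor B" (placeholder code)

def count_by(events, key):
    """Count events after mapping each one to an aggregation key."""
    return Counter(key(e) for e in events)

# Three invented events, purely for illustration.
events = [
    Event(date(2000, 3, 1), "A", "attack", "B"),
    Event(date(2000, 3, 2), "B", "attack", "A"),
    Event(date(2000, 9, 10), "A", "negotiate", "B"),
]

# Yearly dyad counts: the action-reaction sequence is no longer visible.
yearly = count_by(events, lambda e: (e.day.year, e.source, e.target))

# Daily dyad counts: B's event one day after A's is still distinguishable.
daily = count_by(events, lambda e: (e.day, e.source, e.target))

print(yearly)
print(daily)
```

With yearly keys, the two events from A to B collapse into a single count, while the daily keys keep all three events separate.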

Gary

*Gary King and Will Lowe. 2003. "An Automated Information Extraction Tool for International Conflict Data with Performance as Good as Human Coders: A Rare Events Evaluation Design." International Organization 57, 3 (July): 617-642.

Posted by Gary King at 5:31 PM

Did You Achieve Balance?! Part I

Jens Hainmueller

There exists a growing consensus in the causal inference literature that when it comes to bias adjustment under selection on observables, matching methods dominate ordinary regression (esp. when discrepancies between groups are large). But how do we judge the quality of a matching? My professors tell me: "We want good balance." Sounds great, so I thought at first. Reading more matching articles, however, I soon became somewhat startled by the scholarly disagreement about what actually constitutes "good" balance in observational studies. Despite the fact that matching methods are now widely used all across the social sciences, we still lack shared standards for covariate balance: Which tests should be used in what type of data? What are their statistical properties and how do they compare to each other? And how much balance is good enough?

From reading this literature (sincere apologies if I have missed something relevant), it seems to me that most people agree that paired t-tests for differences in means are obligatory. Paired tests are a natural choice because matching by construction produces matched pairs. But should we test by comparing whole groups (treated vs. matched-untreated) or within propensity score ("PS") subclasses? A problem with the latter is that the choice of intervals can be arbitrary, and this matters because interval width affects the power of the test (Smith and Todd 2005).
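To see why interval width matters, here is a rough sketch in Python using only the standard library. The function names, the equal-width binning rule, and the four matched pairs are my own assumptions for illustration, not a procedure taken from any of the papers mentioned. It computes a paired t-statistic on one covariate for the whole matched sample and then again within PS subclasses.

```python
import math
import statistics

def paired_t(treated, control):
    """Paired t-statistic for the mean within-pair difference."""
    diffs = [t - c for t, c in zip(treated, control)]
    if len(diffs) < 2:
        return float("nan")  # not enough pairs to estimate a variance
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:
        return 0.0 if mean == 0 else math.copysign(math.inf, mean)
    return mean / (sd / math.sqrt(len(diffs)))

def paired_t_by_subclass(pairs, n_bins):
    """Paired t-statistics within equal-width PS intervals on [0, 1].

    pairs: list of (ps, x_treated, x_control), one tuple per matched pair.
    Returns {bin_index: (n_pairs, t_statistic)}.
    """
    bins = {}
    for ps, x_t, x_c in pairs:
        b = min(int(ps * n_bins), n_bins - 1)  # ps == 1.0 goes in the top bin
        bins.setdefault(b, []).append((x_t, x_c))
    return {
        b: (len(v), paired_t([p[0] for p in v], [p[1] for p in v]))
        for b, v in sorted(bins.items())
    }

# Made-up matched pairs: (propensity score, covariate treated, covariate control)
pairs = [(0.10, 1.0, 0.0), (0.30, 2.0, 2.0), (0.80, 3.0, 2.0), (0.90, 4.0, 5.0)]

overall = paired_t([p[1] for p in pairs], [p[2] for p in pairs])
two_bins = paired_t_by_subclass(pairs, n_bins=2)
four_bins = paired_t_by_subclass(pairs, n_bins=4)
print(overall)
print(two_bins)
print(four_bins)
```

With two intervals each subclass holds two pairs; with four intervals some subclasses hold a single pair and the statistic is undefined (`nan`). That is the power problem in miniature: the narrower the intervals, the fewer pairs each test has to work with.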

Moreover, which covariates should we t-test balance on? At least all that are included in the matching (right?), but what about higher moments, the full set of interactions, higher-order terms, etc.? Testing these seems helpful to minimize bias but is done once in a blue moon (at least in the papers that I encountered). Most authors avoid these additional tests since they exacerbate common support problems and substantially raise the hurdle for obtaining balance.
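A small sketch of what testing beyond the means could look like, again in plain Python. The covariate names `age` and `educ`, the helper names, and the data are invented for illustration; this is one possible way to set up such a check, not an established standard. The example is rigged so that both covariates are exactly balanced in means across the matched pairs while the squared and interaction terms are not.

```python
import math
import statistics
from itertools import combinations

def expand_terms(row, names):
    """Return the covariates plus their squares and pairwise interactions."""
    terms = dict(zip(names, row))
    for name, value in zip(names, row):
        terms[f"{name}^2"] = value * value
    for (n1, v1), (n2, v2) in combinations(list(zip(names, row)), 2):
        terms[f"{n1}*{n2}"] = v1 * v2
    return terms

def paired_t(diffs):
    """Paired t-statistic computed from within-pair differences."""
    if len(diffs) < 2:
        return float("nan")
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:
        return 0.0 if mean == 0 else math.copysign(math.inf, mean)
    return mean / (sd / math.sqrt(len(diffs)))

def balance_table(treated_rows, control_rows, names):
    """Paired t-statistic for every expanded term; rows are aligned by pair."""
    t_terms = [expand_terms(r, names) for r in treated_rows]
    c_terms = [expand_terms(r, names) for r in control_rows]
    return {
        term: paired_t([t[term] - c[term] for t, c in zip(t_terms, c_terms)])
        for term in t_terms[0]
    }

names = ["age", "educ"]  # hypothetical covariates
# Invented matched pairs: row i of treated is paired with row i of control.
treated = [(30.0, 12.0), (40.0, 12.0), (50.0, 16.0), (60.0, 16.0)]
control = [(44.0, 12.0), (46.0, 12.0), (44.0, 16.0), (46.0, 16.0)]

table = balance_table(treated, control, names)
for term, t_stat in table.items():
    print(term, t_stat)
```

Here the t-statistic on `age` is zero because the mean within-pair difference is zero, but `age^2` and `age*educ` show nonzero statistics since the treated group's ages are far more spread out. With k covariates the expanded set has 2k + k(k-1)/2 terms, which gives a sense of how quickly the hurdle rises.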

Finally, should we t-test balance on the PS and/or the covariates orthogonalized to the PS? How do we deal with the estimation uncertainty in these variables? And what does it mean -- as happens sometimes in practice -- to have remaining imbalance on the PS while all covariates are balanced?

Stand by for part II of this post tomorrow.

Posted by James Greiner at 5:00 AM