Authors' Committee

Chair:

Matt Blackwell (Gov)

Members:

Martin Andersen (HealthPol)
Kevin Bartz (Stats)
Deirdre Bloome (Social Policy)
Andy Eggers (Gov)
John Graves (HealthPol)
Rich Nielsen (Gov)
Maya Sen (Gov)
Gary King (Gov)

Weekly Research Workshop Sponsors

Alberto Abadie, Lee Fleming, Adam Glynn, Guido Imbens, Gary King, Arthur Spirling, Jamie Robins, Don Rubin, Chris Winship

January 9, 2010

Sequential Ideal Points

Simon Jackman puts together a plot of how the estimation of ideal points of the 111th U.S. Senate changes as he adds each roll call. Every Senator starts the term at 0 and then branches out. It illustrates an interesting feature of these IRT models:

The other thing is that there doesn't seem to be any obvious "vote 1" update for ideal points. That is, there is no simple mapping from the ideal point estimate based on m roll call to ideal point estimates based on m+1 roll calls. You have to start the fitting algorithm from scratch each time (and hence the appeal of exploiting multiple cores etc), although the results from the previous run giving pretty good start values.
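To make the "start values" point concrete, here is a minimal sketch of a warm start. It is not Jackman's model: it assumes the item parameters `(a, b)` are already known and estimates a single legislator's ideal point by plain gradient ascent, whereas a real ideal point model estimates legislators and items jointly. All names and numbers below are made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_ideal_point(votes, items, start=0.0, tol=1e-8, max_iter=10_000):
    """Maximise the log-likelihood of one legislator's ideal point x.

    votes: list of 0/1 (nay/yea).
    items: list of (a, b) pairs, ASSUMED known, with P(yea) = sigmoid(a*x - b).
    Returns (estimate, number of iterations used).
    """
    # Step size 1/L, where L bounds the curvature of the log-likelihood.
    step = 1.0 / sum(a * a / 4.0 for a, _ in items)
    x = start
    for iteration in range(1, max_iter + 1):
        grad = sum(a * (y - sigmoid(a * x - b)) for y, (a, b) in zip(votes, items))
        if abs(grad) < tol:
            return x, iteration
        x += step * grad
    return x, max_iter


# Hypothetical item parameters and votes for the first m roll calls.
items_m = [(1.0, 0.0), (1.5, 0.5), (-1.0, 0.2), (2.0, -0.3), (1.0, 1.0)]
votes_m = [1, 1, 0, 0, 1]
x_m, _ = fit_ideal_point(votes_m, items_m)

# Roll call m+1 arrives: the whole likelihood is refit, but the previous
# estimate can serve as the starting value instead of zero.
items_next = items_m + [(1.2, 0.1)]
votes_next = votes_m + [0]
cold, cold_iters = fit_ideal_point(votes_next, items_next, start=0.0)
warm, warm_iters = fit_ideal_point(votes_next, items_next, start=x_m)
print(cold, warm, cold_iters, warm_iters)
```

Both runs reach the same estimate; the only thing the previous fit contributes is where the search begins, which is the sense in which there is no closed-form update from m to m+1 roll calls.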

Posted by Matt Blackwell at 3:57 PM

October 14, 2009

The Fundamental Regret of Causal Inference

Tim Kreider at the New York Times has a short piece on what he dubs "The Referendum" and how it plagues us:

The Referendum is a phenomenon typical of (but not limited to) midlife, whereby people, increasingly aware of the finiteness of their time in the world, the limitations placed on them by their choices so far, and the narrowing options remaining to them, start judging their peers' differing choices with reactions ranging from envy to contempt. ...Friends who seemed pretty much indistinguishable from you in your 20s make different choices about family or career, and after a decade or two these initial differences yield such radically divergent trajectories that when you get together again you can only regard each other's lives with bemused incomprehension.

Those familiar with causal inference will recognize this as stemming from the Fundamental Problem of Causal Inference: we cannot observe, for one individual, both their response to treatment and their response to control. The article is an elegant look at how we grow to worry about those mysterious missing potential outcomes--the paths we didn't choose--and how we use our friends' lives to impute those missing outcomes. Kreider goes on to make this point exactly, with a beautiful quote from a novel:

The problem is, we only get one chance at this, with no do-overs. Life is, in effect, a non-repeatable experiment with no control. In his novel about marriage, "Light Years," James Salter writes: "For whatever we do, even whatever we do not do prevents us from doing its opposite. Acts demolish their alternatives, that is the paradox." Watching our peers' lives is the closest we can come to a glimpse of the parallel universes in which we didn't ruin that relationship years ago, or got that job we applied for, or got on that plane after all. It's tempting to read other people's lives as cautionary fables or repudiations of our own.

Perhaps the only response is that, while so close to us in so many respects, friends may be poor matches for gauging these kinds of effects. In any case, "Acts demolish their alternatives, that is the paradox" is the best description of the problem of causal inference that I have seen.
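For readers who have not seen it written down, the problem can be stated in the usual potential-outcomes notation. The symbols are the standard textbook ones, not anything from Kreider's piece: T_i records whether person i made a given choice, and Y_i(1) and Y_i(0) are the outcomes with and without it.

```latex
Y_i^{\text{obs}} = T_i \, Y_i(1) + (1 - T_i) \, Y_i(0),
\qquad
\tau_i = Y_i(1) - Y_i(0)
```

Only one of Y_i(1) and Y_i(0) is ever observed for a given person, so the individual effect \tau_i cannot be computed directly. Reading a friend's life as our own counterfactual amounts to plugging the friend's observed outcome in for our missing one.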

Posted by Matt Blackwell at 4:19 PM

June 8, 2009

Was there Really a Hawthorne Effect at the Hawthorne Plant?

The idea of the Hawthorne effect is that individuals may change their behavior because they are being studied, in addition to any real effects of the intervention. Steven Levitt and John List have revisited the illumination experiments at the Hawthorne plant that gave name to the effect, and argue that many of the original conclusions do not hold up to scrutiny. There's an Economist article on the paper, but its subtitle, "Being watched may not affect behavior, after all," is misleading: even if the earlier research was sloppy by today's standards, its contribution was to point out the possibility of these effects. A better subtitle could have commended replication as an important part of the scientific method.


Levitt and List (2009) "Was there Really a Hawthorne Effect at the Hawthorne Plant? An Analysis of the Original Illumination Experiments" NBER Working Paper #15016, http://www.nber.org/papers/w15016.pdf

The Economist (June 4, 2009) "Light work: Questioning the Hawthorne effect", http://www.economist.com/finance/displayStory.cfm?story_id=13788427

Posted by Sebastian Bauhoff at 8:53 AM

May 23, 2009

Distribution of Swine Flu Cases by Weekday

How would you expect swine flu cases to be distributed by weekday? More specifically, would you expect more cases on weekdays or on weekends? My first reaction is that there will be more cases whenever there are more social gatherings.

Following this logic, one reason to expect more cases on weekdays is that the susceptible population has more contact with the infected population then, through school, work, and so on. In addition, people are more likely to travel on weekends, which means more contact with infected people while traveling; but because the virus takes around two days to take effect, those cases will not be identified until a couple of days later. Could the pattern also reflect the fact that fewer clinical services are available on weekends and that people are less likely to visit clinics then?

Here is an old graph I made from the swine flu updates (4/26/2009 - 05/21/2009) published on WHO's website. To be more accurate, I drew a new graph using the number of confirmed new cases rather than the cumulative number of confirmed cases.

As the reporting times for confirmed new cases vary (some at 18:00, others at 6:00, and so on), I kept only the records between 05/01 and 05/21 with a reporting time of 6:00 and redrew the graph. Weekdays are redefined as well: for example, Thursday 6:00 to Friday 6:00 is defined as Thursday. Can you still see any salient patterns, such as a difference between weekdays and weekends? Why is Friday so spiky this time?
SwineFlu3.jpeg
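The two data steps described above (turning cumulative counts into new cases, and relabeling days so that they run from 6:00 to 6:00) can be sketched as follows. This is a reconstruction under assumptions, not the code behind the graph: the function names are hypothetical and the counts are made up rather than taken from the WHO updates.

```python
from datetime import datetime, timedelta

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]


def window_weekday(timestamp, day_start_hour=6):
    """Label a timestamp by the 06:00-to-06:00 window it falls in."""
    # Shifting back by the start hour maps [Thu 06:00, Fri 06:00) onto
    # calendar Thursday, so the ordinary weekday lookup then applies.
    shifted = timestamp - timedelta(hours=day_start_hour)
    return WEEKDAY_NAMES[shifted.weekday()]


def new_cases(cumulative):
    """Convert a cumulative series into per-period new cases."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


# Made-up cumulative counts, for illustration only (not WHO figures).
cumulative_counts = [100, 130, 130, 175]
print(new_cases(cumulative_counts))                 # [30, 0, 45]
print(window_weekday(datetime(2009, 5, 8, 5, 59)))  # Thursday
```

One caveat: a report issued at Friday 6:00 summarizes the window that has just ended, so under this definition its count belongs to Thursday even though the report's own timestamp falls in the Friday window.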

Posted by Weihua An at 12:38 AM

May 20, 2009

Debates on government transparency websites

A few weeks ago my friend Aaron Swartz wrote a blog post called Transparency is Bunk, arguing that government transparency websites don't do what they're supposed to do, and in fact have perversely negative effects: they bury the real story in oceans of pro forma data, encourage apathy by revealing "the mindnumbing universality of waste and corruption," and lull activists into a false sense of accomplishment when occasional successes occur. It's a particularly powerful piece because Aaron uses the platform to announce he's done working on his own government website (watchdog.net). The piece appears to have caused a stir in government transparency/hacktivist circles, where Aaron is pretty well known.

On looking back at it I think Aaron's argument (or rant, more accurately) against the transparency websites is not very strong: indeed, data overload, apathy, and complacency are all dangers these efforts face, but that shouldn't have come as a surprise.

I had two other responses particular to my perch in academia. First, there is some good academic research showing that transparency works, although the evidence on the effectiveness of grassroots watchdogging is less strong than the evidence on auditing from e.g. Ferraz and Finan on Brazilian municipalities (QJE 2008, working paper version) or Olken's field experiment in Indonesia (JPE 2008, working paper version).

Second, my own work and that of other academics benefits greatly from these websites. I have a project right now on the investments of members of Congress (joint with Jens Hainmueller) that is possible only because of websites like the ones Aaron criticizes. I think this paper is going to be useful in helping watchdogs understand how Congress invests and whether additional regulation is a good idea, and it would be a shame if the funders of these sites listened to Aaron and shut them down.

I do agree with Aaron that professional analysis may be better than grassroots citizen activism in achieving the goals of the transparency movement. Sticking with the example of the congressional stock trading data I'm using, I suspect that not much useful watchdogging came out of the web interface that OpenSecrets provides for the investments data. While it may be interesting to know that Nancy Pelosi owns stock in company X, it's hard to get any sense of patterns of ownership across members and how these investments relate to political relationships between members and companies. This is what our paper tries to do. It takes a ton of work, far more than an investigative journalist is going to put in. We do it because of the rewards of publishing interesting and original and careful research, and also because these transparency websites have made it much more manageable: OpenSecrets.org converted the scanned disclosure forms into a database and provided lobbying data, and GovTrack provided committee and bill info, as well as an API linking company addresses to congressional districts. Most of the excitement around these websites seems to center on grassroots citizen activism, but their value to academic research (and the value of academic research to government accountability) should not be overlooked.

Posted by Andy Eggers at 10:53 PM

May 13, 2009

Natural Languages

The social sciences have long embraced the idea of text-as-data, but in recent years, increasing numbers of quantitative researchers are investigating how to have computers find answers to questions in texts. This task might appear easy at the outset (as it apparently did to early researchers in machine translation), but, as we know, natural languages are incredibly complicated. In most of the applications in social science, analysts end up making a "bag of words" assumption--the relevant parts of a document are the actual words, not their order (this is not an unreasonable assumption, especially given the questions being asked).
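As a small illustration of what the bag-of-words assumption throws away, here is a sketch that reduces a document to word counts. The tokenizer is deliberately crude and is an assumption of this example; real applications typically add stemming, stop-word removal, and so on.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Reduce a document to word counts, discarding word order."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


first = bag_of_words("The dog bit the man")
second = bag_of_words("The man bit the dog")
print(first)
print(first == second)  # True: the two sentences become indistinguishable
```

The two sentences mean very different things but map to the same bag, which is exactly the information the assumption gives up in exchange for tractability.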

When I see applications of natural language processing (NLP) in the social sciences, my thoughts quickly turn to its future. Computers are making strides at being able to understand, in some sense, what they are reading. Two recent articles, however, give a good overview of the challenges that NLP faces. First, John Seabrook of the New Yorker had an article last summer, Hello, Hal, which states the problem clearly:

The first attempts at speech recognition were made in the nineteen-fifties and sixties, when the A.I. pioneers tried to simulate the way the human mind apprehends language. But where do you start? Even a simple concept like "yes" might be expressed in dozens of different ways--including "yes," "ya," "yup," "yeah," "yeayuh," "yeppers," "yessirree," "aye, aye," "mmmhmm," "uh-huh," "sure," "totally," "certainly," "indeed," "affirmative," "fine," "definitely," "you bet," "you betcha," "no problemo," and "okeydoke"--and what's the rule in that?

The article is mostly about speech recognition, but it definitely hits the main points about why human-generated language is so tricky. The second article, which appeared recently in the New York Times, is a short story about Watson, the computer that IBM is creating to compete on Jeopardy! IBM is trying to push the field of Question Answering quite a bit forward with this challenge. The goal is to create a computer that can be asked a natural language question and return the correct answer. A quick story in the article indicates that they may have a bit to go:

In a demonstration match here at the I.B.M. laboratory against two researchers recently, Watson appeared to be both aggressive and competent, but also made the occasional puzzling blunder.

For example, given the statement, "Bordered by Syria and Israel, this small country is only 135 miles long and 35 miles wide," Watson beat its human competitors by quickly answering, "What is Lebanon?"

Moments later, however, the program stumbled when it decided it had high confidence that a "sheet" was a fruit.

This whole Watson enterprise makes me wonder if there are applications for this kind of technology within the social sciences. Would this only be useful as a research aid, or are there empirical discoveries to be made with this? I suppose it comes down to this: if a computer could answer your question, what would you ask?

Posted by Matt Blackwell at 9:43 AM

May 10, 2009

Dobbie and Fryer on Charter Schools in the Harlem Children's Zone

David Brooks wrote a column a few days ago about Will Dobbie and Roland Fryer's working paper on the Harlem Children's Zone charter schools, which the authors report dramatically improved students' performance, particularly in math. Looking at the paper, I think it's a nice example of constructing multiple comparisons to assess the effect of a program and to do some disentangling of mechanisms.

The program they study is enrollment in one of the Promise Academy elementary and middle schools in Harlem Children's Zone, a set of schools that offer extended class days, provide incentives for teacher and student performance, and emphasize a "culture of achievement." The authors assess the schools' effect on student test scores by comparing the performance of students at the schools with that of other students. The bulk of the paper is concerned with how to define this group of comparable students who did not attend, and the authors pursue two strategies:

  • First, they examine cases where too many students applied to the school and slots were handed out by lottery; the comparison of lottery winners and non-winners (and the accompanying IV estimate in which attending the school at some point is the treatment) allows them to estimate the effect of attending these schools under nearly experimental conditions, at least in years when lotteries were held.
  • Second, they compare students who were age-eligible and not age-eligible for the program, and students who were in the schools' recruitment area vs not in the schools' recruitment area. (This boils down to an IV in which the interaction of cohort and address instruments for attendance at the school.)
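The lottery-based IV logic in the first strategy can be sketched with the simplest possible version, the Wald estimator: the difference in mean scores between lottery winners and losers, divided by the difference in attendance rates. This is an illustration under assumptions, not the authors' specification (which presumably includes covariates and proper standard errors), and the data are invented.

```python
def mean(values):
    return sum(values) / len(values)


def wald_estimate(won_lottery, attended, score):
    """IV estimate with a binary instrument (lottery) and binary treatment."""
    winners = [i for i, z in enumerate(won_lottery) if z == 1]
    losers = [i for i, z in enumerate(won_lottery) if z == 0]
    # Intent-to-treat effect: winners vs. losers, regardless of attendance.
    itt = mean([score[i] for i in winners]) - mean([score[i] for i in losers])
    # First stage: how much winning the lottery raises attendance.
    first_stage = (mean([attended[i] for i in winners])
                   - mean([attended[i] for i in losers]))
    return itt / first_stage


# Invented data: 4 winners and 4 losers, with imperfect compliance.
won_lottery = [1, 1, 1, 1, 0, 0, 0, 0]
attended = [1, 1, 1, 0, 0, 0, 0, 1]
score = [0.6, 0.4, 0.5, 0.1, 0.0, 0.1, -0.1, 0.4]
print(wald_estimate(won_lottery, attended, score))  # approximately 0.6
```

Here winners score 0.3 higher on average and are 50 percentage points more likely to attend, so the implied effect of attendance is about 0.6. The second strategy has the same structure, with the cohort-by-address interaction playing the role of the lottery.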

The estimated effect is very large, particularly for math. Because the estimates are based on comparisons both within the HCZ and between HCZ and non-HCZ students, the authors can speculate somewhat about the relative importance of the schooling itself vs other aspects of the HCZ: they tentatively suggest that the community aspects must not drive the results, because students from outside the HCZ who attended the schools did just as well.

Overall I thought it was a nice example of careful comparisons in a non-experimental situation providing useful knowledge. I don't really know this literature, but it seems like a case where good work could have a big impact.

Posted by Andy Eggers at 9:40 AM

April 13, 2009

Alley-oops as workplace cooperation

Here's a paper for the "high internal, low external validity" file (via Kevin Lewis):

Interracial Workplace Cooperation: Evidence from the NBA

Joseph Price, Lars Lefgren & Henry Tappen
NBER Working Paper, February 2009

Abstract:
Using data from the National Basketball Association (NBA), we examine
whether patterns of workplace cooperation occur disproportionately
among workers of the same race. We find that, holding constant the
composition of teammates on the floor, basketball players are no more
likely to complete an assist to a player of the same race than a
player of a different race. Our confidence interval allows us to
reject even small amounts of same-race bias in passing patterns. Our
findings suggest that high levels of interracial cooperation can occur
in a setting where workers are operating in a highly visible setting
with strong incentives to behave efficiently.

Posted by Andy Eggers at 6:51 PM

April 4, 2009

Can Nonrandomized Experiments Yield Accurate Answers?

Here is some recent progress (recent to me, at least) on causal inference. William R. Shadish, M. H. Clark, and Peter M. Steiner published a paper in JASA (December 1, 2008, 103(484): 1334-1344) based on "a randomized experiment comparing random and nonrandom assignments". In short: "In the randomized experiment, participants were randomly assigned to mathematics or vocabulary training; in the nonrandomized experiment, participants chose their training." As the authors acknowledge, the randomized and nonrandomized experiments unsurprisingly provided different estimates of the training effects, very likely because of selection bias caused by math phobia. The key finding is that statistical adjustment, including propensity score stratification, weighting, and covariance adjustment, can reduce estimation bias by about 58-96%.

Here is a link to the PPT of the paper. The comments on the paper are also very insightful.

Posted by Weihua An at 10:31 PM

March 25, 2009

How to teach methods

Over on the polmeth mailing list there is a small discussion brewing about how to teach undergraduate methods classes. Much of the discussion is on how to manage the balance between computation and statistics. A few posters are using R as their main data analysis tool, which provoked others to comment that this might push a class too far away from its original intent: to learn research methods (although one teacher of R indicated that a bigger problem was the relative inability to handle .zip files). This got me thinking about how research methods, computing and statistics fit into the current education framework.

As a gross and unfair generalization, much of college is about learning how to take a set of skills and use them to make effective and persuasive arguments. In a literature class, for instance, one might use the skills of reading and writing to critically engage a text. In mathematics, one might take the "skill" of logic and use it to derive a proof.

The issue with introductory methods classes is that many undergraduates come into school without a key skill: computing. It is becoming increasingly important to have proficient computing skills in order to make cogent arguments with data. I wonder if it is time to rethink how we teach computing at lower levels of education to adequately prepare students for the modern workplace. There is often emphasis on using computers to teach students, but I think it will become increasingly important to teach computers to students. This way courses on research methods can focus on how to combine computing and statistics in order to answer interesting questions. We could spend more time matching tools to questions and less time simply explaining the tool.

Of course, my argument reeks of passing the buck. A broader question is this: where do data analysis and computing fit in the education model? Is this a more fundamental skill that we should build up in children earlier? Is it perfectly fine where it is, being taught in college?

Posted by Matt Blackwell at 3:08 PM

March 7, 2009

How to Take Log of Zero Income

I ran into a problem when using a lognormal distribution to model the income distribution. Namely, a number of people in my dataset report zero income, perhaps due to unemployment, and I am wondering how to take the log of the zero incomes. I notice that some researchers simply drop the observations with zero income, while others assign them a small amount of income so that the logarithm can legitimately be taken. Obviously, we can try both ways to see how the results hold up. But I am wondering whether any experts on this topic can clarify the pros and cons of these and other approaches to treating zero incomes.

A related question is which model you think fits the income distribution best: a lognormal, a power-law distribution, a mixture of a Normal and a point mass at zero, or something else.
I look forward to your thoughts on these questions.
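To see why the choice matters, here is a minimal sketch of the two approaches mentioned above, dropping the zeros versus adding a small amount before taking logs. The incomes and the offsets are made up for illustration; nothing here says which approach is right.

```python
import math

# Made-up incomes, including two zeros.
incomes = [0.0, 0.0, 12_000.0, 30_000.0, 55_000.0]


def log_dropping_zeros(values):
    """Take logs after discarding non-positive incomes (changes the sample)."""
    return [math.log(v) for v in values if v > 0]


def log_with_offset(values, offset):
    """Take logs after adding a constant (keeps the sample, distorts the zeros)."""
    return [math.log(v + offset) for v in values]


def mean(values):
    return sum(values) / len(values)


print(mean(log_dropping_zeros(incomes)))
print(mean(log_with_offset(incomes, 1.0)))
print(mean(log_with_offset(incomes, 0.01)))
```

With an offset of 1 the zero incomes become log(1) = 0, and with an offset of 0.01 they become roughly -4.6, so the mean of log income depends heavily on an arbitrary constant. Dropping the zeros avoids that, but the results then describe only people with positive income.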

Lastly, here is an interesting animation of the income distribution in the USA.

Posted by Weihua An at 6:07 PM

February 25, 2009

Missingness Maps and Cross Country Data

I've been doing some work on diagnostics for missing data issues and one that I have found particularly useful and enlightening has been what I've been calling a "missingness map." In the last few days, I used it on some World Bank data I downloaded to see what missingness looks like in a typical comparative political economy dataset.

missmap2.png


The rows (y-axis) here are country-years and the columns (x-axis) are variables. We draw a red square where the country-year-variable cell is missing and a light green square where the cell is observed. We can see immediately that a whole set of variables in the middle columns are almost always unobserved. These are variables measuring income inequality and they are known to have extremely poor coverage. This plot very quickly shows us how listwise deletion will affect our analyzed sample and how the patterns of missingness occur in our data. For example, in these data, it seems that if GDP is missing, then many of the other variables, such as imports and exports, are also missing. I think this is a neat way to get a quick, broad view of missingness.
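The idea behind the map is simple enough to sketch in a few lines. This is a text-mode toy, not the plotting code used for the figures or anything from Amelia: each row is a country-year, each column a variable, "X" marks a missing cell and "." an observed one. The data and function names are invented.

```python
def missingness_map(rows, columns):
    """Return one string per row: 'X' where a cell is missing, '.' where observed."""
    return ["".join("X" if row.get(col) is None else "." for col in columns)
            for row in rows]


def complete_cases(rows, columns):
    """Rows that would survive listwise deletion on the given columns."""
    return [row for row in rows
            if all(row.get(col) is not None for col in columns)]


# Invented country-years; None marks a missing value.
rows = [
    {"unit": "A-1980", "gdp": None, "imports": None, "gini": None},
    {"unit": "A-1990", "gdp": 5.1, "imports": 1.2, "gini": None},
    {"unit": "B-1990", "gdp": 7.4, "imports": 2.0, "gini": 0.41},
]
columns = ["gdp", "imports", "gini"]

for row, line in zip(rows, missingness_map(rows, columns)):
    print(row["unit"], line)
print(len(complete_cases(rows, columns)), "of", len(rows), "rows survive listwise deletion")
```

Even in this toy case the map shows both points made above: the inequality column is mostly missing, and only one of three rows would remain after listwise deletion.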


We can also change the ordering of the rows to give a better sense of missingness. For the World Bank data, it is wise to re-sort the data by time and see how missingness changes over time.

missmap-time2.png


A clear pattern emerges that the World Bank has better and better data as we move forward in time (the map becomes more "clear"). This is not surprising, but it is an important point when, say, deciding the population under study in a comparative study. Clearly, listwise deletion will radically change the sample we analyze (the answers will be biased toward more recent data, at the very least). The standard statistical advice of imputation or data augmentation is tricky as well here because we need to choose what to impute. Should we carry forth with imputation given that income inequality measures seem to be completely unavailable before 1985? If we remove observations before this, how do we qualify our findings?

Any input on the missingness map would be amazing, as I am trying to add it as a diagnostic to a new version of Amelia. What would make these plots better?

Posted by Matt Blackwell at 2:58 PM

February 21, 2009

My Basketball Friend

I met one of my friends on the basketball court. This is selection: I chose him as a friend because he plays good basketball and is an avid player. We have been friends for almost three years. When either of us wants to play, we usually call the other and meet on the court. Had I never met him, I think I would still play basketball, but not as often. So we influence each other. Sometimes we eat Vietnamese noodles together at Le's right after a game. Contextual factors matter, but it is he who makes me eat noodles more often than I would on my own. Our friendship probably has some impact on both our weights and may make them change more synchronously. Similarly, if you are a runner, you will surely like running with your friends and may run more because you have a runner as a friend. So the empirical question is whether you do in fact play more basketball when you gain a friend who likes playing basketball, and run more when you gain a friend who runs. It is also possible that because you play or run more, you eat more, which offsets the weight loss from the extra exercise.

Given only observational data, it is hard to disentangle the effects of selection, induction, and contextual factors on weight changes. We would have to assign you friends (roommates) randomly and check whether you and your friends gain or lose weight together, possibly because you both play more basketball, run more, eat similar things, have similar lifestyles, share similar standards about what constitutes a normal weight, etc.

It is interesting to see that the effects of friendship seem to be directional, or asymmetric. Only people you regard as friends can induce you to lose weight. You cannot induce a person who does not consider you his friend to lose weight, even if you consider him yours. This is a kind of opinion-leader effect.

The directionality of friendship effects also counters the competing contextual-factors hypothesis, because if contextual factors mattered, you would expect friends' weight changes to correlate without directionality. Also, if they mattered, you would expect your neighbors' weight changes to synchronize with yours, and the weight of a friend who lives hundreds of miles away should not correlate with yours. But neither expectation is corroborated by the data.

Hence selection should be the largest concern in this case. Now the questions are whether using changes in weight or in obesity status will remove the selection effect, and how we could control for it better.

One of my friends told me two weeks ago that he did not buy the points in "The Spread of Obesity in a Large Social Network over 32 Years" until he read the real paper. I confessed, "Same here." Read the real paper, not the popular press. But you are absolutely not obligated to buy the points. Here is more reading.

K.P. Smith and N.A. Christakis, "Social Networks and Health," Annual Review of Sociology 34: 405-429 (August 2008)

Journal of Health Economics, Volume 27, Issue 5, September 2008

Ethan Cohen-Cole, Jason M. Fletcher, "Is obesity contagious? Social networks vs. environmental factors in the obesity epidemic", Pages 1382-1387.

Justin G. Trogdon, James Nonnemaker, Joanne Pais, "Peer effects in adolescent overweight", Pages 1388-1399.

J.H. Fowler, N.A. Christakis, "Estimating peer effects on health in social networks: A response to Cohen-Cole and Fletcher; and Trogdon, Nonnemaker, and Pais", Pages 1400-1405.

P.S. My friend and I have successfully induced several of our friends who originally did not play basketball to play more. But hopefully they will gain some weight rather than lose it, so that we can all play stronger and better.

Posted by Weihua An at 9:01 AM

February 17, 2009

Social pressure and biased refereeing in Italian soccer

I recently came across a paper by Per Pettersson-Lidbom and Mikael Priks that uses a neat natural experiment in Italian soccer to estimate the effect of stadium crowds on referees' decisions. After a bout of hooliganism in early February 2007, the Italian government began requiring soccer stadiums to fulfill certain security regulations; those stadiums that did not meet the requirements would have to hold their games without spectators. As a result, 25 games were played in empty stadiums that month, allowing Pettersson-Lidbom and Priks to examine game stats (like this) and see whether referees were more disposed toward the home team when the bleachers were filled with fans than when the stadium was empty. Looking at fouls, yellow cards, and red cards, the authors find that referees were indeed more likely to penalize the home team (and less likely to penalize the away team) in an empty stadium. There does not appear to be any effect of the crowd on players' performance, which suggests that referees were reacting to the crowd and not the players (and that fans should save their energy for haranguing the refs).

One of the interesting things in the results is that refs showed no favoritism toward the home team in games with spectators -- they handed out about the same number of fouls and cards to the home and away teams in those games. The bias shows up in games without spectators, where they hand out more fouls and cards to the home team. (The difference is not statistically significant in games with spectators but is in games without spectators.) If we are to interpret the empty stadium games as indicative of what refs would do if not subjected to social pressure, then we should conclude from the data that refs are fundamentally biased against the home team and only referee in a balanced way when their bias is balanced by crowd pressure. This would indeed be evidence that social pressure matters, but it seems unlikely that refs would be so disposed against the home team. A perhaps more plausible interpretation of the findings is that Italian refs are generally pretty balanced and not affected by crowds, but in the "empty stadium" games they punished the home team for not following the rules on stadium security. This interpretation of course makes the finding less generally applicable. In the end the example highlights the difficulty of finding "natural experiments" that really do what you want them to do -- in this case, illustrate what would happen if, quite randomly, no fans showed up for the game.

Posted by Andy Eggers at 8:25 AM

February 15, 2009

Bayesian Propensity Score Matching

Many people have realized that the conventional propensity score matching (PSM) method does not take into account the uncertainty in estimating propensity scores. In other words, for each observation, PSM assumes that there is only one fixed propensity score. In contrast, Bayesian methods can generate a sample of propensity scores for any observation, by either monitoring the posterior distributions of the estimated propensity scores directly or predicting propensity scores from the posterior samples of the parameters of the propensity score model.

Matching on the propensity scores thus obtained, we should expect to get a distribution of estimated treatment effects. This will also provide us with an estimate of the standard error of the treatment effect. The Bayesian S.E. should be larger than the S.E. based on the PSM estimate, as it takes more of the uncertainty into account. This conjecture is indeed confirmed by a recent paper by Lawrence C. McCandless, Paul Gustafson and Peter C. Austin, "Bayesian propensity score analysis for observational data", which appears in Statistics in Medicine (2009; 28:94-112). The authors show that the Bayesian 95% credible interval for the treatment effect is 10% wider than the conventional propensity score C.I.
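The second route described above (predict propensity scores from posterior draws of the model parameters, then match on each set of scores) can be sketched roughly as follows. This is an illustration under strong assumptions, not the McCandless, Gustafson and Austin procedure: the coefficient draws are hard-coded stand-ins for real MCMC output, matching is one-nearest-neighbor with replacement, and the data are invented.

```python
import math
import statistics


def propensity(covariates, coefficients):
    """Logit propensity score; coefficients = (intercept, slope_1, slope_2, ...)."""
    intercept, *slopes = coefficients
    linear = intercept + sum(b * x for b, x in zip(slopes, covariates))
    return 1.0 / (1.0 + math.exp(-linear))


def att_by_matching(scores, treated, outcome):
    """ATT from 1-nearest-neighbor matching (with replacement) on the score."""
    controls = [j for j, t in enumerate(treated) if t == 0]
    effects = []
    for i, t in enumerate(treated):
        if t == 1:
            match = min(controls, key=lambda j: abs(scores[j] - scores[i]))
            effects.append(outcome[i] - outcome[match])
    return sum(effects) / len(effects)


def att_over_draws(covariates, treated, outcome, coefficient_draws):
    """One matching estimate per posterior draw of the propensity model."""
    estimates = []
    for coefficients in coefficient_draws:
        scores = [propensity(row, coefficients) for row in covariates]
        estimates.append(att_by_matching(scores, treated, outcome))
    return estimates


# Invented data: two covariates per unit, three treated and three controls.
covariates = [(0.2, 1.0), (1.5, 0.3), (0.9, 0.8), (0.1, 0.2), (1.2, 1.1), (0.4, 0.6)]
treated = [1, 1, 1, 0, 0, 0]
outcome = [5.0, 9.0, 7.0, 3.0, 6.0, 4.0]

# Hypothetical posterior draws of (intercept, slope_1, slope_2).
coefficient_draws = [(-1.0, 0.8, 0.5), (-0.9, 1.1, 0.2), (-1.2, 0.6, 0.9)]

estimates = att_over_draws(covariates, treated, outcome, coefficient_draws)
print(statistics.mean(estimates), statistics.stdev(estimates))
```

The spread of `estimates` reflects only uncertainty in the propensity score model; it ignores sampling variability in the outcomes, so on its own it is not a full standard error for the treatment effect.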

It seems that we should expect Bayesian propensity score matching (BPSM) to perform better than PSM in cases where there is a lot of uncertainty in estimating the propensity scores. Before running any simulations, however, the question is: what are the sources of uncertainty in estimating propensity scores? From my point of view, there is at least one source: omitted variables. I do not think BPSM can do any better than PSM in solving this issue. But maybe BPSM can model the error terms and so provide better estimates of the propensity scores? The above authors argue that when the association between treatment and covariates is weak (i.e., when the betas are smaller), the uncertainty in estimating propensity scores is higher. Weak association means a smaller R-squared or a larger AIC, etc. Is this equivalent to larger bias due to omitted variables?

Another type of uncertainty related to BPSM, but not to propensity scores, is the uncertainty due to the matching procedure. This is avoidable or negligible. More radically, we can just abandon matching and resort to a linear regression model to predict the outcomes. Or we can neglect the bias from the matching procedure, because when we only care about the ATT and there is a sufficient number of control cases, the bias is negligible, according to Abadie and Imbens 2006. ("Large Sample Properties of Matching Estimators for Average Treatment Effects." Econometrica 74 (1): 235 - 267.)

Of course, the logit model for the propensity scores could be wrong as well. But this can be manipulated in the simulations. Now my question is: how should we do the simulations to evaluate the performance of BPSM vs. that of conventional PSM?
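As one possible starting point for such a simulation, here is a minimal Python sketch. It is not the method of McCandless et al.: it approximates the posterior of the logit coefficients with a normal distribution centered at the MLE (an assumption made purely for illustration), and the data-generating process, sample sizes, and function names are all hypothetical.

```python
import numpy as np


def fit_logit(X, t, iters=25):
    """Newton-Raphson logit fit; returns the MLE and its approximate covariance."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        hessian = X.T @ (X * w[:, None])
        beta = beta + np.linalg.solve(hessian, X.T @ (t - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = p * (1.0 - p)
    cov = np.linalg.inv(X.T @ (X * w[:, None]))
    return beta, cov


def att_by_matching(score, t, y):
    """ATT from 1-nearest-neighbor matching (with replacement) on a score."""
    treated = np.flatnonzero(t == 1)
    control = np.flatnonzero(t == 0)
    dist = np.abs(score[treated][:, None] - score[control][None, :])
    nearest = control[np.argmin(dist, axis=1)]
    return float(np.mean(y[treated] - y[nearest]))


def simulate(n=400, true_effect=1.0, draws=200, seed=0):
    """Return the conventional PSM estimate and a set of 'Bayesian-style' draws."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, 2))
    X = np.column_stack([np.ones(n), x])
    true_beta = np.array([-0.5, 0.5, -0.25])  # hypothetical treatment model
    t = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))
    y = true_effect * t + x[:, 0] + rng.normal(size=n)

    # Conventional PSM: match once on the point estimate of the score.
    beta_hat, cov = fit_logit(X, t)
    psm = att_by_matching(1.0 / (1.0 + np.exp(-X @ beta_hat)), t, y)

    # Approximate BPSM: redraw the coefficients, rematch, and collect the ATTs.
    beta_draws = rng.multivariate_normal(beta_hat, cov, size=draws)
    bpsm = np.array(
        [att_by_matching(1.0 / (1.0 + np.exp(-X @ b)), t, y) for b in beta_draws]
    )
    return psm, bpsm
```

Calling `simulate()` returns one PSM estimate and an array of draws whose spread reflects only the uncertainty in the estimated scores. Comparing the two approaches properly would mean repeating this over many simulated datasets, varying the strength of `true_beta`, and deliberately misspecifying the logit model.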

Posted by Weihua An at 12:06 AM

February 5, 2009

Deaton on use of randomized trials in development economics

A new NBER paper by Angus Deaton takes on the trendiness of randomized trials, instrumental variables and natural experiments in development economics. One of the main points: well-designed experiments are most useful when they help uncover general mechanisms (i.e. inform theory) and can support real-life policy-making outside their narrow context. A good if lengthy read.

Deaton, A (2009) Instruments of development: Randomization in the tropics, and the search for the elusive keys to economic development, NBER Working Paper 14690. http://papers.nber.org/papers/w14690

Harvard users click here.

There is currently much debate about the effectiveness of foreign aid and about what kind of projects can engender economic development. There is skepticism about the ability of econometric analysis to resolve these issues, or of development agencies to learn from their own experience. In response, there is movement in development economics towards the use of randomized controlled trials (RCTs) to accumulate credible knowledge of what works, without over-reliance on questionable theory or statistical methods. When RCTs are not possible, this movement advocates quasi-randomization through instrumental variable (IV) techniques or natural experiments. I argue that many of these applications are unlikely to recover quantities that are useful for policy or understanding: two key issues are the misunderstanding of exogeneity, and the handling of heterogeneity. I illustrate from the literature on aid and growth. Actual randomization faces similar problems as quasi-randomization, notwithstanding rhetoric to the contrary. I argue that experiments have no special ability to produce more credible knowledge than other methods, and that actual experiments are frequently subject to practical problems that undermine any claims to statistical or epistemic superiority. I illustrate using prominent experiments in development. As with IV methods, RCT-based evaluation of projects is unlikely to lead to scientific progress in the understanding of economic development. I welcome recent trends in development experimentation away from the evaluation of projects and towards the evaluation of theoretical mechanisms.

Posted by Sebastian Bauhoff at 8:12 AM

February 3, 2009

What is Japan doing at 2:04pm?

You can now answer that question and so many more. The Japanese Statistics Bureau conducts a survey every five years called the "Survey on Time Use and Leisure Activities" where they give people journals to record their activities throughout the day. Thus, they have a survey of what people in Japan are doing at any given time of the day. This is fun data in and of itself, but it was made downright addictive by Jonathan Soma, who created a slick Stream Graph based on the data. (via kottke)

There are actually three Stream Graphs: one for the various activities, another for how the current activity differs between sexes and a final one for how the current activity breaks down by economic status. Thus, the view contains not only information about daily routines, but also how those routines vary across sex and economic status. For instance, gardening tends to happen in the afternoon and evening at around equal intensity and is fairly evenly distributed between men and women. Household upkeep, on the other hand, is done mostly by women and mostly in the morning. This visualization is so compelling, I think, because it allows for deep exploration of rich and interesting data (to be honest, though, I find the economic status categories a little strange and not incredibly useful).

Two points come to mind when seeing this. First, it would be fascinating to see how these would look across countries, even if it was just one other country. The category of this survey on the website for the Japanese Bureau of Statistics is "culture." Seeing the charts actually makes me wonder how different this culture is from that of other countries. Soma does point out, though, that Japanese men are rather interested in "productive sports," which is perhaps unique to the island.

Second, I think that Stream Graphs might be useful for other time-based data types. Long-term survey projects, such as the General Social Survey, track respondent spending priorities. It seems straightforward to use a Stream Graph to capture how priorities shift over time. Other implemented Stream Graphs are the NYT box-office returns data and Lee Byron's last.fm playlist data. This graph type seems best suited for showing how different categories change over time and how rapidly they grow and shrink. They also seem to require some knowledge of Processing. There are still some open questions here: What other types of social science data might these charts be useful for? Should we incorporate uncertainty, and if so, how? (Soma warns that the Japan data is rather slim on the number of respondents.)

Also: October 18th is Statistics Day in Japan. There are posters. And a slogan: "Statistical Surveys Owe You and You Owe Statistical Data"!

Posted by Matt Blackwell at 5:37 PM

February 1, 2009

Visualizing partisan discourse

Burt Monroe, Michael Colaresi, and our own Kevin Quinn have written an interesting paper (forthcoming in Political Analysis) assessing methods for selecting partisan features in language, e.g. which words are particularly likely to be used by Republicans or Democrats on a given topic. They have also provided a dynamic visualization of partisan language in the Senate on defense issues between 1997 and 2004 (screenshot below).

The most striking feature coming out of the visualization is that language on defense went through an unpolarized period leading up to 9/11 and even for several months afterward, but that polarized language blossomed in the lead-up to the Iraq War and through the end of the period they examine, with Republicans talking about what they thought was at stake ("Saddam", "Hussein", "oil", "freedom", "regime") and the Democrats emphasizing the process ("unilateral", "war", "reconstruction", "billions"). (Link to visualization, a QuickTime movie.)

fightingwords.png

Posted by Andy Eggers at 8:36 AM

January 22, 2009

Studying the 2008 primaries with prediction markets: Malhotra and Snowberg

With Obama now in office the rest of the country may be about ready to move on from the 2008 election, but political scientists are of course still finding plenty to write about. Neil Malhotra and Erik Snowberg recently circulated a working paper in which they use data from political prediction markets in 2008 to examine two key questions about presidential primaries: whether primaries constrain politicians from appealing to the middle of the electorate and whether states with early primaries play a disproportionately large role in choosing the nominee. It's a very short and preliminary working paper that applies some novel methods to interesting data. Ultimately the paper can't say all that much about these big questions, not just because 2008 was an unusual year but also because of the limitations of prediction market data and the usual problems of confounding. But there is some interesting stuff in the paper and I expect it will improve in revision -- I hope these comments can help.

The most clever insight in the paper is that you can combine data from different prediction markets to estimate an interesting conditional probability -- the probability that a primary candidate will win the general election conditional on winning the nomination. (If p(G) is the probability of winning the general election and p(N) is the probability of winning the nomination (both of which are evident in prediction market contract prices), p(G|N) -- the probability of winning the general election if nominated -- can be calculated as p(G)/p(N).) In the first part of the paper, the authors focus on how individual primaries in the 2008 election affected this conditional probability for each candidate. This is interesting because classic theories in political science posit that primary elections force candidates to take positions that satisfy their partisans but hurt their general election prospects by making it harder for them to appeal to the electoral middle. If that is the case, then ceteris paribus one would expect that the conditional election probabilities would have gone down for Obama and Clinton each time it looked like the primary season would become more drawn out -- which is what happened as results of several of the primaries rolled in.
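To make the arithmetic concrete, here is a small Python sketch of that calculation. The function name and the contract prices in the usage note are hypothetical, and the formula relies on the assumption (implicit in the paper's setup) that a candidate can only win the general election after winning the nomination, so that p(G and N) = p(G).

```python
def conditional_general_prob(p_general, p_nomination):
    """Return p(G|N) = p(G) / p(N) from two prediction-market prices.

    Both inputs are probabilities in [0, 1]; p(G) cannot exceed p(N)
    because winning the general election requires being nominated.
    """
    if not 0.0 < p_nomination <= 1.0:
        raise ValueError("p_nomination must be in (0, 1]")
    if not 0.0 <= p_general <= p_nomination:
        raise ValueError("p_general must be in [0, p_nomination]")
    return p_general / p_nomination
```

For instance, if a candidate's "wins the presidency" contract trades at 0.30 and the "wins the nomination" contract at 0.60, `conditional_general_prob(0.30, 0.60)` puts the market's implied chance of winning in November, if nominated, at about one half.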

As it turns out, p(G|N) didn't move much in most primaries; if anything, it went up when the primary season seemed likely to extend longer (e.g. for Obama in New Hampshire). Perhaps this was because of the much talked about positive countervailing factors -- i.e. the extended primary season actually sharpened each candidate's electoral machines and increased their free media exposure. Of course, Malhotra and Snowberg have no way of knowing whether the binding effect of primaries exists and was almost perfectly counterbalanced by these positive factors, or whether none of these factors really mattered very much.

There is yet another possibility, which is that conditional probabilities did not move much for most primaries because most primaries did not change the market's view of how long the primary season would be. Knowing how the conditional probability changed during a particular primary only tells us something about whether having more primaries helps or hurts candidates' general election prospects if that primary changed people's expectations about how long the primary season would be. There were certainly primaries where this was the case (New Hampshire and Ohio/Texas come to mind) but for most of the primaries there was very little new information about how many more primaries would follow. Malhotra and Snowberg proceed as if they were looking for an average effect of a primary taking place on a candidate's conditional general election prospects, but if they want to talk about how having more primaries affects candidates' electability in the general election, they need to focus more squarely on cases where expectations about the length of the primary season actually changed (and, ideally, not much else changed). I would say the March Ohio/Texas primary was the best case of that, and at that time Barack Obama's p(G|N) dropped by 3 points -- a good indication that the market assumed that the net effect of a longer season on general election prospects was negative. (Although of course that primary also presumably revealed new information about whether Obama would be able to carry Ohio in the general election -- it's hard to disentangle these things.)

The second part of the paper explicitly considers the problem of assessing how "surprised" the prediction markets were in particular primaries (without explaining why this was not an issue in the first part), and employs a pretty ad hoc means of upweighting effect estimates for the relatively unsurprising contests. Some kind of correction makes sense but it seemed to me that the correction was so important in producing their results that it should be explained more fully in further revisions of the paper.

So to sum up, I liked the use of prediction markets to estimate the conditional general election probability for a candidate at a point in time, and I think it's worth getting some estimates of how particular events moved this probability. I think at this stage the conclusions are a bit underdeveloped and oversold, considering how many factors are at play and how unclear it is what information each primary introduced. But I look forward to future revisions.

Posted by Andy Eggers at 10:18 AM

January 16, 2009

Amazon Mechanical Turk for Data Entry Tasks

Yesterday I tried using Amazon's Mechanical Turk service for the first time to save myself from some data collection drudgery. I found it fascinating. For the right kind of task, and with a little bit of setup effort, it can drastically reduce the cost and hassle of getting good data compared to other methods (such as using RAs).

Quick background on Mechanical Turk (MTurk): The service acts as a marketplace for jobs that can be done quickly over a web interface. "Requesters" (like me) submit tasks and specify how much they will pay for an acceptable response; "Workers" (known commonly as "Turkers") browse submitted tasks and choose ones to complete. A Requester could ask for all sorts of things (e.g. write me a publishable paper), but because you can't do much to filter the Turkers and they aren't paid for unacceptable work, the system works best for tasks that can be done quickly and in a fairly objective way. The canonical tasks described in the documentation are discrete, bite-sized tasks that could almost be done by a computer -- indicating whether a person appears in a photo, for example. Amazon bills the service as "Artificial Artificial Intelligence," because to the Requester it seems as if a very smart computer were solving the problem for you (while in fact it's really a person). This is also the idea behind the name of the service, a reference to an 18th century chess-playing automaton that actually had a person inside (known as The Turk).

The task I had was to find the full text of a bunch of proposals from meeting agendas that were posted online. I had the urls of the agendas and a brief description of each proposal, and I faced the task of looking up each one. I could almost automate the task (and was sorely tempted), but it would require coding time and manual error checking. I decided to try MTurk.

The ideal data collection task on MTurk is the common situation where you have a spreadsheet with a bunch of columns and you need someone to go through and do something pretty rote to fill out another column. That was my situation: for every proposal I have a column with the url and a summary of what was proposed, and I wanted someone to fill in the "full text" column. To do a task like this, you need to design a template that applies to each row in the spreadsheet, indicating how the data from the existing columns should appear and where the Turker should enter the data for the missing column. Then you upload the spreadsheet and a separate task is created for each row in the spreadsheet. If everything looks good you post the tasks and watch the data roll in.

To provide a little more detail: Once you sign up to be a Requester at the MTurk website, you start the process of designing your "HIT" (Human Intelligence Task). MTurk provides a number of templates to get you started. The easiest approach is to pick the "Blank Template," which is very poorly named, because the "Blank Template" is in fact full of various elements you might need in your HIT; just cut out the stuff you don't need and edit the rest. (Here it helps to know some html, but for most tasks you can probably get by without knowing much.) The key thing is that when you place a variable in the template (e.g. ${party_id}), it will be filled by an entry from your spreadsheet, based on the spreadsheet's column names. So a very simple HIT would be a template that says

Is this sentence offensive? ${sentence}

followed by buttons for "yes" and "no" (which you can get right from the "Blank Template"). If you then upload a CSV with a column entitled "sentence" and 100 rows, you will generate 100 HITs, one for each sentence.
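To illustrate the row-by-row substitution, here is a rough local imitation in Python. This is not MTurk's own code or API; it only mimics the idea using the standard library's `string.Template`, which happens to use the same `${name}` placeholder syntax. The function name `render_hits` is made up for this example.

```python
import csv
import io
from string import Template


def render_hits(template_text, csv_text):
    """Fill the template once per CSV row, keyed by the CSV column names."""
    template = Template(template_text)
    rows = csv.DictReader(io.StringIO(csv_text))
    # substitute() raises KeyError if the template names a missing column.
    return [template.substitute(row) for row in rows]


hits = render_hits(
    "Is this sentence offensive? ${sentence}",
    "sentence\nHello there\nGood morning\n",
)
```

Here `hits` holds one filled-in prompt per data row, so a 100-row CSV would yield 100 prompts. Previewing the rendered text locally like this can be a cheap way to catch a misspelled column name before posting anything.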

It was pretty quick for me to set up my HIT template, upload a CSV, and post my HITs.

Then the real fun begins. Within two minutes the first responses started coming in; I think the whole job (26 searches -- just a pilot) was done in about 20 minutes. (And prices are low on MTurk -- it cost me $3.80.) I had each task done by two different Turkers as a check for quality, and there was perfect agreement.

One big question people have is, "Who are these people who do rote work for so little?" You might think it was all people in developing countries, but it turns out that a large majority are bored Americans. There's some pretty interesting information out there about Turkers, largely from Panos Ipeirotis's blog (a good source on all things MTurk in fact). Most relevant for understanding Turkers is a survey of Turkers he conducted via (of course) MTurk. For $.10, Turkers were asked to write why they complete tasks on MTurk. The responses are here. My takeaway was that people do MTurk HITs to make a little money when they're bored, as an alternative to watching TV or playing games. One man's drudgery is another man's entertainment -- beautiful.

Posted by Andy Eggers at 9:49 AM

January 13, 2009

Multiple comparisons and the "Axe" effect


Like many of us, I'm always on the lookout for good examples to use in undergraduate methods courses. My high school chemistry teacher (a former nun) said that the best teaching examples involved sex, food, or money, and that seems like reasonable advice for statistics as well. In that vein, I noted a recent article on the "Axe effect" in Metro:

'Axe effect' really works, a new study swears

Researchers in the U.K. asked women to rate the attractiveness of men wearing Axe's British counterpart, Lynx, against those who were wearing an odorless placebo.

On a 7-point scale, men wearing Lynx scored a 4.2, 0.4 point higher than those wearing the placebo.

But here's the catch: The women did not meet the men face-to-face. They watched them on video.

So what explains the discrepancy in ratings? Men wearing Lynx reported feeling more confident about themselves. So the difference in attitude appears more responsible for getting you lucky than the scent itself.

This story was not just reported in a subway tabloid; a long article appeared in the Economist. (Although at least the Metro story reported an effect size, unlike the Economist).

Is there an Axe effect? The news stories are reporting on a study in the International Journal of Cosmetic Science, "Manipulation of body odour alters men's self-confidence and judgements of their visual attractiveness by women". The researchers recruited male students and staff members from the University of Liverpool, randomly assigned some of them to use deodorant or a placebo. They then took photographs of the men as well as videos of them pretending to chat up an attractive woman. The photos and videos of the men were evaluated by "a panel of eight independent female raters" for attractiveness and self-confidence.

Medium             Attractiveness     Confidence
Photo              Not significant    (not asked)
Video, no sound    Significant!       Not significant
Video w/ sound     Not significant    Not significant

There may be an Axe effect on women's perception of men's attractiveness (but not self-confidence) if they see them on video and can't hear them. Or it might be a fluke. This seems like a classic multiple comparison problem. With five tests, it is not that unlikely that one of them would be (barely) statistically significant. The proposed mechanism for the one "effect" (which attracted all of the media attention) was increased self-confidence on the part of the male subjects, so it seems a little odd that an effect would be found on perceived attractiveness and not on self-confidence. We might be more confident that something is going on if the effect sizes were reported for the non-significant results, but they don't appear in the paper. So, the Axe effect may be for real, but only if you keep your mouth shut.
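For a classroom version of the multiple comparison point, a short Python sketch follows. It assumes the five tests are independent and each is run at the 0.05 level; neither assumption comes from the paper (the same eight raters judged every medium, so the tests are probably correlated), and the function names are invented for the example.

```python
def prob_at_least_one_false_positive(num_tests, alpha=0.05):
    """Chance that at least one of num_tests independent true-null tests
    comes out significant at level alpha."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(num_tests, alpha=0.05):
    """Per-test significance level that keeps the familywise rate below alpha."""
    return alpha / num_tests


familywise = prob_at_least_one_false_positive(5)  # 1 - 0.95**5, about 0.226
per_test = bonferroni_threshold(5)                # 0.05 / 5 = 0.01
```

Under those assumptions there is roughly a 23% chance of at least one "significant" cell in the table even if deodorant does nothing, and a Bonferroni-style correction would ask each test to clear p < 0.01 instead of p < 0.05.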

Posted by Mike Kellermann at 8:19 PM

January 6, 2009

NYT pays tribute to R

Today's New York Times has an article about the increasing popularity of R and what it means for commercial packages. See here for "Data Analysts Captivated by Power of R".

Posted by Sebastian Bauhoff at 11:09 PM

December 11, 2008

About those scatterplots . . .

Amanda Cox from the NYT graphics department gave a fun talk yesterday about challenges she and her colleagues face.

One of the challenges she discussed is statistical uncertainty -- how to represent confidence intervals on polling results, for example, while not sacrificing too much clarity. Amanda provided a couple of examples where the team had done a pretty poor job of reporting the uncertainty behind the numbers; in some cases doing it properly would have made the graphic too confusing for the audience and in others there may have been a better way.

She also talked about "abstraction," by which I think she meant the issue of how to graphically represent multivariate data. She showed some multivariate graphics the NYT had produced (the history of oil price vs. demand, growth in the CPI by categorized component) that I thought were quite successful, although some in the audience disagreed about the latter figure.

Amanda also showed the figure that I reproduced and discussed in an earlier post, in which I reported that the NYT graphics people think that the public can't understand scatterplots. Amanda disagrees with this (she said it annoys her how often people mention that point to her) and showed some scatterplots the NYT has produced. (She did say she thinks people understand scatterplots better when there is an upward slope to the data, which was interesting.)

The audience at the talk, much of which studies the media in some capacity and nearly all of which reads the NYT, seemed hungry for some analysis of the economics behind the paper's decision to invest so much in graphics. (Amanda said the paper spends $500,000 a month on the department.) Amanda wasn't really able to shed too much light on this, but said she felt very fortunate to be at a paper that lets her publish regression trees when, at many papers, the graphics team is four people who have their hands full producing "fun facts" sidebars and illustrations of car crash sites.

Posted by Andy Eggers at 8:37 AM

October 29, 2008

Bafumi and Herron on whether the US government is representative

Amid the name-calling, insinuation and jingoism of this political season it is easy to get a bit depressed about the democratic process. Joe Bafumi and Michael Herron have an interesting working paper that is cause for some comfort. The paper, entitled "Preference Aggregation, Representation, and Elected American Political Institutions," assesses the extent to which our federal political institutions are representative, in the sense that elected officials have similar views to those of their constituents. They do this by lining up survey questions from the Cooperative Congressional Election Study (recently discussed in our weekly seminar by Steve Ansolabehere) alongside similar roll call votes recorded for members of Congress, as well as President Bush's positions on a number of pieces of legislation. There are enough survey questions to be able to place the survey respondents on an ideological scale (using Bayesian ideal point estimation), enough pieces of legislation to place the members of Congress and the President on an ideological scale, and enough survey questions that mirrored actual roll call votes to bring everyone together on a unified scale.

Overall, the authors find that the system is pretty effective at aggregating and representing voters' preferences. Members of Congress are more extreme than the constituencies they represent (perhaps because they represent partisans in their own districts), but the median member of a state's delegation is usually pretty close to the median voter in that state. Since the voters were surveyed in 2006, the paper is able to look at how the election affected the ideological proximity of government to the voters, and as one would hope Bafumi and Herron find that government moved somewhat closer to the voters as a result of the legislative reshuffling.

Below is one of the interesting figures from the paper. The grey line shows the density of estimated ideal points among the voters (i.e. CCES survey respondents); the green and purple solid lines are the density of estimated ideal points among members of the current House and Senate. The arrows show the location of the median member of the current and previous House and Senate, the median American at the time of the 2006 election (based on the survey responses), and President Bush. As you can see, before the 2006 election the House and Senate were both to the right of the median American (as was President Bush); after the Democratic sweep Congress has moved closer to the median American. Members of Congress are more partisan than the voters throughout, although this seems to be more the case on the right than the left.

herron_bafumi.png

Posted by Andy Eggers at 9:45 AM

October 25, 2008

A General Inequality Parameter

There is an interesting paper by Guillermina Jasso and Samuel Kotz in Sociological Methods & Research in which they analyze the mathematical connections between two kinds of inequality: inequality between persons and inequality between subgroups. They show that a general inequality parameter (a shape parameter c of a two-parameter continuous univariate distribution), or a deep structure of inequality, governs both types of inequality. More concretely, they demonstrate that convenient measures of personal inequality, like the Gini coefficient, Atkinson's measure, Theil's MLD and Pearson's coefficient of variation, as well as measures of inequality between subgroups, are nothing but functions of this general inequality parameter c. The c parameter, according to the authors, also governs the shape of the Lorenz curve, a conventional graphical tool for expressing inequality.

Given the unitary operation of this inequality parameter, the authors conclude that there is a monotonic connection between personal inequality and between-group inequality: as personal inequality increases, so does between-group inequality. This conclusion is somewhat surprising, and even contradicts the intuition that personal inequality can plausibly change through within-group transfers while between-group inequality stays the same. The authors admit that their conclusion holds only under a certain set of conditions. For example, the derived relation between the two types of inequality assumes a two-parameter distribution and non-intersecting Lorenz curves. You may consult the full article for more technical details if interested.

Source:
Jasso, Guillermina and Samuel Kotz. 2008. "Two Types of Inequality: Inequality Between Persons and Inequality Between Subgroups." Sociological Methods & Research 37: 31-74.

click here to get a working paper version of that from IDEAS

Posted by Weihua An at 2:40 PM

October 22, 2008

Useful metric for comparing two distributions?

In reading Bill Easterly's working paper "Can the West Save Africa?," I came across an interesting metric Easterly uses to compare African nations with the rest of the world on a set of development indicators. The metric is, "Given that there are K African nations, what percent of the K lowest scoring countries were African?" I don't think I've ever seen anyone use that particular metric, but maybe someone has. Does it have a name? Does it deserve one?

Generally, looking at the percent of units below (or above) a certain percentile that have some feature is a way of describing the composition of that tail of the distribution. What's interesting about using a cutoff corresponding to the total number of units with that feature is that it produces an intuitive measure of overlap of two distributions: it gives us a rough sense of how many countries would have to switch places before all the worst countries were African or, put differently, before all of the African countries are in the worst group. It reminds me a bit of measures of misclassification in machine learning, where here the default classification is, "All the worst countries are African."

Needless to say, the numbers were bleak -- 88% for life expectancy, 84% for percent of population with HIV, 75% for infant mortality.
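The metric is simple enough to sketch in a few lines of Python. The function name and the toy data in the usage note are invented for illustration and are not Easterly's figures; the sketch also assumes that lower scores are worse and ignores ties at the cutoff.

```python
def share_of_bottom_k(scores, group):
    """Share of the K worst-scoring units that belong to `group`,
    where K is the number of units in `group` that have a score.

    scores: dict mapping unit name to score (lower = worse)
    group:  set of unit names (e.g. African countries)
    """
    k = sum(1 for name in scores if name in group)
    if k == 0:
        raise ValueError("no scored units belong to the group")
    worst_k = sorted(scores, key=scores.get)[:k]
    return sum(1 for name in worst_k if name in group) / k
```

With toy scores `{"A": 50, "B": 55, "C": 60, "D": 70, "E": 80}` and group `{"A", "B", "D"}`, K is 3, the three worst units are A, B and C, and the metric is 2/3. For indicators where higher is worse, such as HIV prevalence or infant mortality, the scores would need to be negated first.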

Posted by Andy Eggers at 11:02 PM

October 15, 2008

Alfred Marshall, apologist for blog readers

Like many people I know, I often find it hard to stay on task and avoid the temptations of the internet while I work. Email, blogs, news of financial meltdown -- I find myself turning to these distractions in between spurts of productivity, knowing that I would get more done if I just turned off the wireless and kept on task for longer stretches of time.

Well, those of us who have trouble giving up our blogs and other internet distractions may have an unlikely enabler in Alfred Marshall, the great economist. When he was seventeen, Marshall observed an artist who took a lengthy break after drawing each element of a shop window sign. As he later recounted, the episode shaped his own productivity strategy, towards something that sounds vaguely similar to my own routine:

That set up a train of thought which led me to the resolve never to use my mind when it was not fresh, and to regard the intervals between successive strains as sacred to absolute repose. When I went to Cambridge and became full master of myself, I resolved never to read a mathematical book for more than a quarter of an hour at a time without a break. I had some light literature always by my side, and in the breaks I read through more than once nearly the whole of Shakespeare, Boswell's Life of Johnson, the Agamemnon of Aeschylus (the only Greek play I could read without effort), a great part of Lucretius and so on. Of course I often got excited by my mathematics, and read for half an hour or more without stopping, but that meant that my mind was intense, and no harm was done.

Now, somehow I doubt that Marshall would consider the NYT op-ed pages to be "light literature" on par with Boswell, or that he would agree that watching incendiary political videos at TalkingPointsMemo.com qualifies as "absolute repose." But never mind that. Alfred Marshall told me I shouldn't work for more than fifteen minutes without distractions!

Posted by Andy Eggers at 8:06 AM

October 7, 2008

DOL visa data reveals salaries for academic jobs

With many of my friends preparing for the annual job market song and dance, one question they will soon have is what salary expectations are appropriate for which position and institution.

It seems hard to know. Fortunately (and somewhat incredibly) the Department of Labor Foreign Labor Certification Data Center not only collects employer petitions for H-1B visas for foreign professionals, but also posts them online. The data goes back to 2001; information for other visa types is sometimes available for earlier years. Overall this seems like a great source for studies in labor economics, of the effects of visa restrictions, etc. (Let us know if you use it!)

But the data is also good for a quick reality check on salary expectations. You can search by institution on the DOL website or type in a keyword in this search engine.

For example, looking for "assistant professor economics harvard" will reveal two visa petitions from the university, with a proposed salary of $115,000 in 2005. Stanford proposed to pay $120,000 in early 2006. The data is not just limited to academic jobs of course. You can also see that Morgan Stanley proposed to pay $85,000 for an analyst in New York in 2006. Or that a taxi company in Maryland proposed $11.41 per hour.

Naturally the data is limited since it only covers a specific group of job applicants. Maybe they'll take a lower salary in exchange for help with the visa, or they get paid more to leave their home countries. But the relative scales across institutions could be similar and it's better than no idea at all. Good luck on your job hunts and negotiations!

Posted by Sebastian Bauhoff at 2:40 PM

September 26, 2008

Recommend a Book for Probability Theory

For those of you who want to do some exercises or solve typical problems in probability theory and random processes, I strongly recommend a book by Geoffrey Grimmett and David Stirzaker, One Thousand Exercises in Probability. As the authors say in the preface, there are over three thousand problems in the book, since many exercises include several parts. Personally, I find this book very useful, partly because all exercises come with solutions, which makes it much more readable than many of its counterparts, and partly because I have noticed that some faculty here tend to adopt its exercises for class assignments and exams. (Am I the first person here to realize this?) So I recommend this book to you and hope it will help you deepen your understanding of those daunting proofs in probability theory and random processes. With luck, you may even learn how to get used to them in von Neumann's sense.


In mathematics you don't understand things, you just get used to them.

John von Neumann

Posted by Weihua An at 7:59 PM

September 25, 2008

New NBER paper charts history and future of field experiments in economics

The NBER just posted a new working paper by Steven Levitt and John List, "Field Experiments in Economics: The Past, The Present, and The Future." I have only had a first glance, but the paper looks like an easy-to-read history of field experiments in economics and a (short) summary of their limitations. Levitt and List also suggest that partnerships with private institutions could be the future of this field. It seems like a natural conclusion. Collaborating with the private sector should create more opportunities for good research, and the money and infrastructure will be attractive to researchers. And anyway, what other sector is left to be conquered? But maybe such partnerships are only useful for certain areas of research (Levitt and List suggest the setting could be a useful laboratory for the field of industrial organization). And firms, like any institution, must have an interest in participating. This might be fine for learning about fundamental economic behavior, but will we see more declarations of interest on experiments related to policy?


Levitt, S and List, J (2008) "Field Experiments in Economics: The Past, The Present, and The Future." NBER Working Paper 14356, http://papers.nber.org/papers/w14356

Harvard users click here for PIN access.

This study presents an overview of modern field experiments and their usage in economics. Our discussion focuses on three distinct periods of field experimentation that have influenced the economics literature. The first might well be thought of as the dawn of "field" experimentation: the work of Neyman and Fisher, who laid the experimental foundation in the 1920s and 1930s by conceptualizing randomization as an instrument to achieve identification via experimentation with agricultural plots. The second, the large-scale social experiments conducted by government agencies in the mid-twentieth century, moved the exploration from plots of land to groups of individuals. More recently, the nature and range of field experiments has expanded, with a diverse set of controlled experiments being completed outside of the typical laboratory environment. With this growth, the number and types of questions that can be explored using field experiments has grown tremendously. After discussing these three distinct phases, we speculate on the future of field experimental methods, a future that we envision including a strong collaborative effort with outside parties, most importantly private entities.

Posted by Sebastian Bauhoff at 7:30 AM

September 24, 2008

Government as API provider

The authors of "Government Data and the Invisible Hand" provide some interesting advice about how the next president can make the government more transparent:

If the next Presidential administration really wants to embrace the potential of Internet-enabled government transparency, it should follow a counter-intuitive but ultimately compelling strategy: reduce the federal role in presenting important government information to citizens. Today, government bodies consider their own websites to be a higher priority than technical infrastructures that open up their data for others to use. We argue that this understanding is a mistake. It would be preferable for government to understand providing reusable data, rather than providing websites, as the core of its online publishing responsibility.

I've blogged here a couple of times about the role transparency-minded programmers and other private actors are playing in opening up access to government data sources. This paper draws the logical policy conclusion from what we've seen in the instances I blogged about: that third parties often do a better job of bringing important government data to the people than the government does. (For example, compare govtrack.us/opencongress.org with http://thomas.loc.gov.) The upshot of the paper is that the government should make it easier for those third parties to make the government websites look bad. By focusing on providing structured data, the government will save web developers some of the hassle involved in parsing and combining data from unwieldy government sources, and will reduce the time between the release of a clunky government site and the release of a private site that repackages the underlying data and combines it with new sources in an interesting way.

Of course, to the extent that government data is made available in more convenient formats, our work as academic researchers gets easier too, and we can spend more time on analysis and less on data wrangling. In fact, for people doing social science stats, it's really the structured data and not the slick front-end that is important (although many of the private sites provide both).

I understand that this policy proposal is an idea that's been circulating for a while (anyone want to fill me in on the history?) and apparently both campaigns have been listening. It will be interesting to see whether these ideas lead to any change in the emphasis of government info policy.

Posted by Andy Eggers at 9:09 AM

September 18, 2008

Call for papers: the Midwest Poli Sci conference gets interdisciplinary

From Jeff Segal via Gary King, we get the following call for papers for the Midwest Political Science Conference. An interesting bit of news here is that the conference is introducing a registration discount for people outside of the discipline.

Ask your favorite political scientist what the biggest political science conference is, and she'll tell you it's the American Political Science Association. Ask her what the best political science conference is and she'll tell you it's the Midwest Political Science Association meeting, held every April in the beautiful Palmer House in Chicago.

The Midwest Political Science Association, like most academic associations, charges higher conference registration rates for nonmembers than for members. Hoping to continue to increase attendance by people outside of political science and related fields at its annual meeting, the Association will begin charging the lower (member) rate to registrants who 1) have academic appointments outside of political science or related fields (policy, public administration and political economy) and 2) do not have a PhD in political science or the same related fields.

In addition, the Association grants, on request, a substantial number of conference registration waivers for first time participants who are outside the discipline.

The call for papers for the 2009 meeting, due October 10, is at http://www.mpsanet.org/~mpsa/index.html.

Hope to see you in Chicago.

Sincerely,

Jeffrey Segal, President
Midwest Political Science Association

Posted by Andy Eggers at 6:41 AM

September 2, 2008

Study on DTCA creates media attention for causal inference

The British Medical Journal just published a great piece by Michael Law* and co-authors on the (in-)effectiveness of direct-to-consumer advertising (DTCA) for pharmaceuticals. This issue continues to be politically controversial and expensive for companies, and good studies are rare. Mike makes use of the linguistic divide in his home country, Canada, to evaluate the effectiveness of the ads. Canadian TV stations are not allowed to broadcast pharma ads. French speakers have no choice but to go without, but English-speaking Canada gets to watch ads for pharmaceuticals on US TV stations. The results suggest that for the three drugs under study, the effects of DTCA may be very small and short-term.

An interesting fallout of this work is a wave of media attention for causal inference and identifying counterfactuals. For example the WSJ writes

[...] the new study will draw some attention because it is among the first to compare the behavior of people exposed to drug ads with people who weren't.

And the New Scientist says

However, consumer advertising is usually accompanied by other marketing efforts directly to doctors, making it difficult to tease out the effect of the ads alone.

See here for a longer list of articles at Google News.

I think it's great that the study creates so much interest (meaning it's relevant in real life) and that the media gets interested in research design. I'm curious to see the wider repercussions on both issues.


Law, Michael, Majumdar, Sumit and Soumerai, Stephen (2008) "Effect of illicit direct to consumer advertising on use of etanercept, mometasone, and tegaserod in Canada: controlled longitudinal study" BMJ 2008;337:a1055

* Disclosure: Mike is a recent graduate of the PhD in Health Policy, and a classmate and friend of mine.

Posted by Sebastian Bauhoff at 9:32 PM

June 26, 2008

Exxon-tainted research?

A few bloggers at other sites (Concurring Opinions and Election Law Blog) have pointed out an interesting footnote in the Supreme Court's recent decision on punitive damages in the Exxon Valdez case. Justice Souter took note of experimental research on jury decisionmaking done by Cass Sunstein, Daniel Kahneman, and others, but then dismissed it for the purposes of the decision because Exxon had contributed funding for the research:

The Court is aware of a body of literature running parallel to anecdotal reports, examining the predictability of punitive awards by conducting numerous “mock juries,” where different “jurors” are confronted with the same hypothetical case. See, e.g., C. Sunstein, R. Hastie, J. Payne, D. Schkade, W. Viscusi, Punitive Damages: How Juries Decide (2002); Schkade, Sunstein, & Kahneman, Deliberating About Dollars: The Severity Shift, 100 Colum. L. Rev. 1139 (2000); Hastie, Schkade, & Payne, Juror Judgments in Civil Cases: Effects of Plaintiff’s Requests and Plaintiff’s Identity on Punitive Damage Awards, 23 Law & Hum. Behav. 445 (1999); Sunstein, Kahneman, & Schkade, Assessing Punitive Damages (with Notes on Cognition and Valuation in Law), 107 Yale L. J. 2071 (1998). Because this research was funded in part by Exxon, we decline to rely on it.

It will be interesting to see whether this position is taken up by the lower courts; if so, we might see less incentive for private actors to fund social science research. That could be good or bad, I suppose, depending on one's views of likelihood that researchers will be unduly influenced by their funding sources.

Posted by Mike Kellermann at 1:13 PM

June 13, 2008

Awards for IQSS faculty

Two awards given by the Society for Political Methodology were announced today, and both of them went to IQSS faculty members (and co-authors).

The Gosnell Prize is given to the "best paper on political methodology given at a conference", and this year's prize was awarded to Kevin Quinn for his paper "What Can be Learned from a Simple Table? Bayesian Inference and Sensitivity Analysis for Causal Effects from 2x2 and 2x2xK Tables in the Presence of Unmeasured Confounding." From the announcement:

Quinn's paper offers a set of steps to improve inference with binary independent and dependent variables and unmeasured confounds. He derives large sample, non-parametric bounds on the average treatment effect and shows how these bounds do not rely on auxiliary assumptions. He then provides a graphical way to depict the robustness of inferences as one changes assumptions about the confounds. Finally, he shows how one can use a Bayesian framework relying on substantive knowledge to restrict the set of assumptions on the confounds to improve inference.

The Warren Miller prize is given annually to the best paper appearing in Political Analysis. This year's prize has been awarded to Daniel E. Ho, Kosuke Imai, Gary King, and Elizabeth A. Stuart for their article, "Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference." The abstract of their paper follows:

Although published works rarely include causal estimates from more than a few model specifications, authors usually choose the presented estimates from numerous trial runs readers never see. Given the often large variation in estimates across choices of control variables, functional forms, and other modeling assumptions, how can researchers ensure that the few estimates presented are accurate or representative? How do readers know that publications are not merely demonstrations that it is possible to find a specification that fits the author's favorite hypothesis? And how do we evaluate or even define statistical properties like unbiasedness or mean squared error when no unique model or estimator even exists? Matching methods, which offer the promise of causal inference with fewer assumptions, constitute one possible way forward, but crucial results in this fast-growing methodological literature are often grossly misinterpreted. We explain how to avoid these misinterpretations and propose a unified approach that makes it possible for researchers to preprocess data with matching (such as with the easy-to-use software we offer) and then to apply the best parametric techniques they would have used anyway. This procedure makes parametric models produce more accurate and considerably less model-dependent causal inferences.

Posted by Mike Kellermann at 2:22 PM

May 31, 2008

The Tree-Friendly Academic, Part II: The Editing Process, and Getting Off the Monitor

I'm grateful for the strong response to my original query for quality, free PDF annotation for Linux. In general, there seem to be a few categories.

-Windows-based editors, adaptable through emulators: PDF X-change, Foxit (free version), primopdf
-Linux editors with non-portable annotations: Okular, which has hidden XML files for its annotations (skim, for OS X, has the same scheme)
-early, incomplete solutions that will eventually be good: GNU's PDF project, Xournal
-early, incomplete solutions that aren't user-friendly: pdfedit, Cabaret Stage
-early solutions that are still in progress: evince

Of all of these options, I like Okular the best, mainly because integrating its XML-saved annotations into the PDF is but one plugin away (which might already exist, for all I know), and it's theoretically portable to Windows by installing qt4 binaries. Using an emulator like wine is a hassle big enough that I've avoided it, for the same reason I don't use cygwin on Windows systems.

So we're close to a (more) universal free editing environment. But I'm still not a fan of doing all my work on a screen, and also not willing to print. So I'm trying a middle road.

I bought an iLiad e-paper reader this past week, and so far I'm impressed with how it handles (though its price tag, $600 for the model I bought, definitely isn't for everyone, and was almost not for me). The screen is easily readable, the battery lasts, and I can zoom in and rotate documents to get a half-page display with larger text. More importantly, the device runs Linux and iRex has made a point to try and use open source software as much as possible, in contrast to Amazon and the Kindle (which is half the size, can't read PDFs and can't edit books.)

However, as the project is still in its relative infancy, there are a few functions it has yet to incorporate that I really would like, and they're the same ones I want in a computer-based annotator: highlighting multiple-column text, for example, so that I can extract passages I want later at the push of a button. And like Okular, the annotations made on the iLiad are saved in a companion XML file rather than the original PDF, but the company offers a free program to do the merging.

I'm going to continue to explore what the iLiad can do as far as editing, but it's definitely reassuring that everyone who's seen me use it has oohed and aahed at it.

To sum up, I've now got a free platform for reading, editing and annotating PDFs on a Linux machine, and an auxiliary paper-free method for reading them later which is admittedly not free. And I have more needs as well, but I can at least see them being met soon. What else do people want in paperless work we haven't covered yet?

P.S. If the people from iRex are reading this and want me to shill for them for real, they can let me know directly.

Posted by Andrew C. Thomas at 11:05 PM

May 26, 2008

The Tree-Friendly Academic: Whither A Useful Free PDF Editor?

I'm a Linux user in need of a quality PDF reader with basic annotation tools, and I need it to be available for free. Think I'm asking for too much?

We're at a point where the level of content available online dwarfs our ability to print it all onto paper for examination and notation. As academics, we're expected to sort through volumes of other people's work in order to verify that our own is original, as well as comment, annotate, and on occasion make corrections or forward-references to later works.

But despite a boom in computational power and information bandwidth, the software to do this without resorting to printed or copied matter isn't accessible to most students without paying through the nose. Full software suites like Adobe Acrobat aren't necessary for the kind of work academics need to do. There are a few functions that are essential to the task, currently available in commercial software:

-Adding and reading notes, whether free-floating or attached to highlighted text
-The ability to select and copy multi-column text (none of the free ones seem to be able to get this one right)
-Pop-ups for links: when LaTeX creates a link to a footnote or citation, hovering over the displayed link should cause a pop-up box to display the information.

I'm a man with big ideas but no time, and more importantly, no budget, to motivate and drive the development and use of a free PDF reader with mild annotation capabilities. I can't resort to the for-pay software available from the school website because I'm running Linux, and I shouldn't have to go to a virtual machine or another computer to do this kind of annotation. Likewise, others shouldn't have to spend hundreds for software where they only need a few simple functions.

I suppose the issue is that everyone has their own toys they want included in a PDF editor, which is why the commercial package makes sense. But as academics, wouldn't we be happy with "the basics plus"?

Posted by Andrew C. Thomas at 6:34 PM

May 22, 2008

Nicholas and James are Featured in the NYT again

Professor Nicholas Christakis and Professor James Fowler's study on social networks and smoking cessation, which will appear in the New England Journal of Medicine this Thursday, is featured in the New York Times. Congratulations to them!

Their basic findings are that smokers are likely to quit in groups (As Nicholas said, "Whole constellations are blinking off at once.") and that the remaining smokers tend to be socially marginalized.

One interesting question I have about their study is this: if friends tend to quit smoking together, does this partly contribute to the simultaneous weight gain among friends, a result Nicholas and James found last year using the same dataset? In other words, I fully accept that social ties have important impacts on individuals' wellbeing, but if you study one wellbeing outcome and do not control for the "contaminating" effects of other outcomes, the estimate of the social network effect on the first outcome could be biased. For example, the weight gain among friends could, from this point of view, partly result from their quitting smoking at the same time. Of course, if smokers make up only a very small fraction of the participants in the studied sample and their weight changes are not too extreme, the bias in the estimate should not pose a serious problem.

See the following link for a glimpse of their study.

Study Finds Big Social Factor in Quitting Smoking
http://www.nytimes.com/2008/05/22/science/22smoke.html?partner=rssnyt&emc=rss

Apologies for the duplication if you have already seen this news.

Posted by Weihua An at 12:01 PM

May 19, 2008

Harvard Program on Survey Research (on Youtube)

Mark Blumenthal from pollster.com has been posting interviews with scholars at the 2008 AAPOR conference, including two with our very own Sunshine Hillygus and Chase Harrison from the Program on Survey Research:

Posted by Mike Kellermann at 10:50 AM

May 15, 2008

Placebo effects and the probability of assignment to active treatment

I just finished reading an interesting paper on placebo effects in drug trials by Anup Malani. Malani noticed that participants in high probability trials know that they are more likely to get active treatment (because of informed consent prior to the trial). They have higher expectations and hence should show larger placebo effects than patients in low probability trials. Malani compares outcomes across trials with different assignment probabilities and finds evidence for placebo effects. A related finding is that the control group in high probability trials reports more side effects.
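The logic of the comparison can be sketched in a few lines of R. This is only an illustration under strong assumptions, not Malani's actual estimator: the outcome vectors below are made-up numbers, the two "trials" are hypothetical, and a simple two-sample comparison stands in for whatever modeling the paper uses across many trials.

```r
# Hypothetical outcomes (say, symptom improvement scores) for patients who
# all received the ACTIVE drug, in two trials that differ in the probability
# of assignment to treatment. All numbers are invented for illustration.
high_prob_trial <- c(5.1, 6.3, 5.8, 6.0, 5.5, 6.7, 5.9, 6.2)  # higher chance of active drug
low_prob_trial  <- c(4.8, 5.2, 5.0, 5.6, 4.9, 5.4, 5.1, 5.3)  # lower chance of active drug

# If expectations matter, treated patients in the higher-probability trial
# should do better even though they received the same drug.
placebo_diff <- mean(high_prob_trial) - mean(low_prob_trial)

# Welch two-sample t-test of the difference between the two treatment arms.
placebo_test <- t.test(high_prob_trial, low_prob_trial)

placebo_diff
placebo_test$p.value
```

A real analysis would also need the two trials to be comparable in drug, dosage, and patient population; otherwise the difference between arms mixes placebo effects with everything else that differs across trials.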

The paper discusses some potential implications of placebo effects, e.g. that patients who are optimistic about the outcome might change their behavior and hence get better even without the active drug. It makes me wonder how this might translate into non-medical settings and whether there are studies of placebo effects in the social sciences. Also, if placebo drugs can improve health outcomes, maybe ineffective social programs would still work as long as participants don’t know whether the program works or doesn’t? Maybe this is the role of politics. But what about the side-effects?


Malani, A (2006) “Identifying Placebo Effects with Data from Clinical Trials” Journal of Political Economy, Vol. 114, pp. 236-256. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=901838

Abstract:
A medical treatment is said to have placebo effects if patients who are optimistic about the treatment respond better to the treatment. This paper proposes a simple test for placebo effects. Instead of comparing the treatment and control arms of a single trial, one should compare the treatment arms of two trials with different probabilities of assignment to treatment. If there are placebo effects, patients in the higher-probability trial will experience better outcomes simply because they believe that there is a greater chance of receiving treatment. This paper finds evidence of placebo effects in trials of antiulcer and cholesterol-lowering drugs.

Posted by Sebastian Bauhoff at 12:00 PM

May 13, 2008

Data sets and data interfaces at datamob.org

I recently came across Datamob.org, a site featuring public datasets and interfaces that have been built to help the public explore them.

From datamob's about page:

Our listings emphasize the connection between data posted by governments and public institutions and the interfaces people are building to explore that data.

It's for anyone who's ever looked at a site like MAPLight.org and wondered, "Where did they get their data?" And for anyone who ever looked at THOMAS and thought, "There's got to be a better way to organize this!"

I continue to wonder how the types of interfaces featured on datamob will affect the dissemination of information in society. The dream of a lot of these interface builders is to disintermediate information provision -- ie, to make it possible for citizens to do their own research, produce their own insights, publish their findings on blogs and via data-laden widgets. (We welcomed Fernanda and Martin from Many Eyes, two prominent participants in this movement, earlier this year at our applied stats workshop.) At the same time, the new interfaces make it cheaper for professional analysts -- academics, journalists, consultants -- to access the data and, as they have always done, package it for public consumption. It makes me wonder to what extent the source of our data-backed insights will really change, ie, how much more common will "I was playing around with data on this website and found out that . . . " become relative to "I heard about this study where they found that . . ."?

My hunch is that, just as blogging and internet news has democratized political commentary, the new data resources will make it possible for a new group of relatively uncertified people to become intermediaries for data analysis. (I think FiveThirtyEight is a good example in political polling, although since the site's editor is anonymous I can't be sure.) People will overwhelmingly continue to get data insights as packaged by intermediaries rather than through new interfaces to raw data, but the intermediaries (who will use these new services) will be quicker to use data in making their points, will become much larger in number, and will on average become less credentialed.

Posted by Andy Eggers at 9:48 AM

May 9, 2008

Adventures in Identification III: The Indiana Jones of Economics

A fabulous three-part series on further adventures in identification is on the Freakonomics blog here, here, and here. The story features Kennedy School Professor Robert Jensen in his five-year-long quest to achieve rigorous identification of Giffen effects. After finding correlational evidence for Giffen goods in survey data, he and his co-author actually followed up by running an experiment in China and, guess what, they do find evidence for Giffen behavior. Impressive empirics and a funny read, enjoy!

Posted by Jens Hainmueller at 2:16 PM

May 8, 2008

Some Random Notes about the International Network Meeting

Last week we had an International Meeting on Methodology for Empirical Research on Social Interactions, Social Networks, and Health here at the IQ, organized by Professor Charles Manski and Professor Nicholas Christakis. Some people told me that the second day of the meeting was much more "dynamic and interactive" than the first, and based on what I saw, I believe that was true. I saw at least three cliques of speakers form spontaneously on site along disciplinary lines: statisticians, economists, and sociologists and political scientists. There were even sub-cliques and backfires! Fortunately, nobody was severely wounded. In any case, it was a great intellectual exchange between disciplines. Below are some brief notes I took on the second day of the meeting, particularly during the last 20 minutes, when speakers talked about future directions for network analysis in the social sciences. I am sorry that I forgot to jot down exactly who said what, and that I also squeezed some of my personal thoughts into the notes. I take full responsibility for all errors in the notes.

1. Need to combine game theory with social network analysis, particularly evolutionary game theory (and transaction costs theory).

2. Need to further develop social network analysis based on (random) graph theory, topology and random matrix theory.

3. Network studies tend to focus on network structure and topology as dependent variables, while the social sciences are more concerned with how network positions and features affect node-level problems. Put simply, network studies tend to start from nodes and end at the network, while the social sciences take more of a top-down approach.

4. In either case, however, it is crucial to understand the data/tie generating mechanism. In particular, the formation of ties can go two ways: influence and selection. For example, smokers can become friends either because a person is influenced by his/her smoking friend to start smoking or because they are both smokers and then become friends. As another example, a highly educated person is usually less likely to be nominated by others as a best friend. This could be either because the highly educated person is less trustworthy or incapable of maintaining friendships, or because he/she is more independent and less willing to associate with others. Longitudinal data may help solve the influence vs. selection issue.

5. Network analysis assumes that the probability of forming a tie is the same for every pair of nodes. So start with a meaningful set of nodes when building a network, so that each node has roughly the same probability of forming a tie with any other.

6. How will the severing of an existing tie and the formation of a new tie affect the structure of a social network? How can ties bring more ties and lead to a polarized network? Nonlinear generating processes and dynamics in networks can lead to dramatic differences in network structure from tiny changes at the node level. How does network size affect network structure? (Think about the differences among a monopolistic market, an oligopolistic market, and a perfectly competitive market.)

7. How to define homophily between friends? One dimension vs. multiple dimensions? Even with one dimension, there are still two approaches: 1) test the difference in means between tie senders and tie receivers; 2) as an alternative measure, use the ratio of the number of ties whose connected nodes fall in the same group as you define it (e.g., age +/- 5) to the total number of ties. What else?

8. Need to think about how to incorporate network analysis into the traditional regression framework. We can either include network properties in regression models to study how networks affect personal- or clique-level phenomena, or use regressions to evaluate how network properties are determined by socioeconomic variables.

9. How to deal with the dependence structure among node-level variables, since the errors are not i.i.d.? Is it enough just to use a correlation matrix to weight the standard errors and get robust SEs?

10. Need to combine network software with traditional statistical software. statnet is getting there. But for Stata users, canned programs are needed to generate network data inside Stata.
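The two one-dimensional homophily measures in note 7 can be sketched in R as follows. Everything here is hypothetical: the edge list, the node names, the ages, and the choice of "within 5 years" as the definition of a group are all made up for illustration.

```r
# Made-up edge list: each row is one tie from a sender to a receiver.
ties <- data.frame(sender   = c("A", "A", "B", "C", "D"),
                   receiver = c("B", "C", "C", "D", "E"),
                   stringsAsFactors = FALSE)

# Made-up node attribute (age), indexed by node name.
age <- c(A = 30, B = 33, C = 41, D = 44, E = 60)

sender_age   <- unname(age[ties$sender])
receiver_age <- unname(age[ties$receiver])

# Approach 2: share of ties whose two endpoints fall in the same group,
# with "same group" defined here as ages within 5 years of each other.
same_group      <- abs(sender_age - receiver_age) <= 5
homophily_ratio <- mean(same_group)

# Approach 1: compare sender and receiver ages across ties.
mean_age_gap <- mean(sender_age - receiver_age)
gap_test     <- t.test(sender_age, receiver_age, paired = TRUE)

homophily_ratio
mean_age_gap
```

Note that the paired t-test treats ties as independent observations, which they are not when the same node appears in several ties; that is exactly the dependence problem raised in note 9, so the test is shown only to make the "mean test" idea concrete.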

Lastly, for those of you who are interested in causal analysis, read Patrick Doreian (2001), "Causality in Social Network Analysis" (Sociological Methods and Research 30: 81-114) and see if you can improve upon his study.

Posted by Weihua An at 10:46 AM

May 6, 2008

Tuesday: Tips & Tricks

I've been programming in R for four years now, and it seems that no matter how much I learn there are a million tiny ways that I could do it better. We all have our own programming styles and frequently used functions that may prove useful to others. I often find that a casual conversation with an office mate yields new approaches to a programming quandary. I'm speaking not of statistical insights, though those are important too, but rather the "simple" art of data manipulation and programming implementation--those essential tricks that help to improve coding efficiency. So, to that end I'm announcing the beginning of a bi-weekly "Tuesday Tips & Tricks" posting. These tips may include the description of a useful and perhaps obscure function, or solutions to common coding problems. I'm selfishly hoping that if readers of this blog know of better or alternative approaches, they'll respond in the comment section. So I'm looking forward to reading your responses.

This week's tip: How to quickly summarize contents of an object.

Answer: summary(), str(), dput()

The primary option, of course, is the familiar summary() command. This command works well for viewing model output, but also for getting a quick sense of data frames, matrices and factors. For example, the summary of a data frame or matrix looks like this:

> summary(dat1)
     Hello           test     citynames
 Min.   :1.00   Min.   :-3   Length:2
 1st Qu.:1.25   1st Qu.:-2   Class :character
 Median :1.50   Median :-1   Mode  :character
 Mean   :1.50   Mean   :-1
 3rd Qu.:1.75   3rd Qu.: 0
 Max.   :2.00   Max.   : 1

This is an incredibly useful function for numeric data, but is less useful for string data. For character vectors the summary function only reveals the length, class, and mode of the variable. In this case, to get a quick look at the data, one might want to use str(). Officially, str() "compactly displays the structure of an arbitrary R object", and in practice it is very handy. So, using the same data frame as an example:

> str(dat1)
'data.frame': 2 obs. of 3 variables:
$ Hello : num 1 2
$ test : num -3 1
$ citynames: chr "Cambridge" "Rochester"

Here we see a 2 x 3 data frame whose first variable, Hello, is numeric with values 1 and 2. The character vector citynames is displayed much more usefully than it was by summary(). While this is a small example, the function works just as well for much larger data frames and matrices, where it displays only the first few values of each variable.

For smaller objects, the function dput() might also prove useful. This function shows the ASCII text representation of the R object and its characteristics. So for this same example:

> dput(dat1)
structure(list(Hello = c(1, 2), test = c(-3, 1), citynames = c("Cambridge",
"Rochester")), .Names = c("Hello", "test", "citynames"), row.names = c(NA,
-2L), class = "data.frame")
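To try the three functions yourself, here is a minimal sketch that rebuilds a data frame with the same contents as dat1 above. The round trip through dput() and dget() at the end is my own addition, included to show that the dput() output is itself valid R code.

```r
# Rebuild the example data frame (values taken from the output above)
dat1 <- data.frame(Hello = c(1, 2),
                   test = c(-3, 1),
                   citynames = c("Cambridge", "Rochester"),
                   stringsAsFactors = FALSE)

summary(dat1)  # per-column summaries; little detail for character columns
str(dat1)      # compact structure: type and first values of each column
dput(dat1)     # plain-text R code that recreates the object

# Round trip: write the dput() text to a file and read it back with dget()
tmp <- tempfile(fileext = ".R")
dput(dat1, file = tmp)
dat2 <- dget(tmp)
all.equal(dat1, dat2)
```

This makes dput() handy for pasting a small object into an email or a forum post so that others can recreate it exactly.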

Posted by Eleanor Neff Powell at 4:41 PM

May 1, 2008

New NBER working paper by James Heckman: "Econometric Causality"

James Heckman has a new NBER working paper, "Econometric Causality", which some of you might find interesting. To give you a flavor, Heckman writes:

"Unlike the Neyman–Rubin model, these [selection] models do not start with the experiment as an ideal but they start with well-posed, clearly articulated models for outcomes and treatment choice where the unobservables that underlie the selection and evaluation problem are made explicit. The hypothetical manipulations define the causal parameters of the model. Randomization is a metaphor and not an ideal or 'gold standard'." (page 37)


Heckman, J (2008) "Econometric Causality" NBER working paper #13934. http://papers.nber.org/papers/W13934

Abstract: This paper presents the econometric approach to causal modeling. It is motivated by policy problems. New causal parameters are defined and identified to address specific policy problems. Economists embrace a scientific approach to causality and model the preferences and choices of agents to infer subjective (agent) evaluations as well as objective outcomes. Anticipated and realized subjective and objective outcomes are distinguished. Models for simultaneous causality are developed. The paper contrasts the Neyman-Rubin model of causality with the econometric approach.

Posted by Sebastian Bauhoff at 10:00 AM

April 24, 2008

FAQs about Statistical Interactions

I am writing a short essay about the connection and distinction between indirect effects and interaction effects for a methods class, and I found the following website very helpful in clarifying some of the FAQs on the subject. The website is maintained by Professor Regina Branton at the Department of Political Science of Rice University.

http://www.ruf.rice.edu/~branton/interaction/faqshome.htm

Also check out the mediation item at Wikipedia and its great references.

http://en.wikipedia.org/wiki/Mediation_(statistics)

Posted by Weihua An at 11:35 AM

April 16, 2008

JAMA article on ghostwriting medical studies

The Journal of the American Medical Association published a piece today on ghostwriting of medical research. Thanks to the Vioxx lawsuits, the authors say that they found documents "describing Merck employees working either independently or in collaboration with medical publishing companies to prepare manuscripts and subsequently recruiting external, academically affiliated investigators to be authors. Recruited authors were frequently placed in the first and second positions of the authorship list." One of the exhibits uses a placeholder "External author?" for the expert to be named. Obviously the idea that a pharmaceutical company is pre-writing clinical studies is as controversial as doctors possibly signing off on them without really being involved. A NYT article has some comments, and Merck has released a press statement.


Ross, J et al (2008) "Guest Authorship and Ghostwriting in Publications Related to Rofecoxib. A Case Study of Industry Documents From Rofecoxib Litigation" JAMA 299(15):1800-1812.

Posted by Sebastian Bauhoff at 10:54 PM

April 15, 2008

Google Charts from R: Maps

A few weeks ago I wrote a post sharing some code I wrote to generate sharp-looking PNG scatterplots from R using the Google Chart API. I think there are some nice uses of that (for example, as suggested by a commenter, to send a quick plot over IM), but here's something that I think could be much more useful: maps from R using Google Charts.

So, suppose you have data on the proportion of people who say "pop" (as opposed to "soda" or "coke") in each US state. (I got this data from Many-Eyes.) Once you get my code, you enter a command like this in R

googlemap(x = pct_who_say_pop, codes = state_codes, location = "usa", file ="pop.png")

and this image is saved locally as "pop.png":

To use this, first get the code via
source("http://people.fas.harvard.edu/~aeggers/googlemap.r")
which loads in a function named googlemap, to which you pass


  • x: a vector of data

  • codes: a vector of state/country codes (see the list of standard state and country codes)

  • location: a region of the world ("africa", "asia", "europe", "middle_east", "south_america", "usa") or the whole world ("world")


and you get back a url that you can embed in html as I did above, send over IM, etc. If you pass a file argument, as I did above, you can save the PNG locally.

For optional parameters to affect the scale of the figure and its colors, see the source.
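For readers curious about what happens under the hood, here is a minimal sketch of how a map URL could be assembled. This is NOT the code in googlemap.r: build_map_url is a hypothetical helper, and the parameter names (cht=t for a map, chtm for the region, chs for the size, chld for the concatenated two-letter codes, chd=t: for text-encoded data on a 0-100 scale) are my reading of the Google Chart API, so check them against the API documentation and the actual source.

```r
# Hypothetical helper, not the googlemap() function from this post.
build_map_url <- function(x, codes, location = "usa", size = "440x220") {
  stopifnot(length(x) == length(codes), all(nchar(codes) == 2))
  # Rescale the data to 0-100, the range assumed for text encoding
  rng <- range(x)
  scaled <- if (diff(rng) == 0) {
    rep(100, length(x))
  } else {
    round(100 * (x - rng[1]) / diff(rng), 1)
  }
  paste0("http://chart.apis.google.com/chart",
         "?cht=t",                                   # chart type: map
         "&chtm=", location,                         # region to draw
         "&chs=", size,                              # image size in pixels
         "&chld=", paste(codes, collapse = ""),      # codes, concatenated
         "&chd=t:", paste(scaled, collapse = ","))   # one value per code
}

map_url <- build_map_url(x = c(1, 1, 1, 1),
                         codes = c("CM", "TZ", "NA", "TG"),
                         location = "africa")
map_url
```

Colors, scaling and saving the PNG are left out here; those are the optional pieces handled in the real source.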

Another quick example:

Suppose you wanted to make a little plot of Germany's colonial possessions in Africa. This code

googlemap(x = c(1,1,1,1), location = "africa", codes = c("CM", "TZ", "NA", "TG"),file = "germans_in_africa.png")

returns this url

"http://chart.apis.google.com/chart?cht=t&chtm=africa . . . etc.

and saves this PNG on your hard drive:

The scatterplot thing before was something of a novelty, but I think this mapping functionality could actually be useful for generating quick maps in R, since the existing approaches are pretty annoying in my (limited) experience. The Google Charts API is not very flexible about labels and whatnot, so you probably won't be publishing any of these figures. But I expect this will serve very well for quick exploratory stuff, and I hope others do too.

I'd love it if someone wanted to help roll this into a proper R package . . . .

Posted by Andy Eggers at 3:01 PM

April 10, 2008

How Are Network Graphs Generated?

When Professor Nicholas Christakis came by to give a talk on social networks and health two weeks ago, a commentator expressed concern about the sparseness of information contained in network graphs (not specifically regarding Nicholas’ research, which I believe was well done). I share that concern. So afterwards I did a preliminary search of the literature on the visualization of network data and found several interesting pieces that may help clarify (or even exacerbate) part of the concern some of us have about network graphs.

The first is the lecture notes Professor Peter V. Marsden wrote about visualization of network graphs in soc275. Here I just want to highlight a few points in his notes. (Words in quotes are taken from Professor Marsden’s lecture notes.)

1) Network graphs can be “referenced to known geographical/spatial/social locations of points”.

2) Aesthetic criteria are used to generate network graphs, for example, to minimize crossing lines, to make lines shorter, … and “[to] construct plot such that close vertices are connected, positively connected, strongly connected, or connected via short geodesics”.

3) “Location of points reflects ‘social distances’”. … “Spatial configuration differs depending on what 'distance-generating mechanism' is assumed and built in to one’s data.”

4) Some often-used network graph generating algorithms include factor analysis, multidimensional scaling (MDS) and spring embedders, etc.
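To make points 3 and 4 concrete, here is a minimal base-R sketch of one such layout: classical MDS applied to geodesic (shortest-path) distances. The six-node network is a made-up toy example, and treating geodesic distance as the "social distance" is an assumption; a different distance-generating mechanism would give a different picture.

```r
# Toy undirected network (hypothetical): two triangles joined by the tie 3-4
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4), c(4, 5), c(4, 6), c(5, 6))
n <- 6
adj <- matrix(0, n, n)
adj[edges] <- 1
adj[edges[, 2:1]] <- 1   # symmetrize: ties are undirected

# Geodesic distances via the Floyd-Warshall algorithm
d <- ifelse(adj == 1, 1, Inf)
diag(d) <- 0
for (k in seq_len(n)) {
  d <- pmin(d, outer(d[, k], d[k, ], "+"))
}

# Classical MDS: 2-D coordinates whose distances approximate the geodesics
coords <- cmdscale(d, k = 2)

# Draw the ties and label the nodes
plot(coords, type = "n", xlab = "", ylab = "", axes = FALSE)
segments(coords[edges[, 1], 1], coords[edges[, 1], 2],
         coords[edges[, 2], 1], coords[edges[, 2], 2])
text(coords, labels = seq_len(n))
```

Swapping the distance matrix d for another measure, or cmdscale() for a spring embedder, changes where the points land even though the ties are identical.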

So the configuration of a network graph seems to depend to a large degree on the researcher’s theoretical interests, and can change with the network measures (the number of clusters within the network, overall network connectedness, etc.) that the researcher cares most about. In other words, before generating any network graph, researchers have to be clear about what theoretical themes they aim to present and then select the corresponding network measures and generating algorithms. For those of you who want to follow up on this topic, several pieces recommended by Professor Marsden in his lecture notes are good starting references. See below for more details.


1. Bartholomew, David J., Fiona Steele, Irini Moustaki, and Jane I. Galbraith. 2002. The Analysis and Interpretation of Multivariate Data for Social Scientists. London: Chapman and Hall/CRC. Chapters 3 and 4.

2. Freeman, Linton C. 2005. “Graphic Techniques for Exploring Social Network Data.” Chapter 12 in Carrington, Peter J., John Scott, and Stanley Wasserman. 2005. Models and Methods in Social Network Analysis. New York: Cambridge University Press.

3. Freeman, Linton C. 2000. “Visualizing Social Networks.” Journal of Social Structure 1. (Electronically available at http://www.cmu.edu/joss/content/articles/volindex.html)

Posted by Weihua An at 11:51 AM

April 7, 2008

A Case Against Evidence Based Medicine?

Seb just sent this very amusing paper (which he found in a comment to a post on Andrew Gelman's blog):

Objectives: To determine whether parachutes are effective in preventing major trauma related to gravitational challenge. Design: Systematic review of randomised controlled trials. Data sources: Medline, Web of Science, Embase, and the Cochrane Library databases; appropriate internet sites and citation lists. Study selection: Studies showing the effects of using a parachute during free fall. Main outcome measure: Death or major trauma, defined as an injury severity score > 15. Results: We were unable to identify any randomised controlled trials of parachute intervention. Conclusions: As with many interventions intended to prevent ill health, the effectiveness of parachutes has not been subjected to rigorous evaluation by using randomised controlled trials. Advocates of evidence based medicine have criticised the adoption of interventions evaluated by using only observational data. We think that everyone might benefit if the most radical protagonists of evidence based medicine organised and participated in a double blind, randomised, placebo controlled, crossover trial of the parachute.

Funny how such a lampoon can trigger a flame war on the BMJ website. Makes me understand why Gary writes about "Misunderstandings between experimentalists and observationalists about causal inference"...


Posted by Jens Hainmueller at 7:16 PM

April 5, 2008

Political Economy Students Conference

Dear students and colleagues,

We would like to invite you to attend the Political Economy Student Conference, to be held on April 17th in the NBER premises, in Cambridge, MA. The conference is an opportunity for students interested in political economy and other related fields to get together and discuss the open issues in the field, know what other people are working on, and share ideas. The program of the conference can be found at:

http://www.stanford.edu/group/peg/april_2008_conference/conference_program

This year, some members of the NBER Political Economy Group will be joining us for the conference. We are sure that we will greatly benefit from their comments and suggestions during the discussions.

We hope that those of you interested will attend the conference. The success of the conference largely depends on students' attendance and participation. Given that we have limited seats for the conference, please e-mail leopoldo (at) mit (dot) edu as soon as possible if you are interested in attending so that we can secure a spot for you.

Best regards,

Leopoldo Fergusson
Marcello Miccoli
Pablo Querubin

Posted by Jens Hainmueller at 5:04 PM

April 4, 2008

Predicting Pennsylvania

Here are the results of the Pennsylvania Democratic primary, with Obama counties in purple and Clinton counties in orange.

pa.dem.2008.png

What, you say? The Pennsylvania primary hasn't happened yet? You're right. Enter statistics!

Consider this scatterplot of Kerry's 2004 vote share versus Obama's 2008 vote share in Ohio counties. The result is something I call the Kerry-Obama smile: Obama does well in Kerry's best counties, where staunchly Democratic urban blacks are concentrated; and in Kerry's worst regions, presumably due to Obama's appeal to crossover Republicans. Clinton does best in the wide middle swath.

kerry.obama.png

This motivates a very simple modeling idea: fit a curve to the scatterplot. A quadratic in Kerry's share looks like a decent choice, and it gives us the best-fit curve shown on the plot. The R-squared is 0.16, so the fit is only okay.

The next step is utterly useless, but utterly fun. We can use Ohio to predict Pennsylvania. In other words, given that we know how Kerry did in Pennsylvania counties in 2004, we can predict how well Obama will do in 2008 in every Pennsylvania county. Note that I first tweaked the model's intercept slightly in Obama's favor, so that the aggregate prediction matches the current polling average (showing Clinton up by 6.6%).
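The whole procedure fits in a few lines of R. This is a minimal sketch on simulated stand-in data, not the real county returns: the data frames ohio and pa, the vote counts, and the coefficients used to simulate the "smile" are all made up. The target share assumes a two-way race with no undecideds, so a 6.6-point Clinton lead puts Obama at 46.7%.

```r
set.seed(1)

# Simulated stand-ins (hypothetical): 88 Ohio counties with a built-in "smile"
ohio <- data.frame(kerry = runif(88, 0.25, 0.75))
ohio$obama <- 0.45 + 2 * (ohio$kerry - 0.5)^2 + rnorm(88, sd = 0.05)

# Step 1: fit a quadratic in Kerry's 2004 share
fit <- lm(obama ~ kerry + I(kerry^2), data = ohio)

# Step 2: predict Obama's share in 67 simulated Pennsylvania counties
pa <- data.frame(kerry = runif(67, 0.25, 0.80),
                 votes = round(runif(67, 2e3, 2e5)))
pa$pred <- predict(fit, newdata = pa)

# Step 3: shift the intercept so the vote-weighted statewide prediction
# matches the polling average (assumed two-way race: (1 - 0.066) / 2)
target <- 0.467
shift  <- target - weighted.mean(pa$pred, pa$votes)
pa$pred_adj <- pa$pred + shift

# Counties the adjusted model gives to Obama
pa$obama_wins <- pa$pred_adj > 0.5
table(pa$obama_wins)
```

Shifting only the intercept keeps the shape of the smile fixed and moves every county's prediction by the same amount, which is the tweak described above.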

The bad news for Obama is that nearly all of Pennsylvania's counties fall in the middle of the smile. The image below compares Kerry in 2004 to the model's predictions for Obama in 2008. Obama is predicted to carry Philadelphia overwhelmingly, and to do well in some of the curvy, heavily Republican counties in the south-center of the state. Everywhere else, though, is Clinton country.

pa.comp.png

Posted by Kevin Bartz at 1:15 PM

April 3, 2008

A born-again frequentist?

It's a day or so past April 1, but if you haven't seen this post [Edit: link fixed] over at Andrew Gelman's blog, it is worth a look. It's about as good an apologia from a "born-again frequentist" as you are likely to find. An excerpt:

I like unbiased estimates and I like confidence intervals that really have their advertised confidence coverage. I know that these aren't always going to be possible, but I think the right way forward is to get as close to these goals as possible and to develop robust methods that work with minimal assumptions. The Bayesian approach--to give up even trying to approximate unbiasedness and to instead rely on stronger and stronger assumptions--that seems like the wrong way to go.

Fortunately, Gelman's conversion experience appears to have ended after about a day...

Posted by Mike Kellermann at 12:09 AM

March 28, 2008

Visualizing Data with Processing

A friend just referred me to Processing, a powerful language for visualizing data:


Processing is an open source programming language and environment for people who want to program images, animation, and interactions. It is used by students, artists, designers, researchers, and hobbyists for learning, prototyping, and production. It is created to teach fundamentals of computer programming within a visual context and to serve as a software sketchbook and professional production tool. Processing is developed by artists and designers as an alternative to proprietary software tools in the same domain.

Their exhibition shows some very impressive results. For example, I liked the visualization of the London Tube map by travel time. I lived in Russell Square once, so this evoked pleasant memories:
carden.jpg.
If you can spare a minute, also take a look at the other exhibited pieces. Most are art rather than statistics. For chess fans I especially recommend the piece called "Thinking Machine 4" by Martin Wattenberg, who gave a talk at the IQSS applied stats workshop in the fall. Enjoy!

thinking.jpg.

Posted by Jens Hainmueller at 7:43 AM

March 27, 2008

How Did 0.05 Come to Rule?

Recently I read an article by Erin Leahey on how statistical significance testing, the 0.05 cut-off value and the three-star system became legitimized and dominant in mainstream sociology. According to Erin, one star stands for p<=.05, two stars for p<=.01 and three stars for p<=.001. But I feel the cut-off values are more often something like .10, .05 and .01, respectively. Anyway, Erin attributes the first use of the .05 significance level to R. A. Fisher’s book, Design of Experiments, in 1935. She notes that other forms of significance testing besides the .05 test were already very popular in the 1930s, when close to 40 percent of articles published in ASR and AJS applied one form or another of significance testing. Based on the articles she sampled from ASR and AJS, Erin shows that the popularity of significance testing and of the 0.05 cut-off value roughly followed an “S” shape: usage rose from the 1930s to 1950, declined until 1970 and has been rising again since then. Currently, around 80 percent of articles published in ASR and AJS employ both practices. The three-star system emerged in the 1950s, but became popular only after 1970. Today slightly more than 40 percent of the articles published in these two top sociology journals use it.

So what accounts for the diffusion of these practices? Erin offers several arguments. For example, she argues that institutional factors such as investment in research and computing, graduate training, an institution’s academic status and journal editors’ individual preferences could be among the most important factors in the diffusion process. Interestingly, she found that graduating from Harvard had a significant negative “effect” on adopting these statistical practices. :-)

Of course, like almost all research, Erin’s study has some minor drawbacks. For example, her sample is drawn only from the top two sociology journals, so the generalizability of her findings could be limited. But overall, it is a fun read. And if you are interested in a more historical account of how statistical practices were introduced to and became legitimized in the social sciences in general, Camic and Xie (1994) is a very good start.

Sources:
Leahey, Erin. 2005. Alphas and Asterisks: the Development of Statistical Significance Testing Standards in Sociology. Social Forces 84: 1-24.
Camic, Charles, and Yu Xie. 1994. “The Statistical Turn in American Social Science: Columbia University, 1890-1915.” American Sociological Review 59:773-805.

Posted by Weihua An at 11:57 AM

March 26, 2008

The Guardian features Andy and Jens' research on returns to office

A joint project by Andy Eggers and Jens Hainmueller, two long-time contributors to this blog, is the basis of a piece in The Guardian this Monday. Check out the article "How election paid off for postwar Tory MPs" and the paper "MPs For Sale? Estimating Returns to Office in Post-War British Politics". Congrats to Andy and Jens!

Posted by Sebastian Bauhoff at 4:44 PM

March 20, 2008

Correlation of Ratios or Difference Scores Having Common Terms

Yesterday I went to Professor Stanley Lieberson’s class, Issue in the Interpretation of Empirical Evidence. We discussed a paper by Stan and Glenn Fuguitt, titled Correlation of Ratios or Difference Scores Having Common Terms. The basic argument of the paper is that although ratios and difference scores are often used as dependent variables in traditional regression analysis, if some independent variables share a common term with those dependent variables, the estimated coefficients can be severely biased by the spurious correlation that the common term induces (whether it sits in the denominator or the numerator). For example, if the dependent variable has the form X/Z while the independent variables are something like Y/Z, Z, or Z/X, the estimated coefficients can come out statistically significant simply because of the shared term, even when X, Y and Z are unrelated.

For some concrete examples: criminologists often use the crime rate (adjusted by city population size) as the dependent variable while also using city population size as an independent variable; organizational researchers are interested in the relationship between the relative size of an organization's administration and the absolute size of the organization; and economists often regress GDP per capita on variables such as the population growth rate or even population size. According to Stan and Fuguitt’s research, all of these examples can yield spurious coefficients, since the dependent and independent variables share common terms. In their paper, they trace this finding back to an 1897 paper by Karl Pearson, in which Pearson showed rigorously where the spurious correlation comes from and gave an approximate formula for computing correlations of ratios.

We were asked to run an experiment to demonstrate this spurious correlation: we generated three sets of random integers (X, Y and Z) ranging from 1 to 99, computed the pairwise correlation matrix and found no significant correlation between any pair of variables. But we did find a significant correlation between Y/X and X, and when we regressed Y/X on X, the coefficient was significant too. So through manipulations like division or subtraction, we artificially created a significant correlation between random integers that were originally uncorrelated.

Why not try the following in Stata to see if the above claims are overstated or not?

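* uniform() draws from [0,1); newer versions of Stata call it runiform()
set obs 50
gen x=int(99*uniform()+1)
gen y=int(99*uniform()+1)
gen z=int(99*uniform()+1)

* The raw variables should be (nearly) uncorrelated
pwcorr x y z, sig

* Y/X against X: the shared term is X
gen ydx = y/x
pwcorr x ydx, sig
reg ydx x

* X/Z against Y/Z: common denominator Z
gen xdz = x/z
gen ydz = y/z
pwcorr xdz ydz, sig
reg xdz ydz

* X/Z against Z/Y: Z is a denominator in one ratio and a numerator in the other
gen zdy = z/y
pwcorr xdz zdy, sig
reg xdz zdy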
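For R users, here is a minimal sketch of the same experiment. It follows the Stata commands above, with one deliberate difference that is my own choice: it uses 5,000 observations rather than 50, so that the pattern does not depend on a lucky or unlucky draw. The variable names are hypothetical.

```r
set.seed(2008)
n <- 5000   # assumption: larger than the 50 used above, for stable results

# Three independent sets of random integers from 1 to 99
x <- sample(1:99, n, replace = TRUE)
y <- sample(1:99, n, replace = TRUE)
z <- sample(1:99, n, replace = TRUE)

# The raw variables are essentially uncorrelated
raw_cor <- cor(cbind(x, y, z))
round(raw_cor, 3)

# Ratios that share a term are correlated anyway
r_ydx_x   <- cor(y / x, x)       # shared term X: negative
r_xdz_ydz <- cor(x / z, y / z)   # common denominator Z: positive
r_xdz_zdy <- cor(x / z, z / y)   # Z in a denominator and a numerator
round(c(r_ydx_x, r_xdz_ydz, r_xdz_zdy), 3)

# Regressing Y/X on X, as in the class exercise
fit <- lm(I(y / x) ~ x)
summary(fit)$coefficients
```

The sign of each correlation follows from where the shared term sits: X pushes Y/X down as it grows, while a small Z inflates X/Z and Y/Z together.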

Are you convinced by now? If not, please go read the source paper below (or just write back and say what is wrong with Stan and Fuguitt’s argument). If yes, the question now becomes what we should do about the spurious correlation. Shall we just use the original forms of the variables? Shall we re-specify the Solow model? But what if our research interest is in the ratio or the difference itself? … …


Source:
Stanley Lieberson and Glenn Fuguitt, 1974. Correlation of Ratios or Difference Scores Having Common Terms, in Sociological Methodology (1973-1974), edited by Herbert Costner, San Francisco: Jossey-Bass Publishers.

Posted by Weihua An at 11:17 AM

March 18, 2008

Games That Produce Data

In a conversation with Kevin Quinn this week I was reminded of a fascinating lecture given at Google in 2006 by Luis von Ahn, an assistant professor in computer science at Carnegie Mellon. Von Ahn gives a very entertaining and thought-provoking talk on ingenious ways to apply human intelligence and judgment on a large scale to fairly small problems that computers still struggle with.

(Or watch video on Google video.)

Von Ahn devises games that produce data, the best-known example being the ESP Game, which Google acquired and developed as Google Image Labeler. In the game, you are paired with another (anonymous) player and shown an image. Each of you feverishly types in words describing the image (eg, "Spitzer", "politician", "scandal", "prostitution"); you get points and move to the next image when you and your partner agree on a label. The game is fun, even addictive, and of course Google gets a big, free payoff -- a set of validated keywords for each image.

I'm curious about how these approaches can be applied to coding problems in social science. A lot of recent interesting work has involved developing machine learning techniques to teach computers to label text, but there are clearly cases where language is just too subtle and complex to accurately extract meaning, and we need real people to read the text and make judgments. Mostly we hire RAs or do it ourselves; could we devise games instead?

Posted by Andy Eggers at 9:37 AM

March 11, 2008

What is P(Obama beats McCain)?

While the Democratic nomination contest drags on (and on and on...; Tom Hanks declared himself bored with the race last week), attention is turning to hypothetical general election matchups between Hillary Clinton or Barack Obama and John McCain. Mystery Pollster has a post up reporting on state-by-state hypothetical matchup numbers obtained from surveys of 600 registered voters in each state conducted by Survey USA. There is some debate about the quality of the data (Survey USA uses Interactive Voice Response to conduct its surveys, there is no likely voter screen, etc.). But we have what we have.

At this point, the results are primarily of interest to the extent that they speak to the "electability" question on the Democratic side; who is more likely to beat McCain? MP goes through the results state by state, classifying each state into Strong McCain, Lean McCain, Toss-up, etc. From this you can calculate the number of electoral votes in each category, which provides some information but isn't exactly what we're interested in.

This problem is a natural one for the application of some simple, naive Bayesian ideas. If we throw on some flat priors, make all sorts of unreasonably strong independence assumptions, and assume that the results were derived from simple random sampling, we can quickly get posterior distributions for the support for each candidate in each state and can calculate estimates of the probability of victory. From there, it is easy to calculate the posterior distribution of the number of electoral votes for each candidate and find posterior probabilities that Obama beats McCain, Clinton beats McCain, or the probability that Obama would receive more electoral votes than Clinton.

While I was sitting around at lunch yesterday, I ran a very quick analysis using the reported SurveyUSA marginals. Essentially, I took samples from 50 independent Dirichlet posteriors for both hypothetical matchups, assuming a flat prior and multinomial sampling density (to allow for undecideds); to avoid dealing with the posterior predictive distributions, I'm just going to assume that all registered voters will vote so I can just compare posterior proportions. When you run this, you obtain estimates (conditional on the data and, most importantly, the model) that the probability of an Obama victory over McCain is about 88% and the probability of a Clinton victory is about 72%. There is a roughly 70% posterior probability that Obama would win more electoral votes than Clinton.
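For anyone who wants to play along, here is a minimal sketch of that kind of calculation in base R. The three "states", their electoral votes and the poll counts are invented for illustration and are not the SurveyUSA numbers; rdirichlet1 is a hypothetical helper that draws from a Dirichlet distribution by normalizing independent gamma draws.

```r
set.seed(42)

# Hypothetical poll counts (out of 600) for three made-up states
polls <- data.frame(
  state = c("A", "B", "C"),
  ev    = c(10, 20, 15),      # electoral votes
  dem   = c(300, 270, 285),
  rep   = c(252, 288, 279),
  und   = c(48, 42, 36)       # undecided
)

# Draw n samples from Dirichlet(alpha) via normalized gamma variates
rdirichlet1 <- function(n, alpha) {
  g <- matrix(rgamma(n * length(alpha), shape = rep(alpha, each = n)),
              nrow = n)
  g / rowSums(g)
}

n_draws <- 10000
dem_ev  <- numeric(n_draws)   # Democratic electoral votes in each draw

for (i in seq_len(nrow(polls))) {
  # Flat Dirichlet(1, 1, 1) prior + multinomial counts = Dirichlet posterior
  alpha <- c(polls$dem[i], polls$rep[i], polls$und[i]) + 1
  p <- rdirichlet1(n_draws, alpha)
  # The state goes to whoever has the larger posterior share in this draw
  dem_ev <- dem_ev + polls$ev[i] * (p[, 1] > p[, 2])
}

# Posterior probability that the Democrat wins a majority of electoral votes
p_win <- mean(dem_ev > sum(polls$ev) / 2)
p_win
```

Running the same loop for a second matchup and comparing the two vectors of electoral votes draw by draw would give the probability that one candidate outperforms the other, under the same strong independence assumptions.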

As I mentioned, this is an extremely naive Bayesian approach. There are a lot of ways that one could make the model better: adding additional sources of uncertainty, allowing for correlations between the states, using historical information to inform priors, and imposing a hierarchical structure to shrink outlying estimates toward the grand mean. One place to start would be by modeling the pairs of responses to the two hypothetical matchup questions. Any of these things, however, is going to be much easier to do in a Bayesian framework, since calculating posterior distributions of functions of the model parameters is extremely easy.

Posted by Mike Kellermann at 11:17 AM

March 5, 2008

"Early Thoughts on the Autism Epidemic"

The dramatic increase in cases of autism in children over the past few years has been in the news again in recent days. Most notably, presumptive Republican presidential nominee John McCain said at a recent stop, "there’s strong evidence that indicates that it’s got to do with a preservative in vaccines." Which would be fine if such strong evidence existed; unfortunately, that is a mischaracterization of the current state of the literature to say the least. McCain has since backed away from his initial comments (see this article in yesterday's New York Times), but the debate prompted by his comments will undoubtedly continue.

By coincidence, the Robert Wood Johnson program at Harvard is sponsoring a talk tomorrow on this topic. Professor Peter Bearman (chair of the Statistics Department at Columbia) will be speaking on "Early Thoughts on the Autism Epidemic." Professor Bearman is currently leading a project on the social determinants of autism. The talk is in N262 on the second floor of the Knafel Building at CGIS from 11:00 to 12:30.

Posted by Mike Kellermann at 2:56 PM

February 23, 2008

Publication Bias in Drug Trials

A study published in the New England Journal of Medicine last month showed that widely-prescribed antidepressants may not be as effective as the published research indicates. After reading about the study in the NYT, I recently read the article and was struck by how well the authors were able to document the somewhat elusive phenomenon of publication bias.

Researchers in most fields can document publication bias only by pointing out patterns in published results. A jump in the density of t-stats around 2 is one strong sign that null reports are not being published; an inverse relationship between average reported effect size and sample size in studies of the same phenomenon is another strong sign (because the only small studies that could be published are the ones with large estimated effects). These meta-analysis procedures are clever because they infer something about unpublished studies from what we see in published studies.
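A small simulation shows why the second pattern arises. This is a minimal sketch under stated assumptions: every study estimates the same true effect (0.2, an arbitrary choice), outcomes have unit variance so the standard error is 1/sqrt(n), and a study is "published" only if its t-statistic exceeds 1.96. All names and numbers are hypothetical.

```r
set.seed(7)

n_studies   <- 2000
true_effect <- 0.2

n   <- sample(20:500, n_studies, replace = TRUE)  # sample size of each study
se  <- 1 / sqrt(n)                                # standard error of the estimate
est <- rnorm(n_studies, mean = true_effect, sd = se)

# Publication filter: only significant positive results get into print
t_stat    <- est / se
published <- t_stat > 1.96

# Published estimates overstate the true effect
mean_all       <- mean(est)
mean_published <- mean(est[published])
round(c(all = mean_all, published = mean_published), 3)

# Among published studies, smaller samples report larger effects
cor_all       <- cor(est, n)
cor_published <- cor(est[published], n[published])
round(c(all = cor_all, published = cor_published), 3)
```

Before the filter, effect size and sample size are unrelated; after it, small studies survive only when they happen to draw a large estimate, which produces the inverse relationship.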

As the NEJM article makes clear, publication bias is more directly observable in drug trials because we have very good information about unpublished trials. When a pharmaceutical company initiates clinical trials for a new drug, the studies are registered with the FDA; in order to get FDA approval to bring the drug to market, the company must submit the results of all of those trials (including the raw data) for FDA review. All trials conducted on a particular drug are therefore reviewed by the FDA, but a subset of those trials are published in medical journals.

The NEJM article uses this information to determine which antidepressant trials made it into the journals:

Among 74 FDA-registered studies, 31%, accounting for 3449 study participants, were not published. Whether and how the studies were published were associated with the study outcome. A total of 37 studies viewed by the FDA as having positive results were published; 1 study viewed as positive was not published. Studies viewed by the FDA as having negative or questionable results were, with 3 exceptions, either not published (22 studies) or published in a way that, in our opinion, conveyed a positive outcome (11 studies). According to the published literature, it appeared that 94% of the trials conducted were positive. By contrast, the FDA analysis showed that 51% were positive. Separate meta-analyses of the FDA and journal data sets showed that the increase in effect size ranged from 11 to 69% for individual drugs and was 32% overall.

One complaint -- I thought it was too bad that the authors did not determine whether the 22 studies that were "negative or questionable" and went unpublished were not submitted ("the file drawer problem") or rejected by the journals. But otherwise very thorough and interesting.

Posted by Andy Eggers at 2:05 AM

February 22, 2008

Bus Accidents as Random Health Shocks

A major item of interest in applied health economics is to understand the impact of health shocks on household income, investments and consumption. This relation is particularly important in developing countries that don’t have programs like universal health insurance or social insurance like Medicaid. Alas it’s also a major challenge to establish causal effects and mechanisms through which the shocks might operate. A main culprit is endogeneity, since health affects wealth and vice versa. As a result there is a huge and truly inter-disciplinary literature on the topic, much of it with suspicious identification strategies.

The main struggle is to find a plausibly exogenous exposure to health shocks that have real-life relevance. A new paper by Manoj Mohanan takes this challenge seriously and looks at the effect of health shocks from bus accidents on households’ consumption, and examines what mechanisms households rely on to smooth consumption. (Full disclosure: Manoj is a classmate of mine, and I really like his work!)

To address the endogeneity problem, the paper focuses on people who have been in bus accidents as recorded by the state-run bus company in Karnataka, India. Clearly, finding a good control group is critical: people who travel on public buses may be different from those who don’t. For starters, they actually took the risk of getting on a bus – if you have ever been on the road in a developing country you’ll know what this means. Manoj’s approach is to select unexposed individuals among travelers on the same bus route, after matching on age, sex and geographic area of residence. Hence, conditional on these factors, the bus accident can be treated as exogenous.

He then compares the two groups on various dimensions. He finds that households reduce educational and festival spending by a large amount, but appear to be able to smooth food and housing consumption. He is unable to find effects on assets or labor supply. The principal coping mechanism is debt accumulation. Overall this suggests that not all is well: debt traps aside, reducing investments in education could be very costly in the long run (on this point see also Chetty and Looney, 2006).


* Chetty, R. and Looney, A. (2006) "Consumption smoothing and the welfare consequences of social insurance in developing economies" Journal of Public Economics, 90: 2351-2356.

Posted by Sebastian Bauhoff at 10:00 AM

February 2, 2008

Conference on "New Technologies and Survey Research"

This year's Spring Conference of the Harvard Program on Survey Research is on "New Technologies and Survey Research." It will be held on May 9, 2008, 9:00 am to 5:00 pm at IQSS, and is open to the public.

See here for details.

Posted by Sebastian Bauhoff at 9:54 AM

February 1, 2008

useR! 2008 in Dortmund

Abstracts are now being accepted for the 2008 useR! conference in Dortmund, Germany. This conference is designed to bring R users and developers together to trade ideas and find out what is new in the sprawling world of R. Several of us went to the Vienna conference a few years ago, and found it very useful. Previous editions have had a good mix of academic and private sector participants, and I learned more than I have at some of the more traditional academic conferences. The announcement from the useR webpage is below; the website is at http://www.statistik.uni-dortmund.de/useR-2008/

useR! 2008, the R user conference, takes place at the Fakultät Statistik, Technische Universität Dortmund, Germany from 2008-08-12 to 2008-08-14. Pre-conference tutorials will take place on August 11.

The conference is organized by the Fakultät Statistik, Technische Universität Dortmund and the Austrian Association for Statistical Computing (AASC). It is funded by the R Foundation for Statistical Computing.

Following the successful useR! 2004, useR! 2006, and useR! 2007 conferences, the conference is focused on

  1. R as the `lingua franca' of data analysis and statistical computing,
  2. providing a platform for R users to discuss and exchange ideas how R can be used to do statistical computations, data analysis, visualization and exciting applications in various fields,
  3. giving an overview of the new features of the rapidly evolving R project.

As for the predecessor conference, the program consists of two parts:

  1. invited lectures discussing new R developments and exciting applications of R,
  2. user-contributed presentations reflecting the wide range of fields in which R is used to analyze data.

A major goal of the useR! conference is to bring users from various fields together and provide a platform for discussion and exchange of ideas: both in the formal framework of presentations as well as in the informal part of the conference in Dortmund's famous beer pubs and restaurants.

Prior to the conference, on 2008-08-11, there are tutorials offered at the conference site. Each tutorial has a length of 3 hours and takes place either in the morning or afternoon.

Call for Papers
We invite all R users to submit abstracts presenting innovations or exciting applications of R on topics such as:

Applied Statistics & Biostatistics
Bayesian Statistics
Bioinformatics
Chemometrics and Computational Physics
Data Mining
Econometrics & Finance
Environmetrics & Ecological Modeling
High Performance Computing
Machine Learning
Marketing & Business Analytics
Psychometrics
Robust Statistics
Sensometrics
Spatial Statistics
Statistics in the Social and Political Sciences
Teaching
Visualization & Graphics
and many more.

We recommend a length of about one page in pdf format. The program committee decided on the presentation format. There is no proceedings volume, but the abstracts are available in an online collection linked from the conference program and in a single pdf file.

Deadline for submission of abstracts: 2008-03-31.

Posted by Mike Kellermann at 11:55 AM

January 4, 2008

Call for Papers: Conference at Harvard on Networks in Political Science

James Fowler sent the following message to the Polmeth list, regarding a conference that we will apparently be hosting in June that may be of interest:

The study of networks has exploded over the last decade, both in the social and hard sciences. From sociology to biology, there has been a paradigm shift from a focus on the units of the system to the relationships among those units. Despite a tradition incorporating network ideas dating back at least 70 years, political science has been largely left out of this recent creative surge. This has begun to change, as witnessed, for example, by an exponential increase in network-related research presented at the major disciplinary conferences.

We therefore announce an open call for paper proposals for presentation at a conference on "Networks in Political Science" (NIPS), aimed at _all_ of the subdisciplines of political science. NIPS is supported by the National Science Foundation, and sponsored by the Program on Networked Governance at Harvard University.

The conference will take place June 13-14. Preceding the conference will be a series of workshops introducing existing substantive areas of research, statistical methods (and software packages) for dealing with the distinctive dependencies of network data, and network visualization. There will be a $50 conference fee. Limited funding will be available to defray the costs of attendance for doctoral students and recent (post 2005) PhDs. Funding may be available for graduate students not presenting papers, but preference will be given to students using network analysis in their dissertations. Women and minorities are especially encouraged to apply.

The deadline for submitting a paper proposal is March 1, 2008. Proposals should include a title and a one-paragraph abstract. Graduate students and recent Ph.D.'s applying for funding should also include their CV, a letter of support from their advisor, and a brief statement about their intended use of network analysis. Send them to networked_governance@ksg.harvard.edu. The final program will be available at www.ksg.harvard.edu/netgov.


Posted by Mike Kellermann at 5:18 PM

December 11, 2007

Coding Analysis Toolkit looking for beta testers

A recent message to the Polmeth mailing list announced that a research group at the University of Pittsburgh is looking for beta testers for some new coding reliability software that they have developed:

The Coding Analysis Toolkit (or “CAT”) was developed in the summer of 2007. The system consists of a web-based suite of tools custom built from the ground-up to facilitate efficient and effective analysis of text datasets that have been coded using the commercial-off-the-shelf package ATLAS.ti (http://www.atlasti.com). We have recently posted a narrated slide show about CAT and a tutorial online. The Coding Analysis Toolkit was designed to use keystrokes and automation to clarify and speed-up the validation or consensus adjudication process. Special attention was paid during the design process to the need to eliminate the role of the computer mouse, thereby streamlining the physical and mental tasks in the coding analysis process. We anticipate that CAT will open new avenues for researchers interested in measuring and accurately reporting coder validity and reliability, as well as for those practicing consensus-based adjudication. The availability of CAT can improve the practice of qualitative data analysis at the University of Pittsburgh and beyond.

More information is available at this website: http://www.qdap.pitt.edu/cat.htm. This is far from my area of expertise, but it looks like it might be useful for some projects...

Posted by Mike Kellermann at 6:00 PM

December 5, 2007

Holiday Gifts for the Data-Addicted

The infosthetics blog offers its "shopping guide for the data-addicted." I was intrigued by the chumby and nabaztag, two devices that offer the charms of the internet divorced from the keyboard/mouse/monitor setup. For the urban planner on your list, don't miss the fly swatter whose mesh is a street map of Milan. For the social science stats crowd, though, the best gift on the list has to be the Death and Taxes poster, depicting the US federal discretionary budget in remarkable detail and clarity. Click on the image below to get a close-up look at the poster.
[Image: the Death and Taxes poster (dat.jpg)]

Posted by Andy Eggers at 8:52 AM

November 30, 2007

Conference on Computational Social Science

IQSS is sponsoring a conference next Friday on the emerging area of computational social science. Below is the announcement:

The Conference on Computational Social Science (part of the Eric M. Mindich Conference series)

Friday, December 7, 2007
Center for Government and International Studies South, Tsai Auditorium (Room S010)
1730 Cambridge Street, Cambridge, MA

The development of enormous computational power and the capacity to collect enormous amounts of data has proven transformational in a number of scientific fields. The emergence of a computational social science has been slower than in the sciences. However, the combination of the still exponentially increasing computational power with a massive increase in the capturing of data about human behavior makes the emergence of a field of computational social science desirable, but not inevitable. The creation of a field of computational social science poses enormous challenges, but offers enormous promise to achieve the public good. The hope is that we can produce an understanding of the global network on which many global problems exist: SARS and infectious disease, global warming, strife due to cultural collisions, and the livability of our cities. That is, can sensing our society lead to a sensible society?

To solve these problems will require trading off privacy versus convenience, individual freedom versus societal benefit, and our sense of individuality versus group identity. How will we decide what the sensible society will look like? This conference brings together the wide array of individuals who are working in this emerging research area to discuss how we might address these global challenges, and to evaluate the potential emergence of a field of "computational social science."

Registration is required; more information is available here.

Posted by Mike Kellermann at 9:42 AM

November 15, 2007

Artsy Statistics

From Andrew Gelman, I saw a link to an interesting "art exhibit" that's actually all about statistics and language. In some ways it reminded me of this other art exhibit that's actually all about statistics -- in this case, the meaning of some of the very large numbers we read about all the time, but find difficult to grasp on an intuitive level.

Both are worth checking out online. And if you live somewhere that you can visit either, lucky you!

Posted by Amy Perfors at 9:47 AM

October 31, 2007

The statistics of race

Amy Perfors

There's an interesting article at Salon today about racial perception. As is normally the case for scientific articles reported in the mainstream media, I have mixed feelings about it.

1) First, a pet peeve: just because something can be localized in the brain using fMRI or similar techniques does not mean it's innate. This drives me craaazy. Everything that we conceptualize or do is represented in the brain somehow (unless you're a dualist, and that has its own major logical flaws). For instance, trained musicians devote more of their auditory processing regions to listening to piano music, and have a larger auditory cortex and larger areas devoted toward motor control of the fingers used to play their instrument. [cite]. This is (naturally, reasonably) not interpreted as meaning that playing the violin is innate, but that the brain can "tune itself" as it learns. [These differences are linked to amount of musical training, and are larger the younger the training began, which all supports such an interpretation]. The point is, localization in the brain != innateness. Aarrgh.

2) The article talks about what agent-based modeling has shown us, which is interesting:

Using this technique, University of Michigan political scientist Robert Axelrod and his colleague Ross Hammond of the Brookings Institution in Washington, D.C., have studied how ethnocentric behavior may have evolved even in the absence of any initial bias or prejudice. To make the model as simple as possible, they made each agent one of four possible colors. None of the colors was given any positive or negative ranking with respect to the other colors; in the beginning, all colors were created equal. The agents were then provided with instructions (simple algorithms) as to possible ways to respond when encountering another agent. One algorithm specified whether or not the agent cooperated when meeting someone of its own color. The other algorithm specified whether or not the agent cooperated with agents of a different color.

The scientists defined an ethnocentric strategy as one in which an agent cooperated only with other agents of its own color, and not with agents of other colors. The other strategies were to cooperate with everyone, cooperate with no one and cooperate only with agents of a different color. Since only one of the four possible strategies is ethnocentric and all were equally likely, random interactions would result in a 25 percent rate of ethnocentric behavior. Yet their studies consistently demonstrated that greater than three-fourths of the agents eventually adopted an ethnocentric strategy. In short, although the agents weren't programmed to have any initial bias for or against any color, they gradually evolved an ethnocentric preference for one's own color at the expense of those of another color.

Axelrod and Hammond don't claim that their studies duplicate the real-world complexities of prejudice and discrimination. But it is hard to ignore that an initially meaningless trait morphed into a trigger for group bias. Contrary to how most of us see bigotry and prejudice as arising out of faulty education and early-childhood indoctrination, Axelrod's model doesn't begin with preconceived notions about the relative values of different colors, nor is it associated with any underlying negative emotional state such as envy, frustration or animosity. Detection of a difference, no matter how innocent, is enough to result in ethnocentric strategies.

As I understand it, the general reason these experiments work the way they do is that the other strategies do worse given the dynamics of the game (single-interaction Prisoner's Dilemma): (a) cooperating with everyone leaves one open to being "suckered" by more people; (b) cooperating with nobody leaves one open to being hurt disproportionately by never getting the benefits of cooperation; and (c) cooperating with different colors is less likely to lead to a stable state.

Why is this last observation -- the critical one -- true? Let's say we have a red, orange, and yellow agent sitting next to each other, and all of them decide to cooperate with a different color. This is good, and leads to an increased probability of all of them being able to reproduce, and the next generation has two red, two yellow, and two orange agents. Now the problem is apparent: each of the agents is now next to an agent (i.e., the other one of its own color) that it is not going to cooperate with, which will hurt its chances of being able to survive and reproduce. By contrast, subsequent generations of agents that favor their own color won't have this problem. And in fact, if you remove "local reproduction" -- if an agent's children aren't likely to end up next to it -- then you don't get the rise of ethnocentrism... but you don't get much cooperation, either. (Again, this is sensible: the key is for agents to be able to essentially adapt to local conditions in such a way that they can rely on the other agents close to them, and they can't do that if reproduction isn't local). I would imagine that if one's cooperation strategy didn't tend to resemble the cooperation strategy of one's parents, you wouldn't see either ethnocentrism (or much cooperation) either.
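To make the mechanics concrete, here is a minimal sketch of this kind of agent-based model: agents on a wrap-around grid, one-shot help-or-not interactions with neighbors, and offspring placed in adjacent cells. It is not Axelrod and Hammond's code. The grid size, fertility, cost, benefit, death and mutation values are my own illustrative assumptions, and whether a particular run reproduces the three-fourths figure will depend on those settings and on how long it runs.

```python
import random

# All numeric settings are illustrative assumptions, not values from the post.
COLORS = 4                   # four possible colors, as in the quoted article
BASE_FERTILITY = 0.12        # chance to reproduce before any interactions
COST, BENEFIT = 0.01, 0.03   # helping costs the donor and benefits the recipient
DEATH, MUTATION = 0.10, 0.005

LABELS = {
    (True, False): "own_color_only",    # the ethnocentric strategy
    (True, True): "everyone",
    (False, False): "no_one",
    (False, True): "other_colors_only",
}

def neighbors(row, col, size):
    """Four adjacent cells on a wrap-around (torus) grid."""
    return [((row - 1) % size, col), ((row + 1) % size, col),
            (row, (col - 1) % size), (row, (col + 1) % size)]

def random_agent(rng):
    """Agent = (color, cooperate_with_same_color, cooperate_with_other_colors)."""
    return (rng.randrange(COLORS), rng.random() < 0.5, rng.random() < 0.5)

def mutate(agent, rng):
    """Copy a parent's traits, changing each one with small probability."""
    color, same, other = agent
    if rng.random() < MUTATION:
        color = rng.randrange(COLORS)
    if rng.random() < MUTATION:
        same = not same
    if rng.random() < MUTATION:
        other = not other
    return (color, same, other)

def step(grid, size, rng):
    # 1. One immigrant with random traits arrives in a random empty cell.
    empty = [(r, c) for r in range(size) for c in range(size) if (r, c) not in grid]
    if empty:
        grid[rng.choice(empty)] = random_agent(rng)
    # 2. Every agent decides whether to help each of its neighbors.
    fertility = {pos: BASE_FERTILITY for pos in grid}
    for pos, (color, same, other) in grid.items():
        for nb in neighbors(pos[0], pos[1], size):
            if nb in grid:
                helps = same if grid[nb][0] == color else other
                if helps:
                    fertility[pos] -= COST
                    fertility[nb] += BENEFIT
    # 3. Local reproduction: offspring go into an adjacent empty cell, if any.
    parents = list(grid)
    rng.shuffle(parents)
    for pos in parents:
        if rng.random() < fertility[pos]:
            free = [nb for nb in neighbors(pos[0], pos[1], size) if nb not in grid]
            if free:
                grid[rng.choice(free)] = mutate(grid[pos], rng)
    # 4. Each agent dies with a fixed probability.
    for pos in list(grid):
        if rng.random() < DEATH:
            del grid[pos]

def strategy_shares(grid):
    """Share of living agents using each of the four strategies."""
    counts = {label: 0 for label in LABELS.values()}
    for _, same, other in grid.values():
        counts[LABELS[(same, other)]] += 1
    total = sum(counts.values())
    return {label: (n / total if total else 0.0) for label, n in counts.items()}

def run(steps=300, size=20, seed=0):
    rng = random.Random(seed)
    grid = {}
    for _ in range(steps):
        step(grid, size, rng)
    return strategy_shares(grid)

if __name__ == "__main__":
    for label, share in run().items():
        print(f"{label:>18}: {share:.2f}")
```

Replacing step 3 with placement in a random empty cell anywhere on the grid is one way to explore the "no local reproduction" variant mentioned above.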

3) One thing the article didn't talk about, but I think is very important, is how much racial perception may have to do with our strategies of categorization in general. There's a rich literature studying categorization, and one of the basic findings is of boundary sharpening and within-category blurring. (Rob Goldstone has been doing lots of interesting work in this area, for instance). Boundary sharpening refers to the tendency, once you've categorized X and Y as different things, to exaggerate their differences: if the categories containing X and Y are defined by differences in size, you would perceive the size difference between X and Y to be greater than it actually is. Within-category blurring refers to the opposite effect: the tendency to minimize the differences of objects within the same category -- so you might see two X's as being closer in size than they really are. This is a sensible strategy, since the more you do it, the better you'll be able to correctly categorize the boundary cases. However, it results in something that looks very much like stereotyping.

Research along these lines is just beginning, and it's too early to go from this observation to conclude that part of the reason for stereotyping is that it emerges from the way we categorize things, but I think it's a possibility. (There also might be an interaction with the cognitive capacity of the learning agent, or its preference for a "simpler" explanation -- the more the agent can't remember subtle distinctions, and the more the agent favors an underlying categorization with few groups or few subtleties between or within groups, the more these effects occur).

All of which doesn't mean, of course, that stereotyping or different in-group/out-group responses are justified or rational in today's situations and contexts. But figuring out why we think this way is a good way to start to understand how not to when we need to.

[*] Axelrod and Hammond's paper can be found here.

Posted by Amy Perfors at 2:32 PM

October 30, 2007

Clay Public Lecture: "Technology-driven statistics"

The Clay Mathematics Institute and the Harvard Mathematics Department are sponsoring a lecture by Terry Speed from the Department of Statistics at Berkeley on "Technology-driven statistics," with a focus on the challenges posed to statistical theory and practice by the massive amounts of data that are generated by modern scientific instruments (microarrays, mass spectrometers, etc.). These issues have not yet been as salient in the social sciences, but they are clearly on the horizon. The talk is at 7PM tonight (Oct. 30) in Science Center B at Harvard. The abstract for the talk is after the jump:

Technology-driven Statistics

Terry Speed, UC Berkeley and WEHI in Melbourne, Australia

Tuesday, October 30, 2007, at 7:00 PM

Harvard University Science Center -- Hall B


Forty years ago, biologists collected data in their notebooks. If they needed help from a statistician in analyzing and interpreting it, they would pass over a piece of paper with numbers on it. The theory on which statistical analyses were built a couple of decades earlier seemed entirely adequate for the task. When computers became widely available, analyses became easier and a little different, with the term "computer intensive" entering the lexicon. Now, in contemporary biology and many other areas, new technologies generate data whose quantity and complexity stretch both our hardware and our theory. Genome sequencing, genechips, mass spectrometers and a host of other technologies are now pushing statistics very hard, especially its theory. Terry Speed will talk about this revolution in data availability, and the revolution we need in the way we theorize about it.

Terry Speed splits his time between the Department of Statistics at the University of California, Berkeley and the Walter & Eliza Hall Institute of Medical Research (WEHI) in Melbourne, Australia. Originally trained in mathematics and statistics, he has had a life-long interest in genetics. After teaching mathematics and statistics in universities in Australia and the United Kingdom, and a spell in Australia's Commonwealth Scientific and Industrial Research Organization, he went to Berkeley 20 years ago. Since that time, his research and teaching interests have concerned the application of statistics to genetics and molecular biology. Within that subfield, eventually to be named bioinformatics, his interests are broad, including biomolecular sequence analysis, the mapping of genes in experimental animals and humans, and functional genomics. He has been particularly involved in the low level analysis of microarray data. Ten years ago he took the WEHI job, and now spends half of his time there, half in Berkeley, and the remaining half in the air somewhere in between.

Posted by Mike Kellermann at 12:08 AM

October 29, 2007

Visualizing Electoral Data

Andy Eggers and I are currently working on a project on UK elections. We have collected a new dataset that covers detailed information on races for the House of Commons between 1950 and 1970; seven general elections overall. We have spent some time thinking about new ways to visualize electoral data and Andy has blogged about this here and here. Today, I'd like to present a new set of plots that we came up with to summarize the closeness of constituency races over time. This is important for our project because we exploit close district races as a source of identification.

Conventional wisdom holds that in Britain, about one-quarter of all seats are 'marginal', i.e. decided by majorities of less than 10 percentage points. To visualize this fact Andy and I came up with the following plot. Constituencies are on the x axis and the elections are on the y axis. Colors indicate the closeness of the district race (i.e. vote majority / vote sum) categorized into different bins as indicated in the color key on top. Color scales are from Colorbrewer. We have ranked the constituencies from close to safe from left to right. Please take a look:
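For readers who want to build a similar plot, the closeness measure and the binning step might be computed along these lines. This is a sketch rather than the code behind the figure: the bin edges, labels and vote counts are hypothetical, since the post only says that closeness is the vote majority divided by the vote sum and that a seat is marginal below 10 percentage points.

```python
from bisect import bisect_right

# Hypothetical bin edges and labels; the color key in the actual plot may differ.
BIN_EDGES = (0.05, 0.10, 0.20, 0.30)
BIN_LABELS = ("under 5%", "5-10%", "10-20%", "20-30%", "30% or more")

def closeness(votes):
    """Winner's majority over the runner-up, divided by total votes cast."""
    ranked = sorted(votes.values(), reverse=True)
    return (ranked[0] - ranked[1]) / sum(ranked)

def closeness_bin(value):
    """Map a closeness value to one of the (assumed) color-key bins."""
    return BIN_LABELS[bisect_right(BIN_EDGES, value)]

def is_marginal(value, threshold=0.10):
    """A seat is 'marginal' if decided by less than 10 percentage points."""
    return value < threshold

if __name__ == "__main__":
    # Made-up vote counts for one constituency in one election.
    race = {"Conservative": 21000, "Labour": 18000, "Liberal": 1000}
    value = closeness(race)
    print(f"closeness {value:.3f}, bin '{closeness_bin(value)}', marginal: {is_marginal(value)}")
```

Applying `closeness` to every constituency in every election gives the matrix of values that the colors encode; sorting constituencies by, say, their average closeness is one possible way to order the x axis from close to safe.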

[Figure: closeness of constituency races by election, constituencies ranked from close to safe (closewide.png)]

The same plot is available as a pdf here. The conventional wisdom seems to hold. About 30 percent of the races are close. Also some elections are closer than others.

A long format of the plot is available here. It allows one to identify individual districts, but requires some scrolling. We are considering developing an interactive version using javascript so that additional info pops up as one mouses over the plot. Notice that both plots exclude the 50 or so districts that changed names as a result of the 1951 redistricting wave.

Finally, Andy and I care about districts that swing between the two major parties. To visualize this we have produced similar plots where the color now indicates the vote share margins as seen by the Conservative party: ((Conservative vote - Labour vote)/vote sum). So negative values indicate a Labour victory and positive values a victory of the Conservative party. We only look at districts where Labour or the Conservative party took first and second place. Here it is:
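The signed margin can be sketched the same way. The function below is an illustration with made-up vote counts, not the code behind the figure; it returns nothing for races where Labour and the Conservatives did not take the top two places, matching the restriction described above.

```python
def conservative_margin(votes):
    """(Conservative - Labour) / total votes, or None if the two
    parties did not finish first and second in this race."""
    top_two = sorted(votes, key=votes.get, reverse=True)[:2]
    if set(top_two) != {"Conservative", "Labour"}:
        return None
    return (votes["Conservative"] - votes["Labour"]) / sum(votes.values())

if __name__ == "__main__":
    # Made-up vote counts for three constituencies.
    races = {
        "Seat A": {"Conservative": 15000, "Labour": 20000, "Liberal": 5000},
        "Seat B": {"Conservative": 22000, "Labour": 16000, "Liberal": 2000},
        "Seat C": {"Conservative": 20000, "Liberal": 15000, "Labour": 5000},
    }
    for seat, votes in races.items():
        print(seat, conservative_margin(votes))
```

Negative values are Labour wins and positive values are Conservative wins, so a diverging color scale centered on zero is a natural choice for this version of the plot.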

[Figure: Conservative vote share margin by constituency and election (conswide.png)]

The partisan swings from election to election are really clear. Finally, the long format is here. The latter plot makes it easy to identify the party strongholds during this time period. Comments and suggestions are highly welcome. We wonder whether anybody has done such plots before or whether we can legitimately dub them Eggmueller plots (lol).


Posted by Jens Hainmueller at 8:13 PM

October 19, 2007

Tim McCarver is a Bayesian with very strong priors....

The Red Sox beat the Indians last night in Game 5 of the ALCS, sending the series back to Fenway and enabling the majority of us at Harvard who are (at least fair-weather) Sox fans to, as Kevin Youkilis said last night, come down off the bridge for a few more days. Why do I bring this up? Well, after Boston's loss in Game 4, a commenter on this blog asked the following question:

In the disastrous inning of the Red Sox game tonight, the announcer (maybe Tim McCarver?) said “One would think that a lead-off walk would lead to more runs than a lead-off home-run, but it’s not true. We’ve researched it and this year a lead-off home-run has led to more multi-run innings than have lead-off walks.”

I must not be "one", b/c I think a lead-off home-run is much more likely to lead to multiple-run innings, b/c after the home-run, you have a run and need only 1 more to have multiple, and the actions after the first batter are mostly independent of the results of the first batter. So, I think he has it totally backwards. I was a fair stats student, so I need confirmation. He was backwards, right?

The short answer is that it was Tim McCarver, and as an empirical matter he was wrong to be surprised. I don't have access to full inning-by-inning statistics over a long period of time, but the most convincing analysis I found in a quick search (here) suggests that between 1974 and 2002, the probability of a multi-run inning conditional on a leadoff walk is .242 and the probability of a multi-run inning after a leadoff home run is .276.


The blogosphere has had a lot of fun at McCarver's expense (not that it takes much to provoke such a reaction, granted): It's Math!, Zero > One, Tim McCarver Does Research, etc. His observation, though, is a good example of Bayesian updating at work: while I doubt that most baseball observers "would think that a lead-off walk would lead to more runs than a lead-off home-run," it is very clear that Tim McCarver thought that at some point. As evidence, in a 2006 game he made the following comment:

"There is nothing that opens up big innings any more than a leadoff walk. Leadoff home runs don't do it. Leadoff singles, maybe. But a leadoff walk. It changes the mindset of a pitcher. Since he walked the first hitter, now all of a sudden he wants to find the fatter part of the plate with the succeeding hitters. And that could make for a big inning."

In 2004, he said during the Yankees-Red Sox ALCS that "a walk is as good as a home run." And back in 2002, he made a similar comment during the playoffs; in fact, it was that comment that prompted the analysis that I linked to above! Clearly, he had a strong prior belief (from where, I don't know) that leadoff walks somehow get in the pitcher's head and produce more big innings. Now that he's been confronted by data, those beliefs are updating, but since his posterior has shifted so much from his prior it's not surprising that he thinks this is some great discovery. In a couple of years, he'll probably think that he always knew a leadoff home run was better.

As for the intuition, it looks like the commenter is also correct. Using the data cited above, the probability of scoring zero runs in an inning is approx. .723, while the probability of scoring no additional runs after a leadoff homer is approx. .724; the rest of the distribution is similar as well.
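The commenter's argument amounts to a one-line calculation: after a leadoff home run the bases are empty with nobody out, the same situation as the start of an inning, so a multi-run inning needs only one more run. The check below uses only the four probabilities quoted in this post; the variable names are mine.

```python
# Probabilities quoted above (1974-2002 data from the linked analysis).
p_multi_given_walk = 0.242    # multi-run inning after a leadoff walk
p_multi_given_homer = 0.276   # multi-run inning after a leadoff home run
p_zero_runs = 0.723           # no runs in an inning, from the start
p_no_more_after_homer = 0.724 # no additional runs after a leadoff homer

# A leadoff homer already puts one run on the board, so the inning is
# multi-run exactly when at least one more run scores.
implied_from_homer_data = 1 - p_no_more_after_homer

# If what follows the homer looks like a fresh inning, the same quantity is
# roughly the chance of scoring at least once in an ordinary inning.
implied_from_fresh_inning = 1 - p_zero_runs

if __name__ == "__main__":
    print(f"reported:                  {p_multi_given_homer:.3f}")
    print(f"1 - P(no more after HR):   {implied_from_homer_data:.3f}")
    print(f"1 - P(zero runs, fresh):   {implied_from_fresh_inning:.3f}")
    print(f"advantage over a walk:     {p_multi_given_homer - p_multi_given_walk:.3f}")
```

The two implied values differ by about a thousandth, which is what "mostly independent of the results of the first batter" would predict.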

Posted by Mike Kellermann at 1:02 PM

October 18, 2007

R Quiz Anybody?

Perl has the Perl quiz, Python has the Python challenges, Ruby has the Ruby quiz, but what about our good old friend R?? Does such a thing exist anywhere? Would be a nice idea I think...

Posted by Jens Hainmueller at 8:52 PM

October 17, 2007

How tall are you? No, really...

Continuing on the topic of self-reported health data, and how to correct for reporting (and other) biases, here is an interesting paper on height and weight in the US. Those two measures have received a lot of interest in the past years, not least as components of the body-mass index (BMI), which is used to estimate the prevalence of obesity. BMI itself is not a great measure (more on that another day) but at least it’s relatively easy to collect via telephone and in-person interviews. Of course some people make mistakes while reporting their own vital measures, and some might do so systematically: a height of 6 foot sounds like a good height to have even to me, and I tend to think in the metric system!

Anyway, the paper by Ezzati et al. examines the issue of systematic misreporting. They note that existing smaller-scale studies on this issue might in fact under-estimate the bias because of their design. People might limit their misreporting if they are measured before or after reporting their vitals, which is a challenge for validation studies. And participation might differ systematically between the interview modes of the validation studies and those of general health surveys (e.g. in-person versus telephone interviews), so that the studies are not directly comparable to population-level surveys.

The idea of the paper is to employ two nationally representative surveys to compare three different kinds of measurement for height and weight, by age group and gender. The first survey is the National Health and Nutrition Examination Survey (NHANES), which collects self-reported information through in-person interviews and measured values through a medical examination. The second survey is the Behavioral Risk Factor Surveillance System (BRFSS), an annual cross-sectional telephone survey that is state-level representative and features widely in policy discussions.

The comparisons between the surveys might confirm your priors on misreporting. On average, women under-report their weight and men under 65 tend to over-report their height. The authors find that state-level obesity measures based on the BRFSS are too low – they re-calculate that a number of states in fact had obesity prevalences above 30% in 2000. Of course this is not a perfectly clean assessment, because the NHANES participants might have anticipated the clinical examination a few weeks after the in-person interview. But at the least this study is a good reminder that people do systematically misreport for some reason, and that analysts should treat self-reported BMI carefully.
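A small worked example shows why modest misreporting matters for obesity estimates. BMI is weight in kilograms divided by the square of height in meters, and adults are conventionally classified as obese at a BMI of 30 or more. The person below is entirely hypothetical; the numbers are not from Ezzati et al.

```python
OBESITY_CUTOFF = 30.0  # conventional adult threshold, in kg/m^2

def bmi(weight_kg, height_m):
    """Body-mass index: weight divided by height squared."""
    return weight_kg / height_m ** 2

def is_obese(weight_kg, height_m):
    return bmi(weight_kg, height_m) >= OBESITY_CUTOFF

if __name__ == "__main__":
    # Hypothetical respondent: measured values versus what they report by phone.
    measured = {"weight_kg": 92.0, "height_m": 1.75}
    reported = {"weight_kg": 90.0, "height_m": 1.78}  # 2 kg lighter, 3 cm taller

    for label, values in (("measured", measured), ("reported", reported)):
        print(f"{label}: BMI {bmi(**values):.1f}, obese: {is_obese(**values)}")
```

Because height enters squared, over-reporting height by a few centimeters moves BMI about as much as under-reporting weight by a couple of kilograms, and respondents near the cutoff drop out of the obese category in the self-reported data.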

Posted by Sebastian Bauhoff at 10:23 PM

October 10, 2007

Visualizing the evolution of open-edited text

Today's applied stats talk by Fernanda Viegas and Martin Wattenberg covered a wide array of interesting data visualization tools that they and their colleagues have been developing over at IBM Research. One of the early efforts that they described is an applet called History Flow, which allows users to visualize the evolution of a text document that was edited by a number of people, such as Wikipedia entries or computer source code. You can track which authors contributed over time, how long certain parts of the text have remained in place, and how text moves from one part of the document to another. To give you a flavor of what is possible, here is a visualization of the history of the Wikipedia page for Gary King (who is the only blog contributor who has one at the moment):

gking1.jpg

This shows how the page became longer over time and that it was primarily written by one author. The applet also allows you to connect textual passages from earlier versions to their authors. We noticed this one from Gary's entry:

gking2.jpg

"Ratherclumsy"'s contribution to the article only survived for 24 minutes, and was deleted by another user with best wishes for becoming "un-screwed". All kidding aside, this is a really interesting tool for text-based projects. Leaving aside the possibility for analysis, this would be useful for people working on coding projects. I can think of more than one R function that I've worked on where it would be nice to know who wrote a particular section of code....
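
For the code case, here is a minimal sketch of the "who wrote this line" idea. It is not how History Flow itself works (I don't know the applet's internals); it just diffs successive revisions with Python's standard difflib module and credits each surviving line to the first revision in which it appeared. The function name, the revision format, and the example authors are all made up for illustration.

```python
import difflib


def attribute_lines(revisions):
    """Return (line, author) pairs for the final revision.

    `revisions` is a list of (author, text) tuples, oldest first.
    Lines that difflib reports as unchanged keep their earlier author;
    inserted or replaced lines are credited to the revising author.
    """
    current = []  # (line, author) pairs for the latest revision seen so far
    for author, text in revisions:
        new_lines = text.splitlines()
        old_lines = [line for line, _ in current]
        matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
        updated = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged block: carry the existing attribution forward.
                updated.extend(current[i1:i2])
            elif tag in ("insert", "replace"):
                # New or rewritten block: credit the current author.
                updated.extend((line, author) for line in new_lines[j1:j2])
            # "delete" blocks contribute nothing to the new revision.
        current = updated
    return current


# Hypothetical three-revision history of a tiny file.
history = [
    ("alice", "a\nb\nc"),
    ("bob", "a\nx\nc"),
    ("carol", "a\nx\nc\nd"),
]
blame = attribute_lines(history)
```

A line-level diff like this cannot follow text that moves from one part of a document to another, which is one of the things the visualization shows.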

Posted by Mike Kellermann at 5:52 PM

October 8, 2007

Fernanda Viegas and Martin Wattenberg on Data Visualization

Dear Applied Statistics Community,

Please join us for this week's installment of the Applied Statistics workshop, where Fernanda Viegas and Martin Wattenberg will be presenting their talk entitled "From Wikipedia to Visualization and Back". The authors provided the following abstract for their talk:

This talk will be a tour of our recent visualization work, starting with a case study of how a new data visualization technique uncovered dramatic dynamics in Wikipedia. The technique sheds light on the mix of dedication, vandalism, and obsession that underlies the online encyclopedia. We discuss the reaction of the Wikipedia community to this visualization, and how it led to a recent ambitious project to make data visualization technology available to everyone. This project, Many Eyes, is a web site where people may upload their own data, create interactive visualizations, and carry on conversations. The goal is to foster a social style of data analysis in which visualizations serve not only as a discovery tool for individuals but also as a means to spur discussion and collaboration.

Martin and Fernanda have also provided the following set of links as background for the presentation:

http://alumni.media.mit.edu/~fviegas/papers/history_flow.pdf

http://www.research.ibm.com/visual/papers/viegasinfovis07.pdf


And to a website based upon recent work in data visualization

Link to Many Eyes site:
www.many-eyes.com


As always, the workshop meets at 12 noon on Wednesday, in room N-354 CGIS-Knafel. A light lunch will be provided.

Posted by Justin Grimmer at 12:02 PM

October 4, 2007

Another way of thinking about probability?

Amy Perfors

On Tuesday I went to a talk by Terrence Fine from Cornell University. It was one of those talks that's worth going to, if nothing else because it makes you re-visit and re-question the sort of basic assumptions that are so easy to not even notice that you're making. In this case, that basic assumption was that the mathematics of probability theory, which views probability as a real number between 0 and 1, is equally applicable to any domain where we want to reason about statistics.

Is this a sensible assumption?

As I understand it, Fine made the point that in many applied fields, what you do is start from the phenomenon to be modeled and then use the mathematical/modeling framework that is appropriate to it. In other words, you go from the applied "meaning" to the framework: e.g., if you're modeling dynamical systems, then you decide to use differential equations. What's odd in applications of probability theory, he said, is that you basically go from the mathematical theory to the meaning: we interpret the same underlying math as having different potential meanings, depending on the application and the domain.

He discussed four different applications, which are typically interpreted in different ways: physically-determined probability (e.g., statistical mechanics or quantum mechanics); frequentist probability (i.e., more data driven); subjective probability (in which probability is interpreted as degree of belief); and epistemic/logical (in which probability is used to characterize inductive reasoning in a formal language). Though I broadly agree with these distinctions, I confess I'm not getting the exact subtleties he must be making: for instance, it seems to me the interpretation of probability in statistical mechanics is arguably very different from in quantum mechanics and they should therefore not be lumped together: in statistical mechanics, the statistics of flow arise from some underlying variables (i.e., the movements of individual particles), and in quantum mechanics, as I understand it, there aren't any "hidden variables" determining the probabilities at all.

But that technicality aside, the main point he made is that depending on the interpretation of probability and the application we are using it for, our standard mathematical framework -- in which we reason about probabilities using real numbers between 0 and 1 -- may be inappropriate because it is either more or less expressive than necessary. For instance, in the domain of (say) IQ, numerical probability is probably too expressive -- it is not sensible or meaningful to divide IQs by each other; all we really want is an ordering (and maybe even a partial ordering, if, as seems likely, the precision of an IQ test is low enough that small distinctions aren't meaningful[1]). So a mathematics of probability which views it in that way, Fine argues, would be more appropriate than the standard "numerical" view.
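
To make the "partial ordering" idea concrete, here is a small sketch of my own (not anything Fine presented): two scores are ordered only when they differ by more than the measurement precision, and are otherwise left incomparable. The function name and the tolerance of 5 points are assumptions chosen purely for illustration.

```python
def compare_scores(x, y, tolerance):
    """Compare two scores under a precision threshold.

    Returns ">" or "<" when the scores differ by more than `tolerance`,
    and None when the difference is too small to be meaningful.
    """
    if x - y > tolerance:
        return ">"
    if y - x > tolerance:
        return "<"
    return None  # incomparable: the test cannot tell these apart


# Hypothetical precision of 5 points.
TOLERANCE = 5
low_vs_mid = compare_scores(100, 103, TOLERANCE)
mid_vs_high = compare_scores(103, 106, TOLERANCE)
low_vs_high = compare_scores(100, 106, TOLERANCE)
```

Here 100 and 103 are incomparable, as are 103 and 106, yet 100 is below 106. "Cannot be distinguished" is not transitive, which is exactly what a single real-valued scale cannot express.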

Another example would be in quantum mechanics, where we actually observe a violation of some axioms of probability. For instance, the distributivity of union and intersection fails: P(A or B) != P(A)+P(B)-P(A and B). This is an obvious place where one would want to use a different mathematical framework, but since (as far as I know) people in quantum mechanics actually do use such a framework, I'm not sure what his point was. Other than it's a good example of the overall moral, I guess?
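
For reference, the classical rule at stake is easy to check by brute force on a finite sample space. The sketch below is only the classical side of the comparison, using a fair die and two events I picked for illustration; it says nothing about how the quantum case is actually modeled.

```python
from fractions import Fraction


def prob(event, sample_space):
    """Probability of `event` when all outcomes in `sample_space` are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


# A fair six-sided die with two overlapping events (illustrative choices).
die = frozenset(range(1, 7))
even = frozenset({2, 4, 6})
high = frozenset({4, 5, 6})

# Classical rule: P(A or B) = P(A) + P(B) - P(A and B).
lhs = prob(even | high, die)
rhs = prob(even, die) + prob(high, die) - prob(even & high, die)
```

In any finite sample space with set-valued events the two sides agree exactly, because each outcome is counted once on both sides.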

Anyway, the talk was interesting and thought-provoking, and I think it's a good idea to keep this point in the back of one's mind. That said, although I can see why he's arguing that different underlying mathematics might be more appropriate in some cases, I'm not convinced yet that we can conclude that using a different underlying mathematics (in the case of IQ, say) would therefore lead to new insight or help us avoid misconceptions. One of the reasons numerical probability is used so widely -- in addition to whatever historical entrenchment there is -- is that it is an indispensable tool for doing inference, reasoning about distributions, etc. It seems like replacing it with a different sort of underlying math might result in losing some of these tools (or, at the very least, require us to spend decades re-inventing new ones).

Of course, other mathematical approaches might be worth it, but at this point I don't know how well-worked out they are, and -- speaking as someone interested in the applications -- I don't know if they'd be worth the work in order to see. (They might be; I just don't know... and, of course, a pure mathematician wouldn't care about this concern, which is all to the good). Fine gave a quick sketch of some of these alternative approaches, and I got the sense that he was working on developing them but they weren't that well developed yet -- but I could be totally wrong. If anyone knows any better, or knows of good references on this sort of thing, please let us know in comments. I couldn't find anything obvious on his web page.

[1] I really really do not want to get into a debate about whether and to what extent IQ in general is meaningful - that question is really tangential to the point of this post, and I use IQ as illustration only. (I use it rather than something perhaps less inflammatory because it's the example Fine used).

Posted by Amy Perfors at 12:40 PM

June 20, 2007

SPM Career Achievement Award

The Society for Political Methodology has announced the winner of its inaugural Career Achievement Award. The first recipient will be Chris Achen, currently the Roger Williams Straus Professor of Social Sciences at Princeton University. The award will be presented at the APSA meeting this summer at the society's business meeting. Chris was chosen to receive the award by a committee consisting of Simon Jackman, Mike Alvarez, Liz Gerber and Marco Steenbergen, and their citation does a fine job of summarizing his many accomplishments over the years.

On a personal note, Chris was my senior thesis advisor back in 00-01 when he was at Michigan. That came about through a bit of luck; I had never taken a class from him, and one of the other professors at Michigan asked him to meet with me as a favor. Despite this, he was unfailingly generous with both support and constructive criticism. At least at the time, Chris had the habit of working rather late in the evenings. When I was working on my thesis, I'd often send him an e-mail asking a few questions when I left the computer lab at night, and by the time I got home there would be an answer in my inbox pointing out what I had missed or suggesting some new approach to try. If Chris hadn't taken me on as an advisee back then, I probably would not be in graduate school today.

The citation follows on the jump:

Christopher H. Achen is the inaugural recipient of the Career Achievement Award of the Society for Political Methodology. Achen is the Roger Williams Straus Professor of Social Sciences in the Woodrow Wilson School of Public and International Affairs, and Professor of Politics in the Department of Politics, at Princeton University. He was a founding member and first president of the Society for Political Methodology, and has held faculty appointments at the University of Michigan, the University of California, Berkeley, the University of Chicago, the University of Rochester, and Yale University. He has a Ph.D. from Yale, and was an undergraduate at Berkeley.

In the words of one of the many colleagues writing to nominate Achen for this award, "Chris more or less made the field of political methodology". In a series of articles and books now spanning some thirty years, Achen has consistently reminded us of the intimate connection between methodological rigor and substantive insights in political science. To summarize (and again, borrowing from another colleague's letter of nomination), Achen's methodological contributions are "invariably practical, invariably forceful, and invariably presented with clarity and liveliness". In a series of papers in the 1970s, Chris basically showed us how to do political methodology, elegantly demonstrating how methodological insights are indispensable to understanding a phenomenon as central to political science as representation. Achen's "little green Sage book", Interpreting and Using Regression (1982) has remained in print for 25 years, and has provided generations of social scientists with a compact yet rigorous introduction to the linear regression model (the workhorse of quantitative social science), and is probably the most widely read methodological book authored by a political methodologist. Achen's 1983 review essay "Towards Theories of Data: The State of Political Methodology" set an agenda for the field that still powerfully shapes both the practice of political methodology and the field's self-conception. Achen's 1986 book The Statistical Analysis of Quasi-Experiments provides a brilliant exposition of the statistical problems stemming from non-random assignment to "treatment", a topic very much in vogue again today. Achen's 1995 book with Phil Shively, Cross-Level Inference, provides a similarly clear and wise exposition of the issues arising when aggregated data are used to make inferences about individual behavior ("ecological inference").
A series of papers on party identification --- an influential 1989 conference paper, "Social Psychology, Demographic Variables, and Linear Regression: Breaking the Iron Triangle in Voting Research" (Political Behavior, 1992) and "Parental Socialization and Rational Party Identification" (Political Behavior, 2002) --- have helped formalize the "revisionist" theory of party identification outlined by Fiorina in his 1981 Retrospective Voting book, and now the subject of a lively debate among scholars of American politics.

In addition to being a productive and extremely influential scholar, Achen has an especially distinguished record in training graduate students in methodology, American politics, comparative politics, and international relations. His students at Berkeley in the late 1970s and early 1980s included Larry Bartels (now at Princeton), Barbara Geddes (UCLA), Steven Rosenstone (Minnesota), and John Zaller (UCLA), among many others. His students at Michigan in the 1990s include Bear Braumoeller (now at Harvard), Ken Goldstein (Wisconsin), Simon Hug (Texas-Austin), Anne Sartori (Princeton), and Karen Long Jusko (Stanford). In addition to being the founding president of the Society for Political Methodology, Chris has been a fellow at the Center for Advanced Study in the Behavioral Sciences, has served as a member of the APSA Council, has won campus-wide awards for both research and teaching, and is a member of the American Academy of Arts and Sciences.

Posted by Mike Kellermann at 11:23 PM

June 13, 2007

Statistics and the Death Penalty

A few days ago, the AP moved a story reporting on academic studies of the deterrent effect of the death penalty on potential murderers. Many media outlets picked up the story under headlines such as "Studies say death penalty deters crime", "Death penalty works: studies", and my favorite, "Do more executions mean fewer murders?" Presumably the answer to the last question is yes, at least in the limit; if the state were to execute everyone (except the executioner, of course), clearly there would be fewer murderers.

I was surprised when I read the article on Monday morning, since my sense of the state of play in this area is that it is probably impossible to tell one way or the other. Those are the findings of a recent study by Donohue and Wolfers, which finds most existing studies to be flawed and, more importantly, points out a variety of reasons why estimating the correct deterrent effect is difficult in principle. Here is some of what Andrew Gelman had to say about their study last year:

My first comment is that death-penalty deterrence is a difficult topic to study. The treatment is observational, the data and the effect itself are aggregate, and changes in death-penalty policies are associated with other policy changes.... Much of the discussion of the deterrence studies reminds me of a little-known statistical principle, which is that statisticians (or, more generally, data analysts) look best when they are studying large, clear effects. This is a messy problem, and nobody is going to come out of it looking so great.

My second comment is that a quick analysis of the data, at least since 1960, will find that homicide rates went up when the death penalty went away, and then homicide rates declined when the death penalty was re-instituted (see Figure 1 of the Donohue and Wolfers paper), and similar patterns have happened within states. So it's not a surprise that regression analyses have found a deterrent effect. But, as noted, the difficulties arise because of the observational nature of the treatment, and the fact that other policies are changed along with the death penalty. There are also various technical issues that arise, which Donohue and Wolfers discussed.

Given the tone of the article (and certainly the headlines), you would have thought that the Donohue and Wolfers paper had been overlooked by the reporter, but no: he cites it in the article, and he interviewed Justin Wolfers! He seems to have missed the point, however; the issue is not that some studies say that "there is a deterrent effect" and some say "we're just not sure yet". The problem is that we aren't sure, and we probably never will be unless someone gets to randomly assign death penalty policy to states or countries. This raises a problem that we often face in social science: there are questions that are interesting, and there are questions that we can answer, and the intersection of those two categories is probably a lot smaller than any of us would like. This doesn't seem to be a realization that has crept into the media as of yet, so it is no surprise that studies that purport to give answers to interesting questions will get more coverage than those pointing out why those answers probably don't mean very much.

Posted by Mike Kellermann at 4:19 PM

June 7, 2007

Gosnell Prize Winner

Congratulations to the 2007 Gosnell Prize winners - Harvard's very own Alberto Abadie, Alexis Diamond, and Jens Hainmueller! They won for their paper "Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program", which was presented at this year's MPSA conference in Chicago. We saw an earlier version of the paper this past semester at the Applied Stats workshop, and I have to say, the award is well deserved. The Gosnell Prize is awarded to the best paper presented at any political science conference in the preceding year. Alexis is a two-time recipient, having shared the award with Jas Sekhon in 2005 for their paper on genetic matching.

Posted by Mike Kellermann at 3:51 PM

June 5, 2007

Stata 10 announced

Yesterday, StataCorp announced that Stata 10 will be available from June 25. Apart from a bunch of new routines, a main attraction will be their new graph editor, which might well resolve major nightmares for users. Also it appears that there is now a way to copy & paste results to other applications without losing the formatting. Overall the new version looks great, if you're so inclined.

Here is the announcement sent out on Statalist yesterday, and a longer description on the StataCorp website.

Posted by Sebastian Bauhoff at 8:46 AM

May 23, 2007

Disclosing clinical trials

The New York Times has an article today ("For Drug Makers, a Downside to Full Disclosure") discussing the recent creation of archives for pharmaceutical clinical trial data, including data from trials that did not result in publications. This effort is an attempt to deal with the age-old problem of publication bias, a problem supposedly identified by the ancient Greeks, as described in a letter to the editor of Lancet by Mark Petticrew:

The writings of Francis Bacon (1561-1626) are a good starting point. In his 1605 book, The Advancement of Learning, he alludes to this particular bias by pointing out that it is human nature for "the affirmative or active to effect more than the negative or privative. So that a few times hitting, or presence, countervails oft-times failing or absence". This is a clear description of the human tendency to ignore negative results, and Bacon would be an acceptable father figure. Bacon, however, goes further and supports his claim with a story about Diagoras the Atheist of Melos, the fifth century Greek poet.

Diagoras was the original atheist and free thinker. He mocked the Eleusinian mysteries, an autumnal fertility festival which involved psychogenic drug-taking, and was outlawed from Athens for hurling the wooden statue of a god into a fire and sarcastically urging it to perform a miracle to save itself. In the context of publication bias, his contribution is shown in a story of his visit to a votive temple on the Aegean island of Samothrace. Those who escaped from shipwrecks or were saved from drowning at sea would display portraits of themselves here in thanks to the great sea god Neptune. "Surely", Diagoras was challenged by a believer, "these portraits are proof that the gods really do intervene in human affairs?" Diagoras' reply cements his claim to be the "father of publication bias": "yea, but . . . where are they painted that are drowned?"



While dealing with publication bias would seem to be a good thing, the Times article suggests (perhaps in an attempt to avoid publication bias itself) that some people are worried about this practice:

Some experts also believe that releasing the results of hundreds of studies involving drugs or medical devices might create confusion and anxiety for patients who are typically not well prepared to understand the studies or to put them in context.

“I would be very concerned about wholesale posting of thousands of clinical trials leading to mass confusion,” said Dr. Steven Galson, the director for the Center for Drug Evaluation and Research at the F.D.A.

It is a little hard for me to believe that this confusion would be worse than the litany of possible side effects given at the end of every pharmaceutical commercial, but that is a different issue. From a purely statistical point of view, it seems like this is a no-brainer, a natural extension of efforts to ensure that published results can be replicated. Whether you are a frequentist or a Bayesian, inferences should be better when conditioned on all of the data that has been collected, not just the data that researchers decided to use in their publications. There could be a reasonable argument about what to do with (and how to define) corrupted data - data from trials that blew up in one way or another - but this seems like a second-order consideration.
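
A quick simulation makes the statistical point concrete. This is a toy sketch under assumptions I made up for illustration (a small true effect of 0.1 standard deviations, 50 subjects per arm, and a rule that only trials with a significantly positive estimate get published); it is not based on any real trial data.

```python
import random
import statistics


def simulate_trials(n_trials, n_per_arm, true_effect, seed=0):
    """Simulate two-arm trials; return a list of (estimate, std_error) pairs."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    results = []
    for _ in range(n_trials):
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_arm)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        estimate = statistics.fmean(treated) - statistics.fmean(control)
        std_error = (
            statistics.variance(treated) / n_per_arm
            + statistics.variance(control) / n_per_arm
        ) ** 0.5
        results.append((estimate, std_error))
    return results


def mean_estimate(results):
    """Average effect estimate across a collection of trials."""
    return statistics.fmean([estimate for estimate, _ in results])


# Hypothetical setup: 2000 trials of a drug with a small true effect.
TRUE_EFFECT = 0.1
all_trials = simulate_trials(n_trials=2000, n_per_arm=50, true_effect=TRUE_EFFECT)

# Assumed publication rule: only significantly positive results are published.
published = [(est, se) for est, se in all_trials if est / se > 1.96]

mean_all = mean_estimate(all_trials)
mean_published = mean_estimate(published)
```

Averaging over the full archive recovers something close to the true effect, while averaging over the "published" subset overstates it several times over, since a trial has to produce a large estimate to clear the significance bar in the first place.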

It would be great if we could extend this effort into the social sciences. It would be easier to do this for experimental work since the data collection process is generally well defined. On the other hand, I suspect that there is less of a need for archives of experimental data in the social sciences, for two reasons. First, experimental work is still rare enough (at least in political science) that I think you have a decent chance of getting published even with "non-results". Second, my sense is that, with the possible exception of researchers closely associated with particular policy interventions, the incentives facing social scientists are not the same as those facing pharmaceutical researchers. Social scientists may have a preference for "significant" results, but in most cases they don't care as much about the direction.

The kind of data archive described above would be more useful for observational research, but much harder to define. Most social scientists have invested significant time and energy collecting observational data only to find that there are no results that reviewers would think were worth publishing. On the other hand, how do we define a trial for observational data? Should there be an obligation to make one's data available any time that it is collected, or should it be restricted to data that has been analyzed and found uninteresting? Or should we think of data and models together, and ask researchers to share both their data and their analysis? I'm not sure what the answer is, but it is something that we need to think about as a discipline.


Posted by Mike Kellermann at 7:18 PM

May 22, 2007

Statistics and the law

Over at the Volokh Conspiracy, Professor Einer Elhauge from Harvard Law School has a post about the future of empirical legal studies, comparing the law today to baseball before the rise of sabermetrics. From the post:

In short, in law, we are currently still largely in the position of the baseball scouts lampooned so effectively in Moneyball for their reliance on traditional beliefs that had no empirical foundation. But all this is changing. At Harvard Law School, as traditional a place as you can get, we now have by my count 10 professors who have done significant statistical analysis of legal issues. We just hired our first JD with a PhD in statistics. The movement is not at all limited to Harvard, and seems to be growing at all law schools.

So we are hardly devoid of empirical analysis of law. We are just, rather, in our early Bill James era, and can expect the analysis to get more sophisticated and systematic as things progress. I expect within a couple of decades we will have our own book distilling the highlights of things we will know then that conflict with what is now conventional legal wisdom.

We are all pretty pleased that Harvard Law now has a stats Ph.D. on faculty. But one of the commenters raises an interesting question; if empirical legal studies are like sabermetrics, who is the legal equivalent of Joe Morgan?

Posted by Mike Kellermann at 8:49 AM

May 10, 2007

Surveying Multiethnic America

The Program on Survey Research at Harvard is hosting an afternoon conference tomorrow on the challenges of surveying multiethnic populations:

Surveying Multiethnic America

May 11, 2007
12:30 – 5:00

Institute for Quantitative Social Science
CGIS N-050
1737 Cambridge St.
Cambridge, MA 02138

Across a variety of different academic disciplines, scholars are interested in topics related to multiethnic populations, and sample surveys are one of the primary means of studying these populations. Surveys of multiethnic populations face a number of distinctive methodological challenges, including issues related to defining and measuring ethnic identity, and locating, sampling, and communicating with the groups of interest.

This afternoon panel sponsored by the Program on Survey Research at Harvard University will look at recent survey research projects on multiethnic populations in the US. Researchers will discuss how they confronted the unique methodological challenges in their survey projects and will consider the implications of their approach for their key theoretical and empirical findings.

Schedule:

12:30 - 2:45

Sunshine Hillygus, Harvard University, Introduction

Manuel de la Puente, US Bureau of the Census, Current Issues in Multiethnic Survey Methods

Guillermina Jasso, New York University, New Immigrant Study

Deborah Schildkraut, Tufts University, The 21st Century Americanism Study

Yoshiko Herrera, Harvard University, Discussant


3:00 - 5:00

Tami Buhr, Harvard University, Harvard Multi-Ethnic Health Survey

Ronald Brown, Wayne State University, National Ethnic Pluralism Survey

Valerie Martinez-Ebers, Texas Christian University, National Latino Politics Survey

Kim Williams, Harvard University, Discussant

Posted by Mike Kellermann at 12:05 PM

May 9, 2007

What's your optimal GPA?

Amy Perfors

This may not be new to anybody but me, but recent news at UNC brought the so-called "Achievement Index" to my attention. The Achievement Index is a way of calculating GPA that takes into account not only how well one performs in a class, but also how hard the class is relative to others in the institution. It was first suggested by Valen Johnson, a professor of statistics at Duke University, in a paper in Statistical Science titled "An Alternative to Traditional GPA for Evaluating Student Performance." (The paper is available on his website; you can also find a more accessible pdf description here).

This seems like a great idea to me. The model, which is Bayesian, calculates "achievement index" scores for each student as latent variables that best explain the grade cutoffs for each class in the university. As a result, it captures several phenomena: (a) if a class is hard and full of very good students, then a high grade is more indicative of ability (and a low grade less indicative of lack of ability); (b) if a class is easy and full of poor students, then a high grade doesn't mean much; (c) if a certain instructor always gives As then the grade isn't that meaningful -- though it's more meaningful if the only people who take the class in the first place are the extremely bright, hard-working students. Your "achievement index" score thus reflects your actual grades as well as the difficulty level of the classes you have chosen.
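
To get a feel for why adjusting for class difficulty matters, here is a deliberately simplified sketch. It is NOT Johnson's model, which is a Bayesian model of grade cutoffs; instead it assumes each grade is roughly a student's ability plus a course's easiness, and fits both by alternating averages. The function names, students, courses, and grades are all invented for illustration.

```python
from statistics import fmean


def raw_gpa(grades, student):
    """Ordinary GPA: the plain average of a student's grades."""
    return fmean([g for (s, _), g in grades.items() if s == student])


def fit_additive_index(grades, n_iter=50):
    """Fit grade ~ ability[student] + easiness[course] by alternating averages.

    `grades` maps (student, course) to grade points. This additive
    stand-in is an assumption of the sketch, not the Achievement Index.
    """
    students = {s for s, _ in grades}
    courses = {c for _, c in grades}
    ability = {s: 0.0 for s in students}
    easiness = {c: 0.0 for c in courses}
    for _ in range(n_iter):
        # Ability: a student's average grade after removing course easiness.
        for s in students:
            ability[s] = fmean(
                [g - easiness[c] for (s2, c), g in grades.items() if s2 == s]
            )
        # Easiness: a course's average grade after removing student ability.
        for c in courses:
            easiness[c] = fmean(
                [g - ability[s] for (s, c2), g in grades.items() if c2 == c]
            )
    return ability, easiness


# Hypothetical transcript data: everyone takes "common", plus one elective.
grades = {
    ("ann", "hard"): 3.0, ("ann", "common"): 4.0,
    ("bob", "easy"): 4.0, ("bob", "common"): 3.0,
    ("cat", "hard"): 2.0, ("cat", "common"): 3.0,
    ("dan", "easy"): 3.0, ("dan", "common"): 2.0,
}
ability, easiness = fit_additive_index(grades)
```

In this made-up data Ann and Bob have identical raw GPAs, but Ann earned hers partly in the hard class and Bob partly in the easy one, so the adjusted score ranks Ann higher. Only differences between the fitted scores are meaningful here, since adding a constant to every ability and subtracting it from every easiness fits the grades equally well.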

Why isn't this a standard measure of student performance? 10 years ago it was proposed at Duke but failed to pass, and at UNC they are currently debating it -- but what about other universities? The Achievement Index addresses multiple problems. There would be less pressure toward grade inflation, for one thing. For another, it would address the unfortunate tendency of students to avoid "hard" classes for fear of hurting their GPA. Students in hard majors or taking hard classes also wouldn't be penalized in university-wide, GPA-based awards.

One might argue that students shouldn't avoid hard classes simply because of their potential grade, and I tend to agree that they shouldn't -- it was a glorious moment in my own college career when I finally decided "to heck with it" and decided to take the classes that interested me, even if they seemed really hard. But it's not necessarily irrational for a student to care about GPA, especially if important things -- many of which I didn't have to worry about -- hinge on it: things like scholarships or admission to medical school. Similarly, instructors shouldn't inflate grades and create easy classes, but it is often strictly "rational" to do so: giving higher grades can often mean better evaluations and less stress due to students whinging for a higher grade, and easier classes are also easier to teach. Why not try to create a system where the rational thing to do within that system is also the one that's beneficial for the university and the student in the long run? It seems like the only ones who benefit from the current system are the teachers who inflate their grades and teach "gimme" courses and the students who take those easy courses. The ones who pay are the teachers who really seek to challenge and teach their students, and the students who want to learn, who are intellectually curious and daring enough to take courses that challenge them. Shouldn't the incentive structure be the opposite?

I found a petition against the Achievement Index online, and I'm not very persuaded by their arguments. One problem they have is that it's not transparent how it works, which I could possibly see being a concern... but there are two kinds of transparency, and I think only one really matters. If it's not transparent because it's biased or subjective, then that's bad; but if it's not transparent simply because it's complicated (as this is), but is in fact totally objective and how it works is published - then, well, it's much less problematic. Sometimes complicated is better: and other things that matter a great deal for our academic success -- such as SATs and GREs -- aren't all that transparent either, and they are still very valuable. The petition also argues that using the AI system will make students more competitive with each other, but I confess I don't understand this argument at all: how will it increase competition above and beyond the standard GPA?

Anyway, it might seem like I'm being fairly dogmatic about the greatness of the Achievement Index, but I don't intend to be. I have no particular bone to pick, and I got interested in this issue originally mainly just because I wanted to understand the model. It's simply that I don't really see any true disadvantages and I wonder what I'm missing. Why don't more universities try to implement it? Can anyone enlighten me?

Posted by Amy Perfors at 10:20 AM

May 8, 2007

Data for Replications

We have blogged a fair bit about reproducibility standards and data-sharing for replication (see here and here). Some journals have required authors to make datasets and code available for a while now, and these policies are starting to show effects. For example, the American Economic Review has required authors to submit their data since 2004, and this information is now available on their website. The AER provides a basic readme document and files with the variables used for an increasing number of articles since late 2002; some authors also provide their program code. There's a list of articles with available data here.

The 2006 Report of the Editor suggests that most authors now comply with the data posting requirements and that only a few exceptions are made. At this point AER is pretty much alone among the top economics journals in offering this information. I wonder if authors substitute between the AER and other journals. Since the AER is still a very desirable place to publish, maybe this improves the quality of AER submissions if only confident authors submit? At least for now the submission statistics in the editor’s report don't suggest that they are losing authors. Meanwhile hundreds of grad students can rejoice in a wealth of interesting papers to replicate.

Posted by Sebastian Bauhoff at 11:33 AM

May 7, 2007

No Applied Stats Workshop until September

Just as a reminder, the Applied Statistics Workshop has wrapped up for this academic year. Thanks to all who came to the talks, and we look forward to seeing you again in September.

Posted by Mike Kellermann at 1:38 PM

May 2, 2007

Is There a Statistics/Economics Divide?

OK, so now that I have a job, I feel like I can stick my foot in something smelly to see what happens. When I was on the market this past year, I was often asked about the difference (lawyers are always careful to ask about "the difference, if any") between a degree in statistics and a degree in something more "traditional" for a law scholar, such as economics or political science or sociology. Because of the prevalence and power of the Law & Economics movement in legal scholarship, there was particular interest in the difference between statistics and economics/econometrics. I had a certain amount of trouble answering the question. It was easy to point out that the best quantitative empiricists move within all fields and are able to read all literatures. As an aspiring statistician, it was also easy to give the statistical version of things, which is that statisticians invent data analysis techniques and methods that, after ten to twenty-five to forty years, filter into or are reinvented by other fields (whenever I said this, I clarified that this story was a caricature).

So what is the difference between an empirical, data-centered economist and an applied statistician? The stereotypes I've internalized from hanging out in an East Coast statistics department are that economists tend to focus more on parameter estimation, asymptotics, unbiasedness, and paper-and-pencil solutions to problems (which can then be implemented via canned software like STATA), whereas applied statisticians lean more towards imputation and predictive inference, Bayesian thinking, and computational solutions to problems (which require programming in packages such as R). Anyone care to disabuse me of these notions?

Posted by James Greiner at 12:07 PM

May 1, 2007

Racial bias in basketball?

The New York Times has an article discussing a working paper by Justin Wolfers and Joseph Price, looking at the rate at which white referees call fouls on black players (and black referees call fouls on white players). The paper can be found here. I haven't had a chance to read it yet, but if it uses "multivariable regression analysis" as it says in the Times article, then I'm sure it must be good.

Posted by Mike Kellermann at 11:21 PM

April 18, 2007

Appellate Cases and SUTVA Violations

Around a month ago, I blogged about the dangers of using appellate case outcomes as datapoints. The basic idea is that most models or inference structures assume some kind of independence among the units, perhaps independence given covariates (in which case the residuals are assumed to be i.i.d.), or perhaps the "Stable Unit Treatment Value Assumption" in the causal inference context. When applied to appellate cases in the United States legal system, these analyses assume away precedent. The instincts I developed as a practicing litigator tell me not to believe a study that assumes away precedent.

One solution to this problem previously proposed in the causal inference literature is to match "treated" and "control" appellate cases that are very close in time to each other (whatever "treated" and "control" are here). After a conversation I had with Mike Kellermann a week or so ago, I think this cure may be worse than the disease. The idea behind comparing cases very close in time to one another is that the general state of the law (in part defined by precedent) for the two cases will be similar. That's right, but recent developments in the law are more on the minds of judges.

Suppose Case A got treatment, and Case B got control. If the matching algorithm has worked, Case A and Case B will be similar in all ways except the treatment. If Case A and Case B are also close in time to one another, how plausible is it that the judges who decide both will decide them without regard to each other?

Posted by James Greiner at 4:48 PM

April 11, 2007

Why I wish TV news was really boring

Amy Perfors

I've posted before about the various ways that the mass media of today interacts badly with cognitive heuristics people use, in such a way as to create apparently irrational behavior. Spending a fair amount of time recently standing in long security lines at airports crystallized another one for me.

The availability heuristic describes people's tendency to judge that events that are really emotionally salient or memorable are more probable than events that aren't, even if the ones that aren't are actually statistically more likely. One classic place you see this is in estimates of risk of dying in a terrorist attack: even though the odds are exceedingly low of dying this way (if you live in most countries, at least), we tend to spend far more resources, proportionally, fighting terror than in dealing with more prosaic dangers like automobile accidents or poverty. There might be other valid reasons to spend disproportionately -- e.g., terrorism is part of a web of other foreign-policy issues that we need to focus on for more long-term benefits; or people don't want to sacrifice the freedoms that would be necessary (like more restrictive speed limits) to make cars safer; or it's not very clear how to solve some problems (like poverty) -- and I really don't want to get into those debates -- the point is just that I think most everyone would agree that in all of those cases, at least part of the reason for the disproportionate attention is because dying in a terrorist attack is much more vivid and sensational than dying an early death because of the accumulated woes of living in poverty. And there's plenty of actual research showing that the availability heuristic plays a role in many aspects of prediction.

There's been a lot of debate about whether this heuristic is necessarily irrational. Evolutionarily speaking, it might make a lot of sense to pay more attention to the more salient information. To steal an example from Gerd Gigerenzer, if you live on the banks of a river and for 1000 days there have been no crocodile sightings there, but yesterday there was, you'd be well-advised to disregard the "overall statistics" and keep your kids from playing near the river today. It's a bit of a just-so story, but a sensible one, from which we might infer two possible morals: (a) as Steven Pinker pointed out, since events have causal structure, it might make sense to pay more attention to more recent ones (which tend to be more salient); and (b) it also might make sense to pay more attention to emotionally vivid ones, which give a good indication of the "costs" of being wrong.

However, I think the problem is that when we're talking about information that comes from mass media, neither of these reasons applies as well. Why? Well, if your information doesn't come from mass media, to a good approximation you can assume that the events are statistically representative of the events that you might be likely to encounter. If you get your information from mass media, you cannot assume this. Mass media reports on events from all over the world in such a way that they can have the same vividness and impact as if they were in the next town over. And while it might be rational to worry a lot about crime if you consistently have shootings in your neighborhood, it doesn't make as much sense to worry about it if there are multiple shootings in cities hundreds of miles away. Similarly, because mass media reports on news - i.e., statistically rare occurrences - it is easy to get the dual impression that (a) rare events are less rare than they actually are; and (b) that there is a "recent trend" that needs to be paid attention to.

In other words, while it might be rational to keep your kids in if there were crocodile attacks at the nearby river yesterday, it's pretty irrational to keep them in if there were attacks at the river a hundred miles away. Our "thinking" brains know this, but if we see those attacks as rapidly and as vividly as if they were right here -- i.e., if we watch them on the nightly news -- then it's very hard to listen to the thinking brain... even if you know about the dangers. And cable TV news, with its constant repetition, makes this even harder.

The source of the problem is the sampling structure of mass media, but it's of course far worse if the medium makes the message more emotional and vivid. So there's probably much less of a problem if you get most of your news from written sources -- especially multiple different ones -- than from TV news. That's what I would guess, at least, though I don't know if anyone has actually done the research.

Posted by Amy Perfors at 3:11 PM

April 10, 2007

What determines which statistical software you use?

I was recently involved in a discussion among fellow grad students about what determines which statistical software package people use to analyze their data. For example, this recent market survey lists 44 products selected from 31 vendors and they do not even include packages like R that many people around Harvard seem to use. Another survey conducted by Alan Zaslavsky lists 15 packages while 'just' looking at the available software for the analysis of surveys with complex sample designs. So how do people pick their packages given the plethora of options? Obviously, many factors will go into this decision (departmental teaching, ease of use, type of methods used, etc.). One particularly interesting factor in our discussion concerned the importance of academic discipline. It seems to be the case that different packages are popular in different disciplines. But how exactly usage patterns vary across fields remains unclear. We wondered whether any systematic data exists on this issue. For example, how many political scientists use R compared to other programs? What about statisticians, economists, sociologists, etc.? Any information would be highly appreciated.

Posted by Jens Hainmueller at 10:12 PM

April 4, 2007

Trial-Level Criminal Outcomes

With a coauthor, I am involved in a project which in part attempts to assess the effect of assigning judge A versus judge B to outcomes at the trial level in criminal cases. I've begun a literature search on this, and it seems like most attention thus far has focused on the sentencing stage (particularly relating to the controversy over the federal sentencing guidelines), and that few authors have used what one might call modern or cutting edge causal inference thinking. Can anyone out there help here? Am I missing important studies?

(Feel free to email me off-blog if you'd prefer.)

Posted by James Greiner at 3:24 PM

CCCSN - Devon Brewer

The Cambridge Colloquium on Complexity and Social Networks is sponsoring a talk tomorrow that may be of some interest to readers of this blog. Details below:

"Taking Person, Place, and Time Seriously in Infectious Disease Epidemiology and Diffusion Research"

Devon D. Brewer, University of Washington

Thursday, April 5, 2007
12:00 - 1:30 p.m.
CGIS North, 1737 Cambridge Street, Room N262

Abstract: Social scientists and field epidemiologists have long appreciated the role of social networks in diffusion processes. The cardinal goal of descriptive epidemiology is to examine "person, place, and time" in relation to the occurrence of disease or other health events. In the last 20 years, most infectious disease epidemiologists have moved away from the field epidemiologist's understanding of transmission as embedded in contact structures and shaped by temporal and locational factors. Instead, infectious disease epidemiologists have employed research designs that are best suited to studying non-infectious chronic diseases but unable to provide meaningful insight on transmission processes. A comprehensive and contextualized infectious disease epidemiology requires assessment of person (contact structure and individual characteristics), place, and time, together with measurement of specific behaviors, physical settings/fomites, and the molecular biology of pathogens, infected persons, and susceptible persons. In this presentation, I highlight examples of research that include multiple elements of this standard. From this overview, I show in particular how the main routes of HIV transmission in poor countries remain unknown as a consequence of inappropriate design in epidemiologic research. In addition, these examples highlight how diffusion research in the social sciences might be improved with greater attention to temporal and locational factors.

Devon D. Brewer, Ph.D., Director, has broad training and experience in the social and health sciences. Much of his past research has focused on social networks, research methods and design, memory and cognition, drug abuse, violence, crime, sexual behavior, and infectious disease (including sexually transmitted diseases, HIV, and hepatitis C). He earned his bachelor's degree in anthropology from the University of Washington and his doctorate in social science from the University of California, Irvine. Prior to founding Interdisciplinary Scientific Research, Dr. Brewer held research positions at the University of Washington, an administrative position with Public Health-Seattle and King County, and teaching positions at the University of Washington, Pacific Lutheran University, and Tulane University. He has been a principal investigator on federal research grants and authored/co-authored more than 60 scientific publications.

Posted by Mike Kellermann at 11:31 AM

April 2, 2007

Applied Statistics - Richard Berk

This week, the Applied Statistics Workshop will present a talk by Richard Berk, professor of criminology and statistics at the University of Pennsylvania. Professor Berk received his Ph.D. from Johns Hopkins University and served on the faculties of Northwestern, UC-Santa Barbara and UCLA before moving to Penn in 2006. He has published widely in journals in statistics and criminology. His research focuses on the application of statistical methods to questions arising in the criminal justice system. One of his current projects is the development and application of statistical learning procedures to anticipate failures on probation or parole and to forecast crime “hot spots” a week in advance.

Professor Berk will present a talk entitled "Counting the Homeless in Los Angeles County," which is based on joint work with Brian Kriegler and Donald Ylvisaker. Their paper is available through the workshop website. The presentation will be at noon on Wednesday, April 4 in Room N354, CGIS North, 1737 Cambridge St. As always, lunch will be provided. An abstract of the paper follows after the jump:

Counting the Homeless in Los Angeles County

Richard Berk
Department of Statistics
Department of Criminology
University of Pennsylvania

ABSTRACT

Over the past two decades, a variety of methods have been used to count the homeless in large metropolitan areas. In this paper, we report on a recent effort to count the homeless in Los Angeles County. A number of complications are discussed including the need to impute homeless counts to areas of the County not sampled and to take the relative costs of underestimates and overestimates of the number of homeless individuals into account. We conclude that despite their imperfections, the estimated counts provided useful and credible information to the stakeholders involved. Of course, not all stakeholders agreed.

Joint work with Brian Kriegler and Donald Ylvisaker.

Posted by Mike Kellermann at 8:23 AM

March 30, 2007

"That looks cool!" versus "What does it mean?"

Every Sunday, I flip open the New York Times Magazine to the weekly social commentary, "The Way We Live Now," and I check out the accompanying data presentation graphic. First, I think, "That looks cool." Then, for the next several minutes, I wonder, "What does it mean?" I'm usually looking at an illustration like this:

I sat down to write this entry ready to argue that clarity is always more important than aesthetics when communicating with data and that the media needs to be more educated when it comes to data presentation. I still think those things. However, after a little googling, I discovered that Catalogtree (as in "Chart by Catalogtree" in the graphic above) is a Dutch design firm, not a research organization, and I started to wonder whether the Times knowingly prioritizes art over data for these graphics. Maybe communication is not the primary goal. This is, after all, a magazine, including fashion and a serial comic strip along with coverage of political and social issues.

How should a publication balance illustration and information? If I belong to a statistics department, am I allowed to say, "That looks cool!" and not point out that a chart is indecipherable? My gut reaction is that information should always win, but maybe I'm wrong - and I do like the designs. You can see some of Catalogtree's other creations for the Times here and their other work here.

Posted by Cassandra Wolos at 1:49 PM

March 28, 2007

The singular of data is anecdote

Amy Perfors

This post started off as little more than some amusing wordplay brought on by the truism that "the plural of anecdote is not data". It's a sensible admonition -- you can't just exchange anecdotes and feel like that's the equivalent of actual scientific data -- but, like many truisms, it's not necessarily true. After all, the singular of data is anecdote: every individual datapoint in a scientific study constitutes an anecdote (though admittedly probably a quite boring one, depending on the nature of your study). A better truism would therefore be more like "the plural of anecdote is probably not data", which of course isn't nearly as catchy.

The post started that way, but then I got to thinking about it more and I realized that the attitude embodied by "the plural of anecdote is not data" -- while a necessary corrective in our culture, where people far more often go too far in the other direction -- isn't very useful, either.

A very important caveat first: I think it's an admirable goal -- definitely for scientists in their professional lives, but also for everyone in our personal lives -- to as far as possible try to make choices and draw conclusions informed not by personal anecdote(s) but rather by what "the data" shows. Anecdote is notoriously unreliable; it's distorted by context and memory; because it's emotionally fraught it's all too easy to weight anecdotes that resound with our experience more highly and discount those that don't; and, of course, the process of anecdote collection is hardly systematic or representative. For all of those reasons, it's my natural temptation to distrust "reasoning by anecdote", and I think that's a very good suspicion to hone.

But... but. It would be too easy to conclude that anecdotes should be discounted entirely, or that there is no difference between anecdotes of different sorts, and that's not the case. The main thing that turns an anecdote into data is the sampling process: if attention is paid to ensuring not only that the source of the data is representative, but also that the process of data collection hasn't greatly skewed the results in some way, then it is more like data than anecdote. (There are other criteria, of course, but I think that's a main one).

That means, though, that some anecdotes are better than others. One person's anecdote about an incredibly rare situation should properly be discounted more than 1000 anecdotes from people drawn from an array of backgrounds (unless, of course, one wants to learn about that very rare situation); likewise, a collection of stories taken from the comments of a highly partisan blog where disagreement is immediately deleted -- even if there are 1000 of them -- should be discounted more than, say, a focus group of 100 people carefully chosen to be representative, led by a trained moderator.
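
The point about sample size versus sampling process can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the true population share (0.5), the share among commenters on a heavily moderated blog (0.9), and the helper name sample_share are all hypothetical and chosen purely for illustration.

```python
import random


def sample_share(n, p_yes, rng):
    """Share of 'yes' answers among n respondents who each say yes with probability p_yes."""
    return sum(rng.random() < p_yes for _ in range(n)) / n


rng = random.Random(0)  # fixed seed so the run is repeatable

TRUE_SHARE = 0.5    # hypothetical share of the whole population holding some opinion
SKEWED_SHARE = 0.9  # hypothetical share among commenters where disagreement is deleted

# 100 people chosen to be representative vs. 1000 stories from the skewed source.
representative = sample_share(100, TRUE_SHARE, rng)
skewed = sample_share(1000, SKEWED_SHARE, rng)

rep_error = abs(representative - TRUE_SHARE)
skewed_error = abs(skewed - TRUE_SHARE)

print(f"representative sample (n=100):  estimate {representative:.2f}, error {rep_error:.2f}")
print(f"skewed sample (n=1000):         estimate {skewed:.2f}, error {skewed_error:.2f}")
```

Under these assumptions the larger sample ends up much further from the truth, because collecting more stories from a skewed source only makes the estimate of the skewed share more precise.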

I feel like I'm sort of belaboring the obvious, but I think it's also easy for "the obvious" to be forgotten (or ignored, or discounted) if its opposite is repeated enough.

Also, I think the tension between the "focus on data only" philosophy on one hand, and "be informed by anecdote" philosophy on the other, is a deep and interesting one: in my opinion, it is one of the main meta-issues in cognitive science, and of course comes up all the time in other areas (politics and policy, personal decision-making, stereotyping, etc). The main reason it's an issue, of course, is that we don't have data about most things -- either because the question simply hasn't been studied scientifically, or because it has but in an effort to "be scientific" the sample has been restricted enough that it's hard to know how well one can generalize beyond it. For a long time most studies in medicine used white men only as subjects; what then should one infer regarding women, or other genders? One is caught between the Scylla of using possibly inappropriate data, and the Charybdis of not using any data at all. Of course in the long term one should go out and get more data, but life can't wait for "the long term." Furthermore, if one is going to be absolutely insistent on a rigid reliance on appropriate data, there is the reductive problem that, strictly speaking, a dataset never allows you to logically draw a conclusion about anything other than itself. Unless it is the entire population, it will always be different than the population; the real question comes in deciding whether it is too different -- and as far as I can tell, aside from a few simple metrics, that decision is at least as much art as science (and is itself made partly on the basis of anecdote).

Another example, one I'm intimately familiar with, is the constant tension in psychology between ecological and external validity on the one hand, and proper scientific methodology on the other. Too often, increasing one means sacrificing the other: if you're interested in categorization, for instance, you can try to control for every possible factor by limiting your subjects to undergrad students in the same major, testing everyone in the same blank room at the same time of day, creating stimuli consisting of geometric figures with a clear number of equally-salient features, randomizing the order of presentation, etc. You can't be completely sure you've removed all possible confounds, but you've done a pretty good job. The problem is that what you're studying is now so unlike the categorization we do every day -- which is flexible, context-sensitive, influenced by many factors of the situation and ourselves, and about things that are not anything like abstract geometric pictures (unless you work in a modern art museum, I suppose) -- that it's hard to know how it applies. Every cognitive scientist I know is aware of this tension, and in my opinion the best science occurs right on the tightrope - not at the extremes.

That's why I think it's worth pointing out why the extreme -- even the extreme I tend to err on -- is best avoided, even if it seems obvious.

Posted by Amy Perfors at 10:06 AM

March 26, 2007

Applied Statistics - Spring Break

As many of you know, Harvard is on spring break this week, so the Applied Statistics Workshop will not meet. Please join us next Wednesday, April 4, for a presentation by Professor Richard Berk of the University of Pennsylvania. And for those of you at Harvard, enjoy some time off (or at least some time without students!).

Posted by Mike Kellermann at 8:19 AM

March 21, 2007

Efficient Vacationing, Summer 2007

With the ice melting and the birds chirping, it’s time again to plan the summer. Here are a few worthwhile reasons not to be stuck behind your desk all summer. Maybe these are not the most exotic events and locations but at least they are ‘productive’ and you won’t feel guilty for being away.

The Michigan Summer Institute in Survey Research Techniques runs several sessions over a total of eight weeks from June 4 to July 27. The courses are mainly about designing, writing and testing surveys, and analyzing survey data. The level of the courses differs but they have some advanced courses on sampling and analysis. Because of a modular setup, it's possible to pick and choose broadly. I've heard good things about this institute, particularly from people who want to collect their own data.

Also in Michigan is the Summer Program in Quantitative Methods of Social Research which runs two sessions from June 25 to August 17. This program focuses on analytics and also caters for different levels of sophistication. I only know a few people who attended this program, with mixed reviews. Much seems to depend on what courses you actually take: some are great and others so-so.

The University of Chicago hosts this year’s Institute on Computational Economics from July 30 to August 9. The topics are quite advanced and focus on programming approaches to economic problems. This seems to be quite worthwhile, if it's your interest.

Further afield is the Mannheim Empirical Research Summer School from July 8 – 20. This event focuses on analysis of household data but also features sessions on experiment design and behavioral economics. I haven't heard anything about previous schools but would be curious to find out.

There are other summer schools that don’t have a strong methods focus. Harvard, LSE and a host of other universities offer a number of courses that might provide a quick dip into some of the substantive topics.

Posted by Sebastian Bauhoff at 6:19 PM

March 20, 2007

Judicial Decisions as Data Points

Empirical, particularly quantitative empirical, scholarship is all the rage these days in law schools. (By the way, as a quantitative legal empiricist, that makes me really nervous. If there's one constant in legal academia, it's that things go in and out of style as fast in law schools as they do in Milan fashion shows.)

One thing that has been bothering me lately about this next phase, new wave, dance craze aspect of legal scholarship is the use of appellate cases as datapoints. It's tempting to think that one can code appellate decisions or judicial opinions pursuant to some neutral criteria, then look for trends, tease out inferences of causation, etc. Here's a note of caution: they're not i.i.d. They're probably not i.i.d. given X (whatever X is). Precedent matters. In our legal system, the fact that a previous appellate case (with a published opinion) was decided a certain way is a reason to decide a subsequent, facially similar appellate case the same way, even if the first decision might have been (arguably) wrong. Folks will argue over how much precedent matters; all I can say is that as a law clerk to an appellate judge, I participated in numerous conversations that resulted in the sentiment, "I might/would have decided the present case differently had Smith v. Jones not been on the books, but I see no grounds for departing from the reasoning of Smith v. Jones here." I.i.d. models, or analyses that assume non-interference among units, should be viewed with great caution in this setting.

Posted by James Greiner at 4:40 PM

March 18, 2007

Three-way ties and Jeopardy: Or, Drew questions the odds

It's been in the news that a three-way tie happened on Jeopardy on Friday night. From the AP article:

The show contacted a mathematician who calculated the odds of such a three-way tie happening — one in 25 million.

I have to believe that the mathematician contacted didn't have all the facts (and the AP rushed to meet deadline), because once you're in Final Jeopardy there's little randomness about it. It's all down to game theory.

Suppose we first estimate the odds that all three players are tied at the end of Double Jeopardy. The total dollar value shared by all three is around $30000, or about $10000 each. Since questions have dollar values which are multiples of $200, we could reasonably assume that there are 100 dollar values, between 0 and 20000, where each player can end up. So the odds of a tie at this stage should be no more than one in a million - and this is a very conservative guess, since I assume that the probabilities are all equal (whereas they would likely have a central mode around 10000.)

Breaking a three way tie with a Final Jeopardy question would then require that all three players bet the same amount, and I think the odds are considerably less than 1 in 20 that they'd all bet the farm no matter the category.

But it shouldn't even get that far. The scenario on Friday night had two players tied behind the leader who didn't have a runaway. So we have somewhere around 1 in 20,000 odds that this would happen (the factor of two because the third player could be ahead or behind the tied players.)

The runners-up would both be highly likely to bet everything in order to get past the leader. And the leader, in this case, placed a tying bet for great strategic reasons - getting one more day against known opposition rather than taking the chance of a new superstar appearing the next day - as well as a true demonstration of giving away someone else's money to appear magnanimous.

Even if the leader only had a 10% chance of making that call, and given that the other two players were pressured to bet high, that's still 1 in 200,000 - over 100 times more likely with a fairly conservative estimation process.
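
The last step of that back-of-the-envelope estimate can be checked in a few lines. This sketch simply takes the figures stated in the post as inputs (the quoted 1 in 25 million, the rough 1 in 20,000 for two players tied behind the leader, and the 10% chance that the leader plays for the tie); it does not derive or validate them, and the variable names are illustrative.

```python
from fractions import Fraction

# Figures taken from the post as given; none of them is independently derived here.
quoted_odds = Fraction(1, 25_000_000)          # the mathematician's figure quoted by the AP
two_tied_behind_leader = Fraction(1, 20_000)   # rough estimate for Friday's scenario
leader_plays_for_tie = Fraction(1, 10)         # the "10% chance" that the leader bets to tie

# Combine the two estimated steps, then compare with the quoted figure.
estimate = two_tied_behind_leader * leader_plays_for_tie
ratio = estimate / quoted_odds

print(f"estimated odds: 1 in {estimate.denominator:,}")
print(f"times more likely than the quoted figure: {float(ratio):.0f}")
```

With those inputs the estimate is 1 in 200,000, which is 125 times the quoted 1 in 25 million, consistent with "over 100 times more likely."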

Posted by Andrew C. Thomas at 11:14 PM

March 14, 2007

Who makes a good peer reviewer?

Amy Perfors

One of the interesting things about accruing more experience in a field is that as you do so, you find yourself called upon to be a peer reviewer more and more often (as I'm discovering). But because I've never been an editor, I've often wondered what this process looks like from that perspective: how do you pick reviewers? And what kind of people tend to be the best reviewers?

A recent article in the (open-access) journal PLoS Medicine speaks to these questions. Even though it's in medicine, I found the results somewhat interesting for what they might imply or predict about other fields as well.

In a nutshell, this study looked at 306 reviewers from the journal Annals of Emergency Medicine. Each of the 2,856 reviews (of 1,484 separate manuscripts) had been rated by the editors of the journal on a five-point scale (1=worst, 5=best). The study simply tried to identify what characteristics of the reviewers could be used to predict the effectiveness of the review. The basic finding?

Multivariable analysis revealed that most variables, including academic rank, formal training in critical appraisal or statistics, or status as principal investigator of a grant, failed to predict performance of higher-quality reviews. The only significant predictors of quality were working in a university-operated hospital versus other teaching environment and relative youth (under ten years of experience after finishing training). Being on an editorial board and doing formal grant (study section) review were each predictors for only one of our two comparisons. However, the predictive power of all variables was weak.

The details of the study are somewhat helpful for interpreting these results. When I first read that younger was better, I wondered to what extent this might simply be because younger people have more time. After looking at the details, I think this interpretation, while possible, is doubtful: the youngest cohort were defined as those that had less than ten years of experience after finishing training, not those who were largely still in grad school. I'd guess that most of those were on the tenure-track, or at least still in the beginnings of their career. This is when it's probably most important to do many many things and be extremely busy: so I doubt those people have more time. Arguably, they might just be more motivated to do well precisely because they are still young and trying to make a name for themselves -- though I don't know how big of a factor it would be given the anonymity of the process: the only people you're impressing with a good review are the editors of the journals.

All in all, I'm not actually that surprised that "goodness of review" isn't correlated with things such as academic rank, training in statistics, or being a good PI: not that those things don't matter, but my guess would be that nearly everyone who's a potential reviewer (for what is, I gather, a fairly prestigious journal) would have sufficient intelligence and training to be able to do a good review. If that's the case, then the best predictors of reviewing quality would come down to more ineffable traits like general conscientiousness and motivation to do a good review... This interpretation, if true, implies that a good way to generate better reviews is not to just choose big names, but rather to make sure people are motivated to put the time and effort into those reviews. Unfortunately, given that peer review is largely uncredited and gloryless, it's difficult to see how best to motivate them.

What do you all think about the idea of making these sorts of rankings public? If people could put them on their CV, I bet there would suddenly be a lot more interest in writing good reviews... at least for the people for whom the CV still mattered.

Posted by Amy Perfors at 6:45 PM

March 13, 2007

Which Color for your Figure?

Ever wondered about what would be the best color for your graphs? While common in the sciences, it may be fair to say that the use of color in graphs is still under-appreciated in many social science fields. Colors can be a very effective tool to visualize data in many forms, because color is essentially a 3-d concept:

- hue (red, green, blue)
- value/lightness: (light vs. dark)
- saturation/chroma (dull vs. vivid)

From my limited understanding of this topic, not much scientific knowledge exists about how color is best used. However, a few general principles have emerged from the literature. For example, sequential information (ordering) is often best indicated through distinction in lightness. The tricky part here is that indicating sequence with colors requires the viewer to remember the color ordering. A small number of colors should be used. One principle that is sometimes advocated is the use of a neutral color midpoint, which makes sense when there is a "natural" midpoint in the data. If so, you may want to distinguish above and below the midpoint, and use dark color1 -> light color1 -> white -> light color2 -> dark color2 (e.g., dark blue to dark red). If no natural midpoint exists, one option is to use a single hue and just vary lightness (e.g., white/pink to dark red). Another idea is that categorical distinctions are best indicated through hue (e.g., red=higher than average, blue=lower than average). Read Edward Tufte and the cites therein for more ideas on the use of color. In addition, a nice online tool that helps you choose color in a principled way is ColorBrewer, a website definitely worth a visit. Many of the color schemes advocated there are also available in R in the RColorBrewer package. Good luck!
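For readers who want to see the neutral-midpoint idea in action, here is a minimal Python sketch that builds a dark blue -> white -> dark red scheme by simple linear interpolation in RGB. The function names and the two endpoint colors are my own choices for illustration; this is not how ColorBrewer derives its palettes, which were designed by hand with perceptual considerations in mind.

```python
def lerp_color(c1, c2, t):
    """Linearly interpolate between two RGB tuples; t runs from 0 to 1."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c1, c2))


def to_hex(rgb):
    """Format an (r, g, b) tuple as a hex string such as '#8b0000'."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def diverging_palette(low, high, n_per_side, mid=(255, 255, 255)):
    """Return 2 * n_per_side + 1 hex colors running low -> mid -> high."""
    lower = [lerp_color(low, mid, i / n_per_side) for i in range(n_per_side)]
    upper = [lerp_color(mid, high, i / n_per_side)
             for i in range(1, n_per_side + 1)]
    return [to_hex(c) for c in lower + [mid] + upper]


# Illustrative endpoints (assumed, not taken from ColorBrewer).
DARK_BLUE = (0, 0, 139)
DARK_RED = (139, 0, 0)

palette = diverging_palette(DARK_BLUE, DARK_RED, n_per_side=2)
print(palette)  # five colors, with white in the middle
```

Keeping n_per_side small follows the advice above about using few colors. Interpolating in RGB is the crudest possible choice; a perceptually uniform color space would likely give more even lightness steps.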

Posted by Jens Hainmueller at 11:14 PM

March 7, 2007

More on Cheating

In my last post, I solicited comments on ways to cheat when using a design-before-analysis framework for analyzing observational studies. My claim was that if one does the hard work of distinguishing intermediate outcomes from covariates (followed usually by discarding the former) and of balancing the covariates (often done by discarding non-comparable observations) without access to the outcome variable, it should be hard(er) to cheat. Felix suggested one way that should work but that should also be fairly easy to spot: temporarily substitute in a "good" (meaning highly predictive of the outcome variable) covariate as the outcome and find a design that achieves the desired result, then use this design with the "real" outcome. In a comment, Mike suggested another way: do honest observational studies, but don't tell anyone about those that don't come to desired results.

Here's my thought: in many observational settings, we have a strong prior that there is either an effect in a particular direction or no effect at all. In an anti-discrimination lawsuit, for example, the issue is whether the plaintiff class is suffering from discrimination. There is usually little chance (or worry) that the plaintiff class is in fact benefiting from discrimination. Thus, the key issue is whether the estimated causal effect is statistically (and practically/legally) significant. With that in mind, it seems like a researcher might be able to manipulate the distance metric essential to any balancing process. When balancing, we have to define (a) a usually one-dimensional distance metric to decide how close observations are to one another, and (b) a cutoff point beyond which we say observations are too far from one another to risk inference, in which case we discard the offending observations. If one side of a debate (e.g., the defendant) has an interest in results that are not statistically significant, that side can insist on distance metrics and cutoff points that result in discarding (as too far away from their peers) a great many observations. A smaller number of observations generally means less precision and a lower likelihood of a significant result. The other side can, of course, do the opposite.
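To make the precision point concrete, here is a rough Python sketch using the textbook z statistic for a difference in means between two equal-sized groups with a known common standard deviation. The effect size, standard deviation, and sample sizes are made-up numbers chosen only for illustration, and a real matched analysis would be more involved than this.

```python
import math


def z_statistic(effect, sigma, n_per_group):
    """z statistic for a difference in means with n_per_group units per arm.

    Assumes equal group sizes and a known common standard deviation sigma
    (simplifying assumptions made for this sketch).
    """
    standard_error = sigma * math.sqrt(2.0 / n_per_group)
    return effect / standard_error


# Hypothetical numbers: the same estimated effect, before and after a strict
# cutoff discards most observations as "too far" from their peers.
EFFECT = 0.2
SIGMA = 1.0

for n in (500, 100):
    z = z_statistic(EFFECT, SIGMA, n)
    verdict = "significant" if abs(z) > 1.96 else "not significant"
    print(f"n per group = {n}: z = {z:.2f} ({verdict} at the 5% level)")
```

With these numbers the same point estimate clears the conventional 5% threshold when 500 units per group are retained and fails to clear it when only 100 are, which is exactly the lever a motivated party could pull through the choice of cutoff.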

I still think we're way better off in this world than in the model-snooping of regression. What do people think?

Posted by James Greiner at 4:53 PM

March 6, 2007

More Tools for Research

It’s been a while since Jens and I summarized some useful tools for research. Since then more productivity tools have appeared that make life easy for researchers. Some of the following might only work for Harvard affiliates but maybe your outfit offers something similar.

First, Harvard offers a table of contents service. After signing up you can request to receive the table of contents of most journals that Harvard Libraries carry. The handy part is a “Find it @ Harvard” button next to each article; clicking it takes you to the article through the library's account so that you have full access. This service also allows you to manage all journal subscriptions through only one account. (Best make the service email you the TOC as an attachment, as in-text tables occasionally get cut off. Also, your spam filter might intercept those emails so check there if you don’t receive anything.)

Second, Harvard provides a new toolbar for the Firefox browser called LibX (see here). This provides quick links to Harvard’s e-tools (citation index, e-resources etc), lets you search in the Hollis catalog and provides a drag&drop field for Google Scholar. If you’re on a journal website without having gone through Harvard libraries, LibX allows you to reload the restricted webpage via Harvard to access the full-text sources. Another nice feature is that LibX embeds cues in webpages. For example if you have installed the tool and are looking at a book on Amazon, you will notice a little Harvard shield on the page. Clicking it takes you straight to the book’s entry in Hollis. LibX also provides automatic links to print and e-resources for ISBNs, DOIs and other identifiers.

There are other useful tools for Firefox. I recently discovered the ScrapBook add-on which essentially works like bookmarks, but allows you to store only the part of a web page you’re interested in. Simply select the part and store it in your scrapbook. You can then access it offline and also comment or highlight. You can sort and import/export items too. A further useful built-in function uses search keywords in Firefox. This allows you to access a search box on any website through a user-defined keyword. For example you can define ``gs'' as keyword for the search box on the Google Scholar website. Then entering ``gs'' and a search term in the location bar in Firefox takes you straight to the search results for that term. If you use Google Scholar through your library you'll even get full access to the articles straight away.

Posted by Sebastian Bauhoff at 7:07 PM

February 27, 2007

Adventures in Identification II: Exposing Corrupt Politicians

Today we continue our voyage in the treasure quest for identification in observational studies. After our sojourn in Spain two weeks ago, the next stopover is in Brazil, where in a recent paper Claudio Ferraz and Frederico Finan discovered a nice natural experiment that allows them to estimate the effect of transparency on political accountability. Many in the policy world are agog over the beneficial impact of transparency on good governance. Yet, empirical studies of this subject are often bedevilled by selection problems for obvious reasons. Ideally, we would like to find a situation in which changes in transparency are randomly assigned, which (also for obvious reasons) tends to be a low probability event. But it does happen. Turns out that in a recent anti-corruption program in Brazil, the federal government randomly audits 60 municipalities every month and then discloses the findings of the report to the municipality and the media. The authors exploit this variation and find that the dissemination of information on corruption, which is facilitated by media, does indeed have a detrimental impact on the incumbent’s electoral performance.

Here is the abstract of the paper:

Exposing Corrupt Politicians: The Effects of Brazil’s Publicly Released Audits on Electoral Outcomes

This paper examines whether access to information enhances political accountability. Based upon the results of Brazil’s recent anti-corruption program that randomly audits municipal expenditures of federally-transferred funds, it estimates the effects of the disclosure of local government corruption practices upon the re-election success of incumbent mayors. Comparing municipalities audited before and after the elections, we show that the audit policy reduced the incumbent’s likelihood of re-election by approximately 20 percent, and was more pronounced in municipalities with radio stations. These findings highlight the value of information and the role of the media in reducing informational asymmetries in the political process.

Posted by Jens Hainmueller at 12:48 PM

February 23, 2007

Translating Statistics-Speak

I wish we all talked more about how scientific results are translated by the media. Fully understanding the assumptions and limitations of a study is challenging enough for those performing the research. In some ways, the journalists’ job is harder, finding lay language to summarize outcomes and implications without generalizing or ignoring uncertainty. I do not envy them the task.

Byron Calame, the public editor of the New York Times, recently discussed his paper's presentation of a study about marital status. On January 16, the front page read, "51% of Women are Now Living Without Spouse.” Calame’s response noted that in the study, “women” included females aged 15 and older; the Census set the lower bound at 15 to catch all married women. The original article did not call attention to the fact that teenagers living at home were counted as single women.

Apparently, when other journalists pointed out the misleading lack of clarity, some readers felt that they had been deceived. Is the “true” parameter just over 50% or just under? I would argue that the lower age bound set by the census is as reasonable as any. I also think that it doesn’t make much difference whether the percentage of women who are unmarried is a tiny bit over 50 or a tiny bit under (Sam Roberts, who wrote the original article, eventually made the same argument).

Regardless, Calame reports that an executive Times editor plans to spend more time discussing statistical results with colleagues who have expertise in the relevant fields. This seems like a great plan. I wonder how far this idea could be taken – how can researchers best work with journalists to successfully translate results?

A Crimson article published yesterday went so far as to refer to the “basic statistical measures—such as p-values or R-squared values,” or lack thereof, in a study conducted by Philip Morris. And when covering The New England Journal of Medicine’s discussion of stents for heart patients, The Times focused on the fact that some risks are “tough to assess.” This journalistic direction seems promising.

Posted by Cassandra Wolos at 2:01 PM

February 22, 2007

Cheating for Honest People

Let me follow up on yesterday’s post by Jim Greiner.

Jim’s problem: He’s touring the country touting tools for increased honesty in applied statistical research, only to be asked, effectively, for recommendations about using these tools to cheat more effectively. Yay academic job market.

Jim’s example goes like this: An analyst is asked to model the effect of a treatment, T, on the outcome, Y, while controlling for a bunch of confounders, X. To minimize the potential for data dredging we give the analyst only the treatment and the observed potential confounders to model the treatment assignment process, but we withhold the outcome data. Only after the analyst announces success in balancing the data (by including X and functions of X, f(X), deleting off-support observations, etc.) would we communicate the outcome data, plug the outcome in the equation, run it once, and be done.

So how can we help Jim help his audience cheat? Let’s make two assumptions (which I’d be willing to defend with my life). First, although the analyst is not given the actual outcome data, the analyst does know what the outcome is (wages, say). Second, the analyst is permitted to drop elements of X from the analysis, based on his or her analytic judgment.

Now let’s cheat. First, select the covariate, C, from the pool of potential confounders, X, believed to correlate most strongly with the outcome, Y. Second, treat C as the outcome and build a model through data dredging to maximize (or minimize, if this is your objective) the “effect” of T on C. Specifically, find the subset of functions of X, S(f(X)), that maximizes the effect of T on C while maintaining balance in S(f(X)). Third, upon receiving the outcome data, just plug them into the model but “forget” to mention that you didn’t include C in the treatment assignment model. If C really correlates strongly with Y then this procedure should lead to an upwardly biased estimate of T on Y.

I fear that this would work well in practice (though one could construct a counterexample). Seems to me, however, that it would be more technically demanding to cheat in this way than to cheat in, say, standard regression analysis.

Posted by Felix Elwert at 6:42 PM

February 20, 2007

Borat's Effect on Kazakhstan

If you’ve seen it or paid some attention to what’s going on in the popular media in the past six months, you will not have missed the movie ``Borat: Cultural Learnings of America for Make Benefit Glorious Nation of Kazakhstan’’ by Sacha Baron Cohen. The movie went from huge hype to packed movie theatres, and is due out on March 6 on DVD. Some described the movie as ``brilliant’’, for others it was 15 minutes of mediocre jokes drawn out into 82 minutes of film.

Whatever you may think, the government of Kazakhstan certainly took issue. They felt that their country was portrayed in a particularly unfair light, and started an image campaign with advertisements in the New York Times and other news media (see here for an article on that matter by the NYT). But what actually was the impact on Kazakhstan’s image of that movie? Fifteen minutes on Google Trends are suggestive (or frivolous, as Amy suggested).

Here is the timeline of events from Wikipedia: Borat was first screened at some film festivals from July 2006 onwards. It was officially released at the Toronto Film Festival on September 7, 2006 which started the hype. The movie opened in early November in the US, Canada and most European countries. It was number 1 at the US box office for two weeks and only left the top 10 in mid-December.

Here’s a graph of search terms and their associated search volume from Google Trends until November 2006 (you can get this live here and modify as you please). The blue line is the term ``borat movie’’; the red line is ``kazakhstan’’ and the orange line is ``uzbekistan’’ which will serve as an (admittedly imperfect) control country. The news reference volume refers to the number of times each topic appeared in Google News.

borat_1.png

As you can see, searches for ``borat movie'' take off in September 2006 which coincides with the official release. It spikes in late October before the movie opens at the box office and goes down afterwards. The event B is the announcement of the movie as picked up by Google News. All as expected even if the blips before July are a little strange.

Interestingly the search volume for ``uzbekistan’’ follows that of ``kazakhstan’’ quite well before the movie appears in the spotlight in September. From September onwards the volume for ``kazakhstan’’ somewhat tracks the volume for the movie instead. If you were to look at monthly data you would see that the relationship is not as clear but there does seem to be a trend. So maybe the movie generated some interest in the country.

Here’s another chart for September 2006 (from here). The blue and red lines are as before, but now the orange line is for ``kazakstan’’. It turns out that you can write the name correctly with or without the ``h’’. Maybe people who spell it for the first time would use this version. This search term appears in the search volume just before the movie hits the theaters.

borat_2.png

Google Trends gives another hint. If you look at the cities of origin for the searches, you will notice a mix of US/European countries and cities in the second half of 2006. And ``kazakstan’’ is mostly searched by British users. In the first half of the year however almost all searches come from Almaty, the largest city in Kazakhstan.

Now, obviously nothing is causal and proven but it does look interesting. Not only did the search volume on Google shoot up around the time of the introduction of the movie, but also the geographic composition of the searches shifted to where the movie was very popular and the country not well known before Fall 2006.

What does all this mean for Kazakhstan? Is this good or bad publicity? It seems that people became interested in the country beyond the movie (see a USA Today story here). A poll of users of a UK travel website put Kazakhstan in the Top 3 places to visit (right after Italy and the UK if you believe the results), and the Lonely Planet already has an article on the real Kazakhstan ``beyond Borat''. We'll see if those people are really going in the end, and if the trend persists over time as Google supplies more information. But all in all the movie might have generated some useful publicity for the country. Estimating the impact on tourism and world opinion, anyone?

Posted by Sebastian Bauhoff at 1:24 AM

February 14, 2007

Data sharing and visualization

A friend of mine pointed me to this website, Many eyes. Basically any random person can upload any sort of dataset, visualize the dataset in any number of ways, and then make the results publicly available so that anyone can see them.

The negative, of course, is much the same as with anything that "just anyone" can contribute to: there is a lot of useless stuff, and (if the source of the dataset is uncited) you don't know for sure how valid the dataset itself is. There may be a lot of positives, though: the volume of data alone is like a fantastic dream for many a social scientist; it's a great tool for getting "ordinary people" interested in doing their own research or analysis of their lives (for instance, I noticed some people graphing changes in their own sports performance over time); many of the interesting datasets have ongoing conversations about them; and only time will tell, but I imagine there is at least a chance this could end up being Wikipedia-like in its usefulness.

It may also serve as a template for data-sharing among scientists. Wouldn't it be nice if, every time you published, you had to make your dataset (or code) publicly available? We might already be trending in that direction, but some centralized location for scientific data-sharing sure would speed it along.

Posted by Amy Perfors at 10:24 AM

February 13, 2007

Adventures in Identification I: Voting After the Bomb

Jens Hainmueller

I've decided to start a little series of entries under the header `Adventures in Identification.' The title is inspired by the increasing trend in the social sciences, in particular economics, public health, also political science, sociology, etc. to look for natural or quasi-experiments to identify causal effects in observational settings. Although there are of course plenty of bad examples of this type of study, I think the general line of research is very promising and the rising interest in issues of identification is commendable. Natural experiments often provide the only credible alternative to answer many of the questions we care about in the social sciences, where real experiments are often unethical or infeasible (or both) and observational data usually has selection bias written all over it. Enough said, let's jump right into the material: `Adventures in Identification I: Voting After the Bomb -- a Macabre Natural Experiment in Electoral Politics.'

A recent question in political science and also economics is how terrorism affects democratic elections. Now clearly this seems a fairly tricky question to get some (identification) handle on. Heretic graduate students riding on their Rubin horses around IQSS will tell you two minutes into your talk that you can't just run a regression and call it `causal.' One setting where an answer may be (partly) possible is the case of the Spanish congressional elections in 2004. The incumbent conservative party led by Prime Minister Jose Maria Aznar had been favored to win by a comfortable margin according to opinion polls. On March 11, however, Islamic terrorists deposited nine backpacks full of explosives in several commuter trains in Madrid. The explosions killed 191 people and wounded 1,500. Three days later Spain's socialists under the lead of Jose-Luis Rodriguez Zapatero scored a stunning victory in the elections. Turnout was high and many have argued that voters seemingly expressed anger with the government, accusing it of provoking the Madrid attacks by supporting the U.S.-led war in Iraq, which most Spaniards opposed.

Now the question is how (if at all) the terrorist attacks affected the election result. As usual, only one potential outcome is observed and the crucial question is what the election results would have been like in the absence of the attacks. One could do a simple before and after study imputing this missing potential outcome based on some extrapolated pre-attacks trend in opinion polls. But then the question remains whether these opinion polls are an accurate representation of how people would have voted on election day. A difference-in-differences design seems better suited, but given that the attacks probably affected all voters a control group is hard to come by.

In a recent paper, Jose G. Montalvo actually found a control group. Turns out that at the time the attacks hit, Spanish residents abroad had already cast their absentee ballots. Thus, they were not affected in their decision by the attacks. The author then sets up a diff-in-diffs exploiting voting trends in the treated group (Spanish residents) and the control group (Spanish citizens in a foreign country). He finds that the attacks had a large effect on the result to the benefit of the opposition party. Interestingly, this result seems to be different from the findings of other simple before and after studies on the topic (although I can't say because I have not read the other papers cited).

Of course, the usual disclaimers about DID estimates apply. Differential trends between the groups may exist if foreign residents perceived terrorism differently than Spanish residents over time. Foreign residents are probably very different from Spanish residents. But in defense of the author, the results seem fairly robust given the checks he presents. And hey, it's a tough question to ask and this provides a more appropriate way to get a handle on identifying the counterfactual outcome than simply comparing before and after.
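For anyone who has not seen the arithmetic of a diff-in-diffs before, here is a bare-bones Python sketch of the two-group, two-period version. The vote shares below are invented purely for illustration and are not the numbers from Montalvo's paper; the function name is also my own.

```python
def did_estimate(treated_before, treated_after, control_before, control_after):
    """Two-group, two-period difference-in-differences estimate.

    The change in the control group stands in for what would have happened
    to the treated group without the treatment (the parallel-trends
    assumption), so it is subtracted from the treated group's change.
    """
    treated_change = treated_after - treated_before
    control_change = control_after - control_before
    return treated_change - control_change


# Invented opposition vote shares in percent (NOT from the paper):
# residents voted after the attacks, absentee voters abroad before them.
effect = did_estimate(
    treated_before=36.0, treated_after=43.0,   # residents in Spain
    control_before=35.0, control_after=37.0,   # absentee voters abroad
)
print(f"DID estimate: {effect:.1f} percentage points")
```

The whole estimate rests on the control group's change being a good proxy for the treated group's counterfactual change, which is precisely where the disclaimers above about differential trends come in.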

Posted by Jens Hainmueller at 8:00 AM

February 9, 2007

Corruption in the Classroom

In the fall, I mentioned the debate over teaching kids to read using whole language versus phonics methods. The heavily funded Reading First program, part of No Child Left Behind, is intended to promote phonics and relies on research published by the National Reading Panel (which I don’t completely trust, but today that’s beside the point).

The latest is a report by psychologist Louisa Moats claiming that instead of changing their curricula to focus on phonics, reading programs are sprinkling key phonics catchphrases throughout their marketing materials and selling the same old whole language lessons. The press release for Moats’ report contrasted the situation with the F.D.A.’s oversight of drugs. The government authority approves the treatment; companies marketing the treatment rely on public trust in the authority. The difference is that education companies get away with much more than the drug companies ever could.

Reports like this highlight for me the differences in how natural and social science results become policy. I see that medical dishonesty can kill people while the effects of corruption in education are less direct. But how does it happen that New York City public schools spend anti-whole language funding on thinly disguised whole language curricula? What other social programs are subject to this kind of deceit?

Posted by Cassandra Wolos at 9:37 AM

February 7, 2007

Timing Is Everything

Jim Greiner

Per previous blog posts, I'm giving today's presentation at CGIS on causal inference and immutable characteristics. I've previewed some of the ideas from this research in blog posts. Basically, the idea is that if we shift our thinking from "actual" immutable characteristics (e.g., race), a concept I find poorly defined in some situations, to perceived immutable characteristics, then the potential outcomes framework of causation can sometimes be usefully applied to things like race, gender, and ethnicity.

A key point here is the timing of treatment assignment. If treatment is conceptualized in terms of perceptions, then a natural point at which to consider treatment applied is the moment the decision maker whose conduct is being studied first perceives a unit's race, gender, ethnicity, whatever. This works well only if we're willing to exonerate the decision maker from responsibility for whatever happened before that moment of first perception. In the law, sometimes we're willing to do so. Sometimes, we're not.

Take the employment discrimination context. Typically, we don't hold an employer responsible for the discrimination of someone else, particularly when it occurred (say) prior to a job application, even if that prior discrimination means that some groups (e.g., minorities) have less attractive covariates (e.g., educational achievement levels) than others (e.g., whites). Perhaps potential outcomes could work here; a study of the employer's hiring can safely condition on educational achievement levels (i.e., take them as given, balance on them, etc.) and other covariates. More covariates means that the ignorability assumption required for most causal inference is more plausible.

Contrast the employment discrimination setting to certain standards applying to education institutions. For example, we may not want to allow a university to justify allocating fewer resources to female sports teams on the grounds that its female students show less interest in sports (even if we believed the university to be telling the truth). Here, we might consider that the preferences of the female students were probably shaped by prior stereotyping, and we might want to force the university to take steps to combat those stereotypes and change the female students' preferences. If so, we are unwilling to take the previous social pressure as "given," so we cannot balance on it. The result is fewer covariates and greater pressure on the ignorability assumption.

My thanks to Professor Roderick Hills of NYU law school, whose insightful question during a job talk I recently gave there helped solidify the above Title IX example.

Posted by James Greiner at 4:00 PM

February 6, 2007

Ask why...why, why, why

askwhy1.jpeg

Posted by Jens Hainmueller at 10:11 PM

Presentation, Presentation (at conferences, that is)

An article by Jane Miller in the current issue of Health Services Research explains strategies for preparing conference posters. As she writes, posters are a "hybrid of a published paper and an oral presentation" and people often fail to recognize this in preparing a poster. The article reviews existing literature on research communication and provides some guidelines on how to present statistical methods and results appropriately. It's all common-sense stuff, but it might come in handy for first-time presenters looking for guidance.

It also goes nicely with Gary's "Publication, Publication" guide for writing research papers which you can find here.

Jane E. Miller (2007) "Preparing and Presenting Effective Research Posters" Health Services Research 42(1p1): 311–328. doi:10.1111/j.1475-6773.2006.00588.x

Posted by Sebastian Bauhoff at 3:10 PM

February 1, 2007

A Rash Of Senicide?

There have been an awful lot of stories lately about the world's oldest person dying; in fact, it seems to have happened about three times in the last month or so. Then again, being the world's oldest person is a dubious honour to be sure, since the winner isn't likely to hold the title for very long and likely isn't even aware of their status. (Full disclosure: my great-grandmother was a centenarian but likely never knew my name.)

These stories have been bouncing in my mind lately and I'm trying to figure out why. I can think of a few scientifically relevant explanations:

1) The life expectancy of a centenarian is on the order of a year, and three successive deaths in a month is a rare event; conditioned on the first one, assuming independence and exponential life span (a reasonable assumption for the tail end), the probability of the next two events coming within a month is roughly 0.0033. And this happened to be the month for it.

2) The events aren't at all rare, and the centenarian death rate is actually dramatically higher, but it's a slow news month, and the stories themselves are floating to the top of the pile.

3) Online news services like Reuters and CNN have dedicated spaces for more "entertaining" and "bizarre" news stories, meaning that no matter how much news there is, people are seeing these stories.

4) Guinness sales are down, despite the "brilliant!" advertising campaign, and the World Record people are seeking out these changing events for the sake of their own discreet advertising.

5) I read this in The Onion and the satire hit me point blank, meaning I'm selecting and remembering the stories more often when they appear.

I'm thinking it's Number 5, but I'd be curious to know if anyone knew the mean centenarian death rate and whether this was a rare occurrence or not.
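The 0.0033 figure in explanation 1 can be reproduced with a short calculation. This is a minimal sketch under the post's own assumptions (independent exponential gaps between successive deaths, with a mean of about one year); the function name is made up for illustration. The sum of two such gaps follows an Erlang(2) distribution, so the probability that both of the next two deaths fall within a window has a closed form.

```python
import math

def prob_next_two_within(window, mean_gap):
    """P(two successive exponential gaps sum to at most `window`).

    The sum of two i.i.d. exponential gaps is Erlang(2), whose CDF is
    1 - exp(-x) * (1 + x) with x = window / mean_gap.
    """
    x = window / mean_gap
    return 1.0 - math.exp(-x) * (1.0 + x)

# Assumptions from the post: mean gap of about one year, window of one month.
p = prob_next_two_within(window=1 / 12, mean_gap=1.0)
print(f"{p:.4f}")  # 0.0033
```

If the true mean gap were shorter than a year (explanation 2), the same function gives a much larger probability, which is why the mean centenarian death rate matters here.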

Posted by Andrew C. Thomas at 9:56 AM

January 31, 2007

Making bad choices, again

Amy Perfors

Most of us are aware of various distortions in reasoning that people are vulnerable to, mainly because of heuristics we use to make decisions easier. I recently came across an article in Psychological Science called Choosing an inferior alternative that demonstrates a technique that will cause people to choose an alternative that they themselves have previously acknowledged to be personally inferior. This is interesting for two reasons: first of all, exactly how and why it works tells us something about the process by which our brains update (at least some sorts of) information; and second, because I anticipate commercials and politicians and master manipulators to start using these techniques any day now, and maybe if we know about it in advance we'll be more resistant. One can hope, anyway.

So what's the idea?

It's been known for a while that decision makers tend to slightly bias their evaluations of new data to support whatever alternative is currently leading. For instance, if I'm trying to choose between alternatives A, B, and C -- let's say they are restaurants and I'm trying to decide where to go eat -- when I learn about one attribute, say price, I'll tentatively rank them and decide that (for now) A is the best option. If I then learn about another attribute, say variety, I'll rerank them, but not in the same way I would have if I'd seen those two attributes at the same time: I'll actually bias it somewhat so that the second attribute favors A more than it otherwise would have. This effect is generally only slight, so if restaurant B is much better on variety and only slightly worse on price, I'll still end up choosing restaurant B: but if A and B were objectively about equal, or B was even slightly better, then I might choose A anyway.

Well, you can see where this is going. These researchers presented subjects with a set of restaurants and attributes to determine their objective "favorite." Then, two weeks later, they brought the same subjects in again and presented them with the same restaurants. This time, though, they had determined -- individually, for each subject -- the proper order of attributes that would most favor choosing the inferior alternative. (It gets a little more complicated than this, because in order to try to ensure that the subjects didn't recognize their choice from before, they combined nine attributes into six, but that's the essential idea.) Basically, what they did was pick the attribute that most favored the inferior choice and put it first, hoping to establish the inferior choice as the leader. The attribute that second-most favored the inferior choice was last, to take advantage of recency effects. The other attributes were presented in pairs, specifically chosen so that the ones that most favored the superior alternative were paired with neutral or less-favorable ones (thus hopefully "drowning them out").

The results were that when presented with the information in this order, 61% of people chose the inferior alternative. The good news, I guess, is that it wasn't more than 61% -- some people were not fooled -- but it was robustly different than chance, and definitely more than you'd expect (since, after all, it was the inferior alternative, and one would hope you'd choose that less often). Moreover, people didn't realize they were doing this at all: they were more confident in their choice when they had picked the inferior alternative. Even when told about this effect and asked if they thought they themselves had done it, they tended not to think so (and the participants who did it most were no more likely to think they had done it than the ones who didn't).

I always get kind of depressed at this sort of result, mainly because I become convinced that this sort of knowledge is then used by unscrupulous people to manipulate others. I mean, it's probably always been used somewhat subconsciously that way, but making it explicit makes it potentially more powerful. On the plus side, it really does imply interesting things for how we process and update information -- and raises the question of why we bias the leading alternative, given that it's demonstrably vulnerable to order effects. Just to make ourselves feel better about our current choice? But why would this biasing do that - wouldn't we feel best of all if we knew we were being utterly rational the whole time? It's a puzzle.

Posted by Amy Perfors at 10:29 AM

January 30, 2007

The Role of Sample Size and Unobserved Heterogeneity in Causal Inference

Jens Hainmueller

Here is a question for you: Imagine you are asked to conduct an observational study to estimate the effect of wearing a helmet on the risk of death in motorcycle crashes. You have to choose one of two different data-sets for this study: Either a large, rather heterogeneous sample of crashes (these happened on different roads, at different speeds, etc.) or a smaller, more homogeneous sample of crashes (let's say they all occurred on the same road). Your goal is to unearth a trustworthy estimate of the treatment effect that is as close as possible to the "truth", i.e. the effect estimate obtained from an (unethical) experimental study on the same subject. Which sample do you prefer?

Naturally, most people tend to choose the large sample. Larger sample, smaller standard error, less uncertainty, better inference…we’ve heard it all before. Interestingly, in a recent paper entitled "Heterogeneity and Causality: Unit Heterogeneity and Design Sensitivity in Observational Studies" Paul Rosenbaum comes to the opposite conclusion. He demonstrates that heterogeneity, and not sample size, matters for the sensitivity of your inference to hidden bias (a topic we blogged about previously here and here). He concludes that:

“In observational studies, reducing heterogeneity reduces both sampling variability and sensitivity to unobserved bias—with less heterogeneity, larger biases would need to be present to explain away the same effect. In contrast, increasing the sample size reduces sampling variability, which is, of course useful, but it does little to reduce concerns about unobserved bias.”

This basic insight about the role of unit heterogeneity in causal inference goes back to John Stuart Mill’s 1864 System of Logic. In this regard, Rosenbaum’s paper is a nice comparison to Jas’s view on Mill’s methods. Of course, Sir Ronald Fisher dismissed Mill for his plea for unit homogeneity because in experiments, when you have randomization working for you, hidden bias is not a real concern so you may as well go for the larger sample.

Now you may say: well it all depends on the estimand, no? Do I care about the effect of helmets in the US as a whole or only on a single road? This point is well taken, but keep in mind that for causal inference from observational data we often care about internal validity first and not necessarily generalizability (most experiments are also done on highly selective groups). In any case, Rosenbaum’s basic intuition remains and has real implications for the way we gather data and judge inferences. Next time you complain about a small sample size, you may want to think about heterogeneity first.

So finally back to the helmet example. Rosenbaum cites an observational study that deals with the heterogeneity issue in a clever way: “Different crashes occur on different motorcycles, at different speeds, with different forces, on highways or country roads, in dense or light traffic, encountering deer or Hummers. One would like to compare two people, one with a helmet, the other without, on the same type of motorcycle, riding at the same speed, on the same road, in the same traffic, crashing into the same object. Is this possible? It is when two people ride the same motorcycle, a driver and a passenger, one helmeted, the other not. Using data from the Fatality Analysis Reporting System, Norvell and Cummings (2002) performed such a matched pair analysis using a conditional model with numerous pair parameters, estimating approximately a 40% reduction in risk associated with helmet use.”
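The logic of the driver-passenger design can be sketched in a few lines. This is only an illustration, not the conditional model Norvell and Cummings fitted: it uses the simpler matched-pair odds ratio (the conditional maximum likelihood estimate for one binary exposure), and the counts are hypothetical, not taken from their study. Only pairs in which exactly one rider died are informative; everything the two riders share (motorcycle, speed, road, object struck) cancels within the pair.

```python
import math

def matched_pair_odds_ratio(helmeted_only_died, unhelmeted_only_died, z=1.96):
    """Conditional odds ratio for death (helmet vs. no helmet) from discordant pairs.

    Pairs in which both riders died or both survived carry no information
    about the helmet effect and are ignored.
    """
    b, c = helmeted_only_died, unhelmeted_only_died
    odds_ratio = b / c
    se_log = math.sqrt(1 / b + 1 / c)  # large-sample SE of log(OR)
    lo = math.exp(math.log(odds_ratio) - z * se_log)
    hi = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lo, hi

# Hypothetical counts, NOT taken from Norvell and Cummings (2002).
or_hat, ci_lo, ci_hi = matched_pair_odds_ratio(60, 100)
print(f"OR = {or_hat:.2f}, 95% CI ({ci_lo:.2f}, {ci_hi:.2f})")
```

Note that the estimate depends only on the discordant pairs, so a fairly small but tightly matched sample can still be informative, which is the point of Rosenbaum's argument.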

Posted by Jens Hainmueller at 8:30 AM

January 26, 2007

Statistical porridge and other influences on the American public

In this past Sunday’s New York Times Book Review, Scott Stossel covers a book by Sarah E. Igo, a professor in the history department at the University of Pennsylvania. The Averaged American – which I haven’t read but plan to pick up soon – discusses how the development of statistical measurement after World War I impacted not only social science, but also, well, the average American. According to the review, Igo argues that statistical groundbreakers like the Gallup poll and the Kinsey reports created a societal self-awareness that hadn’t existed before.

What struck me, though, was the reviewer’s closing comment. Stossel writes, “Even as we have moved toward ever-finer calibrations of statistical measurement, the knowledge that social science can produce is, in the end, limited. Is the statistical average rendered by pollsters the distillation of America? Or its grinding down into porridge? For all of the hunger Americans have always expressed for cold, hard, data about who we are, literary ways of knowing may be profounder than statistical ones.”

Keep in mind that these words come from a literary person immersed in the literary world (specifically, Stossel is the managing editor of The Atlantic Monthly) and should be understood in context. However, I hope that Stossel and the average American see the value of cold, hard, data handled well. I also think that we as social scientists and statisticians should accept his challenge to keep the porridge limited, the ideas unlimited, and our impact on the national consciousness profound! And maybe we should be a little offended, too.

Posted by Cassandra Wolos at 9:30 AM

January 24, 2007

The Goal of Causal Inference

Jim Greiner

I’ll be giving the talk at the Gov 3009 seminar in early February, and I’ll be presenting a paper I’m writing with Don Rubin on applying the potential outcomes framework of causation to what lawyers call “immutable characteristics” (race, gender, and national origin, for example). I’ll be previewing some of the ideas from this paper on the blog.

One key point from this paper is the recognition that in law (specifically, in an anti-discrimination setting), the goal of causal inference may be different from that in a more traditional social science setting. A sociologist, for example, might study the effect of tax breaks for married couples on marriage rates; the obvious goal of the study is to see whether a contemplated intervention (tax breaks) has a desired effect. An economist might evaluate a job training program for a similar reason. In anti-discrimination law, however, we study the effect of units’ perceived races (or genders or whatever) on some outcome (e.g., hiring or promotion), but we have no interest in intervening to change these perceptions. Rather, we’re contemplating action that would mitigate the effects we find. The “intervention” we’re considering might be compensating the victim of discrimination, as is true in an employment discrimination suit. Or it might be ceasing a certain type of government action, such as the death penalty. But we’re not interested in implementing a policy promoting or effectuating the treatment that we’re studying.

Posted by James Greiner at 1:14 PM

January 16, 2007

Applied Statistics Workshop

The Applied Statistics Workshop will resume for the spring semester on January 31, 2007. We will continue to meet in the CGIS Knafel Building, Room N354 on the third floor at noon on Wednesdays. The Workshop has a new website that has the tentative schedule posted for the semester. We will be moving the archives of papers from the previous semesters to the new site in the coming weeks, so you can track down your favorite talks from years past. As a preview of what's to come, here are the names and affiliations of some of the speakers presenting in the next month:

January 31st
Holger Lutz Kern
Department of Government
Harvard University

February 7th
Jim Greiner
Department of Statistics
Harvard University

February 14th
Alberto Abadie, Alexis Diamond, and Jens Hainmueller
Kennedy School of Government and Department of Government
Harvard University

February 21st
Dan Hopkins
Department of Government
Harvard University

Posted by Mike Kellermann at 5:51 PM

January 9, 2007

Visualization Guide

Courtesy of Aleks at Columbia, who brought this to my attention:


A very interesting collection of visualizations for projects, proposals and presentations. The periodic table arrangement itself is not at all useful, but the depth and organization sure are.

Posted by Andrew C. Thomas at 2:36 PM

December 13, 2006

Applied Statistics – Harrington

This week the Applied Statistics Workshop will present a talk by David Harrington, Professor of Biostatistics at Harvard’s School of Public Health, and in the Department of Biostatistical Science at the Dana Farber Cancer Institute.

Professor Harrington received his Ph.D. from the University of Maryland and taught at the University of Virginia before coming to Harvard. He has served as Principal Investigator on numerous NIH and NSF grants researching topics including Nonparametric Tests for Censored Cancer Data, and Statistical Problems for Markov Branching Processes. His research has appeared in Journal of the American Statistical Association, Biostatistics, Genetic Epidemiology, Journal of Clinical Oncology, and Biometrics among many others.

Professor Harrington is involved in two different lines of research. The first is research in statistical methods for clinical trials and prospective cohort studies in which the time to an event is a primary outcome. He has worked in efficient nonparametric tests and regression methods for right-censored data, sequential designs for clinical trials, and nonparametric methods for estimating nonlinear covariate effects on survival. Recently, he and co-workers in the Department of Biostatistics have been studying methods for analyzing survival data when some covariates have missing observations. Missing data are common in both prospective and retrospective cohort studies, and simply ignoring cases with missing observations can lead to substantial biases in inference.

Dr. Harrington's second line of research, on which he will be presenting, is collaborative research in cancer. He is the principal investigator of the Statistical Coordinating Center for the Cancer Care Outcomes Research and Surveillance (CanCORS) Consortium. This NCI-funded study is a network of sites around the country that are conducting a population-based study of access to and outcomes from cancer care, with special focus on ethnic subgroups and subgroups defined by age.

Professor Harrington will present a talk entitled "Statistical Issues in the Cancer Care Outcomes Research and Surveillance Consortium (CanCORS)." The presentation will be at noon on Wednesday, December 13 in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.

Posted by Eleanor Neff Powell at 9:23 AM

December 12, 2006

Better Way To Make Cumulative Comparisons With Small Samples?

On July 15, 1971 the research vessel Lev Berg set sail from Aralsk (Kazakhstan) to survey the Aral Sea, then the 4th largest freshwater lake in the world. The Soviet Union had been steadily draining the Aral for agricultural purposes since the 1950s and the Lev Berg was to measure the ecological damage. This trip included passing by the island Vozrozhdeniye on the South side.

Lev Berg Image
(Image Source: "The 1971 Smallpox Epidemic in Aralsk, Kazakhstan, and the Soviet Biological Warfare Program." Center for Nonproliferation Studies Occasional Paper No. 9, Jonathan B. Tucker and Raymond A. Zilinskas.)

Vozrozhdeniye was an ideal site for the main Soviet bioweapons field testing because it was in a remote area, easily secured as an island, and had reliable winds from the North to the South allowing "safe" testing and housing on the North end. The site was active from 1936 until 1990 when Yeltsin publicly denounced the program and had it shut down. This is despite the Soviet Union having signed the 1972 Biological and Toxin Weapons Convention outlawing such research. Shortly after the Lev Berg returned to Aralsk, there was an unusual outbreak of smallpox there, starting with a young researcher who had been onboard. The following is the best epidemiological data available:


Table Image
Comparison Case: in 1972 a Muslim man from Kosovo went on a pilgrimage to Mecca, returning through Baghdad where he was infected with smallpox. This was the first reported smallpox case in Kosovo since 1930 and it apparently went undiagnosed for six weeks, producing 175 cases and 35 deaths. This is a good comparison, since rates of vaccination were similar, as were socio-economic conditions.

Kaplan-Meier graph with time-to-event = onset of illness:

Kaplan-Meier Image (Image Source: Ibid.)

Key difference: all three Aralsk deaths were from hemorrhagic smallpox and only five in Kosovo were. The baseline for naturally occurring smallpox: Rao's study in Madras, India had 10,857 cases with only 240 hemorrhagic. Only two possible explanations seem to remain for the differences:
- host conditions (nutrition, genetic resistance, environment) differ greatly.
- Aralsk strain was an unusual type.
Obviously, it would be nice to claim strong evidence that the Soviet case resulted from escaped smallpox. We know the extent of the bioweapons program from Yeltsin's opening of the files, but we cannot say with 100% certainty that the program was responsible for this outbreak.

This is just a motivating (and interesting) example; the real question is about testing really small samples, when exact inference doesn't seem appropriate. So what other approaches would readers suggest for making comparisons with these types of cumulative data besides simple Kaplan-Meier comparisons? Obviously typical correlational analysis won't work (polychoric, multichoric, etc.) and standard tabular approaches are not going to be effective either.
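For reference, the hemorrhagic-death comparison alone fits in a 2x2 table: 3 of 3 Aralsk deaths versus 5 of 35 Kosovo deaths. The sketch below computes a one-sided Fisher exact p-value for that table. This is exactly the kind of tabular, exact calculation the post treats as questionable at this sample size, so it is offered only as a baseline that other suggestions could be compared against; the function name is made up for illustration.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Returns P(top-left cell >= a) with all margins held fixed
    (hypergeometric distribution).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    total = comb(n, row1)
    return sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1)) / total

# Rows: Aralsk, Kosovo. Columns: hemorrhagic deaths, other deaths.
# Counts from the post: 3 of 3 Aralsk deaths and 5 of 35 Kosovo deaths.
p = fisher_one_sided(3, 0, 5, 30)
print(f"{p:.4f}")  # 0.0066
```

With only three deaths in one row, the result rests entirely on a single table configuration, which illustrates why a test like this feels unsatisfying here.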

Posted by Jeff Gill at 2:48 PM

December 7, 2006

NIPS highlights

Amy Perfors

I've just spent this week at the annual NIPS conference; though its main focus seems to be machine learning, there are always interesting papers on the intersection of computational/mathematical methods in cognitive science and neuroscience. I thought it might be interesting to mention the highlights of the conference for me - which obviously tends to focus heavily on the cognitive science end of things. (Be aware that links (pdf) are to the paper pre-proceedings, not final versions, which haven't been released yet).

From Daniel Navarro and Tom Griffiths, we have A Nonparametric Bayesian Method for Inferring Features from Similarity Judgments. The problem, in a nutshell, is that if you're given a set of similarity ratings about a group of objects, you'd like to be able to infer the features of the objects from that. Additive clustering assumes that similarity is well-approximated by a weighted linear combination of common features. However, the actual inference problem -- actually finding the features -- has always been difficult. This paper presents a method for inferring the features (as well as figuring out how many features there are) that handles the empirical data well, and might even be useful for figuring out what sorts of information (i.e., what sorts of features) we humans represent and use.

From Mozer et al. comes Context Effects in Category Learning: An Investigation of Four Probabilistic Models. Some interesting phenomena in human categorization are the so-called push and pull effects: when shown an example from a target category, the prototype gets "pulled" closer to that example, and the prototypes of other related categories get pushed away. It's proven difficult to explain this computationally, and this paper considers four obvious candidate models. The best one uses a distributed representation and a maximum likelihood learning rule (and thus tries to find the prototypes that maximize the probability of being able to identify the category given the example); it's interesting to speculate about what this might imply about humans. The main shortcoming of this paper, to my mind, is that they use very idealized categories; but it's probably a necessary simplification to begin with, and future work can extend it to categories with a richer representation.

The next is work from my own lab (though not me): Kemp et al. present an account of Combining causal and similarity-based reasoning. The central point is that people have developed accounts of reasoning about causal relationships between properties (say, having wings causes one to be able to fly) and accounts of reasoning about objects on the basis of similarity (say, if a monkey has some gene, an ape is more likely to have it than a duck is). But many real-world inferences rely on both: if a duck has gene X, and gene X causes enzyme Y to be expressed, it is likely that a goose has enzyme Y. This paper presents a model that intelligently combines causal- and similarity-based reasoning, and is thus able to predict human judgments more accurately than either of them alone.

Roger Levy and T. Florian Jaeger have a paper called Speakers optimize information density through syntactic reduction. They explore the (intuitively sensible, but hard to study) idea that people -- if they are rational -- should try to communicate in the information-theoretically optimal way: they should try to give more information at highly ambiguous points in a sentence, but not bother doing so at less ambiguous points (since adding information has the undesirable side-effect of making utterances longer). They examine the use of reduced relative clauses (saying, e.g., "How big is the family you cook for" rather than "How big is the family THAT you cook for" - the word "that" is extra information which reduces the ambiguity of the subsequent word "you"). The finding is that speakers choose to reduce the relative clause -- to say the first type of sentence -- when the subsequent word is relatively unambiguous; in other words, their choices are correlated with information density. One of the reasons this is interesting to me is that it motivates the question of why exactly speakers do this: is it a conscious adaptation to try to make things easier for the listener, or a more automatic/unconscious strategy of some sort?

There are a number of other papers that I found interesting -- Chemudugunta et al. on Modeling General and Specific Aspects of Documents with a Probabilistic Topic Model; Roy et al. on Learning Annotated Hierarchies from Relational Data, and Greedy Layer-wise Training of Deep Networks by Bengio et al., to name a few -- so if this sort of thing interests you, I suggest checking out the NIPS proceedings when they come out. And if any of you went to NIPS also, I'd be curious what you really liked and think I should have included on this list!

Posted by Amy Perfors at 4:07 PM

December 6, 2006

Applied Statistics - Imbens and Ridder

This week the Applied Statistics Workshop will present a talk by Guido Imbens, Professor of Economics at Harvard University, and Geert Ridder, Professor of Economics at the University of Southern California.

Professor Imbens has recently rejoined the Department of Economics at Harvard and is one of the faculty sponsors of the Applied Statistics Workshop, so we are delighted that he will be speaking at the Workshop. He received his Ph.D. from Brown University and served on the faculties of Harvard, UCLA, and Berkeley before returning to Harvard. He has published widely, with a particular focus on questions relating to causal inference. Professor Imbens has been the recipient of numerous National Science Foundation grants and teaching awards. His work has appeared in Econometrica, Journal of Econometrics, Journal of the Royal Statistical Society, and Biostatistics among many others.

Geert Ridder is Professor of Economics at the University of Southern California. Before coming to the United States he was Professor of Econometrics at the Rijksuniversiteit Groningen and the Vrije Universiteit in Amsterdam in The Netherlands. In the United States he was Professor of Economics at the Johns Hopkins University and visiting professor at Cornell University, the University of Iowa, and Brown University. He received his Ph.D. from the University of Amsterdam. Professor Ridder’s research area is econometrics, in particular microeconometrics, and its applications in labor economics, public finance, economic development, economic demography, transportation research, and the economics of sports. His methodological interests are the (nonparametric) identification of statistical and economic structures from observed distributions (mainly in duration data and discrete choice data), models and estimation methods for duration data and panel data, (selectively) missing data, causal inference, and errors-in-variables. His work has appeared in Econometrica, Economics of Education Review, Journal of the European Economic Association, and Journal of Econometrics among others.

Professors Imbens and Ridder will present a talk entitled "Complementarity and Aggregate Implications of Assortative Matching: A Nonparametric Analysis." The presentation will be at noon on Wednesday, December 6, in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.

Posted by Eleanor Neff Powell at 10:34 AM

December 5, 2006

Causality in the Social Sciences Anybody?

Funny how there is no section on causal inference in the social sciences here? It says that to meet Wikipedia's quality standards, this article may require cleanup. Hopefully, somebody will find the time to contribute a social science section. Why not you? My guess is that readers of this blog know plenty about this topic...and the current entry is lacking a lot of what statistics has to say about causality.

Posted by Jens Hainmueller at 10:00 AM

November 29, 2006

Applied Statistics - Alan Zaslavsky

This week the Applied Statistics Workshop will present a talk by Alan Zaslavsky, Professor of Health Care Policy (Statistics) in the Department of Health Care Policy at Harvard Medical School. Dr. Zaslavsky's statistical research interests include surveys, census methodology, small area estimation, official statistics, missing data, hierarchical modeling, and Bayesian methodology. His research topics in health care policy center on measurement of the quality of care provided by health plans through consumer assessments and clinical and administrative data. Among his current major projects are (1) the Consumer Assessments of Healthcare Providers and Systems (CAHPS) survey implementation for the Medicare system, (2) methodology for surveys in psychiatric epidemiology, centered on validation of the CIDI-A (adolescent) survey in the National Comorbidity Study-Adolescent, and (3) studies on determinants of quality of care for cancer, including both the Statistical Coordinating Center and a research site for the NCI-funded CanCORS (Cancer Consortium for Outcomes Research and Surveillance) study. Other research interests include measurement of disparities in health care, and privacy and confidentiality for health care data.

He is a member of the Committee on National Statistics (CNSTAT) of the National Academy of Sciences and has served on CNSTAT panels on census methodology, small area estimation and race/ethnicity measurement, as well as the Committee on the National Quality Report on Health Care Delivery of the Institute of Medicine.

Dr. Zaslavsky received his A.B. degree at Harvard College, his M.S. at Northeastern University, and his Ph.D. at the Massachusetts Institute of Technology. He is a Fellow of the American Statistical Association.

Professor Zaslavsky will present a talk entitled "Modeling the covariance structure of random coefficients to characterize the quality variation in health plans." The presentation will be at noon on Wednesday, November 29th, in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.

Posted by Eleanor Neff Powell at 7:59 AM

November 22, 2006

Business Information and Social Science Statistics, Part II

I mentioned in this earlier blog entry an interview I did with DM Review. Here's the sequel.

Posted by Gary King at 2:25 PM

November 21, 2006

Back to the Drawing Board?

thumb.jpg

Have you ever been to a social science talk and heard somebody saying things like "I guess I will have to go back to the drawing board…"? I always wondered what that really meant, until an engineering friend of mine suggested taking a look at this.

Maybe we can get one for the IQSS?

Posted by Jens Hainmueller at 11:34 AM

November 17, 2006

Bayesian brains?

Amy Perfors

Andrew Gelman has a link to a study that just came out in Nature Neuroscience whose author, Alex Pouget at the University of Rochester, suggests that "the cortex appears wired at its foundation to run Bayesian computations as efficiently as can be possible." I haven't read the paper yet, so I don't have much in the way of intelligent commentary, but I'll try to take a look at it soon. In the meantime, here is a link to the press release so you can read something about it even if you don't have access to Nature Neuroscience. From the blurb, it sounds pretty neat, especially if you (like me) are at all interested in the psychological plausibility of Bayesian models as applied to human cognition.

Posted by Amy Perfors at 11:40 AM

The "Imperial Grip" of Instrumental Variables

The Economist is agog over the increasing prominence of instrumental variables in econometrics ("Winds of Change", November 4, 2006). While it is always nice to get some square inches in a publication with a circulation greater than a few thousand, I'm afraid that I tend to sympathize more with the "instrument police" than the "instrumentalists."

For a variable to be a valid instrument, it must (a) be correlated with the variable for which we are trying to estimate a causal effect, and (b) affect the outcome only through the proposed causal variable, such that an exclusion restriction is satisfied. This is true for every estimation in which a proposed instrument is used; one must make a separate case for the validity of the exclusion restriction with respect to each analysis. Leaving aside what should be the second-order problem of actually carrying out an IV analysis, which may be a first-order problem in practice ("what do you mean it has no mean?"), our inability to verify the exclusion restriction in the case of naturally occurring instruments forces us to move from the substance of the problem we are trying to investigate to a duel of "just-so stories" for or against the restriction, a debate that typically cannot be resolved by looking at the empirical evidence.

Consider the two papers described in the Economist article. The first attempts to estimate the effect of colonialism on current economic outcomes. The authors propose wind speed and direction as an instrument for colonization, arguing (plausibly) that Europeans were more likely to colonize an island if they were more likely to encounter it while sailing. So far so good. Then they argue that, while colonization in the past has an effect on economic outcomes in the present, being situated in a location favorable for sailing in the past (i.e., before steam-powered ships) does not. Is this really plausible? The authors think so, I don't, and it isn't obvious that there is a way to resolve the matter. In the second example, the failure of ruling dynasties to produce an heir in Indian princely states is used as an instrument for the imposition of direct rule by the British. Here the exclusion restriction may be more plausible (or - shameless plug - maybe not, if it is the shift from a hereditary to a non-hereditary regime rather than colonialism per se that affects outcomes). One way or the other, is this really what we should be arguing about?

None of this is to say that instrumental variable models can never be useful. When we can be more confident that the exclusion restriction is satisfied (usually because we designed the instrument ourselves), then IV approaches make a lot of sense. Unfortunately (or fortunately), we can't go back and randomly assign island discoveries using something like a coin flip rather than the trade winds. Despite this, nothing seems to slow down the pursuit of more and more tortured instruments. The observation that "the instrumental variable now enjoys an almost imperial grip on the imagination of economists" carries more irony than was perhaps intended.

Posted by Mike Kellermann at 11:03 AM

November 16, 2006

How to present math in talks

Since writing my last post (The cognitive style of better powerpoint), I noticed that two other bloggers wrote rather recently on the same topic. The first, from Dave Munger at Cognitive Daily, actually proposes a bit of an experiment to compare the efficacy of text vs. powerpoint - results to be posted Friday. The second, from Chad Orzel at Uncertain Principles, offers a list of "rules of thumb" for doing a good PowerPoint talk.

Given all this, you'd think I wouldn't have anything to add, right? Well, never underestimate my willingness to blather on and on about something. I actually think there's one thing neither they nor I discuss much, and that is presenting mathematical, technical, or statistical information. Both Orzel and I recommend, as much as possible, avoiding equations and math in your slides. And that's all well and good, but sometimes you just have to include some (especially if you're a math teacher and the talk in question is a lecture). For me, this issue crops up whenever I need to describe a computational model -- you need to give enough detail that it doesn't look like the results just come out of thin air, because if you don't, nobody will care about what you've done. And often "enough detail" means equations.

So, for whatever it's worth, here are my suggestions for how to present math in the most painless and effective way possible:

Abandon slideware. This isn't always feasible (for instance, if the conference doesn't have blackboards), nor even necessarily a good idea if the equation count is low enough and the "pretty picture" count is high enough, but I think slideware is sometimes overused, especially if you're a teacher. When you do the work on the blackboard, the students do it with you; when you do it on slideware, they watch. It is almost impossible to be engaged (or keep up) when rows of equations appear on slides; when the teacher works out the math on the spot, it is hard not to. (Okay, harder).

If you can't abandon slideware:

1. Include an intuitive explanation of what the equation means. (This is a good test to make sure you understand it yourself!). Obviously you should always do this verbally, but I find it very useful to write that part in text on the slide also. It's helpful for people to refer to as they try to match it with the equation and puzzle out how it works and what it means -- or, for the people who aren't very math-literate, to still get the gist of the talk without understanding the equation at all.

2. Decompose the equation into its parts. This is really, really useful. One effective way to do this is to present the entire thing at once, and then go through each term piece-by-piece, visually "minimizing" the others as you do so (either grey them out or make them smaller). As a trivial example, consider the equation z = x/y. You might first grey out (y) and talk about x. Then talk about y and grey out x: you might note things like that y is the denominator, you can see that when y gets larger our result gets smaller, etc. My example is totally lame, but this sort of thing can be tremendously useful when you get equations that are more complicated. People obviously know what numerators and denominators are, but it's still valuable to explicitly point out in a talk how the behavior of your equation depends on its component parts -- people could probably figure it out given enough time, but they don't have that time, particularly when it's all presented in the context of loads of other new information. And if the equation is important enough to put up, it's important to make sure people understand all of its parts.

3. As Orzel mentioned, define your terms. When you go through the parts of the equation you should verbally do this anyway, but a little "cheat sheet" there on the slide is invaluable. I find it quite helpful sometimes to have a line next to the equation that translates the equation into pseudo-English by replacing the math with the terms. Using my silly example, that would be something like "understanding (z) = clarity of images (x) / number of equations (y)". This can't always be done without cluttering things too much, but when you can, it's great.

4. Show some graphs exploring the behavior of your equation. ("Notice that when you hold x steady, increasing y results in smaller z"). This may not be necessary if the equation is simple enough, but if it's simple enough maybe you shouldn't present it, and just mention it verbally or in English. If what you're presenting is an algorithm, try to display pictorially what it looks like to implement the algorithm. Also, step through it on a very simple dataset. People remember and understand pictures far better than equations most of the time.

5. When referring back to your equation later, speak English. By this I mean that if you have a variable y whose rough English meaning is "number of equations", whenever you talk about it later, refer to it as "number of equations", not y. Half of the people won't remember what y is after you move on, and you'll lose them. If you feel you must use the variable name, at least try to periodically give reminders about what it stands for.

6. Use LaTeX where possible. LaTeX creates equations that are clean and easy to read, unlike PowerPoint (even with lots of tweaking). You don't necessarily have to do the entire talk in LaTeX if you don't want to, but at least make the equations in LaTeX, screen capture them and save them as bitmaps, and paste them into PowerPoint. They are much, much easier to read.

Obviously, these points become more or less important depending on the mathematical sophistication of your audience, but I think it's far far easier to make mathematical talks too difficult rather than too simple. This is because it's not a matter (or not mainly a matter) of sophistication -- some of the most egregious violators of these suggestions that I've seen have been at NIPS, a machine learning conference -- it's a matter of how much information your audience can process in a short amount of time. No matter how mathematically capable your listeners are, it takes a while (and a fair amount of concentration) to see the ramifications and implications of an equation or algorithm while simultaneously fitting it in with the rest of your talk, keeping track of your overall point, and thinking of how all of this fits in with their research. The easier you can make that process, the more successful the talk will be.

Any agreements, disagreements, or further suggestions, I'm all ears.

Posted by Amy Perfors at 11:24 AM

November 15, 2006

Gender as a Personal Choice

Jim Greiner

Greetings from the job market for legal academics, which combines the worst aspects of the job markets of all other fields. Apologies for being slow to bring this up, but an article in last week’s New York Times (Tuesday, November 7, 2006, page A1, by Damien Cave) is worth a look. The subject area is recording gender in New York City records. The City’s Board of Health is considering a proposal to allow persons born in the City to change the sex as documented on their birth certificates upon providing certain documentation (e.g., affidavits from doctors and mental health professionals) asserting that the proposed gender change would be permanent. Previously, the City required more physical manifestations of a sex change before it would change its records.

Question: are we moving toward a world in which sex, like race, becomes a personal choice, at least as recorded in official records? Note that in the race context, the law can’t seem to make up its mind on this. The Census Bureau records self-reports only, and many modern social scientists consider race a social construct only, with no relevant biological component. But some existing statutes still define race in terms of biology (e.g., 18 U.S.C. § 1093(6)).

Second question: suppose we are moving toward such a world; what will it do to our efforts to enforce anti-discrimination laws?

Posted by James Greiner at 1:51 PM

November 14, 2006

Meta-analysis, Part II

Last time I wrote about the popularity of meta-analysis for synthesizing the results of multiple studies and cited education researcher Derek Briggs, who believes that the method is used too often and sometimes incorrectly.

Recently, I informally re-examined the data from a published meta-analysis on reading instruction methods, running four different Bayesian models on the set of effect sizes given in the paper. All of the hierarchical Bayesian models (which varied only in the priors used and covariates included) showed that a significant amount of uncertainty was ignored by the original meta-analysis, which assumed that the effect size produced by each study was an estimate of one overall true mean. The preliminary results from my analysis supported Briggs' position, since they did not show the significant results that were evident in the meta-analysis paper; in other words, none of the Bayesian analyses came close to indicating a significant effect for the reading instruction method in question. I claim no reliable conclusion for my own analysis – I’m not even going to specify the original paper here – but re-examining the methods of meta-analyses seems worthwhile for the purpose of uncovering uncertainty, if not developing new techniques for synthesizing multiple studies.
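The general point, that assuming one overall true mean understates uncertainty when studies really differ, can be sketched without any Bayesian machinery. The snippet below is not the analysis described above: it swaps in the frequentist DerSimonian-Laird random-effects estimator as a simpler stand-in for a hierarchical model, and the effect sizes and variances are made-up numbers, not data from any real meta-analysis.

```python
import math


def fixed_effect(effects, variances):
    """Inverse-variance pooled mean, assuming one true effect for all studies."""
    weights = [1.0 / v for v in variances]
    mean = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return mean, se


def random_effects_dl(effects, variances):
    """DerSimonian-Laird random-effects pooled mean.

    Returns (mean, se, tau2), where tau2 is the estimated
    between-study variance (truncated at zero).
    """
    weights = [1.0 / v for v in variances]
    fe_mean, _ = fixed_effect(effects, variances)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(w * (y - fe_mean) ** 2 for w, y in zip(weights, effects))
    k = len(effects)
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    tau2 = max(0.0, (q - (k - 1)) / c)
    re_weights = [1.0 / (v + tau2) for v in variances]
    mean = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    return mean, se, tau2


if __name__ == "__main__":
    # Hypothetical effect sizes and sampling variances for five studies.
    effects = [0.10, 0.60, -0.20, 0.45, 0.05]
    variances = [0.01, 0.02, 0.015, 0.01, 0.02]
    print("fixed effect:  ", fixed_effect(effects, variances))
    print("random effects:", random_effects_dl(effects, variances))
```

Whenever the estimated between-study variance is positive, the random-effects standard error is larger than the fixed-effect one, so an effect that looks significant under the single-mean assumption may no longer look that way.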

The implications are nontrivial: the evidence supporting the teaching methods required by the billion dollar Reading First initiative, part of the Department of Education’s No Child Left Behind Act, is a long collection of meta-analyses performed by the National Reading Panel.

Posted by Cassandra Wolos at 12:43 PM

November 13, 2006

Applied Statistics - Joshua Angrist

This week the Applied Statistics Workshop will present a talk by Joshua Angrist, Professor of Economics at the Massachusetts Institute of Technology.

Professor Angrist received his Ph.D. in Economics at Princeton University, after which he joined the Economics Departments at Harvard University and Hebrew University before coming to MIT. He is a Fellow of the American Academy of Arts and Sciences and of the Econometric Society, and has served as Co-editor of the Journal of Labor Economics. His publications have appeared in Econometrica, The American Economic Review, The Economic Journal, and The Quarterly Journal of Economics, among others. His research interests include the effects of school inputs and organization on student achievement, the impact of education and social programs on the labor market, immigration, labor market regulation and institutions, and econometric methods for program and policy evaluation. Prof. Angrist also has a long-standing interest in public policy. In addition to his academic work, he has worked as a consultant to the U.S. Social Security Administration, The Manpower Demonstration Research Corporation, and for the Israeli government after the Oslo peace negotiations in 1994.

Professor Angrist will present a talk entitled "Lead them to Water and Pay them to Drink: An Experiment in Services and Incentives for College Achievement." The presentation will be at noon on Wednesday, November 15th, in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.

Posted by Eleanor Neff Powell at 1:08 PM

November 10, 2006

Chernoff Faces

We haven't had much on graphics on this blog yet, partly because there are several specialized fora for this peculiar aspect of statistics: for instance, junkcharts, the R-gallery, information aesthetics, the Statistical Graphics and Data Visualization blog, the Data Mining blog, Edward Tufte's forum, Andrew Gelman's blog and others. Yet, I assume readers of this blog wouldn't mind a picture every once in a while, so here are some Chernoff faces for you. In the spirit of Mike's recent entry, they illustrate team statistics from the 2005 baseball season:

faces.png

I recently came across the Chernoff faces while looking for a neat way to display multivariate data to compare several cities along various dimensions in a single plot. Chernoff faces are a method introduced by Herman Chernoff (Prof Emeritus of Applied Math at MIT and of Statistics at Harvard) in 1971 that allows one to convert multivariate data to cartoon faces, the features of which are controlled by the variable values. So for example in the above graph, each team's winning percentage is represented by face height, smile curve, and hair styling; hits are represented by face width, eye height, nose height; etc. (for details and extensions see here).

The key idea is that humans are well trained to recognize faces and discern small changes without difficulty. Therefore Chernoff faces allow for easy outlier detection and pattern recognition despite multiple dimensions of the data. Since the features of the faces vary in perceived importance, the way in which variables are mapped to the features should be carefully chosen.

Mathematica and R have canned algorithms for Chernoff faces (see here and here). I haven't seen a Chernoff plot in a social science journal yet, but maybe I am reading the wrong journals. Does anyone know of articles that use this technique? Also do you think that this is an effective way of displaying data that should be used more often? Obviously there are also problems with this type of display, but even if you don't like the key idea you have to admit that they look much funnier than the boring bar-graphs or line plots we see all the time.

Posted by Jens Hainmueller at 10:29 AM

November 9, 2006

The cognitive style of better powerpoint

Amy Perfors

While at the BUCLD conference this last weekend, I found myself thinking about the cognitive effects of using PowerPoint presentations. If you haven't read Edward Tufte's Cognitive Style of PowerPoint, I highly recommend it. His thesis is that powerpoint is "costly to both content and audience", basically because of the cognitive style that standard default PPT presentations embody: hierarchical path structure for organizing ideas, emphasis on format over content, and low information resolution chief among them.

Many of these negative results -- though not all -- occur because of a "dumb" use of the default templates. What about good powerpoint, that is, powerpoint that isn't forced into the hierarchical path-structure of organization, that doesn't use hideous, low-detail graphs? [Of course, this definition includes other forms of slide presentation, like LaTeX; I'll use the word "slideware" to mean all of these]. What are the cognitive implications of using slideware, as opposed to other types of presentation (transparencies, blackboard, speech)?

Here are my musings, unsubstantiated by any actual research:

I'd bet that the reliance on slideware actually improves the worst talks: whatever its faults, it at least imposes organization of a sort. And it at least gives a hapless audience something to write down and later try to puzzle over, which is harder to do if the talk is a rambling monologue or involves scribbled, messy handwriting on a blackboard.

Perhaps more controversially, I also would guess that slideware improves the best talks - or, at least, that the best talks with slideware can be as good as the best talks using other media. The PowerPoint Gettysburg Address is a funny spoof, but seriously, can you imagine a two-hour long, $23-million-gross movie of someone speaking in front of a blackboard or making a speech? An Inconvenient Truth was a great example of a presentation that was enhanced immeasurably by the well-organized and well-displayed visual content (and, notably, it did not use any templates that I could tell!). In general, because people are such visual learners, it makes sense that a presentation that can incorporate that information in the "right" way will be improved by doing so.

However, I think that for mid-range quality presenters (which most people are) slideware is still problematic. Here are some things I've noticed:

1. Adding slides is so simple and tempting that it's easy to mismanage your time. I've seen too many presentations where the last 10 minutes are spent hastily running through slide after slide, so the audience loses all the content in the disorganized mess the talk has become.

2. Relatedly, slideware creates the tendency to present information faster than it can be absorbed. This is most obvious when the talk involves math -- which I might discuss in a post of its own -- but the problem occurs with graphs, charts, diagrams, or any other high-content slides (which are otherwise great to have). Some try to solve the problem by creating handouts, but the problem isn't just that the audience doesn't have time to copy down the content -- they don't have the time to process it. Talks without slideware, by forcing you to present content at about the pace of writing, give the audience more time to think about the details and implications of what you're saying. Besides, the act of copying it down itself can do wonders for one's understanding and retention.

3. Most critically, slideware makes it easier to give a talk without really understanding the content or having thought through all the implications. If you can talk about something on an ad hoc basis, without the crutch of having everything written out for you, then you really understand it. This isn't to say that giving a slideware presentation means you don't really understand your content; just that it's easier to get away with not knowing it.

4. Also, Tufte mentioned that slideware forces you to package your ideas into bullet-point size units. This is less of a problem if you don't slavishly follow templates, but even if you don't, you're limited by the size of the slide and font. So, yeah, what he said.

That all said, I think slideware is here to stay; plus, it has many advantages over other types of presentation. So my advice isn't to not use slideware (except, perhaps, for math-intensive talks). Just keep these problems in mind when making your talks.

Posted by Amy Perfors at 11:53 AM

November 8, 2006

Fixing Math Education by Making It Less Enjoyable?

Justin Grimmer

A recent Brookings Institution report on the mathematics scores of junior high and high school students from different nations uncovers some paradoxical correlations. Using standardized test scores, the report shows that nations with the highest scores also have the students with the lowest confidence in their math ability and the lowest levels of enjoyment from learning math. This is evident in American students, who report high confidence and enjoyment but achieve only middle-of-the-pack scores on standardized tests.

Casting correlation/causation concerns aside, the Brookings report goes on to argue that the American mathematical education experience is perhaps too enjoyable for students. Rather than informing students about the important mathematical concepts that the foreign textbooks provide, American textbooks are characterized as trying too hard to create an enjoyable classroom experience.

The policy implication provided is to make mathematics less enjoyable in American classrooms by discarding colorful pictures and interesting story problems. At the very least, the report suggests that educators’ attention should be redirected from making math fun to making math education solely about mathematics.

Because of the study’s limited nature, any drastic policy recommendations should be avoided. After all, the report’s argument merely identifies two paradoxical relationships and then speculates about a causal mechanism that provides one potential explanation for the trend. No effort is made to eliminate alternative causal mechanisms. For example, cultural explanations could explain the discrepancy between the scores and confidence ratings, aside from differences in teaching methodologies. The study also attempts to make an ecological inference, inferring individual level behavior from aggregated data. While not damning in itself, it does weaken the strength of the conclusions.

That being said, perhaps the problem with American mathematics education does not lie in the attempt to make students happy, but in the material that is presented. Rather than providing students with an in depth understanding of concepts and introducing proof techniques, high school math assignments are often about memorization and a superficial knowledge of the techniques involved. Perhaps, if the focus were changed to make high school mathematics less like balancing a check book and more like Real Analysis, American math students would see an increase in their happiness in the classroom and also their test scores.

Posted by Justin Grimmer at 11:51 AM

November 7, 2006

Election Day

As everyone must know (unless you are lucky enough to not own a television), today is Election Day in the US. I always think of analyzing elections (and pre-election polling) as the quintessential statistical problem in political science, so I'm sure that many of us are eagerly waiting to get our hands on the results. Recent elections in the U.S. have been somewhat controversial, to say the least, which is probably bad for the country but unquestionably good for the discipline (see the Caltech/MIT Voting Technology Project for one example), and my guess is that this election will continue the trend. Law professor Rick Hasen of electionlawblog.org sets the threat level for post-election litigation at orange; anyone looking for an interesting applied statistics project would be well advised to check out his site in the coming weeks. In the meantime, the Mystery Pollster (Mark Blumenthal) has an interesting post on the exit polling strategy for today's election; apparently we shouldn't expect preliminary and incomplete results to be leaked until 5pm this year.

Posted by Mike Kellermann at 12:36 PM

November 3, 2006

Negative Results

Felix Elwert

In September, The Institute of Medicine released its report on “The Future of Drug Safety,” featuring some goodies on the dissemination of research findings.

One of the recommendations echoes one of the favorite hallway complaints at IQSS: that journals are perennially hung up on publishing *** alpha less than 0.05 yay-yay statistically significant results.

Says the Washington Post:

“[According to the report] manufacturers should also be required to register all clinical trials they sponsor in a government-run database to allow patients and physicians to see the outcome of all studies, not just those published in medical journals, the report said. Studies that show positive results for a drug are more likely to be published by journals than negative ones.”

Welcome to the world of publication bias. (The report is yours for a highly significant $44.)

Posted by Felix Elwert at 11:59 AM

November 2, 2006

Incumbency as a Source of Contamination in Mixed Electoral Systems

Jens Hainmueller

Since the early 1990s, more than 30 countries have adopted mixed electoral systems that combine single-member districts (SMD) in one tier with proportional representation (PR) in a second tier. Political scientists like this type of electoral system because each voter gets to cast two votes, the first vote according to one set of institutional rules and the second vote according to another. Some have argued that this allows for causal inference because it offers a controlled comparison of voting patterns under different electoral rules. But does it really?

The more recent literature on so-called contamination effects undermines this claim. Several papers (Herron and Nishikawa 2001; Cox and Schoppa 2002; Ferrara, Herron, and Nishikawa 2005) have found evidence that there are interaction effects between the two tiers in mixed electoral systems. For example, small parties are able to attract more PR votes in those districts in which they run SMD candidates. The argument is that running an SMD candidate gives a human face to the party and thus enables it to attract additional PR votes.

In a recent paper, Holger Kern and I attempt to add to this debate by identifying incumbency as a source of contamination in mixed electoral systems. It is well known that incumbents who run in single-member district (SMD) races have a significant advantage compared to non-incumbents (Gelman and King 1990). It thus seems plausible to expect that this advantage carries over to the proportional representation (PR) tier, and that incumbents are able to attract additional PR votes for their party in the district. In our paper we identify such an effect using a regression-discontinuity design that exploits the local random assignment to incumbency in close district races (based on an earlier paper by Lee 2006). The RD design allows us to separate a subpopulation of district races in which treatment is assigned as good as randomly from the rest of the data, which are tainted by selection effects. We find that incumbency causes a gain of 1 to 1.5 percentage points in PR vote share. We also present simulations of Bundestag seat distributions, demonstrating that contamination effects caused by incumbency have been sufficiently large to trigger significant shifts in parliamentary majorities.
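For readers unfamiliar with the regression-discontinuity idea, here is a bare-bones sketch on simulated data. It is not the estimator or the data from the paper: the data-generating process, the bandwidth, and the effect size of 0.0125 are all hypothetical, and the estimator is a plain local linear fit on each side of the cutoff with no kernel weighting or bandwidth selection.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_estimate(margins, outcomes, bandwidth):
    """Jump in the outcome at margin = 0, using only races within the bandwidth.

    margins  : previous SMD vote margin (positive = party won the seat)
    outcomes : current PR vote share in the district
    """
    left = [(m, y) for m, y in zip(margins, outcomes) if -bandwidth <= m < 0]
    right = [(m, y) for m, y in zip(margins, outcomes) if 0 <= m <= bandwidth]
    left_intercept, _ = fit_line(*zip(*left))
    right_intercept, _ = fit_line(*zip(*right))
    return right_intercept - left_intercept


def simulate_races(n, effect, seed=0):
    """Hypothetical districts: PR share rises smoothly with the margin,
    plus a jump of `effect` for districts the party narrowly won."""
    rng = random.Random(seed)
    margins = [rng.uniform(-0.5, 0.5) for _ in range(n)]
    outcomes = [0.30 + 0.2 * m + (effect if m >= 0 else 0.0) + rng.gauss(0, 0.02)
                for m in margins]
    return margins, outcomes


if __name__ == "__main__":
    margins, outcomes = simulate_races(20000, effect=0.0125, seed=1)
    print(rd_estimate(margins, outcomes, bandwidth=0.1))
```

The design choice that matters is the comparison at the cutoff: districts won or lost by a hair should be similar in everything except incumbency, so the jump between the two fitted lines at a margin of zero is attributed to incumbency itself.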

Needless to say, any feedback is highly appreciated.

Posted by Jens Hainmueller at 12:00 PM

November 1, 2006

An Individual-Level Story and Ecological Inference

Jim Greiner

I blogged some last year (see here) on whether an individual-level story is necessary, or useful, to ecological inference. For a review of what ecological inference is, and what I mean by an individual-level story, see the end of this entry. Last year, I stated that such a story was helpful in explaining an ecological inference technique, even if it might not be strictly necessary for modeling. Gary disagreed that such a story was at all helpful, and we had a little debate on the subject, which you can access here. Lately, though, I’ve been thinking that an individual-level story really is necessary for good modeling, not just for communication of a model. In particular, it seems like an individual-level model is required to incorporate survey information into an ecological inference model. Survey data is, after all, data collected at the level of the individual, and with only an aggregate-level model, it’s hard to see how one could incorporate it. Any thoughts from anyone out there?

To review: ecological inference is the effort to predict the values of the internal cells of contingency tables (usually assumed to be exchangeable) when only the margins are observed. A classic example is in voting, where one observes how many (say) black, white, and Hispanic potential voters there are in each precinct, and one also observes how many votes were cast for Democratic and Republican candidates. What one wants to know is, say, how many blacks voted Democrat. By an individual-level story, I mean a model of voting behavior at the level of the individual voter and a mathematical theory of how to aggregate up to the precinct-level counts.
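To see why the internal cells are not pinned down by the margins, it helps to compute what the margins do imply. The sketch below uses the classic deterministic bounds (often attributed to Duncan and Davis) for a single precinct. It is a simplification of the setup above: it collapses the electorate into two groups (black and everyone else), assumes every potential voter casts a vote for one of the two parties, and uses hypothetical counts.

```python
def democratic_vote_bounds(n_black, n_other, n_dem):
    """Bounds on how many black voters voted Democratic in one precinct.

    n_black : number of black voters (row margin)
    n_other : number of all other voters (row margin)
    n_dem   : number of Democratic votes (column margin)

    Returns (lower, upper) counts that are consistent with the margins.
    """
    total = n_black + n_other
    if min(n_black, n_other, n_dem) < 0 or n_dem > total:
        raise ValueError("margins are not consistent with a 2x2 table")
    # If every other voter voted Democratic, the remainder must be black.
    lower = max(0, n_dem - n_other)
    # At most all black voters, and at most all Democratic votes.
    upper = min(n_black, n_dem)
    return lower, upper


if __name__ == "__main__":
    # Hypothetical precinct: 100 black voters, 300 others, 350 Democratic votes.
    low, high = democratic_vote_bounds(100, 300, 350)
    print(f"between {low} and {high} of 100 black voters voted Democratic")
```

Any model, individual-level or aggregate, is doing its work inside these bounds; the bounds themselves are the only part that comes for free from the data.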

Posted by James Greiner at 12:00 PM

October 31, 2006

Predicting Elections

Jacob Eisenstein at MIT has developed a smart election predictor for the US Senate Elections using a Kalman Filter. The filter helps to decide how much extra weight to attach to more recent polls. Check it out here; he also has some details on the method here.
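For a rough idea of how a Kalman filter weights recent polls, here is a one-dimensional sketch. It is not Eisenstein's model: it assumes the simplest possible setup, in which true support follows a random walk with variance process_var per step and each poll is a noisy reading of it. The poll numbers and variances are invented.

```python
def kalman_filter(polls, process_var, init_mean=0.5, init_var=1.0):
    """Track a candidate's support through a sequence of polls.

    polls       : list of (observed_share, sampling_variance) in time order
    process_var : how much true support is assumed to drift between polls
    Returns a list of (filtered_mean, filtered_variance), one per poll.
    """
    mean, var = init_mean, init_var
    history = []
    for observed, obs_var in polls:
        # Predict: support may have drifted since the last poll.
        var += process_var
        # Update: the gain is the weight given to the new poll.
        gain = var / (var + obs_var)
        mean += gain * (observed - mean)
        var *= 1.0 - gain
        history.append((mean, var))
    return history


if __name__ == "__main__":
    # Hypothetical polls: (share for the candidate, sampling variance).
    polls = [(0.48, 0.0004), (0.50, 0.0004), (0.55, 0.0004)]
    for drift in (1e-6, 1e-2):
        final_mean, _ = kalman_filter(polls, process_var=drift)[-1]
        print(f"process_var={drift}: final estimate {final_mean:.3f}")
```

When the assumed drift is tiny, the filter behaves much like a running average of all polls; when the drift is large, older polls are treated as stale and the estimate follows the latest one. The weight on recent polls is therefore a consequence of how quickly opinion is assumed to move.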

Posted by Sebastian Bauhoff at 2:01 PM

More thoughts on publication bias and p-values

Amy Perfors

In a previous post about the Gerber & Malhotra paper about publication bias in political science, I rather optimistically opined that the findings -- that there were more significant results than would be predicted by chance, and that many of those were suspiciously close to 0.05 -- were probably not deeply worrisome, at least for those fields in which experimenters could vary the number of subjects run based on the significance level achieved thus far.

Well, I now disagree with myself.

This change of mind comes as a result of reading about the Jeffreys-Lindley paradox (Lindley, 1957), a Bayes-inspired critique of significance testing in classical statistics. It says, roughly, that with large enough sample size, a p-value can be arbitrarily close to zero even though the null hypothesis is highly probable (i.e., very close to one). In other words, a classical statistical test might reject the null hypothesis at an arbitrarily low p-value, even though the evidence that it should be accepted is overwhelming. [A discussion of the paradox can be found here].

When I learned about this result a few years ago, it astonished me, and I still haven't fully figured out how to deal with all of the implications. (This is obvious, since I forgot about it when writing the previous post!). As I understand the paradox, the intuitive idea is that, with larger sample size, you will naturally get some data that appears unlikely (and, the more data you collect, the more likely you are to see some really unlikely data). If you forget to compare the probability of that data under the null hypothesis with the probability of the data under the alternative hypotheses, then you might get an arbitrarily low p-value (indicating that the data are unlikely under the null hypothesis) even if the data are even more unlikely under any of the alternatives. Thus, if you just look at the p-value, without taking into account effect size, sample size, or the comparative posterior probability of each hypothesis under consideration, you are likely to wrongly reject the null hypothesis on the basis of the p-value, even if it is the most likely of all possibilities.
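A small numerical sketch may make the paradox less mysterious. The setup is a textbook one, chosen for convenience rather than taken from Lindley's paper: a normal mean with known standard deviation sigma, a point null of zero, a normal prior with standard deviation tau on the mean under the alternative, and equal prior probability on the two hypotheses. The z-statistic is held at 1.96 while the sample size grows.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def posterior_null(z, n, sigma=1.0, tau=1.0, prior_null=0.5):
    """Posterior probability of H0: mean = 0, given a z-statistic from n observations.

    Under H0 the sample mean has variance sigma^2 / n; under H1 the mean is
    drawn from N(0, tau^2), so the sample mean has variance tau^2 + sigma^2 / n.
    sigma, tau and prior_null are assumptions of this sketch.
    """
    v0 = sigma ** 2 / n
    v1 = tau ** 2 + v0
    # Bayes factor in favor of H0: ratio of the two marginal densities
    # evaluated at the observed sample mean, which equals z * sqrt(v0).
    bf01 = math.sqrt(v1 / v0) * math.exp(-0.5 * z ** 2 * (1.0 - v0 / v1))
    posterior_odds = bf01 * prior_null / (1.0 - prior_null)
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    z = 1.96
    print(f"p-value at z = {z}: {two_sided_p(z):.4f}")
    for n in (10, 100, 10_000, 1_000_000):
        print(f"n = {n:>9}: P(H0 | data) = {posterior_null(z, n):.3f}")
```

With z fixed, the p-value stays at about 0.05 for every n, while the posterior probability of the null climbs toward one as n grows. A "just significant" result from an enormous sample corresponds to an effect so small that the point null explains it better than the diffuse alternative does.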

The tie-in with my post before, of course, is that it implies that it isn't necessarily "okay" practice to keep increasing sample size until you achieve statistical significance. Of course, in practice, sample sizes rarely get larger than 24 or 32 -- at the absolute outside, 50 to 100 -- which is much smaller than infinity. Does this practical consideration, then, mean that the practice is okay? As far as I can tell, it is fairly standard (but then, so is the reliance on p-values to the exclusion of effect sizes, confidence intervals, etc., so "common" doesn't mean "okay"). Is this practice a bad idea only if your sample gets extremely large?
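One way to probe the question at realistic sample sizes is to simulate the practice directly. The sketch below is an illustration under assumed conditions, not a description of anyone's actual procedure: the null hypothesis is true, observations are standard normal with known variance, the experimenter starts testing at 10 subjects, and stops as soon as p < 0.05 or at 50 subjects. All of those numbers are hypothetical.

```python
import math
import random


def p_value(sample_sum, n):
    """Two-sided z-test p-value for H0: mean = 0 with known unit variance."""
    z = sample_sum / math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n_sims, min_n, max_n, peek, seed=0):
    """Share of simulated null experiments that end with p < 0.05.

    peek=True  : test after every subject from min_n on, stop at first p < 0.05
    peek=False : test once, after max_n subjects
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        total = 0.0
        rejected = False
        for n in range(1, max_n + 1):
            total += rng.gauss(0, 1)  # the true mean is zero
            if peek and n >= min_n and p_value(total, n) < 0.05:
                rejected = True
                break
        if not peek:
            rejected = p_value(total, max_n) < 0.05
        rejections += rejected
    return rejections / n_sims


if __name__ == "__main__":
    print("fixed n:      ", false_positive_rate(2000, 10, 50, peek=False))
    print("stop when sig:", false_positive_rate(2000, 10, 50, peek=True))
```

In runs of this kind the fixed-sample rate should sit near the nominal 5 percent, while the stop-when-significant rate comes out well above it, even though no sample ever exceeds 50. That suggests the problem is not confined to extremely large samples.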

Lindley, D.V. (1957) A statistical paradox. Biometrika, 44. 187-192

Posted by Amy Perfors at 10:00 AM

October 30, 2006

Applied Statistics - Nan Laird & Christoph Lange

This week the Applied Statistics Workshop will present a talk by Nan Laird, Professor of Biostatistics in the Harvard School of Public Health, and Christoph Lange, Assistant Professor of Biostatistics in the Harvard School of Public Health.

Before joining the Department of Biostatistics, Professor Laird received her Ph.D. in Statistics from Harvard and was an Assistant Prof. of Statistics at Harvard. She has published extensively in Statistics in Medicine, Biostatistics, American Journal of Human Genetics and the American Journal of Epidemiology among others. Her research interest is the development of statistical methodology in four primary areas: statistical genetics, longitudinal studies, missing or incomplete data, and analysis of multiple informant data.

Professor Lange earned his Ph.D. in Applied Statistics from the University of Reading, and has been a member of the Department of Biostatistics since then. His publications have appeared in Biostatistics, the American Journal of Human Genetics, Genetic Epidemiology, and Genetics. Prof. Lange's current research interests fall into the broad areas of statistical genetics and generalized linear models. Recent topics in statistical genetics include family-based association tests, meta-analysis of linkage studies, GEE-methods in linkage analysis and marker-assisted selection.

Prof. Laird and Prof. Lange will present a talk entitled “Statistical Challenges and Innovations for Gene Discovery”. An abstract for the talk and associated background papers are available from the course website. The presentation will be at noon on Wednesday, November 1st, in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.

Posted by Eleanor Neff Powell at 9:05 AM

October 29, 2006

America by the Numbers

Reading the Data Mining blog, I just learned about this cool visualization of the US population density presented by Time magazine.

Take a closer look here. Cute, isn't it?

Posted by Jens Hainmueller at 3:14 PM

October 26, 2006

Newcomb's Paradox: Reversing Causality?

Justin Grimmer

Newcomb’s paradox is a classic problem in philosophy and also an entertaining puzzle to consider. Here is one version of the paradox. Suppose you are presented with two boxes, A and B. You are allowed to take just box A, just box B, or both A and B. There will always be $1000 in box A, and there will either be $0 or $1,000,000 in box B.

A ‘predictor’ determines the contents of box B before you have arrived, using the following plan. If the predictor believes you will pick both box A and B, then she places nothing in box B, but if she believes that you will only take box B, then she places the $1,000,000 in box B.

What makes this predictor special is her amazing accuracy. In the previous billion plays of the game she has never been wrong.

So, you have the two boxes in front of you. What should you do? Keep in mind, the predictor has already made her decision when you arrive at the boxes, so by our normal rules of causality (events in the future cannot cause past events), your actions cannot change what the predictor has decided.

Posted by Justin Grimmer at 12:00 PM

October 25, 2006

Unconscious Bias & Expert Witnesses

Jim Greiner

Quantitative expert witnesses are essential to modern litigation. But why do they disagree so often?

An excerpt from an article by Professor Franklin Fisher appears below. It’s a tad long, but it’s really worth reading. Does it ring a familiar bell with anyone out there?

“It is not, however, always easy to avoid becoming a ‘hired gun’ . . . The danger is sometimes a subtle one, stemming from a growing involvement in the case and friendship with the attorneys. For the serious professional, concerned about preserving his or her standards, the problem is not that one is always being asked to step across a well-defined line by unscrupulous lawyers. Rather, it is that one becomes caught up in the adversary proceeding itself and acquires the desire to win. . . . Particularly because lawyers play by rules that go beyond those of academic fair play, it becomes insidiously easy to see only the apparent unfairness of the other side while overlooking that of one’s own.”

Franklin M. Fisher, Statisticians, Econometricians, and Adversary Proceedings, 81 J. AM. STAT. ASS’N. 277, 285 (1986)

Posted by James Greiner at 12:00 PM

October 24, 2006

Procrastination

Here’s an interesting piece that should help you keep your New Semester resolutions by understanding procrastination better. Sendhil Mullainathan recently used research by Dan Ariely and Klaus Wertenbroch as motivation for his undergraduate psychology and economics class. Though it’s not exactly statistics, it seems the insights could be useful for grad students and their advisors.

Ariely and Wertenbroch did several experiments to see how deadlines might help overcome procrastination. They examine whether deadlines might be effective pre-commitment devices, and whether they can enhance performance. In one of their experiments, they asked participants to proofread three meaningless synthetic texts. Participants received financial rewards for finding errors and submitting on time (just like in a problem set…). They randomized participants into three categories: three evenly-spaced deadlines every 7 days; an end-deadline after 21 days; or a self-imposed schedule of deadlines within a three-week period.

Which one would you select if you could? Maybe the end-deadline because it gives you the most flexibility in arranging the work (similar to a final exam or submitting your dissertation all at once)? Ariely and Wertenbroch found that the end-deadline does the worst both in terms of finding errors and submitting on time. Participants with evenly-spaced deadlines did best. But that group also liked the task the least, maybe because they had several unpleasant episodes of reading silly texts, or because they spent more time than the other groups.

So when you start your semester with good intentions, consider setting some reasonable and regular deadlines that bind, and get a calendar. Or just wait for the New Year for another chance to become resolute and have another drink in the meantime.

Posted by Sebastian Bauhoff at 12:44 PM

October 19, 2006

Simpson’s Paradox

Jim Greiner

As a lawyer, I have to be interested not just in what quantitative principles are true, but also in how to present “truth” to people without quantitative training. To that end, HELP! One of the maddening things about statistics is Simpson’s paradox. The quantitative concept, undoubtedly well-known to most readers of this blog, is that the correlation between two variables can change sign and magnitude, depending on what is conditioned on. That is, Corr(A, B | C) might be positive, while Corr(A, B | C, D) might be negative, while Corr(A, B | C, D, E) might be positive again. At bottom, this is what’s going on when regression coefficients become (or cease to be) significant as one adds additional variables to the right-hand side. Because regression currently enjoys a stranglehold on expert witness analyses in court cases (I’ll be ranting on that in the future), communicating Simpson's Paradox is a matter of real concern for someone like me who cares about what juries see, hear, and think. Any ideas on how to get this concept across?
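One way to show the sign flip is with a tiny made-up data set. The numbers below are purely hypothetical and chosen so the arithmetic is easy to follow; the helper name `pearson` and the variable names are mine. Within each level of the conditioning variable C, A and B move in opposite directions, yet pooling the two groups produces a positive correlation.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: two groups defined by a third variable C.
a_c0, b_c0 = [1, 2, 3], [3, 2, 1]   # C = 0: B falls as A rises
a_c1, b_c1 = [6, 7, 8], [8, 7, 6]   # C = 1: B falls as A rises

corr_given_c0 = pearson(a_c0, b_c0)
corr_given_c1 = pearson(a_c1, b_c1)
corr_pooled = pearson(a_c0 + a_c1, b_c0 + b_c1)

if __name__ == "__main__":
    print("Corr(A, B | C = 0):", round(corr_given_c0, 3))
    print("Corr(A, B | C = 1):", round(corr_given_c1, 3))
    print("Corr(A, B) pooled: ", round(corr_pooled, 3))
```

The pooled correlation is driven by the gap between the two groups rather than by anything happening inside either one, which may be a story a jury can follow from a scatterplot with two colors.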

Posted by James Greiner at 11:13 AM

October 18, 2006

Meta-analysis: To Trust or Not to Trust

Cassandra Wolos

Social scientists, who often have a limited ability to create true experiments and replicate studies, value ways to learn from the synthesized results of previous work. A popular quantitative tool designed for this purpose is meta-analysis, which calculates a standardized effect size for each of a set of studies in a literature review and then performs inference on the resulting set of effect sizes. Meta-analysis is particularly common in education research.

Can we trust the results of these analyses?

On the one hand, when performed correctly, meta-analysis should successfully summarize the information available in multiple studies. Combining the results in this way can increase the power of overall conclusions when the sample size in each study is relatively small.
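As a rough sketch of the mechanics, here is the simplest pooling scheme I know of, a fixed-effect inverse-variance average. It is one common choice among several, and the effect sizes and standard errors below are invented for illustration rather than taken from any real literature. Each study's standardized effect is weighted by the inverse of its squared standard error.

```python
import math


def fixed_effect_pool(effects, std_errors):
    """Inverse-variance weighted mean of study effects and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * d for w, d in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical standardized effect sizes from three small studies.
effects = [0.30, 0.10, 0.25]
std_errors = [0.20, 0.15, 0.25]

pooled, pooled_se = fixed_effect_pool(effects, std_errors)

if __name__ == "__main__":
    print("pooled effect:", round(pooled, 3))
    print("pooled standard error:", round(pooled_se, 3))
```

The pooled standard error is smaller than that of any single study, which is the gain in power referred to above. The calculation takes each input estimate at face value, so it inherits whatever bias the original studies carry.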

On the other hand, a good meta-analysis relies on the assumption that the original studies were unbiased and generally well-performed. In addition, we hope that the researchers in each study had the same target population in mind and worked independently of each other. Further complicating matters is the potential for publication bias – a meta-analysis will rarely include unpublished studies with less impressive effect sizes.

The second hand represents the view of Derek Briggs at the University of Colorado, Boulder, who in a 2005 Evaluation Review paper objected to what he saw as the overuse of meta-analysis in social science research. He also suggested that assumptions necessary for a reliable meta-analysis are not always met.

More to come on this topic next time.

Posted by Cassandra Wolos at 10:00 AM

October 16, 2006

Applied Stats - Loeffler

This week the Applied Statistics Workshop will present a talk by Charles E. Loeffler, Ph.D. Candidate in Sociology at Harvard University.

Charles graduated magna cum laude from Harvard with a degree in Social Studies, before going on to receive his M.Phil. in Criminology from Cambridge University. He has recently completed the National Consortium on Violence Research Pre-Dissertation Fellowship under the mentorship of Prof. Steven Levitt of the University of Chicago. His work has appeared in The New Republic Online, Federal Sentencing Reporter, and Ars Aequi: A Biographical History of Legal Science. Charles's research interests include Criminology, Quasi-Experimental Methods and Decisionmaking.

Charles will present a talk entitled "Is justice blind? A natural experiment in the use of judicial discretion in criminal trials". The working paper for the talk is available from the course website. The presentation will be at noon on Wednesday, October 18th, in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.

Posted by Eleanor Neff Powell at 5:00 AM

October 12, 2006

Causation and Manipulation VII: The Cartoon Version

Dodging Bill Collectors

As Tailor (A) fits customer (B) and calls out measurements, college boy (C) mistakes them for football signals and makes a flying tackle at clothing dummy (D). Dummy bumps head against paddle (E) causing it to pull hook (F) and throw bottle (G) on end of folding hat rack (H) which spreads and pushes head of cabbage (I) into net (J). Weight of cabbage pulls cord (K) causing shears (L) to cut string (M). Bag of sand (N) drops on scale (O) and pushes broom (P) against pail of whitewash (Q) which upsets all over you causing you to look like a marble statue and making it impossible for you to be recognized by bill collectors. Don't worry about posing as any particular historical statue because bill collectors don't know much about art (for more on causal chains in cartoons, click here).

Posted by Jens Hainmueller at 11:00 PM

October 11, 2006

Further readings on the Iraqi excess deaths study

Today's papers were full of reports of a new study in the Lancet (here) on counting the excess deaths in Iraq since the US invasion in 2003. The article by Johns Hopkins researchers is an update on a study published in 2004 which generated a huge debate about the political as well as statistical significance of the estimates. This time the media's attention is again on the magnitude of the estimate (655,000 excess deaths, most of them due to violence) which is again vastly higher than other available numbers. The large uncertainty (95% CI 390,000 - 940,000) gets fewer comments this time, maybe because the interval is further away from 0 than in the 2004 study.

Just to point you to some interesting articles, here is a good summary in today’s Wall Street Journal. Wikipedia has a broad overview of the two studies and criticisms here. Brad DeLong responded to criticisms of the 2004 study here; he also covers problems with the cluster sampling approach. And check this and this for some related posts on this blog.

By the way, the WSJ article has a correction for misinterpreting the meaning of 95% confidence. Maybe you can use it to convince your stats students that they should pay attention.

Posted by Sebastian Bauhoff at 11:59 PM

October 10, 2006

Causation and Manipulation VI: The cognitive science version

I can't resist chiming in and contributing post VI on causation and manipulation, but coming at it from a rather different angle: rather than ask what we as researchers should do, the cognitive science question is what people and children do do - what they assume and know about causal inference and understanding.

You might think that people would (for lack of a better term) suck at this, given other well-known difficulties in reasoning, anecdotal reports from educators everywhere, etc, etc. However, there's a fair amount of evidence that people -- both adults and children -- can be quite sophisticated causal reasoners. The literature on this is vast and growing, so let me just point out one quite interesting finding, and maybe I'll return to the topic in later posts.

One question is whether children are capable of using the difference between evidence from observations and evidence from intervention (manipulation) to build a different causal structure. The well-named "theory theory" theory of development suggests that children are like small scientists and should therefore be quite sophisticated causal reasoners at an early age. To test this, Schulz, Kushnir, & Gopnik [pdf] showed preschool children a special "stickball machine" consisting of a box, out of which two sticks (X and Y) rose vertically. The children were told that some sticks were "special" and could cause the other sticks to move, and some weren't. In the control condition, children saw X and Y move together on their own three times; the experimenter then intervened to pull on Y, causing it to move while X failed to move. In the experimental condition, the experimenter pulled on one stick (X) and both X and Y moved three times; a fourth time the experimenter pulled on Y, but only it moved (X was stationary).

The probability of each stick moving, conditional on the other moving, is the same in both cases. However, if the children reason about causal interventions, then the experimental group -- but not the control group -- should perceive that X might cause Y to move (but not vice versa). And indeed, this was the case.

Children are also good at detecting interventions that are obviously confounded, overriding prior knowledge, and taking base rate into account (at least somewhat). As I said, this is a huge (and exciting) literature, and understanding people's natural propensities and abilities to do causal reasoning might even help us address the knotty philosophical problems of what a cause is in the first place.

Posted by Amy Perfors at 11:00 PM

October 6, 2006

Causation and Manipulation, V

Jim Greiner

Fair warning: This entry includes a plug for one of my papers

Anti-discrimination laws require lawyers to figure out the causal effect of race (gender, ethnicity) on certain decision making. Previous posts have been exploring the often-tossed-around idea of considering the treatment to be perceived race, as opposed to "actual" (whatever that means) or self-identified race, to answer the no-causation-without-manipulation objection. This feels like a good idea, but it really only works in some cases and not others. It works when we can identify a specific actor (or an institution) whose behavior we want to study. Capital sentencing juries and a defendant firm in an employment discrimination lawsuit are two that work. We can think about changing these specific actors' perceptions of particular units (capital defendants, potential employees), and we can think about WHEN it makes sense to think of treatment (the perception) as being applied: at the moment the actor first perceives the unit's race (or gender or whatever). In contrast, "the public" or "the set of all employers in the United States" are two examples of actors that don't work. The timing of treatment assignment no longer makes sense, the counterfactuals are too hard to imagine, and the usual non-interference-among-units assumption becomes hard to think about.

What does all this buy us? A fair amount. First, this line of thinking identifies cases in which rigorous causal inference based on the potential outcomes framework remains beyond our reach. Figuring out the causal effect of gender on salaries nationwide is one example; another is the causal effect of candidate race on election outcomes. Second, in those cases in which we can identify a specific actor, we get a coherent conceptualization of the timing of treatment assignment, which allows us to distinguish pre- from post-treatment variables. This is a big deal. Entire lawsuits sometimes turn on it.

All this has important implications for civil rights litigation, as I discuss in my paper, "Causal Inference in Civil Rights Litigation." You can get a draft (pdf) of this paper from my website, which you can access by clicking on my name to the left. I'd appreciate any reader reactions/suggestions.

Posted by James Greiner at 10:19 PM

October 5, 2006

Causation and Manipulation IV: Conditional Effects

Mike Kellermann

People who read this blog regularly know that few things get authors and commentators as worked up as questions about causal inference, either philosophical (here, here, and here) or technical (here, here, here, etc.). I wouldn't want to miss out on the fun this time around -- and how could I pass up the opportunity to have the IV post on causation and manipulation?

Jens and Felix have both discussed whether non-manipulable characteristics such as race or gender ("attributes" for Holland) can be considered causes within the potential outcomes framework. I agree with them that, at least as far as Holland is concerned, the answer is (almost always) no - no causation without manipulation. The fact that we are having this discussion 20 years later suggests (to me, at least) that this answer is intuitively unsatisfying. It is worth remembering a comment made by Clark Glymour in his discussion of the Holland (1986) article:

People talk as they will, and if they talk in a way that does not fit some piece of philosophical analysis and seem to understand each other well enough when they do, then there is something going on that the analysis has not caught.

Identifying perceptions of an attribute (rather than the attribute itself) as the factor subject to manipulation makes a lot of sense in situations where the potential outcomes are to a certain degree out of the control of the individual possessing the attribute, as in the discrimination example. Extending this idea to situations in which outcomes are generated by the subject possessing the attribute (in which "self-perceptions" would be manipulated) would commit researchers to a very particular understanding of attributes such as race and gender that would hardly be uncontroversial.

In these cases, I think that it makes more sense to look at the differences in well-specified Rubin-Holland causal effects (i.e. the results of manipulation) conditional on values of the attribute rather than identifying a causal effect as such. So, for example, in the gender discrimination example we could think of the manipulation as either applying or not applying for a particular job. This is clearly something that we could randomize, so the causal effect would be well defined. We could calculate the average treatment effect separately for men and women and compare those two quantities, giving us the difference in conditional causal effects. I'm sure that there is a catchy name for this difference out there in the literature, but I haven't run across it.
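For concreteness, here is a small simulation sketch of that quantity. Everything in it is hypothetical: the group labels, the true effects (2.0 and 1.0), the unit-variance noise, and the function names are assumptions made up for illustration. Treatment is randomized within each attribute group, the average treatment effect is estimated by a difference in means within each group, and the two estimates are then compared.

```python
import random


def mean(values):
    return sum(values) / len(values)


def conditional_ates(true_effects, n_per_group=4000, seed=0):
    """Estimate the average treatment effect separately within each group.

    true_effects maps a (non-manipulable) attribute value to the assumed
    effect of the manipulable treatment for units with that attribute.
    """
    rng = random.Random(seed)
    estimates = {}
    for group, effect in true_effects.items():
        treated, control = [], []
        for _ in range(n_per_group):
            is_treated = rng.random() < 0.5  # coin-flip randomization
            outcome = (effect if is_treated else 0.0) + rng.gauss(0.0, 1.0)
            (treated if is_treated else control).append(outcome)
        estimates[group] = mean(treated) - mean(control)
    return estimates


if __name__ == "__main__":
    est = conditional_ates({"men": 2.0, "women": 1.0})
    print("ATE among men:  ", round(est["men"], 3))
    print("ATE among women:", round(est["women"], 3))
    print("difference in conditional effects:",
          round(est["men"] - est["women"], 3))
```

Each within-group estimate is a well-defined causal effect because the treatment was randomized; the difference between them is a descriptive comparison across attribute values, not itself the effect of any manipulation.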

So, is this quantity (the difference in conditional causal effects) of interest to applied researchers in the social sciences? I would argue that it is, if for nothing else than giving us a more nuanced view of the consequences of something that we can manipulate. Is it a Rubin-Holland causal effect? No, but that is a problem only to the extent that we privilege "causal" over other useful forms of inference.

Posted by Mike Kellermann at 11:00 PM

October 4, 2006

Causation and Manipulation III: Let’s Be Specific

Felix Elwert

Two recent posts by Jim and Jens ponder the holy grail of manipulability via the exchange between Holland and Heckman. Can non-manipulable things like gender or race cause things in the potential outcomes framework?

Holland (1986) says no because it’s hard to conceive of changing the unchangeable. Fair enough. But this argument has been carried too far in some quarters and not far enough in others. Here’s why:

Invoking Holland, some population scientists now go so far as to claim that we can’t conceive of things like marriage or divorce as causes because the decision to marry or divorce is beyond the direct control of an experimenter. Please. At most we need some exogeneity, a little speck of indifference, a tipping point to make them amenable to coherent causal thinking (and estimation). Heckman goes even farther than this, and he is right: the issue is not whether I, personally, can wreck all marriages in my study, but whether we can coherently conceive of a counterfactual world where things are different as a matter of theoretical speculation ("mental act"). In this, however, even Heckman seems to yield: A minimum requirement for thinking about counterfactual worlds would appear to be the possibility of conceiving of these worlds in a coherent fashion. And this, I believe, is the underlying unease of the statisticians whom Heckman criticizes: whether one can even coherently imagine counterfactual worlds in which gender is changed.

On the other hand, social scientists love to talk about the effects of gender and race, which – pace Michael Jackson and Deirdre McCloskey – are really hard to think of as manipulable, ceteris paribus. What Holland’s dictum contributes in this respect is the entirely appropriate call for getting the question straight.* For what most of these studies look for is evidence of discrimination. Thinking about discrimination within the potential outcomes framework makes it clear that the issue really isn’t whether we can manipulate the race or gender of a specific person, but rather whether we can manipulate the perception of the person’s race or gender in the eyes of the discriminator. Cases in point: Goldin and Rouse’s study on discrimination in symphony orchestras, where the gender of applicants was obscured (i.e. perceptions manipulated) by staging auditions behind an opaque gauze barrier. Similarly, Grogger and Ridgeway’s paper in the latest issue of JASA uses natural variation in the perceptibility of driver’s skin color (dusk, the veil of darkness) to test for racial profiling in traffic controls. In either case, the causal question was not what would happen if we changed the musician/driver from female/black to male/white, but what would happen if we could change knowledge/perception of race and gender.

In other words, there are important causal questions to be asked about race and gender, but these questions don’t necessarily require the manipulability of race and gender. Not even within the potential outcomes framework of causality.


* My pet peeve: Much of social science is so busy providing answers that it forgets to ask well-formulated questions.

Posted by Felix Elwert at 11:00 PM

October 3, 2006

Causation and Manipulation II: The Causal Effect of Gender?

Jens Hainmueller

In a recent post, Jim Greiner asked whether we adhere to the principle of "no causation without manipulation." This principle, if true, raises the question of whether it makes sense to talk about the causal effect of gender.

The Rubin/Holland position on this is clear: it makes no sense to talk about the causal effect of gender because the manipulation, and thus the counterfactual, one has in mind (a sex-transformation surgery?) is ill-defined. One can ask related questions, for example by sending employers resumes with randomly assigned female and male names and seeing whether one gender is more likely to be invited to a job interview, but it makes no sense to think about a causal effect of gender per se.

The contrasting view is presented by one of their main foils, James Heckman, who writes in a recent paper (Andrew Gelman also had a blog post on this): "Holland claims that there can be no causal effect of gender on earnings. Why? Because we cannot randomly assign gender. This confused statement conflates the act of definition of the causal effect (a purely mental act) with empirical difficulties in estimating it. This type of reasoning is prevalent in statistics. As another example of the same point, Rubin (1978, p. 39) denies that it is possible to define a causal effect of sex on intelligence because a randomization cannot in principle be performed. In this and many other passages in the statistics literature, a causal effect is defined by a randomization. Issues of definition and identification are confused. [...] the act of definition is logically separate from the acts of identification and inference." Heckman sees this as a "view among statisticians that gives rise to the myth that causality can only be determined by randomization, and that glorifies randomization as the ‘‘gold standard’’ of causal inference."

So what do you make of this? Does it make sense to think about a causal effect of gender or not? Does it make sense to try to estimate it, i.e. interpret a gender gap in wages as causal (balance on all confounders except gender)? How about the causal effect of race, etc.? Just to be precise here, notice that Rubin/Holland admit that "even though it may not make much sense to talk about the 'causal' effect of a person being a white student versus being a black student, it can be interesting to compare whites and blacks with similar background characteristics to see if there are differences" in some outcome of interest.

Posted by Jens Hainmueller at 10:00 PM

October 2, 2006

Applied Statistics – Subharup Guha & Louise Ryan

This week the Applied Statistics Workshop will present a talk by Subharup Guha, Post-Doctoral Research Fellow in the Harvard School of Public Health Department of Biostatistics, and Louise Ryan, Henry Pickering Walcott Professor of Biostatistics in the Harvard School of Public Health and Department of Biostatistical Science at the Dana-Farber Cancer Institute.

Before coming to Harvard, Dr. Guha received his Ph.D. in Statistics at Ohio State University. Dr. Guha’s publications appear in Environmental and Ecological Statistics, Journal of the American Statistical Association, Journal of Computational and Graphical Statistics and the Journal of the Royal Statistical Society. His research interests include Bayesian modeling, computational biology, MCMC simulation, Semiparametric Bayesian methods, Spatio-temporal models and survival analysis.

Professor Ryan earned her Ph.D. in Statistics from Harvard University, and has been a member of the Department of Biostatistics since then. She has received numerous honors and distinctions during that time, including the Spiegelman Award from the American Public Health Association, and was named Mosteller Statistician of the Year. She has published extensively in Biometrics, Journal of the American Statistical Association, Journal of Clinical Oncology, and the New England Journal of Medicine. Her research interests focus on statistical methods related to environmental risk assessment for cancer, developmental and reproductive toxicity and other non-cancer endpoints such as respiratory disease, with a special interest in the analysis of multiple outcomes as they occur in these applied settings.

Dr. Guha and Professor Ryan will present a talk entitled "Gauss-Seidel Estimation of Generalized Linear Mixed Models with Application to Poisson Modeling of Spatially Varying Disease Rates." The paper that accompanies the talk is available from the course website. The presentation will be at noon on Wednesday, October 4th, in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.

Posted by Eleanor Neff Powell at 12:02 PM

It Takes Two (Non-Motown Version)

The New York Times recently published an obituary for David Lykken, who was a pioneer of twin studies. His “Minnesota Twin Studies” suggested the importance of genetic factors in life outcomes. But his work with twins also spurred empirical research in many fields, not just genetics – and for good reason.

The idea of using twins for social science studies is very appealing: some twins are genetically identical, and also grow up in the same family and environment. So from a statistical perspective, comparing outcomes such as earnings between pairs of twins is like having a “perfect match." This idea made the rounds in many fields, such as labor economics. By using the argument that all unobserved characteristics (e.g. “genetic ability”) should be equal and can thus be differenced away, twin studies were used to estimate the returns to education – the effect of education on wages.

Alas, there are potential problems with using twin data. For example, measurement error in a difference estimation can lead to severe attenuation bias precisely because twins are so similar. If there is little variation in educational attainment, even small measurement errors can strongly affect the estimate. Researchers have been ingenious about this (e.g. by instrumenting one person’s education with the level that her twin reported, as in Ashenfelter and Krueger). While this may reduce the attenuation bias, it can magnify the omitted variables bias which motivated the use of twins in the first place. Because there are only small differences in schooling, small unobserved differences in ability can lead to a large bias. The culprits can be details such as differences in birth weight (Rosenzweig and Wolpin have a great discussion of such factors). In addition, twins who participate in such studies are a selected group: they are getting along well enough to participate, and many of them get recruited at “twin events.” But not all twins party in Twinsburg, Ohio.
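The attenuation point can be checked with a quick simulation. This is a stylized sketch with invented parameters, not a replication of any of the studies mentioned: schooling has a large shared family component and a small individual component, reported schooling adds classical measurement error, and ability is left out entirely so that measurement error is the only source of bias. The true return to schooling is set to 1.0.

```python
import random


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_twins(n_pairs=5000, beta=1.0, sd_family=2.0,
                   sd_individual=0.5, sd_error=0.5, seed=0):
    """Return (cross-sectional slope, within-twin-difference slope)."""
    rng = random.Random(seed)
    reported_all, wage_all = [], []
    diff_reported, diff_wage = [], []
    for _ in range(n_pairs):
        family = rng.gauss(0.0, sd_family)  # shared by both twins
        pair = []
        for _ in range(2):
            schooling = family + rng.gauss(0.0, sd_individual)
            reported = schooling + rng.gauss(0.0, sd_error)  # mismeasured
            wage = beta * schooling + rng.gauss(0.0, 1.0)
            pair.append((reported, wage))
            reported_all.append(reported)
            wage_all.append(wage)
        diff_reported.append(pair[0][0] - pair[1][0])
        diff_wage.append(pair[0][1] - pair[1][1])
    return (ols_slope(reported_all, wage_all),
            ols_slope(diff_reported, diff_wage))


if __name__ == "__main__":
    cross, within = simulate_twins()
    print("cross-sectional estimate:", round(cross, 3))
    print("within-twin estimate:    ", round(within, 3))
```

With these made-up variances the cross-sectional estimate is only mildly attenuated, while the within-twin estimate is roughly halved: differencing removes most of the true variation in schooling but none of the measurement error.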

Of course, none of this is to belittle the contribution of Dr. Lykken, who besides helping to start this flurry of work was also a major contributor to happiness research.

Posted by Sebastian Bauhoff at 1:10 AM

September 29, 2006

Political Statistics Blogs

Mike Kellermann

With the 2006 election coming up soon, here are a couple of blogs that might appeal to both the political junkie and the methods geek in all of us. Political Arithmetik, a blog by Charles Franklin from Wisconsin, is full of cool graphs that illustrate the power of simple visualization and non-parametric techniques, something that we spend a lot of time talking about in the introductory methods courses in the Gov Department. (On a side note, I think that plots like this of presidential approval poll results that you find on his site and others have to be one of the best tools for illustrating sampling variability to students who are new to statistics.) Professor Franklin also contributes to another good polling blog, Mystery Pollster, run by pollster Mark Blumenthal. It just moved to a new site, which now has lots of state-level polling data for upcoming races. All in all, plenty of good stuff to distract you from the "serious" work of arguing about causal inference, etc.

Posted by Mike Kellermann at 11:00 PM

September 28, 2006

Causation and Manipulation

Jim Greiner

In a 1986 JASA article, Paul Holland reported that he and Don Rubin had once made up the motto, “NO CAUSATION WITHOUT MANIPULATION.” The idea is that even in an observational study, causal inference cannot proceed unless and until the quantitative analyst identifies an intervention that hypothetically could be implemented (although Professor Holland accepts the idea that the manipulation may never be carried out for physical or ethical reasons). The idea of studying the causal effect of things that we as human beings could never influence is incoherent because such things could never be the subject of a randomized experiment.

My question: do we really adhere to this principle? Take the one causal link established via observational studies that pretty much everyone (even Professor Freedman, see below) agrees on: smoking causes lung cancer. Has anyone ever bothered to imagine what manipulation to make people smoke is contemplated? Aren’t we pretty sure it wouldn’t matter how we intervened, i.e., however it happens that people smoke, those who smoke get lung cancer at a higher rate? (It might matter what they smoke, how much they smoke, perhaps even where and when, but what got them started and what keeps them at it?) If folks agree with me on this, what’s left of Professor Holland’s maxim?

Paul W. Holland, Statistics and Causal Inference, 81 J. Am. Stat. Ass’n 945, 959 (1986)

David Freedman, From Association to Causation: Some Remarks on the History of Statistics, 14 Stat. Sci. 243, 253 (1999)

Posted by James Greiner at 11:00 PM

September 27, 2006

Mind the Coding

Here's something new to pick at, in addition to methods problems: coding issues. A recent Science (August 18, 2006, pages 979-982) article by Bruce Dohrenwend and colleagues reported on revised estimates of post-traumatic stress disorders of Vietnam veterans. See here for an NYT article. The new study indicates that some 18.7% of Vietnam veterans developed diagnosable post-traumatic stress, compared with earlier estimates of 30.9%. The difference comes mainly from using revised measures of diagnosis and exposure to combat for a subset of the individuals covered in the original data source, the 1988 National Vietnam Veterans' Readjustment Study (NVVRS). The authors added military records to come up with the new measures.

Given the political and financial importance (the military has a budget for mental health), this is quite a difference. One critical issue pointed out by the Science article is that the original study did not adequately control for veterans who had been diagnosed with mental health problems before being sent to combat. Just looking at the overall rates after combat is not a great study design. But this also makes me wonder about how the data was collected in the first place. Maybe the most disabled veterans didn’t reply to the survey, or were in such a state of illness that they couldn’t (or had died of related illnesses). The NVVRS is supposedly representative but this would be an interesting point to examine.

This article also illustrates how important the data, measures and codings are in social science research these days. It seems that taking these issues more seriously should be part of the academic and policy process just like replication should be (see here and here for some discussion of this issue). While study and sample design are under much scrutiny these days, there are still few discussions about the sensitivity to coding and data. Given the difference they can make, this should change.

Posted by Sebastian Bauhoff at 11:00 PM

September 26, 2006

Publication bias, really?!?

Amy Perfors

I'm a little late into the game with this, but it's interesting enough that I'll post anyway. Several folks have commented on this paper by Gerber and Malhotra (which they linked to) about publication bias in political science. G&M looked at how many articles were published with significant (p<0.05) vs. non-significant results, and found -- not surprisingly -- that there were more papers with significant results than would be predicted by chance; and, secondly, that many of the significant results were suspiciously close to 0.05.

I guess this is indeed "publication bias" in the sense of "there is something causing articles with different statistical significance to be published differentially." But I just can't see this as something to be worried about. Why?

Well, first of all, there's plenty of good reason to be wary of publishing null results. I can't speak for political science, but in psychology, a result can be non-significant for many many more boring reasons than that there is genuinely no effect. (And I can't imagine why this would be different in poli sci). For instance, suppose you want to prove that there is no relation between 12-month-olds' abilities in task A and task B. It's not sufficient to show a null result. Maybe your sample size wasn't large enough. Maybe you're not actually succeeding in measuring their abilities in either or both of the tasks (this is notoriously difficult with babies, but it's no picnic with adults either). Maybe A and B are related, but the relation is mediated by some other factor that you happen to have controlled for. Et cetera. Now, this is not to say that no null results are meaningful or that null results should never be published, but a researcher -- quite rightly -- needs to do a lot more work to make it pass the smell test. And so it's a good thing, not a bad thing, that there are fewer null results published.

Secondly, I'm not even worried about the large number of studies that are just over significance. Maybe I'm young and naive, but I think it's probably less an indication of fudging data than a reflection of (quite reasonable) resource allocation. Take those same 12-month-old babies. If I get significant results with N=12, then I'm not going to run more babies in order to get more significant results. Since, rightly or wrongly, the gold standard is the p<0.05 value (which is another debate entirely), it makes little sense to waste time and other resources running superfluous subjects. Similarly, if I've run, say, 16 babies and my result is almost p<0.05, I'm not going to stop; I'll run 4 more. Obviously there is an upper limit on the number of subjects, but -- given the essential arbitrariness of the 0.05 value -- I can't see this as a bad thing either.


Posted by Amy Perfors at 11:00 PM

Applied Statistics – Ben Hansen

This week the Applied Statistics Workshop will present a talk by Ben Hansen, Assistant Professor of Statistics at the University of Michigan. Professor Hansen graduated from Harvard College, magna cum laude, with a degree in Mathematics and Philosophy. He went on to win a Fulbright Fellowship to study philosophy at the University of Oslo, Norway, after which he earned his Ph.D. in Logic and Methodology of Science at the University of California, Berkeley.

Professor Hansen’s primary research interests involve causal inference in comparative studies, particularly observational studies in the social sciences. His publications appear in the Journal of Computational and Graphical Statistics, Bernoulli, Journal of the American Statistical Association, and Statistics and Probability Letters. He is currently working on providing methods for statistical adjustment that enable researchers to mount focused, specific analogies of their observational studies to randomized experiments.

Professor Hansen will present a talk entitled "Covariate balance in simple, stratified and clustered comparative studies." The working paper that accompanies the talk is available from the course website. The presentation will be at noon on Wednesday, September 27, in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.

If you missed the workshop’s first meeting, you should check out the abstract of Jake Bowers’ talk, “Fixing Broken Experiments: A Proposal to Bolster the Case for Ignorability Using Subclassification and Full Matching”.

Posted by Eleanor Neff Powell at 4:34 PM

September 25, 2006

Freeloading: Economics meets Poly Sci, imitates Art

In the next few weeks, the number of articles posted to this site is set to increase, partly because school's back in session, and partly because we've recruited some new authors for the committee. This is a good thing in general. However, I know I work best on a deadline, so it happens that I tend to post when the flow is slower, and less when a lot of articles are being posted by the other authors.

To bring this back to the realm of science: Am I taking the position of an economic free rider (or "freeloader", if you prefer), if I tend to post less frequently than other authors, or is someone in my position merely acting as a balancing actor, keeping stability?

As for the "art", I doubt that this observation is opera-worthy, but it does tend to happen a lot in social situations I've seen. It certainly happened in an early episode of Seinfeld, where George wanted to split a cab but not have to pay for it because they "were going that way anyway".

Posted by Andrew C. Thomas at 2:00 PM

September 19, 2006

Dirichlet Simplex Exploration: Or, My Prayer Answered

Andrew Fernandes, a fellow Canadian expat and PhD student at NC State, responded to my earlier request for advice on exploring a Dirichlet-type simplex.

Among other places, the idea is presented in the Wikipedia entry for Simplex. He suggests perturbing the cumulative sums, then putting the perturbed sums back in order to draw a time-reversible proposal. This has the advantage of not sending too many parameters below zero - a maximum of one - as opposed to an equal perturbation of each parameter, and not pinning a high-valued parameter in place with a standard Dirichlet proposal.
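For anyone who wants to try the idea, here is a rough sketch of one way to read it. This is my interpretation, not Andrew Fernandes's code: the function name, the Gaussian jitter, the `step` size and the choice to reject any proposal whose cut points leave (0, 1) are all assumptions made for illustration.

```python
import random


def cumsum_proposal(theta, step=0.02, rng=random):
    """Propose a new point on the simplex by jittering the cumulative sums.

    theta is a list of K >= 2 positive values that sum to one. Returns the
    proposed list, or None when a cut point leaves (0, 1) and the caller
    should simply reject the move (an assumption of this sketch).
    """
    # Interior cumulative sums c_1 < ... < c_{K-1}; the endpoints 0 and 1 stay fixed.
    cuts = []
    running = 0.0
    for value in theta[:-1]:
        running += value
        cuts.append(running)
    # Symmetric jitter on every cut point, then put the cut points back in order.
    jittered = sorted(c + rng.gauss(0.0, step) for c in cuts)
    if jittered[0] <= 0.0 or jittered[-1] >= 1.0:
        return None
    # Differences between consecutive cut points are the new parameter values.
    edges = [0.0] + jittered + [1.0]
    return [hi - lo for lo, hi in zip(edges, edges[1:])]


if __name__ == "__main__":
    print(cumsum_proposal([0.5, 0.3, 0.2], rng=random.Random(0)))
```

Because the jitter is symmetric and the sort only relabels the cut points, the move from the old point to the new one should be as likely as the reverse move, so a plain Metropolis acceptance ratio ought to apply. Treat that as something to verify for your own model rather than a guarantee.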

Posted by Andrew C. Thomas at 11:32 PM

September 15, 2006

What are your thoughts?

Amy Perfors

Ah, the beginning of fall term -- bringing with it the first anniversary of this blog (yay!), a return to our daily posting schedule (starting soon), and a question for you, our readers:

Do you have any feedback for us? Specifically, are there topics, issues, or themes you would like us to cover more (or less) than we do? Would you like to see more discussion of specific content and papers? More posts on higher-level, recurring issues in each of our fields (or across fields)? More musings about teaching, academia, or the sociology of science? Obviously the main factor in what we write about comes down to our whims and interests, but it's always nice to write things that people actually want to read.

In my specific case, I know that I try not to blog about many cognitive science and psychology topics that I think about if they aren't directly related to statistics or statistical methods in some way: I fear that it wouldn't be of interest to readers who come here for a blog about "Social Science Statistics". However, maybe I've been needlessly restrictive...?

So, readers, what are your opinions?

Posted by Amy Perfors at 11:21 AM

September 10, 2006

The Tenth Dimension

The semester is about to start, which means it is math camp time at the Government Department. The very first topic is usually an introduction to dimensions, starting from R1 (lines), to R2 (planes), to R3 (3D planes), to R4 (3D plane plus time). Here is a nice flash animation (click on “imagining ten dimensions” on the left) that takes you a step further, from zero to ten dimensions in less than 5 minutes (including cool visual and acoustic effects). It doesn’t necessarily become more graspable as you ascend ... :-)

Posted by Jens Hainmueller at 8:26 AM

August 7, 2006

In Which Drew Suggests that Scientists Avoid The Word 'Regression'

I've spent quite a bit of time in the last few weeks - probably too much - thinking about the term 'regression' and its use in statistics, and why I find it so dislikeable. I sincerely doubt any campaign I try to start will have any real effect, so let me lay down the reasons why I feel we as scientists should refer to linear modelling as just such, and not as 'regression'.

One reason is that the word only has a tenuous connection to the actual algorithm - the other is that it far too often implies a causal relationship where none exists.

As the story goes, Francis Galton took a group of tall men and measured the height of their sons, and found that on average, the sons as a group were shorter than their fathers. Drawing on similar work he had done with pea plants, he described this phenomenon as "regression to the mean," recognizing that the sample of fathers was nonrandom. A "regression coefficient" then described the estimated parameter which, when multiplied by the predictor, would produce the mean value.

I can only surmise that "determining regression coefficients through minimizing the least squares difference" was too verbose for Galton and his buddies, and "regression analysis" stuck. Now we have lawyerese terms like "multiple regression analysis," which really should read "multiple parameter regression analysis" since we're only running one algorithm, but we appear stuck with it.

So what's the big deal? Nomenclature isn't an easy business, and two extra syllables in "linear model" might slow things down. But aside from my gripe with using "regress" as a transitive verb (the Latin student in me cringing), even the most generous interpretation of the word's root, and the experiments that revealed it, yield to trouble.

"Regression" literally means "the act of going back." If we accept this definition in this context, we have to have something to which we can return. Clearly, this implies discovering the mean - but chronologically, it can only mean discovering the cause, that which came before.

Linear modelling makes no explicit assumptions about cause and effect, a major source of headache in our discipline, but the word itself, consciously or otherwise, binds us to this fact.

The remedy to this is not simple; after all, I'm talking about trying to break the correlation-is-causation fallacy through words, which is both a difficult task and the sort of behaviour that will keep people from sitting with you at lunch. But we can improve things slowly and subtly in this fashion:

1) If you are confident that your analysis will unveil a causal relationship, say so. Call it "regression-to-cause", or "causal linear model", or something like that.

2) If you're not so sure, call it a (generalized) linear model, or a lin-mod, or a least-squares, or another term that does not necessarily imply causation. Resist the temptation to fall back to the word "regression" until a long time has passed.

This doesn't have to be a completely nerve-wracking exercise; just use a strike-through when necessary, to show that the term regression (struck through) 'linear model' is better suited to describe what we're trying to build here.

Posted by Andrew C. Thomas at 11:30 PM

July 30, 2006

C. Frederick Mosteller, 1916-2006

C. Frederick Mosteller, the first chairman of the Statistics Department at Harvard, passed away last week at the age of 89. He served as chair of the Statistics Department from 1957 to 1969, and later chaired the departments of Biostatistics and Health Policy and Management at the Harvard School of Public Health. His obituary in the New York Times mentions his work reviewing the performance of pollsters in the Dewey-Truman election of 1948 and his explanation of the Red Sox's inexplicable loss in the 1946 World Series ("There should be no confusion here between the 'winning team' and the 'better team'"), but doesn't say that he took a leave of absence in the early sixties to record a lecture series for NBC. According to one history of the Statistics Department, 75,000 students took the course for credit and 1.20 million (give or take) watched the lectures on television. Imagine doing that today....

Posted by Mike Kellermann at 8:55 PM

July 1, 2006

Cross-Pollination

A letter I wrote in reaction to the Texas decision made it into today's New York Times. It even has a nice little plug for IQSS at the bottom.

Posted by Andrew C. Thomas at 2:52 PM

June 28, 2006

News In Texas Redistricting

The noted Texas redistricting case, known politically for its role involving Tom DeLay and academically for the amici curiae brief filed by Gary King, Andrew Gelman, Jonathan Katz and Bernard Grofman, was ruled on by the Supreme Court today. In short, the party-based gerrymandering was not a problem - nor was the fact that it was done off the traditional calendar - but the composition of districts involving the dilution of Hispanic voters was. The court has ordered that those irregular districts be redrawn. (Note: only the composition of District 23 was considered to be in violation of the Voting Rights Act, but you obviously can't redraw one district without affecting another.)

The nature of this ruling should surprise no one involved in Jim Greiner's Quantitative Social Science and Expert Witnesses class.

A good summary is here.

Posted by Andrew C. Thomas at 11:23 AM

June 18, 2006

Statz Rap

Amy Perfors

A friend emailed this to me; apparently the teaching assistants at the University of Oregon have creative as well as statistical talents. It's pretty funny. Perhaps every intro to statistics class could begin with a showing... video here

Posted by Amy Perfors at 4:21 PM

May 25, 2006

Dirichlet Spaces and Metropolis Traces

Drew Thomas

A problem I've had come up again and again is the ability to explore a space bound by a Dirichlet prior with a Metropolis-type algorithm. I've yet to find a satisfactory answer and I'm hoping someone else will have some insight.

The research question I have deals with allocating patients to hospitals, considering the effect of the number of beds - one example of the "supply-induced demand" question. (The analysis is being done under Prof. Erol Pekoz, who's visiting Harvard Stats this year.) Conjugate priors for this problem have eluded me, and so the quantity of interest, the probability that a patient will be sent to a particular hospital for inpatient care, is being inferred through a Metropolis algorithm.

Here's the thing: there are at most 64 different hospitals to which a patient can be assigned. Even after assuming that if a hospital has not yet received a patient from a particular area they won't ever, the number of hospitals is extreme.

One suggested proposal has been a Dirichlet distribution with parameters equal to the current values, times a constant. That way the expected value of the proposal will be the same as the last draw. However, when the number is too low, the smallest dimensions will have parameter value less than 1, which leads to trouble, as the value will tend to zero; when it's too high, the biggest parameters don't move at all, and the effect of moving some of its mass is lost.
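For concreteness, here is a minimal sketch of that proposal using only the Python standard library. The function names and the `concentration` argument (the constant) are mine, and the Dirichlet draw is built from normalised gamma variates, which is the standard construction.

```python
import math
import random


def dirichlet_draw(alpha, rng=random):
    """Draw from Dirichlet(alpha) by normalising independent gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def log_dirichlet_pdf(x, alpha):
    """Log density of Dirichlet(alpha) at x, needed for the Hastings correction."""
    return (math.lgamma(sum(alpha))
            - sum(math.lgamma(a) for a in alpha)
            + sum((a - 1.0) * math.log(v) for a, v in zip(alpha, x)))


def scaled_dirichlet_proposal(theta, concentration, rng=random):
    """Propose from Dirichlet(concentration * theta), whose mean is theta."""
    alpha = [concentration * t for t in theta]
    return dirichlet_draw(alpha, rng), alpha


if __name__ == "__main__":
    current = [0.70, 0.20, 0.08, 0.02]            # hypothetical current state
    proposal, alpha = scaled_dirichlet_proposal(current, 20.0, random.Random(1))
    print("proposal:", [round(p, 3) for p in proposal])
    # Components whose parameter falls below 1 are the ones that drift to zero.
    print("parameters below 1:", [a for a in alpha if a < 1.0])
```

Since this proposal is not symmetric, the acceptance ratio needs the forward and reverse proposal densities, which is what `log_dirichlet_pdf` is for. The last line of the example shows the problem described above: with a constant of 20, the smallest component already has a parameter below 1.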

I've considered implementing a parallel-tempering method but I'd like to keep it cleaner. Does anyone have a better method that's reasonably quick to run, rather than monkeying with each parameter individually?

Posted by Andrew C. Thomas at 6:00 AM

May 23, 2006

Inheritance Laws

Jason Anastasopoulos, guest blogger

Question: Many political philosophers who focused on questions of property (including Plato) believed that equality of conditions was necessary for the development of a virtuous citizenry and virtuous leaders. The key to creating this equality of conditions, they argued, was the implementation of strict inheritance laws limiting the transfer of wealth from one generation to the next. Does anyone know of any quantitative models or empirical studies that examine the interaction between social stratification and inheritance laws? If you do, email me at janastasopoulo@law.harvard.edu.

Posted by James Greiner at 6:00 AM

May 20, 2006

It's summer!

It's the end of the term for both Harvard and MIT... so in view of the fact that we on the authors committee are about to embark on summers of tireless dedication to research while scattered to the far reaches of the planet, posting to this blog will be reduced until fall.

A special thanks to the loyal readers and commenters of this blog -- you folks have made this year a really rewarding experience for us. We won't stop posting, so we do hope you still stop by occasionally and are still with us when we resume on a full schedule at the end of the summer.

Posted by Amy Perfors at 2:09 PM

May 18, 2006

Reactions To The Virginity Pledge Study

Drew Thomas

Harvard School of Public Health doctoral candidate Janet Rosenbaum has been in the news lately, following the publication of her study of virginity pledges in the American Journal of Public Health, as well as her recent IQSS seminar. (Full disclosure: Janet is a friend of mine. I'll address her as Ms. Rosenbaum for this entry.) Since it's certainly a hot topic, it's no surprise how much attention her findings have received; first, the big news agencies picked it up, then the blogosphere took their shift - mainly over the "controversy" resulting from the study. (See pandagon.net for an example.)

But I think the more relevant part of the whole debate is the point Ms. Rosenbaum was trying to make about surveys and self-reporting: we use these data to make broad, sweeping conclusions on social phenomena, and while they are the best we have, they aren't up to the best standard we could achieve.

Posted by Andrew C. Thomas at 6:04 AM

May 16, 2006

Communication, Anyone?

Jim Greiner

The course I co-taught this semester on Quantitative Social Science & Law has come to an end. There were a lot of “lessons learned” in the class, both for the students (at least, I hope so) and for the teaching staff (more definitely). Of all of these lessons, one sticks in my head: we ought to focus on teaching quantitative students how to communicate with folks without formal statistical training.

Some quantitative folks will graduate and spend the rest of their lives talking to and working with only quantitative people. Some, but not many. Most of us will be talking and working with people who have little or no statistics classes under their belts. But do we ever teach the communication skills needed to function effectively with the proles? I’ve never seen or heard of a class that focuses on these skills. Not one. Does that strike anyone besides me as odd?

Posted by James Greiner at 6:00 AM

May 15, 2006

A bit of google frivolity

Amy Perfors

Google has just come out with a new tool, Google Trends, which compares the frequencies of different web searches and thus provides hours of entertainment to language and statistics geeks like myself. In honor of that -- and, okay, because it's nearing the end of the term and I'm just in the mood -- here's a rather frivolous post dedicated to the tireless folks at Google, for entertaining me today.

Some observations:

1) One thing that is interesting (though in hindsight not surprising) is that Google Trends seems like a decent tool for identifying how marked a form is. The basic idea is that a default term is unmarked (and often unsaid), but the marked term must be used in order to communicate that concept. For instance, in many sociological domains, "female" is marked more than "male" is -- hence people refer to "female Presidents" a lot more than they refer to "male Presidents", even though there are many more of the latter: the adjective "male" is unnecessary because it just feels redundant. In contrast, you much more often say "male nurse" than "female nurse", because masculinity is marked in the nursing context.

Anyway, I noticed that for many sets of words, the term that is searched for most often is the marked term, even though the unmarked term probably occurs numerically more often. For instance, Blacks, whites indicates far more queries for "blacks"; Gay, straight many more for "gay"; and Rich, poor, middle class the most for rich, followed by poor, and least of all middle class.

I have two hypotheses to explain this: (a) people generally google for information, and seek information about what they don't know; so it's not surprising that more people don't know about the non-default, usually numerically smaller item. And, (b) since unmarked means it doesn't need to be used, it's not really a surprise that people don't use it. Still, I thought it was interesting. And clearly this phenomenon, if real at all, is at most only one of many factors affecting query frequency: for instance, Christian, atheist, muslim indicates far more hits for "Christian", and those in very Christian areas.

2) Another observation: the first five numbers seem to have search frequencies that drop by half with each consecutive number. Is this interesting for cognitive reasons? I have no idea.

3) As far as I can tell, no search occurs more often than "sex." If anyone can find something with greater frequency, I'd love to hear it. On the one hand, it may say good things for our species that "love" beats out "hate", but that may just mean more people are searching for love than hate. And "war" beats out "peace", sadly enough.

4) "Hate bush" peaked right before the 2004 election, "love bush" about six months before that. I have no idea what that's all about.

5) It's amazing to me how many people clearly must use incredibly unspecific searches: who searches for "one"? Or "book"? Though there is no indication of numbers (a y axis on these graphs would be incredibly handy), a search needs a minimum number of queries otherwise it won't show up, so somebody must be making these.

6) In conclusion, I note that Harvard has more queries than MIT. Does this mean that MIT is the "default"? Or that Harvard generates more interest? Since I'm an MIT student but writing for a Harvard blog, I plead conflict of interest...

Posted by Amy Perfors at 6:00 AM

May 12, 2006

Statistical Discrimination in Health Care

Sebastian Bauhoff

This blog has frequently written about testing for discrimination (see for example here, here, and here). This is also a hot issue in health care. In health care there is a case for 'rational' discrimination, where physicians respond to clinical uncertainty by relying on priors about the prevalence of diseases across racial groups (for example).

A paper by Balsa, McGuire and Meredith in 2005 lays out a very nice application of Bayes Rule to look into this question. The Institute of Medicine suggests that there are three types of discrimination: simple prejudice, stereotyping, and statistical discrimination where docs use probability theory to overcome uncertainty. The latter occurs when the uncertainty about a patient's condition leads the physician to treat her differently from similar people of a different race.

The paper uses Bayes Rule to conceptualize the decision a doctor faces when she hears symptom reports from a patient and has to decide whether the patient really has the disease:

Pr(Disease | Symptom) = Pr(Symptom | Disease) * Pr(Disease) / Pr(Symptom)

A doc would decide differently if she believed that disease prevalence differs across racial groups (which affects Pr(Disease)), or if diagnostic signals are noisier from some groups (which changes Pr(Symptom | Disease) and therefore Pr(Symptom)), maybe because the quality of doctor-patient communication differs across races.
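A small worked example may help make the two channels concrete. The numbers are invented for illustration and are not taken from the paper; `sensitivity` stands for Pr(Symptom | Disease) and `false_positive_rate` for Pr(Symptom | No disease), both names being my own.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Pr(Disease | Symptom) from Bayes Rule.

    Pr(Symptom) is expanded with the law of total probability:
    Pr(S) = Pr(S | D) Pr(D) + Pr(S | not D) (1 - Pr(D)).
    """
    p_symptom = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / p_symptom


if __name__ == "__main__":
    # Channel 1: same signal quality, different believed prevalence.
    print(round(posterior(0.10, 0.8, 0.2), 3))   # 0.308
    print(round(posterior(0.20, 0.8, 0.2), 3))   # 0.5
    # Channel 2: same prevalence, noisier symptom reports.
    print(round(posterior(0.20, 0.6, 0.4), 3))   # 0.273
```

With identical symptom reports, doubling the believed prevalence moves the posterior from about 0.31 to 0.50, while a noisier signal pulls the posterior back toward the prior. Either mechanism can produce different treatment decisions without any prejudice on the doctor's part.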

The authors test their model on diagnosis data from family physicians and internists, and find that sensible priors about disease prevalence could explain racial differences in the diagnosis of hypertension and diabetes. For the diagnosis of depression there is evidence that differences in doctors' decisions may be driven by different communication patterns between white docs and their white vs. minority patients.

Obviously prejudice and stereotyping are different from statistical discrimination, and have quite different policy implications. This is a really nice paper that makes these distinctions clear as well as nicely using Bayes Rule to conceptualize the issues. The general idea might also apply to other issues of policy including police stop and search.

Posted by Sebastian Bauhoff at 6:00 AM

May 10, 2006

An Intoxicating Story

From Wikipedia's entry on the t-test:

The t-statistic was invented by William Sealy Gosset for cheaply monitoring the quality of beer brews. "Student" was his pen name. Gosset was statistician for Guinness brewery in Dublin, Ireland, hired due to Claude Guinness's innovative policy of recruiting the best graduates from Oxford and Cambridge for applying biochemistry and statistics to Guinness's industrial processes. Gosset published the t-test in Biometrika in 1908, but was forced to use a pen name by his employer who regarded the fact that they were using statistics as a trade secret. In fact, Gosset's identity was unknown not only to fellow statisticians but to his employer - the company insisted on the pseudonym so that it could turn a blind eye to the breach of its rules. Today, it is more generally applied to the confidence that can be placed in judgements made from small samples.

I like the way they think.

Posted by Andrew C. Thomas at 6:00 AM

May 9, 2006

Running Statistics On Multiple Processors

Jens Hainmueller

You just bought a state-of-the-art PC with dual processors and yet your model still runs forever? Well, your statistical software is probably not multi-threading, meaning that despite the fact that your computer actually has two processors, the whole computation runs only on one of them. Don’t believe me? Well check your CPU usage, it's probably stuck at 50 percent (or less).

You might ask why statistical software doesn't use both processors simultaneously. The fact is that splitting up computations across two or even more processors is a non-trivial issue that many software packages do not accomplish yet. This may change in the near future, however, as the advent of dual processors for regular PCs exerts increasing pressure on statistical software producers to allow for multi-threading.


In fact, Stata Corp. has recently released Stata/MP, a new version of Stata/SE that runs on multiprocessor computers. Their website proclaims that: "Stata/MP provides the most extensive support for multiple-processor computers and dual-core computers of any statistics and data-management package." So this bodes well for Stata users.

What’s in it for Non-Stataists? People at S-PLUS told me yesterday that there is "currently an enhancement request to add functionality to S-PLUS that will allow it to use multiple processors. This request has been submitted to our developers for further review." Unfortunately no further information is available at this point.

In my favourite software R, there are efforts to get concurrency and potentially parallelism. Currently, the SNOW package allows for simple parallel computing.

It will be interesting to see how other statistical software producers like SAS, LIMDEP, etc. will react to this trend toward dual processing. Does anybody have more information about this issue?

Posted by Jens Hainmueller at 6:00 AM

May 8, 2006

Coarsened at Random

Jim Greiner

I’m the “teaching fellow” (the “teaching assistant” everywhere but Harvard, which has to have its lovely little quirks: “Spring” semester beginning in February, anyone?) for a course in missing data this semester, and in a recent lecture, an interesting concept came up: coarsened at random.

Suppose you have a dataset in which you know or suspect that some of your data values are rounded. For example, ages of youngsters might be given to the nearest year or half-year. Or perhaps in a survey, you’ve gotten some respondents’ incomes only within certain ranges. Then the data has been “coarsened” in the sense that you know that the true value is within a certain range, but you don’t know where within that range.

Happily, techniques have been developed to handle this sort of situation. In many ways, the game is the same as that in the missing data setting. Just as in the missing data context good things happen when the data are missing at random, so also in this context good things happen when the data are coarsened at random. Thus, to begin with, you have to consider (among other things) whether you think the probability that you will observe only a range of possible data values, as opposed to the specific true value, depends on something you don’t observe (such as that specific true value). A good place to start on all this is Heitjan & Rubin, “Inference from Coarse Data via Multiple Imputation with Application to Age Heaping,” 85 JASA 410 (1990).

One final point: you might think that coarsened at random is a specific case of missing at random. Actually, it’s the other way around. Data can be (and often is assumed to be) coarsened at random but not missing at random. Think and you’ll see why.

Posted by James Greiner at 6:00 AM

May 4, 2006

Detecting Attempted Election Theft

At the Midwest conference last week I saw Walter Mebane presenting his new paper entitled "Detecting Attempted Election Theft: Vote Counts, Voting Machines and Benford's Law." The paper is really fun to read and contains many cool ideas about how to statistically detect vote fraud in situations where only minimal information is available.

With the advent of voting machines that replace traditional paper ballots, physically verifying vote counts becomes impossible. As Walter Mebane puts it: "To steal an election it is no longer necessary to toss boxes of ballots in the river, stuff the boxes with thousands of phony ballots, or hire vagrants to cast repeated illicit votes. All that may be needed nowadays is access to an input port and a few lines of computer code."

How does Mebane utilize statistical tools to detect voting irregularities? He relies on two sets of tests:

The first test relies on Benford’s Law. The idea here is that if individual votes originate from a mix of at least two statistical distributions there may be a rationale to expect that the distribution of the digits in reported vote counts should satisfy the second digit Benford's law. Walter provides simulations showing that the Benford's Law test is sensitive to some kinds of manipulation of vote counts but not others.
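For readers who have not met the second-digit version of Benford's law, the sketch below computes the expected frequency of each second digit and a Pearson chi-square distance between those frequencies and the second digits of a list of vote counts. It is only an illustration of the general idea, not a reproduction of the tests in the paper; the function names and the precinct totals are made up.

```python
import math
from collections import Counter

def second_digit_benford():
    """Expected frequency of each second digit 0-9 under Benford's law."""
    return [
        sum(math.log10(1 + 1 / (10 * first + second)) for first in range(1, 10))
        for second in range(10)
    ]

def second_digit_chi2(vote_counts):
    """Pearson chi-square distance between observed second digits and Benford."""
    expected = second_digit_benford()
    # Counts below 10 have no second digit, so they are skipped.
    digits = [int(str(count)[1]) for count in vote_counts if count >= 10]
    observed = Counter(digits)
    n = len(digits)
    return sum(
        (observed.get(d, 0) - n * expected[d]) ** 2 / (n * expected[d])
        for d in range(10)
    )

# Made-up precinct totals, for illustration only.
example_counts = [312, 187, 1045, 96, 233, 418, 507, 129, 874, 265]
print([round(p, 4) for p in second_digit_benford()])
print(round(second_digit_chi2(example_counts), 2))
```

With ten digit categories the statistic would usually be compared to a chi-square distribution with 9 degrees of freedom, though a real analysis would need far more than ten precincts.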

The second set of tests relies on randomization. The idea is based on the assumption that in each precinct (especially crowded ones) voters may be randomly and independently assigned to each machine used in the precinct. The test involves checking whether the split of the votes is the same on all the machines used in a precinct. If some of the machines were indeed hacked, the distribution of the votes among candidates would differ on the affected machines. Mebane tests these expectations against data from three Florida counties with very interesting findings.
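One simple way to check whether the split of votes is the same on all machines in a precinct is a chi-square test of homogeneity on a machines-by-candidates table. The sketch below is an assumption about how such a check could look, not necessarily the test used in the paper, and the vote table is invented.

```python
def homogeneity_chi2(table):
    """Chi-square statistic and degrees of freedom for a machines-by-candidates table.

    table -- list of rows, one per machine, each holding vote counts per candidate
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if every machine had the same candidate split.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical precinct: three machines, two candidates.
precinct = [
    [120, 80],
    [115, 85],
    [70, 130],  # this machine looks different from the other two
]
print(homogeneity_chi2(precinct))
```

A large statistic relative to the chi-square distribution with the returned degrees of freedom suggests the machines do not share a common split, which is only evidence of tampering if voters really were assigned to machines at random.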

In general, the paper was very well received by the audience. Some attendees raised concerns about the randomization test, arguing that voters may not be randomly assigned to voting machines (for example old voters may be more likely to go to the first machine in line etc.). The discussant, Jonathan Wand, raised the idea of actually using random assignment of voters to voting machines as an administrative tool to facilitate fraud detection ex post. He also proposed to use sampling techniques to make recounts more feasible (but that would require voting machines that do leave a paper trail). Another comment alluded to the fact that if somebody smart wants to steal an election, he or she might anticipate some of Walter's tests and design manipulations so that they satisfy the test.

Overall, my impression is that although his research is admittedly still at an early stage, Mebane is onto something very cool here and I am eager to see the redrafts and more results in the future. This is a very important topic given that more and more voting machines will be used in the future. Everybody interested in vote fraud should read this paper.

Posted by Jens Hainmueller at 6:00 AM

May 3, 2006

Sensitivity Analysis

Felix Elwert

Observational studies, however well done, remain exposed to the problem of unobserved confounding. In response, methods of formal sensitivity analysis are growing in popularity these days (see Jens's post on a related issue here.)

Rosenbaum and Rubin's basic idea is to hypothesize the existence of an unobserved covariate, U, and then to recompute point-estimates and p-values for a range of associations between this unobserved covariate and, in turn, the treatment T and the outcome Y. If moderate associations (= moderate confounding) change the inference about the effect of the treatment on the outcome we question the robustness of our conclusions.

But how to assess whether the critical association between U, T, and Y that would invalidate the standard results is large in substantive terms?

One popular strategy compares this critical association to the strength of the association between T, Y, and an important known (and observed) confounder. For example, one might say that the amount of unobserved confounding it would take to invalidate the conclusions of a study on the effect of sibship size on educational achievement would have to be at least as large as the amount of confounding generated by omitting parental education from the model.

This is indeed the strategy used in a few studies. But what if U should be taken to stand not for a single but for a whole collection of unobserved confounders? Clearly, it then is no longer credible to compare the critical association of U with the amount of confounding created by a single known covariate. Better to compare it to a larger set of observed confounders. But with larger sets of included variables, we have the problem of interactions between them, and of suppressing and amplifying relationships. In short, gauging the critical association of U with T and Y in substantive terms will become a whole lot less intuitive.

(FYI, Robins and his colleagues in epi have proposed an alternative method of sensitivity analysis, which hasn’t found followers in the social sciences yet, to my knowledge. I’m currently working on implementing their method in one of my projects.)

Posted by Felix Elwert at 6:03 AM

May 2, 2006

The 80% Rule, Part II

Jim Greiner

In my last post, I introduced the so-called 80% rule in employment discrimination cases. In this post, I discuss some of the reasons why it stinks. For the sake of illustration, pretend I’m interested in knowing whether a company discriminates against women in hiring, and recall that the 80% rule says that I should see whether the hiring rate for women is less than 80% of the hiring rate for men.

The first issue with the 80% rule is that it means different things depending on the hiring rate for men. Suppose 90% of men who apply for a job are hired. 80% of 90% is 72%, so the difference between men and women is 18 percentage points; that might seem like something worth investigating. But suppose the company at issue is very exclusive, so it only hires 5% of men who apply; 80% of 5% is 4%. Is this 1 percentage point difference something to worry about? Perhaps it is, perhaps it isn’t, but it sure is different from the 18-point difference in the previous example.

A second issue with the 80% rule is that it varies depending on whether we’re talking about success rates or failure rates ("success" means getting hired here, "failure" means not getting hired). In one of my hypotheticals above, a company hired 90% of the men who applied. So the success rate is 90%, and the failure rate is 10%. If we apply the 80% rule to the success rate, we should worry if the hiring rate for women is below 72%. But what happens if we apply the reasoning of the rule to the failure rate for men? By analogy to the 80% rule’s reasoning, it seems like we should worry if the failure rate for women is greater than, say, 120% (100% + 20%), or perhaps 125% (1/.8 = 1.25), of the failure rate for men. Take the 125% for the sake of argument, and return to our hypothetical in which the failure rate for men was 10%. 125% of 10% is 12.5%, so we should worry if the failure rate for women is greater than 12.5%. But a failure rate for women of greater than 12.5% corresponds to a success rate for women of less than 87.5%, and we just said that we’re supposed to worry if the success rate was less than 72%. So which is it, 87.5% or 72%?
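The arithmetic in the two paragraphs above is easy to check. The sketch below computes the women's hiring rate at which the rule would start to "worry", once from the success rate and once from the failure rate, using the 125% (1/.8) version of the failure-rate analogy. The function names are my own and the rates are the hypotheticals from the text.

```python
def threshold_from_success(male_hire_rate, ratio=0.8):
    """Women's hiring rate below which the 80% rule flags a problem."""
    return ratio * male_hire_rate

def threshold_from_failure(male_hire_rate, ratio=0.8):
    """Same threshold, but applying the rule's logic to failure rates."""
    male_failure_rate = 1 - male_hire_rate
    # Worry if the women's failure rate exceeds 1/ratio times the men's.
    female_failure_limit = male_failure_rate / ratio
    return 1 - female_failure_limit

for male_rate in (0.90, 0.05):
    by_success = threshold_from_success(male_rate)
    by_failure = threshold_from_failure(male_rate)
    gap = male_rate - by_success
    print(
        f"men hired: {male_rate:.1%}  "
        f"success-rate threshold: {by_success:.1%}  "
        f"failure-rate threshold: {by_failure:.1%}  "
        f"gap in points: {100 * gap:.1f}"
    )
```

For the 90% company the two thresholds are 72% and 87.5%, the inconsistency described above; for the 5% company the gap between men and the success-rate threshold shrinks to a single percentage point.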

A final criticism (for the purposes of this post; I could go on and on here): is any of this significant in the statistical sense? P-values, anyone? Significance tests? Posterior intervals? Anything at all?

Next time you hear someone applying the 80% rule in an employment discrimination case, invite the speaker to join us on this planet.

Posted by James Greiner at 6:00 AM

May 1, 2006

Applied Statistics - Ben Hansen

This week the Applied Statistics Workshop will present a talk by Ben Hansen, Assistant Professor of Statistics at the University of Michigan. Professor Hansen received his Ph.D. from the University of California at Berkeley and was an NSF Post-doctoral Fellow before joining the faculty at Michigan in 2003. His research interests include optimal matching and stratification, causal inference in comparative studies, and length-optimal exact confidence procedures. His work has appeared in JASA and the Journal of Computational and Graphical Statistics, among others.

Professor Hansen will present a talk entitled "Matching with prognosis scores: A new method of adjustment for comparative studies." The corresponding paper is available from the course website. The presentation will be at noon on Wednesday, May 3 in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided. An abstract of the paper appears on the jump:

In one common route to causal inferences from observational data, the statistician builds a model to predict membership in treatment and control groups from pre-treatment variables, X, in order to obtain propensity scores, reductions f(X) of the covariate possessing certain favorable properties. The prediction of outcomes as a function of covariates, using control observations only, produces an alternate score, the prognosis score, with favorable properties of its own. As with propensity scores, stratification on the prognosis score brings to uncontrolled studies a concrete and desirable form of balance, a balance that is more familiar as an objective of experimental control. In parallel with the propensity score, prognosis scores reduce the dimension of the covariate; yet causal inferences conditional on them are as valid as are inferences conditional only on the unreduced covariate. They suggest themselves in certain studies for which propensity score adjustment is infeasible. Other settings call for a combination of prognosis and propensity scores; as compared to propensity scores alone, the pairing can be expected to reduce both the variance and bias of estimated treatment effects. Why have methodologists largely ignored the prognosis score, at a time of increasing popularity for propensity scores? The answer lies in part with older literature, in which a similar, somewhat atheoretical concept was first celebrated and then found to be flawed. Prognosis scores avoid this flaw, as emerges from theory presented herein.

Posted by Mike Kellermann at 9:43 AM

April 28, 2006

Human irrationality?

Amy Perfors

I've posted before about the "irrational" reasoning people use in some contexts, and how it might stem from applying cognitive heuristics to situations they were not evolved to cover. Lest we fall into the depths of despair about human irrationality, I thought I'd talk about another view on this issue, this time showing that people may be less irrational than the gloom-and-doom views might suggest.

In Simple heuristics that make us smart, Gigerenzer et al. argue that, contrary to popular belief, many of the cognitive heuristics people use are actually very rational given the constraints on memory and time that we have to face. One strand of their research suggests that people are far better at reasoning about probabilities when they are presented as natural frequencies rather than as percentages or probabilities (the format most studies use). Thus, for instance, if people see pictures of, say, 100 cars, 90 of which are blue, they are more likely not to "forget" this base rate than if they are just told that 90% of cars are blue.

A recent paper in the journal Cognition (vol 98, 287-308) expands on this theme. Zhu & Gigerenzer found that children steadily gain in the ability to reason about probabilities, as long as the information is presented using natural frequencies. Children were told a story such as the following:

Pingping goes to a small village to ask for directions. In this village, the probability that the person he meets will lie is 10%. If a person lies, the probability that he/she has a red nose is 80%. If a person doesn't lie, the probability that he/she also has a red nose is 10%. Imagine that Pingping meets someone in the village with a red nose. What is the probability that the person will lie?

Another version of the story gave natural frequencies instead of conditional probabilities, for instance "of the 10 people who lie, 8 have a red nose." None of the fourth-grade through sixth-grade children could answer the conditional probability question correctly, but sixth graders approached the performance of adult controls for the equivalent natural frequency question: 53% of them matched the correct Bayesian posterior probability. The fact that none of the kids could handle the probability question is not surprising -- they had not yet been taught the mathematical concepts of probability and percentage. What is interesting is that, even without being taught, they were capable of reasoning "the Bayesian way" about as well as adults do.
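For anyone who wants to check the "correct Bayesian posterior" in the Pingping story, here is a small sketch that works it out both ways. The probability version uses the numbers given in the story; the frequency version assumes a village of 100 people (my assumption, chosen so that 10 lie and 8 of those have red noses, as in the quoted phrasing).

```python
from fractions import Fraction

def posterior_from_probabilities(p_lie, p_red_given_lie, p_red_given_truth):
    """P(lies | red nose) via Bayes' rule."""
    numerator = p_lie * p_red_given_lie
    denominator = numerator + (1 - p_lie) * p_red_given_truth
    return numerator / denominator

def posterior_from_frequencies(liars_with_red_nose, truthful_with_red_nose):
    """Same quantity, counted directly among the red-nosed villagers."""
    red_nosed = liars_with_red_nose + truthful_with_red_nose
    return Fraction(liars_with_red_nose, red_nosed)

# Conditional-probability version, with the numbers from the story.
by_bayes = posterior_from_probabilities(
    Fraction(10, 100), Fraction(80, 100), Fraction(10, 100)
)

# Natural-frequency version: of 100 villagers, 10 lie and 8 of them have
# red noses; 90 do not lie and 9 of them have red noses.
by_counting = posterior_from_frequencies(8, 9)

print(by_bayes, by_counting, float(by_bayes))
```

Both routes give 8 out of 17, a little under one half, but the counting version needs nothing more than a single division, which may be part of why it is so much easier.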

The most interesting part of this research, for me, is less about the question of whether people "are Bayesian" (whatever that means), but rather that it highlights a very important message: representation matters. When information is presented using a representation that is natural, we find it a lot easier to reason about it correctly. I wonder how many of our apparent limitations reveal less about problems with our reasoning, and more about the choice of representation or the nature of the task.

Posted by Amy Perfors at 6:00 AM

April 27, 2006

999

Felix Elwert

Why did people code their missing values as real numbers such as 999 in the old days? Why not “.” from the get go? And why do many big, federally funded surveys insist on numerical missing values to this day?

Don’t we all have stories about how funny missing value codes (“-8”) got people in trouble (think The Bell Curve)? Are there any anecdotes where people got in trouble for mistaking “.” for a legitimate observation?
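As a reminder of how easily a numeric missing value code can cause trouble, here is a toy sketch in which 999 is left in an age variable by mistake. The data and the set of codes are invented for illustration.

```python
def naive_mean(values):
    """Mean that treats every number, including missing codes, as real data."""
    return sum(values) / len(values)

def mean_ignoring_codes(values, missing_codes):
    """Mean computed only over values that are not missing value codes."""
    kept = [v for v in values if v not in missing_codes]
    return sum(kept) / len(kept)

# Hypothetical ages, with 999 standing in for "missing".
ages = [34, 51, 999, 27, 999, 42]

print(naive_mean(ages))                  # badly inflated by the 999s
print(mean_ignoring_codes(ages, {999}))  # mean of the four real ages
```

A non-numeric marker fails loudly in most arithmetic, whereas 999 quietly passes for a very old respondent.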

Posted by Felix Elwert at 6:00 AM

April 26, 2006

Inauthentic Paper Detector

Sebastian Bauhoff

A group at the Indiana School of Informatics has developed software to detect whether a document is "human written and authentic or not." The idea was inspired by the successful attempt of MIT students in 2004 to place a computer-generated document at a conference (see here). Their program collated random fragments of computer science speak into a short paper that was accepted at a major conference without revision. (That program is online and you can generate your own paper, though unfortunately it only writes computer science articles.)

The new tool lets users paste pieces of text and then assesses whether the content is likely to be authentic or just gibberish. The program tries to identify human-style writing that is characterized by certain repetition patterns and apparently does rather well. It is not clear whether this works well for social science type articles. The first paragraphs of a recent health economics article (to remain unnamed) only have a 35.5% chance of being authentic. Hmm...

So is this just a joke or useful programming? The authors say it could be used to differentiate whether a website is authentic or bogus, or to identify different types of texts (articles vs blogs, for example). I wonder what the algorithms behind such technology are, and whether this will lead to an arms race between fakers and detectors? If one of them can recognize a human-written text could this be used by the faking software?

If further tweaked, could this have an application in the social sciences? Maybe we could use the faking software to search existing papers, collate them smartly and use that to identify patterns and get new ideas? Maybe everyone should run their papers through detector software before submitting them to a journal or presenting at a workshop? And students watch out! No more random collating at 3am to meet the next-day deadline!

PS: this blog entry has been classified as "inauthentic with a 26.3% chance of being an authentic text"...

Posted by Sebastian Bauhoff at 2:41 PM

Data from China: Land of Plenty? (II)

Sebastian Bauhoff

In the last entry I wrote that China is the new exciting trend for researchers interested in development issues. There are now a number of surveys available, and it is getting easier to obtain data. (For a short list, see here.) However, two key issues are still pervasive: language difficulties and little sharing of experiences.

While some Chinese surveys are available in English translation, it is still difficult to fully understand their context. China is a very interesting yet peculiar place. It clearly helps to work with someone who speaks (and reads!) the language, though you might still miss some unexpected information -- and there are many things that can be surprising.

More annoying however is the lack of sharing of information and data. This problem has two associated parts. For the existing data, people seem to struggle with similar problems but don't provide their solutions to others. In the case of the China Health and Nutrition Survey for example, numerous papers have been written on different aspects and the key variables are being cleaned over and over. Apart from the time that goes into that, this can lead to different results.

Another lack of sharing is with regard to existing data or ongoing surveys. There are now a lot of people who either have collected or are currently collecting data in China. But it is rather difficult even