8 June 2009
The idea of the Hawthorne effect is that individuals may change their behavior because they are being studied, in addition to any real effects of the intervention. Steven Levitt and John List have revisited the illumination experiments at the Hawthorne plant that gave the effect its name, and argue that many of the original conclusions do not hold up to scrutiny. There's an Economist article on the paper, but its subtitle, "Being watched may not affect behavior, after all," is misleading: even if the earlier research was sloppy by today's standards, its contribution was to point out the possibility of these effects. A better subtitle could have commended replication as an important part of the scientific method.
Levitt and List (2009) "Was there Really a Hawthorne Effect at the Hawthorne Plant? An Analysis of the Original Illumination Experiments" NBER Working Paper #15016, http://www.nber.org/papers/w15016.pdf
The Economist (June 4, 2009) "Light work: Questioning the Hawthorne effect", http://www.economist.com/finance/displayStory.cfm?story_id=13788427
Posted by Sebastian Bauhoff at 8:53 AM | Comments (2)
23 May 2009
How would you expect swine flu cases to be distributed by weekday? More specifically, would you expect more cases on weekdays or on weekends? My first reaction is that there will be more cases when there are more social gatherings.
Following this logic, one reason to expect more cases on weekdays is that the susceptible population has more contact with the infected population on weekdays, through school, work, and so on. In addition, people are more likely to travel on weekends, which means more contact with infected people while traveling; but because it takes around two days for the virus to take effect, those cases will not be identified until a couple of days later. Could the pattern also be due to the fact that fewer clinical services are provided on weekends and that people are less likely to visit clinics then?
Here is an old graph I made based on the swine flu updates (04/26/2009 - 05/21/2009) published on WHO's website. To be more accurate, I drew a new graph using the number of confirmed new cases rather than the cumulative number of confirmed cases.
As the reporting times for confirmed new cases vary (some at 18:00, others at 6:00, etc.), I kept only records between 05/01 and 05/21 whose reporting time is 6:00 and redrew the graph. Weekdays are redefined as well: for example, Thursday 6:00 to Friday 6:00 is defined as Thursday. Can you still see any salient patterns, like the difference between weekdays and weekends? Why is Friday so spiky this time?
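For readers who want to try the same bookkeeping, here is a minimal Python sketch of the two steps described above: keep only the 6:00 reports, difference the cumulative counts to get new cases, and label each 6:00-to-6:00 window by the day on which it starts. The counts below are made up for illustration (they are not WHO figures), and the function name `new_cases_by_weekday` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Hypothetical cumulative confirmed-case counts keyed by report time.
# These numbers are invented for illustration only.
reports = [
    (datetime(2009, 5, 1, 6, 0), 100),
    (datetime(2009, 5, 2, 6, 0), 130),
    (datetime(2009, 5, 2, 18, 0), 150),  # dropped: not a 6:00 report
    (datetime(2009, 5, 3, 6, 0), 170),
    (datetime(2009, 5, 4, 6, 0), 175),
]

def new_cases_by_weekday(reports, hour=6):
    """Sum new cases per redefined weekday (window = 6:00 to next-day 6:00)."""
    kept = sorted((t, c) for t, c in reports
                  if t.hour == hour and t.minute == 0)
    totals = defaultdict(int)
    for (t_prev, c_prev), (t_next, c_next) in zip(kept, kept[1:]):
        # Skip gaps: only consecutive daily reports define a one-day window.
        if t_next - t_prev != timedelta(days=1):
            continue
        # The window is labeled by the day on which it starts.
        totals[DAY_NAMES[t_prev.weekday()]] += c_next - c_prev
    return dict(totals)

result = new_cases_by_weekday(reports)
print(result)
```

Labeling the window by its start day follows the "Thursday 6:00 to Friday 6:00 is Thursday" convention; labeling by the end day would shift every bar by one weekday, which matters when looking for a weekend effect.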

Posted by Weihua An at 12:38 AM | Comments (7)
20 May 2009
A few weeks ago my friend Aaron Swartz wrote a blog post called Transparency is Bunk, arguing that government transparency websites don't do what they're supposed to do, and in fact have perversely negative effects: they bury the real story in oceans of pro forma data, encourage apathy by revealing "the mindnumbing universality of waste and corruption," and lull activists into a false sense of accomplishment when occasional successes occur. It's a particularly powerful piece because Aaron uses the platform to announce he's done working on his own government website (watchdog.net). The piece appears to have caused a stir in government transparency/hacktivist circles, where Aaron is pretty well known.
On looking back at it I think Aaron's argument (or rant, more accurately) against the transparency websites is not very strong: indeed, data overload, apathy, and complacency are all dangers these efforts face, but that shouldn't have come as a surprise.
I had two other responses particular to my perch in academia. First, there is some good academic research showing that transparency works, although the evidence on the effectiveness of grassroots watchdogging is less strong than the evidence on auditing from e.g. Ferraz and Finan on Brazilian municipalities (QJE 2008, working paper version) or Olken's field experiment in Indonesia (JPE 2008, working paper version).
Second, my own work and that of other academics benefits greatly from these websites. I have a project right now on the investments of members of Congress (joint with Jens Hainmueller) that is possible only because of websites like the ones Aaron criticizes. I think this paper is going to be useful in helping watchdogs understand how Congress invests and whether additional regulation is a good idea, and it would be a shame if the funders of these sites listened to Aaron and shut them down.
I do agree with Aaron that professional analysis may be better than grassroots citizen activism in achieving the goals of the transparency movement. Sticking with the example of the congressional stock trading data I'm using, I suspect that not much useful watchdogging came out of the web interface that OpenSecrets provides for the investments data. While it may be interesting to know that Nancy Pelosi owns stock in company X, it's hard to get any sense of patterns of ownership across members and how these investments relate to political relationships between members and companies. This is what our paper tries to do. It takes a ton of work, far more than an investigative journalist is going to put in. We do it because of the rewards of publishing interesting and original and careful research, and also because these transparency websites have made it much more manageable: OpenSecrets.org converted the scanned disclosure forms into a database and provided lobbying data, and GovTrack provided committee and bill info, as well as an API linking company addresses to congressional districts. Most of the excitement around these websites seems to center on grassroots citizen activism, but their value to academic research (and the value of academic research to government accountability) should not be overlooked.
Posted by Andy Eggers at 10:53 PM | Comments (1)
15 May 2009
In late April, the FAA released the long-awaited bird strike data. It shows every recorded bird strike since 2000.
Since then, we've had a whole host of stories bemoaning the doubling in bird strikes since 2000, complete with worrisome bar graphs and explanations from experts.
Continue reading "The Most Bird-prone: Frontier, United, Hawaiian"
Posted by Kevin Bartz at 3:06 PM | Comments (8)
13 May 2009
The social sciences have long embraced the idea of text-as-data, but in recent years, increasing numbers of quantitative researchers are investigating how to have computers find answers to questions in texts. This task might appear easy at the outset (as it apparently did to early researchers in machine translation), but, as we know, natural languages are incredibly complicated. In most of the applications in social science, analysts end up making a "bag of words" assumption--the relevant parts of a document are the actual words, not their order (this is not an unreasonable assumption, especially given the questions being asked).
When I see applications of natural language processing (NLP) in the social sciences, my thoughts turn very quickly to its future. Computers are making strides at being able to understand, in some sense, what they are reading. Two recent articles, however, give a good overview of the challenges that NLP faces. First, John Seabrook of the New Yorker had an article last summer, Hello, Hal, which states the problem clearly:
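To make the "bag of words" assumption concrete, here is a small Python sketch. The two sentences and the `bag_of_words` helper are my own illustration, not taken from any particular paper: once word order is thrown away, two sentences with opposite meanings become the same data point.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count them."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

a = bag_of_words("The senator criticized the governor.")
b = bag_of_words("The governor criticized the senator.")

print(a)
print(a == b)  # word order is discarded, so the two bags are identical
```

Whether that loss matters depends on the question: for sorting documents by topic it is usually harmless, while for working out who criticized whom it clearly is not.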
The first attempts at speech recognition were made in the nineteen-fifties and sixties, when the A.I. pioneers tried to simulate the way the human mind apprehends language. But where do you start? Even a simple concept like "yes" might be expressed in dozens of different ways--including "yes," "ya," "yup," "yeah," "yeayuh," "yeppers," "yessirree," "aye, aye," "mmmhmm," "uh-huh," "sure," "totally," "certainly," "indeed," "affirmative," "fine," "definitely," "you bet," "you betcha," "no problemo," and "okeydoke"--and what's the rule in that?
The article is mostly about speech recognition, but it definitely hits the main points about why human-generated language is so tricky. The second article, in the New York Times recently, is a short story about Watson, the computer that IBM is creating to compete on Jeopardy! IBM is trying to push the field of Question Answering quite a bit forward with this challenge. The goal is to create a computer that you can ask a natural language question and get the correct answer. A quick story in the article indicates that they may have a bit to go:
In a demonstration match here at the I.B.M. laboratory against two researchers recently, Watson appeared to be both aggressive and competent, but also made the occasional puzzling blunder. For example, given the statement, "Bordered by Syria and Israel, this small country is only 135 miles long and 35 miles wide," Watson beat its human competitors by quickly answering, "What is Lebanon?"
Moments later, however, the program stumbled when it decided it had high confidence that a "sheet" was a fruit.
This whole Watson enterprise makes me wonder if there are applications for this kind of technology within the social sciences. Would this only be useful as a research aid, or are there empirical discoveries to be made with this? I suppose it comes down to this: if a computer could answer your question, what would you ask?
Posted by Matt Blackwell at 9:43 AM | Comments (3)
10 May 2009
David Brooks wrote a column a few days ago about Will Dobbie and Roland Fryer's working paper on the Harlem Children's Zone charter schools, which the authors report dramatically improved students' performance, particularly in math. Looking at the paper, I think it's a nice example of constructing multiple comparisons to assess the effect of a program and to do some disentangling of mechanisms.
The program they study is enrollment in one of the Promise Academy elementary and middle schools in Harlem Children's Zone, a set of schools that offer extended class days, provide incentives for teacher and student performance, and emphasize a "culture of achievement." The authors assess the schools' effect on student test scores by comparing the performance of students at the schools with that of other students. The bulk of the paper is concerned with how to define this group of comparable non-students, and the authors pursue two strategies:
The estimated effect is very large, particularly for math. Because the estimates are based on comparisons both within the HCZ and between HCZ and non-HCZ students, the authors can speculate somewhat about the relative importance of the schooling itself vs other aspects of the HCZ: they tentatively suggest that the community aspects must not drive the results, because non-HCZ students did just as well.
Overall I thought it was a nice example of careful comparisons in a non-experimental situation providing useful knowledge. I don't really know this literature, but it seems like a case where good work could have a big impact.
Posted by Andy Eggers at 9:40 AM | Comments (0)
28 April 2009
Please join us for our final meeting tomorrow when Thomas Yee, Department of Statistics, University of Auckland, will present "Vector generalized linear and additive models". Thomas provided the following abstract for his talk:
The class of vector generalized linear and additive models (VGLMs/VGAMs) is very large and contains many statistical models relevant to quantitative social science, e.g., univariate and multivariate distributions, categorical data analysis, time series, survival analysis, extreme value analysis, mixture models, correlated binary data, and nonlinear regression. I'll first give an overview of the framework and tie it in with practice using my VGAM package for R. Then we will focus on two sub-topics: reduced-rank VGLMs and quantile/expectile regression. The former handles the reduced-rank multinomial logit model (aka stereotype model) and Goodman's row-column association model; applications of the latter are becoming popular in many fields. Time allowing, I'll describe several sub-projects I'm currently working on since arriving at IQSS.
The Applied Statistics workshop meets each Wednesday in room K-354, CGIS-Knafel (1737 Cambridge St). We start at 12 noon with a light lunch, with presentations beginning around 12:15, and we usually wrap up around 1:30 pm.
Posted by Justin Grimmer at 3:12 PM
24 April 2009
Today's New York Times contains an article reporting that the United States is "losing" the war on cancer. This, of course, made me think of a comic from PhD Comics yesterday.
More seriously, it also brought to mind a paper that Bo Honore and Adriana Lleras-Muney wrote several years ago exploring the war on cancer and how we measure success. The challenge in deciding if we are winning is that everyone must die of something; thus, when the Times reports large declines in cardiovascular mortality, it follows that some other cause of death must be increasing to (partially) compensate for the decrease in cardiovascular mortality. Honore and Lleras-Muney consider the challenges in estimating competing risks models when the causes of death are not independent. In simple mortality models they find that there has been no improvement in cancer mortality from the war on cancer, but those models require assuming that individuals who die from non-cancer causes are censored in the analysis and that these survival times are independent.
Their more sophisticated analysis recognizes that there are many risk factors for cancer mortality that are also risk factors for other causes of death, so the assumption that the mortality risks are independent is clearly violated. They then present two alternatives to generate more plausible estimates of the effect of the war on cancer on cancer mortality. The first method is to simply look at upper and lower bounds on survival (Manski bounds) and the second method entails making some assumptions about how the distributions of survival times for different causes of death are related. The bounding method leads to quite wide bounds and they state "that it is not possible to make any statement about whether survival from cancer increased or decreased during this period [1970-2000]."
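To see why assumption-free bounds come out so wide, here is a rough Python sketch on simulated data. It is not the authors' estimator: the hazard rates, the shared "frailty" term that makes the two causes dependent, and the function names are all my own assumptions for illustration. The logic is only that anyone still alive at time t has not died of cancer by t (lower bound on latent cancer survival), and anyone observed dying of cancer by t certainly has (upper bound).

```python
import random

def simulate(n=5000, seed=0):
    """Draw latent (cancer, cardiovascular) death times that share a risk factor."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        frailty = rng.expovariate(1.0)  # shared risk factor -> dependent risks
        t_cancer = rng.expovariate(0.02 * (0.5 + frailty))
        t_cvd = rng.expovariate(0.03 * (0.5 + frailty))
        data.append((t_cancer, t_cvd))
    return data

def bounds_on_cancer_survival(data, t):
    """Worst-case bounds on P(latent cancer time > t) using observables only."""
    n = len(data)
    # Observable: still alive at t, i.e. neither cause has struck yet.
    alive = sum(1 for tc, tv in data if min(tc, tv) > t)
    # Observable: died by t with cancer recorded as the cause.
    died_of_cancer = sum(1 for tc, tv in data if tc <= t and tc <= tv)
    return alive / n, 1 - died_of_cancer / n

def true_cancer_survival(data, t):
    """Known only because this is a simulation with both latent times."""
    return sum(1 for tc, _ in data if tc > t) / len(data)

data = simulate()
lower, upper = bounds_on_cancer_survival(data, t=30)
truth = true_cancer_survival(data, t=30)
print(lower, truth, upper)
```

The gap between the two bounds is exactly the share of people who died of the other cause by time t, so it grows as cardiovascular deaths become more common. Narrowing it requires assumptions about how the two latent survival times are related.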
By assuming that the marginal survival distributions follow a specific functional form, they are able to tighten the bounds considerably and draw some conclusions on the efficacy of the war on cancer. Assuming independence, they find that there has been a small improvement in cancer mortality over the period 1970 to 2000. Assuming some dependence between cardiovascular and cancer mortality, however, provides evidence that the war on cancer had a very large effect on cancer mortality of between 10 and 20%, depending on race and gender. Thus there is reasonable evidence that the war on cancer has not been a failure, but perhaps not a stunning success either. The lesson for social scientists is that every assumption matters: relaxing independence between cardiovascular and cancer mortality dramatically increased the estimated effect of the war on cancer and may even overturn the conclusion in the New York Times article.
Posted by Martin Andersen at 10:34 AM
13 April 2009
Here's a paper for the "high internal, low external validity" file (via Kevin Lewis):
Interracial Workplace Cooperation: Evidence from the NBA
Joseph Price, Lars Lefgren & Henry Tappen
NBER Working Paper, February 2009
Abstract:
Using data from the National Basketball Association (NBA), we examine
whether patterns of workplace cooperation occur disproportionately
among workers of the same race. We find that, holding constant the
composition of teammates on the floor, basketball players are no more
likely to complete an assist to a player of the same race than a
player of a different race. Our confidence interval allows us to
reject even small amounts of same-race bias in passing patterns. Our
findings suggest that high levels of interracial cooperation can occur
in a setting where workers are operating in a highly visible setting
with strong incentives to behave efficiently.
Posted by Andy Eggers at 6:51 PM
12 April 2009
Please join us this Wednesday for the applied statistics workshop when Alberto Abadie, Professor of Public Policy, will present "A General Theory of Matching Estimation", joint work with Guido Imbens. Alberto provided the following abstract for his talk:
Matching methods provide simple and intuitive tools for adjusting the distribution of covariates among samples from different populations. Probably because of their transparency and intuitive appeal, matching methods are widely used in evaluation research to estimate treatment effects when all treatment confounders are observed (Rubin, 1973, 1977; Rosenbaum, 2002). In spite of their popularity, the problem of establishing the large sample distribution of matching estimators remains largely unsolved, with the exception of some special cases (see Abadie and Imbens, 2006). The reason is that matching estimators are non-smooth functionals of the data, which makes their large sample theory particularly challenging. This talk will describe a new general method to establish the large sample distribution of matching estimators. As an example of the applicability of the method, we will describe how to derive the distribution of matching estimators when matching is carried out without replacement, a result previously unavailable in the literature. We will also discuss how to adjust the standard errors for propensity score matching estimators to take into account first step estimation of the propensity score, a result also previously unavailable.
The Applied Statistics Workshop meets each Wednesday at 12 noon in K-354 CGIS-Knafel (1737 Cambridge St). The workshop begins with a light lunch, and presentations usually start around 12:15 and last until about 1:30 pm.
Posted by Justin Grimmer at 7:41 PM