February 22, 2007
Let me follow up on yesterday’s post by Jim Greiner.
Jim’s problem: He’s touring the country touting tools for increased honesty in applied statistical research, only to be asked, effectively, for recommendations about using these tools to cheat more effectively. Yay academic job market.
Jim’s example goes like this: An analyst is asked to model the effect of a treatment, T, on the outcome, Y, while controlling for a bunch of confounders, X. To minimize the potential for data dredging, we give the analyst only the treatment and the observed potential confounders to model the treatment assignment process, but we withhold the outcome data. Only after the analyst announces success in balancing the data (by including X, functions of X, f(X), deleting off-support observations, etc.) would we communicate the outcome data, plug the outcome into the equation, run it once, and be done.
So how can we help Jim help his audience cheat? Let’s make two assumptions (which I’d be willing to defend with my life). First, although the analyst is not given the actual outcome data, the analyst does know what the outcome is (wages, say). Second, the analyst is permitted to drop elements of X from the analysis, based on his or her analytic judgment.
Now let’s cheat. First, select the covariate, C, from the pool of potential confounders, X, believed to correlate most strongly with the outcome, Y. Second, treat C as the outcome and build a model through data dredging to maximize (or minimize, if this is your objective) the “effect” of T on C. Specifically, find the subset of functions of X, S(f(X)), that maximizes the effect of T on C while maintaining balance in S(f(X)). Third, upon receiving the outcome data, just plug them into the model but “forget” to mention that you didn’t include C in the treatment assignment model. If C really correlates strongly with Y then this procedure should lead to an upwardly biased estimate of T on Y.
I fear that this would work well in practice (though one could construct a counterexample). Seems to me, however, that it would be more technically demanding to cheat in this way than to cheat in, say, standard regression analysis.
Posted by Felix Elwert at 6:42 PM
January 25, 2007
Dr. King, Esteemed Faculty, Members of the Advisory Board, My Fellow Stats Brats:
The rite of custom brings us together at a defining hour when decisions are hard and courage is needed. We enter the year 2007 with large endeavors under way and others that are ours to begin. In all of this, much is asked of us. We must have the will to face difficult challenges and determined reviewers, and the wisdom to face them together.
We’re not the first to come here with allegiances divided between structural equation modeling and proper counterfactual reasoning and Bayesian uncertainty in the air. Like many before us, we can work through our difference in difference equations, and we can achieve big things for the scientific community. Our readers don’t much care which department we sit in as long as we are willing to walk across campus when there is work to be done. Our job is to make research better for our readers, and to help them to build CVs of hope and opportunity, and this is the business before us tonight.
A future of hope and opportunity begins with a growing software library, and that is what we have. We’re now in the 19th month of uninterrupted dissertation research by many proud graduate students at IQSS, an effort that has created 1,947 pages of prose and equations, so far. Unemployment is low, ignorance is low, wages are rising. These dissertations are on the move, and our job is to keep it that way, not with more error term jabber but with more attention to potential outcomes and causality.
Next week, I’ll deliver a bound report on the state of my dissertation to the Registrar. Tonight, I want to discuss one statistical reform that deserves to be a priority for this Institute.
In particular, there’s the matter of appealing to causality fraudulently. These appeals are often slipped into manuscripts at the last hour when not even copy editors are watching. In 2005 alone, the number of appeals to causality across journals grew to over 13,000. Reviewers did not vet them. Don did not sign off on them. Yet they are treated as if they have the blessing of Don. The time has come to end this practice. So let us work together to reform the review process — expose every slippage to the light of day and cut the number of unfounded appeals to causality at least in half by the end of this session.
This is a decent and honorable Institute, and resilient too. We’ve been through a lot together. We’ve met challenges and faced dangers, and we know that more lie ahead. Yet we can go forward with confidence, because the stats of our union is strong, our cause in the world is right, and tonight that cause goes on.
God bless.
Posted by Felix Elwert at 5:15 AM
November 3, 2006
Felix Elwert
In September, The Institute of Medicine released its report on “The Future of Drug Safety,” featuring some goodies on the dissemination of research findings.
One of the recommendations echoes one of the favorite hallway complaints at IQSS: that journals are perennially hung up on publishing *** alpha less than 0.05 yay-yay statistically significant results.
Says the Washington Post:
“[According to the report] manufacturers should also be required to register all clinical trials they sponsor in a government-run database to allow patients and physicians to see the outcome of all studies, not just those published in medical journals, the report said. Studies that show positive results for a drug are more likely to be published by journals than negative ones.”
Welcome to the world of publication bias. (The report is yours for a highly significant $44.)
Posted by Felix Elwert at 11:59 AM
October 4, 2006
Felix Elwert
Two recent posts by Jim and Jens ponder the holy grail of manipulability via the exchange between Holland and Heckman. Can non-manipulable things like gender or race cause things in the potential outcomes framework?
Holland (1986) says no because it’s hard to conceive of changing the unchangeable. Fair enough. But this argument has been carried too far in some quarters and not far enough in others. Here’s why:
Invoking Holland, some population scientists now go so far as to claim that we can’t conceive of things like marriage or divorce as causes because the decision to marry or divorce is beyond the direct control of an experimenter. Please. At most we need some exogeneity, a little speck of indifference, a tipping point to make them amenable to coherent causal thinking (and estimation). Heckman goes even farther than this, and he is right: the issue is not whether I, personally, can wreck all marriages in my study, but whether we can coherently conceive of a counterfactual world where things are different as a matter of theoretical speculation ("mental act"). In this, however, even Heckman seems to yield: A minimum requirement for thinking about counterfactual worlds would appear to be the possibility of conceiving of these worlds in a coherent fashion. And this, I believe, is the underlying unease of the statisticians whom Heckman criticizes: whether one can even coherently imagine counterfactual worlds in which gender is changed.
On the other hand, social scientists love to talk about the effects of gender and race, which – pace Michael Jackson and Deirdre McCloskey – are really hard to think of as manipulable, ceteris paribus. What Holland’s dictum contributes in this respect is the entirely appropriate call for getting the question straight.* For what most of these studies look for is evidence of discrimination. Thinking about discrimination within the potential outcomes framework makes it clear that the issue really isn’t whether we can manipulate the race or gender of a specific person, but rather whether we can manipulate the perception of the person’s race or gender in the eyes of the discriminator. Cases in point: Goldin and Rouse’s study on discrimination in symphony orchestras, where the gender of applicants was obscured (i.e. perceptions manipulated) by staging auditions behind an opaque gauze barrier. Similarly, Grogger and Ridgeway’s paper in the latest issue of JASA uses natural variation in the perceptibility of drivers’ skin color (dusk, the veil of darkness) to test for racial profiling in traffic controls. In either case, the causal question was not, What would happen if we changed the musician/driver from female/black to male/white, but, What would happen if we could change knowledge/perception of race and gender.
In other words, there are important causal questions to be asked about race and gender, but these questions don’t necessarily require the manipulability of race and gender. Not even within the potential outcomes framework of causality.
* My pet peeve: Much of social science is so busy providing answers that it forgets to ask well-formulated questions.
Posted by Felix Elwert at 11:00 PM
May 3, 2006
Felix Elwert
Observational studies, however well done, remain exposed to the problem of unobserved confounding. In response, methods of formal sensitivity analysis are growing in popularity these days (see Jens's post on a related issue here.)
Rosenbaum and Rubin's basic idea is to hypothesize the existence of an unobserved covariate, U, and then to recompute point-estimates and p-values for a range of associations between this unobserved covariate and, in turn, the treatment T and the outcome Y. If moderate associations (= moderate confounding) change the inference about the effect of the treatment on the outcome we question the robustness of our conclusions.
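The recomputation step can be illustrated with a deliberately simplified sketch. It is not Rosenbaum and Rubin's actual procedure: it assumes a linear outcome model Y = tau*T + gamma*U + noise with a binary U, handles only the point estimate (no p-values), and every number in it is made up. Under those assumptions the naive difference in means equals tau + gamma * (p_treated - p_control), so the bias can simply be subtracted for each hypothesized pair of associations.

```python
def adjusted_effect(naive, gamma, p_treated, p_control):
    """Treatment effect after removing the bias from a hypothetical binary U.

    Assumes a linear outcome model Y = tau*T + gamma*U + noise, so that the
    naive difference in means equals tau + gamma * (p_treated - p_control).
    """
    return naive - gamma * (p_treated - p_control)


naive_estimate = 2.0  # made-up naive difference in means

# Sweep over hypothetical U-Y associations (gamma) and U-T imbalances
# (difference in the prevalence of U between treated and controls).
for gamma in (1.0, 2.0, 4.0):
    for imbalance in (0.1, 0.25, 0.5):
        tau = adjusted_effect(naive_estimate, gamma, 0.25 + imbalance, 0.25)
        flag = "  <- effect explained away" if tau <= 0 else ""
        print(f"gamma={gamma:.1f} imbalance={imbalance:.2f} adjusted={tau:.2f}{flag}")
```

In this toy grid only the strongest combination (gamma of 4 with a 50-point imbalance) wipes out the estimate; whether such a combination counts as "moderate" is exactly the substantive judgment at issue.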
But how to assess whether the critical association between U, T, and Y that would invalidate the standard results is large in substantive terms?
One popular strategy compares this critical association to the strength of the association between T, Y, and an important known (and observed) confounder. For example, one might say that the amount of unobserved confounding it would take to invalidate the conclusions of a study on the effect of sibship size on educational achievement would have to be at least as large as the amount of confounding generated by omitting parental education from the model.
This is indeed the strategy used in a few studies. But what if U should be taken to stand not for a single but for a whole collection of unobserved confounders? Clearly, it then is no longer credible to compare the critical association of U with the amount of confounding created by a single known covariate. Better to compare it to a larger set of observed confounders. But with larger sets of included variables, we have the problem of interactions between them, and of suppressing and amplifying relationships. In short, gauging the critical association of U with T and Y in substantive terms will become a whole lot less intuitive.
(FYI, Robins and his colleagues in epi have proposed an alternative method of sensitivity analysis, which hasn’t found followers in the social sciences yet, to my knowledge. I’m currently working on implementing their method in one of my projects.)
Posted by Felix Elwert at 6:03 AM
April 27, 2006
Felix Elwert
Why did people code their missing values as real numbers such as 999 in the old days? Why not “.” from the get go? And why do many big, federally funded surveys insist on numerical missing values to this day?
Don’t we all have stories about how funny missing value codes (“-8”) got people in trouble (think The Bell Curve)? Are there any anecdotes where people got in trouble for mistaking “.” for a legitimate observation?
Posted by Felix Elwert at 6:00 AM
April 11, 2006
Felix Elwert
Race is a surprisingly malleable construct, though it’s usually taken as fixed in statistical models. In a recent paper with Nicholas Christakis (Widowhood and Race, American Sociological Review Vol 71(1), 2006) I had to engage changing racial responses head on.
Assorted previous research has shown that people may change their racial self-description over time because they are multiracial, when they marry somebody of a different racial group, or – not to be neglected – because the answer choices in surveys may change over time.
Most people think that unstable or changing racial self-identification is an issue largely confined to a small group of multiracial individuals. This is a country, after all, of the one-drop rule. But research, including our own, shows that that isn’t so.
In a supplementary analysis of the 2001 Census Quality Survey (CQS), we showed that the racial self-identification of “whites” is also surprisingly unstable. The CQS asked more than 50,000 respondents twice within the span of just a few months to identify their own race. Once they were allowed to select only one race, and the other time they were given the option of selecting multiple races (this gets at the difference between the old and the new Census race questions). The answers were then matched to individual responses from the official 2000 Census.
Depending on whether we compared between consecutive responses to the same race question on the Census and the CQS, or between the different questions asked in the two waves of the CQS, and whether we treated “Hispanic” as a category distinct from black and white, the agreement between answers for whites ranged from 95.6 to 97.5 percent. We obtained very similar answers for blacks.
Meaning, between 2 and 5 percent of people who used to identify as white would call themselves either something else or a mixture of races when given the chance. And the percentage of “whites” who will change their racial self-description as a function of question wording is about the same as the percentage of “blacks” who will do likewise.
Posted by Felix Elwert at 6:00 AM
April 5, 2006
Felix Elwert
In a previous post, Mike quoted Alan Greenspan, "I suspect greater payoffs will come from more data than from more technique." Not an uncommon opinion. But there are more and less flattering ways of reading such statements.
For what’s behind the sentiment, I sometimes suspect (I’m not picking fights with the Maestro), is not just the desire for better data but a distrust of advanced statistical methods. There’s this perception that more complicated math necessitates more assumptions, ergo less robust results. By this logic, the simpler the method, the more credible the conclusion. Crosstabs rule, ANOVA passes muster. The truth, of course, is the opposite: simple stats in observational data analysis usually require more assumptions. As we move from crosstabs to OLS to GEE for a given analytical goal we are usually trying to relax assumptions. Tragically, the presence of said assumptions often becomes obvious only after the author points them out. And then it’s open season on the messenger.
I witnessed this sort of thinking recently when I reviewed a paper for a leading sociological journal. The author pointed out some serious methodological flaws in one strand of comparative welfare state research, then proposed an alternative to one well regarded analysis by relaxing some offending assumptions. Boom, did he get slammed by one reviewer for allegedly making the very assumptions he had exposed in the first place. The paper was rejected in the first round. (This is sort of a pet peeve of mine, and I might vent again.)
Posted by Felix Elwert at 6:00 AM
March 13, 2006
Felix Elwert
A central assumption in competing risk analysis is the conditional independence of the risks under analysis. Suppose we are interested in cause-specific mortality due to causes A, B, and C. If we assume that the process leading to death from A is independent (conditional on covariates) from the process leading to death from B, then the likelihood factors nicely, and estimation via a series of standard 0/1 hazard models is straightforward. For example, it may be reasonable to assume that death from lung cancer (cause A) is independent of death from being struck by a meteorite (cause B). But it is much less reasonable to assume that death from lung cancer (A) is independent of the risk of dying from emphysema (C), unless we are lucky enough to have, say, appropriate covariate information on smoking history.
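To spell out what "factors nicely" means, here is a sketch in notation assumed for illustration only: $t_i$ is the observed failure time of individual $i$ with covariates $x_i$, $\delta_{ik} = 1$ if $i$ died from cause $k$ (and 0 otherwise), and $h_k$ is the cause-specific hazard.

```latex
L \;=\; \prod_{i=1}^{N} \; \prod_{k \in \{A,B,C\}}
        h_k(t_i \mid x_i)^{\delta_{ik}} \, S_k(t_i \mid x_i),
\qquad
S_k(t \mid x) \;=\; \exp\!\left( -\int_0^{t} h_k(u \mid x)\, du \right)
```

Under conditional independence, each factor in $k$ has the form of an ordinary single-event hazard likelihood in which deaths from the other causes are treated as censored observations, which is why the causes can be estimated one at a time.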
The problem is partly rhetorical. The independence assumption in competing risk analysis is the exact same as the assumption of independent censoring in standard hazard models. Few applied papers even mention the latter (unfortunately). In competing risk analysis, however, the assumption becomes quite a bit more visible, and thus harder to hide…
There are a small number of strategies, none particularly popular, to cope with dependence. Sanford C. Gordon recently contributed a new strategy in “Stochastic Dependence on Competing Risks,” AJPS 46(1), 2002, which builds on an earlier idea of drawing random effects. Rather than drawing individual-specific random effects, as has been suggested before by Clayton (1978), Gordon draws risk- and individual-specific random effects. Thus, a K-risk model on a sample of N individuals may contain up to KxN separate random effects, one for each risk and individual.
The advantage of this strategy is that it allows for the estimation of the direction of dependence (previous work had to assume a specific direction). The disadvantage is that estimation via conditional logit models is very expensive, on the order of several days for moderate-size samples of a few thousand cases.
Posted by Felix Elwert at 6:00 AM
March 6, 2006
Felix Elwert
One goal of school desegregation is to promote racial understanding by fostering interracial contact. In an article in the American Journal of Sociology (1998, Vol. 103[5]), Scott Feld and William Carter develop a simple combinatorial argument about a surprising potential consequence of school desegregation.
They argue that under certain (not so outlandish) circumstances, school desegregation may actually decrease rather than increase opportunities for interracial contact.
Here is their argument by way of a stylized example. Suppose there are four schools, one with capacity C1=400, and three schools with capacities C2=C3=C4=200 students. Under segregation, all 100 black students in the district attend the big school. The 900 other students are white. Assuming that students only interact with students in their own school, there are thus 300*100=30,000 possible interracial, intra-school ties. Now desegregate such that the percentage of black students is the same in all four schools. Then there are 360*40 potential interracial, intra-school friendships in the big school, and 180*20 potential interracial, intra-school friendships in each of the three small schools. Hence, the total number of potential interracial friendships post-desegregation is 25,200, as compared to 30,000 pre-desegregation.
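The arithmetic in this stylized example can be checked with a few lines of Python. The function name and data layout are made up for illustration; the enrollment figures are the ones given above.

```python
def interracial_ties(schools):
    """Count potential within-school interracial ties.

    `schools` is a list of (white, black) enrollments; each white-black
    pair attending the same school counts as one potential tie.
    """
    return sum(white * black for white, black in schools)


capacities = [400, 200, 200, 200]
total_black, total_students = 100, 1000

# Segregation: all 100 black students attend the big school.
segregated = [(300, 100), (200, 0), (200, 0), (200, 0)]

# Desegregation: every school gets the district-wide share of black students.
desegregated = []
for capacity in capacities:
    black = capacity * total_black // total_students
    desegregated.append((capacity - black, black))

print(interracial_ties(segregated))    # 30000
print(interracial_ties(desegregated))  # 25200
```

The drop comes from spreading the minority group across small schools: the product of group sizes within a school shrinks faster than the number of schools grows.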
Whether this decrease in potential ties will actually result in a decrease in realized ties is an empirical question, dependent on factors spelled out in the article. Feld and Carter go on to show that this particular instance is an example of the so-called Class Size Paradox, known from various applications in sociology.
Posted by Felix Elwert at 6:00 AM
February 24, 2006
Felix Elwert
As I remarked in an earlier entry, some researchers are troubled by the potential outcomes framework of causality because it makes explicit reference to unobservable quantities. The implication, of course, is that science should stick to what’s observable.
This position strikes me as needlessly restrictive. In any case, unobservable quantities are by no means exclusive to the potential outcomes framework of causal inference.
I hasten to add, of course, that I’m a stranger to the philosophical discourse on the issue. Interestingly, A.P. Dawid has advanced the argument that many results from the potential outcomes framework of causality can be obtained without reference to unobservable quantities by sticking to conditional probabilities. Doing that, however, the math gets quite a bit uglier than in the standard potential-outcomes way of presenting these results. Not coincidentally, I suppose, this is why some statisticians like Jamie Robins stress the pedagogic and heuristic value of thinking in potential outcomes, which appears to be uncontested even among those with philosophical objections to causal inference.
Heuristics aside, I’m a bit at a loss over the steadfast opposition to dealing with unobservable quantities in certain quarters. Didn’t we ditch the insistence on (and belief in) direct observation with the Wiener Kreis? And don’t references to unobservable quantities suffuse the way we think? Take, for example, the irrealis, or hypothetical subjunctive mood in English (If my wife were queen of Thebes…). Or, even more glaringly, the Konjunktiv II mood in German. Is the notion of potential outcomes really such a stretch?
Interestingly, unobservable quantities also pop up in other areas of statistics, not just in causal inference. Competing risk analysis, a branch of survival analysis, has been dealing in unobservables more or less since its inception in the 1960s. Within the first two or three pages of any treatment of competing risk analysis, the authors will discuss the interpretation of risk-specific failure times, hazards, and survival functions. The most popular interpretation of risk-specific survival times is “the time a case would fail due to this risk, if it hadn’t failed due to some other risk before.” An unobservable eventuality if I’ve ever seen one.
This is not to say that everybody is happy with this interpretation. Kalbfleisch and Prentice (2002), for example, in what’s easily the most authoritative text on survival analysis, relegate this interpretation to a supplementary section because they want to “consider primarily statistical models for observable quantities only and avoid reference to hypothetical and unobserved times to failure” (p. 249). Too bad. But even they seem to consider the interpretation a helpful heuristic.
Posted by Felix Elwert at 6:00 AM
February 16, 2006
Felix Elwert
Agreement with the Potential Outcomes Framework of Causality (counterfactual approach, Rubin model) is spreading like wildfire, but is still far from unanimous. Over the past few years I’ve had several conversations with friends in sociology, economics, statistics, and epidemiology who expressed considerable unease with the notion of potential outcomes, or even causality itself.
Two problems keep coming up.
The first is more of a public relations issue than an intellectual problem: Counterfactualists – I at any rate – apparently come on a bit strong at times. I’ve heard the term “counterfascism” (and left the room). I am told that this has to do with offering a simple operational definition for a notion – causality – that has defied a concise discourse for a few centuries too many. How can humble statistics propose a cure where respectable philosophy rails in confusion?
The second, more serious, issue relates to how far we want to go in dealing with the unobservable. The potential outcomes framework clearly and avowedly locates causal effects in the difference between potential outcomes, at least one of which remains unobservable (the “counterfactual” outcome). Direct observation of causal effects thus is impossible, although estimation is possible under certain well-defined circumstances. The exchange between A.P. Dawid (“Causal Inference without Counterfactuals”), Don Rubin, Jamie Robins, Judea Pearl, and others in JASA 1999 considers the problem at its most sophisticated. My conversations, shall we say, rarely reach such heights. But it’s eminently clear that many researchers are troubled to various degrees by admitting unobservable quantities into “science.” Positions here range from moderate empiricism to Vienna-style positivism: “you either observe directly or you lie.”
I’m in no place to offer solutions. But I do offer this complaint whenever the two issues are combined into a single charge--that counterfactualist potential outcomers are arrogant because they fancy themselves scientists when they deal in unobservable quantities. I’d say that the opposite is true: the potential outcomes framework of causality offers a cutting lesson in humility because it demonstrates the necessity of relying on unobservable (but not necessarily unestimable) quantities, not to mention strong prior theory, for a great many tasks dear to the scientific enterprise.
Posted by Felix Elwert at 6:00 AM
February 10, 2006
Felix Elwert
Though it hardly comports with my own views, there are plenty of people in the social sciences and economics who are troubled by the potential outcomes framework of causality. What intrigues me about this opposition is that most of those who object to the notion of causality appear comfortable with talk about regression “effects.”
If you object to talk about causality, what do you mean by “effect”?
By way of preemptive self-defense, this question isn’t about my inability to understand that regression coefficients provide a neat summary of the sample data in a purely descriptive sense (I do get that). But if the goal is getting descriptives, why call regression coefficients “effects”? Doesn’t “effect” imply agency? Sure, the predicted Y might increase by b units if we change X by one unit (agency! ha!) but then that’s really the analyst’s doing (we shift X by one unit) - and didn’t we want the analysis to speak to what’s happening in the world outside of that scatter plot printout?
Here’s the task: Can anybody provide an interpretation of the word “effect” that (a) doesn’t just refer to what the analyst can do with that scatter plot on the desk, and that (b) does not take recourse to a manipulability (counterfactualist or potential outcomes) account of causality?
What’s your preferred non-causal explanation for why one might call regression coefficients “effects”?
Posted by Felix Elwert at 6:00 AM
October 21, 2005
Felix Elwert
Let’s salute the New York Times for its near perfect polling documentation. In a recent edition of the Sunday Magazine, the Times includes a two-page spread on a phone survey on New York City politics. Though the survey touches on some life-and-death issues (“Would you ever date a Republican?”), it’s really more for laughs than higher learning. Regardless, the Times goes to great lengths to describe its methodology:
“Methodology: This telephone poll of a random sample of 1,011 adults in New York City was conducted for the New York Times Magazine by Blum & Weprin Associates Inc. between Aug. 29 and Sept. 1. The sample was based on a random-digital-dialing design that draws numbers from all existing telephone exchanges in the five boroughs of New York, giving all numbers, listed and unlisted, a proportionate chance of being included. Respondents were selected randomly within the household and offered the option of being interviewed in Spanish. The overall sample results were weighted demographically and geographically to population data. The estimated average sample tolerance for data from the survey is plus or minus 3 percent at the 95 percent confidence level. Sampling error for subgroups is higher. Sampling is only one source of error. Other sources of error may include question wording, question order and interviewer effects.”
That’s 146 words on survey sampling likely lost on many readers. We may quibble about the omission of the nonresponse rate (although they mention that results were weighted to represent known geographic and demographic distributions). We may find the phrase “sample tolerance” for “confidence interval” a tad confusing. We may protest that they forgot a comma before the “and” in the closing enumeration. But that’s about it.
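For readers puzzled by the "sample tolerance," the quoted plus or minus 3 percent is roughly what the textbook margin of error gives for n = 1,011. The sketch below assumes simple random sampling and the worst case p = 0.5; the pollster's own calculation is not documented and may differ, since weighting typically widens the interval somewhat.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of a normal-approximation 95% confidence interval
    for a proportion p under simple random sampling of size n."""
    return z * math.sqrt(p * (1 - p) / n)


moe = margin_of_error(1011)
print(f"{100 * moe:.1f} percentage points")  # 3.1 percentage points
```

The same function shows why "sampling error for subgroups is higher": a subgroup of 250 respondents would have a margin of about 6 points under the same assumptions.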
I would cry tears of joy if the major papers in my native Germany started taking survey sampling nearly as seriously as the Times. Instead, we get anecdote-laden head scratching over recent failures to predict national election results with anything approaching accuracy. Seriously, I know Europeans aren’t currently inclined to follow American examples. But how about making an exception for basic statistical ethics?
Posted by Felix Elwert at 5:26 AM