| Sun | Mon | Tue | Wed | Thu | Fri | Sat |
|---|---|---|---|---|---|---|
| 1 | 2 | 3 | ||||
| 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| 11 | 12 | 13 | 14 | 15 | 16 | 17 |
| 18 | 19 | 20 | 21 | 22 | 23 | 24 |
| 25 | 26 | 27 | 28 | 29 | 30 | 31 |
« The answer is -3.9% (plus or minus 17.4%) | Main | New York's anti-poverty experiment »
28 March 2007
Amy Perfors
This post started off as little more than some amusing wordplay brought on by the truism that "the plural of anecdote is not data". It's a sensible admonition -- you can't just exchange anectodes and feel like that's the equivalent of actual scientific data -- but, like many truisms, it's not necessarily true. After all, the singular of data is anecdote: every individual datapoint in a scientific study constitutes an anecdote (though admittedly probably a quite boring one, depending on the nature of your study). A better truism would therefore be more like "the plural of anecdote is probably not data", which of course isn't nearly as catchy.
The post started that way, but then I got to thinking about it more and I realized that the attitude embodied by "the plural of anecdote is not data" -- while a necessary corrective in our culture, where people far more often go too far in the other direction -- isn't very useful, either.
A very important caveat first: I think it's an admirable goal -- definitely for scientists in their professional lives, but also for everyone in our personal lives -- to as far as possible try to make choices and draw conclusions informed not by personal anecdote(s) but rather by what "the data" shows. Anecdote is notoriously unreliable; it's distorted by context and memory; because it's emotionally fraught it's all too easy to weight anecdotes that resound with our experience more highly and discount those that don't; and, of course, the process of anecdote collection is hardly systematic or representative. For all of those reasons, it's my natural temptation to distrust "reasoning by anecdote", and I think that's a very good suspicion to hone.
But... but. It would be too easy to conclude that anecdotes should be discounted entirely, or that there is no difference between anecdotes of different sorts, and that's not the case. The main thing that turns an anecdote into data is the sampling process: if attention is paid to ensuring not only that the source of the data is representative, but also that the process of data collection hasn't greatly skewed the results in some way, then it is more like data than anecdote. (There are other criteria, of course, but I think that's a main one).
That means, though, that some anecdotes are better than others. One person's anecdote about an incredibly rare situation should properly be discounted more than 1000 anecdotes from people drawn from an array of backgrounds (unless, of course, one wants to learn about that very rare situation); likewise, a collection of stories taken from the comments of a highly partisan blog where disagreement is immediately deleted -- even if there are 1000 of them -- should be discounted more than, say, a focus group of 100 people carefully chosen to be representative, led by a trained moderator.
I feel like I'm sort of belaboring the obvious, but I think it's also easy for "the obvious" to be forgotten (or ignored, or discounted) if its opposite is repeated enough.
Also, I think the tension between the "focus on data only" philosophy on one hand, and "be informed by anecdote" philosophy on the other, is a deep and interesting one: in my opinion, it is one of the main meta-issues in cognitive science, and of course comes up all the time in other areas (politics and policy, personal decision-making, stereotyping, etc). The main reason it's an issue, of course, is that we don't have data about most things -- either because the question simply hasn't been studied scientifically, or because it has but in an effort to "be scientific" the sample has been restricted enough that it's to know how well one can generalize beyond it. For a long time most studies in medicine used white men only as subjects; what then should one infer regarding women, or other genders? One is caught between the Scylla of using possibly inappropriate data, and the Charybdis of not using any data at all. Of course in the long term one should go out and get more data, but life can't wait for "the long term." Furthermore, if one is going to be absolutely insistent on a rigid reliance on appropriate data, there is the reductive problem that, strictly speaking, a dataset never allows you to logically draw a conclusion about anything other than itself. Unless it is the entire population, it will always be different than the population; the real question comes in deciding whether it is too different -- and as far as I can tell, aside from a few simple metrics, that decision is at least as much art as science (and is itself made partly on the basis of anecdote).
Another example, one I'm intimately familiar with, is the constant tension in psychology between ecological and external validity on the one hand, and proper scientific methodology on the other. Too often, increasing one means sacrificing the other: if you're interested in categorization, for instance, you can try to control for every possible factor by limiting your subjects to undergrad students in the same major, testing everyone in the same blank room at the same time of day, creating stimuli consisting of geometric figures with a clear number of equally-salient features, randomizing the order of presentation, etc. You can't be completely sure you've removed all possible confounds, but you've done a pretty good job. The problem is that what you're studying is now so unlike the categorization we do every day -- which is flexible, context-sensitive, influenced by many factors of the situation and ourselves, and about things that are not anything like abstract geometric pictures (unless you work in a modern art museum, I suppose) -- that it's hard to know how it applies. Every cognitive scientist I know is aware of this tension, and in my opinion the best science occurs right on the tightrope - not at the extremes.
That's why I think it's worth pointing out why the extreme -- even the extreme I tend to err on -- is best avoided, even if it seems obvious.
Posted by Amy Perfors at March 28, 2007 10:06 AM
Amy, I like your thoughts and the title of your posting. It seems related to the admonition I once heard a statistician give management, something to the effect that "One datapoint doesn't make a trend."
I've been wrestling with similar thoughts, too; see http://www.facilitatedsystems.com/weblog/2005/08/storytelling-and-reason-in-balance.html for my understanding of Edward Tufte's approach and http://www.facilitatedsystems.com/weblog/2005/12/proof-by-authority-reflections-on.html for a general guideline I try to use. Thoughts?
Posted by: Bill Harris at March 28, 2007 7:13 PM
Herbert Simon, writing on empirical research in organization theory, once noted: "Although case studies are only samples of one, such samples are infinitely more informative than samples of none. . . . [V]alid hypotheses are much more likely to emerge from direct, intimate encounter with organizations than from speculation" (Journal of Economic Literature, September 1992, p. 1504). This speaks to Amy's point that often all we have is anecdotes -- or, to put the point slightly differently, a well-informed anectode is better than quantitative data that proxy only very crudely for the underlying phenomenon of interest.
Posted by: Peter G. Klein at March 29, 2007 3:44 PM
Bill: thanks for the links. I really like the PGP idea -- moving from Particular to General and back to Particular. He seemed to be writing more about to the presentation of ideas than their discovery, but it might describe (some) types of discovery as well... I know that sometimes I get my own ideas by noticing a discrepancy of some kind or elaborating on from my experience -- using an anecdote, if you will -- which will inspire me to then test it more rigorously.
Peter: that's a great quote. It and your comment are far more succinct than my post, and say just what I was trying to get at.
Posted by: Amy at March 29, 2007 6:16 PM