13 May 2008
I know this isn't my normal day, but three points today:
[Figure: Indiana spatial plots by county, panels titled Error, Actual, Predicted]
I'm less worried about the turnout discrepancy. It arose because there had been no semi-open Democratic primary since Huckabee dropped out of the Republican contest, so I was forced to predict turnout from Pennsylvania (a closed primary) and Ohio (a semi-open primary, but held while Huckabee was still formally in the race), which led me to underestimate it. Now that I have North Carolina to use as a predictor, I'm more confident about my turnout projection for West Virginia, which is a semi-open primary.
In predicting vote shares, my county-level correlations between predicted and actual results were .81 for Indiana and .88 for North Carolina -- pretty good on the whole, but with some problems. Indiana's spatial plots of residuals appear above, and North Carolina's are below. Dark red marks counties where I overestimated Obama's support, and dark grey marks counties where I underestimated it.
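To make those two summary statistics concrete, here is a minimal sketch of how a county-level correlation and per-county residuals could be computed. It assumes parallel lists of predicted and actual Obama shares, one entry per county, and it defines a residual as predicted minus actual so that positive values correspond to the overestimates shown in dark red. The function names and that sign convention are illustrative assumptions, not the code behind these maps.

```python
from math import sqrt


def pearson_r(predicted, actual):
    """Pearson correlation between predicted and actual county-level shares."""
    n = len(predicted)
    if n != len(actual) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 counties")
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    sd_p = sqrt(sum((p - mean_p) ** 2 for p in predicted))
    sd_a = sqrt(sum((a - mean_a) ** 2 for a in actual))
    return cov / (sd_p * sd_a)


def residuals(predicted, actual):
    """Predicted minus actual Obama share for each county (assumed convention).

    Positive values mean Obama's support was overestimated (dark red);
    negative values mean it was underestimated (dark grey).
    """
    return [p - a for p, a in zip(predicted, actual)]
```

For example, with the Greene County figures discussed next (70% predicted, 47% actual), `residuals([0.70], [0.47])` gives a residual of about +0.23, an overestimate. On Python 3.10 or later, `statistics.correlation` computes the same Pearson coefficient.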
[Figure: North Carolina spatial plots by county, panels titled Error, Actual, Predicted]
The biggest mistake in my North Carolina predictions involved rural Black voters, who were barely represented in my training data. The largest-magnitude residual was in Greene County, a rural county that is 50% White and 40% Black (the small dark-red county). I projected a 70%-30% Obama victory, as is typical for counties with this racial split (among Democrats in such a county, Black voters will dominate). But Clinton actually won the county 53% to 47%, putting me 23 percentage points off on Obama's share. I had similarly overestimated Obama's support in all of the neighboring rural, heavily Black counties. This points to a possible interaction effect: rural Black voters may be more pro-Clinton than urban Black voters.
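One way to check that interaction hunch is to regress county-level Obama share on Black population share, a rural indicator, and their product; a negative coefficient on the product term (`black_x_rural`) would be consistent with rural Black voters being more pro-Clinton than urban ones. The sketch below is an illustration, not the forecasting model behind these predictions, and its input schema and function names are hypothetical. It solves the ordinary least squares normal equations directly to stay dependency-free; a library least-squares routine would be numerically sturdier.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(k):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


def fit_interaction_model(counties):
    """Fit obama = b0 + b1*black + b2*rural + b3*black*rural by OLS.

    counties: iterable of (black_share, rural, obama_share) tuples, where
    rural is 0 or 1 and shares are fractions (hypothetical schema).
    """
    counties = list(counties)
    # Design matrix: intercept, Black share, rural dummy, and their interaction.
    x = [[1.0, b, float(r), b * float(r)] for b, r, _ in counties]
    y = [o for _, _, o in counties]
    k = len(x[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(k)]
    coefs = solve_linear(xtx, xty)
    return dict(zip(["intercept", "black", "rural", "black_x_rural"], coefs))
```

The model needs counties that vary in Black share within both the rural and the urban group; otherwise the interaction term cannot be separated from the other columns and the solver reports a singular design.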
Now to my top-line West Virginia prediction: Clinton 70.5%, Obama 29.5%, with a turnout of 300,000 votes. The map is below. I have Clinton winning every county in the state. Obama comes closest in Jefferson (a high-income, well-educated county bordering Virginia) and Monongalia (a well-educated urban county in the Pittsburgh tri-state area).
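Turning the top-line shares into projected vote counts is simple arithmetic: multiply each candidate's share by the projected turnout. The `project_votes` helper below is a hypothetical illustration of that step.

```python
def project_votes(turnout, shares):
    """Convert projected vote shares (fractions summing to 1) into vote counts."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return {name: round(share * turnout) for name, share in shares.items()}


# Top-line West Virginia projection from this post.
wv = project_votes(300_000, {"Clinton": 0.705, "Obama": 0.295})
margin = wv["Clinton"] - wv["Obama"]
print(wv, margin)  # prints {'Clinton': 211500, 'Obama': 88500} 123000
```

So the projection amounts to roughly 211,500 votes for Clinton and 88,500 for Obama, a margin of about 123,000 votes, or 41 points.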
With Clinton's impending departure, however, I plan to abandon these projections and move on to other fun projects. I'd really like to try fitting a language model to Obama's and McCain's speeches.
Posted by Kevin Bartz at 5:48 PM
I recently came across Datamob.org, a site featuring public datasets and interfaces that have been built to help the public explore them.
From Datamob's about page:
Our listings emphasize the connection between data posted by governments and public institutions and the interfaces people are building to explore that data. It's for anyone who's ever looked at a site like MAPLight.org and wondered, "Where did they get their data?" And for anyone who ever looked at THOMAS and thought, "There's got to be a better way to organize this!"
I continue to wonder how the kinds of interfaces featured on Datamob will affect the dissemination of information in society. The dream of many of these interface builders is to disintermediate information provision, i.e., to make it possible for citizens to do their own research, produce their own insights, and publish their findings on blogs and via data-laden widgets. (We welcomed Fernanda and Martin from Many Eyes, two prominent participants in this movement, to our applied stats workshop earlier this year.) At the same time, the new interfaces make it cheaper for professional analysts -- academics, journalists, consultants -- to access the data and, as they always have, package it for public consumption. This makes me wonder how much the sources of our data-backed insights will really change: how much more common will "I was playing around with data on this website and found out that . . ." become relative to "I heard about this study where they found that . . ."?
My hunch is that, just as blogging and internet news have democratized political commentary, the new data resources will allow a new group of relatively uncertified people to become intermediaries for data analysis. (I think FiveThirtyEight is a good example in political polling, although since the site's editor is anonymous I can't be sure.) People will overwhelmingly continue to get data insights packaged by intermediaries rather than through new interfaces to raw data, but those intermediaries (who will use these new services) will be quicker to use data in making their points, will become far more numerous, and will on average be less credentialed.
Posted by Andy Eggers at 9:48 AM