




13 May 2008

IN, NC Rehash; WV Prediction

I know this isn't my normal day, but three points today:


  • How I did in IN and NC

  • My prediction for WV

[Figure: Indiana county maps, left to right: prediction error, actual vote share, predicted vote share]
  • Indiana was off by about 3 percentage points: I had predicted 53.5% Clinton, 46.5% Obama; the result was 50.6% Clinton, 49.4% Obama.
  • North Carolina was near spot-on: I had predicted 58% Obama, 42% Clinton; the result was 57.3% Obama, 42.7% Clinton.
  • I significantly underestimated turnout: there were 1.27 million votes in IN and 1.53 million in NC, while I had predicted 950,000 and 1.2 million, respectively.
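For anyone who wants to check the arithmetic behind these bullets, here is a minimal sketch that uses only the figures quoted above. The function names are mine and purely illustrative; this is not the code behind the predictions.

```python
def share_error(predicted_pct, actual_pct):
    """Signed error in percentage points (positive = overestimate)."""
    return round(predicted_pct - actual_pct, 1)


def turnout_shortfall(predicted_votes, actual_votes):
    """Fraction of the actual turnout that the prediction missed."""
    return (actual_votes - predicted_votes) / actual_votes


# Vote-share figures quoted in the post.
in_clinton_error = share_error(53.5, 50.6)  # Indiana, Clinton share
nc_obama_error = share_error(58.0, 57.3)    # North Carolina, Obama share

# Turnout figures quoted in the post.
in_turnout_gap = turnout_shortfall(950_000, 1_270_000)
nc_turnout_gap = turnout_shortfall(1_200_000, 1_530_000)

print(f"IN Clinton share error: {in_clinton_error:+.1f} points")
print(f"NC Obama share error:   {nc_obama_error:+.1f} points")
print(f"IN turnout missed by {in_turnout_gap:.0%} of actual")
print(f"NC turnout missed by {nc_turnout_gap:.0%} of actual")
```

The share errors are in percentage points, while the turnout gaps are expressed as a fraction of the votes actually cast.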

I'm less worried about the turnout discrepancy; it happened because there had been no semi-open Democratic primary since Huckabee dropped out of the Republican contest. I was forced to use Pennsylvania (a closed primary) and Ohio (a semi-open primary, but with Huckabee still formally in) to predict turnout, which resulted in my underestimates. I'm more confident about my turnout projection in West Virginia, which is a semi-open primary, now that I have North Carolina to use as a predictor.

In predicting voter shares, my overall county-level correlations were .81 for Indiana and .88 for North Carolina -- on the whole pretty good, but with some problems. Below are spatial plots of residuals for North Carolina, and Indiana's appear above. Dark red corresponds to overestimation of Obama's support, and dark grey to underestimation of Obama's support.
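A county-level correlation and a set of residuals like the ones mapped here can be computed roughly as follows. This is a sketch under assumptions: it uses an unweighted Pearson correlation (the post does not say whether counties were weighted by size), it defines the residual as predicted minus actual so that positive values mean Obama was overestimated, and the four county rows are invented for illustration, not real returns.

```python
from math import sqrt


def pearson(xs, ys):
    """Unweighted Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical rows: (county, predicted Obama share, actual Obama share).
counties = [
    ("County A", 0.66, 0.45),
    ("County B", 0.55, 0.58),
    ("County C", 0.40, 0.41),
    ("County D", 0.62, 0.60),
]

predicted = [p for _, p, _ in counties]
actual = [a for _, _, a in counties]

r = pearson(predicted, actual)

# Residual = predicted - actual; positive means Obama was overestimated.
residuals = {name: round(p - a, 2) for name, p, a in counties}

# The county with the largest-magnitude miss.
worst = max(residuals, key=lambda name: abs(residuals[name]))

print(f"correlation: {r:.2f}")
print(f"largest residual: {worst} ({residuals[worst]:+.2f})")
```

A vote-weighted correlation would emphasize large counties more and could give a noticeably different number from the unweighted one.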

[Figure: North Carolina county maps, left to right: prediction error, actual vote share, predicted vote share]

The biggest mistake in my North Carolina predictions came with rural Blacks, who had not appeared in significant numbers in my training data. The largest-magnitude residual was Greene County, a rural county that's 50% White and 40% Black (it's the small dark red county on the map). I projected a 70%-30% Obama victory, as is typical for counties with this racial split (note that among Democrats in such a county, Blacks will dominate). But Clinton actually won this county 53% to 47%, putting me 23 percentage points off. I had similarly overestimated Obama's support in all of the neighboring rural Black counties. This points to a possible interaction effect -- that rural Blacks are more pro-Clinton than urban Blacks.
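To make "interaction effect" concrete: in a linear model it would be an extra term that multiplies the Black share of the electorate by a rural indicator, so that the effect of one depends on the other. The sketch below shows only the mechanics. The functional form, the variable names, and every coefficient are invented for illustration; none of them are estimates from the model discussed here.

```python
def predict_obama_share(black_share, rural, coef):
    """Linear predictor with a black_share x rural interaction term.

    black_share: fraction of the primary electorate that is Black (0 to 1)
    rural: 1 for a rural county, 0 for an urban one
    coef: dict of hypothetical coefficients
    """
    return (
        coef["intercept"]
        + coef["black"] * black_share
        + coef["rural"] * rural
        + coef["black_x_rural"] * black_share * rural
    )


# Invented coefficients, NOT fitted values.
no_interaction = {"intercept": 0.30, "black": 0.80, "rural": 0.0, "black_x_rural": 0.0}
with_interaction = {"intercept": 0.30, "black": 0.80, "rural": 0.0, "black_x_rural": -0.50}

share = 0.40  # hypothetical Black share of the electorate

urban = predict_obama_share(share, 0, with_interaction)
rural_plain = predict_obama_share(share, 1, no_interaction)
rural_interacted = predict_obama_share(share, 1, with_interaction)

print(f"urban county:                 {urban:.2f}")
print(f"rural county, no interaction: {rural_plain:.2f}")
print(f"rural county, interaction:    {rural_interacted:.2f}")
```

Without the interaction term, a rural and an urban county with the same racial makeup get the same prediction, which is the pattern the residuals suggest was wrong. A negative interaction coefficient lowers the prediction only where both conditions hold.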

Now to my top-line West Virginia prediction: Clinton 70.5%, Obama 29.5%, with a turnout of 300,000 votes. The map is below. I have Clinton taking every county in the state. Obama comes closest in Jefferson (a high-income, well-educated county next to Virginia) and Monongalia (a well-educated urban county that’s part of the Pittsburgh tri-state area).

[Figure: West Virginia county map of predicted vote share]
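The top-line West Virginia numbers imply a margin and rough vote counts, which a few lines can spell out. This sketch uses only the three figures from the prediction; the variable names are mine.

```python
# Top-line West Virginia prediction from the post.
clinton_pct = 70.5
obama_pct = 29.5
turnout = 300_000

# Margin in percentage points.
margin_points = clinton_pct - obama_pct

# Implied vote counts, rounded to whole votes.
clinton_votes = round(turnout * clinton_pct / 100)
obama_votes = round(turnout * obama_pct / 100)

print(f"Predicted margin: Clinton +{margin_points:.0f} points")
print(f"Implied votes: Clinton {clinton_votes:,}, Obama {obama_votes:,}")
```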

With Clinton's impending departure, however, I plan to abandon these projections and move on to other fun. I really want to try a language model on Obama's and McCain's speeches.

Posted by Kevin Bartz at May 13, 2008 5:48 PM

Comments

I was looking at coverage of FiveThirtyEight.com, a site where an anonymous blogger named Poblano similarly uses regression models (and not polls) to estimate election returns, and found some analysis on the National Journal comparing Poblano's accuracy with leading pollsters. Your model didn't do quite as well as Poblano on IN and NC, but it did better than a lot of leading pollsters did. Look here for their figure comparing the different predictions, with Kevin's prediction included.

Posted by: Andy Eggers at May 14, 2008 8:16 AM

Very interesting link, Andrew.

I glanced at his model and it's very similar, but he seems to have picked up two variables I didn't:

- Women in the workforce: this is probably the big one I missed.
- Percentage of people writing in "American" as their nationality: it's hard to tell whether this is a big one or not.

By the way, he predicted a 39-point Clinton win in West Virginia, whereas I had a 41-point win. Very similar!

Posted by: Kevin at May 14, 2008 8:29 AM

Right, and of course your NC prediction was unbeatable.

Posted by: Andy Eggers at May 14, 2008 8:39 AM