22 April 2008
Update: Check out how my predictions fared! Two comparisons are given, one showing both maps in the same image and one as an animated GIF (kudos to the animation package in R).

Overall, my predictions did pretty well. Their correlation with the true vote shares was .89, giving an R^2 of .79, just below the in-sample R^2 of .82. My biggest miss was Centre County, where I predicted that Clinton would edge out Obama. Instead, Obama won pretty convincingly, with over 60% of the vote. I also overestimated Obama’s support in some of the counties surrounding Philadelphia. I'm not sure what I can do to improve the model next time. If you have any ideas, leave a comment.
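The R^2 quoted above is simply the squared correlation: 0.89^2 ≈ 0.79. Here is a minimal Python sketch of how such a check could be run. The county shares are made-up numbers for illustration, not the actual Pennsylvania results, and `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(predicted, actual):
    """Pearson correlation between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov / math.sqrt(var_p * var_a)

# Hypothetical Obama shares for five counties (illustrative only).
predicted = [0.48, 0.35, 0.62, 0.41, 0.30]
actual = [0.60, 0.33, 0.65, 0.38, 0.29]

r = pearson_r(predicted, actual)
r_squared = r ** 2  # squared correlation, the same step as 0.89 ** 2
print(f"r = {r:.2f}, R^2 = {r_squared:.2f}")
```

Note that a squared correlation ignores any overall bias in the predictions, so it can look better than an R^2 computed from the raw prediction errors.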
Original entry: This isn't my normal blogging day, but I wanted to show my final Pennsylvania prediction map. Later on I will update my post to include the true map in the same color scheme, so we can compare. I have updated the prediction model after everyone's suggestions last time.
The big problems last time were:
There were other comments, too, but not all of them could be addressed effectively (What else can I do besides predict on the county level? That's where we have data!) Well, I'm happy to say that for the latest model I pulled in lots more covariates from the census:
With all these, the model fits like a dream come true: R^2 = 0.82 and a residual standard error of 0.04 (i.e., roughly ±8 percentage points of Obama's true share, taking two standard errors). Here are the estimated coefficients (after pruning some variables based on the BIC):
| Name | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | -1.93 | 0.35 | -5.44 | 0.00 |
| kerry | -0.29 | 0.06 | -4.66 | 0.00 |
| black | 1.00 | 0.10 | 9.81 | 0.00 |
| hisp | 0.74 | 0.30 | 2.49 | 0.01 |
| male | -1.52 | 0.33 | -4.60 | 0.00 |
| young | 1.46 | 0.22 | 6.59 | 0.00 |
| log(income) | 0.29 | 0.03 | 9.96 | 0.00 |
The coefficients are pretty much as you'd expect: counties with more Black residents, more young people and higher incomes lean toward Obama. Poorer counties and counties where Kerry did well tend to go for Clinton. The only somewhat surprising part is the negative coefficient on male population. You would think counties with more females would go for Clinton. There's probably some confounder, because there were several counties in Ohio with 55% male populations that went for Clinton.
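To make the coefficient table concrete, here is a rough Python sketch of how a fitted value for one county would be computed. It assumes several things the post does not state: that the response is Obama's share on a 0 to 1 scale, that `kerry`, `black`, `hisp`, `male` and `young` are proportions, and that income is in dollars with a natural log. The county values are invented, and `predict_share` is a hypothetical name.

```python
import math

# Coefficients copied from the table above.
COEFS = {
    "(Intercept)": -1.93,
    "kerry": -0.29,
    "black": 1.00,
    "hisp": 0.74,
    "male": -1.52,
    "young": 1.46,
    "log(income)": 0.29,
}

def predict_share(county):
    """Linear predictor: intercept plus each coefficient times its covariate."""
    return (
        COEFS["(Intercept)"]
        + COEFS["kerry"] * county["kerry"]
        + COEFS["black"] * county["black"]
        + COEFS["hisp"] * county["hisp"]
        + COEFS["male"] * county["male"]
        + COEFS["young"] * county["young"]
        + COEFS["log(income)"] * math.log(county["income"])  # natural log (assumed)
    )

# Hypothetical county; every value here is invented for illustration.
county = {
    "kerry": 0.50,
    "black": 0.10,
    "hisp": 0.03,
    "male": 0.49,
    "young": 0.12,
    "income": 45000,
}

share = predict_share(county)
se = 0.04  # residual standard error reported in the post
print(f"predicted Obama share: {share:.3f}")
print(f"rough +/- 2 SE band: {share - 2 * se:.3f} to {share + 2 * se:.3f}")
```

Under these assumptions, raising `black` by 10 percentage points adds 0.10 to the predicted share, while raising `male` lowers it, which matches the signs discussed above.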
Anyway, I will update this post tomorrow comparing my predictions to the realized results.
Posted by Kevin Bartz at April 22, 2008 11:16 AM
So what is the bottom line? Couldn't you predict the statewide popular vote margin from the estimated county results? And delegates -- although the formula is tricky I suppose.
Posted by: Andy Eggers at April 22, 2008 12:00 PM
Good point. I know it's silly, but there is no bottom line. I'm not brave enough to try to predict turnout by county this time. There's too much going on, like Pennsylvania being a closed-primary state. There's also precious little historical data; when's the last time the Democratic Presidential race was competitive by the time it reached Pennsylvania? Also, the national level of Obama support no doubt rose over the 1.5 months since Ohio. That's why I'm going to limit my prediction to a map for now, and see how well this works before attempting a top-line number. For Indiana, an open-primary state only a week from now, I think a turnout model may be possible.
Posted by: Kevin at April 22, 2008 12:19 PM
I think a turnout model will be a must for this post.
Posted by: Ben at April 27, 2008 4:44 AM
How about plotting the prediction error by county? That would complement your charts nicely, I think.
Posted by: Kaiser at April 28, 2008 10:27 PM
Thanks for the suggestions. We're going to have all of those things in my Indiana and North Carolina predictions on Friday:
- A turnout model (with 80% R^2 fit!)
- A spatial plot of the errors
- A prediction of both turnout and each candidate's vote totals in both Indiana and NC.
Posted by: Kevin at April 29, 2008 5:41 PM