22 April 2008

Predicting Pennsylvania, Updated

Update: Check out how my predictions fared! Two comparisons are given, one showing both maps in the same image and one as an animated GIF (kudos to the animation package in R).

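[Figure: pa.movie.gif, animated GIF comparing the predicted and actual maps]

[Figure: pa.dem.2008.comp.png, predicted and actual maps in the same image]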

Overall, my predictions did pretty well. Their correlation with the true vote shares was 0.89, giving an out-of-sample R^2 of 0.79, just below the in-sample R^2 of 0.82. My biggest miss was Centre County, where I predicted that Clinton would edge out Obama. Instead, Obama won pretty convincingly, with over 60% of the vote. I also overestimated Obama's support in some of the counties surrounding Philadelphia. I'm not sure what I can do to improve the model next time. If you have any ideas, leave a comment.
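A minimal sketch of the arithmetic behind those two numbers, taking R^2 to be the squared correlation between predicted and actual shares. It is written in Python purely for illustration; the helper name `pearson_r` and the toy vectors are made up and are not the county data.

```python
import math

def pearson_r(pred, actual):
    """Pearson correlation between two equal-length sequences."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(pred, actual))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov / math.sqrt(var_p * var_a)

# Toy shares for five imaginary counties (NOT the Pennsylvania data).
toy_pred = [0.40, 0.35, 0.55, 0.62, 0.30]
toy_actual = [0.42, 0.33, 0.60, 0.58, 0.36]
r_toy = pearson_r(toy_pred, toy_actual)
print(f"toy r = {r_toy:.2f}, toy R^2 = {r_toy ** 2:.2f}")

# The figure reported in the post: r = 0.89.
r_reported = 0.89
print(round(r_reported ** 2, 2))  # 0.79
```

Squaring the correlation gives the R^2 of a regression of actual on predicted shares, which can be a little more flattering than 1 - SSE/SST computed on the raw predictions if the predictions are systematically shifted.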

Original entry: This isn't my normal blogging day, but I wanted to show my final Pennsylvania prediction map. Later on I will update my post to include the true map in the same color scheme, so we can compare. I have updated the prediction model after everyone's suggestions last time.

[Figure: pa.dem.2008.png, final Pennsylvania prediction map]

The big problems last time were:

  • Kerry's vote share was only a loose indicator of Obama's, not enough to base a model upon
  • The model didn't incorporate other obvious factors like population density, nearby colleges, etc.
  • R^2 = 0.16 isn't all that good!

There were other comments, too, but not all of them could be addressed effectively. (What else can I do besides predict on the county level? That's where we have data!) Well, I'm happy to say that for the latest model I pulled in lots more covariates from the census:

  • Kerry's 2004 vote share
  • % Whites
  • % Blacks
  • % Hispanics
  • % males
  • % young people (age 18 through 21)
  • % urban population
  • Population density
  • Median household income

With all these, the model fits like a dream come true: R^2 = 0.82 and a residual standard error of 0.04 (i.e., roughly ±8 percentage points around Obama's true share at two standard errors). Here are the estimated coefficients (after pruning some variables based on the BIC):

Name           Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)       -1.93         0.35     -5.44       0.00
kerry             -0.29         0.06     -4.66       0.00
black              1.00         0.10      9.81       0.00
hisp               0.74         0.30      2.49       0.01
male              -1.52         0.33     -4.60       0.00
young              1.46         0.22      6.59       0.00
log(income)        0.29         0.03      9.96       0.00
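The post does not say how the BIC pruning was carried out, so the following is only one plausible version: backward elimination that keeps dropping a covariate as long as doing so lowers the BIC. It is a Python sketch on synthetic data; the function names, the column names `x1`, `x2`, `noise`, and the simulated coefficients are all made up for illustration.

```python
import math
import random

def ols_rss(X, y):
    """Residual sum of squares of a least-squares fit (normal equations)."""
    n, k = len(X), len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((y[r] - sum(b * x for b, x in zip(beta, X[r]))) ** 2
               for r in range(n))

def bic(X, y):
    """Gaussian-model BIC up to an additive constant: n*ln(RSS/n) + k*ln(n)."""
    n, k = len(X), len(X[0])
    return n * math.log(ols_rss(X, y) / n) + k * math.log(n)

def backward_bic(X, y, names):
    """Drop one column at a time while that lowers the BIC.

    Column 0 is assumed to be the intercept and is always kept.
    """
    keep = list(range(len(names)))
    best = bic(X, y)
    while len(keep) > 1:
        trials = []
        for j in keep[1:]:
            cols = [c for c in keep if c != j]
            trials.append((bic([[row[c] for c in cols] for row in X], y), j))
        score, drop = min(trials)
        if score >= best:
            break
        best = score
        keep.remove(drop)
    return [names[c] for c in keep]

# Synthetic data: y depends on x1 and x2 but not on the "noise" column.
random.seed(0)
names = ["intercept", "x1", "x2", "noise"]
X, y = [], []
for _ in range(200):
    x1, x2, z = random.random(), random.random(), random.random()
    X.append([1.0, x1, x2, z])
    y.append(0.5 + 1.0 * x1 - 0.8 * x2 + random.gauss(0, 0.05))

selected = backward_bic(X, y, names)
print(selected)
```

The two real signals always survive here because dropping either one inflates the residual sum of squares far more than the ln(n) penalty saves. Whether the pure-noise column is dropped depends on the particular draw.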

The coefficients are pretty much as you'd expect: counties with more Blacks, more young people and higher incomes vote for Obama. Poorer counties and counties where Kerry did well tend to go for Clinton. The only somewhat surprising part is the negative coefficient on male population. You would think counties with more females would go for Clinton. There's probably some confounder, because there were several counties in Ohio with 55% male populations that went for Clinton.
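To see how the coefficients combine, here is a rough Python sketch that plugs one imaginary county into the fitted equation. It assumes things the post does not state explicitly: that the response is Obama's vote share as a fraction between 0 and 1, that the demographic covariates are fractions too, that income is in dollars, and that `log` is the natural log. The county values and the function name are hypothetical.

```python
import math

# Coefficients copied from the table in the post.
COEF = {
    "intercept": -1.93,
    "kerry": -0.29,
    "black": 1.00,
    "hisp": 0.74,
    "male": -1.52,
    "young": 1.46,
    "log_income": 0.29,
}
RESIDUAL_SE = 0.04  # residual standard error reported in the post

def predict_obama_share(kerry, black, hisp, male, young, income):
    """Linear predictor; shares are assumed to be fractions in [0, 1]."""
    return (COEF["intercept"]
            + COEF["kerry"] * kerry
            + COEF["black"] * black
            + COEF["hisp"] * hisp
            + COEF["male"] * male
            + COEF["young"] * young
            + COEF["log_income"] * math.log(income))

# A made-up county, not a real one.
county = dict(kerry=0.50, black=0.10, hisp=0.03,
              male=0.49, young=0.06, income=40000)

share = predict_obama_share(**county)
low, high = share - 2 * RESIDUAL_SE, share + 2 * RESIDUAL_SE
print(f"predicted share {share:.2f}, rough band {low:.2f} to {high:.2f}")

# Effect of one extra percentage point of male population, all else fixed.
more_male = dict(county, male=county["male"] + 0.01)
print(f"change: {predict_obama_share(**more_male) - share:+.4f}")
```

Under those assumptions the made-up county comes out at about 0.46 for Obama, and each additional percentage point of male population lowers the prediction by about 1.5 points, which is the surprising sign discussed above.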

Anyway, I will update this post tomorrow comparing my predictions to the realized results.

Posted by Kevin Bartz at April 22, 2008 11:16 AM

Comments

So what is the bottom line? Couldn't you predict the statewide popular vote margin from the estimated county results? And delegates -- although the formula is tricky I suppose.

Posted by: Andy Eggers at April 22, 2008 12:00 PM

Good point. I know it's silly, but there is no bottom line. I'm not brave enough to try to predict turnout by county this time. There's too much going on, like Pennsylvania being a closed primary state. There's also precious little historical data; when's the last time the Democratic Presidential race was competitive by the time it reached Pennsylvania? Also, the national level of Obama support no doubt rose over the 1.5 months since Ohio. That's why I'm going to limit my prediction to a map -- for this time. I know it's kind of silly not to have a top-line prediction but I want to see how well this will work first. For Indiana, an open-primary state only a week from now, I think a turnout model may be possible.

Posted by: Kevin at April 22, 2008 12:19 PM

I think a turnout model will be a must for this post.

Posted by: Ben at April 27, 2008 4:44 AM

How about plotting the prediction error by county? That would complement your charts nicely, I think.

Posted by: Kaiser at April 28, 2008 10:27 PM

Thanks for the suggestions. We're going to have all of those things in my Indiana and North Carolina predictions on Friday:

- A turnout model (with 80% R^2 fit!)
- A spatial plot of the errors
- A prediction of both turnout and each candidate's vote totals in both Indiana and NC.

Posted by: Kevin at April 29, 2008 5:41 PM
