
22 April 2008

Predicting Pennsylvania, Updated

Update: Check out how my predictions fared! Two comparisons are given, one showing both maps in the same image and one as an animated GIF (kudos to the animation package in R).

pa.movie.gif

pa.dem.2008.comp.png

Overall, my predictions did pretty well. Their correlation with the true county vote shares was .89 -- giving an out-of-sample R^2 of .79 (.89 squared), just below the in-sample R^2 of .82. My biggest miss was Centre County, where I predicted that Clinton would edge out Obama. Instead, Obama won pretty convincingly, with over 60% of the vote. I also overestimated Obama’s support in some of the counties surrounding Philadelphia. I'm not sure what I can do to improve the model next time. If you have any ideas, leave a comment.
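
For readers who want to run the same check on their own predictions, here is a minimal sketch of the calculation: compute the Pearson correlation between predicted and actual shares, then square it. The five county values are invented for illustration and are not the Pennsylvania data; `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented shares for five hypothetical counties (NOT the real results).
predicted = [0.35, 0.42, 0.58, 0.47, 0.30]
actual = [0.33, 0.45, 0.61, 0.60, 0.28]

r = pearson_r(predicted, actual)
print(f"correlation = {r:.2f}, squared correlation = {r ** 2:.2f}")

# The figure quoted in the post: a correlation of .89 squares to .79.
reported_r2 = round(0.89 ** 2, 2)
print(reported_r2)
```

Note that the squared correlation is the R^2 you would get from regressing the actual shares on the predictions. It can look better than the predictions really are if they are systematically too high or too low.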

Original entry: This isn't my normal blogging day, but I wanted to show my final Pennsylvania prediction map. Later on I will update my post to include the true map in the same color scheme, so we can compare. I have updated the prediction model following everyone's suggestions last time.

pa.dem.2008.png

The big problems last time were:

  • Kerry's vote share was only a loose indicator of Obama's, not enough to base a model upon
  • The model didn't incorporate other obvious factors like population density, nearby colleges, etc.
  • R^2 = 0.16 isn't all that good!

There were other comments, too, but not all of them could be addressed effectively. (What else can I do besides predict on the county level? That's where we have data!) Still, I'm happy to say that for the latest model I pulled in lots more covariates from the census:

  • Kerry's 2004 vote share
  • % Whites
  • % Blacks
  • % Hispanics
  • % males
  • % young people (age 18 through 21)
  • % urban population
  • Population density
  • Median household income

With all these, the model fits like a dream come true: R^2 = 0.82 and a residual standard error of 0.04 (i.e., taking two standard errors, predictions should land within roughly ±8 percentage points of Obama's true share). Here are the estimated coefficients (after pruning some variables based on the BIC):

Name          Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)      -1.93        0.35    -5.44      0.00
kerry            -0.29        0.06    -4.66      0.00
black             1.00        0.10     9.81      0.00
hisp              0.74        0.30     2.49      0.01
male             -1.52        0.33    -4.60      0.00
young             1.46        0.22     6.59      0.00
log(income)       0.29        0.03     9.96      0.00
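
The table has seven coefficients, while nine covariates went in, so the BIC pruning dropped three of them. As a rough sketch of how such a comparison works (not necessarily the exact procedure used here), the snippet below scores two least-squares fits with the usual Gaussian form of the BIC, n·ln(RSS/n) + k·ln(n), where lower is better. The sample size and residual sums of squares are made-up numbers, and `bic_gaussian` is a hypothetical helper; only the coefficient counts (10 and 7, intercept included) follow from the post.

```python
import math

def bic_gaussian(rss, n, k):
    """BIC of a least-squares fit with Gaussian errors, up to a constant
    that depends only on n (so it cancels when comparing models).

    rss: residual sum of squares
    n:   number of observations
    k:   number of estimated coefficients, intercept included
    """
    return n * math.log(rss / n) + k * math.log(n)

n_counties = 80      # hypothetical sample size, NOT from the post

# Full model: intercept + all nine covariates (RSS is invented).
bic_full = bic_gaussian(rss=0.120, n=n_counties, k=10)

# Pruned model: intercept + six covariates (RSS is invented).
# It fits slightly worse but pays a smaller complexity penalty.
bic_pruned = bic_gaussian(rss=0.125, n=n_counties, k=7)

print(f"full:   {bic_full:.1f}")
print(f"pruned: {bic_pruned:.1f}")
print("keep pruned model" if bic_pruned < bic_full else "keep full model")
```

With these invented numbers the pruned model wins: dropping three covariates barely raises the residual sum of squares, so the ln(n) penalty per coefficient outweighs the loss in fit.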

The coefficients are pretty much as you'd expect: counties with more Blacks, more young people and higher incomes tend to vote for Obama. Poorer counties and counties where Kerry did well tend to go for Clinton. The only somewhat surprising part is the negative coefficient on male population, since you would think counties with more females would go for Clinton. There's probably some confounder, because there were several counties in Ohio with 55% male populations that went for Clinton.
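
To make the coefficients concrete, here is a minimal sketch that plugs a county into the fitted equation. It rests on assumptions the post does not spell out: that the response is Obama's share on a 0-1 scale, that the percentage covariates enter as proportions, and that income is in dollars with a natural log. The county itself is invented, and `predict_obama_share` is a hypothetical helper name.

```python
import math

# Rounded estimates copied from the coefficient table.
COEF = {
    "intercept": -1.93,
    "kerry": -0.29,
    "black": 1.00,
    "hisp": 0.74,
    "male": -1.52,
    "young": 1.46,
    "log_income": 0.29,
}
RESIDUAL_SE = 0.04  # residual standard error reported in the post

def predict_obama_share(kerry, black, hisp, male, young, income):
    """Linear prediction; proportions on a 0-1 scale, income in dollars (assumed)."""
    return (COEF["intercept"]
            + COEF["kerry"] * kerry
            + COEF["black"] * black
            + COEF["hisp"] * hisp
            + COEF["male"] * male
            + COEF["young"] * young
            + COEF["log_income"] * math.log(income))

# An invented county, NOT a real Pennsylvania county.
share = predict_obama_share(kerry=0.50, black=0.05, hisp=0.02,
                            male=0.49, young=0.06, income=40000)

# Rough band of two residual standard errors; this ignores the
# uncertainty in the coefficients themselves.
low, high = share - 2 * RESIDUAL_SE, share + 2 * RESIDUAL_SE
print(f"predicted share = {share:.3f}, rough band = ({low:.3f}, {high:.3f})")
```

Because the table rounds to two decimals, predictions from this sketch are only approximate. The log(income) term is especially sensitive, since a rounding error of 0.005 in that coefficient shifts the prediction by about five percentage points.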

Anyway, I will update this post tomorrow comparing my predictions to the realized results.

Posted by Kevin Bartz at April 22, 2008 11:16 AM