4 April 2008

Predicting Pennsylvania

Here are the results of the Pennsylvania Democratic primary, with Obama counties in purple and Clinton counties in orange.

pa.dem.2008.png

What, you say? The Pennsylvania primary hasn't happened yet? You're right. Enter statistics!

Consider this scatterplot of Kerry's 2004 vote share versus Obama's 2008 vote share in Ohio counties. The result is something I call the Kerry-Obama smile: Obama does well in Kerry's best counties, where staunchly Democratic urban blacks are concentrated, and in Kerry's worst counties, presumably due to Obama's appeal to crossover Republicans. Clinton does best in the wide middle swath.

kerry.obama.png

This motivates a very simple modeling idea: fit a curve to the scatterplot. Obviously, a quadratic in Kerry's share looks like a decent fit. That gives us the best-fit curve shown on the plot. The R-squared is 0.16, representing an okay fit.
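For anyone who wants to play along, here is a minimal sketch of a quadratic least-squares fit and its R-squared in plain Python. It is not the code behind the plot. The `kerry` and `obama` lists are made-up numbers chosen to look like a smile, and the helper names (`solve`, `fit_quadratic`, `predict`, `r_squared`) are my own for illustration.

```python
# Hypothetical county data: Kerry's 2004 share and Obama's 2008 primary share.
# These numbers are illustrative only; they are NOT the actual Ohio results.
kerry = [0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65]
obama = [0.42, 0.36, 0.33, 0.30, 0.31, 0.33, 0.38, 0.45, 0.55]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_quadratic(xs, ys):
    """Least-squares coefficients [b0, b1, b2] for y = b0 + b1*x + b2*x^2."""
    # Normal equations (X'X) b = X'y, where X has columns 1, x, x^2.
    powers = [sum(x ** k for x in xs) for k in range(5)]
    xtx = [[powers[i + j] for j in range(3)] for i in range(3)]
    xty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(3)]
    return solve(xtx, xty)


def predict(coefs, x):
    """Evaluate the fitted quadratic at x."""
    b0, b1, b2 = coefs
    return b0 + b1 * x + b2 * x * x


def r_squared(xs, ys, coefs):
    """Share of the variance in ys explained by the fitted curve."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(coefs, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


coefs = fit_quadratic(kerry, obama)
print("coefficients:", coefs)
print("R-squared:", r_squared(kerry, obama, coefs))
```

A positive coefficient on the squared term is what makes the curve a smile rather than a frown. Real county data will be far noisier than this toy example, which is why the R-squared on the actual plot is so much lower.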

The next step is utterly useless, but utterly fun. We can use Ohio to predict Pennsylvania. In other words, given that we know how Kerry did in Pennsylvania counties in 2004, we can predict how well Obama will do in 2008 in every Pennsylvania county. Note that I first tweaked the model's intercept slightly in Obama's favor, so that the aggregate prediction matches the current polling average (showing Clinton up by 6.6%).
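The intercept tweak can be sketched roughly as follows. This is one plausible reading, not necessarily the exact adjustment I made: it assumes the aggregate is a vote-weighted average of the county predictions, and that a 6.6-point Clinton lead is treated as a two-way race with undecideds ignored. The county predictions and vote counts are made up, and the function names are hypothetical.

```python
def two_way_target(clinton_lead):
    """Obama's share in a two-way race given Clinton's lead (assumes no undecideds)."""
    return (1.0 - clinton_lead) / 2.0


def shift_to_match_poll(predictions, weights, target_share):
    """Add one constant to every county prediction so the weighted mean hits the target.

    Adding a constant to every prediction is the same as changing the
    intercept of the fitted quadratic by that amount.
    """
    total = sum(weights)
    aggregate = sum(p * w for p, w in zip(predictions, weights)) / total
    shift = target_share - aggregate
    return [p + shift for p in predictions], shift


# Hypothetical predicted Obama shares and expected vote counts for three counties.
predicted = [0.60, 0.35, 0.40]
votes = [500, 200, 300]

target = two_way_target(0.066)
adjusted, shift = shift_to_match_poll(predicted, votes, target)
print("target share:", target)
print("intercept shift:", shift)
print("adjusted predictions:", adjusted)
```

Weighting by votes rather than averaging counties equally matters here, since a handful of large counties can dominate the statewide total.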

The bad news for Obama is that nearly all of Pennsylvania's counties fall in the middle of the smile. The image below compares Kerry in 2004 to the model's predictions for Obama in 2008. Obama is predicted to carry Philadelphia overwhelmingly, and to do well in some of the curvy, heavily Republican counties in the south-center of the state. Everywhere else, though, is Clinton country.

pa.comp.png

Posted by Kevin Bartz at April 4, 2008 1:15 PM

Comments

I was with you until you got to the actual curve-fit. The graph says it all: When you look at Kerry's share = 0.4, for example, there is a wide spread of actuals. r-squared notwithstanding, it looks as though the actuals at Kerry = 0.4 vary from the predicted by at least +/- 15%! I have a hard time accepting that as "an okay fit".

The other thing is obviously population. Counties don't vote in American elections, voters and delegates do. The one county you've predicted as Obama's is Philadelphia, center of the largest concentration of people in the state. After Pittsburgh and Philadelphia, the remaining counties (which fill up the majority of your colorful map) represent a comparatively small fraction of the Pennsylvania population.

Still, I do not mean to discourage. This is an interesting analysis, and I hope this criticism is taken as constructive.

Posted by: Will Dwinnell at April 4, 2008 8:26 PM

The general vs. primary thing is also an issue. Centre County (that big, Woody-Woodpecker-looking thing right in the middle of PA) consists of State College and rural PA, split about 50-50 population-wise. The latter is entirely GOP territory, the former entirely Democratic -- hence the 50-50 split for Kerry in the 2004 general. The idea that that will translate into trouble for Obama *in the primary* when nearly all Democratic voters in the county are relatively well-off, highly-educated, PSU-affiliated State Collegians (a place where he drew 20K to a rally last weekend) seems like a stretch.

The larger point, of course, is that generals aren't primaries, something I wish the Evan Bayhs of the world would get through their skulls every time I hear them say how important it is that Clinton has won the "big states"...

Posted by: C. Zorn at April 6, 2008 1:21 PM

Great post. I agree that the graph does say it all. It speaks volumes with the numbers in respect of the points made.

Posted by: Jamie Silvers at April 6, 2008 5:14 PM

Hey guys! Thanks for all the comments. I guess I got a little too excited about a rather weak relationship. In reality, of course, there are other, better correlates with the Obama-Hillary vote than the Kerry-Bush 2004 vote. Perhaps rather than predicting Pennsylvania, I would've done better with a more general post about 2004 results vs. 2008 results. One amusing fact is that the only county won by Ron Paul in Iowa happened to be one of three counties won by Howard Dean in 2004. Not surprising!

Posted by: Kevin at April 6, 2008 5:21 PM

Ha ha, that's right, Kevin. However, it was a good post, and interesting to read. I have just visited your website politistats, which also showed some examples that are interesting to me too. Thank you.

Posted by: Jamies Silvers at April 7, 2008 3:12 AM

Hi :)

Very interesting post...

Regards,

Posted by: Renowacja at April 7, 2008 8:03 AM

This is sad for the Obamaites if the race were to be decided in this state.

Ray S.

Posted by: Online Colleges at April 8, 2008 9:16 AM

Great thinking. It is always useful to have a sound theory when using statistics to predict the future. And I think you have some good arguments even though they are difficult to understand.
Thank you for some interesting and challenging thoughts!

Posted by: Søren Kjær Vestergaard at April 10, 2008 2:31 PM

The problem with the forecast is that the Ohio Democratic primary was an open primary: Republicans and independents were allowed to vote.

In Pennsylvania, the Democratic primary will be a closed one. Only registered Democratic voters will be allowed to go to the polls.

So, I guess that the outcome will be far better for Clinton in Pennsylvania than in Ohio.

Posted by: Mathieu Gallard at April 11, 2008 11:03 AM
