11 March 2008
While the Democratic nomination contest drags on (and on and on...; Tom Hanks declared himself bored with the race last week), attention is turning to hypothetical general election matchups between Hillary Clinton or Barack Obama and John McCain. Mystery Pollster has a post up reporting state-by-state hypothetical matchup numbers from SurveyUSA surveys of 600 registered voters in each state. There is some debate about the quality of the data (SurveyUSA uses Interactive Voice Response to conduct its surveys, there is no likely voter screen, etc.). But we have what we have.
At this point, the results are primarily of interest to the extent that they speak to the "electability" question on the Democratic side: who is more likely to beat McCain? MP goes through the results state by state, classifying each state as Strong McCain, Lean McCain, Toss-up, etc. From this you can calculate the number of electoral votes in each category, which provides some information but isn't exactly what we're interested in.
This problem is a natural one for the application of some simple, naive Bayesian ideas. If we throw on some flat priors, make all sorts of unreasonably strong independence assumptions, and assume that the results were derived from simple random sampling, we can quickly get posterior distributions for the support for each candidate in each state and can calculate estimates of the probability of victory. From there, it is easy to calculate the posterior distribution of the number of electoral votes for each candidate and find posterior probabilities that Obama beats McCain, Clinton beats McCain, or the probability that Obama would receive more electoral votes than Clinton.
While I was sitting around at lunch yesterday, I ran a very quick analysis using the reported SurveyUSA marginals. Essentially, I took samples from 50 independent Dirichlet posteriors for each of the two hypothetical matchups, assuming a flat prior and a multinomial sampling density (to allow for undecideds). To avoid dealing with the posterior predictive distributions, I'm just going to assume that all registered voters will vote, so I can simply compare posterior proportions. When you run this, you obtain estimates (conditional on the data and, most importantly, the model) that the probability of an Obama victory over McCain is about 88% and the probability of a Clinton victory is about 72%. There is a roughly 70% posterior probability that Obama would win more electoral votes than Clinton.
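For readers who want to try something similar, here is a minimal sketch of this kind of simulation in Python with NumPy. It is not the code behind the numbers above. It relies on the standard conjugacy result that a flat Dirichlet(1, 1, 1) prior combined with multinomial counts gives a Dirichlet(1 + counts) posterior. The function name `simulate_electoral_votes`, the three toy states, their respondent counts, and their electoral votes are all hypothetical placeholders, not the SurveyUSA marginals.

```python
import numpy as np


def simulate_electoral_votes(counts, electoral_votes, n_draws=10_000, seed=0):
    """Draw Democratic electoral-vote totals from independent Dirichlet posteriors.

    counts: shape (n_states, 3), respondents choosing
            (Democrat, McCain, undecided/other) in each state.
    electoral_votes: shape (n_states,), electoral votes per state.
    Returns an integer array of length n_draws.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    electoral_votes = np.asarray(electoral_votes, dtype=np.int64)
    totals = np.zeros(n_draws, dtype=np.int64)
    for state_counts, ev in zip(counts, electoral_votes):
        # Flat Dirichlet(1, 1, 1) prior + multinomial likelihood
        # -> Dirichlet(1 + counts) posterior for the state's proportions.
        draws = rng.dirichlet(1.0 + state_counts, size=n_draws)
        # Assume everyone votes: the Democrat carries the state in a draw
        # whenever the Democratic proportion exceeds McCain's.
        dem_wins = draws[:, 0] > draws[:, 1]
        totals += ev * dem_wins
    return totals


# Hypothetical toy inputs (600 respondents per state); NOT the real poll data.
toy_votes = [10, 20, 15]
obama_counts = [[300, 250, 50], [270, 280, 50], [240, 310, 50]]
clinton_counts = [[290, 260, 50], [285, 265, 50], [230, 320, 50]]

# The two matchups are treated as independent, so use separate seeds.
obama_ev = simulate_electoral_votes(obama_counts, toy_votes, seed=1)
clinton_ev = simulate_electoral_votes(clinton_counts, toy_votes, seed=2)

needed = sum(toy_votes) // 2 + 1  # majority of the toy total
print("P(Obama wins)         ", np.mean(obama_ev >= needed))
print("P(Clinton wins)       ", np.mean(clinton_ev >= needed))
print("P(Obama EV > Clinton) ", np.mean(obama_ev > clinton_ev))
```

With real data, `counts` would hold one row per state built from the reported marginals, and `needed` would be 270 of the 538 electoral votes. Drawing the two matchups independently mirrors the strong independence assumptions described above.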
As I mentioned, this is an extremely naive Bayesian approach. There are a lot of ways that one could make the model better: adding additional sources of uncertainty, allowing for correlations between the states, using historical information to inform priors, and imposing a hierarchical structure to shrink outlying estimates toward the grand mean. One place to start would be by modeling the pairs of responses to the two hypothetical matchup questions. Any of these things, however, is going to be much easier to do in a Bayesian framework, since calculating posterior distributions of functions of the model parameters is extremely easy.
Posted by Mike Kellermann at March 11, 2008 11:17 AM
Nice post, Mike. Retitle this "Obama more electable" and get it Dugg and we could crash the servers.
Posted by: Andy Eggers at March 12, 2008 11:40 PM
"probability of an Obama victory over McCain is about 88% and the probability of a Clinton victory is about 72%." That is pretty high... When was this survey taken?
Posted by: nds roms at March 14, 2008 11:26 AM
Hi, I would be very curious to know what kind of modification you would make to this model if you wanted to incorporate the between-state correlations. Thanks.
Posted by: yiannis at March 20, 2008 6:53 PM
The first thing that I would think of to introduce correlations between the states would be to use a more complicated, potentially hierarchical prior on the proportions in each state. The results presented here use almost no prior information, but we know, for example, that Michigan and Pennsylvania have looked pretty similar in recent elections, and both of these states have been somewhat more Democratic than Ohio. In the SurveyUSA polls, however, MI and PA are both essentially toss-ups while both Democratic candidates have 10-point leads in OH. It would probably make sense to shrink these back toward their typical relative historical positions, and you could do this with an appropriate informative prior.
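As a rough illustration of the shrinkage idea only, the sketch below replaces the flat prior with a Dirichlet prior centered on a state's historical vote shares. This is the simplest informative prior, not a hierarchical one: it pulls each state toward its own history but does not by itself induce correlations between states. The helper names, the historical shares, the poll counts, and the `prior_strength` value (the number of pseudo-respondents the prior is worth) are all hypothetical.

```python
import numpy as np


def posterior_alpha(counts, historical_shares, prior_strength):
    """Dirichlet posterior parameters under an informative prior.

    The prior is Dirichlet(prior_strength * historical_shares), so it acts
    like prior_strength extra respondents split according to history.
    """
    counts = np.asarray(counts, dtype=float)
    shares = np.asarray(historical_shares, dtype=float)
    return prior_strength * shares + counts


def posterior_mean(alpha):
    """Mean of a Dirichlet distribution with parameter vector alpha."""
    alpha = np.asarray(alpha, dtype=float)
    return alpha / alpha.sum()


# Hypothetical numbers for one state: (Democrat, McCain, undecided/other).
poll_counts = [318, 258, 24]            # 600 respondents, Democrat at 53%
historical_shares = [0.48, 0.50, 0.02]  # made-up "typical" position

alpha = posterior_alpha(poll_counts, historical_shares, prior_strength=300)
print("poll share     ", np.array(poll_counts) / sum(poll_counts))
print("posterior mean ", posterior_mean(alpha))

# Draws from this posterior can be used in place of flat-prior draws.
rng = np.random.default_rng(0)
draws = rng.dirichlet(alpha, size=5_000)
print("P(Democrat ahead)", np.mean(draws[:, 0] > draws[:, 1]))
```

The posterior mean lands between the poll share and the historical share, with `prior_strength` controlling how hard the poll is pulled back. Getting actual between-state correlation would require something more, such as a shared national swing term or a hierarchical prior fitted across states.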
Incidentally, Erikson and Sigman report the results of a similar simulation exercise in a post on pollster.com and obtain essentially the same answers as we did above (but three days later!).
Posted by: Mike Kellermann at March 22, 2008 5:32 PM