17 January 2007

Bayesian inference and natural selection

Amy Perfors

I saw a thought-provoking post at John Baez's diary the other day pointing out an interesting analogy between natural selection and Bayesian inference, and I can't decide if I should classify it as just "neat" or if it might also be "neat, and potentially deep" (which is where I'm leaning). Because it's a rather lengthy post, I'll just quote the relevant bits:

The analogy is mathematically precise, and fascinating. In rough terms, it says that the process of natural selection resembles the process of Bayesian inference. A population of organisms can be thought of as having various "hypotheses" about how to survive - each hypothesis corresponding to a different allele. (Roughly, an allele is one of several alternative versions of a gene.) In each successive generation, the process of natural selection modifies the proportion of organisms having each hypothesis, according to Bayes' law!

Now let's be more precise:

Bayes' law says if we start with a "prior probability" for some hypothesis to be true, divide it by the probability that some observation is made, then multiply by the "conditional probability" that this observation will be made given that the hypothesis is true, we'll get the "posterior probability" that the hypothesis is true given that the observation is made.
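In symbols, writing H for the hypothesis and D for the observation (these letters are shorthand assumed here for illustration, not notation from the quoted post), the sentence above reads:

```latex
P(H \mid D) = \frac{P(H)}{P(D)} \, P(D \mid H)
```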

Formally, the exact same equation shows up in population genetics! In fact, Chris showed it to me - it's equation 9.2 on page 30 of this book:

* R. Bürger, The Mathematical Theory of Selection, Recombination and Mutation, section I.9: Selection at a single locus, Wiley, 2000.

But, now all the terms in the equation have different meanings!

Now, instead of a "prior probability" for a hypothesis to be true, we have the frequency of occurrence of some allele in some generation of a population. Instead of the probability that we make some observation, we have the expected number of offspring of an organism. Instead of the "conditional probability" of making the observation, we have the expected number of offspring of an organism given that it has this allele. And, instead of the "posterior probability" of our hypothesis, we have the frequency of occurrence of that allele in the next generation.
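To make the correspondence concrete, here is a minimal Python sketch of the substitution described above. The function names, the three-allele example, and its numbers are all made up for illustration, and the selection step is the standard textbook form for selection alone (new frequency = old frequency times fitness, divided by mean fitness), which may not match the book's equation in every detail. Note that the "likelihoods" here are expected offspring counts, so they need not lie between 0 and 1; dividing by the mean does the normalizing.

```python
from fractions import Fraction


def bayes_update(prior, likelihood):
    """Posterior P(H_i | D) from priors P(H_i) and likelihoods P(D | H_i)."""
    # P(D) = sum over hypotheses of P(H_i) * P(D | H_i)
    evidence = sum(p * l for p, l in zip(prior, likelihood))
    return [p * l / evidence for p, l in zip(prior, likelihood)]


def selection_step(freqs, fitness):
    """Next-generation allele frequencies under selection alone.

    freqs[i]   -- frequency of allele i in the current generation
    fitness[i] -- expected number of offspring given allele i
    """
    # Expected number of offspring of a randomly chosen organism
    mean_fitness = sum(p * w for p, w in zip(freqs, fitness))
    return [p * w / mean_fitness for p, w in zip(freqs, fitness)]


# Hypothetical example: three alleles with made-up frequencies and fitnesses.
freqs = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
fitness = [Fraction(1), Fraction(2), Fraction(3)]

next_freqs = selection_step(freqs, fitness)
print(next_freqs)  # frequencies 5/17, 6/17, 6/17

# The two functions perform the same arithmetic under different names.
print(bayes_update(freqs, fitness) == next_freqs)  # True
```

Exact fractions are used instead of floats only so that the equality check at the end is not clouded by rounding.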

Baez goes on to wonder, as I do, if people doing work on genetic programming or Bayesian approaches to machine learning have noticed this relationship. I feel like I would have remembered if I'd seen something like this (at least recently), and I don't remember anything, but that doesn't mean it's not there -- any pointers, anyone? [The closest I can think of is an interesting chapter (pdf) by David MacKay called "Why have sex? Information acquisition and evolution", but it's mainly about how one can use information theory to quantify the argument for why recombination (sex) is a better way to spread useful mutations and clear less-useful ones].

Also, re: the conceptual deepness of this point... I've long thought (and I'm sure I'm not alone in this) that it's useful to see natural selection as a guided search over genotype (or phenotype) space. Likewise, Bayesian inference, i.e., searching over "problem space" so as to maximize posterior probability, seems to be a valuable and useful thing to do in machine learning and cognitive science. [Incidentally, I've also found it to be a useful rhetorical tool in discussing evolution with creationists: the idea that computers can do intelligent searches over large spaces and find things with small "chance" probability is one that many of them can accept, and from there it's not so much of a leap to think that evolution might be kind of analogous. It also helps them to understand how "natural selection" is not "random chance", which seems to be the common misunderstanding.]

Anyway, in that superficial sense, it's perhaps not surprising that this analogy exists. On the other hand, the analogy goes deeper than "they are both searches over a space"; it's more along the lines of "they are both, essentially, trying to maximize the same quantity (posterior probability)." And that's interesting; where they differ, of course, is just in how each particular probability is calculated. I'd guess that a lot of the work in natural selection is done in calculating P(offspring | allele), just as in a lot of Bayesian inference the problem-specific part is setting up and figuring out how to calculate the likelihood.

Also, I wonder if you could do some interesting work in mathematical genetics by manipulating the prior, P(allele). Baez's excerpt defines it simply as the frequency of occurrence of some allele in some generation, but of course it's more complicated than that: you also have to include the probability of a mutation to that allele, since otherwise you'd never get the emergence of novel alleles or combinations. And I bet figuring out how to include the additional complexity would be very interesting and meaningful.
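One simple way to picture the mutation point is sketched below in Python. It is only an illustration under assumed conventions: the function name, the mutation matrix, and all the numbers are hypothetical, and applying selection first and mutation second within a generation is one modeling choice among several. The sketch shows an allele that starts at frequency zero, which selection alone could never raise, entering the population through mutation and then being amplified.

```python
from fractions import Fraction


def selection_mutation_step(freqs, fitness, mutation):
    """One generation: selection first, then mutation (assumed ordering).

    mutation[i][j] is the assumed probability that an offspring of an
    allele-i parent carries allele j; each row should sum to 1.
    """
    # Selection: the same "Bayes-like" reweighting by relative fitness.
    mean_fitness = sum(p * w for p, w in zip(freqs, fitness))
    selected = [p * w / mean_fitness for p, w in zip(freqs, fitness)]
    # Mutation: redistribute the selected frequencies across alleles.
    n = len(freqs)
    return [sum(selected[i] * mutation[i][j] for i in range(n)) for j in range(n)]


# Hypothetical example: allele B (index 1) is fitter but initially absent.
freqs = [Fraction(1), Fraction(0)]
fitness = [Fraction(1), Fraction(2)]
mutation = [
    [Fraction(99, 100), Fraction(1, 100)],  # A mutates to B 1% of the time
    [Fraction(0), Fraction(1)],             # B never mutates back
]

gen1 = selection_mutation_step(freqs, fitness, mutation)
gen2 = selection_mutation_step(gen1, fitness, mutation)
print(gen1)  # frequencies 99/100 and 1/100
print(gen2)  # frequencies 9801/10100 and 299/10100
```

In Bayesian terms, the mutation matrix acts like a step that spreads a little prior mass onto hypotheses before the next update, so that a hypothesis with zero prior is not excluded forever.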

Anyway, I'm now speculating on things I know very little about, and I should go read the Bürger book (which has been duly added to my ever-expanding reading list). But I thought I'd throw out these speculations right now anyway, since you all might find them interesting. And if anyone has any other references, I'd love to see them.

Posted by Amy Perfors at 9:40 AM