24 October 2005
This week, the Applied Statistics Workshop will present a talk by Gopi Goswami of the Harvard Statistics Department entitled "Evolutionary Monte Carlo Methods for Clustering." Gopi Goswami received his Ph.D. from the Department of Statistics at Harvard in June 2005. Before coming to Harvard, he was an undergraduate and master's student at the Indian Statistical Institute in Calcutta. His dissertation, "On Population-Based MCMC Methods," develops new techniques for sampling more efficiently from a target density. He is currently a post-doctoral scholar in the Harvard Statistics Department. The presentation will be at noon on Wednesday, October 26 in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided. The paper he will present on Wednesday explores these methods in the context of clustering problems:
We consider the problem of clustering a group of observations according to some objective function (e.g. K-means clustering, variable selection) or according to a posterior density (e.g. posterior from a Dirichlet Process prior) of cluster indicators. We cast both kinds of problems in the framework of sampling for cluster indicators. So far, Gibbs sampling, the “split-merge” Metropolis-Hastings algorithm and various modifications of these have been the basic tools used for sampling in this context. We propose a new population based MCMC approach, in the same vein as parallel tempering. We introduce three new “crossover moves” (based on swapping and reshuffling sub-cluster intersections) which make such an algorithm very efficient with respect to Integrated Autocorrelation Time (IAT) of various relevant statistics and also with respect to the ability to escape from local modes. We call this new algorithm the Population Based Clustering (PBC) algorithm. We apply the PBC algorithm to motif clustering, Beta mixture of Bernoulli clustering and a Bayesian Information Criterion (BIC) based variable selection problem. We also discuss clustering of mixtures of Normals and compare the performance of the PBC algorithm as a stochastic optimizer with K-means clustering.
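For readers unfamiliar with parallel tempering, which the abstract names as the closest relative of the new method, here is a minimal sketch. It is plain parallel tempering on a toy one-dimensional bimodal density, not the PBC algorithm and not its crossover moves. The target, the temperature ladder, the step size, and all function names are illustrative assumptions.

```python
import math
import random


def log_target(x):
    """Log density (up to a constant) of an equal mixture of
    N(-3, 0.5^2) and N(3, 0.5^2), computed with log-sum-exp."""
    a = -((x + 3.0) ** 2) / (2 * 0.25)
    b = -((x - 3.0) ** 2) / (2 * 0.25)
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def parallel_tempering(n_iter, temps, step, seed):
    """Run one chain per temperature; return the draws of the T = temps[0] chain.

    Chain k targets the density proportional to exp(log_target(x) / temps[k]).
    """
    rng = random.Random(seed)
    states = [-3.0] * len(temps)  # every chain starts in the left mode
    cold_samples = []
    for _ in range(n_iter):
        # Random-walk Metropolis update within each chain.
        for k, t in enumerate(temps):
            prop = states[k] + rng.gauss(0.0, step)
            log_ratio = (log_target(prop) - log_target(states[k])) / t
            if math.log(1.0 - rng.random()) < log_ratio:
                states[k] = prop
        # Propose exchanging the states of one random adjacent pair of chains.
        k = rng.randrange(len(temps) - 1)
        log_ratio = (log_target(states[k + 1]) - log_target(states[k])) * (
            1.0 / temps[k] - 1.0 / temps[k + 1]
        )
        if math.log(1.0 - rng.random()) < log_ratio:
            states[k], states[k + 1] = states[k + 1], states[k]
        cold_samples.append(states[0])
    return cold_samples


samples = parallel_tempering(20000, [1.0, 2.0, 4.0, 8.0, 16.0], 1.0, seed=0)
share_right = sum(1 for s in samples if s > 0) / len(samples)
print("share of cold-chain draws in the right-hand mode:", round(share_right, 3))
```

The hotter chains see a flattened version of the target and cross between the two modes easily. The exchange step passes those crossings down to the chain at temperature 1, which on its own would rarely leave the mode it started in.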
Posted by Mike Kellermann at 3:16 PM
You, Jong-Sung
One of my most embarrassing experiences involved the use of instrumental variables in my ASR article with Sanjeev Khagram on inequality and corruption (2005). The article developed from my qualifying paper on the causes of corruption (2003), in which I examined several hypotheses on the causal effects of inequality, democracy, economic development, and trade openness. Since all four of these explanatory variables may themselves be affected by corruption, I tried to find appropriate instruments. Initially, I tried five: latitude, number of frost days, malaria prevalence index, ethno-linguistic fractionalization, and constructed openness. They had strong predictive power for the endogenous variables in the first-stage regression, and the p-values for the over-identification test in the second-stage regressions were generally large enough that I could not reject the null hypothesis of no correlation between the instruments and the error term of the regression. I worked with Professor Khagram to turn my qualifying paper into a publishable article, and we submitted our manuscript to the ASR. The first review we received from the editor was encouraging. In a three-page letter that showed his interest in our paper, the editor advised us to “revise and resubmit.” But the editor, as well as an anonymous reviewer, asked us to provide an argument explaining how our instruments were correlated with the endogenous variables but not directly correlated with corruption. I initially considered responding to this critique by citing Rodrik et al.’s draft paper entitled “Institutions Rule: The Primacy of Institutions over Geography and Integration in Economic Development” (later published in the Journal of Economic Growth, 2004), which argued, “An instrument is something that simply has some desirable statistical properties. It need not be a large part of the causal story.”
However, I was criticized for my use of instruments when I presented the paper at a Work-in-Progress Seminar at the Kennedy School of Government and at the Comparative Political Economy Conference at Yale University in spring 2004. In the Work-in-Progress Seminar, some professors at the Kennedy School noted that an overidentification test can pass if the instruments are all wrong in the same direction. At the Yale conference, Professor Daron Acemoglu of MIT was a discussant for my paper, and he used the term “IV etiquette” to emphasize the importance of giving a plausible story for the first stage. He pointed out that without a clear story for the first stage, it is impossible to tell whether the instrument is uncorrelated with unobserved determinants of the dependent variable. It was a truly embarrassing moment to be criticized for a lack of etiquette in front of so many scholars.
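The seminar point about the overidentification test can be checked with a small simulation. The sketch below is not from the article. It assumes one endogenous regressor, two instruments, no intercept, and made-up coefficients. It reads "wrong in the same direction" as the case where each instrument's direct effect on the outcome is proportional to its first-stage coefficient. The function names and the Sargan statistic computed as n times the uncentered R-squared are choices made for this illustration.

```python
import random

PI1, PI2 = 1.0, 0.5  # first-stage coefficients (assumed for illustration)


def simulate(n, beta, g1, g2, seed):
    """Draw data in which instrument j has a direct effect g_j on the outcome.

    Valid instruments would have g1 = g2 = 0.
    """
    rng = random.Random(seed)
    y, x, z1, z2 = [], [], [], []
    for _ in range(n):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)
        e = rng.gauss(0, 1)  # unobserved confounder, makes x endogenous
        xi = PI1 * a + PI2 * b + e + rng.gauss(0, 1)
        yi = beta * xi + g1 * a + g2 * b + e + rng.gauss(0, 1)
        y.append(yi)
        x.append(xi)
        z1.append(a)
        z2.append(b)
    return y, x, z1, z2


def fitted_on_instruments(t, z1, z2):
    """Fitted values from a no-intercept OLS regression of t on (z1, z2)."""
    s11 = sum(a * a for a in z1)
    s12 = sum(a * b for a, b in zip(z1, z2))
    s22 = sum(b * b for b in z2)
    r1 = sum(a * v for a, v in zip(z1, t))
    r2 = sum(b * v for b, v in zip(z2, t))
    det = s11 * s22 - s12 * s12
    c1 = (s22 * r1 - s12 * r2) / det
    c2 = (s11 * r2 - s12 * r1) / det
    return [c1 * a + c2 * b for a, b in zip(z1, z2)]


def tsls_and_sargan(y, x, z1, z2):
    """Return the 2SLS slope and the Sargan statistic (1 degree of freedom here)."""
    x_hat = fitted_on_instruments(x, z1, z2)
    beta_hat = sum(h * yi for h, yi in zip(x_hat, y)) / sum(
        h * xi for h, xi in zip(x_hat, x)
    )
    resid = [yi - beta_hat * xi for yi, xi in zip(y, x)]
    resid_hat = fitted_on_instruments(resid, z1, z2)
    # n times the uncentered R^2 of the 2SLS residuals regressed on the instruments.
    sargan = len(y) * sum(r * r for r in resid_hat) / sum(r * r for r in resid)
    return beta_hat, sargan


BETA, C = 1.0, 0.5
# Both instruments invalid, with direct effects proportional to the first stage.
same = tsls_and_sargan(*simulate(20000, BETA, C * PI1, C * PI2, seed=1))
# Both instruments invalid, with direct effects pulling in opposite directions.
mixed = tsls_and_sargan(*simulate(20000, BETA, 0.5, -0.5, seed=1))
print("same direction:     beta_hat=%.3f  sargan=%.2f" % same)
print("opposite direction: beta_hat=%.3f  sargan=%.2f" % mixed)
```

In the first case both instruments imply the same wrong coefficient, roughly BETA + C, so the residuals are uncorrelated with the instruments and the Sargan statistic behaves like a chi-squared draw with one degree of freedom (5% critical value 3.84). In the second case the instruments disagree and the statistic is large. The test detects disagreement among instruments, not invalidity as such, which is why a substantive argument for each instrument is still needed.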
So I had to find more convincing instruments. In this regard, I have to thank my friend Andrew Leigh, who was then a doctoral student in public policy and is currently a Research Fellow at the Australian National University. In his dissertation paper entitled "Does Equality Lead to Fraternity?", he found that “mature cohort size” can be used as an instrument for inequality, based on Higgins and Williamson's (1999) theory of the effect of cohort size on income inequality. I also came to realize how helpful conference presentations and discussions can be in improving the quality of research.
Posted by SSS Coauthors at 4:04 AM