<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Social Science Statistics Blog</title>
    <link rel="alternate" type="text/html" href="http://www.iq.harvard.edu/blog/sss/" />
    <link rel="self" type="application/atom+xml" href="http://www.iq.harvard.edu/blog/sss/atom.xml" />
    <id>tag:www.iq.harvard.edu,2009:/blog/sss/2</id>
    <link rel="service.post" type="application/atom+xml" href="https://blogs.hmdc.harvard.edu/mt/mt-atom.cgi/weblog/blog_id=2" title="Social Science Statistics Blog" />
    <updated>2009-11-08T01:21:41Z</updated>
    
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type 4.24-en</generator>
 

<entry>
    <title>Just in time for &quot;Superfreakonomics&quot;</title>
    <link rel="alternate" type="text/html" href="http://www.iq.harvard.edu/blog/sss/archives/2009/11/just_in_time_fo.shtml" />
    <link rel="service.edit" type="application/atom+xml" href="https://blogs.hmdc.harvard.edu/mt/mt-atom.cgi/weblog/blog_id=2/entry_id=1233" title="Just in time for &quot;Superfreakonomics&quot;" />
    <id>tag:www.iq.harvard.edu,2009:/blog/sss//2.1233</id>
    
    <published>2009-11-08T01:02:19Z</published>
    <updated>2009-11-08T01:21:41Z</updated>
    
    <summary>A friend recently pointed me to a 2007 New Republic article in which the author, Noam Scheiber, argues that the &quot;Freakonomics&quot; phenomenon is lamentable because it represents a trend toward research in which clever identification strategies are prized over attempts...</summary>
    <author>
        <name>Deirdre Bloome</name>
        
    </author>
    
    <content type="html" xml:lang="" xml:base="http://www.iq.harvard.edu/blog/sss/">
        <![CDATA[<p>A friend recently pointed me to a 2007 New Republic <a href="http://www.tnr.com/article/freaks-and-geeks-how-freakonomics-ruining-the-dismal-science?page=0,0">article</a> in which the author, Noam Scheiber, argues that the "Freakonomics" phenomenon is lamentable because it represents a trend toward research in which clever identification strategies are prized over attempts to answer what Scheiber calls "truly deep questions." Although two years have passed and a second Levitt and Dubner book has since been published, the article caught my attention because I have been considering a related issue of late. We are all well aware of how difficult it is to make causal inferences in the social sciences, so it is not surprising that researchers are drawn to settings in which some source of exogenous variation allows for identification of the influence of a specific causal factor. In fact, progress on those "truly deep questions" depends in part on this type of work. However, a focus on clean identification has some potentially negative implications. Scheiber names one: answering questions of peripheral interest. A second, which is of greater concern for me, is concentrating on population subgroups that may or may not be of scientific interest in and of themselves and that, in either case, are unable to provide direct insights into broader population dynamics. </p>]]>
        <![CDATA[<p>Thanks to <a href="http://www.jstor.org/stable/2951620">Imbens and Angrist</a>, we know that even when it is not possible to identify the population average effect of a "treatment" (i.e., causal factor of interest) on a given outcome, it is often possible to identify a "local average treatment effect," that is, the average effect of a treatment for the subpopulation whose treatment status is affected by changes in the exogenous regressor. This subpopulation is composed of so-called "compliers," who will take the treatment when assigned to take it and will not when they are not. Sometimes this subpopulation is of scientific or policy interest (<a href="http://www.jstor.org/stable/2937954">for example</a>, we may be interested in knowing the effect of additional schooling on earnings for those students who might drop out of high school but for compulsory education laws). Oftentimes, it is not. In contrast, the broader population and the portion of the population that receives treatment are almost always of interest. These groups are certainly policy-relevant (it would be misleading to project the effect of a drug on public health based only on the drug's effect amongst those who were induced to take the drug) and they are needed to generate "stylized facts" that help us organize our understanding of the social world. (Also, these groups can be observed whereas compliers are not a generally identified subpopulation.)</p>

<p>Unfortunately, when treatment effects are heterogeneous, the identified local average effect does not provide direct information about the wider population. This is problematic since treatment effects are likely to be heterogeneous in social science applications. In fact, this heterogeneity is one of the reasons why identifying causal effects is so difficult (individuals' self-selection into a treatment status based in part on anticipated treatment effects induces endogeneity problems). </p>
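<p>A small worked example may make the gap between the local and the population average concrete. The sketch below is purely illustrative: the three-type population, its shares, the common baseline outcome, and the effect sizes are all made-up assumptions, not estimates from any study. It computes the instrumental variables (Wald) ratio exactly from population quantities and compares it with the average effect over everyone.</p>

```python
from fractions import Fraction as F

# Hypothetical population; every share and effect size below is invented.
# type: (population share, treated if Z=0, treated if Z=1, individual effect)
TYPES = {
    "never_taker":  (F(3, 10), 0, 0, F(1)),
    "complier":     (F(5, 10), 0, 1, F(2)),
    "always_taker": (F(2, 10), 1, 1, F(8)),
}
BASELINE = F(10)  # outcome without treatment, assumed the same for everyone


def mean_treatment(z):
    """Share of the population treated when the instrument equals z."""
    return sum(share * (d1 if z else d0) for share, d0, d1, _ in TYPES.values())


def mean_outcome(z):
    """Mean observed outcome when the instrument equals z."""
    return sum(share * (BASELINE + effect * (d1 if z else d0))
               for share, d0, d1, effect in TYPES.values())


def wald_estimand():
    """Instrument's effect on the outcome divided by its effect on treatment."""
    return (mean_outcome(1) - mean_outcome(0)) / (mean_treatment(1) - mean_treatment(0))


def population_average_effect():
    """Average treatment effect over all three types."""
    return sum(share * effect for share, _, _, effect in TYPES.values())


late = wald_estimand()
ate = population_average_effect()
print("local average effect:", late)
print("population average effect:", ate)
```

<p>In this invented population the ratio recovers the compliers' effect exactly, while the population average is larger because the always-takers, whom the instrument never moves, are assumed to benefit most.</p>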

<p>A number of demographers have discussed the problem of extrapolating local average treatment effect estimates to the broader population. Greg Duncan, in his <a href="http://muse.jhu.edu/journals/demography/summary/v045/45.4.duncan.html">presidential address</a> to the Population Association of America, stated that although causal inference is "often facilitated by eschewing full population representation in favor of an examination of an exceedingly small but strategically selected portion of a general population with the 'right kind' of variation in the key independent variable of interest.... a population-based understanding of causal effects should be our principal goal." Robert Moffitt <a href="http://muse.jhu.edu/journals/demography/summary/v042/42.1moffitt.html">writes</a> that although "some type of implicit weighting is needed" to help us understand how to trade off internal and external validity, "this problem has not really been addressed in the applied research community." Some researchers have suggested using bounds for average treatment effects that are not point-identified (for example, <a href="http://www.jstor.org/stable/2006592">Manski</a>). Of course, the usefulness of bounding techniques depends on the tightness of the bounds, which in turn depends on what assumptions we are willing to impose - and it is exactly scholars' discomfort with prevailing assumptions (e.g., lack of correlation between the error and the treatment indicator) that drove the current focus on non-representative population subgroups. It seems to me that there is still work to be done to connect subpopulation causal estimates to broader population trends. I would be interested to hear of work in this area that you think is promising. <br />
</p>]]>
    </content>
</entry>

<entry>
    <title>Airoldi on &quot;A statistical perspective on complex networks&quot;</title>
    <link rel="alternate" type="text/html" href="http://www.iq.harvard.edu/blog/sss/archives/2009/11/airoldi_on_a_st.shtml" />
    <link rel="service.edit" type="application/atom+xml" href="https://blogs.hmdc.harvard.edu/mt/mt-atom.cgi/weblog/blog_id=2/entry_id=1230" title="Airoldi on &quot;A statistical perspective on complex networks&quot;" />
    <id>tag:www.iq.harvard.edu,2009:/blog/sss//2.1230</id>
    
    <published>2009-11-03T15:47:44Z</published>
    <updated>2009-11-03T15:49:26Z</updated>
    
    <summary>I hope you can join us at the Applied Statistics Workshop this Wednesday, November 4th, when we will be happy to have Edo Airoldi, Assistant Professor in the Department of Statistics here at Harvard. Edo will be presenting a talk...</summary>
    <author>
        <name>Matt Blackwell</name>
        
    </author>
    
    <content type="html" xml:lang="" xml:base="http://www.iq.harvard.edu/blog/sss/">
        <![CDATA[<p>I hope you can join us at the Applied Statistics Workshop this Wednesday, November 4th, when we will be happy to have <a href="http://www.people.fas.harvard.edu/~airoldi/">Edo Airoldi</a>, Assistant Professor in the Department of Statistics here at Harvard. Edo will be presenting a talk entitled "A statistical perspective on complex networks" for which he has provided the following abstract:<br />
<blockquote><br />
Networks are ubiquitous in science and have become a focal point for discussion in everyday life. Formal statistical models for the analysis of network data have emerged as a major topic of interest in diverse areas of science, as many scientific inquiries involve collections of measurements on pairs of objects. Probability models on graphs date back to 1959. Along with empirical studies in social psychology and sociology from the 1960s, these early works generated an active network community and a substantial literature in the 1970s. This effort moved into the statistical literature in the late 1970s and 1980s, and the past decade has seen a burgeoning network literature in statistical physics and computer science. The growth of the World Wide Web and the emergence of online networking communities such as Facebook and LinkedIn, and a host of more specialized professional network communities has intensified interest in the study of networks and network data. In this talk, I will review a few ideas that are central to this burgeoning literature. I will emphasize formal model descriptions, and pay special attention to the interpretation of parameters and their estimation. I will conclude by describing open problems and challenges for machine learning and statistics.<br />
</blockquote><br />
The Applied Statistics workshop meets each Wednesday in room K-354, CGIS-Knafel (1737 Cambridge St). We start at 12 noon with a light lunch, with presentations beginning around 12:15, and we usually wrap up around 1:30 pm. We hope you can make it.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Happy Halloween</title>
    <link rel="alternate" type="text/html" href="http://www.iq.harvard.edu/blog/sss/archives/2009/10/happy_halloween.shtml" />
    <link rel="service.edit" type="application/atom+xml" href="https://blogs.hmdc.harvard.edu/mt/mt-atom.cgi/weblog/blog_id=2/entry_id=1227" title="Happy Halloween" />
    <id>tag:www.iq.harvard.edu,2009:/blog/sss//2.1227</id>
    
    <published>2009-10-30T14:05:40Z</published>
    <updated>2009-10-30T14:15:12Z</updated>
    
    <summary>It made my day when this showed up in my inbox this morning. I&apos;m glad to see someone knows what to do if/when the zombie outbreak occurs....</summary>
    <author>
        <name>Richard Nielsen</name>
        
    </author>
    
    <content type="html" xml:lang="" xml:base="http://www.iq.harvard.edu/blog/sss/">
        <![CDATA[<p>It made my day when <a href="http://www.mathstat.uottawa.ca/~rsmith/Zombies.pdf">this</a> showed up in my inbox this morning.  I'm glad to see <em>someone</em> knows what to do if/when the zombie outbreak occurs.<br />
</p>]]>
        
    </content>
</entry>

<entry>
    <title>Matching Markets</title>
    <link rel="alternate" type="text/html" href="http://www.iq.harvard.edu/blog/sss/archives/2009/10/matching_market.shtml" />
    <link rel="service.edit" type="application/atom+xml" href="https://blogs.hmdc.harvard.edu/mt/mt-atom.cgi/weblog/blog_id=2/entry_id=1226" title="Matching Markets" />
    <id>tag:www.iq.harvard.edu,2009:/blog/sss//2.1226</id>
    
    <published>2009-10-29T20:50:14Z</published>
    <updated>2009-10-30T09:49:10Z</updated>
    
    <summary>Rich&apos;s post on instruments the other day reminded me of a conversation that I&apos;ve been having with a faculty member; although the connection may not be particularly clear, at least at first. The setup is that there are many markets...</summary>
    <author>
        <name>Martin Andersen</name>
        
    </author>
    
    <content type="html" xml:lang="" xml:base="http://www.iq.harvard.edu/blog/sss/">
        <![CDATA[<p>Rich's <a href="http://www.iq.harvard.edu/blog/sss/archives/2009/10/multiple_instru.shtml">post</a> on instruments the other day reminded me of a conversation that I've been having with a faculty member; although the connection may not be particularly clear, at least at first.</p>

<p>The setup is that there are many markets in which buyers and sellers are distinct types of actors; for example, the market for spouses has, until recently, been such a market (although I make no claim as to which side of the market is buying and which is selling).  This market, in the form of college applications, was analyzed by Gale and Shapley in a famous 1962 <a href="http://www.jstor.org/pss/2312726">paper</a> in which they proved that there was a solution to this type of matching problem.</p>
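<p>For readers who have not seen it, the deferred-acceptance procedure behind that existence result can be sketched in a few lines. The version below is a minimal one-to-one sketch, not the paper's college-admissions setting with quotas; the applicant and program names and their preference lists are hypothetical.</p>

```python
def deferred_acceptance(proposer_prefs, receiver_prefs):
    """Return a stable one-to-one matching as {receiver: proposer}.

    Both arguments map each agent to a list of agents on the other side,
    ordered from most to least preferred. Assumes equal numbers on each
    side and complete preference lists.
    """
    # Position of each proposer in each receiver's list (lower is better).
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # index of next receiver to try
    held = {}                                     # receiver: proposer currently held
    free = list(proposer_prefs)

    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = held.get(r)
        if current is None:
            held[r] = p                   # receiver had no offer: hold this one
        elif rank[r][current] > rank[r][p]:
            held[r] = p                   # receiver prefers the new proposer
            free.append(current)          # the displaced proposer tries again
        else:
            free.append(p)                # rejected: propose further down the list
    return held


# Hypothetical example data.
applicants = {"ann": ["x", "y", "z"], "bob": ["x", "z", "y"], "cat": ["y", "x", "z"]}
programs = {"x": ["bob", "ann", "cat"], "y": ["ann", "cat", "bob"], "z": ["ann", "bob", "cat"]}
matching = deferred_acceptance(applicants, programs)
print(matching)
```

<p>The loop always ends because each proposer moves strictly down a finite list, and the result is stable because a receiver only ever trades up.</p>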

]]>
        <![CDATA[<p>Another example, which motivated my interest in the question, is the market for medical residents (see <a href="http://jama.ama-assn.org/cgi/content/abstract/289/7/909">here</a>).  Shortly before graduation, medical students apply for positions with various residency programs across the country by submitting rank order lists to a central clearinghouse; residency programs enter into a similar process of ranking medical students.  The clearinghouse then produces an assignment that is optimal in an economic sense.</p>

<p>Unfortunately, this setup does not permit the applied researcher (or poor grad student) much traction for identifying the effect of being assigned to a particular residency program.</p>

<p>One solution comes from some <a href="http://www3.interscience.wiley.com/journal/118483220/abstract">work</a> by Morten Sorensen on matching in venture capital.  His idea is to model the decision process leading to investments by venture capitalists in early stage companies and at the same time to model his outcome of interest (company goes public) thus allowing for correlations between the respective error terms of the attractiveness / matching model and the outcome equation.  Sorensen makes the point that this method makes use of the characteristics of other investors and investments in the market as instruments in order to address the fact that "better" investors may invest in "better" companies.</p>

<p>While this method is attractive in principle, it is computationally difficult and does not convince everyone. The faculty member I was talking to agreed about its appeal but argued that the results would be more credible with an instrumental variable that affects the probability of being assigned to a particular program.  However, he also pointed out the value of these structural models: they provide estimates that may be valid over a broader range of values and can be used to do policy experiments that one may not be able to do with a model identified by instrumental variables.</p>]]>
    </content>
</entry>

<entry>
    <title>Physics of politics</title>
    <link rel="alternate" type="text/html" href="http://www.iq.harvard.edu/blog/sss/archives/2009/10/physics_of_poli.shtml" />
    <link rel="service.edit" type="application/atom+xml" href="https://blogs.hmdc.harvard.edu/mt/mt-atom.cgi/weblog/blog_id=2/entry_id=1225" title="Physics of politics" />
    <id>tag:www.iq.harvard.edu,2009:/blog/sss//2.1225</id>
    
    <published>2009-10-28T10:58:45Z</published>
    <updated>2009-10-28T13:31:59Z</updated>
    
    <summary>A physicist recently emailed me asking if I could help him access election data; he sent me one of his papers, which (to my astonishment) began &quot;Most of the empirical electoral studies conducted by physicists . . .&quot;, followed by...</summary>
    <author>
        <name>Andy Eggers</name>
        
    </author>
    
    <content type="html" xml:lang="" xml:base="http://www.iq.harvard.edu/blog/sss/">
        <![CDATA[<p>A physicist recently emailed me asking if I could help him access election data; he sent me one of his papers, which (to my astonishment) began "Most of the empirical electoral studies conducted by physicists . . .", followed by a string of citations. I had no idea physicists were studying elections! I suppose I should have known; from what my biologist friend tells me, physicists have been colonizing his field the way economists have done to much of social science. So I guess politics was next.</p>

<p>Reading a few articles in the "physics of politics" as a political scientist, one has the sense of observing an alternate universe. For example: <a href="http://www.staff.uni-mainz.de/schneidj/papers/paperhic.pdf">a paper</a> on the effect of election results on party membership in Germany that has no references to work outside of physics; features many exotic (to me at least) terms like Wegscheider potentials, the Sznajd model, and the Kronecker symbol; and takes a time-series approach to causation that I suspect would be unacceptable to most reviewers in political science and economics these days. </p>

<p>In general, it's clear that physicists doing work on political phenomena (or "sociophysics" more generally) are primarily interested in exploring the individual-level social interactions that might underpin the macro-order we observe in, e.g., regularities in turnout or vote share distributions. As such, political institutions (which are the major preoccupation of political scientists) necessarily disappear from the model and are typically not even mentioned, even when they would seem to be of first-order importance in explaining a particular phenomenon. (Another example of the alternate universe: <a href="http://arxiv.org/pdf/nlin/0405002v1">a paper</a> that argues that party vote shares in Indonesia follow a power law, but which does not describe or mention the electoral system.) These omissions seem foolish on first reading, but it's clear that they reflect a different choice of explanatory variable: physicists seek their explanations in micro-interactions, and we seek them primarily in political institutions. It's probably both of course, but models can only be so complex.</p>

<p>Despite my overall sense of disorientation in reading these papers, there were also somewhat surprising moments of familiarity. Physics heavily influenced economics in an earlier period of colonization, and much of what we read in economics and political science descended from those models. In reading these newer physics papers, there is therefore a sense of distant kinship, the knowledge of a common ancestor several generations back. </p>

<p>I wonder about the scope for collaboration between physicists and social scientists. Based on my admittedly very cursory reading of one area in which physicists have ventured, it's hard to know whether the potential gains from trade are sufficient to overcome the apparent difference in goals. For all I know there already is a lot of productive collaboration going on -- if you know of something interesting, share it in the comments!</p>]]>
        
    </content>
</entry>

<entry>
    <title>Tchetgen on &quot;Doubly robust estimation in a semi-parametric odds ratio model&quot;</title>
    <link rel="alternate" type="text/html" href="http://www.iq.harvard.edu/blog/sss/archives/2009/10/tchetgen_on_dou.shtml" />
    <link rel="service.edit" type="application/atom+xml" href="https://blogs.hmdc.harvard.edu/mt/mt-atom.cgi/weblog/blog_id=2/entry_id=1222" title="Tchetgen on &quot;Doubly robust estimation in a semi-parametric odds ratio model&quot;" />
    <id>tag:www.iq.harvard.edu,2009:/blog/sss//2.1222</id>
    
    <published>2009-10-26T15:10:52Z</published>
    <updated>2009-10-26T15:13:59Z</updated>
    
    <summary>This Wednesday, October 28th, the Applied Statistics workshop will welcome Eric Tchetgen Tchetgen, Assistant Professor of Epidemiology at Harvard School of Public Health, presenting his work titled &quot;Doubly robust estimation in a semi-parametric odds ratio model.&quot; Eric has provided the...</summary>
    <author>
        <name>Matt Blackwell</name>
        
    </author>
    
        <category term="Abstracts" />
    
    <content type="html" xml:lang="" xml:base="http://www.iq.harvard.edu/blog/sss/">
        <![CDATA[<p>This Wednesday, October 28th, the Applied Statistics workshop will welcome <a href="http://www.hsph.harvard.edu/faculty/eric-tchetgen-tchetgen/">Eric Tchetgen Tchetgen</a>, Assistant Professor of Epidemiology at Harvard School of Public Health, presenting his work titled "Doubly robust estimation in a semi-parametric odds ratio model." Eric has provided the following abstract for the paper:</p>

<blockquote>
We consider the doubly robust estimation of the parameters in a semi-parametric conditional odds ratio model characterizing the effect of an exposure in the presence of many confounders. We develop estimators that are consistent and asymptotically normal in a union model where either a prospective baseline density function or a retrospective baseline density function is correctly specified but not necessarily both. The case of a binary outcome is of particular interest; then our approach yields a doubly robust locally efficient estimator in a semi-parametric logistic regression model. For general types of outcomes, we provide a strategy to obtain doubly robust estimators that are nearly locally efficient. We illustrate the method in a simulation study and an application in statistical genetics. Finally, we briefly discuss extensions of the proposed method to the semi-parametric estimation of a parameter indexing an interaction between two exposures on the logistic scale, as well as extensions to the setting of a time-varying exposure in the presence of time-varying confounding.
</blockquote>

<p>The Applied Statistics workshop meets each Wednesday in room K-354, CGIS-Knafel (1737 Cambridge St). We start at 12 noon with a light lunch, with presentations beginning around 12:15, and we usually wrap up around 1:30 pm. We hope you can make it.<br />
</p>]]>
        
    </content>
</entry>

<entry>
    <title>Sources of Randomness</title>
    <link rel="alternate" type="text/html" href="http://www.iq.harvard.edu/blog/sss/archives/2009/10/sources_of_rand.shtml" />
    <link rel="service.edit" type="application/atom+xml" href="https://blogs.hmdc.harvard.edu/mt/mt-atom.cgi/weblog/blog_id=2/entry_id=1220" title="Sources of Randomness" />
    <id>tag:www.iq.harvard.edu,2009:/blog/sss//2.1220</id>
    
    <published>2009-10-23T21:20:28Z</published>
    <updated>2009-10-23T22:35:01Z</updated>
    
    <summary>During a recent conversation with some colleagues regarding data sources, an interesting point was made that left me pondering. One member of our group stated that he would not trust a particular source of data to provide useful estimates of...</summary>
    <author>
        <name>Deirdre Bloome</name>
        
    </author>
    
    <content type="html" xml:lang="" xml:base="http://www.iq.harvard.edu/blog/sss/">
        <![CDATA[<p>During a recent conversation with some colleagues regarding data sources, an interesting point was made that left me pondering. One member of our group stated that he would not trust a particular source of data to provide useful estimates of population means, but he would trust it to estimate regression coefficients. This puzzled me, because a regression coefficient is a (perhaps slightly fancy) version of a mean. Why, then, would a data source that cannot be trusted for a simple average be useful for a coefficient? </p>]]>
        <![CDATA[<p>I think the answer lies in the assumed source of randomness. When we make inferences from our sample data to a wider universe of cases, there are two sources of randomness involved: probabilities introduced through the sampling design and probabilities introduced through an assumed stochastic model underlying our observed data. In the first case, we are interested in the existing finite population and our outcome of interest Y is regarded as fixed; randomness is introduced through the sample inclusion probabilities. In the second case, we are interested in a broader "superpopulation" which we posit is generated through some random process, and thus our outcome Y is regarded as a random variable. In much of social science, researchers are interested in this second source of randomness. Hypotheses center around parameters associated with the probability distribution for Y - such as regression coefficients. </p>

<p>Identifying the sources of randomness underlying our data is important, because they have implications for our analysis. <a href="http://books.google.com/books?id=ufdONK3E1TcC&printsec=frontcover&dq=S%C3%A4rndal,+Swensson,+and+Wretman#v=onepage&q=&f=false">Särndal, Swensson, and Wretman</a> show that the variance of a parameter from an ordinary regression model estimated using sample data can be decomposed into two elements, one based on the sampling design and one based on the model. In the case of a census, the extra variance introduced from the design is zero, and thus the total variance of the estimated parameter is the variance of the <a href="http://en.wikipedia.org/wiki/Gauss%E2%80%93Markov_theorem">"BLUE"</a> estimator. Otherwise, accounting for the sampling design in the analysis should improve inference. <br />
</p>]]>
    </content>
</entry>

<entry>
    <title>Multiple Instruments</title>
    <link rel="alternate" type="text/html" href="http://www.iq.harvard.edu/blog/sss/archives/2009/10/multiple_instru.shtml" />
    <link rel="service.edit" type="application/atom+xml" href="https://blogs.hmdc.harvard.edu/mt/mt-atom.cgi/weblog/blog_id=2/entry_id=1215" title="Multiple Instruments" />
    <id>tag:www.iq.harvard.edu,2009:/blog/sss//2.1215</id>
    
    <published>2009-10-21T16:41:23Z</published>
    <updated>2009-10-21T16:41:32Z</updated>
    
    <summary>I recently found a paper by Angus Deaton that attempts to (1) discount the usefulness of instrumental variables for making causal inferences in development economics and (2) discount the usefulness of field experiments. He has definitely stirred the pot a...</summary>
    <author>
        <name>Richard Nielsen</name>
        
    </author>
    
    <content type="html" xml:lang="" xml:base="http://www.iq.harvard.edu/blog/sss/">
        <![CDATA[<p>I recently found a <a href="http://www.princeton.edu/~deaton/downloads/Instruments%20of%20development%20v1d_mar09_all.pdf">paper</a> by Angus Deaton that attempts to (1) discount the usefulness of instrumental variables for making causal inferences in development economics and (2) discount the usefulness of field experiments.  He has definitely stirred the pot a little and is now part of an interesting <a href="http://pantheon.yale.edu/~dt6/thedebate.html">debate</a>, although the discussion seems to be more focused on Deaton's controversial claims about experiments.</p>]]>
        <![CDATA[<p>I think Deaton overlooks some of the benefits of experimental research but his criticism of instrumental variables seems dead on, especially on the use of multiple instruments (see pages 12-13).  Intuitively, we might think that having many instruments makes for better causal inference -- if one doesn't work out, then the others will pick up the slack.  Following this logic, studies that use multiple instruments and "test" for exogeneity with overidentification tests have become popular in the instrumental variables literature.  Essentially, these tests boil down to re-estimating the model with subsets of the instruments and showing that the estimated coefficients don't change dramatically.  This can mean one of two things: (a) not just one, but all of the instruments are exogenous, or (b) not just one, but all of the instruments are endogenous.  Personally, I think the probability of finding even a single good instrument for a given problem is small, so when shown a research design with multiple instruments, I need some serious convincing that miraculously <em>all</em> of the instruments are valid.</p>
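<p>The ambiguity in that comparison can be seen with a little population algebra. The sketch below assumes a linear model with a constant effect, Y = b*D + U, and uses hypothetical covariances chosen so that two invalid instruments happen to share the same bias; it illustrates case (b) and is not a description of any particular overidentification test.</p>

```python
from fractions import Fraction as F


def iv_estimand(true_effect, cov_z_d, cov_z_u):
    """Population value of the single-instrument IV ratio Cov(Z, Y) / Cov(Z, D).

    With Y = true_effect * D + U, Cov(Z, Y) = true_effect * Cov(Z, D) + Cov(Z, U),
    so the ratio equals the true effect plus Cov(Z, U) / Cov(Z, D).
    """
    return true_effect + F(cov_z_u) / F(cov_z_d)


TRUE_EFFECT = F(1)  # hypothetical

# Two instruments of different strength, both correlated with the error term.
estimate_a = iv_estimand(TRUE_EFFECT, cov_z_d=F(1, 2), cov_z_u=F(1, 4))
estimate_b = iv_estimand(TRUE_EFFECT, cov_z_d=F(4, 5), cov_z_u=F(2, 5))

# A valid instrument for comparison: uncorrelated with the error term.
estimate_valid = iv_estimand(TRUE_EFFECT, cov_z_d=F(1, 2), cov_z_u=F(0))

print("instrument A:", estimate_a)
print("instrument B:", estimate_b)
print("valid instrument:", estimate_valid)
```

<p>Under these assumed numbers the two endogenous instruments agree with each other and both miss the true effect, so agreement across instruments alone cannot distinguish the two cases.</p>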

<p>I am probably overly skeptical and I am very sympathetic to heroic attempts to solve difficult problems of causal inference to answer important questions.  Still, it seems that having multiple instruments can become an embarrassment of riches.  A good instrument is so hard to come by that having too many starts to lend evidence against an empirical argument rather than for it.</p>]]>
    </content>
</entry>

<entry>
    <title>Elements of Statistical Learning (Online)</title>
    <link rel="alternate" type="text/html" href="http://www.iq.harvard.edu/blog/sss/archives/2009/10/elements_of_sta.shtml" />
    <link rel="service.edit" type="application/atom+xml" href="https://blogs.hmdc.harvard.edu/mt/mt-atom.cgi/weblog/blog_id=2/entry_id=1216" title="Elements of Statistical Learning (Online)" />
    <id>tag:www.iq.harvard.edu,2009:/blog/sss//2.1216</id>
    
    <published>2009-10-20T14:15:51Z</published>
    <updated>2009-10-20T14:34:37Z</updated>
    
    <summary>In case you had not already heard, Trevor Hastie, Robert Tibshirani, and Jerome Friedman have put a PDF copy of the second edition of their excellent text Elements of Statistical Learning on the book&apos;s website. I am sure many of...</summary>
    <author>
        <name>Matt Blackwell</name>
        
    </author>
    
    <content type="html" xml:lang="" xml:base="http://www.iq.harvard.edu/blog/sss/">
        <![CDATA[<p>In case you had not already heard, Trevor Hastie, Robert Tibshirani, and Jerome Friedman have put a PDF copy of the second edition of their excellent text <a href="http://www-stat.stanford.edu/~tibs/ElemStatLearn/">Elements of Statistical Learning</a> on the book's website. I am sure many of you already own it, but a searchable version for the laptop is incredibly useful. The second edition has a lot of new content, including completely new chapters on Random Forests, Ensemble Learning, Undirected Graphical Models, and High-Dimensional Problems. </p>

<p>While a copy on your computer is very handy, a desk copy of this book is essential if you are interested in machine learning or data mining. The book is also a sight to behold. You can buy a copy at <a href="http://www.amazon.com/gp/product/0387848576/ref=s9_simz_gw_s0_p14_i1?pf_rd_m=ATVPDKIKX0DER&pf_rd_s=center-2&pf_rd_r=0Q2PXEQXRZ2AYKA0R27H&pf_rd_t=101&pf_rd_p=470938631&pf_rd_i=507846">Amazon</a> or <a href="http://www.springer.com/statistics/statistical+theory+and+methods/book/978-0-387-84857-0">Springer</a>. </p>]]>
        
    </content>
</entry>

<entry>
    <title>Eggers on &quot;Electoral Rules, Opposition Scrutiny, and Policy Moderation in French Municipalities&quot;</title>
    <link rel="alternate" type="text/html" href="http://www.iq.harvard.edu/blog/sss/archives/2009/10/eggers_on_elect.shtml" />
    <link rel="service.edit" type="application/atom+xml" href="https://blogs.hmdc.harvard.edu/mt/mt-atom.cgi/weblog/blog_id=2/entry_id=1214" title="Eggers on &quot;Electoral Rules, Opposition Scrutiny, and Policy Moderation in French Municipalities&quot;" />
    <id>tag:www.iq.harvard.edu,2009:/blog/sss//2.1214</id>
    
    <published>2009-10-19T23:21:35Z</published>
    <updated>2009-10-19T23:24:34Z</updated>
    
    <summary>Please join us this Wednesday, October 21st, when we will have a change in the schedule. We are happy to have Andy Eggers (Department of Government) presenting a talk titled &quot;Electoral Rules, Opposition Scrutiny, and Policy Moderation in French Municipalities:...</summary>
    <author>
        <name>Matt Blackwell</name>
        
    </author>
    
        <category term="Abstracts" />
    
    <content type="html" xml:lang="" xml:base="http://www.iq.harvard.edu/blog/sss/">
        <![CDATA[<p>Please join us this Wednesday, October 21st, when we will have a change in the schedule. We are happy to have Andy Eggers (Department of Government) presenting a talk titled "Electoral Rules, Opposition Scrutiny, and Policy Moderation in French Municipalities: An Application of the Regression Discontinuity Design." Andy has provided the following abstract for his talk:</p>

<blockquote>
Regression discontinuity design (RDD) is a powerful and increasingly popular approach to causal inference that can be applied when treatment is assigned deterministically based on a continuous covariate. In this talk, I will present an application of RDD from French municipalities, where the system of electing the municipal council depends on whether the city's population is above or below 3500. First I show that cities above the population cutoff have fewer uncontested elections and more opposition representation on municipal councils, consistent with expectations. I then trace the effect of these political changes -- which amount to a heightening of the scrutiny imposed on the mayor -- on policy outcomes, providing evidence that more opposition scrutiny leads to more moderate policy.
</blockquote>

<p>The Applied Statistics workshop meets each Wednesday in room K-354, CGIS-Knafel (1737 Cambridge St). We start at 12 noon with a light lunch; presentations begin around 12:15, and we usually wrap up around 1:30 pm. We hope you can make it.</p>]]>
        
    </content>
</entry>

<entry>
    <title>The Fundamental Regret of Causal Inference</title>
    <link rel="alternate" type="text/html" href="http://www.iq.harvard.edu/blog/sss/archives/2009/10/the_fundamental.shtml" />
    <link rel="service.edit" type="application/atom+xml" href="https://blogs.hmdc.harvard.edu/mt/mt-atom.cgi/weblog/blog_id=2/entry_id=1211" title="The Fundamental Regret of Causal Inference" />
    <id>tag:www.iq.harvard.edu,2009:/blog/sss//2.1211</id>
    
    <published>2009-10-14T20:19:59Z</published>
    <updated>2009-10-14T20:38:37Z</updated>
    
    <summary>Tim Kreider at the New York Times has a short piece on what he dubs &quot;The Referendum&quot; and how it plagues us: The Referendum is a phenomenon typical of (but not limited to) midlife, whereby people, increasingly aware of the...</summary>
    <author>
        <name>Matt Blackwell</name>
        
    </author>
    
        <category term="Regular Post" />
    
    <content type="html" xml:lang="" xml:base="http://www.iq.harvard.edu/blog/sss/">
        <![CDATA[<p>Tim Kreider at the New York Times has <a href="http://happydays.blogs.nytimes.com/2009/09/17/the-referendum/">a short piece</a> on what he dubs "The Referendum" and how it plagues us: </p>

<blockquote>
The Referendum is a phenomenon typical of (but not limited to) midlife, whereby people, increasingly aware of the finiteness of their time in the world, the limitations placed on them by their choices so far, and the narrowing options remaining to them, start judging their peers' differing choices with reactions ranging from envy to contempt. ...Friends who seemed pretty much indistinguishable from you in your 20s make different choices about family or career, and after a decade or two these initial differences yield such radically divergent trajectories that when you get together again you can only regard each other's lives with bemused incomprehension.
</blockquote>

<p>Those familiar with causal inference will recognize this as stemming from the Fundamental Problem of Causal Inference: we cannot observe, for one individual, both their response to treatment and their response to control. The article is an elegant look at how we grow to worry about those mysterious missing potential outcomes--the paths we didn't choose--and how we use our friends' lives to impute those missing outcomes. Kreider goes on to make this point exactly, with a beautiful quote from a novel:</p>

<blockquote>The problem is, we only get one chance at this, with no do-overs. Life is, in effect, a non-repeatable experiment with no control. In his novel about marriage, "Light Years," James Salter writes: "For whatever we do, even whatever we do not do prevents us from doing its opposite. Acts demolish their alternatives, that is the paradox." Watching our peers' lives is the closest we can come to a glimpse of the parallel universes in which we didn't ruin that relationship years ago, or got that job we applied for, or got on that plane after all. It's tempting to read other people's lives as cautionary fables or repudiations of our own.</blockquote>

<p>Perhaps the only response is that, while so close to us in so many respects, friends may be poor matches for gauging these kinds of effects. In any case, "Acts demolish their alternatives, that is the paradox" is the best description of the problem of causal inference that I have seen.  </p>]]>
        
    </content>
</entry>

<entry>
    <title>An on &quot;Bayesian Propensity Score Estimation&quot;</title>
    <link rel="alternate" type="text/html" href="http://www.iq.harvard.edu/blog/sss/archives/2009/10/an_on_bayesian.shtml" />
    <link rel="service.edit" type="application/atom+xml" href="https://blogs.hmdc.harvard.edu/mt/mt-atom.cgi/weblog/blog_id=2/entry_id=1210" title="An on &quot;Bayesian Propensity Score Estimation&quot;" />
    <id>tag:www.iq.harvard.edu,2009:/blog/sss//2.1210</id>
    
    <published>2009-10-13T04:53:07Z</published>
    <updated>2009-10-13T04:56:43Z</updated>
    
    <summary> We hope you can join us at the Applied Statistics workshop this Wednesday, October 14th at 12 noon, when we will be happy to have Weihua An, a graduate student in the Sociology Department here at Harvard. Weihua will...</summary>
    <author>
        <name>Matt Blackwell</name>
        
    </author>
    
    <content type="html" xml:lang="" xml:base="http://www.iq.harvard.edu/blog/sss/">
        <![CDATA[<p><br />
We hope you can join us at the Applied Statistics workshop this Wednesday, October 14th at 12 noon, when we will be happy to have Weihua An, a graduate student in the Sociology Department here at Harvard. Weihua will be presenting "Bayesian Propensity Score Estimators: Simulations and Applications." He has provided the following abstract:<br />
<blockquote><br />
Despite their popularity, conventional propensity score estimators (PSEs) do not take into account the estimation uncertainties in the propensity score into causal inference. This paper develops Bayesian propensity score estimators (BPSEs) to model the joint likelihood of both the outcome and the propensity score in one step, which naturally incorporate such uncertainties into causal inference. Simulations show that PSEs treating estimated propensity scores as if they were known will overestimate the variation in treatment effects and result in overly conservative inference, whereas BPSEs will provide corrected variance estimation and valid inference. Compared to other direct adjustment methods (e.g., Abadie and Imbens 2009), BPSEs are guaranteed to provide positive variance estimation, more reliable in small samples, and more flexible to contain complex propensity score models. To illustrate the proposed methods, BPSEs are applied to evaluating a job training program.<br />
</blockquote><br />
The workshop meets in room K-354 of CGIS-Knafel (1737 Cambridge St). It starts at noon and usually wraps up around 1:30. There will be a light lunch. We hope you can make it. <br />
</p>]]>
        
    </content>
</entry>

<entry>
    <title>Tom Coburn can backward induce</title>
    <link rel="alternate" type="text/html" href="http://www.iq.harvard.edu/blog/sss/archives/2009/10/tom_coburn_can.shtml" />
    <link rel="service.edit" type="application/atom+xml" href="https://blogs.hmdc.harvard.edu/mt/mt-atom.cgi/weblog/blog_id=2/entry_id=1209" title="Tom Coburn can backward induce" />
    <id>tag:www.iq.harvard.edu,2009:/blog/sss//2.1209</id>
    
    <published>2009-10-09T16:21:30Z</published>
    <updated>2009-10-09T22:27:21Z</updated>
    
    <summary>We are a few days late to comment on the story of Senator Tom Coburn&apos;s amendment to the Commerce, Justice and Science Appropriations Bill to cut all National Science Foundation funding for the political science program and any of its...</summary>
    <author>
        <name>Matt Blackwell</name>
        
    </author>
    
    <content type="html" xml:lang="" xml:base="http://www.iq.harvard.edu/blog/sss/">
        <![CDATA[<p>We are a few days late to comment on the story of Senator Tom Coburn's <a href="http://coburn.senate.gov/public/index.cfm?FuseAction=Files.View&FileStore_id=82180b1f-a03e-4600-a2e5-846640c2c880">amendment</a> to the Commerce, Justice and Science Appropriations Bill to cut all National Science Foundation funding for the political science program and any of its missions. Choice quote (of which there are many): "...it is difficult, even for the most creative scientist, to link NSF's political science findings to the advancement of cures to cancer or any other disease." Snap. </p>

<p>This has <a href="http://www.themonkeycage.org/2009/10/nsf_political_science_funding.html">received</a> <a href="http://crookedtimber.org/2009/10/07/tom-coburn-doesnt-like-political-science/">attention</a> from the <a href="http://www.apsanet.org/content_67297.cfm">social</a> <a href="http://www.petition2congress.com/2/2508/keep-nsf-political-science-program/">science</a> <a href="http://drezner.foreignpolicy.com/posts/2009/10/07/tom_coburn_picks_on_political_science">community</a> and <a href="http://www.politico.com/news/stories/1009/28056.html">others</a>. Even Paul Krugman, mentioned in Coburn's press release as an example of (wasteful? political?) NSF funding, has <a href="http://krugman.blogs.nytimes.com/2009/10/08/the-coburn-amendment/">something to say about it</a>. There's no need to rehash the arguments here, which ever-so-nicely point out that Senator Coburn doesn't really know what he's talking about and that his arguments don't make a whole lot of sense. </p>

<p>Regardless of the arguments, I just wanted to put a graph up to put all of this in perspective. In the 111th Congress, Coburn has had very little success with his amendments:<br />
<span class="mt-enclosure mt-enclosure-image" style="display: inline;"><img alt="coburn.png" src="http://www.iq.harvard.edu/blog/sss/coburn.png" width="400" height="500" class="mt-image-center" style="text-align: center; display: block; margin: 0 auto 20px;" /></span><br />
Seven of the rejections are instances when Coburn's amendment was tabled without discussion. Most of the rejections have been of proposed budget cuts or bans on funding for certain projects. And this is just this year. Out of all the roll call votes on Coburn-sponsored amendments in the Senate over his tenure, only 8 out of 68 have actually passed. </p>

<p>I understand trying to tackle his critiques, as they track with an internal debate already in the discipline. But I think it may be a tad knee-jerk to start letter-writing campaigns to our Senators. Tom Coburn knows that putting out no-win amendments is a great way to take positions in the Senate without committing to anything. Minority amendments are a costless signal of the blandest kind--even a political scientist can see that. </p>]]>
        
    </content>
</entry>

<entry>
    <title>Criminal tricks and sugary treats</title>
    <link rel="alternate" type="text/html" href="http://www.iq.harvard.edu/blog/sss/archives/2009/10/sugary_treats_a.shtml" />
    <link rel="service.edit" type="application/atom+xml" href="https://blogs.hmdc.harvard.edu/mt/mt-atom.cgi/weblog/blog_id=2/entry_id=1205" title="Criminal tricks and sugary treats" />
    <id>tag:www.iq.harvard.edu,2009:/blog/sss//2.1205</id>
    
    <published>2009-10-06T17:48:04Z</published>
    <updated>2009-10-06T22:24:36Z</updated>
    
    <summary>Just in time for Halloween, a study from the British Journal of Psychiatry by Moore, Carter and van Goozen that uses data from the British Cohort Study to estimate the effect of daily candy intake on adult violent behavior. They...</summary>
    <author>
        <name>Matt Blackwell</name>
        
    </author>
    
    <content type="html" xml:lang="" xml:base="http://www.iq.harvard.edu/blog/sss/">
        <![CDATA[<p>Just in time for Halloween, a <a href="http://www.groundreport.com/Health_and_Science/Does-eating-candy-in-childhood-lead-to-violence-in_1/2908649">study</a> from the British Journal of Psychiatry by Moore, Carter and van Goozen uses data from the <a href="http://www.cls.ioe.ac.uk/studies.asp?section=000100020002">British Cohort Study</a> to estimate the effect of daily candy intake on adult violent behavior. </p>

<p>They find that 10-year-olds who ate candy daily were much more likely to be convicted of a violent crime by age 34 than those who did not eat candy daily. They cite this as evidence that childhood diet has an effect on adult behavior. One of their hypothesized mechanisms is that using candy as a reward for children (e.g. for behavior modification) inhibits the child's ability to delay gratification. And there is evidence that children who have problems with delaying gratification tend to score lower on a host of measures, including the SATs (see also: <a href="http://en.wikipedia.org/wiki/Marshmallow_experiment">the marshmallow studies</a>). </p>

<p>The longitudinal data gives them leverage. For instance, the authors are able to control for parenting style at age 5 along with other variables, such as various scales of behavior problems or mental abilities at age 5 (some of these were discarded in the final analysis because of their variable selection rules). These ease my main concern: that "problem children" might both elicit a certain type of parenting and indicate a propensity for violent adult behavior. Their controls help to eliminate this possibility (though I will say that I am not familiar with this literature and they use fairly complicated scales to measure these concepts). </p>

<p>Strangely, at least to me, they do not seem to control for parental income or socio-economic class. I have a few ideas as to why this might matter. First, candy is relatively cheap compared to a good diet, thus poorer families might be forced to choose the cheaper option when feeding their children. Second, financial pressures lead to time pressures, which could force parents to take shortcuts--feeding their children junk food because it is quick or using it to induce behavior because it is easy. Thus, parental income may matter greatly for candy intake and may also affect the propensity to commit violent crimes. I am not certain this is true, but it seems plausible and goes unmentioned in the paper. Even if the finding is not causal, however, it is still interesting. </p>
        
    </content>
</entry>

<entry>
    <title>Robins on &quot;Optimal Treatment Regimes&quot;</title>
    <link rel="alternate" type="text/html" href="http://www.iq.harvard.edu/blog/sss/archives/2009/10/robins_on_optim.shtml" />
    <link rel="service.edit" type="application/atom+xml" href="https://blogs.hmdc.harvard.edu/mt/mt-atom.cgi/weblog/blog_id=2/entry_id=1203" title="Robins on &quot;Optimal Treatment Regimes&quot;" />
    <id>tag:www.iq.harvard.edu,2009:/blog/sss//2.1203</id>
    
    <published>2009-10-05T15:31:09Z</published>
    <updated>2009-10-05T15:45:02Z</updated>
    
    <summary>Please join us this Wednesday, October 7th at the Applied Statistics workshop when we will be happy to have Jamie Robins, the Mitchell L. and Robin LaFoley Dong Professor of Epidemiology here at Harvard, who will be presenting on &quot;Estimation...</summary>
    <author>
        <name>Matt Blackwell</name>
        
    </author>
    
        <category term="Abstracts" />
    
    <content type="html" xml:lang="" xml:base="http://www.iq.harvard.edu/blog/sss/">
        <![CDATA[<p>Please join us this Wednesday, October 7th at the Applied Statistics workshop when we will be happy to have <a href="http://www.biostat.harvard.edu/~robins/research.html">Jamie Robins</a>, the Mitchell L. and Robin LaFoley Dong Professor of Epidemiology here at Harvard, who will be presenting on "Estimation of Optimal Treatment Strategies from Observational Data with Dynamic Marginal Structural Models." Jamie has passed along a <a href="http://www.people.fas.harvard.edu/~blackwel/extrapolation.pdf">related paper</a> with the following abstract:</p>

<blockquote>
We review recent developments in the estimation of an optimal treatment strategy or regime from longitudinal data collected in an observational study. We also propose novel methods for using the data obtained from an observational database in one health-care system to determine the optimal treatment regime for biologically similar subjects in a second health-care system when, for cultural, logistical, or financial reasons, the two health-care systems differ (and will continue to differ) in the frequency of, and reasons for, both laboratory tests and physician visits. Finally, we propose a novel method for estimating the optimal timing of expensive and/or painful diagnostic or prognostic tests. Diagnostic or prognostic tests are only useful in so far as they help a physician to determine the optimal dosing strategy, by providing information on both the current health state and the prognosis of a patient because, in contrast to drug therapies, these tests have no direct causal effect on disease progression. Our new method explicitly incorporates this no direct effect restriction.
</blockquote>

<p>A <a href="http://www.people.fas.harvard.edu/~blackwel/extrapolation.pdf">copy of the paper</a> is also available. </p>

<p>The Applied Statistics workshop meets each Wednesday in room K-354, CGIS-Knafel (1737 Cambridge St). We start at 12 noon with a light lunch; presentations begin around 12:15, and we usually wrap up around 1:30 pm. We hope you can make it.<br />
</p>]]>
        
    </content>
</entry>

</feed> 

