<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>Social Science Statistics Blog</title>
<link>http://www.iq.harvard.edu/blog/sss/</link>
<description></description>
<copyright>Copyright 2009</copyright>
<lastBuildDate>Tue, 17 Nov 2009 13:49:27 -0500</lastBuildDate>
<generator>http://www.movabletype.org/?v=4.24-en</generator>
<docs>http://blogs.law.harvard.edu/tech/rss</docs> 


<item>
<title>Dynamic Panel Models</title>
<description><![CDATA[<p>I have been toying around with dynamic panel models from the econometrics literature and I have run up against a key set of assertions. First, a quick setup. The idea with these models is that we have a set of units which we measure at different points in time. For instance, perhaps we survey a group of people multiple times in the course of an election and ask them how they are going to vote, whether they plan to vote, how they rate the candidates, etc. We might then want to know how these answers vary over time or with certain covariates. </p>

<p>Here is a typical model:</p>

<p><span class="mt-enclosure mt-enclosure-image" style="display: inline;"><img alt="eq-one.gif" src="http://www.iq.harvard.edu/blog/sss/eq-one.gif" width="320" height="30" class="mt-image-center" style="text-align: center; display: block; margin: 0 auto 20px;" /></span></p>

<p>There are two typical features of these models that seem relevant. First, most include a lagged dependent variable (LDV) to account for persistence in the responses. If I was going to vote for McCain the last time you called, I'll probably still want to do that this time. Makes sense. Second, we include a unit-specific effect, alpha, to account for all other relevant factors. Dynamic panel models tend to identify their effects with a simple differencing by running the following model:</p>

<p><span class="mt-enclosure mt-enclosure-image" style="display: inline;"><img alt="eq-two.gif" src="http://www.iq.harvard.edu/blog/sss/eq-two.gif" width="360" height="30" class="mt-image-center" style="text-align: center; display: block; margin: 0 auto 20px;" /></span></p>

<p>Differencing eliminates the unit-specific effect, but our parameters remain, ready to be estimated. I should note that there are some identification issues left to solve, and the differences between estimators in this field mostly have to do with how to instrument for the differenced LDV.</p>
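
<p>A minimal sketch of the differencing step in Python. The equations above are images, so the functional form here is an assumption: y[i][t] = rho*y[i][t-1] + beta*x[i][t] + alpha[i] + eps[i][t]. The function names, parameter values, and starting value are all hypothetical. The sketch simulates a small panel and checks that the first-differenced equation holds without any reference to alpha.</p>

```python
import random

def simulate_panel(n_units, n_periods, rho, beta, seed=0):
    """Simulate y[i][t] = rho*y[i][t-1] + beta*x[i][t] + alpha[i] + eps[i][t]."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        alpha = rng.gauss(0.0, 2.0)  # unit-specific effect, constant over time
        x = [rng.gauss(0.0, 1.0) for _ in range(n_periods)]
        eps = [rng.gauss(0.0, 1.0) for _ in range(n_periods)]
        y = [alpha + eps[0]]  # arbitrary starting value (an assumption)
        for t in range(1, n_periods):
            y.append(rho * y[t - 1] + beta * x[t] + alpha + eps[t])
        panel.append({"alpha": alpha, "x": x, "eps": eps, "y": y})
    return panel

def diff(series):
    """First difference: series[t] - series[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]

def max_differenced_gap(panel, rho, beta):
    """Largest absolute gap between dy[t] and rho*dy[t-1] + beta*dx[t] + deps[t]."""
    worst = 0.0
    for unit in panel:
        dy, dx, de = diff(unit["y"]), diff(unit["x"]), diff(unit["eps"])
        for t in range(1, len(dy)):
            # alpha never appears on the right-hand side
            predicted = rho * dy[t - 1] + beta * dx[t] + de[t]
            worst = max(worst, abs(dy[t] - predicted))
    return worst

panel = simulate_panel(n_units=50, n_periods=6, rho=0.5, beta=1.0)
gap = max_differenced_gap(panel, rho=0.5, beta=1.0)
print(gap)  # zero up to floating-point rounding
```

<p>Note that the lagged difference dy[t-1] and the differenced error deps[t] both contain eps[t-1], so the two are correlated by construction. That is why the differenced LDV needs an instrument.</p>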

<p>Reading these models, I have two questions. One, is there a reason to expect that we need both an LDV and a unit-specific effect? Including both means that we expect a shock to a unit's dependent variable that is constant across periods. I find this a strange assumption. I understand a unit-specific shock to the <em>initial</em> level and then using the LDV thereafter, but in every period? </p>

<p>Two, the entire identification strategy here is based on the additivity of the model, correct? If we were to draw a directed acyclic graph of these models, it would be trivially obvious that we could never identify this model nonparametrically. I understand that we sometimes need to use models to identify effects, but should these identifications depend so heavily on the functional form? It seems that this problem is tied up in the first. We are allowing for the unit-specific effect as a way to free the model of unnecessary assumptions, yet this forces our hand into making different, perhaps stronger assumptions to get identification. </p>

<p>Please clear up my confusion in the comments if you are more in the know. </p>]]></description>
<link>http://www.iq.harvard.edu/blog/sss/archives/2009/11/dynamic_panel_m.shtml</link>
<guid>http://www.iq.harvard.edu/blog/sss/archives/2009/11/dynamic_panel_m.shtml</guid>
         
<pubDate>Tue, 17 Nov 2009 13:49:27 -0500</pubDate>
</item>

<item>
<title>Greiner on &quot;Exit Polling and Racial Bloc Voting&quot;</title>
<description><![CDATA[<p>Please join us at the Applied Statistics workshop this Wednesday, November 18th at 12 noon when we will be happy to have <a href="http://www.law.harvard.edu/faculty/directory/index.html?id=705">Jim Greiner</a> of the Harvard Law School presenting on <a href="http://isites.harvard.edu/fs/docs/icb.topic646669.files/RxCEcolInfWithEP.pdf">"Exit Polling and Racial Bloc Voting: Combining Individual-Level and R x C Ecological Data."</a> Jim has provided a companion paper with the following abstract:<br />
<blockquote><br />
Despite its shortcomings, cross-level or ecological inference remains a necessary part of many areas of quantitative inference, including in United States voting rights litigation. Ecological inference suffers from a lack of identification that, most agree, is best addressed by incorporating individual-level data into the model. In this paper, we test the limits of such an incorporation by attempting it in the context of drawing inferences about racial voting patterns using a combination of an exit poll and precinct-level ecological data; accurate information about racial voting patterns is needed to trigger voting rights laws that can determine the composition of United States legislative bodies. Specifically, we extend and study a hybrid model that addresses two-way tables of arbitrary dimension. We apply the hybrid model to an exit poll we administered in the City of Boston in 2008. Using the resulting data as well as simulation, we compare the performance of a pure ecological estimator, pure survey estimators using various sampling schemes, and our hybrid. We conclude that the hybrid estimator offers substantial benefits by enabling substantive inferences about voting patterns not practicably available without its use.<br />
</blockquote></p>

<p><br />
Both the <a href="http://isites.harvard.edu/fs/docs/icb.topic646669.files/RxCEcolInfWithEP.pdf">paper</a> and the <a href="http://isites.harvard.edu/fs/docs/icb.topic646669.files/TexAppForAnnalsAppStat.pdf">technical appendix</a> are on the course website. </p>

<p>The Applied Statistics workshop meets each Wednesday in room K-354, CGIS-Knafel (1737 Cambridge St). We start at 12 noon with a light lunch, with presentations beginning around 12:15 and we usually wrap up around 1:30 pm. We hope you can make it. </p>]]></description>
<link>http://www.iq.harvard.edu/blog/sss/archives/2009/11/greiner_on_exit.shtml</link>
<guid>http://www.iq.harvard.edu/blog/sss/archives/2009/11/greiner_on_exit.shtml</guid>
         
<pubDate>Mon, 16 Nov 2009 09:00:00 -0500</pubDate>
</item>

<item>
<title>Choosing variances in general linear models</title>
<description><![CDATA[<p>Today I'm going to talk about a particular problem from my own research and will outline a method for choosing variances in general linear models (GLMs), but I am also asking a question.</p>]]></description>
<link>http://www.iq.harvard.edu/blog/sss/archives/2009/11/choosing_varian.shtml</link>
<guid>http://www.iq.harvard.edu/blog/sss/archives/2009/11/choosing_varian.shtml</guid>
         
<pubDate>Thu, 12 Nov 2009 21:17:29 -0500</pubDate>
</item>

<item>
<title>Bill Support by Page Length</title>
<description><![CDATA[<p>There was a lot of press on the 1,000+-page length of the House health care bill, H.R. 3962. That got me thinking... didn't we hear the same thing about the stimulus bill and the Patriot Act? Aren't most "controversial" bills also very long?</p>

<p>It would make sense. Controversial bills require a lot more ink -- pork, special cases, exceptions -- to reel in support. Uncontroversial bills can be written succinctly and pass as is.</p>

<p>To assess this I scraped bills from <a href="http://www.opencongress.org/">OpenCongress</a>, which maintains the full text, voting results and amendment history of House and Senate Resolutions. You can even comment on specific portions of bills. There's already a bunch of neat comments on potential loopholes in H.R. 3962.</p>

<p>I downloaded the text and voting results for all 152 House resolutions passed by the 111th House. A boxplot of page length against support appears below. Each page length group represents roughly 20% of House resolutions. The plot shows the suspected trend, that longer bills have less support. One-page bills almost always pass unanimously!</p>
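
<p>The grouping step can be sketched roughly as follows in Python. This is not the code behind the plot: the records are made up for illustration, "support" is assumed to mean the share of yea votes, and <code>quintile_groups</code> is a hypothetical helper.</p>

```python
from statistics import median

def quintile_groups(bills):
    """Split (pages, support) records into five groups of near-equal size by page count."""
    ordered = sorted(bills, key=lambda bill: bill[0])
    n = len(ordered)
    bounds = [round(k * n / 5) for k in range(6)]
    return [ordered[bounds[k]:bounds[k + 1]] for k in range(5)]

# Made-up (pages, share of yea votes) records, for illustration only.
bills = [(1, 1.00), (1, 0.99), (2, 0.98), (3, 0.97), (5, 0.95),
         (8, 0.90), (12, 0.85), (40, 0.70), (300, 0.60), (1200, 0.51)]

groups = quintile_groups(bills)
for group in groups:
    pages = [p for p, _ in group]
    support = [s for _, s in group]
    # page range of the group and its median support
    print(min(pages), max(pages), median(support))
```

<p>Splitting by rank means that bills with the same page count can land in neighboring groups, so with real data it may be preferable to group on page-count cutoffs instead.</p>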

<p><span class="mt-enclosure mt-enclosure-image" style="display: inline;"><a href="http://www.iq.harvard.edu/blog/sss/assets_c/2009/11/SupportByLength-82.shtml" onclick="window.open('http://www.iq.harvard.edu/blog/sss/assets_c/2009/11/SupportByLength-82.shtml','popup','width=500,height=500,scrollbars=no,resizable=no,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0'); return false"><img src="http://www.iq.harvard.edu/blog/sss/assets_c/2009/11/SupportByLength-thumb-500x500-82.png" width="500" height="500" alt="SupportByLength.png" class="mt-image-none" style="" /></a></span></p>]]></description>
<link>http://www.iq.harvard.edu/blog/sss/archives/2009/11/bill_support_by.shtml</link>
<guid>http://www.iq.harvard.edu/blog/sss/archives/2009/11/bill_support_by.shtml</guid>
         
<pubDate>Thu, 12 Nov 2009 10:12:32 -0500</pubDate>
</item>

<item>
<title>Answering &quot;why&quot; questions</title>
<description><![CDATA[<p>Brandon Stewart pointed me to an interesting <a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/11/continuing_puzz.html">blog post</a> by Andrew Gelman that touches on the issue of explaining the "causes of effects."  The basic point is that "why" questions are difficult to answer in a potential outcomes framework but often we really care about them.  <a href="http://cas.uchicago.edu/workshops/cpolit/papers/mahoney.pdf">Some folks</a> in political science have gone so far as to argue that researchers using "qualitative" methods are more inclined (and better able) to tackle these "why" questions than their "quantitative" colleagues who mostly focus on "effects of causes."</p>]]></description>
<link>http://www.iq.harvard.edu/blog/sss/archives/2009/11/looking_for_the.shtml</link>
<guid>http://www.iq.harvard.edu/blog/sss/archives/2009/11/looking_for_the.shtml</guid>
         
<pubDate>Wed, 11 Nov 2009 16:30:56 -0500</pubDate>
</item>

<item>
<title>Just in time for &quot;Superfreakonomics&quot;</title>
<description><![CDATA[<p>A friend recently pointed me to a 2007 New Republic <a href="http://www.tnr.com/article/freaks-and-geeks-how-freakonomics-ruining-the-dismal-science?page=0,0">article</a> in which the author, Noam Scheiber, argues that the "Freakonomics" phenomenon is lamentable because it represents a trend toward research in which clever identification strategies are prized over attempts to answer what Scheiber calls "truly deep questions." Although two years have passed and a second Levitt and Dubner book has since been published, the article caught my attention because I have been considering a related issue of late. We are all well aware of how difficult it is to make causal inferences in the social sciences, so it is not surprising that researchers are drawn to settings in which some source of exogenous variation allows for identification of the influence of a specific causal factor. In fact, progress on those "truly deep questions" depends in part on this type of work. However, a focus on clean identification has some potentially negative implications. Scheiber names one: answering questions of peripheral interest. A second, which is of greater concern for me, is concentrating on population subgroups that may or may not be of scientific interest in and of themselves and that, in either case, are unable to provide direct insights into broader population dynamics. </p>]]></description>
<link>http://www.iq.harvard.edu/blog/sss/archives/2009/11/just_in_time_fo.shtml</link>
<guid>http://www.iq.harvard.edu/blog/sss/archives/2009/11/just_in_time_fo.shtml</guid>
         
<pubDate>Sat, 07 Nov 2009 20:02:19 -0500</pubDate>
</item>

<item>
<title>Airoldi on &quot;A statistical perspective on complex networks&quot;</title>
<description><![CDATA[<p>I hope you can join us at the Applied Statistics Workshop this Wednesday, November 4th, when we will be happy to have <a href="http://www.people.fas.harvard.edu/~airoldi/">Edo Airoldi</a>, Assistant Professor in the Department of Statistics here at Harvard. Edo will be presenting a talk entitled "A statistical perspective on complex networks" for which he has provided the following abstract:<br />
<blockquote><br />
Networks are ubiquitous in science and have become a focal point for discussion in everyday life. Formal statistical models for the analysis of network data have emerged as a major topic of interest in diverse areas of science, as many scientific inquiries involve collections of measurements on pairs of objects. Probability models on graphs date back to 1959. Along with empirical studies in social psychology and sociology from the 1960s, these early works generated an active network community and a substantial literature in the 1970s. This effort moved into the statistical literature in the late 1970s and 1980s, and the past decade has seen a burgeoning network literature in statistical physics and computer science. The growth of the World Wide Web and the emergence of online networking communities such as Facebook and LinkedIn, and a host of more specialized professional network communities has intensified interest in the study of networks and network data. In this talk, I will review a few ideas that are central to this burgeoning literature. I will emphasize formal model descriptions, and pay special attention to the interpretation of parameters and their estimation. I will conclude by describing open problems and challenges for machine learning and statistics.<br />
</blockquote><br />
The Applied Statistics workshop meets each Wednesday in room K-354, CGIS-Knafel (1737 Cambridge St). We start at 12 noon with a light lunch, with presentations beginning around 12:15 and we usually wrap up around 1:30 pm. We hope you can make it.</p>]]></description>
<link>http://www.iq.harvard.edu/blog/sss/archives/2009/11/airoldi_on_a_st.shtml</link>
<guid>http://www.iq.harvard.edu/blog/sss/archives/2009/11/airoldi_on_a_st.shtml</guid>
         
<pubDate>Tue, 03 Nov 2009 10:47:44 -0500</pubDate>
</item>

<item>
<title>Happy Halloween</title>
<description><![CDATA[<p>It made my day when <a href="http://www.mathstat.uottawa.ca/~rsmith/Zombies.pdf">this</a> showed up in my inbox this morning.  I'm glad to see <em>someone</em> knows what to do if/when the zombie outbreak occurs.<br />
</p>]]></description>
<link>http://www.iq.harvard.edu/blog/sss/archives/2009/10/happy_halloween.shtml</link>
<guid>http://www.iq.harvard.edu/blog/sss/archives/2009/10/happy_halloween.shtml</guid>
         
<pubDate>Fri, 30 Oct 2009 10:05:40 -0500</pubDate>
</item>

<item>
<title>Matching Markets</title>
<description><![CDATA[<p>Rich's <a href="http://www.iq.harvard.edu/blog/sss/archives/2009/10/multiple_instru.shtml">post</a> on instruments the other day reminded me of a conversation that I've been having with a faculty member, although the connection may not be particularly clear, at least at first.</p>

<p>The setup is that there are many markets in which buyers and sellers are distinct types of actors. For example, the market for spouses has, until recently, been such a market (although I make no claim as to which side of the market is buying and which is selling).  This type of market, in the form of college applications, was analyzed by Gale and Shapley in a famous 1962 <a href="http://www.jstor.org/pss/2312726">paper</a> in which they proved that there was a solution to this type of matching problem.</p>
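
<p>For readers who have not seen it, here is a minimal Python sketch of the deferred acceptance procedure associated with Gale and Shapley. It is simplified to the one-to-one case (the college version has quotas), and it assumes two sides of equal size with complete preference lists. The function name and the toy preferences are hypothetical.</p>

```python
def deferred_acceptance(proposer_prefs, receiver_prefs):
    """Return a stable matching as a dict mapping each receiver to a proposer.

    Each argument maps a name to a list of names on the other side,
    most preferred first. Assumes equal-sized sides and complete lists.
    """
    # rank[r][p] is the position of proposer p in receiver r's list (lower is better)
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # next receiver each proposer will try
    engaged = {}                                  # receiver currently holding a proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged.get(r)
        if current is None:
            engaged[r] = p  # receiver had no offer, so holds this one
        else:
            # receiver keeps whichever proposer it ranks higher; the other is free again
            keep = min((current, p), key=rank[r].get)
            drop = p if keep == current else current
            engaged[r] = keep
            free.append(drop)
    return engaged

# Toy preferences, for illustration only.
students = {"a": ["X", "Y"], "b": ["X", "Y"]}
colleges = {"X": ["b", "a"], "Y": ["a", "b"]}
match = deferred_acceptance(students, colleges)
print(match)
```

<p>In the toy example both students prefer X, X prefers b, and so a ends up at Y. No student and college would both rather be matched to each other than to their assigned partners, which is the sense in which the matching is stable.</p>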

<p><br />
</p>]]></description>
<link>http://www.iq.harvard.edu/blog/sss/archives/2009/10/matching_market.shtml</link>
<guid>http://www.iq.harvard.edu/blog/sss/archives/2009/10/matching_market.shtml</guid>
         
<pubDate>Thu, 29 Oct 2009 16:50:14 -0500</pubDate>
</item>

<item>
<title>Physics of politics</title>
<description><![CDATA[<p>A physicist recently emailed me asking if I could help him access election data; he sent me one of his papers, which (to my astonishment) began "Most of the empirical electoral studies conducted by physicists . . .", followed by a string of citations. I had no idea physicists were studying elections! I suppose I should have known; from what my biologist friend tells me, physicists have been colonizing his field the way economists have done to much of social science. So I guess politics was next.</p>

<p>Reading a few articles in the "physics of politics" as a political scientist, one has the sense of observing an alternate universe. For example: <a href="http://www.staff.uni-mainz.de/schneidj/papers/paperhic.pdf">a paper</a> on the effect of election results on party membership in Germany that has no references to work outside of physics; features many exotic (to me at least) terms like Wegscheider potentials, the Sznajd model, and the Kronecker symbol; and takes a time-series approach to causation that I suspect would be unacceptable to most reviewers in political science and economics these days. </p>

<p>In general, it's clear that physicists doing work on political phenomena (or "sociophysics" more generally) are primarily interested in exploring the individual-level social interactions that might underpin the macro-order we observe in, e.g., regularities in turnout or vote share distributions. As such, political institutions (which are the major preoccupation of political scientists) necessarily disappear from the model and are typically not even mentioned, even when they would seem to be of first-order importance in explaining a particular phenomenon. (Another example of the alternate universe: <a href="http://arxiv.org/pdf/nlin/0405002v1">a paper</a> that argues that party vote shares in Indonesia follow a power law, but which does not describe or mention the electoral system.) These omissions seem foolish on first reading, but it's clear that they reflect a different choice of explanatory variable: physicists seek their explanations in micro-interactions, and we seek them primarily in political institutions. It's probably both of course, but models can only be so complex.</p>

<p>Despite my overall sense of disorientation in reading these papers, there were also somewhat surprising moments of familiarity. Physics heavily influenced economics in an earlier period of colonization, and much of what we read in economics and political science descended from those models. In reading these newer physics papers, there is therefore a sense of distant kinship, the knowledge of a common ancestor several generations back. </p>

<p>I wonder about the scope for collaboration between physicists and social scientists. Based on my admittedly very cursory reading of one area in which physicists have ventured, it's hard to know whether the potential gains from trade are sufficient to overcome the apparent difference in goals. For all I know there already is a lot of productive collaboration going on -- if you know of something interesting, share it in the comments!</p>]]></description>
<link>http://www.iq.harvard.edu/blog/sss/archives/2009/10/physics_of_poli.shtml</link>
<guid>http://www.iq.harvard.edu/blog/sss/archives/2009/10/physics_of_poli.shtml</guid>
         
<pubDate>Wed, 28 Oct 2009 06:58:45 -0500</pubDate>
</item>

<item>
<title>Tchetgen on &quot;Doubly robust estimation in a semi-parametric odds ratio model&quot;</title>
<description><![CDATA[<p>This Wednesday, October 28th, the Applied Statistics workshop will welcome <a href="http://www.hsph.harvard.edu/faculty/eric-tchetgen-tchetgen/">Eric Tchetgen Tchetgen</a>, Assistant Professor of Epidemiology at Harvard School of Public Health, presenting his work titled "Doubly robust estimation in a semi-parametric odds ratio model." Eric has provided the following abstract for the paper:</p>

<blockquote>
We consider the doubly robust estimation of the parameters in a semi-parametric conditional odds ratio model characterizing the effect of an exposure in the presence of many confounders. We develop estimators that are consistent and asymptotically normal in a union model where either a prospective baseline density function or a retrospective baseline density function is correctly specified but not necessarily both. The case of a binary outcome is of particular interest; then our approach yields a doubly robust locally efficient estimator in a semi-parametric logistic regression model. For general types of outcomes, we provide a strategy to obtain doubly robust estimators that are nearly locally efficient. We illustrate the method in a simulation study and an application in statistical genetics. Finally, we briefly discuss extensions of the proposed method to the semi-parametric estimation of a parameter indexing an interaction between two exposures on the logistic scale, as well as extensions to the setting of a time-varying exposure in the presence of time-varying confounding.
</blockquote>

<p>The Applied Statistics workshop meets each Wednesday in room K-354, CGIS-Knafel (1737 Cambridge St). We start at 12 noon with a light lunch, with presentations beginning around 12:15 and we usually wrap up around 1:30 pm. We hope you can make it.<br />
</p>]]></description>
<link>http://www.iq.harvard.edu/blog/sss/archives/2009/10/tchetgen_on_dou.shtml</link>
<guid>http://www.iq.harvard.edu/blog/sss/archives/2009/10/tchetgen_on_dou.shtml</guid>
         
              <category>Abstracts</category>
         
<pubDate>Mon, 26 Oct 2009 11:10:52 -0500</pubDate>
</item>

<item>
<title>Sources of Randomness</title>
<description><![CDATA[<p>During a recent conversation with some colleagues regarding data sources, an interesting point was made that left me pondering. One member of our group stated that he would not trust a particular source of data to provide useful estimates of population means, but he would trust it to estimate regression coefficients. This puzzled me, because a regression coefficient is a (perhaps slightly fancy) version of a mean. Why, then, would a data source that cannot be trusted for a simple average be useful for a coefficient? </p>]]></description>
<link>http://www.iq.harvard.edu/blog/sss/archives/2009/10/sources_of_rand.shtml</link>
<guid>http://www.iq.harvard.edu/blog/sss/archives/2009/10/sources_of_rand.shtml</guid>
         
<pubDate>Fri, 23 Oct 2009 17:20:28 -0500</pubDate>
</item>

<item>
<title>Multiple Instruments</title>
<description><![CDATA[<p>I recently found a <a href="http://www.princeton.edu/~deaton/downloads/Instruments%20of%20development%20v1d_mar09_all.pdf">paper</a> by Angus Deaton that attempts to (1) discount the usefulness of instrumental variables for making causal inferences in development economics and (2) discount the usefulness of field experiments.  He has definitely stirred the pot a little and is now part of an interesting <a href="http://pantheon.yale.edu/~dt6/thedebate.html">debate</a>, although the discussion seems to be more focused on Deaton's controversial claims about experiments.</p>]]></description>
<link>http://www.iq.harvard.edu/blog/sss/archives/2009/10/multiple_instru.shtml</link>
<guid>http://www.iq.harvard.edu/blog/sss/archives/2009/10/multiple_instru.shtml</guid>
         
<pubDate>Wed, 21 Oct 2009 12:41:23 -0500</pubDate>
</item>

<item>
<title>Elements of Statistical Learning (Online)</title>
<description><![CDATA[<p>In case you had not already heard, Trevor Hastie, Robert Tibshirani, and Jerome Friedman have put a PDF copy of the second edition of their excellent text <a href="http://www-stat.stanford.edu/~tibs/ElemStatLearn/">Elements of Statistical Learning</a> on the book's website. I am sure many of you already own it, but a searchable version for the laptop is incredibly useful. The second edition has a lot of new content, including completely new chapters on Random Forests, Ensemble Learning, Undirected Graphical Models, and High-Dimensional Problems. </p>

<p>While a copy on your computer is very handy, a desk copy of this book is essential if you are interested in machine learning or data mining. The book is also a sight to behold. You can buy a copy at <a href="http://www.amazon.com/gp/product/0387848576/ref=s9_simz_gw_s0_p14_i1?pf_rd_m=ATVPDKIKX0DER&pf_rd_s=center-2&pf_rd_r=0Q2PXEQXRZ2AYKA0R27H&pf_rd_t=101&pf_rd_p=470938631&pf_rd_i=507846">Amazon</a> or <a href="http://www.springer.com/statistics/statistical+theory+and+methods/book/978-0-387-84857-0">Springer</a>. </p>]]></description>
<link>http://www.iq.harvard.edu/blog/sss/archives/2009/10/elements_of_sta.shtml</link>
<guid>http://www.iq.harvard.edu/blog/sss/archives/2009/10/elements_of_sta.shtml</guid>
         
<pubDate>Tue, 20 Oct 2009 10:15:51 -0500</pubDate>
</item>

<item>
<title>Eggers on &quot;Electoral Rules, Opposition Scrutiny, and Policy Moderation in French Municipalities&quot;</title>
<description><![CDATA[<p>Please join us this Wednesday, October 21st, when we will have a change in the schedule. We are happy to have Andy Eggers (Department of Government) presenting a talk titled "Electoral Rules, Opposition Scrutiny, and Policy Moderation in French Municipalities: An Application of the Regression Discontinuity Design." Andy has provided the following abstract for his talk:</p>

<blockquote>
Regression discontinuity design (RDD) is a powerful and increasingly popular approach to causal inference that can be applied when treatment is assigned deterministically based on a continuous covariate. In this talk, I will present an application of RDD from French municipalities, where the system of electing the municipal council depends on whether the city's population is above or below 3500. First I show that cities above the population cutoff have fewer uncontested elections and more opposition representation on municipal councils, consistent with expectations. I then trace the effect of these political changes -- which amount to a heightening of the scrutiny imposed on the mayor -- on policy outcomes, providing evidence that more opposition scrutiny leads to more moderate policy.
</blockquote>

<p>The Applied Statistics workshop meets each Wednesday in room K-354, CGIS-Knafel (1737 Cambridge St). We start at 12 noon with a light lunch, with presentations beginning around 12:15 and we usually wrap up around 1:30 pm. We hope you can make it.</p>]]></description>
<link>http://www.iq.harvard.edu/blog/sss/archives/2009/10/eggers_on_elect.shtml</link>
<guid>http://www.iq.harvard.edu/blog/sss/archives/2009/10/eggers_on_elect.shtml</guid>
         
              <category>Abstracts</category>
         
<pubDate>Mon, 19 Oct 2009 19:21:35 -0500</pubDate>
</item>


</channel>
</rss>