<?xml version="1.0" encoding="utf-8"?>

<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:admin="http://webns.net/mvcb/"
xmlns:cc="http://web.resource.org/cc/"
xmlns="http://purl.org/rss/1.0/">

<channel rdf:about="http://www.iq.harvard.edu/blog/sss/">
<title>Social Science Statistics Blog</title>
<link>http://www.iq.harvard.edu/blog/sss/</link>
<description></description>
<dc:creator></dc:creator>
<dc:date>2010-02-08T14:46:40-05:00</dc:date>
<admin:generatorAgent rdf:resource="http://www.movabletype.org/?v=4.24-en" />


<items>
<rdf:Seq>
<rdf:li rdf:resource="http://www.iq.harvard.edu/blog/sss/archives/2010/02/rios_in_new_yor.shtml" />

<rdf:li rdf:resource="http://www.iq.harvard.edu/blog/sss/archives/2010/02/natural_exprime.shtml" />

<rdf:li rdf:resource="http://www.iq.harvard.edu/blog/sss/archives/2010/01/random_network.shtml" />

<rdf:li rdf:resource="http://www.iq.harvard.edu/blog/sss/archives/2010/01/mapping_inequal.shtml" />

<rdf:li rdf:resource="http://www.iq.harvard.edu/blog/sss/archives/2010/01/voter_outrage_o.shtml" />

<rdf:li rdf:resource="http://www.iq.harvard.edu/blog/sss/archives/2010/01/visualizing_dat.shtml" />

<rdf:li rdf:resource="http://www.iq.harvard.edu/blog/sss/archives/2010/01/bike_helmet_law.shtml" />

<rdf:li rdf:resource="http://www.iq.harvard.edu/blog/sss/archives/2010/01/professions_and.shtml" />

<rdf:li rdf:resource="http://www.iq.harvard.edu/blog/sss/archives/2010/01/secrets_of_rati.shtml" />

<rdf:li rdf:resource="http://www.iq.harvard.edu/blog/sss/archives/2010/01/netflix_queues.shtml" />

<rdf:li rdf:resource="http://www.iq.harvard.edu/blog/sss/archives/2010/01/sequential_idea.shtml" />

<rdf:li rdf:resource="http://www.iq.harvard.edu/blog/sss/archives/2010/01/relative_distri.shtml" />

<rdf:li rdf:resource="http://www.iq.harvard.edu/blog/sss/archives/2010/01/discovering_cau.shtml" />

<rdf:li rdf:resource="http://www.iq.harvard.edu/blog/sss/archives/2009/12/spirling_cited.shtml" />

<rdf:li rdf:resource="http://www.iq.harvard.edu/blog/sss/archives/2009/12/in_the_spirit_o.shtml" />
</rdf:Seq>
</items>

</channel>


<item rdf:about="http://www.iq.harvard.edu/blog/sss/archives/2010/02/rios_in_new_yor.shtml">
<title>Rios in the New York Times</title>
<link>http://www.iq.harvard.edu/blog/sss/archives/2010/02/rios_in_new_yor.shtml</link>
<description><![CDATA[<p>Viri Rios has a <a href="http://www.nytimes.com/2010/01/26/opinion/26iht-edrios.html?scp=2&sq=drug%20cartels&st=cse">great op-ed</a> in the New York Times about mathematical social science and Mexican drug politics.</p>]]></description>
<dc:subject></dc:subject>
<dc:creator>rnielsen</dc:creator>
<dc:date>2010-02-08T14:46:40-05:00</dc:date>
</item>

<item rdf:about="http://www.iq.harvard.edu/blog/sss/archives/2010/02/natural_exprime.shtml">
<title>Natural experiments</title>
<link>http://www.iq.harvard.edu/blog/sss/archives/2010/02/natural_exprime.shtml</link>
<description><![CDATA[<p>The other week, I read Jared Diamond's <em>Guns, Germs, and Steel</em>, which managed to get me a little worked up about a pet peeve of mine: the term "natural experiment."  Just when I had calmed down, the Polmeth listserv alerted me to an <a href="http://www.oxfordjournals.org/page/3851/1">entire issue of Political Analysis</a> devoted to natural experiments.  Arghhh...</p>]]></description>
<dc:subject></dc:subject>
<dc:creator>rnielsen</dc:creator>
<dc:date>2010-02-02T11:30:00-05:00</dc:date>
</item>

<item rdf:about="http://www.iq.harvard.edu/blog/sss/archives/2010/01/random_network.shtml">
<title>Random Network Models</title>
<link>http://www.iq.harvard.edu/blog/sss/archives/2010/01/random_network.shtml</link>
<description><![CDATA[<p><a href="http://www.people.fas.harvard.edu/~blitz/Site/Home.html">Joe Blitzstein</a>, ever-popular <a href="http://video.google.com/videosearch?q=harvard+statistics+110&rls=com.microsoft:en-us&oe=UTF-8&um=1&ie=UTF-8&ei=l-tgS-qgIMnh8Qa5xuiLDA&sa=X&oi=video_result_group&ct=title&resnum=4&ved=0CBoQqwQwAw#emb=0&q=statistics+110+at+harvard&view=3">professor of Statistics 110</a>, is offering a seminar course this term.  Statistics 340: Random Network Models will meet on Tuesdays (starting February 2nd) from 2pm to 4pm in Science Center 706. Joe invites all interested to come check it out. </p>]]></description>
<dc:subject></dc:subject>
<dc:creator>dbloome</dc:creator>
<dc:date>2010-01-27T20:37:08-05:00</dc:date>
</item>

<item rdf:about="http://www.iq.harvard.edu/blog/sss/archives/2010/01/mapping_inequal.shtml">
<title>Mapping Inequality</title>
<link>http://www.iq.harvard.edu/blog/sss/archives/2010/01/mapping_inequal.shtml</link>
<description><![CDATA[<p>Unemployment remains at 10 percent, according to today's <a href="http://www.bls.gov/news.release/laus.nr0.htm">Bureau of Labor Statistics' news release</a>. In light of ongoing economic troubles, Goldman Sachs is "trimming" its bonus pool to only $16.2 billion, for an average bonus of almost $500,000 per employee <a href="http://www.nytimes.com/2010/01/22/business/22goldman.html?em">according to the New York Times</a>. It is worth remembering that such extreme displays of economic inequality have not been permanent features of the U.S. economy. Rather, inequality has grown substantially over the last four decades. Watch it spread from the South across the country and intensify everywhere in the figure below.</p>

<p><span class="mt-enclosure mt-enclosure-image" style="display: inline;"><img alt="maps.jpeg" src="http://www.iq.harvard.edu/blog/sss/maps.jpeg" width="550" height="550" class="mt-image-none" style="" /></span></p>

<p>Check out these introductory <a href="http://www.people.fas.harvard.edu/~zhukov/spatial.html">slides and R code</a> for great tips on spatial statistics, including information on how to create a figure like the one shown above and much, much more!<br />
</p>]]></description>
<dc:subject></dc:subject>
<dc:creator>dbloome</dc:creator>
<dc:date>2010-01-22T10:28:28-05:00</dc:date>
</item>

<item rdf:about="http://www.iq.harvard.edu/blog/sss/archives/2010/01/voter_outrage_o.shtml">
<title>Voter Outrage over Health Care</title>
<link>http://www.iq.harvard.edu/blog/sss/archives/2010/01/voter_outrage_o.shtml</link>
<description><![CDATA[<p>Political scientists David Brady and Doug Rivers, along with business and law professor Daniel Kessler, wrote an <a href="http://online.wsj.com/article/SB10001424052748704541004575011134134536138.html">op-ed for the WSJ</a> arguing that the health care bill is hurting the Democrats. Their evidence is that states with lower support for the bill also have lower support for incumbent Democratic senatorial candidates:</p>

<blockquote>
Health reform is more popular in some of these states than in others. Where it's popular, Democratic candidates don't have too much of a problem, but where it's unpopular--and that includes most states--the Democratic Senate candidates are fighting an uphill battle. Support for health reform varies in these 11 states from a low of 33% in North Dakota to a high of 48% in Nevada. Democrats trail Republicans in six of the states; three are toss-ups; and in two, Democrats have a solid lead.
</blockquote>
 
<p>I hate to fulfill any kind of institutional stereotype, but the causal reasoning here leaves much to be desired. The argument of the essay is that BECAUSE of health care, Democrats are doing worse in the polls. On this question, obviously, we have no data: this is why speculation is running rampant. The counterfactual would be: what would have happened to Democratic senatorial candidates if there had been no (or a substantially smaller) health care bill? Pundits can hardly type fast enough to get answers to this question out right now. Certainly, though, a correlation between support for health care reform and support for Democrats will not provide the answer (since, you know, there is no variation on the treatment--all states are in the health care reform world).</p>

<p>Despite the general tone of the piece ("The culprit is the unpopularity of health reform..."), I believe the authors are making a different argument. Namely, that voters are responding to <em>their</em> senator's vote on health care. Based on their evidence, however, I think this is a flawed argument as well. </p>]]></description>
<dc:subject></dc:subject>
<dc:creator>mblackwell</dc:creator>
<dc:date>2010-01-21T16:02:23-05:00</dc:date>
</item>

<item rdf:about="http://www.iq.harvard.edu/blog/sss/archives/2010/01/visualizing_dat.shtml">
<title>visualizing the campaign finance case</title>
<link>http://www.iq.harvard.edu/blog/sss/archives/2010/01/visualizing_dat.shtml</link>
<description><![CDATA[<p>My colleague, Brandon Stewart, pointed me to this neat webpage, <a href="http://manyeyes.alphaworks.ibm.com/manyeyes/">manyeyes.alphaworks.ibm.com</a>, an IBM-developed web site that allows you to upload data quickly and visualize it using a variety of techniques.</p>

<p>Many Eyes lets you use textual data, so I just tried it out using the majority and dissenting opinions from <a href="http://www.supremecourtus.gov/opinions/09pdf/08-205.pdf">Citizens United v. FEC</a>, today's Supreme Court decision striking down existing campaign finance law.  (Note: Let's just say it's not a bad idea to use publicly available, non-copyrighted data.) </p>

<p>The resulting visualizations are just terrific, and they actually go far in illustrating the substantive differences between the conservative and liberal Justices on the campaign finance issue.</p>]]></description>
<dc:subject></dc:subject>
<dc:creator>msen</dc:creator>
<dc:date>2010-01-21T13:16:42-05:00</dc:date>
</item>

<item rdf:about="http://www.iq.harvard.edu/blog/sss/archives/2010/01/bike_helmet_law.shtml">
<title>Bike helmet laws</title>
<link>http://www.iq.harvard.edu/blog/sss/archives/2010/01/bike_helmet_law.shtml</link>
<description><![CDATA[<p>I'm into biking (mostly road-biking these days) so I was interested to read a post on the New York Times' <a href="http://freakonomics.blogs.nytimes.com/2010/01/19/do-bike-helmet-laws-discourage-bicycling/">"Freakonomics" blog</a> about <a href="http://www.gse.uci.edu/docs/Carpenter_Stehr%20Bicycle_Manuscript_50409.pdf">a study</a> that uses variation in bike helmet laws across US states to show that helmet laws decrease bike riding among kids and teens.  Since I think that most people should ride bikes most of the time AND I have been known to bug people to wear helmets, perhaps I've been working against myself.  </p>]]></description>
<dc:subject></dc:subject>
<dc:creator>rnielsen</dc:creator>
<dc:date>2010-01-19T21:01:59-05:00</dc:date>
</item>

<item rdf:about="http://www.iq.harvard.edu/blog/sss/archives/2010/01/professions_and.shtml">
<title>why academics are so liberal</title>
<link>http://www.iq.harvard.edu/blog/sss/archives/2010/01/professions_and.shtml</link>
<description><![CDATA[<p>I read in this morning's <a href="http://www.nytimes.com/2010/01/18/arts/18liberal.html?hpw">New York Times</a> about research being conducted by two sociologists, Neil Gross (British Columbia) and Ethan Fosse (Harvard), on why academics tend to be left of center. That professors are more liberal than non-academics is a pretty well-known fact; at the same time, we don't have a good idea as to why this is. Previous research on this point has largely relied on anecdotal or qualitative techniques, so Gross and Fosse's paper, which relies on survey data, looks promising. A copy of the working paper is <a href="http://www.soci.ubc.ca/fileadmin/template/main/images/departments/soci/faculty/gross/why_are_professors_liberal.pdf">here</a>.<br />
</p>]]></description>
<dc:subject></dc:subject>
<dc:creator>msen</dc:creator>
<dc:date>2010-01-18T09:55:58-05:00</dc:date>
</item>

<item rdf:about="http://www.iq.harvard.edu/blog/sss/archives/2010/01/secrets_of_rati.shtml">
<title>secrets of rating success</title>
<link>http://www.iq.harvard.edu/blog/sss/archives/2010/01/secrets_of_rati.shtml</link>
<description><![CDATA[<p>Complementing Matt's post about TV-watching patterns below, here's an interesting <a href="http://www.guardian.co.uk/education/2009/dec/01/research-tv-drama-csi">article</a> from The Guardian about how three British computer scientists are using content analysis techniques to parse out what makes (or does not make) a hit TV script.</p>]]></description>
<dc:subject></dc:subject>
<dc:creator>msen</dc:creator>
<dc:date>2010-01-15T09:02:44-05:00</dc:date>
</item>

<item rdf:about="http://www.iq.harvard.edu/blog/sss/archives/2010/01/netflix_queues.shtml">
<title>Netflix queues by zip code</title>
<link>http://www.iq.harvard.edu/blog/sss/archives/2010/01/netflix_queues.shtml</link>
<description><![CDATA[<p>The New York Times has put together an awesome <a href="http://www.nytimes.com/interactive/2010/01/10/nyregion/20100110-netflix-map.html">data visualization on the geography of Netflix</a>. For each zip code they have the top 50 rentals of 2009 and they use these ranks to draw heat maps for each movie. There are all kinds of interesting patterns that point to both how preferences cluster and how information spreads. My favorite two maps are the following, which I reference after the jump (darker colors indicate more rentals in that area):</p>

<p><em>Mad Men</em>, Season 1 Disc 1:</p>

<p><span class="mt-enclosure mt-enclosure-image" style="display: inline;"><img alt="madmen.png" src="http://www.iq.harvard.edu/blog/sss/madmen.png" width="493" height="465" class="mt-image-center" style="text-align: center; display: block; margin: 0 auto 20px;" /></span></p>

<p><em>Paul Blart, Mall Cop</em>:<br />
<span class="mt-enclosure mt-enclosure-image" style="display: inline;"><img alt="paul blartt.png" src="http://www.iq.harvard.edu/blog/sss/paul%20blartt.png" width="495" height="467" class="mt-image-center" style="text-align: center; display: block; margin: 0 auto 20px;" /></span><br />
</p>]]></description>
<dc:subject></dc:subject>
<dc:creator>mblackwell</dc:creator>
<dc:date>2010-01-09T17:06:12-05:00</dc:date>
</item>

<item rdf:about="http://www.iq.harvard.edu/blog/sss/archives/2010/01/sequential_idea.shtml">
<title>Sequential Ideal Points</title>
<link>http://www.iq.harvard.edu/blog/sss/archives/2010/01/sequential_idea.shtml</link>
<description><![CDATA[<p>Simon Jackman <a href="http://jackman.stanford.edu/blog/?p=1516">puts together a plot</a> of how the estimation of ideal points of the 111th U.S. Senate changes as he adds each roll call. Every Senator starts the term at 0 and then branches out. It illustrates an interesting feature of these <abbr title="Item Response Theory">IRT</abbr> models:</p>

<blockquote>
The other thing is that there doesn't seem to be any obvious "vote 1" update for ideal points. That is, there is no simple mapping from the ideal point estimate based on m roll call to ideal point estimates based on m+1 roll calls. You have to start the fitting algorithm from scratch each time (and hence the appeal of exploiting multiple cores etc), although the results from the previous run giving pretty good start values.
</blockquote>]]></description>
<dc:subject>Regular Post</dc:subject>
<dc:creator>mblackwell</dc:creator>
<dc:date>2010-01-09T15:57:56-05:00</dc:date>
</item>

<item rdf:about="http://www.iq.harvard.edu/blog/sss/archives/2010/01/relative_distri.shtml">
<title>Relative Distributions</title>
<link>http://www.iq.harvard.edu/blog/sss/archives/2010/01/relative_distri.shtml</link>
<description><![CDATA[<p>When comparing how different groups fare on a particular measure (for example, the life expectancy of immigrants versus native born individuals or the wages of workers in 1950 versus 2000), we often focus on the difference in the averages of the two distributions. Sometimes we also examine disparity in distributional spreads, inquiring whether one group's outcomes are more variable than the other's. Of course, summarizing distributions with one or two parameters discards a lot of potentially useful information. Enter <a href="http://books.google.com/books?id=XCqoD2GG1nYC&dq=%22Handcock%22+%22Relative+distribution+methods%22+&lr=&source=gbs_navlinks_s">Relative Distribution Methods in the Social Sciences</a>, a clever book by Mark Handcock and Martina Morris. In what follows, I explore the basic insight of the book and test out some techniques myself <em>(with graphs!)</em>.</p>]]></description>
<dc:subject></dc:subject>
<dc:creator>dbloome</dc:creator>
<dc:date>2010-01-08T13:17:49-05:00</dc:date>
</item>

<item rdf:about="http://www.iq.harvard.edu/blog/sss/archives/2010/01/discovering_cau.shtml">
<title>Discovering Causal Knowledge?</title>
<link>http://www.iq.harvard.edu/blog/sss/archives/2010/01/discovering_cau.shtml</link>
<description><![CDATA[<p>How do we learn about causal relationships when we can't run experiments? In my own work, the answer has been to look around for "natural experiments" in which something important varies for roughly random reasons: for example, the winners of close elections are selected almost at random, which allows you to draw conclusions about the effect of being elected on various outcomes (like the <a href="http://www.people.fas.harvard.edu/~aeggers/papers/MPs_for_Sale_published_APSR_Nov_2009.pdf">winner's wealth</a>). </p>

<p>I recently read <a href="http://kdl.cs.umass.edu/papers/Jensen-et-al-kdd2008.pdf">a paper</a> by <a href="http://kdl.cs.umass.edu/people/jensen/">David Jensen</a> and coauthors from the <a href="http://kdl.cs.umass.edu/">UMass Knowledge Discovery Laboratory</a> that proposes a systematic way of uncovering causal relationships from databases. Their approach (which they call AIQ -- "Automated Identification of Quasi-experiments") is not to mine the joint density of variables for independencies that can produce a causal graph (as discussed in <a href="http://www.iq.harvard.edu/blog/sss/archives/2009/03/follow-up_on_ro.shtml">Jamie Robins' talk last March</a>), but rather to produce a list of feasible quasi-experiments based on a standard database schema that has been augmented with some causal information (e.g. A might cause B, C does not cause A or B) and some temporal information (i.e. ordering and frequency of events). In the paper, the authors provide an overview of the approach as applied to three commonly-used databases, including some candidate quasi-experiments that the algorithm suggests. </p>

<p>My impression after reading the paper was that AIQ's discovery potential is pretty limited (at least at this stage), because most users who could provide the inputs AIQ needs could very likely think up the quasi-experimental design themselves. Any valid quasi-experimental design that AIQ can discover at this point appears to come from the user specifying that the treatment and outcome have no common cause or confounding factors, which is a very unusual situation that is either quite obvious (e.g. because there is a lottery or other explicit randomization) or requires significant substantive knowledge. I wonder how commonly a researcher would a) have in mind a causal model that is sufficiently restrictive to produce plausible quasi-experimental designs through AIQ, and b) not have already thought of those designs. </p>

<p>The example of causal discovery the authors provide comes from a combined IMDB/Netflix movie database; they assert that winning an Oscar improves the reviews a movie receives on Netflix. In order for AIQ to suggest this quasi-experiment, the authors had to specify in advance that the Oscar-winning film is chosen from among nominees at random. One can of course criticize that assumption, but the point is that once you make that assumption it should be quite obvious that you have a quasi-experiment with which to study the effect of winning the Oscar on various outcomes; any film-specific, post-awards ceremony outcome should do. AIQ may provide a structured way to go through that exercise, but I'm not convinced there are many circumstances in which it would be useful to a researcher. </p>
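
<p>To make the Oscar logic concrete, here is a minimal sketch, assuming (as the authors do) that the winner is drawn at random from the nominees. Under that assumption a plain difference in means between winners and losing nominees estimates the effect of winning. Everything below is hypothetical: the ratings are simulated, the function names and the effect size of 0.3 are invented for illustration, and none of it is AIQ itself or the authors' code.</p>

```python
import random
import statistics


def difference_in_means(treated, control):
    """Estimate an average treatment effect as mean(treated) - mean(control)."""
    return statistics.mean(treated) - statistics.mean(control)


def simulate_awards(n_categories=2000, nominees_per_category=5,
                    true_effect=0.3, seed=0):
    """Simulate hypothetical ratings where one nominee per category wins at random."""
    rng = random.Random(seed)
    winners, losers = [], []
    for _ in range(n_categories):
        # Random assignment: the winner is chosen independently of film quality.
        winner = rng.randrange(nominees_per_category)
        for nominee in range(nominees_per_category):
            rating = rng.gauss(3.5, 0.5)  # baseline rating, unrelated to winning
            if nominee == winner:
                winners.append(rating + true_effect)
            else:
                losers.append(rating)
    return winners, losers


winners, losers = simulate_awards()
estimate = difference_in_means(winners, losers)
print(f"estimated effect of winning: {estimate:.2f}")
```

<p>The estimate lands close to the simulated effect only because winning is independent of baseline quality by construction; if better films were more likely to win, the same comparison would be biased.</p>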

]]></description>
<dc:subject></dc:subject>
<dc:creator>aeggers</dc:creator>
<dc:date>2010-01-05T10:18:40-05:00</dc:date>
</item>

<item rdf:about="http://www.iq.harvard.edu/blog/sss/archives/2009/12/spirling_cited.shtml">
<title>Spirling cited in Nature, underlying pattern of human conflict found</title>
<link>http://www.iq.harvard.edu/blog/sss/archives/2009/12/spirling_cited.shtml</link>
<description><![CDATA[<p>A <a href="http://www.nature.com/nature/journal/v462/n7275/abs/nature08631.html">recent paper in Nature</a> documents power-law patterns (i.e. scale invariance) in the distribution of events within insurgencies: The number of casualties per insurgent event, and the number of insurgent events per day, apparently follow striking regularities across an array of insurgencies. Power laws everywhere!</p>

<p>What makes the paper especially notable around IQSS is that our own Arthur Spirling is cited in the first sentence:</p>
<blockquote>The political scientist Spirling and others have correctly warned that finding common statistical distributions (for example, power laws) in sociological data is not the same as understanding their origin.
</blockquote>
<p>The citation is to Arthur's unpublished paper <a href="http://www.people.fas.harvard.edu/~spirling/documents/powerlawSend.pdf">The Next Big Thing: Scale Invariance in Political Science</a>, which provides a breezy overview of scale invariance as a concept and documents a few previously unremarked examples from political science.</p> 

<p>Part of the point of Arthur's paper is that political science (and social science more broadly) has mostly ignored research in natural sciences that, like the Nature article, examines emergent patterns in social phenomena. As he points out, it's not how we "do business." The hard scientists chasing power laws attempt to explain an underlying random process starting from the distribution of outcomes; we're more accustomed to starting from the joint density of outcomes and covariates.</p> 
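
<p>For readers who have not worked with power laws, here is a minimal sketch of what "starting from the distribution of outcomes" can look like: fitting the exponent of a continuous power law by maximum likelihood. This is not the method of the Nature paper or of Arthur's paper; the data are simulated, and the names <code>sample_power_law</code> and <code>estimate_alpha</code>, the exponent 2.5, and the cutoff <code>x_min = 1</code> are assumptions made for illustration.</p>

```python
import math
import random


def sample_power_law(n, alpha, x_min, rng):
    """Draw n values from a continuous power law p(x) ~ x^(-alpha), x >= x_min,
    by inverting the CDF F(x) = 1 - (x / x_min)^(-(alpha - 1))."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


def estimate_alpha(values, x_min):
    """Maximum-likelihood estimate of alpha for a continuous power law:
    alpha_hat = 1 + n / sum(log(x_i / x_min)) over observations x_i >= x_min."""
    tail = [x for x in values if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


rng = random.Random(42)
event_sizes = sample_power_law(50_000, alpha=2.5, x_min=1.0, rng=rng)
alpha_hat = estimate_alpha(event_sizes, x_min=1.0)
print(f"estimated exponent: {alpha_hat:.3f}")
```

<p>Note that nothing here involves covariates: the whole exercise summarizes the shape of one outcome distribution, which is exactly the contrast with how we usually "do business."</p>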

<p>In a way, the fact that Arthur's paper was cited at all highlights the lack of interest in this style of work in social science. The authors of the Nature piece wanted to cite social science work on power laws, and they ended up with Arthur's piece, which is, for all its merits, several years old and unpublished.</p> 

<p>I admit I've been a bit of a power-law curmudgeon, like other social scientists, but lately I've come to better understand the value of this approach. I don't expect that I'll be focusing on this kind of work myself, but, like Arthur, I believe it is a growth industry.</p>]]></description>
<dc:subject></dc:subject>
<dc:creator>aeggers</dc:creator>
<dc:date>2009-12-28T09:20:01-05:00</dc:date>
</item>

<item rdf:about="http://www.iq.harvard.edu/blog/sss/archives/2009/12/in_the_spirit_o.shtml">
<title>a cautionary christmas tale</title>
<link>http://www.iq.harvard.edu/blog/sss/archives/2009/12/in_the_spirit_o.shtml</link>
<description><![CDATA[<p>Merry Christmas, everyone!</p>

<p>I was amused to read about the hoopla involving this online "<a href="http://www.bmj.com/cgi/content/extract/339/dec16_1/b5261">study</a>" in the <em>BMJ</em> entitled "Santa Claus: A Public Health Pariah." The tongue-in-cheek article, written by Australian epidemiologist Nathan J. Grills, contends that the corpulent Mr. Claus and his "rotund sedentary image" set a bad example for kids and adults alike.</p>]]></description>
<dc:subject></dc:subject>
<dc:creator>msen</dc:creator>
<dc:date>2009-12-25T10:54:37-05:00</dc:date>
</item>


</rdf:RDF>