20 March 2008

Correlation of Ratios or Difference Scores Having Common Terms

Yesterday I went to Professor Stanley Lieberson’s class, Issues in the Interpretation of Empirical Evidence. We discussed a paper, written by Stan and Glenn Fuguitt, titled Correlation of Ratios or Difference Scores Having Common Terms. The basic argument of the paper is that although ratios and difference scores are often used as dependent variables in traditional regression analysis, if some independent variables share a common term with those dependent variables, the estimated coefficients can be severely biased by the spurious correlation that the common term induces (whether it sits in the denominator or the numerator). For example, if the dependent variable has the form X/Z and the independent variable is something like Y/Z, Z, or Z/X, the estimated relationship between them can be statistically significant purely as an artifact of the shared term, even when X, Y, and Z are unrelated to one another.

For some concrete examples: criminologists often use the crime rate (adjusted by city population size) as the dependent variable while also using city population size as an independent variable; organizational researchers are interested in the relationship between the relative size of an organization's administration and the absolute size of the organization; and economists often regress GDP per capita on variables such as the population growth rate or even population size. According to Stan and Fuguitt’s research, all of the above examples can yield spurious coefficients, since the dependent variable and the independent variable include common terms. In their paper, they trace this finding back to a paper written by Karl Pearson in 1897, in which Pearson showed rigorously where the spurious correlation comes from and gave an approximate formula for computing correlations of ratios.
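
For reference, the approximation usually attributed to Pearson (1897) for two ratios with a common denominator can be written as below. The notation is an assumption made here for illustration and is not taken from the paper: v_i is the coefficient of variation of x_i and r_ij is the correlation between x_i and x_j. It is a first-order approximation, so it should only be trusted when the coefficients of variation are small.

```latex
r_{(x_1/x_3)(x_2/x_3)} \approx
\frac{r_{12} v_1 v_2 - r_{13} v_1 v_3 - r_{23} v_2 v_3 + v_3^2}
     {\sqrt{\left(v_1^2 + v_3^2 - 2 r_{13} v_1 v_3\right)
            \left(v_2^2 + v_3^2 - 2 r_{23} v_2 v_3\right)}}

% Special case: x_1, x_2, x_3 mutually uncorrelated (r_{12} = r_{13} = r_{23} = 0)
r_{(x_1/x_3)(x_2/x_3)} \approx
\frac{v_3^2}{\sqrt{\left(v_1^2 + v_3^2\right)\left(v_2^2 + v_3^2\right)}}
```

If the three coefficients of variation are also equal, the special case reduces to 1/2: two ratios built from completely unrelated variables are expected to correlate at about 0.5 simply because they share a denominator.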

We were asked to do an experiment to demonstrate this spurious correlation. We generated three sets of random integers (X, Y, and Z) ranging from 1 to 99, computed the pairwise correlation matrix among them, and found no significant correlation between any pair of variables. But we did find a significant correlation between Y/X and X, and when we regressed Y/X on X, the coefficient was significant too. So through manipulations like division or subtraction, we artificially built a significant correlation between two sets of random integers that were originally uncorrelated.
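
A short derivation suggests why Y/X and X end up negatively correlated. It is only a sketch, and it assumes that X and Y are independent, that X is strictly positive and not constant, and that E[Y] > 0 (all of which hold for the random integers in the experiment).

```latex
\begin{aligned}
\operatorname{Cov}\!\left(\tfrac{Y}{X},\, X\right)
  &= E\!\left[\tfrac{Y}{X}\cdot X\right] - E\!\left[\tfrac{Y}{X}\right] E[X] \\
  &= E[Y] - E[Y]\, E\!\left[\tfrac{1}{X}\right] E[X] \\
  &= E[Y]\left(1 - E[X]\, E\!\left[\tfrac{1}{X}\right]\right) < 0
\end{aligned}
```

The last step uses Jensen's inequality: for a positive, non-constant X, E[1/X] > 1/E[X], so E[X] E[1/X] > 1. For integers drawn uniformly from 1 to 99, E[X] = 50 and E[1/X] is about 0.052, so the product is roughly 2.6 and the covariance is clearly negative.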

Why not try the following in Stata to see if the above claims are overstated or not?

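* Start from an empty dataset; the seed is arbitrary and only makes the run reproducible
clear
set seed 12345
set obs 50

* Three independent sets of random integers from 1 to 99
* (uniform() is called runiform() in newer versions of Stata)
gen x=int(99*uniform()+1)
gen y=int(99*uniform()+1)
gen z=int(99*uniform()+1)

* Pairwise correlations among the raw variables (expected to be near zero)
pwcorr x y z, sig

* A ratio and its own denominator: Y/X and X
gen ydx = y/x
pwcorr x ydx, sig
reg ydx x

* Two ratios with a common denominator: X/Z and Y/Z
gen xdz = x/z
gen ydz = y/z
pwcorr xdz ydz, sig
reg xdz ydz

* Common term in the denominator of one ratio and the numerator of the other: X/Z and Z/Y
gen zdy = z/y
pwcorr xdz zdy, sig
reg xdz zdy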
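
The Stata run above is a single draw of 50 observations, so any one result could be luck. As a rough check, the same experiment can be repeated many times and the correlations averaged. The sketch below does this in Python rather than Stata; the function names (`pearson`, `one_replication`, `mean_correlations`), the seed, and the number of replications are all choices made for this illustration and are not from the paper or the class exercise.

```python
import random
from math import sqrt


def pearson(a, b):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / sqrt(s_aa * s_bb)


def one_replication(rng, n_obs=50):
    """Draw independent x, y, z in 1..99 and correlate raw and ratio variables."""
    x = [rng.randint(1, 99) for _ in range(n_obs)]
    y = [rng.randint(1, 99) for _ in range(n_obs)]
    z = [rng.randint(1, 99) for _ in range(n_obs)]
    ydx = [yi / xi for yi, xi in zip(y, x)]
    xdz = [xi / zi for xi, zi in zip(x, z)]
    ydz = [yi / zi for yi, zi in zip(y, z)]
    zdy = [zi / yi for zi, yi in zip(z, y)]
    return {
        "x,y": pearson(x, y),          # raw variables, no common term
        "y/x,x": pearson(ydx, x),      # ratio vs. its own denominator
        "x/z,y/z": pearson(xdz, ydz),  # common denominator
        "x/z,z/y": pearson(xdz, zdy),  # z in a denominator and a numerator
    }


def mean_correlations(n_reps=2000, n_obs=50, seed=2008):
    """Average each correlation over n_reps independent replications."""
    rng = random.Random(seed)
    totals = {}
    for _ in range(n_reps):
        for name, r in one_replication(rng, n_obs).items():
            totals[name] = totals.get(name, 0.0) + r
    return {name: total / n_reps for name, total in totals.items()}


if __name__ == "__main__":
    for name, r in mean_correlations().items():
        print(f"{name:8s} {r:+.3f}")
```

If the argument holds, the average for the raw pair should sit near zero while the averages for the ratio pairs should be clearly away from zero, with the sign depending on where the common term appears.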

Are you convinced by now? If not, please go read the source paper below (or just write back and say what is wrong with Stan and Fuguitt’s argument). If yes, the question becomes what we should do about the spurious correlation. Should we just use the variables in their original forms? Should we re-specify the Solow model? But what if our research interest really is in a ratio or a difference? …


Source:
Stanley Lieberson and Glenn Fuguitt, 1974. Correlation of Ratios or Difference Scores Having Common Terms, in Sociological Methodology (1973-1974), edited by Herbert Costner, San Francisco: Jossey-Bass Publishers.

Posted by Weihua An at March 20, 2008 11:17 AM

Comments

Great post. This is a serious issue that comes up often in empirical research. For a recent example check out the following paper. They show that an important published finding about a potentially negative impact of immigration on native-born employment seems to be driven by this econometric artifact.

Jens

Posted by: Jens at March 20, 2008 12:56 PM

Yes, very good point. One issue, tho. It's not the case that this point means that you should never use ratios with the same denominator on both sides of an equation. For example, suppose the raw data are set up as Y*Z for the dep var and X*Z as the explanatory var. In that case, you might well want to take the ratio (i.e., divide both by Z) in order to *eliminate* a spurious relationship. The key issue to decide on whether dividing by Z will help or hurt is a substantive one. You need to decide whether dividing by Z creates or destroys the variable you really care about. This cannot be determined from the math alone.

Posted by: Gary King at March 23, 2008 1:01 PM
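
The scenario in the comment above can be sketched with a small simulation. This is an illustration under stated assumptions, not anything supplied by the commenter: X, Y, and Z are taken to be independent random integers from 1 to 99, the raw data are the products X*Z and Y*Z, and the names `pearson` and `totals_vs_ratios` are hypothetical.

```python
import random
from math import sqrt


def pearson(a, b):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / sqrt(s_aa * s_bb)


def totals_vs_ratios(n_reps=1000, n_obs=50, seed=2008):
    """Average correlation of (X*Z, Y*Z) and of the same data divided by Z."""
    rng = random.Random(seed)
    sum_raw = 0.0
    sum_ratio = 0.0
    for _ in range(n_reps):
        x = [rng.randint(1, 99) for _ in range(n_obs)]
        y = [rng.randint(1, 99) for _ in range(n_obs)]
        z = [rng.randint(1, 99) for _ in range(n_obs)]
        # Raw data as observed: both variables are scaled by the common term z
        xz = [xi * zi for xi, zi in zip(x, z)]
        yz = [yi * zi for yi, zi in zip(y, z)]
        sum_raw += pearson(xz, yz)
        # Dividing both by z recovers the unrelated quantities x and y
        x_back = [a / zi for a, zi in zip(xz, z)]
        y_back = [b / zi for b, zi in zip(yz, z)]
        sum_ratio += pearson(x_back, y_back)
    return sum_raw / n_reps, sum_ratio / n_reps


if __name__ == "__main__":
    raw, ratio = totals_vs_ratios()
    print(f"mean corr(X*Z, Y*Z)        = {raw:+.3f}")
    print(f"mean corr after dividing Z = {ratio:+.3f}")
```

Here the common term inflates the correlation in the raw products, and taking ratios removes it. Whether real data look like this case or like the one in the post is the substantive question the comment raises, and a simulation cannot settle it.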

Yes, Gary. Only if x is the sole variable you really care about in the model can you obtain unbiased estimation of its coefficient through dividing both sides by Z.

To Jens, the paper you recommended is great and provides a vivid real example. 3x.

Posted by: Weihua at April 1, 2008 7:44 PM

Excellent post. Jens, nice paper that you attached too.

Posted by: Ben at April 5, 2008 11:11 AM
