
10 April 2007

What determines which statistical software you use?

I was recently involved in a discussion among fellow grad students about what determines which statistical software package people use to analyze their data. The choice is wide: this recent market survey lists 44 products from 31 vendors, and it does not even include packages like R that many people around Harvard seem to use. Another survey, conducted by Alan Zaslavsky, lists 15 packages while 'just' looking at the available software for the analysis of surveys with complex sample designs. So how do people pick their packages given the plethora of options? Obviously, many factors go into this decision (departmental teaching, ease of use, type of methods used, etc.). One particularly interesting factor in our discussion was academic discipline. Different packages seem to be popular in different disciplines, but how exactly usage patterns vary across fields remains unclear. We wondered whether any systematic data exist on this issue. For example, how many political scientists use R compared to other programs? What about statisticians, economists, sociologists, etc.? Any information would be highly appreciated.

Posted by Jens Hainmueller at April 10, 2007 10:12 PM

Comments

My anecdotal experience has been something like this:

Biostats Types, SAS
Economist Types, SPSS
Sociology/Psychology Types, JMP
Machine Learning Types, Matlab
Biology Types, Excel and specialized analytical tools like Spotfire and GeneSpring.
Business Types, Excel (though often other types depending on how they got to be Business types)
Physics Types, Fortran :-)

Obviously, none of this is hard and fast--R and S-PLUS are present in all of these disciplines to varying degrees and individuals may use any of the tools if it does . Statisticians use a mixture of all of these (though not Excel so much).

Posted by: Byron at April 11, 2007 3:14 AM

In my experience:

Economist: Excel or other spreadsheets :)
Biostats/Bioinformatics: R + Bioconductor
Ecostats: R, SPSS, a little bit of SAS/Base
Econometrics: EViews and recently R

(I am a statistician)

Posted by: Daniele Medri at April 11, 2007 3:46 AM

Economists: Stata is currently the most popular (but also SAS, EViews, Gauss, LIMDEP, whatever). I think some packages like SPSS and TSP are no longer widely used. For techniques that are more cutting edge (or ad hoc), Matlab or R are used (or even C, Java, FORTRAN).
Psychologists: SPSS and JMP are most popular, as far as I can tell.

Posted by: economist type at April 11, 2007 10:39 AM

Interesting question. I'm a former engineer and manager and now an independent consultant who participates on a program evaluation mailing list. I see program evaluators using a range of tools, with perhaps SPSS, SAS, and R being more frequently mentioned.

I used to use XLISP-STAT and still miss its brushing and other dynamic plots. For simple stuff, I use J more than anything, and I'll refresh my memory on R when I need to do something more complex.

Posted by: Bill Harris at April 11, 2007 11:25 AM

If J came up blank for anyone, especially with respect to statistics, see http://www.cs.ualberta.ca/~smillie/Jpage/jtsp.pdf.

Posted by: Bill Harris at April 11, 2007 11:30 AM

Answer: Whatever package is best for serious empirical work, both research and production. That's R. If you are not using R now, you will be someday.

PS. I do not think that the "e-mail address to receive notification of new comments to this entry" works. At least, it never has for me.

Posted by: David Kane at April 11, 2007 2:30 PM

It's an interesting question. As someone who straddles psychology and machine learning, I tend to use SPSS for some applications and Matlab or C/C++ for most. I've used R, but it was in a statistics class offered through the Government department; and though I found it useful, I don't use it much anymore simply because nobody else in my lab or field does, and it's nice to be able to share code occasionally.

Posted by: Amy at April 11, 2007 3:21 PM

I'm an economist and prefer R.

Some economists who do not want complexity use Stata or EViews. Other economists use SAS, Matlab or GAUSS.

Posted by: Tom S. at April 11, 2007 4:14 PM

Interesting comparison here:

Keeling, Kellie B. & Pavur, Robert J. (2007). A comparative study of the reliability of nine statistical software packages. Computational Statistics & Data Analysis, 51, 3811–3831.

Posted by: BrendanH at April 14, 2007 7:12 PM
