May 2008
Sun Mon Tue Wed Thu Fri Sat
        1 2 3
4 5 6 7 8 9 10
11 12 13 14 15 16 17
18 19 20 21 22 23 24
25 26 27 28 29 30 31

Authors' Committee

Chair:

Andy Eggers (Gov)

Members:

Weihua An (Soc)
Kevin Bartz (Stats)
Sebastian Bauhoff (HealthPol)
John Graves (HealthPol)
Justin Grimmer (Gov)
Jens Hainmueller (Gov)
Mike Kellermann (Gov)
Ellie Powell (Gov)
Gary King (Gov)

Weekly Research Workshop Sponsors

Alberto Abadie, Lee Fleming, Adam Glynn, Guido Imbens, Gary King, Kevin Quinn, Jamie Robins, Don Rubin, Chris Winship

Recent Comments

Recent Entries

Categories

Blogroll

Brad DeLong
Cognitive Daily
Complexity & Social Networks
Developing Intelligence
EconLog
The Education Wonks
Empirical Legal Studies
Free Exchange
Freakonomics
Health Care Economist
Junk Charts
Language Log
Law & Econ Prof Blog
Machine Learning (Theory)
Marginal Revolution
Mixing Memory
Mystery Pollster
New Economist
Political Arithmetik
Political Science Methods
Pure Pedantry
Science & Law Blog
Simon Jackman
Social Science++
Statistical modeling, causal inference, and social science

Archives

Notification

Powered by
Movable Type 3.34


« Applied Statistics - Brian Ripley | Main | Data from China: Land of Plenty? (II) »

25 April 2006

Open and Transparent Data

You, Jong-Sung

There was a big scandal in scientific research recently. Dr. Hwang Woo-suk, Seoul National University in Korea, announced last June that he and his team had cloned human embryonic stem cells from 11 patients. It was a remarkable breakthrough in stem cell research and many people expected that he would eventually get a Nobel Prize. Hwang's team, however, was found to have intentionally fabricated key data in two landmark papers on human embryonic stem cells, according to a Seoul National University panel. Now, the prosecution is probing into his team’s alleged fabrication of data and violation of bioethics law.

Remarkably, the prestigious journal Science was not able to detect the data faking before and after publication of the articles. It is understandable considering that peer reviewers typically examine the presented analysis of the data but do not receive nor examine the actual data itself. Even more surprisingly, most of the 26 co-authors of the June 2005 article were unaware of the data fabrication. It was revealed only through an inside whistleblower who was the second author of the earlier article, and through a team of investigative journalists.

This incident makes us aware of the weakness and vulnerability of the review system of academic journals. Indeed, there have been many fraud cases in the history of scientific research, and Dr. Hwang has just added one more such case. Although outright faking may not be very common, errors in data and data analysis might be much more common than most people assume them to be.

I was struck by numerous errors that were found by students of Gov 2001 who replicated the analysis of an article published in a prominent social science journal. Many of the errors are probably benign and not critical to their key findings, but some errors may be critical and even deliberate. It can be tempting to distort the data or results of data analysis when a researcher has spent much time and energy to find evidence to support his or her hypothesis and the results are close but fall short of significance.

In his entry entitled Citing and Finding Data, Gary King discussed the [in]ability to reliably cite, access, and find quantitative data, all of which remain in an entirely primitive state of affairs. Sebastian Bauhoff also stressed the need for making data available in his entry Data Availability. I cannot agree with them more. If journals require authors to submit data as well as manuscript of their paper and publish data that were used for articles as an on-line appendix, it will certainly reduce the errors in data and data analysis as well as spur further research. This should be applied to qualitative data (such as interview transcripts) as well as quantitative data.

Posted by Jong-sung You at April 25, 2006 6:00 AM

Comments

Unfortunately our discipline does not seem to award people for collecting new data. Scholars who spend valuable time and effort into such a project may wish to be the sole user for a few years. We should either change the benefits or be understanding.
On the other hand, NSF makes it mandatory that people make their data available and yet it does not decrease the appeal of their funds.

Posted by: Kerimcan at April 25, 2006 11:50 AM

Notification

Enter e-mail address to receive notification of new comments to this entry

Post a comment




Remember Me?

(you may use HTML tags for style)