21 January 2006

Citing and Finding Data

How much slower would scientific progress be if the near-universal standards for scholarly citation of articles and books had never been developed? Suppose that, shortly after publication, only some printed works could be reliably found by other scholars; or that researchers were permitted to read an article only if they first committed not to criticize it, or were required to coauthor with the original author any work that built on it. How many discoveries would never have been made if the titles of books and articles in libraries changed unpredictably, with no link back to the old title; if printed works existed in different libraries under different titles; if researchers routinely redistributed modified versions of other authors' works without changing the title or author listed; or if publishing new editions of books meant that earlier editions were destroyed? How much less would we know about the natural, physical, and social worlds if the references at the back of most articles and books were replaced with casual mentions, in varying, unpredictable, and incomplete formats, of only a few of the works relied on?

These questions are obviously counterfactual when it comes to printed matter, but, remarkably, they are entirely accurate descriptions of our [in]ability to reliably cite, access, and find quantitative data, practices that all remain in an entirely primitive state.

Micah Altman and I have just written a paper on this subject that may be of interest. The title is "A Proposed Standard for the Scholarly Citation of Quantitative Data" and a copy can be found here. The abstract follows. Comments welcome!

An essential aspect of science is a community of scholars cooperating and competing in the pursuit of common goals. A critical component of this community is the common language of and the universal standards for scholarly citation, credit attribution, and the location and retrieval of articles and books. We propose a similar universal standard for citing quantitative data that retains the advantages of print citations, adds other components made possible by, and needed due to, the digital form and systematic nature of quantitative data sets, and is consistent with most existing subfield-specific approaches. Although the digital library field includes numerous creative ideas, we limit ourselves to only those elements that appear ready for easy practical use by scientists, journal editors, publishers, librarians, and archivists.
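As a rough illustration of "retaining the advantages of print citations" while adding digital components, here is a minimal Python sketch of a data citation record. It assumes, for illustration only, that a citation combines the familiar print elements (authors, year, title) with a persistent identifier and a numeric fingerprint of the data; the class name, field names, and output layout are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical record of the pieces a data citation might carry."""

    authors: str
    year: int
    title: str
    identifier: str   # persistent identifier such as a handle or DOI (assumed field)
    fingerprint: str  # fingerprint of the cited numbers (assumed field)

    def render(self) -> str:
        # Print-style elements first, then the machine-actionable ones,
        # so a human reader sees a familiar citation and software can
        # resolve the identifier and verify the fingerprint.
        return (f'{self.authors}. {self.year}. "{self.title}", '
                f"{self.identifier} {self.fingerprint}")


if __name__ == "__main__":
    # All values below are made-up placeholders, not a real data set.
    cite = DataCitation("Jane Doe", 2006, "Example Survey Data",
                        "hdl:0000/00000", "FP:EXAMPLE")
    print(cite.render())
```

Keeping the record immutable (`frozen=True`) reflects the idea that a citation should point at one fixed version of the data; a revised data set would get a new record rather than an edited one.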

Posted by Gary King at January 21, 2006 5:50 PM

Comments

Yes. Yes, YES!

Yes, your paper “A Proposed Standard for the Scholarly Citation of Quantitative Data” is right on target. Why not fuse it with, or fold it into, what NARA is doing? That would help ensure long-term, stable funding, and government-funded studies would be the easiest to get into proper form, because the standard could be required as a condition of grant funding.

Note that NARA is currently soliciting comments on what it should be doing. Please send NARA your proposed standard paper.


Dwight Hines
St. Augustine, Florida
www.globalear.com

===========================
Extracts from
Strategic Directions For NARA
December 23, 2005


Quote 1: Mission:

The National Archives and Records Administration serves American democracy by safeguarding and preserving the records of our Government, ensuring that the people can discover, use, and learn from this documentary heritage. We ensure continuing access to the essential documentation of the rights of American citizens and the actions of their government. We support democracy, promote civic education, and facilitate historical understanding of our national experience.

Quote 2: Vision

As the nation’s record keeper, it is our vision that every American will understand the vital role records play in a democracy, and their own personal stake in the National Archives. Our holdings and diverse programs will be available to more people than ever before through modern technology and dynamic partnerships. The stories of our nation and our people are told in the records and artifacts cared for in NARA facilities around the country. We want every American to be inspired to explore the records of their country.

Quote 3: STRATEGIC GOALS

One: As the Nation's recordkeeper, we will expand our leadership and services in managing the Government's records to help ensure the continuity and effective operations of Federal programs.
Two: We will preserve and process records for opening to the public as soon as legally possible.
Three: We will solve the challenges of electronic records in the Government.
Four: We will provide prompt, easy, and secure access to our holdings anywhere, anytime.
Five: We will increase civic literacy in America through our museum, public outreach, and education programs.
Six: We will equip NARA to meet the changing needs of our customers.
===================
snip
Get the complete doc in pdf from NARA

Letter, dated January 13, 2006, from David McMillen:
snip
“Over the next few months, we will be translating these goals and objectives into performance targets and measures that will allow us to determine how well we have achieved those targets. As we move through this process, we will again be discussing with our stakeholder and customer communities ways in which we can develop partnerships that will further both our goals and those of our constituents. I hope that as you read the document you will think about specific projects that can be developed that will further our mutual goals and strengthen the profession.

Please feel free to contact me if you have any questions about this document, or our plans for the future.
David McMillen”
David.Mcmillen@nara.gov
External Affairs Liaison at the National Archives


============

Posted by: Dwight Hines at January 22, 2006 7:04 PM

The timing of your paper seems just right. I just received an article about not leaving the data in the dark. I think you need to enlist these folks to capture NARA.
dh
-----
http://www.dlib.org/dlib/january06/linden/01linden.html
D-Lib Magazine January 2006
Volume 12 Number 1 ISSN 1082-9873
Don't Leave the Data in the Dark
Issues in Digitizing Print Statistical Publications

Julie Linden
Government Documents & Information Center
Yale University

Ann Green
Digital Life Cycle Research and Consulting



Introduction
Digitization has the potential to transform scholarly use of data found in print statistical publications. While presenting images of statistical tables in a digital library environment may be desirable, the full potential of such material can be realized only if the resulting digital objects are easy to search and manipulate and are accompanied by sufficient metadata to support extraction of numbers from tables and comparison of numbers across tables.
The Economic Growth Center Digital Library (EGCDL), funded by The Andrew W. Mellon Foundation, addressed these issues in a project that brought together the perspectives of digital libraries and data archives. In EGCDL, PDF reproductions of statistical abstracts co-exist with manipulable Excel files of tables from the abstracts. Rich descriptive metadata can be leveraged to provide discovery of and context for digital objects at varying levels of granularity – from a statistical series to a single number in a table cell. Thus derivative digital objects – the manipulable table or even a cell from it – can be traced back to the original source or a faithful digital reproduction of that source.
Challenges in transforming print statistics to digital objects
SNIP

Posted by: Dwight Hines at January 22, 2006 7:31 PM

I liked the general outline of this paper, and it recognizes the important points of interest. (We've referenced it as useful guidance at the Marine Metadata Interoperability site; feedback welcome.)

I was hoping for more detail on (a) the metadata description page (a schema would be nice!), and (b) the procedural definitions or approaches for computing the Universal Numeric Fingerprint. Until a particular standard for these is proposed and begins to be adopted, actual implementations cannot move forward.

Posted by: John Graybeal at March 18, 2006 5:14 PM

Thanks for your note and the citations, John. For metadata, we use the DDI standard, which was developed for social science data; the schemas are worked out in detail on that site. Our proposed citation would, of course, remain the same with any other metadata standard as well.

The methods of computing Universal Numeric Fingerprints, as well as downloadable R software to compute them, are available at this page on our Virtual Data Center site. The Virtual Data Center software will compute UNFs, and the entire citation we propose, for you automatically, and you're of course welcome to install a VDC yourself. We will also shortly have a dropbox operational for computing UNFs on the fly from our site (and either depositing the data in our archive or not as you see fit).
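To make the fingerprint idea concrete, here is a minimal Python sketch of a UNF-style fingerprint. It is not the published UNF algorithm and will not reproduce real UNFs: the 7-significant-digit rounding, the text rendering, the `NA` missing-value marker, SHA-256 truncated to 128 bits, and the `FP:` prefix are all assumptions chosen for illustration. What it shows is the core property: the hash is computed over a canonical rendering of the numbers rather than over a file, so the same data stored with trivially different precision or missing-value codes yields the same fingerprint, while a substantive change yields a different one.

```python
import base64
import hashlib
import math
from typing import Iterable, Optional


def canonical_number(x: Optional[float], digits: int = 7) -> str:
    """Render one value in a fixed, storage-format-independent text form.

    Assumption: missing values (None or NaN) map to the marker 'NA';
    numbers are rounded to `digits` significant digits and written in
    normalized exponential notation, e.g. 3.14159265 -> '+3.141593e+00'.
    """
    if x is None or (isinstance(x, float) and math.isnan(x)):
        return "NA"
    return format(float(x), f"+.{digits - 1}e")


def numeric_fingerprint(values: Iterable[Optional[float]], digits: int = 7) -> str:
    """Hash the canonical rendering of a numeric vector (UNF-like sketch only)."""
    h = hashlib.sha256()
    for v in values:
        # Terminate each value so that ('1', '23') cannot collide with ('12', '3').
        h.update(canonical_number(v, digits).encode("utf-8") + b"\n")
    digest = h.digest()[:16]  # keep the first 128 bits
    return "FP:" + base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # The same numbers, as two different software packages might store them.
    a = numeric_fingerprint([1, 2.5, None])
    b = numeric_fingerprint([1.0, 2.50000001, float("nan")])
    print(a == b)
```

The design choice that matters is rounding to a fixed number of significant digits before hashing: it makes the fingerprint insensitive to incidental storage differences, at the cost of not detecting changes smaller than that precision. The order of values still matters in this sketch.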

Posted by: Gary King at March 18, 2006 5:40 PM
