
26 April 2006

Inauthentic Paper Detector

Sebastian Bauhoff

A group at the Indiana School of Informatics has developed software to detect whether a document is "human written and authentic or not." The idea was inspired by the successful attempt of MIT students in 2004 to place a computer-generated document at a conference (see here). Their program collated random fragments of computer science speak into a short paper that was accepted at a major conference without revision. (That program is online and you can generate your own paper, though unfortunately it only writes computer science articles.)

The new tool lets users paste in pieces of text and then assesses whether the content is likely to be authentic or just gibberish. The program tries to identify human-style writing, which is characterized by certain repetition patterns, and apparently does rather well. It is not clear whether this works well for social science type articles. The first paragraphs of a recent health economics article (to remain unnamed) only have a 35.5% chance of being authentic. Hmm...
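The tool's actual method is not described here, so the following is only a guess at what a "repetition pattern" feature might look like. It is a minimal Python sketch, assuming that simple word and word-pair repetition rates are a useful signal; the function name `repetition_features` and both statistics are hypothetical and are not taken from the Indiana software.

```python
import re
from collections import Counter


def repetition_features(text):
    """Toy repetition statistics for a piece of text.

    Hypothetical illustration only: this is NOT the Indiana tool's method.
    """
    # Lowercase the text and keep runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    if len(words) < 2:
        return {"repeated_word_share": 0.0, "repeated_bigram_share": 0.0}

    word_counts = Counter(words)
    bigram_counts = Counter(zip(words, words[1:]))

    # Number of tokens whose word (or word pair) occurs more than once.
    repeated_words = sum(c for c in word_counts.values() if c > 1)
    repeated_bigrams = sum(c for c in bigram_counts.values() if c > 1)

    return {
        "repeated_word_share": repeated_words / len(words),
        "repeated_bigram_share": repeated_bigrams / (len(words) - 1),
    }


if __name__ == "__main__":
    print(repetition_features("the cat sat on the mat the cat"))
```

A real detector would presumably compare such numbers against what is typical for human-written text, and turning them into a "chance of being authentic" would need a model trained on labeled examples, which this sketch does not attempt.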

So is this just a joke or useful programming? The authors say it could be used to tell whether a website is authentic or bogus, or to identify different types of texts (articles vs blogs, for example). I wonder what the algorithms behind such technology are, and whether this will lead to an arms race between fakers and detectors. If a detector can recognize a human-written text, could the faking software use it to improve its output?

If further tweaked, could this have an application in the social sciences? Maybe we could use the faking software to search existing papers, collate them smartly and use that to identify patterns and get new ideas? Maybe everyone should run their papers through detector software before submitting them to a journal or presenting at a workshop? And students watch out! No more random collating at 3am to meet the next-day deadline!

PS: this blog entry has been classified as "inauthentic with a 26.3% chance of being an authentic text"...

Posted by Sebastian Bauhoff at April 26, 2006 2:41 PM