5 January 2009
There was an interesting article in today's New York Times on Google Books. Google Books is a massive effort to scan essentially all print media, going back centuries. (See also the parallel effort by the Open Content Alliance.) Setting aside for now the important issues around control of the data, the digitization of texts creates the capacity to access, organize, and analyze much of what humanity has "thought" in recent history. From the perspective of a social scientist, the exciting prospect is to view this corpus as perhaps the most extraordinary data set ever assembled, especially when combined with recent developments in natural language processing. Can we trace the rise and fall of social movements, or of ways of thinking about the world, and link these constructs to space and time? This is part of a broader movement, as I have written before, toward a "computational social science."
The one aspect of control that this does raise is the question of what access researchers will have to the entire Google Books corpus. Indeed, part of the concern that has driven the Open Content Alliance (as I understand it) is public access to the corpus: for example, libraries may need to pay subscription fees to access what could become a Google monopoly. As I see it, there are similar concerns about access for those who wish to do research on these data. Readers of the blog who have insight on this, please post comments.
Posted by David Lazer at January 5, 2009 10:25 PM