the book is reading you 01.19.2006, 1:42 PM
posted by ben vershbow
I just noticed that Google Book Search requires users to be logged in on a Google account to view pages of copyrighted works.
They provide the following explanation:
Why do I have to log in to see certain pages?
Because many of the books in Google Book Search are still under copyright, we limit the amount of a book that a user can see. In order to enforce these limits, we make some pages available only after you log in to an existing Google Account (such as a Gmail account) or create a new one. The aim of Google Book Search is to help you discover books, not read them cover to cover, so you may not be able to see every page you're interested in.
So they're tracking how much we've looked at and capping our number of page views. Presumably a bone tossed to publishers, who I'm sure will continue suing Google all the same (more on this here). There's also the possibility that publishers have requested information on who's looking at their books -- geographical breakdowns and stats on click-throughs to retailers and libraries. I doubt, though, that Google would share this sort of user data. Substantial privacy issues aside, that's valuable information they want to keep for themselves.
That's because "the aim of Google Book Search" is also to discover who you are. It's capturing your clickstreams, analyzing what you've searched and the terms you've used to get there. The book is reading you. Substantial privacy issues aside (it seems more and more that's where we'll be leaving them), Google will use this data to refine its search algorithms and, who knows, might even develop some sort of personalized recommendation system similar to Amazon's -- you know, where the computer lists other titles that might interest you based on what you've read, bought or browsed in the past (a system that works only if you are logged in). It's possible Google is thinking of Book Search as the cornerstone of a larger venture that could compete with Amazon.
There are many ways Google could eventually capitalize on its books database -- that is, beyond the contextual advertising that is currently its main source of revenue. It might turn the scanned texts into readable editions, hammer out licensing agreements with publishers, and become the world's biggest ebook store. It could start a print-on-demand service -- a Xerox machine on steroids (and the return of Google Print?). It could work out deals with publishers to sell access to complete online editions -- a searchable text to go along with the physical book -- as Amazon announced it will do with its Upgrade service. Or it could start selling sections of books -- individual pages, chapters, etc. -- as Amazon has also planned to do with its Pages program.
Amazon has long served as a valuable research tool for books in print, so much so that some university library systems are now emulating it. Recent additions to the Search Inside the Book program such as concordances, interlinked citations, and statistically improbable phrases (where distinctive terms in the book act as machine-generated tags) are especially fun to play with. Although first and foremost a retailer, Amazon feels more and more like a search system every day (and its A9 engine, though seemingly always on the back burner, is also developing some interesting features). On the flip side, Google, though a search system, could start feeling more like a retailer. In either case, you'll have to log in first.
Posted by ben vershbow on January 19, 2006 1:42 PM
tags: Copyright and Copyleft, Libraries, Search and the Web, POD, amazon, books, e-commerce, e-publishing, ebooks, google, google_book_search, google_print, internet, print_on_demand, privacy, publishing, search, web
dan visel on January 19, 2006 2:25 PM:
There's a surprisingly decent article by John Lanchester in this week's London Review of Books; ostensibly a review of two books on Google, it's a well-written summary of current Google issues for those who'd like to get up to speed. It also mentions the Yahoo/Hong Kong case - and mentions Google's concessions to China - both of which seem relevant to discussions of Google & privacy.
dave munger on January 20, 2006 7:23 AM:
Is the autobiography of Charles Darwin actually still under copyright? I knew the Sonny Bono act was excessive, but that seems a little much.
ben vershbow on January 20, 2006 11:20 AM:
I believe the Darwin is protected because it is a contemporary edition. You actually don't have to sign in for older titles/editions scanned from library collections (e.g. this book from Harvard) where the actual book is older than 1922. This is where the idea of copyright becomes very slippery. For contemporary editions of public domain texts, Google defers to the publisher, not the public. If anything, this shows how rooted in the materiality of books Google Book Search actually is.
bowerbird on January 20, 2006 1:01 PM:
obstacles are thrown up even on the public-domain books,
in the form of sign-in requirements, captchas, and so on.
although it's unclear why, i think it _might_ be because
people are scraping google's scan-sets indiscriminately;
even worse, they're doing it in an uncoordinated manner,
and not pooling their scrapings, which means each book
is being scraped far more times than would be "necessary".
there's no need to scrape a scan-set more than once.
having obtained it, it can be uploaded to another site,
freely available to anyone else who might want it later.
(scans of a public-domain book are also public domain,
thanks to the atypically sensible bridgeman v corel ruling.)
i have asked some of the scrapers -- specifically those
from distributed proofreaders, at http://www.pgdp.net
-- to modify their practices so as not to piss off google
(whose robots.txt file bans auto-harvesting the scans),
but they've been non-responsive (when not antagonistic).
so if google keeps throwing up obstacles, or makes them
more strict in the future, i know who _i_ will be blaming.