"the bookish character of books": how google's romanticism falls short 08.15.2007, 4:59 PM
posted by ben vershbow
Check out, if you haven't already, Paul Duguid's witty and incisive exposé of the pitfalls of searching for Tristram Shandy in Google Book Search, an exercise which puts many of the inadequacies of the world's leading digitization program into relief. By Duguid's own admission, Laurence Sterne's legendary experimental novel is an idiosyncratic choice, but its many typographic and structural oddities make it a particularly useful lens through which to examine the challenges of migrating books successfully to the digital domain. This follows a similar examination Duguid carried out last year with the same text in Project Gutenberg, an experience which he said revealed the limitations of peer production in generating high quality digital editions (also see Dan's own take on this in an older if:book post). This study focuses on the problems of inheritance as a mode of quality assurance, in this case the bequeathing of large authoritative collections by elite institutions to the Google digitization enterprise. Does simply digitizing these books, imprimaturs and all, automatically result in an authoritative bibliographic resource?
Duguid suggests not. The process of migrating analog works to the digital environment in a way that respects the originals but fully integrates them into the networked world is trickier than simply scanning and dumping into a database. The Shandy study shows in detail how Google's ambition of organizing the world's books and making them universally accessible and useful (to slightly adapt Google's mission statement) is being carried out in a hasty, slipshod manner, leading to a serious deficit in quality in what could eventually become, for better or worse, the world's library. Duguid is hardly the first to point this out, but the intense focus of his case study is valuable and serves as a useful counterpoint to the technoromantic visions of Google boosters such as Kevin Kelly, who predict a new electronic book culture liberated by search engines in which readers are free to find, remix and recombine texts in various ways. While this networked bibliotopia sounds attractive, it's conceived primarily from the standpoint of technology and not well grounded in the particulars of books. What works as snappy Web 2.0 buzz doesn't necessarily hold up in practice.
As is so often the case, the devil is in the details, and it is precisely the details that Google seems to have overlooked, or rather sprinted past. Sloppy scanning and the blithe discarding of organizational and metadata schemes meticulously devised through centuries of librarianship might indeed make the books "universally accessible" (or close to that), but the "and useful" part of the equation could go unrealized. As we build the future, it's worth pondering what parts of the past we want to hold on to. It's going to have to be a slower and more painstaking process than Google (and, ironically, the partner libraries who have rushed headlong into these deals) might be prepared to undertake. Duguid:
The Google Books Project is no doubt an important, in many ways invaluable, project. It is also, on the brief evidence given here, a highly problematic one. Relying on the power of its search tools, Google has ignored elemental metadata, such as volume numbers. The quality of its scanning (and so we may presume its searching) is at times completely inadequate. The editions offered (by search or by sale) are, at best, regrettable. Curiously, this suggests to me that it may be Google's technicians, and not librarians, who are the great romanticisers of the book. Google Books takes books as a storehouse of wisdom to be opened up with new tools. They fail to see what librarians know: books can be obtuse, obdurate, even obnoxious things. As a group, they don't submit equally to a standard shelf, a standard scanner, or a standard ontology. Nor are their constraints overcome by scraping the text and developing search algorithms. Such strategies can undoubtedly be helpful, but in trying to do away with fairly simple constraints (like volumes), these strategies underestimate how a book's rigidities are often simultaneously resources deeply implicated in the ways in which authors and publishers sought to create the content, meaning, and significance that Google now seeks to liberate. Even with some of the best search and scanning technology in the world behind you, it is unwise to ignore the bookish character of books. More generally, transferring any complex communicative artifacts between generations of technology is always likely to be more problematic than automatic.
Also take a look at Peter Brantley's thoughts on Duguid:
Ultimately, whether or not Google Book Search is a useful tool will hinge in no small part on the ability of its engineers to provoke among themselves a more thorough, and less alchemic, appreciation for the materials they are attempting to transmute from paper to gold.
Gary Frost on August 15, 2007 9:02 PM:
I was at an ALA (American Library Association) meeting lately in which the CEO of Kirtas offered $100 to any librarian who could find 10 Google books (out of two million) without an error. The librarians knew it was not worth trying. Half a century of experience in microfilming teaches you something.
But error-free capture is not even the significant issue. The closer screen imaging comes to print presentation the better it will serve as a bibliographic utility of print. The screen presentation acts as a discovery device for print.
And you know why? Because screen navigation is itself the act of comprehension that assimilation of content is in print. Screen navigation, and its distractive activities of deselection and deletion of search results, compile into an activity unrelated to learning from print.
Surely this cannot be! Surely digital presentation can simulate the attributes of print. Well, it can and does, but on paper.
bowerbird on August 16, 2007 1:40 AM:
i admired much about duguid's earlier article.
this one, however, is embarrassingly slipshod,
and all the more because his topic is _quality_.
there are _many_ problems with google's workflow.
but duguid merely glances off a couple of them,
and -- since he used just the one test-book --
even the points he does make are unconvincing...
ben vershbow on August 16, 2007 2:20 AM:
Interesting. Can you elaborate on some of the areas he misses?
bowerbird on August 16, 2007 10:44 AM:
i can, but that analysis deserves a better venue
than the ghetto of the comment section of a blog,
no offense intended...
K.G. Schneider on August 16, 2007 7:36 PM:
I wrote about the "serials" version of this problem over on Critical Mass:
Moving analog to digital is not as simple as wrinkling one's nose or tapping one's ruby heels together.
Gary Frost on August 16, 2007 8:11 PM:
Duguid differentiates innovative interaction from legacy exchange. Such a dichotomy is self-referential, each polarity neatly defining the other. Any thesis should have at least three contentions. For example, Duguid could point to automated searching as an innovative interaction and traditional paratext features as legacy discovery resources. Both then define each other and advance topical research. But these tools are then subjected to recursive and reflexive application as the investigation continues. It is easy to discover where you are, but not as easy to re-find that discovery in the context of continued research. Print navigation is quite elegant in this regard (recursive and reflexive discovery) and screen navigation is inelegant; print presents one collation physically secure and the screen presents an infinity of arrangements constantly redrawn. So the researcher must conduct a hybrid investigation as time goes on. Will resolution be achieved on the screen or in print? It's not a useful question, because the third factor, recursive and reflexive hybrid interaction, is much more crucial.
bowerbird on August 16, 2007 8:19 PM:
> Moving analog to digital
> is not as simple as
> wrinkling one's nose or
> tapping one's ruby heels together.
ha! a straw man with a dorothy reference! :+)
digitization is not hard, and i wish people would
stop saying that it is. no, it's relatively _simple_,
with the caveat that you have to _pay_attention_
-- closely and continuously -- to basic details...
(like, um, something as obvious as recording the
actual volume number of the work you're digitizing.
is there anyone out there who considers that "hard"?)
for whatever reasons, google isn't doing it right.
(most people chalk it up to a desire to do it cheap,
but that's very naive, because their inattention to
the details will cost them _more_ in the long run.)
and google isn't alone, either. the o.c.a. is _also_
doing it wrong -- albeit not _quite_ as badly, but
still in a manner considerably and deeply flawed --
as have all other efforts, from the making of america
to university efforts to good old project gutenberg.
frankly, it's just ridiculous that _nobody_ has yet
developed a body of best-practices on digitization.
there are people being paid big bucks to do just that,
and i suggest that they should be fired, immediately.
Gary Frost on August 23, 2007 8:56 PM:
"Wonderful exchange. Is Google Books more like a library (Paul Duguid), or more like a search catalog (Patrick Leary)? I find Patrick closer to the mark, in that Google Books is a hybrid between a catalog and a library, or what I think of as a networked universal text. Its value lies in the collective, even if the individual units are imperfect -- and they are very imperfect. The new text is searchable, actionable, and animated in a way that neither libraries, books, nor even catalogs in the past were." Kevin Kelly
bowerbird on October 6, 2007 12:20 AM:
i've made a number of posts on this topic here:
including two today. only one of those last two has actually
appeared as of right now, but the second can be found here: