six blind men and an elephant 07.03.2007, 9:23 AM
posted by ben vershbow
Thomas Mann, author of The Oxford Guide to Library Research, has published an interesting paper (pdf available) examining the shortcomings of search engines and the continued necessity of librarians as guides for scholarly research. It revolves around the case of a graduate student investigating tribute payments and the Peloponnesian War. A Google search turns up nearly 80,000 web pages and 700 books: an overwhelming retrieval with little in the way of conceptual organization and only the crudest of tools for measuring relevance. But with the help of the LC Catalog and an electronic reference encyclopedia database, Mann manages to guide the student toward a manageable batch of about a dozen highly germane titles.
Summing up the problem, he recalls a charming old fable from India:
Most researchers - at any level, whether undergraduate or professional - who are moving into any new subject area experience the problem of the fabled Six Blind Men of India who were asked to describe an elephant: one grasped a leg and said "the elephant is like a tree"; one felt the side and said "the elephant is like a wall"; one grasped the tail and said "the elephant is like a rope"; and so on with the tusk ("like a spear"), the trunk ("a hose") and the ear ("a fan"). Each of them discovered something immediately, but none perceived either the existence or the extent of the other important parts - or how they fit together.
Finding "something quickly," in each case, proved to be seriously misleading to their overall comprehension of the subject.
In a very similar way, Google searching leaves remote scholars, outside the research library, in just the situation of the Blind Men of India: it hides the existence and the extent of relevant sources on most topics (by overlooking many relevant sources to begin with, and also by burying the good sources that it does find within massive and incomprehensible retrievals). It also does nothing to show the interconnections of the important parts (assuming that the important can be distinguished, to begin with, from the unimportant).
Mann believes that books will usually yield the highest-quality returns in scholarly research. A search through a well-tended library catalog (controlled vocabularies, strong conceptual categorization) will necessarily produce a smaller, and therefore less overwhelming, quantity of returns than a search engine (books do not proliferate at the same rate as web pages). And those returns, pound for pound, are more likely to be relevant to the topic:
Each of these books is substantially about the tribute payments - i.e., these are not just works that happen to have the keywords "tribute" and "Peloponnesian" somewhere near each other, as in the Google retrieval. They are essentially whole books on the desired topic, because cataloging works on the assumption of "scope-match" coverage - that is, the assigned LC headings strive to indicate the contents of the book as a whole....In focusing on these books immediately, there is no need to wade through hundreds of irrelevant sources that simply mention the desired keywords in passing, or in undesired contexts. The works retrieved under the LC subject heading are thus structural parts of "the elephant" - not insignificant toenails or individual hairs.
If nothing else, this is a good illustration of how libraries, if used properly, can still be much more powerful than search engines. But it's also interesting as a librarian's perspective on what makes the book uniquely suited for advanced research. That is: a book is substantial enough to be a "structural part" of a body of knowledge. It's the idea of "whole books" as rungs on a ladder toward knowing something. Books are a kind of conceptual architecture that, until recently, has been distinctly absent from the Web (though from the beginning certain people and services have endeavored to organize the Web meaningfully). Mann's study captures the anxiety felt at the prospect of the book's decline (the great coming blindness), and also the librarian's understandable dread at having to totally reorganize his/her way of organizing things.
It's possible, however, to agree with the diagnosis and not the prescription. True, librarians have gotten very good at organizing books over time, but that's not necessarily how scholarship will be produced in the future. David Weinberger ponders this:
As an argument for maintaining human expertise in manually assembling information into meaningful relationships, this paper is convincing. But it rests on supposing that books will continue to be the locus of worthwhile scholarly information. Suppose more and more scholars move onto the Web and do their thinking in public, in conversation with other scholars? Suppose the Web enables scholarship to outstrip the librarians? Manual assemblages of knowledge would retain their value, but they would no longer provide the authoritative guide. Then we will have either of two results: We will have to rely on "'lowest common denominator' and 'one search box/one size fits all' searching that positively undermines the requirements of scholarly research"...or we will have to innovate to address the distinct needs of scholars....My money is on the latter.
As, I think, is mine, although I would not rule out the possibility of scholars actually participating in the manual assemblage of knowledge. Communities like MediaCommons could to some extent become their own libraries, vetting and tagging a wide array of electronic resources and developing their own customized search frameworks.
There's much more in this paper than I've discussed, including a lengthy treatment of folksonomies (Mann sees them as a valuable supplement to, but not a substitute for, controlled taxonomies). Generally speaking, his articulation of the big challenges facing scholarly search and librarianship in the digital age is well worth the read, although I would argue with some of the conclusions.
bowerbird on July 3, 2007 2:21 PM:
let me know when mann completes his "scope-match"
catalog on the 10 billion websites out there...
Aaron on July 3, 2007 4:45 PM:
If you had read Mann's work, you would know that he does not believe that all websites should be cataloged. And he's right; most of the world's webpages are ephemeral garbage.
bowerbird on July 5, 2007 6:03 PM:
i guess i have to spell it out for people like aaron.
a cost-benefit ratio is obtained via a consideration
of _both_ costs and benefits.
while the _benefits_ of a "scope-match" catalog
are -- without any question -- immensely large,
so too are the _costs_.
anyone who wants us to ignore those costs is being
unreasonable in the extreme.
in the meantime, as the usefulness of google shows,
the extremely-low cost index that we can get by
scouring _the_text_as_text_ -- without any intent
of ascertaining its "meaning" -- is a bargain
Gary Frost on July 5, 2007 10:24 PM:
I entered "bowerbird". It did not save this reader's time.
Do you think the iPhone will be reviewed here as a manifestation of booknet? I am struck by the half square proportion; the convention of the papyrus book of late antiquity. They are both turned as well.
ben vershbow on July 6, 2007 1:51 AM:
Gary, I'm pretty certain the iPhone will be discussed here soon. None of us have one yet and we're debating about when and whether to surrender to our technolust... I mean... when to start researching. I did get to play around with one for a few minutes the other night and the interface is indeed wonderful. And it's a more credible reading device than anything currently available - that much was clear within moments.