Listing entries tagged with libraries
1 | 2 | 3 | 4 | 5 | 6
google, digitization and archives: despatches from if:book 06.16.2008, 3:35 PM
In discussing with other Institute folks how to go about reviewing four year's worth of blog posts, I've felt torn at times. Should I cherry-pick 'thinky' posts that discuss a particular topic in depth, or draw out narratives from strings of posts each of which is not, in itself, a literary gem but which cumulatively form the bedrock of the blog? But I thought about it, and realised that you can't really have one without the other.
Fair use, digitization, public domain, archiving, the role of libraries and cultural heritage are intricately interconnected. But the name that connects all these issues over the last few years has been Google. The Institute has covered Google's incursions into digitization of libraries (amongst other things) in a way that has explored many of these issues - and raised questions that are as urgent as ever. Is it okay to privatize vast swathes of our common cultural heritage? What are the privacy issues around technology that tracks online reading? Where now for copyright, fair use and scholarly research?
In-depth coverage of Google and digitization has helped to draw out many of the issues central to this blog. Thus, in drawing forth the narrative of if:book's Google coverage is, by extension, to watch a political and cultural stance emerging. So in this post I've tried to have my cake and eat it - to trace a story, and to give a sense of the depth of thought going into that story's discussion.
In order to keep things manageable, I've kept this post to a largely Google-centric focus. Further reviews covering copyright-related posts, and general discussion of libraries and technology will follow.
2004-5: Google rampages through libraries, annoys Europe, gains rivals
In December 2004, if:book's first post about Google's digitization of libraries gave the numbers for the University of Michigan project.
In February 2005, the head of France's national libraries raised a battle cry against the Anglo-centricity implicit in Google's plans to digitize libraries. The company's seemingly relentless advance brought Europe out in force to find ways of forming non-Google coalitions for digitization.
In August, Google halted book scans for a few months to appease publishers angry at encroachments on their copyright. But this was clearly not enough, as in October 2005, Google was sued (again) by a string of publishers for massive copyright infringement. However, undeterred either by European hostility or legal challenges, the same month the company made moves to expand Google Print into Europe. Also in October 2005, Yahoo! launched the Open Content Alliance, which was joined by Microsoft around the same time. Later the same month, a Wired article put the case for authors in favor of Google's searchable online archive.
In November 2005 Google announced that from here on in Google Print would be known as Google Book Search, as the 'Print' reference perhaps struck too close to home for publishers. The same month, Ben savaged Google Print's 'public domain' efforts - then recanted (a little) later that month.
In December 2005 Google's digitization was still hot news - the Institute did a radio show/podcast with Open Source on the topic, and covered the Google Book Search debate at the American Bar Association. (In fact, most of that month's posts are dedicated to Google and digitization and are too numerous to do justice to here).
2006: Digitization spreads
By 2006, digitization and digital archives - with attendant debates - are spreading. From January through March, three posts - 'The book is reading you' parts 1, 2 and 3 looked at privacy, networked books, fair use, downloading and copyright around Google Book Search. Also in March, a further post discussed Google and Amazon's incursions into publishing.
In April, the Smithsonian cut a deal with Showtime making the media company a preferential media partner for documentaries using Smithsonian resources. Jesse analyzed the implications for open research.
In June, the Library of Congress and partners launched a project to make vintage newspapers available online. Google Book Search, meanwhile, was tweaked to reassure publishers that the new dedicated search page was not, in fact, a library. The same month, Ben responded thoughtfully in June 2006 to a French book attacking Google, and by extension America, for cultural imperialism. The debate continued with a follow-up post in July.
In August, Google announceddownloadable PDF versions of many of its public-domain books. Then, in August, the publication of Google's contract with UCAL's library prompted some debate the same month. In October we reported on Microsoft's growing book digitization list, and some criticism of the same from Brewster Kahle. The same month, we reported that the Dutch government is pouring millions into a vast public digitization program.
In December, Microsoft launched its (clunkier) version of Google Books, Microsoft Live Book Search.
2007: Google is the environment
In January, former Netscape player Rich Skrenta crowned Google king of the 'third age of computing': 'Google is the environment', he declared. Meanwhile, having seemingly forgotten 2005's tussles, the company hosted a publishing conference at the New York Public Library. In February the company signed another digitization deal, this time with Princeton; in August, this institution was joined by Cornell, and the Economist compared Google's databases to the banking system of the information age. The following month, Siva's first Monday podcast discussed the Googlization of libraries.
By now, while Google remains a theme, commercial digitization of public-domain archives is a far broader issue. In January, the US National Archives cut a digitization deal with Footnote, effectively paywalling digital access to a slew of public-domain documents; in August, a deal followd with Amazon for commercial distribution of its film archive. The same month, two major audiovisual archiving projects launched.
In May, Ben speculated about whether some 'People's Card Catalog' could be devised to rival Google's gated archive. The Open Archive launched in July, to mixed reviews - the same month that the ongoing back-and-forth between the Institute and academic Siva Vaidyanathan bore fruit. Siva's networked writing project, The Googlization Of Everything, was announced (this would be launched in September). Then, in August, we covered an excellent piece by Paul Duguid discussing the shortcomings of Google's digitization efforts.
In October, several major American libraries refused digitization deals with Google. By November, Google and digitization had found its way into the New Yorker; the same month the Library of Congress put out a call for e-literature links to be archived.
2008: All quiet?
But if Google coverage has been slighter this year, that's not to suggest a happy ending to the story. Microsoft abandoned its book scanning project in mid-May of this year, raising questions about the viability of the Open Content Alliance. It would seem as though Skrenta was right. The Googlization of Everything continues, less challenged than ever.
looking at libraries 05.06.2008, 4:18 PM
A few weeks back though the auspices of TED, I paid a visit to a private library. The owner doesn't want publicity, and I won't reveal details, but it was a staggeringly beautiful (if idiosyncratic) collection, and I can't imagine that there are many collections in private hands that rival it in value in the United States. Just about every lavish book imaginable was present: an elephant folio of Audobon along with a full set of John Gould's more sumptuous prints of birds; a Kelmscott Chaucer; a page from a Gutenberg Bible; a first edition of Johnson's Dictionary; countless antique atlases of anatomy and cosmography; the Arion Press edition of Ulysses illustrated by Robert Motherwell; hand-illuminated Books of Hours. There were exquisite jeweled bindings, books woven entirely from silk, and doubtless many more things that couldn't be seen in a three-hour tour. The collector mentioned in passing that he was thinking of buying a Wyclif Bible for around $600,000 because he didn't have one yet.
Being no stranger to libraries, I'd seen many of these books before. Generally they're the sort of books you see in the context of a museum or library, occasionally for sale in a gallery. They're the sort of books that are generally found safely behind glass, books that one wears white gloves to touch. This was not such a collection: it's not open to the public at all, only to the collector's friends. A librarian would also be astonished that this collection of 30,000 books has no catalogue – the owner shelves all the books himself (by height, for which there's historical precedent) and claims that he remembers where he put things. But what was most striking to me about my visit was how freely the books were handled by the owner, and how freely he allowed his guests to handle his books – not in a cavalier way, but in the way one touches a book one owns. The librarian in me suppressed a gasp when the owner explained how in the summer he opens the bay windows of the library and lets the breeze in. I'm sure that's not how the Morgan Library works.
The collector can afford to let his visitors touch his books. In a way, the books in his collection are functioning as they are intended to function: as objects to be read and appreciated. They're also functioning as signifiers of luxury. His collection is a repository of wealth in a way less metaphorical than we usually talk about library as repositories. No library, private or public, exists entirely outside of this economic system; it's an integral part of the way we consider books.
Walking north on Laguardia Place last week, I was struck by how monolithic NYU's Bobst Library appears from the south: it's a hulking red-brick edifice that admits no entrance:
From inside it's all windows and light, open stacks to be browsed. But: there's the matter of getting inside, as admission is reserved to those with an NYU ID card. Those without cards are excluded. This is a necessary condition for the library to function: long ago on this blog I bemoaned the condition of the Brooklyn Library, where it's almost impossible to find any book you're looking for, though there's still the pleasure of browsing. The quality of a collection seems to be inversely related to the number of people kept out. Keeping the books in and the world out is demonstrated elegantly by the thin marble windows of Yale's Beinecke library which admit a small amount of light but not the viewer's gaze:
What's inside and outside – who's inside and outside – are completely separated. The poet Susan Howe inspects this separation in her book The Midnight, a volume which takes as one of its primary subjects interleaves, the sheets of tissue paper that publishers once put next to plates in books "in order to prevent illustration and text from rubbing together." Howe's work tends to be archivally based: she looks at how manuscripts are read or misread, and consequently has spent a lot of time in libraries. In this prose passage from the book, part of a section entitled "Scare Quotes II", she looks at the way one enters Houghton, Harvard's analogue to the Beinecke:
1991. Entering Houghton Library: Harvard Yard, 9:00 a.m., a fine June summer morning. At the entrance to the red-brick building designed by Robert C. Dean of Perry, Shaw and Hepburn in 1940, two single wooden doors with hinges, concealing two modernist plate glass doors without frames, have been swung into recesses to the left and right so as to be barely visible during open hours. The only metal fitting in each glass consists of a polished horizontal bar at waist height a visitor must pull to open. I enter an oval vestibule, about 10 feet wide and 5–6 feet deep, before me double doors again; again plate glass.
Passing through this first vestibule I find myself in an oval reception antechamber about 35 feet wide and 20 feet deep under what appears to be a ceiling with a dome at its apex. I think I see sunlight but closer inspection reveals electric light concealed under a slightly dropped form, also oval, illuminating the ceiling above. This first false skylight resembles a human eye and the central oval disc its 'pupil.' Maybe ghosts exist as spatiotemporal coordinates, even if they themselves do not occupy space, even if you've never seen one, so what? If the design of the antechamber can be read in terms of power and regimes of library control, and if ghosts 'presently' 'occupy' papers, you need to understand the present tense of 'occupy.'
To enter this neo-Georgian building (a few Modernist touches added) with its state of the art technology for air filtration, security and controlled temperature and humidity for the preservation of materials, is to turn away from contemporary city life with all its follies and parasites in search of a second coming for dry bones. When the soul of a scholar has an inward bent and bias for an author in the Kingdom of Houghton, it is never at rest, until here. Perversely, nothing in Houghton awakens security sooner than curiosity.
Here – every researcher can be a perpetrator.
( pp. 120–121) While Houghton isn't as architecturally ostentatious as the Beinecke, Howe's scrutiny of the architecture of its entrance reveals it to be just as concerned with control. There's a pessimistic view of human behavior embedded in library construction and the watchfulness of the sentries who guard them: if we, the public, could get at the books, we would most certainly destroy them.
There was the expectation that the barriers would be torn down with the coming of electronic libraries, that once the book's spirit left its object, it would likewise escape its economic shackles. Certainly it makes sense: an electronic text isn't degraded by copying in the same way that every reading is an infinitesimal destruction of a physical book. It's unclear, however, that the media universe that's unfolding is following this pattern: while sites like archive.org present a new model, projects like Google Books simply reconfigure the gates.
au courant 11.06.2007, 8:33 AM
Paul Courant is the University Librarian at the University of Michigan as well as a professor of economics. And he now has a blog. He leads off with a response to critics (including Brewster Kahle and Siva Vaidhyanathan) of Michigan's book digitization partnership with Google. Siva responds back on Googlization of Everything. Great to see a university librarian entering the public debate in this way.
"digitization and its discontents" 11.06.2007, 8:14 AM
Anthony Grafton's New Yorker piece "Future Reading" paints a forbidding picture of the global digital library currently in formation on public and private fronts around the world (Google et al.). The following quote sums it up well - ?a refreshing counterpoint to the millenarian hype we so often hear w/r/t mass digitization:
The supposed universal library, then, will be not a seamless mass of books, easily linked and studied together, but a patchwork of interfaces and databases, some open to anyone with a computer and WiFi, others closed to those without access or money. The real challenge now is how to chart the tectonic plates of information that are crashing into one another and then to learn to navigate the new landscapes they are creating. Over time, as more of this material emerges from copyright protection, we'll be able to learn things about our culture that we could never have known previously. Soon, the present will become overwhelmingly accessible, but a great deal of older material may never coalesce into a single database. Neither Google nor anyone else will fuse the proprietary databases of early books and the local systems created by individual archives into one accessible store of information. Though the distant past will be more available, in a technical sense, than ever before, once it is captured and preserved as a vast, disjointed mosaic it may recede ever more rapidly from our collective attention.
Grafton begins and ends in a nostalgic tone, with a paean to the New York Public Library and the critic Alfred Kazin: the poor son of immigrants, City College-educated, who researched his seminal study of American literature On Native Grounds almost entirely with materials freely available at the NYPL. Clearly, Grafton is a believer in the civic ideal of the public library - ?a reservoir of knowledge, free to all - ?and this animates his critique of the balkanized digital landscape of search engines and commercial databases. Given where he appears to stand, I wish he could have taken a stab at what a digital public library might look like, and what sorts of technical, social, political and economic reorganization might be required to build it. Obviously, these are questions that would have required their own article, but it would have been valuable for Grafton, whose piece is one of those occasional journalistic events that moves the issue of digitization and the future of libraries out of the specialist realm into the general consciousness, to have connected the threads. Instead Grafton ends what is overall a valuable and intelligent article with a retreat into print fetishism - ?"crowded public rooms where the sunlight gleams on varnished tables....millions of dusty, crumbling, smelly, irreplaceable documents and books" - ?which, while evocative, obscures more than it illuminates.
Incidentally, those questions are precisely what was discussed at our Really Modern Library meetings last month. We're still compiling our notes but expect a report soon.
cornell joins google book search 08.08.2007, 1:48 PM
...offering up to 500,000 items for digitization. From the Cornell library site:
Cornell is the 27th institution to join the Google Book Search Library Project, which digitizes books from major libraries and makes it possible for Internet users to search their collections online. Over the next six years, Cornell will provide Google with public domain and copyrighted holdings from its collections. If a work has no copyright restrictions, the full text will be available for online viewing. For books protected by copyright, users will just get the basic background (such as the book's title and the author's name), at most a few lines of text related to their search and information about where they can buy or borrow a book. Cornell University Library will work with Google to choose materials that complement the contributions of the project's other partners. In addition to making the materials available through its online search service, Google will also provide Cornell with a digital copy of all the materials scanned, which will eventually be incorporated into the university's own digital library.
the open library 07.18.2007, 1:40 PM
A little while back I was musing on the possibility of a People's Card Catalog, a public access clearinghouse of information on all the world's books to rival Google's gated preserve. Well thanks to the Internet Archive and its offshoot the Open Content Alliance, it looks like we might now have it - ?or at least the initial building blocks. On Monday they launched a demo version of the Open Library, a grand project that aims to build a universally accessible and publicly editable directory of all books: one wiki page per book, integrating publisher and library catalogs, metadata, reader reviews, links to retailers and relevant Web content, and a menu of editions in multiple formats, both digital and print.
Imagine a library that collected all the world's information about all the world's books and made it available for everyone to view and update. We're building that library.
The official opening of Open Library isn't scheduled till October, but they've put out the demo now to prove this is more than vaporware and to solicit feedback and rally support. If all goes well, it's conceivable that this could become the main destination on the Web for people looking for information in and about books: a Wikipedia for libraries. On presentation of public domain texts, they already have Google beat, even with recent upgrades to the GBS system including a plain text viewing option. The Open Library provides TXT, PDF, DjVu (a high-res visual document browser), and its own custom-built Book Viewer tool, a digital page-flip interface that presents scanned public domain books in facing pages that the reader can leaf through, search and (eventually) magnify.
Page turning interfaces have been something of a fad recently, appearing first in the British Library's Turning the Pages manuscript preservation program (specifically cited as inspiration for the OL Book Viewer) and later proliferating across all manner of digital magazines, comics and brochures (often through companies that you can pay to convert a PDF into a sexy virtual object complete with drag-able page corners that writhe when tickled with a mouse, and a paper-like rustling sound every time a page is turned).
This sort of reenactment of paper functionality is perhaps too literal, opting for imitation rather than innovation, but it does offer some advantages. Having a fixed frame for reading is a relief in the constantly scrolling space of the Web browser, and there are some decent navigation tools that gesture toward the ways we browse paper. To either side of the open area of a book are thin vertical lines denoting the edges of the surrounding pages. Dragging the mouse over the edges brings up scrolling page numbers in a small pop-up. Clicking on any of these takes you quickly and directly to that part of the book. Searching is also neat. Type a query and the book is suddenly interleaved with yellow tabs, with keywords highlighted on the page, like so:
But nice as this looks, functionality is sacrificed for the sake of fetishism. Sticky tabs are certainly a cool feature, but not when they're at the expense of a straightforward list of search returns showing keywords in their sentence context. These sorts of references to the feel and functionality of the paper book are no doubt comforting to readers stepping tentatively into the digital library, but there's something that feels disjointed about reading this way: that this is a representation of a book but not a book itself. It is a book avatar. I've never understood the appeal of those Second Life libraries where you must guide your virtual self to a virtual shelf, take hold of the virtual book, and then open it up on a virtual table. This strikes me as a failure of imagination, not to mention tedious. Each action is in a sense done twice: you operate a browser within which you operate a book; you move the hand that moves the hand that moves the page. Is this perhaps one too many layers of mediation to actually be able to process the book's contents? Don't get me wrong, the Book Viewer and everything the Open Library is doing is a laudable start (cause for celebration in fact), but in the long run we need interfaces that deal with texts as native digital objects while respecting the originals.
What may be more interesting than any of the technology previews is a longish development document outlining ambitious plans for building the Open Library user interface. This covers everything from metadata standards and wiki templates to tagging and OCR proofreading to search and browsing strategies, plus a well thought-out list of user scenarios. Clearly, they're thinking very hard about every conceivable element of this project, including the sorts of things we frequently focus on here such as the networked aspects of texts. Acolytes of Ted Nelson will be excited to learn that a transclusion feature is in the works: a tool for embedding passages from texts into other texts that automatically track back to the source (hypertext copy-and-pasting). They're also thinking about collaborative filtering tools like shared annotations, bookmarking and user-defined collections. All very very good, but it will take time.
Building an open source library catalog is a mammoth undertaking and will rely on millions of hours of volunteer labor, and like Wikipedia it has its fair share of built-in contradictions. Jessamyn West of librarian.net put it succinctly:
It's a weird juxtaposition, the idea of authority and the idea of a collaborative project that anyone can work on and modify.
But the only realistic alternative may well be the library that Google is building, a proprietary database full of low-quality digital copies, a semi-accessible public domain prohibitively difficult to use or repurpose outside the Google reading room, a balkanized landscape of partner libraries and institutions left in its wake, each clutching their small slice of the digitized pie while the whole belongs only to Google, all of it geared ultimately not to readers, researchers and citizens but to consumers. Construed more broadly to include not just books but web pages, videos, images, maps etc., the Google library is a place built by us but not owned by us. We create and upload much of the content, we hand-make the links and run the search queries that program the Google brain. But all of this is captured and funneled into Google dollars and AdSense. If passive labor can build something so powerful, what might active, voluntary labor be able to achieve? Open Library aims to find out.
of shelves and selves 07.09.2007, 12:27 PM
William Drenttel has a lovely post over on Design Observer about the exquisite information of bookshelves, a meditation spurred by 60 photographs of the library of renowned San Francisco designer, typographer, printer and founder of Greenwood Press Jack Stauffacher. Each image (they were taken by Dennis Letbetter) gives a detailed view of one section of Stauffacher's shelves, a rare glimpse of one individual's bibliographic DNA, made browseable as a slideshow (unfortunately, the images are not reassembled at the end to give a full view of the collection).
Early evidence suggests that the impulse toward personal mapping through media won't abate as we go deeper into the digital. Delicious Library and Library Thing are more or less direct transpositions of physical shelves to the computer environment, the latter with an added social dimension (people meeting through their virtual shelves). More generally, social networking sites from Facebook to MySpace are full of self-signification through shelves, or rather lists, of favorite books, movies and music. Social bookmarking sites too bear traces of identity in the websites people save and tag (the tags themselves are a kind of personal signature). Much of the texture and spatial language of the physical may be lost, a new social terrain has opened up, one which we're only beginning to understand.
But it's not as though physical bookshelves haven't always been social. We arrange books not only for our own conceptual orientation, but to give others who venture into our space a sense of our self (or what we'd like to appear as our self), our distinct intellectual algorithm. Browsing a friend's thoughtfully arranged shelf is like looking through a lens calibrated to their view of the world, especially when those books have played a crucial role, as in Stauffacher's, in shaping a life's work. Drenttel savors the idiosyncrasies that inevitably are etched into such a collection:
I have seen many great rare book libraries.... But the libraries I most enjoy are working libraries, where the books have been used and cited and annotated - first editions marred with underlining, notes throughout their pages. (I will always remember the chaos of Susan Sontag's library, where every book had been touched, read and filled with notes and ephemera.) The organization of a working library is seldom alphabetical...but rather follows some particular mental construct of its owner. Jack Stauffacher's shelves have some order, one knows. But it is his order, his life.
Or, in Stauffacher's own words:
Without this working library, I would have no compass, no map, to guide me through the density of our human condition.