Listing entries tagged with digitization
google, digitization and archives: despatches from if:book 06.16.2008, 3:35 PM
In discussing with other Institute folks how to go about reviewing four years' worth of blog posts, I've felt torn at times. Should I cherry-pick 'thinky' posts that discuss a particular topic in depth, or draw out narratives from strings of posts each of which is not, in itself, a literary gem but which cumulatively form the bedrock of the blog? But I thought about it, and realised that you can't really have one without the other.
Fair use, digitization, public domain, archiving, the role of libraries and cultural heritage are intricately interconnected. But the name that connects all these issues over the last few years has been Google. The Institute has covered Google's incursions into digitization of libraries (amongst other things) in a way that has explored many of these issues - and raised questions that are as urgent as ever. Is it okay to privatize vast swathes of our common cultural heritage? What are the privacy issues around technology that tracks online reading? Where now for copyright, fair use and scholarly research?
In-depth coverage of Google and digitization has helped to draw out many of the issues central to this blog. Thus, to draw forth the narrative of if:book's Google coverage is, by extension, to watch a political and cultural stance emerging. So in this post I've tried to have my cake and eat it - to trace a story, and to give a sense of the depth of thought going into that story's discussion.
In order to keep things manageable, I've kept this post to a largely Google-centric focus. Further reviews covering copyright-related posts, and general discussion of libraries and technology will follow.
2004-5: Google rampages through libraries, annoys Europe, gains rivals
In December 2004, if:book's first post about Google's digitization of libraries gave the numbers for the University of Michigan project.
In February 2005, the head of France's national libraries raised a battle cry against the Anglo-centricity implicit in Google's plans to digitize libraries. The company's seemingly relentless advance brought Europe out in force to find ways of forming non-Google coalitions for digitization.
In August, Google halted book scans for a few months to appease publishers angry at encroachments on their copyright. But this was clearly not enough, as in October 2005, Google was sued (again) by a string of publishers for massive copyright infringement. However, undeterred either by European hostility or legal challenges, the same month the company made moves to expand Google Print into Europe. Also in October 2005, Yahoo! launched the Open Content Alliance, which was joined by Microsoft around the same time. Later the same month, a Wired article put the case for authors in favor of Google's searchable online archive.
In November 2005 Google announced that from here on in Google Print would be known as Google Book Search, as the 'Print' reference perhaps struck too close to home for publishers. The same month, Ben savaged Google Print's 'public domain' efforts - then recanted (a little) later that month.
In December 2005 Google's digitization was still hot news - the Institute did a radio show/podcast with Open Source on the topic, and covered the Google Book Search debate at the American Bar Association. (In fact, most of that month's posts are dedicated to Google and digitization and are too numerous to do justice to here).
2006: Digitization spreads
By 2006, digitization and digital archives - with attendant debates - are spreading. From January through March, three posts - 'The book is reading you' parts 1, 2 and 3 - looked at privacy, networked books, fair use, downloading and copyright around Google Book Search. Also in March, a further post discussed Google and Amazon's incursions into publishing.
In April, the Smithsonian cut a deal with Showtime making the media company a preferential media partner for documentaries using Smithsonian resources. Jesse analyzed the implications for open research.
In June, the Library of Congress and partners launched a project to make vintage newspapers available online. Google Book Search, meanwhile, was tweaked to reassure publishers that the new dedicated search page was not, in fact, a library. The same month, Ben responded thoughtfully to a French book attacking Google, and by extension America, for cultural imperialism. The debate continued with a follow-up post in July.
In August, Google announced downloadable PDF versions of many of its public-domain books. The same month, the publication of Google's contract with UCAL's library prompted some debate. In October we reported on Microsoft's growing book digitization list, and some criticism of the same from Brewster Kahle. The same month, we reported that the Dutch government is pouring millions into a vast public digitization program.
In December, Microsoft launched its (clunkier) version of Google Books, Microsoft Live Book Search.
2007: Google is the environment
In January, former Netscape player Rich Skrenta crowned Google king of the 'third age of computing': 'Google is the environment', he declared. Meanwhile, having seemingly forgotten 2005's tussles, the company hosted a publishing conference at the New York Public Library. In February the company signed another digitization deal, this time with Princeton; in August, this institution was joined by Cornell, and the Economist compared Google's databases to the banking system of the information age. The following month, Siva's first Monday podcast discussed the Googlization of libraries.
By now, while Google remains a theme, commercial digitization of public-domain archives is a far broader issue. In January, the US National Archives cut a digitization deal with Footnote, effectively paywalling digital access to a slew of public-domain documents; in August, a deal followed with Amazon for commercial distribution of its film archive. The same month, two major audiovisual archiving projects launched.
In May, Ben speculated about whether some 'People's Card Catalog' could be devised to rival Google's gated archive. The Open Archive launched in July, to mixed reviews - the same month that the ongoing back-and-forth between the Institute and academic Siva Vaidhyanathan bore fruit. Siva's networked writing project, The Googlization Of Everything, was announced (this would be launched in September). Then, in August, we covered an excellent piece by Paul Duguid discussing the shortcomings of Google's digitization efforts.
In October, several major American libraries refused digitization deals with Google. By November, Google and digitization had found their way into the New Yorker; the same month the Library of Congress put out a call for e-literature links to be archived.
2008: All quiet?
But if Google coverage has been slighter this year, that's not to suggest a happy ending to the story. Microsoft abandoned its book scanning project in mid-May of this year, raising questions about the viability of the Open Content Alliance. It would seem as though Skrenta was right. The Googlization of Everything continues, less challenged than ever.
sparkles from the wheel 11.30.2007, 5:19 PM
Walt Whitman's poem "Sparkles from the Wheel" beautifully captures the pleasure and exhilaration of watching work in progress:
WHERE the city's ceaseless crowd moves on, the live-long day,
Withdrawn, I join a group of children watching - I pause aside with them.
By the curb, toward the edge of the flagging,
A knife-grinder works at his wheel, sharpening a great knife;
Bending over, he carefully holds it to the stone - by foot and knee,
With measur'd tread, he turns rapidly - As he presses with light but firm hand,
Forth issue, then, in copious golden jets,
Sparkles from the wheel.
The scene, and all its belongings - how they seize and affect me!
The sad, sharp-chinn'd old man, with worn clothes, and broad shoulder-band of leather;
Myself, effusing and fluid - a phantom curiously floating - now here absorb'd and arrested;
The group, (an unminded point, set in a vast surrounding;)
The attentive, quiet children - the loud, proud, restive base of the streets;
The low, hoarse purr of the whirling stone - the light-press'd blade,
Diffusing, dropping, sideways-darting, in tiny showers of gold,
Sparkles from the wheel.
I was reminded of this the other day while reading a brief report in Library Journal on Siva's recent cross-blog argument with Michigan University Librarian Paul Courant about Google book digitization contracts. These sorts of exchanges are not new in themselves, but blogs have made it possible for them to occur much more spontaneously and, in Siva's case, to put them visibly in the context of a larger intellectual project. It's a nice snapshot of the sort of moment that can happen along the way when the writing process is made more transparent - seeing an argument crystallize or a position get clarified. And there's a special kind of pleasure and exhilaration that comes from reading this way, seeing Siva sharpening his knife - or argument - and the rhetorical sparks that fly off the screen. Here's that Library Journal bit:
Discussion of Google Scan Plan Heats Up on Blogs:
Now this is why we love the Blogosphere. In launching his blog, University of Michigan's (UM) dean of libraries Paul Courant recently offered a spirited defense of UM's somewhat controversial scan plan with Google. That post drew quite a few comments, and a direct response from Siva Vaidhyanathan the author, blogger, and University of Virginia professor currently writing the Googlization of Everything online at the Institute for the Future of the Book; that of course drew a response from Courant. The result? A lively and illuminating dialog on Google's book scanning efforts.
how to keep google's books open 11.27.2007, 5:27 PM
Whip-smart law blogger Frank Pasquale works through his evolving views on digital library projects and search engines, proposing a compelling strategy for wringing some public good from the tangle of lawsuits surrounding Google Book Search. It hinges on a more expansive (though absolutely legally precedented) interpretation of fair use that takes the public interest and not just market factors into account. Recommended reading. (Thanks, Siva!)
cooking the books 11.08.2007, 11:44 AM
I've been digging through old episodes of Black Books, a relatively little-known comedy series from the UK's Channel 4. The show is set in a second-hand bookshop, run by Bernard Black, a chainsmoking, alcoholic Irishman (Dylan Moran) who shuts the shop at strange hours, swears at customers and becomes enraged when people actually want to buy his books.
It started me thinking about something Nick Currie said at the second Really Modern Library meeting. We were talking about mass digitization and the apparently growing appeal of 'the original', the 'real thing'. The feel of a printed page; the smell of a first edition and so on. He mentioned a previous riff of his about 'the post-bit atom' - the one last piece of any analog cultural object that can't be digitized - and which, in an age of mass digitisation, becomes fetishized to precisely the degree that the digitized object becomes a commodity.
So Black Books struck me as (besides being horribly funny) strangely poignant. While acerbic, in many ways it's full of nostalgia for a kind of independent bookshop that's rapidly disappearing. Bernard Black would be considerably less endearing if he were my only chance of getting the book I wanted; but in the age of Amazon and Waterstone's, he represents a post-bit atom of bibliophilia, and as such is ripe for fetishization.
au courant 11.06.2007, 8:33 AM
Paul Courant is the University Librarian at the University of Michigan as well as a professor of economics. And he now has a blog. He leads off with a response to critics (including Brewster Kahle and Siva Vaidhyanathan) of Michigan's book digitization partnership with Google. Siva responds back on Googlization of Everything. Great to see a university librarian entering the public debate in this way.
"digitization and its discontents" 11.06.2007, 8:14 AM
Anthony Grafton's New Yorker piece "Future Reading" paints a forbidding picture of the global digital library currently in formation on public and private fronts around the world (Google et al.). The following quote sums it up well - a refreshing counterpoint to the millenarian hype we so often hear w/r/t mass digitization:
The supposed universal library, then, will be not a seamless mass of books, easily linked and studied together, but a patchwork of interfaces and databases, some open to anyone with a computer and WiFi, others closed to those without access or money. The real challenge now is how to chart the tectonic plates of information that are crashing into one another and then to learn to navigate the new landscapes they are creating. Over time, as more of this material emerges from copyright protection, we'll be able to learn things about our culture that we could never have known previously. Soon, the present will become overwhelmingly accessible, but a great deal of older material may never coalesce into a single database. Neither Google nor anyone else will fuse the proprietary databases of early books and the local systems created by individual archives into one accessible store of information. Though the distant past will be more available, in a technical sense, than ever before, once it is captured and preserved as a vast, disjointed mosaic it may recede ever more rapidly from our collective attention.
Grafton begins and ends in a nostalgic tone, with a paean to the New York Public Library and the critic Alfred Kazin: the poor son of immigrants, City College-educated, who researched his seminal study of American literature On Native Grounds almost entirely with materials freely available at the NYPL. Clearly, Grafton is a believer in the civic ideal of the public library - a reservoir of knowledge, free to all - and this animates his critique of the balkanized digital landscape of search engines and commercial databases. Given where he appears to stand, I wish he could have taken a stab at what a digital public library might look like, and what sorts of technical, social, political and economic reorganization might be required to build it. Obviously, these are questions that would have required their own article, but it would have been valuable for Grafton, whose piece is one of those occasional journalistic events that moves the issue of digitization and the future of libraries out of the specialist realm into the general consciousness, to have connected the threads. Instead Grafton ends what is overall a valuable and intelligent article with a retreat into print fetishism - "crowded public rooms where the sunlight gleams on varnished tables....millions of dusty, crumbling, smelly, irreplaceable documents and books" - which, while evocative, obscures more than it illuminates.
Incidentally, those questions are precisely what was discussed at our Really Modern Library meetings last month. We're still compiling our notes but expect a report soon.
googlesoft gets the brush-off 10.22.2007, 1:55 PM
This is welcome. Several leading American research libraries including the Boston Public and the Smithsonian have said no thanks to Google and Microsoft book digitization deals, opting instead for the more costly but less restrictive Open Content Alliance/Internet Archive program. The NY Times reports, and explains how private foundations like Sloan are funding some of the OCA partnerships.
the really modern library 10.08.2007, 1:48 AM
This is a request for comments. We're in the very early stages of devising, in partnership with Peter Brantley and the Digital Library Federation, what could become a major initiative around the question of mass digitization. It's called "The Really Modern Library."
Over the course of this month, starting Thursday in Los Angeles, we're holding a series of three invited brainstorm sessions (the second in London, the third in New York) with an eclectic assortment of creative thinkers from the arts, publishing, media, design, academic and library worlds to better wrap our minds around the problems and sketch out some concrete ideas for intervention. Below I've reproduced some text we've been sending around describing the basic idea for the project, followed by a rough agenda for our first meeting. The latter consists mainly of questions, most of which, if not all, could probably use some fine tuning. Please feel encouraged to post responses, both to the individual questions and to the project concept as a whole. Also please add your own queries, observations or advice.
The Really Modern Library (basically)
The goal of this project is to shed light on the big questions about future accessibility and usability of analog culture in a digital, networked world.
We are in the midst of a historic "upload," a frenetic rush to transfer the vast wealth of analog culture to the digital domain. Mass digitization of print, images, sound and film/video proceeds apace through the efforts of actors public and private, and yet it is still barely understood how the media of the past ought to be preserved, presented and interconnected for the future. How might we bring the records of our culture with us in ways that respect the originals but also take advantage of new media technologies to enhance and reinvent them?
Our aim with the Really Modern Library project is not to build a physical or even a virtual library, but to stimulate new thinking about mass digitization and, through the generation of inspiring new designs, interfaces and conceptual models, to spur innovation in publishing, media, libraries, academia and the arts.
The meeting in October will have two purposes. The first is to deepen and extend our understanding of the goals of the project and how they might best be achieved. The second is to begin outlining plans for a major international design competition calling for proposals, sketches, and prototypes for a hypothetical "really modern library." This competition will seek entries ranging from the highly particular (e.g., designs for digital editions of analog works, or new tools and interfaces for handling pre-digital media) to the broadly conceptual (ideas of how to visualize, browse and make use of large networked collections).
This project is animated by a strong belief that it is the network, more than the simple conversion of atoms to bits, that constitutes the real paradigm shift inherent in digital communication. Therefore, a central question of the Really Modern Library project and competition will be: how does the digital network change our relationship with analog objects? What does it mean for readers/researchers/learners to be in direct communication in and around pieces of media? What should be the *social* architecture of a really modern library?
The call for entries will go out to as broad a community as possible, including designers, artists, programmers, hackers, librarians, archivists, activists, educators, students and creative amateurs. Our present intent is to raise a large sum of money to administer the competition and to have a pool for prizes that is sufficiently large and meaningful that it can compel significant attention from the sort of minds we want working on these problems.
Although we have tended to divide the Really Modern Library Project into two stages - the first addressing the question of how we might best take analog culture with us into the digitally networked future and the second, how the digitally networked library of the future might best be conceived and organized - these questions are joined at the hip and not easily or productively isolated from each other.
Realistically, any substantive answer to the question of how to re-present artifacts of analog culture in the digital network immediately raises issues ranging from new forms of browsing (in a social network) to new forms of reading (in a social network) which have everything to do with the broader infrastructure of the library itself.
We're going to divide the day roughly in half, spending the morning confronting the broader conceptual issues and the afternoon discussing what kind of concrete intervention might make sense.
Questions to think about in preparation for the morning discussion:
* if it's assumed that form and content are inextricably linked, what happens when we take a book and render it on a dynamic electronic screen rather than bound paper? same question for movies which move from the large theatrical presentation to the intimacy of the personal screen. interestingly the "old" analog forms aren't as singular as they might seem. books are read silently alone or out loud in public; music is played live and listened to on recordings. a recording of a Beethoven symphony on ten 78rpm discs presents quite a different experience than listening to it on an iPod with random access. from this perspective how do we define the essence of a work which needs to be respected and protected in the act of re-presentation?
* twenty years ago we added audio commentary tracks to movies and textual commentary to music. given the spectacular advances in computing power, what are the most compelling enhancements we might imagine? (in preparation for this, you may find it useful to look at a series of exchanges that took place on the if:book blog regarding an "ideal presentation of Ulysses" (here and here).)
* what are the affordances of locating a work in the shared social space of a digital network? what is the value of putting readers, viewers, and listeners of specific works in touch with each other? what can we imagine about the range of interactions that are possible and worthwhile? be expansive here, extrapolating as far out as possible from current technical possibilities.
* it seems to us that visualization tools will be crucial in the digital future both for opening up analog works in new ways and for browsing and making sense of massive media archives. if everything is theoretically connected to everything else, how do we make those connections visible in a way that illuminates rather than overwhelms? and how do we visualize the additional and sometimes transformative connections that people make individually and communally around works? how do we visualize the conversation that emerges?
* in the digital environment, all media break down into ones and zeros. all media can be experienced on a single device: a computer. what are the implications of this? what are the challenges in keeping historical differences between media forms in perspective as digitization melts everything together?
* what happens when computers can start reading all the records of human civilization? in other words, when all analog media are digitized, what kind of advanced data crunching can we do and what sorts of things might it reveal?
* most analog works were intended to be experienced with all of one's attention, but the way we read/watch/listen/look is changing. even when engaging with non-networked media - a paper book, a print newspaper, a compact disc, a DVD, a collection of photos - we increasingly find ourselves Googling alongside. Al Pacino paces outside the bank in 'dog day afternoon' firing up the crowded street with "Attica! Attica!" I flip to Wikipedia and do a quick read on the Attica prison riots. reading "song of myself" in "leaves of grass," i find my way to the online Whitman archive, which allows me to compare every iteration of Whitman's evolutionary work. or reading "ulysses" i open up Google Earth and retrace Bloom's steps by satellite. while leafing through a book of caravaggio's paintings, a quick google video search leads me to a related episode in simon schama's "power of art" documentary series and a series of online essays. as radiohead's new album plays, i browse fan sites and blogs for backstory, b-sides and touring info. the immediacy and proximity of such supplementary resources changes our relationship to the primary ones. the ratio of text to context is shifting. how should this influence the structure and design of future digital editions?
* if we do decide to mount a competition (we're still far from decided on whether this is the right approach), how exactly should it work? first off, what are we judging? what are we hoping to reward? what is the structure of this contest? what are the motivators? a big concern is that the top-down model - panel of prestigious judges, serious prize money etc. - feels very old-fashioned and ignores the way in which much of the recent innovation in digital media has taken place: an emergent, grassroots ferment... open source culture, web2.0, or what have you. how can we combine the heft and focused energy of the former with the looseness and dynamism of the latter? is there a way to achieve some sort of top-down orchestration of emergent creativity? is "competition" maybe the wrong word? and how do we create a meaningful public forum that can raise consciousness of these issues more generally? an accompanying website? some other kind of publication? public events? a conference?
* where are the leverage points for an intervention in this area? what are the key constituencies, national vs. international?
* for reasons both practical and political, we've considered restricting this contest to the public domain. practical in that the public domain provides an unencumbered test bed of creative content for contributors to work with (no copyright hassles). political in that we wish to draw attention to the threat posed to the public domain by commercially driven digitization projects (i.e. the recent spate of deals between Google and libraries, the National Archives' deal with Footnote.com and Amazon, the Smithsonian with Showtime etc.). emphasizing the public domain could also exert pressure on the media industries, who to date have been more concerned with preserving old architectures of revenue than with adapting creatively to the digital age. making the public domain more attractive, more dynamic and more *usable* than the private domain could serve as a wake-up call to the big media incumbents, and more importantly, to contemporary artists and scholars whose work is being shackled by overly restrictive formats and antiquated business models. we'd also consider workable areas of the private domain such as the Creative Commons - works that are progressively licensed so as to allow creative reuse. we're not necessarily wedded to this idea. what do you think?
candida höfer: the library as museum 09.17.2007, 12:53 AM
The large format that characterizes Höfer's photographs of public places, the absence of people, and the angle from which she composes them, invite the viewer "to enter" the rooms and observe. Photography is a silent medium and in Höfer's libraries this is magnified, creating that feeling of "temple of learning" with which libraries have often been identified. On the other hand, the meticulous attention to detail, hand-painted porcelain markers, ornately carved bookcases, murals, stained glass windows, gilt moldings, and precious tomes are an eloquent representation of libraries as palaces of learning for the privileged. In spite of that, and ever since libraries became public spaces, anyone, in theory, has access to books and the concept of gain or monetary value rarely enters the user's mind.
Libraries are a book lover's paradise, a physical compilation of human knowledge in all its labyrinthine intricacy. With digitization, libraries gain storage capacity and readers gain accessibility, but they lose both silence and awe. Even though in the digital context, the basic concept of the library as a place for the preservation of memory remains, for many "enlightened" readers the realization that human memory and knowledge are handled by for-profit enterprises such as Google, produces a feeling of merchants in the temple, a sense that the public interest has fallen, one more time, into private hands.
As we well know, the truly interesting development in the shift from print to digital is the networked environment and its effects on reading and writing. If, as Umberto Eco says, books "are machines that provoke further thoughts" then the born-digital book is a step toward the open text, and the "library" that eventually will hold it, a bird of a different feather.