Listing entries tagged with Libraries, Search and the Web


sober thoughts on google: privatization and privacy Post date  11.30.2005, 8:18 AM

nypl reading room.jpg

Siva Vaidhyanathan has written an excellent essay for the Chronicle of Higher Education on the "risky gamble" of Google's book-scanning project -- some of the most measured, carefully considered comments I've yet seen on the issue. His concerns are not so much for the authors and publishers that have filed suit (on the contrary, he believes they are likely to benefit from Google's service), but for the general public and the future of libraries. Outsourcing to a private company the vital task of digitizing collections may prove to have been a grave mistake on the part of Google's partner libraries. Siva:

The long-term risk of privatization is simple: Companies change and fail. Libraries and universities last.....Libraries should not be relinquishing their core duties to private corporations for the sake of expediency. Whichever side wins in court, we as a culture have lost sight of the ways that human beings, archives, indexes, and institutions interact to generate, preserve, revise, and distribute knowledge. We have become obsessed with seeing everything in the universe as "information" to be linked and ranked. We have focused on quantity and convenience at the expense of the richness and serendipity of the full library experience. We are making a tremendous mistake.

This essay contains in abundance what has largely been missing from the Google books debate: intellectual courage. Vaidhyanathan, an intellectual property scholar and "avowed open-source, open-access advocate," easily could have gone the predictable route of scolding the copyright conservatives and spreading the Google gospel. But he manages to see the big picture beyond the intellectual property concerns. This is not just about economics, it's about knowledge and the public interest.

What irks me about the usual debate is that it forces you into a position of either resisting Google or being its apologist. But this fails to get at the real bind we all are in: the fact that Google provides invaluable services and yet is amassing too much power; that a private company is creating a monopoly on public information services. Sooner or later, there is bound to be a conflict of interest. That is where we, the Google-addicted public, are caught. It's more complicated than hip versus square, or good versus evil.

Here's another good piece on Google. On Monday, The New York Times ran an editorial by Adam Cohen that nicely lays out the privacy concerns:

Google says it needs the data it keeps to improve its technology, but it is doubtful it needs so much personally identifiable information. Of course, this sort of data is enormously valuable for marketing. The whole idea of "Don't be evil," though, is resisting lucrative business opportunities when they are wrong. Google should develop an overarching privacy theory that is as bold as its mission to make the world's information accessible - one that can become a model for the online world. Google is not necessarily worse than other Internet companies when it comes to privacy. But it should be doing better.

original google.jpg Two graduate students at Stanford in the mid-90s recognized that search engines would be the most important tools for dealing with the incredible flood of information that was then beginning to swell, so they started indexing web pages and working on algorithms. But as the company has grown, Google's admirable-sounding mission statement -- "to organize the world's information and make it universally accessible and useful" -- has become its manifest destiny, and "information" can now encompass the most private of territories.

At one point it simply meant search results -- the answers to our questions. But now it's the questions as well. Google is keeping a meticulous record of our clickstreams, piecing together an enormous database of queries, refining its search algorithms and, some say, even building a massive artificial brain (more on that later). What else might they do with all this personal information? To date, all of Google's services are free, but there may be a hidden cost.

"Don't be evil" may be the company motto, but with its IPO last year, Google adopted a new ideology: they are now a public corporation. If web advertising (their sole source of revenue) levels off, then investors currently high on $400+ shares will start clamoring for Google to maintain profits. "Don't be evil to us!" they will cry. And what will Google do then?

images: New York Public Library reading room by Kalloosh via Flickr; archive of the original Google page

Posted by ben vershbow at 8:18 AM | Comments (7)
tags: Copyright and Copyleft , Libraries, Search and the Web , books , copyright , ethics , google , google_book_search , google_print , intellectual_property , libraries , library , literature , privacy , publishing , university

virtual libraries, real ones, empires Post date  11.28.2005, 12:36 PM

Handsworth readers.jpg Last Tuesday, a Washington Post editorial written by Library of Congress librarian James Billington outlined the possible benefits of a World Digital Library, a proposed LOC endeavor discussed last week in a post by Ben Vershbow. Billington seemed to imagine the library as sort of a United Nations of information: claiming that "deep conflict between cultures is fired up rather than cooled down by this revolution in communications," he argued that a US-sponsored, globally inclusive digital library could serve to promote harmony over conflict:

Libraries are inherently islands of freedom and antidotes to fanaticism. They are temples of pluralism where books that contradict one another stand peacefully side by side just as intellectual antagonists work peacefully next to each other in reading rooms. It is legitimate and in our nation's interest that the new technology be used internationally, both by the private sector to promote economic enterprise and by the public sector to promote democratic institutions. But it is also necessary that America have a more inclusive foreign cultural policy -- and not just to blunt charges that we are insensitive cultural imperialists. We have an opportunity and an obligation to form a private-public partnership to use this new technology to celebrate the cultural variety of the world.

What's interesting about this quote (among other things) is that Billington seems to be suggesting that a World Digital Library would function in much the same manner as a real-world library, and yet he's also arguing for the importance of actual physical proximity. He writes, after all, about books literally, not virtually, touching each other, and about researchers meeting up in a shared reading room. There seems to be a tension here, in other words, between Billington's embrace of the idea of a world digital library, and a real anxiety about what a "library" becomes when it goes online.

I also feel like there's some tension here -- in Billington's editorial and in the whole World Digital Library project -- between "inclusiveness" and "imperialism." Granted, if the United States provides Brazilians access to their own national literature online, this might be used by some as an argument against the idea that we are "insensitive cultural imperialists." But there are many varieties of empire: indeed, as many have noted, the sun stopped setting on Google's empire a while ago.

To be clear, I'm not attacking the idea of the World Digital Library. Having watched the Smithsonian invest in, and waffle on, some of their digital projects, I'm all for a sustained commitment to putting more material online. But there needs to be some careful consideration of the differences between real libraries and virtual ones -- as well as a bit more discussion of just what a privately-funded digital library might eventually morph into.

Posted by lisa lynch at 12:36 PM | Comments (0)
tags: Libraries, Search and the Web , cultural , digital , google , imperialism , internet , libraries

explosion Post date  11.22.2005, 2:10 PM

250px-Nuclear_fireball.jpg A Nov. 18 post on Adam Green's Darwinian Web makes the claim that the web will "explode" (does he mean implode?) over the next year. According to Green, RSS feeds will render many websites obsolete:

The explosion I am talking about is the shifting of a website's content from internal to external. Instead of a website being a "place" where data "is" and other sites "point" to, a website will be a source of data that is in many external databases, including Google. Why "go" to a website when all of its content has already been absorbed and remixed into the collective datastream.

Does anyone agree with Green? Will feeds bring about the restructuring of "the way content is distributed, valued and consumed?" More on this here.

Posted by lisa lynch at 2:10 PM | Comments (5)
tags: Libraries, Search and the Web , Online , Publishing, Broadcast, and the Press , RSS , blogging , blogs , darwin , darwinism , google , internet , singularity , syndication , web , xml

world digital library Post date  11.22.2005, 7:41 AM

library of congress.jpg The Library of Congress has announced plans for the creation of a World Digital Library, "a shared global undertaking" that will make a major chunk of its collection freely available online, along with contributions from other national libraries around the world. From The Washington Post:

...[the] goal is to bring together materials from the United States and Europe with precious items from Islamic nations stretching from Indonesia through Central and West Africa, as well as important materials from collections in East and South Asia.

Google has stepped forward as the first corporate donor, pledging $3 million to help get operations underway. At this point, there doesn't appear to be any direct connection to Google's Book Search program, though Google has been working with LOC to test and refine its book-scanning technology.

Posted by ben vershbow at 7:41 AM | Comments (0)
tags: Libraries, Search and the Web , books , digital , google , library , library_of_congress , literature , preservation , scanning

online retail influencing libraries Post date  11.21.2005, 12:07 PM

The NY Times reports on new web-based services at university libraries that are incorporating features such as personalized recommendations, browsing histories, and email alerts, the sort of thing developed by online retailers like Amazon and Netflix to recreate some of the experience of browsing a physical store. Remember Ranganathan's fourth law of library science: "save the time of the reader." The reader and the customer are perhaps becoming one and the same.

It would be interesting if a social software system were emerging for libraries that allowed students and researchers to work alongside librarians in organizing the stacks. Automated recommendations are just the beginning. I'm talking more about value added by the readers themselves (Amazon does this with reader reviews, Listmania, and So You'd Like To...). A social card catalogue with a tagging system and other reader-supplied metadata where readers could leave comments and bread crumb trails between books. Each card catalogue entry with its own blog and wiki to create a context for the book. Books are not just surrounded by other volumes on the shelves, they are surrounded by people, other points of view, affinities -- the kinds of things that up to this point were too vaporous to collect. This goes back to David Weinberger's comment on metadata and Google Book Search.

Posted by ben vershbow at 12:07 PM | Comments (3)
tags: Libraries, Search and the Web , Social Software , books , folksonomy , librarian , library , metadata , reading , social_software , tagging , taxonomy

google print is no more Post date  11.18.2005, 8:06 AM

Not the program, of course, just the name. From now on it is to be known as Google Book Search. "Print" obviously struck a little too close to home with publishers and authors. On the company blog, they explain the shift in emphasis:

No, we don't think that this new name will change what some folks think about this program. But we do believe it will help a lot of people understand better what we're doing. We want to make all the world's books discoverable and searchable online, and we hope this new name will help keep everyone focused on that important goal.

Posted by ben vershbow at 8:06 AM | Comments (1)
tags: Libraries, Search and the Web , books , copyright , google , google_book_search , google_print , publishing , search

the book in the network - masses of metadata Post date  11.15.2005, 6:42 PM

In this weekend's Boston Globe, David Weinberger delivers the metadata angle on Google Print:

...despite the present focus on who owns the digitized content of books, the more critical battle for readers will be over how we manage the information about that content -- information that's known technically as metadata.

...we're going to need massive collections of metadata about each book. Some of this metadata will come from the publishers. But much of it will come from users who write reviews, add comments and annotations to the digital text, and draw connections between, for example, chapters in two different books.

As the digital revolution continues, and as we generate more and more ways of organizing and linking books -- integrating information from publishers, libraries and, most radically, other readers -- all this metadata will not only let us find books, it will provide the context within which we read them.

The book in the network is a barnacled spirit, carrying with it the sum of its various accretions. Each book is also its own library by virtue not only of what it links to itself, but of what its readers are linking to, of what its readers are reading. Each book is also a milk crate of earlier drafts. It carries its versions with it. A lot of weight for something physically weightless.

Posted by ben vershbow at 6:42 PM | Comments (0)
tags: ISBN , Libraries, Search and the Web , books , ebook , electronic_literature , folksonomy , google , google_print , hypertext , library , literature , marginalia , metadata , social_software , tagging , weinberger

having browsed google print a bit more... Post date  11.14.2005, 4:53 AM

...I realize I was over-hasty in dismissing the recent additions made since book scanning resumed earlier this month. True, many of the fine wines in the cellar are there only for the tasting, but the vintage stuff can be drunk freely, and there are already some wonderful 19th century titles, at this point mostly from Harvard. The surest way to find them is to search by date, or by title and date. Specify a date range in advanced search or simply enter, for example, "date: 1890" and a wealth of fully accessible texts comes up, any of which can be linked to from a syllabus. An astonishing resource for teachers and students.

The conclusion: Google Print really is shaping up to be a library, that is, of the world pre-1923 -- the current line of demarcation between copyright and the public domain. It's a stark reminder of how over-extended copyright is. Here's an 1899 English printing of The Mahabharata:

mahabharata.jpg

A charming detail found on the following page is this old Harvard library stamp that got scanned along with the rest:

mahabharata harvard stamp.jpg

Posted by ben vershbow at 4:53 AM | Comments (0)
tags: Copyright and Copyleft , Libraries, Search and the Web , OCR , copyright , ebook , fair_use , google , google_print , library , mahabharata , scan

google print's not-so-public domain Post date  11.03.2005, 4:16 PM

wealthy new york google.jpg Google's first batch of public domain book scans is now online, representing a smattering of classics and curiosities from the collections of libraries participating in Google Print. Essentially snapshots of books, they're not particularly comfortable to read, but they are keyword-searchable and, since no copyright applies, fully accessible.

The problem is, there really isn't all that much there. Google's gotten a lot of bad press for its supposedly cavalier attitude toward copyright, but spend a few minutes browsing Google Print and you'll see just how publisher-centric the whole affair is. The idea of a text being in the public domain really doesn't amount to much if you're only talking about antique manuscripts, and these are the only books that they've made fully accessible. Daisy Miller's copyright expired long ago but, with the exception of Harvard's illustrated 1892 copy, all the available scanned editions are owned by modern publishers and are therefore only snippeted. This is not an online library, it's a marketing program. Google Print will undeniably have its uses, but we shouldn't confuse it with a library.

(An interesting offering from the stacks of the New York Public Library is this mid-19th century biographic registry of the wealthy burghers of New York: "Capitalists whose wealth is estimated at one hundred thousand dollars and upwards...")

Posted by ben vershbow at 4:16 PM | Comments (0)
tags: Copyright and Copyleft , Libraries, Search and the Web , OCR , books , copyright , ebook , google , google_print , library , literature , public_domain , scan

a better wikipedia will require a better conversation Post date  10.28.2005, 1:04 PM

There's an interesting discussion going on right now under Kim's Wikibooks post about how an open source model might be made to work for the creation of authoritative knowledge -- textbooks, encyclopedias etc. A couple of weeks ago there was some discussion here about an article that, among other things, took some rather cheap shots at Wikipedia, quoting (very selectively) a couple of shoddy passages. Clearly, the wide-open model of Wikipedia presents some problems, but considering the advantages it presents (at least in potential) -- never out of date, interconnected, universally accessible, bringing in voices from the margins -- critics are wrong to dismiss it out of hand. Holding up specific passages for critique is like shooting fish in a barrel. Even Wikipedia's directors admit that most of the content right now is of middling quality, some of it downright awful. It doesn't then follow to say that the whole project is bunk. That's a bit like expelling an entire kindergarten for poor spelling. Wikipedia is at an early stage of development. Things take time.

Instead, we should be talking about possible directions in which it might go, and how it might be improved. Dan, for one, is concerned about the market (excerpted from comments):

What I worry about...is that we're tearing down the old hierarchies and leaving a vacuum in their wake.... The problem with this sort of vacuum, I think, is that capitalism tends to swoop in, simply because there are more resources on that side....

...I'm not entirely sure if the world of knowledge functions analogously, but Wikipedia does presume the same sort of tabula rasa. The world's not flat: it tilts precariously if you've got the cash. There's something in the back of my mind that suspects that Wikipedia's not protected against this - it's kind of in the state right now that the Web as a whole was in 1995 before the corporate world had discovered it. If Wikipedia follows the model of the web, capitalism will be sweeping in shortly.

Unless... the experts swoop in first. Wikipedia is part of a foundation, so it's not exactly just bobbing in the open seas waiting to be swept away. If enough academics and librarians started knocking on the door saying, hey, we'd like to participate, then perhaps Wikipedia (and Wikibooks) would kick up to the next level. Inevitably, these newcomers would insist on setting up some new vetting mechanisms and a few useful hierarchies that would help ensure quality. What would these be? That's exactly the kind of thing we should be discussing.

The Guardian ran a nice piece earlier this week in which they asked several "experts" to evaluate a Wikipedia article on their particular subject. They all more or less agreed that, while what's up there is not insubstantial, there's still a long way to go. The biggest challenge then, it seems to me, is to get these sorts of folks to give Wikipedia more than just a passing glance. To actually get them involved.

For this to really work, however, another group needs to get involved: the users. That might sound strange, since millions of people write, edit and use Wikipedia, but I would venture that most are not willing to rely on it as a bedrock source. No doubt, it's incredibly useful to get a basic sense of a subject. Bloggers (including this one) link to it all the time -- it's like the conversational equivalent of a reference work. And for certain subjects, like computer technology and pop culture, it's actually pretty solid. But that hits on the problem right there. Wikipedia, even at its best, has not gained the confidence of the general reader. And though the Wikimaniacs would be loath to admit it, this probably has something to do with its core philosophy.

Karen G. Schneider, a librarian who has done a lot of thinking about these questions, puts it nicely:

Wikipedia has a tagline on its main page: "the free-content encyclopedia that anyone can edit." That's an intriguing revelation. What are the selling points of Wikipedia? It's free (free is good, whether you mean no-cost or freely-accessible). That's an idea librarians can connect with; in this country alone we've spent over a century connecting people with ideas.

However, the rest of the tagline demonstrates a problem with Wikipedia. Marketing this tool as a resource "anyone can edit" is a pitch oriented at its creators and maintainers, not the broader world of users. It's the opposite of Ranganathan's First Law, "books are for use." Ranganathan wasn't writing in the abstract; he was referring to a tendency in some people to fetishize the information source itself and lose sight that ultimately, information does not exist to please and amuse its creators or curators; as a common good, information can only be assessed in context of the needs of its users.

I think we are all in need of a good Wikipedia, since in the long run it might be all we've got. And I'm in no way opposed to its spirit of openness and transparency (I think the preservation of version histories is a fascinating element and one which should be explored further -- perhaps the encyclopedia of the future can encompass multiple versions of "the truth"). But that exhilarating throwing open of the doors should be tempered with caution and with an embrace of the parts of the old system that work. Not everything need be thrown away in our rush to explore the new. Some people know more than other people. Some editors have better judgement than others. There is such a thing as a good kind of gatekeeping.

If these two impulses could be brought into constructive dialogue then we might get somewhere. This is exactly the kind of conversation the Wikimedia Foundation should be trying to foster.

Posted by ben vershbow at 1:04 PM | Comments (9)
tags: Education , Libraries, Search and the Web , Online , authority , encyclopedia , library , open_source , web , wiki , wikibooks , wikimedia , wikipedia

microsoft joins open content alliance Post date  10.26.2005, 9:06 AM

Microsoft's forthcoming "MSN Book Search" is the latest entity to join the Open Content Alliance, the non-controversial rival to Google Print. ZDNet says: "Microsoft has committed to paying for the digitization of 150,000 books in the first year, which will be about $5 million, assuming costs of about 10 cents a page and 300 pages, on average, per book..."
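A quick sanity check on those numbers, using only the figures in the ZDNet quote (the variable names are mine, purely for illustration): at 10 cents a page the total works out to $4.5 million, which is presumably what "about $5 million" rounds from.

```python
# Back-of-the-envelope check of the figures quoted by ZDNet.
# All inputs come from the quote; the names are illustrative only.
books = 150_000            # books to be digitized in the first year
pages_per_book = 300       # average pages per book
cost_per_page_cents = 10   # roughly 10 cents a page

total_pages = books * pages_per_book
total_cost_dollars = total_pages * cost_per_page_cents // 100

print(f"{total_pages:,} pages, ${total_cost_dollars:,}")
```

So the quoted estimate is a rounded-up figure; the stated assumptions imply about half a million dollars less.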

Apparently having learned from Google's mistakes, OCA operates under a strict "opt-in" policy for publishers vis-a-vis copyrighted works (whereas with Google, publishers have until November 1 to opt out). Judging by the growing roster of participants, including Yahoo, the National Archives of Britain, the University of California, Columbia University, and Rice University, not to mention the Internet Archive, it would seem that less hubris equals more results, or at least lower legal fees. Supposedly there is some communication between Google and OCA about potential cooperation.

There is also a story in the NY Times.

Posted by ben vershbow at 9:06 AM | Comments (2)
tags: Libraries, Search and the Web , Microsoft , OCA , books , brewster_kahle , copyright , google , google_print , library , open_content_alliance , search , web , yahoo

to some writers, google print sounds like a sweet deal Post date  10.25.2005, 9:25 AM

Wired has a piece today about authors who are in favor of Google's plans to digitize millions of books and make them searchable online. Most seem to agree that obscurity is a writer's greatest enemy, and that the exposure afforded by Google's program far outweighs any intellectual property concerns. Sometimes to get more you have to give a little.

The article also mentions the institute.

Posted by ben vershbow at 9:25 AM | Comments (0)
tags: Libraries, Search and the Web , Publishing, Broadcast, and the Press , books , copyright , google , google_print , publishing , search , web , writing