Listing entries tagged with search


google gets mid-evil Post date  01.30.2006, 3:46 PM

At the World Economic Forum in Davos last Friday, Google CEO Eric Schmidt assured a questioner in the audience that his company had in fact thoroughly searched its soul before deciding to roll out a politically sanitized search engine in China:

We concluded that although we weren't wild about the restrictions, it was even worse to not try to serve those users at all... We actually did an evil scale and decided not to serve at all was worse evil.

(via Ditherati)

Posted by ben vershbow at 03:46 PM | Comments (0)
tags: Libraries, Search and the Web , Network_Freedom , censorship , china , evil , free_speech , google , internet , search , web

illusions of a borderless world Post date  01.27.2006, 3:57 PM

china google falun gong.jpg

A number of influential folks around the blogosphere are reluctantly endorsing Google's decision to play by China's censorship rules on its new Google.cn service -- what one local commentator calls a "eunuch version" of Google.com. Here's a sampler of opinions:

Ethan Zuckerman ("Google in China: Cause For Any Hope?"):

It’s a compromise that doesn’t make me happy, that probably doesn’t make most of the people who work for Google very happy, but which has been carefully thought through...

In launching Google.cn, Google made an interesting decision - they did not launch versions of Gmail or Blogger, both services where users create content. This helps Google escape situations like the one Yahoo faced when the Chinese government asked for information on Shi Tao, or when MSN pulled Michael Anti’s blog. This suggests to me that Google’s willing to sacrifice revenue and market share in exchange for minimizing situations where they’re asked to put Chinese users at risk of arrest or detention... This, in turn, gives me some cause for hope.

Rebecca MacKinnon ("Google in China: Degrees of Evil"):

At the end of the day, this compromise puts Google a little lower on the evil scale than many other internet companies in China. But is this compromise something Google should be proud of? No. They have put a foot further into the mud. Now let's see whether they get sucked in deeper or whether they end up holding their ground.

David Weinberger ("Google in China"):

If forced to choose — as Google has been — I'd probably do what Google is doing. It sucks, it stinks, but how would an information embargo help? It wouldn't apply pressure on the Chinese government. Chinese citizens would not be any more likely to rise up against the government because they don't have access to Google. Staying out of China would not lead to a more free China.

Doc Searls ("Doing Less Evil, Possibly"):

I believe constant engagement — conversation, if you will — with the Chinese government, beats picking up one's very large marbles and going home. Which seems to be the alternative.

Much as I hate to say it, this does seem to be the sensible position -- not unlike opposing America's embargo of Cuba. The logic goes that isolating Castro only serves to further isolate the Cuban people, whereas exposure to the rest of the world -- even restricted and filtered -- might, over time, loosen the state's monopoly on civic life. Of course, you might say that trading Castro for globalization is merely an exchange of one tyranny for another. But what is perhaps more interesting to ponder right now, in the wake of Google's decision, is the palpable melancholy felt in the comments above. What does it reveal about what we assume -- or used to assume -- about the internet and its relationship to politics and geography?

A favorite "what if" of recent history is what might have happened in the Soviet Union had it lasted into the internet age. Would the Kremlin have managed to secure its virtual borders? Or censor and filter the net into a state-controlled intranet -- a Union of Soviet Socialist Networks? Or would the decentralized nature of the technology, mixed with the cultural stirrings of glasnost, have toppled the totalitarian state from beneath?

Ten years ago, in the heady early days of the internet, most would probably have placed their bets against the Soviets. The Cold War was over. Some even speculated that history itself had ended, that free-market capitalism and democracy, on the wings of the information revolution, would usher in a long era of prosperity and peace. No borders. No limits.

jingjing_1.jpg chacha.jpg
"Jingjing" and "Chacha." Internet police officers from the city of Shenzhen who float over web pages and monitor the cyber-traffic of local users.

It's interesting now to see how exactly the opposite has occurred. Bubbles burst. Towers fell. History, as we now realize, did not end, it was merely on vacation; while the utopian vision of the internet -- as a placeless place removed from the inequities of the physical world -- has all but evaporated. We realize now that geography matters. Concrete features have begun to crystallize on this massive information plain: ports, gateways and customs houses erected, borders drawn. With each passing year, the internet comes more and more to resemble a map of the world.

Those of us tickled by the "what if" of the Soviet net now have ourselves a plausible answer in China, which, through a stunning feat of pipe control -- a combination of censoring filters, on-the-ground enforcement, and general peering over the shoulders of its citizens -- has managed to create a heavily restricted local net in its own image. Barely a decade after the fall of the Iron Curtain, we have the Great Firewall of China.

And as we've seen this week, and in several highly publicized instances over the past year, the virtual hand of the Chinese government has been substantially strengthened by Western technology companies willing to play by local rules so as not to be shut out of the explosive Chinese market. Tech giants like Google, Yahoo!, and Cisco Systems have proved only too willing to abide by China's censorship policies, blocking certain search returns and politically sensitive terms like "Taiwanese democracy," "multi-party elections" or "Falun Gong". They also specialize in precision bombing, sometimes removing the pages of specific users at the government's bidding. The most recent incident came just after New Year's when Microsoft acquiesced to government requests to shut down the MSN Spaces site of popular muckraking blogger Zhao Jing, aka Michael Anti.

MS_and_China.jpg
One of many angry responses that circulated the non-Chinese net in the days that followed.

We tend to forget that the virtual is built of physical stuff: wires, cable, fiber -- the pipes. Whoever controls those pipes, be it governments or telecoms, has the potential to control what passes through them. The result is that the internet comes in many flavors, depending in large part on where you are logging in. As Jack Goldsmith and Timothy Wu explain in an excellent article in Legal Affairs (adapted from their forthcoming book Who Controls the Internet?: Illusions of a Borderless World), China, far from being the boxed-in exception to an otherwise borderless net, is actually just the uglier side of a global reality. The net has been mapped out geographically into "a collection of nation-state networks," each with its own politics, social mores, and consumer appetites. The very same technology that enables Chinese authorities to write the rules of their local net enables companies around the world to target advertising and gear services toward local markets. Goldsmith and Wu:

...information does not want to be free. It wants to be labeled, organized, and filtered so that it can be searched, cross-referenced, and consumed....Geography turns out to be one of the most important ways to organize information on this medium that was supposed to destroy geography.

Who knows? When networked devices truly are ubiquitous and can pinpoint our location wherever we roam, the internet could be censored or tailored right down to the individual level (like the empire in Borges' fable that commissions a one-to-one map of its territory that upon completion perfectly covers every corresponding inch of land like a quilt).

The case of Google, while by no means unique, serves well to illustrate how threadbare the illusion of the borderless world has become. The company's famous credo, "don't be evil," just doesn't hold up in the messy, complicated real world. "Choose the lesser evil" might be more appropriate. Also crumbling upon contact with air is Google's famous mission, "to make the world's information universally accessible and useful," since, as we've learned, Google will actually vary the world's information depending on where in the world it operates.

Google may be behaving responsibly for a corporation, but it's still a corporation, and corporations, in spite of well-intentioned employees, some of whom may go to great lengths to steer their company onto the righteous path, are still ultimately built to do one thing: get ahead. Last week in the States, the get-ahead impulse happened to be consonant with our values. Not wanting to spook American users, Google chose to refuse a Dept. of Justice request for search records to aid its anti-pornography crackdown. But this week, not wanting to ruffle the Chinese government, Google compromised and became an agent of political repression. "Degrees of evil," as Rebecca MacKinnon put it.

The great irony is that technologies we romanticized as inherently anti-tyrannical have turned out to be powerful instruments of control, highly adaptable to local political realities, be they state or market-driven. Not only does the Chinese government use these technologies to suppress democracy, it does so with the help of its former Cold War adversary, America -- or rather, the corporations that in a globalized world are the de facto co-authors of American foreign policy. The internet is coming of age and with that comes the inevitable fall from innocence. Part of us desperately wanted to believe Google's silly slogans because they said something about the utopian promise of the net. But the net is part of the world, and the world is not so simple.

Posted by ben vershbow at 03:57 PM | Comments (3)
tags: ISP , Libraries, Search and the Web , Network_Freedom , broadband , capitalism , china , cyberspace , democracy , evil , falun_gong , free_speech , geography , globalization , glocalization , good , google , human_rights , search , spectrum , technology

the book is reading you Post date  01.19.2006, 1:42 PM

I just noticed that Google Book Search requires users to be logged in on a Google account to view pages of copyrighted works.

google book search account.jpg

They provide the following explanation:

Why do I have to log in to see certain pages?

Because many of the books in Google Book Search are still under copyright, we limit the amount of a book that a user can see. In order to enforce these limits, we make some pages available only after you log in to an existing Google Account (such as a Gmail account) or create a new one. The aim of Google Book Search is to help you discover books, not read them cover to cover, so you may not be able to see every page you're interested in.

So they're tracking how much we've looked at and capping our number of page views. Presumably a bone tossed to publishers, who I'm sure will continue suing Google all the same (more on this here). There's also the possibility that publishers have requested information on who's looking at their books -- geographical breakdowns and stats on click-throughs to retailers and libraries. I doubt, though, that Google would share this sort of user data. Substantial privacy issues aside, that's valuable information they want to keep for themselves.

That's because "the aim of Google Book Search" is also to discover who you are. It's capturing your clickstreams, analyzing what you've searched and the terms you've used to get there. The book is reading you. Substantial privacy issues aside (it seems more and more that's where we'll be leaving them), Google will use this data to refine its search algorithms and, who knows, might even develop some sort of personalized recommendation system similar to Amazon's -- you know, where the computer lists other titles that might interest you based on what you've read, bought or browsed in the past (a system that works only if you are logged in). It's possible Google is thinking of Book Search as the cornerstone of a larger venture that could compete with Amazon.

There are many ways Google could eventually capitalize on its books database -- that is, beyond the contextual advertising that is currently its main source of revenue. It might turn the scanned texts into readable editions, hammer out licensing agreements with publishers, and become the world's biggest ebook store. It could start a print-on-demand service -- a Xerox machine on steroids (and the return of Google Print?). It could work out deals with publishers to sell access to complete online editions -- a searchable text to go along with the physical book -- as Amazon announced it will do with its Upgrade service. Or it could start selling sections of books -- individual pages, chapters etc. -- as Amazon has also planned to do with its Pages program.

Amazon has long served as a valuable research tool for books in print, so much so that some university library systems are now emulating it. Recent additions to the Search Inside the Book program such as concordances, interlinked citations, and statistically improbable phrases (where distinctive terms in the book act as machine-generated tags) are especially fun to play with. Although first and foremost a retailer, Amazon feels more and more like a search system every day (and its A9 engine, though seemingly always on the back burner, is also developing some interesting features). On the flip side Google, though a search system, could start feeling more like a retailer. In either case, you'll have to log in first.

Posted by ben vershbow at 01:42 PM | Comments (5)
tags: Copyright and Copyleft , Libraries, Search and the Web , POD , amazon , books , e-commerce , e-publishing , ebooks , google , google_book_search , google_print , internet , print_on_demand , privacy , publishing , search , web

questions about blog search and time Post date  01.06.2006, 8:17 AM

Does anyone know of a good way to search for old blog entries on the web? I've just been looking at some of the available blog search resources and few of them appear to provide any serious advanced search options. The couple of major ones I've found that do (after an admittedly cursory look) are Google and Ice Rocket. Both, however, appear to be broken, at least when it comes to dates. I've tried them on three different browsers, on Mac and PC, and in each case the date menus seem to be frozen. It's very weird. They give you the option of entering a specific time range but won't accept the actual dates. Maybe I'm just having a bad tech day, but it's as if there's some conceptual glitch across the web vis a vis blogs and time.

Most blog search engines are geared toward searching the current blogosphere, but there should be a way to research older content. My first thought was that blog search engines crawl RSS feeds, most of which do not transmit the entirety of a blog's content, just the more recent. That would pose a problem for archival search.

Does anyone know what would be the best way to go about finding, say, old blog entries containing the keywords "new orleans superdome" from late August to late September 2005? Is it best to just stick with general web search and painstakingly comb through for blogs? If we agree that blogs have become an important kind of cultural document, then surely there should be a way to find them more than a month after they've been written.

Posted by ben vershbow at 08:17 AM | Comments (5)
tags: Blogosphere , Libraries, Search and the Web , archives , blog_search , blogging , blogs , history , research , search

where we've been, where we're going Post date  12.09.2005, 12:54 PM

Roundup-weed5L.gif

This past week at if:book we've been thinking a lot about the relationship between this weblog and the work we do. We decided that while if:book has done a fine job reflecting and provoking the conversations we have at the Institute, we wanted to make sure that it also seems as coherent to our readers as it does to us. With that in mind, we've decided to begin posting a weekly roundup of our blog posts, in which we synthesize (as much as possible) what we've been thinking and talking about from Monday to Friday.

So here goes. This week we spent a lot of time reflecting on simulation and virtuality. In part, this reflection grew out of our collective reading of Thomas de Zengotita's book Mediated, which discusses (among other things) the link between alienation from the "real" through digital mediation and increased solipsism. Bob seemed especially interested in the dialectic relationship between, on one hand, the opportunity for access afforded by ever-more sophisticated forms of simulation, and, on the other, the sense that something must be lost as the encounter with the "real" recedes entirely.

This, in turn, led to further conversation about what we might think of as the "loss of the real" in the transition from books on paper to books on a computer screen. On one hand, there seems to be a tremendous amount of anxiety that Google Book Search might somehow make actual books irrelevant and thus destroy reading and writing practices linked to the bound book. On the other hand, one could take the position of Cory Doctorow that books as objects are overrated, and challenge the idea that a book needs to be physically embodied to be "real."

As the debate over Google Book Search continually reminds us, one of the most challenging things in sifting through discussions of emerging media forms is learning to tell the difference between nostalgia and useful critical insight. Often the two are hopelessly intertwined; in this week's debates about Wikipedia, for example, discussion of how to make the open-source encyclopedia more useful was often tempered by the suggestion that encyclopedias of the past were always superior to Wikipedia, an assertion easily challenged by a quick browse through some old encyclopedias.

Finally, I want to mention that we got around to setting up a del.icio.us account. There will be a formal link up on the blog soon, but you can take a look now. It will expand quickly.

Posted by lisa lynch at 12:54 PM | Comments (0)
tags: Roundup , book , google , search , simulation , wikipedia

google on the air Post date  12.06.2005, 12:34 AM

librarybrazil.jpg

Open Source's hour on the Googlization of libraries was refreshingly light on the copyright issue and heavier on questions about research, reading, the value of libraries, and the public interest. With its book-scanning project, Google is a private company taking on the responsibilities of a public utility, and Siva Vaidhyanathan came down hard on one of the company's chief legal reps for the mystery shrouding their operations (scanning technology, algorithms and ranking system are all kept secret). The rep reasonably replied that Google is not the only digitization project in town and that none of its library partnerships are exclusive. But most of his points were pretty obvious PR boilerplate about Google's altruism and gosh darn love of books. Hearing the counsel's slick defense, your gut tells you it's right to be suspicious of Google and to keep demanding more transparency, clearer privacy standards and so on. If we're going to let this much information come into the hands of one corporation, we need to be very active watchdogs.

Our friend Karen Schneider then joined the fray and as usual brought her sage librarian's perspective. She's thrilled by the possibilities of Google Book Search, seeing as it solves the fundamental problem of library science: that you can only search the metadata, not the texts themselves. But her enthusiasm is tempered by concerns about privatization similar to Siva's and a conviction that a research service like Google can never replace good librarianship and good physical libraries. She also took issue with the fact that Book Search doesn't link to other library-related search services like Open Worldcat. She has her own wrap-up of the show on her blog.

Rounding out the discussion was Matthew G. Kirschenbaum, a cybertext studies blogger and professor of English at the University of Maryland. Kirschenbaum addressed the question of how Google, and the web in general, might be changing, possibly eroding, our reading practices. He nicely put the question in perspective, suggesting that scattershot, inter-textual, "snippety" reading is in fact the older kind of reading, and that the idea of sustained, deeply immersed involvement with a single text is largely a romantic notion tied to the rise of the novel in the 18th century.

A satisfying hour, all in all, of the sort we should be having more often. It was fun brainstorming with Brendan Greeley, the Open Source "blogger-in-chief," on how to put the show together. Their whole bit about reaching out to the blogosphere for ideas and inspiration isn't just talk. They put their money where their mouth is. I'll link to the podcast when it becomes available.

image: Real Gabinete Português de Literatura, Rio de Janeiro - Claudio Lara via Flickr

Posted by ben vershbow at 12:34 AM | Comments (2)
tags: Libraries, Search and the Web , copyright , digitization , ebook , google , google_book_search , google_print , library , literature , metadata , reading , search

the role of note taking in the information age Post date  12.03.2005, 3:19 PM

An article by Ann Blair in a recent issue of Critical Inquiry (vol 31 no 1) discusses the changing conceptions of the function of note-taking from about the sixth century to the present, and ends with a speculation on the way that textual searches (such as Google Book Search) might change practices of note-taking in the twenty-first century. Blair argues that "one of the most significant shifts in the history of note taking" occurred in the beginning of the twentieth century, when the use of notes as memorization aids gave way to the use of notes as an aid to replace the memorization of too-abundant information. With the advent of the net, she notes:

Today we delegate to sources that we consider authoritative the extraction of information on all but a few carefully specialized areas in which we cultivate direct experience and original research. New technologies increasingly enable us to delegate more tasks of remembering to the computer, in that shifting division of labor between human and thing. We have thus mechanized many research tasks. It is possible that further changes would affect even the existence of note taking. At a theoretical extreme, for example, if every text one wanted were constantly available for searching anew, perhaps the note itself, the selection made for later reuse, might play a less prominent role.

The result of this externalization, Blair notes, is that we come to think of long-term memory as something that is stored elsewhere, in "media outside the mind." At the same time, she writes, "notes must be rememorated or absorbed in the short-term memory at least enough to be intelligently integrated into an argument; judgment can only be applied to experiences that are present to the mind."

Blair's article doesn't say that this bifurcation between short-term and long-term memory is a problem: she simply observes it as a phenomenon. But there's a resonance between Blair's article and Naomi Baron's recent Los Angeles Times piece on Google Book Search: both point to the fact that what we have commonly defined as scholarly reflection has increasingly become a process of database management. Baron seems to see reflection and database management as being in tension, though I'm not completely convinced by her argument. Blair, less apocalyptic than Baron, nonetheless gives me something to ponder. What happens to us if (or when) all of our efforts to make the contents of our extrasomatic memory "present to our mind" happen without the mediation of notes? Blair's piece focuses on the epistemology rather than the phenomenology of note taking -- still, she leads me to wonder what happens if the mediating function of the note is lost, when the triangular relation between book, scholar and note becomes a relation between database and user.

Posted by lisa lynch at 03:19 PM | Comments (1)
tags: Libraries, Search and the Web , book , google , internet , note_taking , search

google print on deck at radio open source Post date  12.01.2005, 8:07 AM

Open Source, the excellent public radio program (not to be confused with "Open Source Media") that taps into the blogosphere to generate its shows, has been chatting with me about putting together an hour on the Google library project. Open Source is a unique hybrid, drawing on the best qualities of the blogosphere -- community, transparency, collective wisdom -- to produce an otherwise traditional program of smart talk radio. As host Christopher Lydon puts it, the show is "fused at the brain stem with the world wide web." Or better, it "uses the internet to be a show about the world."

The Google show is set to air live this evening at 7pm (ET) (they also podcast). It's been fun working with them behind the scenes, trying to figure out the right guests and questions for the ideal discussion on Google and its bookish ambitions. My exchange has been with Brendan Greeley, the Radio Open Source "blogger-in-chief" (he's kindly linked to us today on their site). We agreed that the show should avoid getting mired in the usual copyright-focused news peg -- publishers vs. Google etc. -- and focus instead on the bigger questions. At my suggestion, they've invited Siva Vaidhyanathan, who wrote the wonderful piece in the Chronicle of Higher Ed. that I talked about yesterday (see bigger questions). I've also recommended our favorite blogger-librarian, Karen Schneider (who has appeared on the show before), science historian George Dyson, who recently wrote a fascinating essay on Google and artificial intelligence, and a bunch of cybertext studies people: Matthew G. Kirschenbaum, N. Katherine Hayles, Jerome McGann and Johanna Drucker. If all goes well, this could end up being a very interesting hour of discussion. Stay tuned.

UPDATE: Open Source just got a hold of Nicholas Kristof to do an hour this evening on Genocide in Sudan, so the Google piece will be pushed to next week.

Posted by ben vershbow at 08:07 AM | Comments (0)
tags: Libraries, Search and the Web , Online , copyright , google , google_book_search , google_print , library , open_source , podcast , publishing , radio , radio_open_source , search , web

google print is no more Post date  11.18.2005, 8:06 AM

Not the program, of course, just the name. From now on it is to be known as Google Book Search. "Print" obviously struck a little too close to home with publishers and authors. On the company blog, they explain the shift in emphasis:

No, we don't think that this new name will change what some folks think about this program. But we do believe it will help a lot of people understand better what we're doing. We want to make all the world's books discoverable and searchable online, and we hope this new name will help keep everyone focused on that important goal.

Posted by ben vershbow at 08:06 AM | Comments (1)
tags: Libraries, Search and the Web , books , copyright , google , google_book_search , google_print , publishing , search

gawker blogs to appear on yahoo Post date  11.16.2005, 7:11 AM

Gawker Media, the Conde Nast of the blogosphere, has just sold distribution rights for five of its blogs to Yahoo. Selected posts from Gawker, Wonkette, Gizmodo, Lifehacker and Defamer will soon appear daily on the Yahoo news portal.

Not so worrisome (or surprising) to see blogs like these going corporate. From the beginning, they've sort of pitched themselves as commodities -- the tabloids and gadget rags of the blogosphere. But when blogging comes fully front and center as the next hip business strategy -- that authentic unfiltered element with which to adorn your company's image (hang some humans on the doorpost) -- then we may see a massive rush to rake up the brighter talents with lucrative little hosting deals. I'd hate to see bloggers forsake their independence like this. Then again, it might clear the way for a whole new generation of authentic voices.

Posted by ben vershbow at 07:11 AM | Comments (1)
tags: Blogosphere , blogging , blogs , gawker , media , news , search , syndication , yahoo , yahoo!

all your base are belong to google Post date  11.16.2005, 7:04 AM

Google Base is live and ready for our stuff.

In AP: "New Project Will Expand Google's Reach"

Posted by ben vershbow at 07:04 AM | Comments (0)
tags: Online , advertising , classifieds , craigslist , ebay , etail , google , google_base , search , web

microsoft joins open content alliance Post date  10.26.2005, 9:06 AM

Microsoft's forthcoming "MSN Book Search" is the latest entity to join the Open Content Alliance, the non-controversial rival to Google Print. ZDNet says: "Microsoft has committed to paying for the digitization of 150,000 books in the first year, which will be about $5 million, assuming costs of about 10 cents a page and 300 pages, on average, per book..."

Apparently having learned from Google's mistakes, OCA operates under a strict "opt-in" policy for publishers vis-a-vis copyrighted works (whereas with Google, publishers have until November 1 to opt out). Judging by the growing roster of participants, including Yahoo, the National Archives of Britain, the University of California, Columbia University, and Rice University, not to mention the Internet Archive, it would seem that less hubris equals more results, or at least lower legal fees. Supposedly there is some communication between Google and OCA about potential cooperation.

Also a story in the NY Times.

Posted by ben vershbow at 09:06 AM | Comments (2)
tags: Libraries, Search and the Web , Microsoft , OCA , books , brewster_kahle , copyright , google , google_print , library , open_content_alliance , search , web , yahoo

to some writers, google print sounds like a sweet deal Post date  10.25.2005, 9:25 AM

Wired has a piece today about authors who are in favor of Google's plans to digitize millions of books and make them searchable online. Most seem to agree that obscurity is a writer's greatest enemy, and that the exposure afforded by Google's program far outweighs any intellectual property concerns. Sometimes to get more you have to give a little.

The article also mentions the institute.

Posted by ben vershbow at 09:25 AM | Comments (0)
tags: Libraries, Search and the Web , Publishing, Broadcast, and the Press , books , copyright , google , google_print , publishing , search , web , writing

google is sued... again Post date  10.20.2005, 8:08 AM

This time by publishers. Penguin Group USA, McGraw-Hill, Pearson Education, Simon & Schuster and John Wiley & Sons. The gripe is the same as with the Authors' Guild, which filed suit last month alleging "massive copyright infringement." Publishers fear a dangerous precedent is set by Google's scanning of books to construct what amounts to a giant card catalogue on the web. Google claims "fair use" (see rationale), again pointing out that for copyrighted works only tiny "snippets" of text are displayed around keywords (though perhaps this is not yet fully in effect - I was searching around in this book and was able to look at quite a lot).

Google calls the publishers' suit "near-sighted." And it probably is. The benefit to readers and researchers will be tremendous, as will (Google is eager to point out) the exposure for authors and publishers. But Google Print is undoubtedly an earth-shaking program. Look at the reaction in Europe, where alarm bells rung by France warned of cultural imperialism, an English-drenched web. Heads of state and culture convened and initial plans for a European digital library have been drawn up.

What the transatlantic flap makes clear is that Google's book scanning touches a deep nerve, and the argument over intellectual property, significant though it is, distracts from a more profound human anxiety -- an anxiety about the form of culture and the shape of thoughts. If we try to grope back through the millennia, we can find an analogy in the invention of writing.

The shift from oral to written language froze speech into stable strings that could be transmitted and stored over distance and time. This change not only affected the modes of communication, it dramatically refigured the cognitive makeup of human beings (as McLuhan, Ong and others have described). We are currently going through another such shift. The digital takes the freezing medium of text and throws it back into fluidity. Like the melting of polar ice caps, it unsettles equilibriums, changes weather patterns. It is a lot to adjust to, and we wonder if our great-great-grandchildren will literally think differently from us.

But in spite of this disorienting new fluidity, we still have print, we still have the book. And actually, Google Print in many ways affirms this since its search returns will point to print retailers and brick-and-mortar libraries. Yet the fact remains that the canon is being scanned, with implications we can't fully perceive, and future uses we can't fully predict, and so it is understandable that many are unnerved. The ice is really beginning to melt.

In Phaedrus, Plato expresses a similar anxiety about the invention of writing. He tells the tale of Theuth, an Egyptian deity who goes around spreading the new technology, and one day encounters a skeptic in King Thamus:

...you who are the father of letters, from a paternal love of your own children have been led to attribute to them a power opposite to that which they in fact possess. For this discovery of yours will create forgetfulness in the minds of those who learn to use it; they will not exercise their memories, but, trusting in external, foreign marks, they will not bring things to remembrance from within themselves. You have discovered a remedy not for memory, but for reminding. You offer your students the appearance of wisdom, not true wisdom. They will be hearers of many things and will have learned nothing; they will appear to be omniscient and will generally know nothing; they will be tiresome company, having the show of wisdom without the reality.

As I type, I'm exhibiting wisdom without the reality. I've read Plato, but nowhere near exhaustively. Yet I can slash and weave texts on the web in seconds, throw together a blog entry and send it screeching into the commons. And with Google Print I can get the quote I need and let the rest of the book rot behind the security fence. This fluidity is dangerous because it makes connections so easy. Do we know what we are connecting?

Posted by ben vershbow at 08:08 AM | Comments (5)
tags: Copyright and Copyleft , Libraries, Search and the Web , Transliteracies , copyright , google , literacy , mcluhan , ong , plato , publishing , search , web

google expands book-scanning project to europe Post date  10.18.2005, 8:56 AM

This week Google will be paying a visit to the Frankfurt Book Fair to talk with European publishers and chief librarians (including arch nemesis Jean-Noël Jeanneney) about eight new local incarnations of Google Print. (more)

Posted by ben vershbow at 08:56 AM | Comments (0)
tags: Libraries, Search and the Web , Online , books , copyright , ebook , europe , frankfurt , google , internet , library , publishing , search , web

news and blogs to live under one roof at yahoo! Post date  10.11.2005, 10:19 AM

Yahoo's revamped news search will present news and blogs side by side on the same page. In addition, the site will feature related images from Flickr, the social photo-sharing site that Yahoo purchased earlier this year, as well as user-contributed links from My Web (a feature that allows you to save and store web pages, and share them with others).

As before, the front news page will promote only stories from mainstream media sources, while the blog-news combo appears on a second-tier page that you arrive at when you conduct a specific search, or click for more details or more stories. No doubt, this was done, at least in part, to mollify angry news outlets who will likely call foul for making hard news share space with blogs. Still, the webscape has changed. All but the most cursory glance at the headlines will yield a richly confusing array of mainstream and grassroots sources.

(story, Yahoo Search Blog)

(thoughtful analysis from Tim Porter)

Posted by ben vershbow at 10:19 AM | Comments (1)
tags: Publishing, Broadcast, and the Press , RSS , aggregation , blog , blogging , blogs , citizen_journalism , journalism , media , msm , news , newspaper , portal , search , syndication , yahoo , yahoo!

google dystopia Post date  10.10.2005, 10:06 AM

Google as big brother -- the paranoia certainly seems to be creeping into the mainstream. "Op-Art" by Randy Siegel from today's NY Times:

google 2084.jpg

Posted by ben vershbow at 10:06 AM | Comments (0)
tags: 1984 , 2084 , Libraries, Search and the Web , NYTimes , Online , algorithm , art , cartoon , dystopia , editorial , google , information , internet , newspaper , orwell , paranoia , privacy , satire , search , technology , web

human versus algorithm Post date  09.29.2005, 3:40 PM

I just came across Common Times, a new community-generated news aggregation page, part of something called the Common Media Network, that takes the social bookmarking concept of del.icio.us and applies it specifically to news gathering. Anyone can add a story from any source to a series of sections (which seem pre-set and non-editable) arranged on a newspaper-style "front page." You add links through a bookmarklet on the links bar on your browser. Whenever you come across an article you'd like to submit, you just click the button and a page comes up where you can enter the metadata like tags and comments. Each user has a "channel" - basically a stripped-down blog - where all their links are displayed chronologically with an RSS feed, giving individuals a venue to show their chops as news curators and annotators. You can set it up so links are posted simultaneously to a del.icio.us account (there's also a Firefox extension that allows you to post stories directly from Bloglines).

commontimes.jpg

Human aggregation is often more interesting than what the Google News algorithm can turn up, but it can easily mould to the biases of the community. Of course, search algorithms are developed by people, and source lists don't just manufacture themselves (Google is notoriously tight-lipped about its list of news sources). In the case of something like Common Times, a slick new web application hyped on Boing Boing and other digital culture sites, the communities can be rather self-selecting. Still, this is a very interesting experiment in multi-player annotation. When I first arrived at the front page, not yet knowing how it all worked, I was impressed by the fairly broad spread of stories. And the tag cloud to the right is an interesting little snapshot of the zeitgeist.

(via Infocult)

Posted by ben vershbow at 03:40 PM | Comments (0)
tags: Publishing, Broadcast, and the Press , aggregator , algorithm , bibliography , blog , blogging , bookmarking , del.icio.us , delicious , folksonomy , google , journalism , media , news , newspaper , search , socialsoftware , tag , tagging , tags

yahoo! hires finance writers Post date  09.27.2005, 11:45 AM

Following Kevin Sites in the Hot Zone, Yahoo! takes another step in its transformation into original content provider (see Wall Street Journal - free), though they say they have no intention of becoming a full-fledged news service.

Yahoo's move suggests increased specialization and atomization of news media on the web, as full-fledged news services find it increasingly hard to stay afloat (as the recent wave of staff cuts at major papers suggests). As newspapers agonize over how to make more money from their websites (e.g. Times Select), companies with diverse revenue bases (like the big search portals) will find it a lot easier to deliver the news. But it will be a stripped down service, heavy on features. Can the news media as public trust survive this process of atomization? Or was the idea of a public trust always a fairy tale?

Posted by ben vershbow at 11:45 AM | Comments (0)
tags: Publishing, Broadcast, and the Press , finance , financial , journalism , media , news , newspaper , portal , search , syndication , trust , writing , yahoo , yahoo!

the database of intentions Post date  09.16.2005, 11:16 AM

Interesting edition of Open Source last week on "Google Sociology" with David Weinberger and John Battelle, author of the just-published "The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture". Listen here.

Weinberger has some interesting things to say about Google (and the other search engines) as "publishers." I have some thoughts on that too. More to come later.

Battelle has done a great deal of thinking on search from a variety of angles: the technology of search, the economics of search, and the more esoteric dimensions of a "search" culture. He touches briefly on this last point, laying out a construct that is probably treated more extensively in his book: the "database of intentions." By this he means the archive, or "artifact," of the world's search queries. A picture of the collective consciousness formed by the questions everyone is asking. Even now, when logged in to Google, a history of all your search query strings is kept - your own database of intentions. The potential value of this database is still being determined, but obvious uses are targeted advertising, and more relevant search results based on analysis of search histories.

As regards the collective database of intentions, Battelle speculates that future advances in artificial intelligence will likely draw on this enormous crop of information about how humans think and seek.

Posted by ben vershbow at 11:16 AM | Comments (0)
tags: Libraries, Search and the Web , Online , algorithm , audio , battelle , database , google , internet , listen , opensource , podcast , radio , radioopensource , search , searchengine , web , weinberger

google blog search - still a long way to go Post date  09.14.2005, 5:01 PM

Google's new blog search engine reminds me of how far we still have to go with blog search. The engine works much the same way as Google's general web search - with keywords and page ranking - only here it's searching RSS feeds. Recent posts with keyword matches fill the column, and a few links to related blogs come up at the top. But there's the rub. These so-called "related" blogs are only related by direct keyword matches in their title tagline. I just searched "poetry" and came up with only three related blogs. C'mon. A search for "gossip" turns up only one related blog - "Starbucks Gossip". There has to be some kind of promotion going on here, though their "about" page mentions nothing of the kind.

A good engine would be capable of searching blogs by their subject, their preoccupation, their obsession. Many blogs could be considered "general," but just as many have a special focus, and readers are often searching with a particular theme in mind. They don't just want a list of transient posts, but whole sites that might potentially become regular destinations. Many blogs are valuable publications that prove themselves day after day. But blog search hasn't yet grown beyond the trendy "what's the latest chatter on the blogosphere" mode.

I do have to give credit to Technorati. Glitchy as it is, they're trying to think of creative ways - tagging, author-determined keywords - to help readers find interesting blogs and authors their audience. Then again, my greatest finds have usually been from other blogs. Humans will always be the smartest aggregators.

People out there, what do you use?

Posted by ben vershbow at 05:01 PM | Comments (2)
tags: RSS , blog , blogger , blogging , blogs , blogsearch , feeds , feedster , googlblogsearch , google , pubsub , search , technorati , xml

yahoo! experiments with multimedia journalism Post date  09.12.2005, 10:36 AM

Yahoo! has enlisted tele-journalist and blogger Kevin Sites to produce a one-year web program chronicling the world's conflict zones in multimedia format.

hotzone.jpg

Sites has become known for his jaunts as a "solo journalist," trundling from hot spot to hot spot with a backpack full of gadgetry, beaming reports from his one-man broadcast station. It's a formula that is tailor-made for the web. Clearly, Yahoo! was paying attention. The NY Times reports on "Kevin Sites In the Hot Zone":

As he travels to these places, Mr. Sites will write a 600- to 800-word dispatch each day and produce a slide show of 5 to 10 digital photographs. He will also narrate audio travelogues. There will be several forms of video - relatively unedited footage posted several times a week, and once a week, a more traditional video report, edited in the style of a network news broadcast.

Mr. Sites will also be the host of regular online chats with Yahoo users who will be able to post comments on message boards. And he will post quick text messages on the site updating his activities throughout the day.

Counting on war and carnage as a surefire crowd draw, Yahoo! makes a rather tawdry entrance into independent journalism. But this is a very significant move nonetheless, evidence that Yahoo! is evolving into a full-fledged media company, and suggesting that the one-man-band approach to journalism and webcast might become a regular thing. If the Sites show finds an audience, they should try out serious investigative reporting or medium-length documentary.

Posted by ben vershbow at 10:36 AM | Comments (0)
tags: Online , Publishing, Broadcast, and the Press , blogger , blogging , broadcast , conflict , hotzone , internet , journalism , kevinsites , media , news , reporter , search , sites , war , web , yahoo , yahoo!

fingerprinting text in the age of cut-and-paste Post date  09.06.2005, 8:05 AM

LexisNexis has installed new software for detecting plagiarism. As described on their site:

LexisNexis CopyGuard uses pattern-matching technology to identify suspect passages in submitted documents. An easy-to-read report underlines and color codes questionable sentences, with links to the original sources.

This could be an important tool for assuring integrity not only in professional journalism, but also in the emerging class of amateur reporters. But apply it to blogs and CopyGuard might overload and shut down. Bloggers are constantly recycling text, often without clear attribution, or obvious demarcation between quote and original commentary. The bounds of plagiarism seem a bit less clear when you consider that cutting and pasting is one of the main ways we converse online.
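For the curious, here is a toy sketch of what "fingerprinting" text can look like. It is not how CopyGuard works (its internals aren't described here); it simply assumes one common approach, comparing overlapping runs of five words ("shingles") between two texts. The function names and the five-word window are made up for illustration.

```python
import re


def shingles(text, n=5):
    """Return the set of overlapping n-word runs in text (lowercased, punctuation ignored)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap(a, b, n=5):
    """Jaccard similarity of the two shingle sets: 0.0 means nothing shared, 1.0 means identical."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        # Texts shorter than n words have no shingles to compare.
        return 0.0
    return len(sa & sb) / len(sa | sb)


def shared_passages(a, b, n=5):
    """Return the n-word runs that appear in both texts, as sorted strings."""
    return sorted(" ".join(s) for s in shingles(a, n) & shingles(b, n))


if __name__ == "__main__":
    source = "The shift from oral to written language froze speech into stable strings"
    post = "As one blogger put it, the shift from oral to written language froze speech, more or less."
    print(overlap(source, post))
    for run in shared_passages(source, post):
        print(run)
```

Even this crude version flags a pasted sentence fragment, which is exactly why a typical blog post, stitched together from quotes, would light it up.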

(NY Times has story)

Posted by ben vershbow at 08:05 AM | Comments (1)
tags: DRM , archive , copyleft , copyright , journalism , lexisnexis , nexis , plagiarism , plagiarize , search

tower of babel or trivial pursuit? Post date  12.20.2004, 3:59 PM

Read New York Times Article
In an article in yesterday’s NY Times, Alberto Manguel compares the Genesis story of Babel and the ambitions of the library at Alexandria with their alleged modern-day counterpart—Google’s commitment to digitize all human knowledge. Are we constructing a modern-day tower of Babel—a monument to the hubris of what might be possible if we could just get a little smarter? Will Google help us find answers to perennial puzzlers like: where did we come from? Is anyone or anything in charge? And, what’s the meaning of it all? I went online to find out. I Googled the question “What is the meaning of it all?” and got the following:

The Meaning of Emmanuel
... "What is the meaning of it all?" "What is its purpose?" The human tendency always is to forget origins. And now that Christmas has grown to be such a ...

The Kubrick Site: John Morgan on 2001 vs. 2010
... What is the meaning of it all? Is there a God? What is the purpose of Art? Is there a merging of Art and Science?' Where Clarke in comparison only asks ...

The meaning of life, the universe and everything
... What is the meaning of it all? 'Antennae' colliding galaxies. When we contemplate the unimaginable vastness of the universe, the incredible diversity ...

London theater musical on stage in London's West End Shaftesbury ...
... But what is the meaning of it all? Well, mainly that the dreamy idealist, Boney, had all he needed in Anastasia Barzee’s sweetly trilling Jo and never ...

'Rings' actor: 'It'll be the biggest film of all time'
... What is the meaning of it all? In some ways, that sort of inquiry is completely unfashionable. "I often think one of the reasons people are dismissive ...

Becoming a Wise Elder
... Questions such as "What is the meaning of it all?" and "Does my life make any kind of difference to anyone?" were very unlikely to arise. ...

Psychology Today: Still news
... PT: What is the meaning of it all now? BB: There was a recklessness in Kennedy's life that I didn't see, a sexual recklessness I don't understand. ...

None of these offerings brought me closer to a substantive answer. Demoralized by the thought of having to go through the other 517 possibilities, I decided to respond to the suggestion at the top of my page:
Tip: Have a question? Ask the researchers at Google Answers.

I clicked "Google Answers" and entered my question: What is the meaning of it all?

Then I had to set a price for my question between $2 and $200. I clicked on “How do I price my question?” and found the following guidelines:
*The more you pay, the more time and effort a Researcher will likely spend on your answer. However, this depends somewhat on the nature of your question.
*Above all - try to pay what the information is worth to you, not what you think you can get it for - that is the best way to get a good answer - but only you can know the value of the information you seek.

Hmm, what is the information worth to me?

I took a look at Google’s examples to get an idea of where my question might fit on the pay scale. Fifty dollars is the “minimum price appropriate for complex, multi-part questions. Researchers will typically spend at least one hour on $50 questions and be very responsive to follow-up questions.” One hundred dollar questions merit two to four hours of “highly thorough research.” Examples of hundred dollar questions included “Parking in New York City” and “How does infant-family bonding develop?” The two hundred dollar question required researchers to “spend extensive amounts of time (4 hours plus).” Examples of $200 questions included: “Searching for Barrett's Ginger Beer,” “Applications using databases,” and “What is the impact of a baby with Down's Syndrome on its family?”

None of those examples seemed to be in the same league with “what’s the meaning of it all?” Can a Google researcher find the answer in 4 hours? Probably not, although I do wonder what they would come up with. Anyway, the point of all this is that Google is set up to search out trivial, quotidian sorts of things and it will be interesting to see how/if they can make the transition from those who can tell you how to “search for Barrett’s Ginger Beer,” to gatekeepers of all human knowledge.

Posted by Kim White at 03:59 PM | Comments (0)
tags: Libraries, Search and the Web , babel , google , internet , meaning , search , semantic_web , web

enter the cybrarian Post date  12.18.2004, 3:02 PM

inside1-googling-libraries.jpg The recent buzz surrounding Google's library initiative has everyone talking about the future of research, which inevitably raises the question: how will the digitization of library collections change the role of the librarian? I would guess that, far from becoming obsolete, their role will in fact be elevated in importance, if not necessarily in status. They could very well come to be our indispensable guides through the labyrinth - if perhaps invisible, engineering behind the digital walls.

It's also important to consider the question of visualization. When you run a search on Google you are given an enormous list. This is already deeply ingrained in the day-to-day business of finding information. But these lists are basically the electronic equivalent of scrolls, with the items algorithmically determined to be most relevant placed at the top. But sooner or later we have to admit that using scrolls for this kind of business is ludicrous. There has to be a better way of arraying these vast harvests of information in a way that allows the researcher to zoom across degrees of specificity and through associative chains of context and meaning. I see no reason why a search shouldn't take place in some kind of virtual library, emulating the physical architecture of research settings, and allowing for some of the associative or accidental echoes that so often enrich a paper trail blazed through a brick-and-mortar library. Or cannot knowledge resemble a tree, or an arterial matrix? Must we be bound to the scroll?

Returning to the question of the librarian's role, I recalled this passage from James J O'Donnell's 1996 paper The Pragmatics of the New: Trithemius, McLuhan, Cassiodorus:

"The librarians of the world have, moreover, already led the way, for academics at least, into the new information environment, not least because they are caught between rising demand from their customers (faculty and students) and rising supply and prices from their suppliers, and so have already been making reality-based decisions about ownership versus access, print versus electronics, and so on. In short, they are just now our leading pragmatists. Can we imagine a time in our universities when the librarians are the well-paid principals and the teachers their mere acolytes in a distribution chain? I do not think we can or should rule out that possibility for a moment"

oldgoogle.jpg

Related articles:

"Questions and Praise for Google Web Library" - NY Times
"Google's library plan 'a huge help'" - USA Today
"Making books readable on computer proves trying task" - USA Today

Also, I found this on Searchblog. For a trip down memory lane, check out the original Google in the Stanford archives (click on picture to right). Unfortunately, although it seems interactive, a search just brings up a bunch of stylesheets.

Posted by ben vershbow at 03:02 PM | Comments (0)
tags: Libraries, Search and the Web , google , google_print , information_architecture , infoviz , librarian , library , michigan , search

google and big brother Post date  12.15.2004, 7:35 PM

Can Google remain true to its promise, "don't be evil," now that it has shareholders to worry about, advertisers to please, and an ever-increasing reach into the repositories of human knowledge? Google still gives you that warm and fuzzy feeling. It's got the goofy name, those cute seasonal tailorings of its masthead, the lava lamps. And this is not to mention the various amusing pastimes - the "Google Whack" game in which you try to find two words that cohabit only one of the search engine's eight billion web pages; or every writer's guilty pleasure, the Googling of the self, the "auto-Google," that delicious act of cyber-onanism.

But where might it lead? One day, when I open my fridge, might a sensor not read my searching eye and know that I am looking for milk? And knowing that I have run out, suggest an array of retailers who might be able to replenish my supply? Could Google come to mediate every exchange of information, no matter how inane, or how carnal?

Or could it come to resemble something like the Central Intelligence Corporation in Neal Stephenson's Snow Crash - a cross between the CIA, the Library of Congress, and DARPA's "Total Information Awareness" program?

MercuryNews.com | 12/14/2004 | Does Google move augur commercialization of libraries?

Posted by ben vershbow at 07:35 PM | Comments (0)
tags: Libraries, Search and the Web , evil , google , internet , library , library_of_congress , neal_stephenson , privacy , search , surveillance , web