Listing entries tagged with search
a few rough notes on knols
12.17.2007, 5:06 PM
Think you've got an authoritative take on a subject? Write up an article, or "knol," and see how the Web judgeth. If it's any good, you might even make a buck.
Google's new encyclopedia will go head to head with Wikipedia in the search rankings, though in format it more resembles other ad-supported, single-author info sources like About.com or Squidoo. The knol-verse (how the hell do we speak of these things as a whole?) will be a Darwinian writers' market where the fittest knols rise to the top. Anyone can write one. Google will host it for free. Multiple knols can compete on a single topic. Readers can respond to and evaluate knols through simple community rating tools. Content belongs solely to the author, who can license it in any way he/she chooses (all rights reserved, Creative Commons, etc.). Authors have the option of having contextual ads run to the side, revenues from which are shared with Google. There is no vetting or editorial input from Google whatsoever.
Except... Might not the ads exert their own subtle editorial influence? In this entrepreneurial writers' fray, will authors craft their knols for AdSense optimization? Will they become, consciously or not, shills for the companies that place the ads (I'm thinking especially of high impact topic areas like health and medicine)? Whatever you may think of Wikipedia, it has a certain integrity in being ad-free. The mission is clear and direct: to build a comprehensive free encyclopedia for the Web. The range of content has no correlation to marketability or revenue potential. It's simply a big compendium of stuff, the only mention of money being a frank electronic tip jar at the top of each page. The Googlepedia, in contrast, is fundamentally an advertising platform. What will such an encyclopedia look like?
In the official knol announcement, Udi Manber, a VP for engineering at Google, explains the genesis of the project: "The challenge posed to us by Larry, Sergey and Eric was to find a way to help people share their knowledge. This is our main goal." You can see embedded in this statement all the trademarks of Google's rhetoric: a certain false humility, the pose of incorruptible geek integrity and above all, a boundless confidence that every problem, no matter how gray and human, has a technological fix. I'm not saying it's wrong to build a business, nor that Google is lying whenever it talks about anything idealistic; it's just that time and again Google displays an astonishing lack of self-awareness in the way it frames its services, a lack that becomes especially obvious whenever the company edges into content creation and hosting. They tend to talk as though they're building the library of Alexandria or the great Encyclopédie, but really they're describing an advanced advertising network of Google-exclusive content. We shouldn't allow these very different things to become as muddled in our heads as they are in theirs. You get a worrisome sense that, like the Bushies, the cheerful software engineers who promote Google's products on the company's various blogs truly believe the things they're saying: that if we can just get the algorithm right, the world can bask in the light of universal knowledge.
The blogosphere has been alive with commentary about the knol situation throughout the weekend. By far the most provocative thing I've read so far is by Anil Dash, VP of Six Apart, the company that makes the Movable Type software that runs this blog. Dash calls out this Google self-awareness gap, or as he puts it, its lack of a "theory of mind":
Theory of mind is that thing that a two-year-old lacks, which makes her think that covering her eyes means you can't see her. It's the thing a chimpanzee has, which makes him hide a banana behind his back, only taking bites when the other chimps aren't looking. Theory of mind is the awareness that others are aware, and its absence is the weakness that Google doesn't know it has. This shortcoming exists at a deep cultural level within the organization, and it keeps manifesting itself in the decisions that the company makes about its products and services. The flaw is one that is perpetuated by insularity, and will only be remedied by becoming more open to outside ideas and more aware of how people outside the company think, work and live.
He gives some examples:
Connecting PageRank to economic systems such as AdWords and AdSense corrupted the meaning and value of links by turning them into an economic exchange. Through the turn of the millennium, hyperlinking on the web was a social, aesthetic, and expressive editorial action. When Google introduced its advertising systems at the same time as it began to dominate the economy around search on the web, it transformed a basic form of online communication, without the permission of the web's users, and without explaining that choice or offering an option to those users.
He compares the knol enterprise with GBS:
Knol shares with Google Book Search the problem of being both indexed by Google and hosted by Google. This presents inherent conflicts in the ranking of content, as well as disincentives for content creators to control the environment in which their content is published. This necessarily disadvantages competing search engines, but more importantly eliminates the ability for content creators to innovate in the area of content presentation or enhancement. Anything that is written in Knol cannot be presented any better than the best thing in Knol. [his emphasis]
And lastly concludes:
An awareness of the fact that Google has never displayed an ability to create the best tools for sharing knowledge would reveal that it is hubris for Google to think they should be a definitive source for hosting that knowledge. If the desire is to increase knowledge sharing, and the methods of compensation that Google controls include traffic/attention and money/advertising, then a more effective system than Knol would be to algorithmically determine the most valuable and well-presented sources of knowledge, identify the identity of authorities using the same journalistic techniques that the Google News team will have to learn, and then reward those sources with increased traffic, attention and/or monetary compensation.
For a long time Google's goal was to help direct your attention outward. Increasingly we find that they want to hold onto it. Everyone knows that Wikipedia articles place highly in Google search results. Makes sense then that they want to capture some of those clicks and plug them directly into the Google ad network. But already the Web is dominated by a handful of mega sites. I get nervous at the thought that www.google.com could gradually become an internal directory, that Google could become the alpha and omega, not only the start page of the Internet but all the destinations.
It will be interesting to see just how and to what extent knols start creeping up the search results. Presumably, they will be ranked according to the same secret metrics that measure all pages in Google's index, but given the opacity of their operations, who's to say that subtle or unconscious rigging won't occur? Will community ratings factor in search rankings? That would seem to present a huge conflict of interest. Perhaps top-rated knols will be displayed in the sponsored links area at the top of results pages. Or knols could be listed in order of community ranking on a dedicated knol search portal, providing something analogous to the experience of searching within Wikipedia as opposed to finding articles through external search engines. Returning to the theory of mind question, will Google develop enough awareness of how it is perceived and felt by its users to strike the right balance?
One last thing worth considering about the knol (apart from its being possibly the worst Internet neologism in recent memory) is its author-centric nature. It's interesting that in order to compete with Wikipedia Google has consciously not adopted Wikipedia's model. The basic unit of authorial action in Wikipedia is the edit. Edits by multiple contributors are combined, through a complicated consensus process, into a single amalgamated product. On Google's encyclopedia the basic unit is the knol. For each knol (god, it's hard to keep writing that word) there is a one-to-one correspondence with an individual, identifiable voice. There may be multiple competing knols, and by extension competing voices (you have this on Wikipedia too, but it's relegated to the discussion pages).
Viewed in this way, Googlepedia is perhaps a more direct rival to Larry Sanger's Citizendium, which aims to build a more authoritative Wikipedia-type resource under the supervision of vetted experts. Citizendium is a strange, conflicted experiment, a weird cocktail of Internet populism and ivory tower elitism, and, by the look of it, not going anywhere terribly fast. If knols take off, could they be the final nail in the coffin of Sanger's awkward dream? Bryan Alexander wonders along similar lines.
While not explicitly employing Sanger's rhetoric of "expert" review, Google seems to be banking on its commitment to attributed solo authorship and its ad-based incentive system to lure good, knowledgeable authors onto the Web, and to build trust among readers through the brand-name credibility of authorial bylines and brandished credentials. Whether this will work remains to be seen. I wonder... whether this system will really produce quality. Whether there are enough checks and balances. Whether the community rating mechanisms will be meaningful and confidence-inspiring. Whether self-appointed experts will seem authoritative in this context or shabby, second-rate and opportunistic. Whether this will have the feeling of an enlightened knowledge project or of sleazy intellectual link farming (or something perfectly useful in between).
The feel of a site (the values it exudes) is an important factor, though. This is why I like, and in an odd way trust, Wikipedia. Trust it not always to be correct, but to be transparent, to wear its flaws on its sleeve, and to be working for a higher aim. Google will probably never inspire that kind of trust in me, certainly not while it persists in its dangerous self-delusions.
A lot of unknowns here. Thoughts?
Posted by ben vershbow at 5:06 PM
tags: authorship , citizendium , copyright , encyclopedia , google , publishing , search , wikipedia , writing
"digitization and its discontents"
11.06.2007, 8:14 AM
Anthony Grafton's New Yorker piece "Future Reading" paints a forbidding picture of the global digital library currently in formation on public and private fronts around the world (Google et al.). The following quote, a refreshing counterpoint to the millenarian hype we so often hear w/r/t mass digitization, sums it up well:
The supposed universal library, then, will be not a seamless mass of books, easily linked and studied together, but a patchwork of interfaces and databases, some open to anyone with a computer and WiFi, others closed to those without access or money. The real challenge now is how to chart the tectonic plates of information that are crashing into one another and then to learn to navigate the new landscapes they are creating. Over time, as more of this material emerges from copyright protection, we'll be able to learn things about our culture that we could never have known previously. Soon, the present will become overwhelmingly accessible, but a great deal of older material may never coalesce into a single database. Neither Google nor anyone else will fuse the proprietary databases of early books and the local systems created by individual archives into one accessible store of information. Though the distant past will be more available, in a technical sense, than ever before, once it is captured and preserved as a vast, disjointed mosaic it may recede ever more rapidly from our collective attention.
Grafton begins and ends in a nostalgic tone, with a paean to the New York Public Library and the critic Alfred Kazin: the poor son of immigrants, City College-educated, who researched his seminal study of American literature, On Native Grounds, almost entirely with materials freely available at the NYPL. Clearly, Grafton is a believer in the civic ideal of the public library (a reservoir of knowledge, free to all), and this animates his critique of the balkanized digital landscape of search engines and commercial databases. Given where he appears to stand, I wish he could have taken a stab at what a digital public library might look like, and what sorts of technical, social, political and economic reorganization might be required to build it. Obviously, these are questions that would have required their own article, but it would have been valuable for Grafton, whose piece is one of those occasional journalistic events that moves the issue of digitization and the future of libraries out of the specialist realm into the general consciousness, to have connected the threads. Instead Grafton ends what is overall a valuable and intelligent article with a retreat into print fetishism ("crowded public rooms where the sunlight gleams on varnished tables....millions of dusty, crumbling, smelly, irreplaceable documents and books"), which, while evocative, obscures more than it illuminates.
Incidentally, those questions are precisely what was discussed at our Really Modern Library meetings last month. We're still compiling our notes but expect a report soon.
Posted by ben vershbow at 8:14 AM
tags: books , digitization , google , google_book_search , libraries , reallymodernlibrary , search
all the news that's fit to search
09.18.2007, 12:15 AM
Placing a long-term bet on online advertising and the power of search engines, the New York Times will, effective tomorrow, close down its two-year-old "Select" subscription service (which was actually making money for the paper) and open up access to columnists, Select blogs, and archives from 1987 to the present, and 1851 to 1922. Nice!
From PaidContent, quoting the Times' own coverage:
"The change is because of what's happened in the internet in the past two years, particularly the power of search." She [Vivian Schiller, senior vp and general manager of nytimes.com] added later: "Think about this recipe: millions and millions of new documents, all seo'd [search engine optimized], double-digit advertising growth." The Times expects "the scale and the power of the revenue that would come from that over time" to replace the subscriptions revenue and then some.
Posted by ben vershbow at 12:15 AM
tags: advertising , journalism , newspaper , newyorktimes , search
google news adds an interesting (and risky) editorial layer
08.10.2007, 9:19 AM
Starting this week, Google News will publish comments alongside linked stories from "a special subset of readers: those people or organizations who were actual participants in the story in question."
John Murrell and Steve Rubel have good analyses of why moving beyond pure aggregation is a risky move for Google, whose relationship with news content owners is already tense to say the least.
Posted by ben vershbow at 9:19 AM
tags: copyright , editorial , google , journalism , search
six blind men and an elephant
07.03.2007, 9:23 AM
Thomas Mann, author of The Oxford Guide to Library Research, has published an interesting paper (pdf available) examining the shortcomings of search engines and the continued necessity of librarians as guides for scholarly research. It revolves around the case of a graduate student investigating tribute payments and the Peloponnesian War. A Google search turns up nearly 80,000 web pages and 700 books: an overwhelming retrieval with little in the way of conceptual organization and only the crudest of tools for measuring relevance. But, with the help of the LC Catalog and an electronic reference encyclopedia database, Mann manages to guide the student toward a manageable batch of about a dozen highly germane titles.
Summing up the problem, he recalls a charming old fable from India:
Most researchers - at any level, whether undergraduate or professional - who are moving into any new subject area experience the problem of the fabled Six Blind Men of India who were asked to describe an elephant: one grasped a leg and said "the elephant is like a tree"; one felt the side and said "the elephant is like a wall"; one grasped the tail and said "the elephant is like a rope"; and so on with the tusk ("like a spear"), the trunk ("a hose") and the ear ("a fan"). Each of them discovered something immediately, but none perceived either the existence or the extent of the other important parts - or how they fit together. Finding "something quickly," in each case, proved to be seriously misleading to their overall comprehension of the subject.
In a very similar way, Google searching leaves remote scholars, outside the research library, in just the situation of the Blind Men of India: it hides the existence and the extent of relevant sources on most topics (by overlooking many relevant sources to begin with, and also by burying the good sources that it does find within massive and incomprehensible retrievals). It also does nothing to show the interconnections of the important parts (assuming that the important can be distinguished, to begin with, from the unimportant).
Mann believes that books will usually yield the highest quality returns in scholarly research. A search through a well-tended library catalog (controlled vocabularies, strong conceptual categorization) will necessarily produce a smaller, and therefore less overwhelming, set of returns than a search engine (books do not proliferate at the same rate as web pages). And those returns, pound for pound, are more likely to be relevant to the topic:
Each of these books is substantially about the tribute payments - i.e., these are not just works that happen to have the keywords "tribute" and "Peloponnesian" somewhere near each other, as in the Google retrieval. They are essentially whole books on the desired topic, because cataloging works on the assumption of "scope-match" coverage - that is, the assigned LC headings strive to indicate the contents of the book as a whole....In focusing on these books immediately, there is no need to wade through hundreds of irrelevant sources that simply mention the desired keywords in passing, or in undesired contexts. The works retrieved under the LC subject heading are thus structural parts of "the elephant" - not insignificant toenails or individual hairs.
If nothing else, this is a good illustration of how libraries, if used properly, can still be much more powerful than search engines. But it's also interesting as a librarian's perspective on what makes the book uniquely suited for advanced research. That is, a book is substantial enough to be a "structural part" of a body of knowledge: "whole books" serve as rungs on a ladder toward knowing something. Books are a kind of conceptual architecture that, until recently, has been distinctly absent on the Web (though from the beginning certain people and services have endeavored to organize the Web meaningfully). Mann's study captures the anxiety felt at the prospect of the book's decline (the great coming blindness), and also the librarian's understandable dread at having to completely rethink his/her way of organizing things.
It's possible, however, to agree with the diagnosis and not the prescription. True, librarians have gotten very good at organizing books over time, but that's not necessarily how scholarship will be produced in the future. David Weinberger ponders this:
As an argument for maintaining human expertise in manually assembling information into meaningful relationships, this paper is convincing. But it rests on supposing that books will continue to be the locus of worthwhile scholarly information. Suppose more and more scholars move onto the Web and do their thinking in public, in conversation with other scholars? Suppose the Web enables scholarship to outstrip the librarians? Manual assemblages of knowledge would retain their value, but they would no longer provide the authoritative guide. Then we will have either of two results: We will have to rely on "'lowest common denominator' and 'one search box/one size fits all' searching that positively undermines the requirements of scholarly research"...or we will have to innovate to address the distinct needs of scholars....My money is on the latter.
As I think is mine. Although I would not rule out the possibility of scholars actually participating in the manual assemblage of knowledge. Communities like MediaCommons could to some extent become their own libraries, vetting and tagging a wide array of electronic resources, developing their own customized search frameworks.
There's much more in this paper than I've discussed, including a lengthy treatment of folksonomies (Mann sees them as a valuable supplement but not a substitute for controlled taxonomies). Generally speaking, his articulation of the big challenges facing scholarly search and librarianship in the digital age is well worth the read, although I would argue with some of the conclusions.
Posted by ben vershbow at 9:23 AM
tags: academic , books , folksonomies , google , libraries , library , mediacommons , research , search
the people's card catalog (a thought)
05.30.2007, 1:31 PM
New partners and new features. Google has been busy lately building up Book Search. On the institutional end, Ghent, Lausanne and Mysore are among the most recent universities to hitch their wagons to the Google library project. On the user end, the GBS feature set continues to expand, with new discovery tools and more extensive "about" pages gathering a range of contextual resources for each individual volume.
Recently, they extended this coverage to books that haven't yet been digitized, substantially increasing the findability, if not yet the searchability, of thousands of new titles. The about pages are similar to Amazon's, which supply book browsers with things like concordances, "statistically improbable phrases" (tags generated automatically from distinct phrasings in a text), textual statistics, and, best of all, hot-linked lists of references to and from other titles in the catalog: a rich bibliographic network of interconnected texts (Bob wrote about this fairly recently). Google's pages do much the same thing but add other valuable links to retailers, library catalogues, reviews, blogs, scholarly resources, Wikipedia entries, and other relevant sites around the net (an example). Again, many of these books are not yet full-text searchable, but collecting these resources in one place is highly useful.
It makes me think, though, how sorely an open source alternative to this is needed. Wikipedia already has reasonably extensive articles about various works of literature. LibraryThing has built a terrific social architecture for sharing books. There are a great number of other freely accessible resources around the web: scholarly database projects, public domain e-libraries, CC-licensed collections, library catalogs.
Could this be stitched together into a public, non-proprietary book directory, a People's Card Catalog? A web page for every book, perhaps in wiki format, with detailed bibliographic profiles, history, links, citation indices, social tools, visualizations, and ideally a smart graphical interface for browsing it. In a network of books, each title ought to have a stable node to which resources can be attached and from which discussions can branch. So far Google is leading the way in building this modern bibliographic system, and stands to turn the card catalogue of the future into a major advertising cash nexus. Let them do it. But couldn't we build something better?
Posted by ben vershbow at 1:31 PM
tags: books , ebooks , google , google_book_search , libraries , library , opensource , search , the_networked_book , wiki , wikipedia
emerging libraries at rice: day one
03.06.2007, 1:16 AM
For the next few days, Bob and I will be at the De Lange "Emerging Libraries" conference hosted by Rice University in Houston, TX, coming to you live with occasional notes, observations and overheard nuggets of wisdom. Representatives from some of the world's leading libraries are here: the Library of Congress, the British Library, the new Bibliotheca Alexandrina, as well as the architects of recent digital initiatives like the Internet Archive, arXiv.org and the Public Library of Science. A very exciting gathering indeed.
We're here, at least in part, with our publisher hats on, thinking quite a lot these days about the convergence of scholarly publishing with digital research infrastructure (i.e. MediaCommons). It was fitting then that the morning kicked off with a presentation by Richard Baraniuk, founder of the open access educational publishing platform Connexions. Connexions, which last year merged with the digitally reborn Rice University Press, is an innovative repository of CC-licensed courses and modules, built on an open volunteer basis by educators and freely available to weave into curricula and custom-designed collections, or to remix and recombine into new forms.
Connexions is designed not only as a first-stop resource but as a foundational layer upon which richer and more focused forms of access can be built. Foremost among those layers of course is Rice University Press, which, apart from using the Connexions publishing framework will still operate like a traditional peer review-driven university press. But other scholarly and educational communities are also encouraged to construct portals, or "lenses" as they call them, to specific areas of the Connexions corpus, possibly filtered through post-publication peer review. It will be interesting to see whether Connexions really will end up supporting these complex external warranting processes or if it will continue to serve more as a building block repository -- an educational lumber yard for educators around the world.
Constructive crit: there's no doubt that Connexions is one of the most important and path-breaking scholarly publishing projects out there, though it still feels to me more like backend infrastructure than a fully developed networked press. It has a flat, technical-feeling design and cookie cutter templates that give off a homogeneous impression in spite of the great diversity of materials. The social architecture is also quite limited, and what little is there (ways to suggest edits and discussion forums attached to modules) is not well integrated with course materials. There's an opportunity here to build more tightly knit communities around these offerings -- lively feedback loops to improve and expand entries, areas to build pedagogical tutorials and to collect best practices, and generally more ways to build relationships that could lead to further collaboration. I got to chat with some of the Connexions folks and the head of the Rice press about some of these social questions and they were very receptive.
* * * * *
Michael A. Keller of Stanford spoke of emerging "cybraries" and went through some very interesting and very detailed elements of online library search that I'm too exhausted to summarize now. He capped off his talk with a charming tour through the Stanford library's Second Life campus and the library complex on Information Island. Keller said he ultimately doesn't believe that purely imitative virtual worlds will become the principal interface to libraries but that they are nonetheless a worthwhile area for experimentation.
Browsing during the talk, I came across an interesting and similarly skeptical comment by Howard Rheingold on a long-running thread on Many 2 Many about Second Life and education:
I've lectured in Second Life, complete with slides, and remarked that I didn't really see the advantage of doing it in SL. Members of the audience pointed out that it enabled people from all over the world to participate and to chat with each other while listening to my voice and watching my slides; again, you don't need an immersive graphical simulation world to do that. I think the real proof of SL as an educational medium with unique affordances would come into play if an architecture class was able to hold sessions within scale models of the buildings they are studying, if a biochemistry class could manipulate realistic scale-model simulations of protein molecules, or if any kind of lesson involving 3D objects or environments could effectively simulate the behaviors of those objects or the visual-auditory experience of navigating those environments. Just as the techniques of teleoperation that emerged from the first days of VR ended up as valuable components of laparoscopic surgery, we might see some surprise spinoffs in the educational arena. A problem there, of course, is that education systems suffer from a great deal more than a lack of immersive environments. I'm not ready to write off the educational potential of SL, although, as noted, the importance of that potential should be seen in context. In this regard, we're still in the early days of the medium, similar to cinema in the days when filmmakers nailed a camera tripod to a stage and filmed a play; SL needs D.W. Griffiths to come along and invent the equivalent of close-ups, montage, etc.
Rice too has some sort of Second Life presence and apparently was beaming the conference into Linden land.
* * * * *
Next came a truly mind-blowing presentation by Noha Adly of the Bibliotheca Alexandrina in Egypt. Though only five years old, the BA casts itself quite self-consciously as the direct descendant of history's most legendary library, the one so frequently referenced in contemporary utopian rhetoric about universal digital libraries. The new BA glories in this old-new paradigm, stressing continuity with its illustrious past and at the same time envisioning a breathtakingly modern 21st century institution unencumbered by the old thinking and constrictive legacies that have so many other institutions tripping over themselves into the digital age. Adly surveyed more fascinating-sounding initiatives, collections and research projects than I can possibly recount. I recommend investigating their website to get a sense of the breadth of activity that is going on there. I will, however, note that they are the only library in the world to house a complete copy of the Internet Archive: 1.5 petabytes of data on nearly 900 computers.
(Speaking of the IA, Brewster Kahle is also here and is closing the conference Wednesday afternoon. He brought with him a test model of the hundred dollar laptop, which he showed off at dinner in tablet mode, sporting an e-book from the Open Content Alliance's children's literature collection: a scanned copy of The Owl and the Pussycat.)
And speaking of old thinking and constrictive legacies, following Adly was Deanna B. Marcum, an associate librarian at the Library of Congress. Marcum seemed well aware of the big picture but gave off a strong impression of having hands tied by a change-averse institution that has still not come to grips with the basic fact of the World Wide Web. It was a numbing hour and made one palpably feel the leadership vacuum left by the LOC in the past decade, which among other things has allowed Google to move in and set the agenda for library digitization.
Next came Lynne J. Brindley, Chief Executive of the British Library, which is like apples to the LOC's oranges. Slick, publicly engaged and with pockets deep enough to really push the technological envelope, the British Library is making a very graceful and sometimes flashy (Turning the Pages) migration to the digital domain. Brindley had many keen insights to offer and described several BL experiments that really challenge the conventional wisdom on library search and exhibitions. I was particularly impressed by these "creative research" features: short, evocative portraits of a particular expert's idiosyncratic path through the collections; a clever way of featuring slices of the catalogue through the eyes of impassioned researchers (e.g. here). Next step would be to open this up and allow the public to build their own search profiles.
* * * * *
That more or less covers today with the exception of a final keynote talk by John Seely Brown, which was quite inspiring and included a very kind mention of our work at MediaCommons. It's been a long day, however, and I'm fading. So I'll pick that up tomorrow.
Posted by ben vershbow at 1:16 AM
tags: LOC , academic , brewster_kahle , conference , connexions , libraries , library , million_dollar_laptop , peer_review , publishing , search




