Listing entries tagged with xml
atomisation, part two 02.18.2007, 8:41 AM
In the last few weeks a number of people have sent me a link to Michael Wesch's video meditation on the evolution of media and its likely impact on all aspects of human interaction. One of Wesch's main points is that the development of XML enables the separation of form from content, which in turn is fueling the transition to new modes of communication.
Paradoxically, Wesch's video works precisely because of the integration of form and content . . . it's possibly one of the best uses of animated text and moving images in the service of a new kind of expository essay. If you simply read the text in an RSS reader, it wouldn't have anywhere near the impact it does. Although Wesch's essay depends on the unity of form and content, he is certainly right about the web's increasing tendency to decontextualize content by making it independent of form. If McLuhan was right that the medium is a crucial part of the message, then when we encounter the same content in different forms, are we getting the same message? If not, what does this mean for social discourse going forward?
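A minimal Python sketch of what "separating form from content" means in practice, not anything taken from Wesch's video: one hypothetical blog entry stored as XML, with no presentation information at all, rendered in two different forms. The element names `entry`, `title`, and `body` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# A hypothetical entry stored as structure only: no fonts, colors, or layout.
ENTRY_XML = """
<entry>
  <title>atomisation, part two</title>
  <body>XML separates form from content.</body>
</entry>
"""


def as_plain_text(entry: ET.Element) -> str:
    """One form: the bare text a feed reader might show."""
    return f"{entry.findtext('title')}\n\n{entry.findtext('body')}"


def as_html(entry: ET.Element) -> str:
    """Another form for the same content: an HTML fragment."""
    title = escape(entry.findtext("title"))
    body = escape(entry.findtext("body"))
    return f"<article><h1>{title}</h1><p>{body}</p></article>"


entry = ET.fromstring(ENTRY_XML.strip())
print(as_plain_text(entry))
print(as_html(entry))
```

The XML never changes; only the function that presents it does. That's the decontextualization at issue: the same words can arrive as a styled article or as bare text in a feed reader.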
ITIN place | 2007 redux: design journal, part 3 12.13.2006, 7:12 AM
I'd just begun hard coding navigational elements for the new ITIN archives when I suspected Through the Looking-Glass might be an apt, fun read to offset the growing angst around coding. Maybe something in literature would provide the gestalt I felt was missing from the minutiae of writing lines of functions, booleans, and parameters. Sounds holistic, maybe, but this suspicion plus a Wikipedia entry I'd read on Lewis Carroll convinced me it'd be the perfect read just now. So when I was walking through Penn Station early last month, I found a bookseller in the LIRR station and, all excited, picked up a copy of Alice's Adventures, intending to breeze through it and move on to Looking-Glass. It was nice to open ITIN place the next day to find Stormy Blues For Alice In The Looking Glass. Somehow, the two had already met.
Sally: I've been trying to figure out some of the back-end stuff for the past few days, namely, how to get your entire archive to link up to something like this. Do you have any programming / web design wizard friends who might be able to offer me some technical advice?
Alex: God know.... I guess we'll have to build them manually...some 700 links? yipes.
Alex: I mean, god no....LOL
Sally: Hey, I'm working with a programmer now on a script that will allow the archive to thumbnail images from your entries and automatically load them (& URLs to the corresponding entries) into the Flash file. I don't know PHP, which is likely the language needed to thumbnail your images automatically, so I'm getting help on that. Once that's in place, we should be able to (a) play further with layout aspects! and (b) the archive should automatically update every time you publish an entry. Getting closer...
Alex: and it will still do that animated scale up and down trick?
Sally: My PHP programmer who was going to work on the thumbnailing flaked out on me; it seems programmers can be as flaky as drummers... So I took it upon myself to learn how to build Flash-based blog applications. At its simplest, that requires a little PHP, a little XML and Flash, all in conversation with what you post online.
Ben: As for PHP gurus... We do in fact have someone working with us right now who's an experienced PHP coder. We're keeping him pretty busy right now with MediaCommons stuff, but I think he could help you out with this in a few weeks.
Sally: I also imagine there should be more than one way to search / browse the archives. One might be a linear "wall" from month to month that we could click/scroll through, another might be a drop-down menu of months say, to the right of the "wall" of images. Any thoughts on that?
Meanwhile, I'd plotted out a map of the Flash file on my whiteboard. It looked to me like there were two possible approaches, interface-wise. Either the zoom function would scale up an entire month's calendar, and a re-center or panning function would let the user focus on a particular entry, or the zoom function would simply scale up one entry at a time onRollOver (the original idea).
I am (still) drawn to the first idea, even though I've put it aside, since it would best recreate the sense of approaching a gallery wall, or landing on the (x,y) of Alex's blog. But caveats abound -- if an onPress fires the zoom and re-center, then how do you click the entry's permalink and/or zoom out? Is this overcomplicating things? Here is an example of an unwieldy new zoom (an attempt to manage dragging and zooming).
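The first approach (scale up the whole month, then re-center on one entry) reduces to a single scale-and-translate transform. A minimal sketch in Python rather than ActionScript; the names `View` and `zoom_to`, the 800 x 600 stage, and the entry coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class View:
    scale: float  # zoom factor applied to the archive "wall"
    tx: float     # horizontal translation, in screen pixels
    ty: float     # vertical translation, in screen pixels

    def to_screen(self, x: float, y: float) -> tuple[float, float]:
        # screen position = scale * world position + translation
        return (self.scale * x + self.tx, self.scale * y + self.ty)


def zoom_to(x: float, y: float, scale: float, width: float, height: float) -> View:
    """Zoom by `scale` and center world point (x, y) in a width x height stage."""
    cx, cy = width / 2, height / 2
    # Solve scale * x + tx == cx for tx (and likewise for ty).
    return View(scale, cx - scale * x, cy - scale * y)


# Zoom 3x onto a hypothetical entry thumbnail centered at (120, 80).
view = zoom_to(120, 80, 3.0, 800, 600)
print(view.to_screen(120, 80))  # (400.0, 300.0)
```

Zooming back out is just returning to `View(1.0, 0.0, 0.0)`. Keeping zoom and pan in one transform is one way to stop the zoom, re-center, and drag handlers from each tracking their own state.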
Then I started to think about loading individual blog entries from the XML. I talked to my friend Mike about this for a while, and in exchange for some brownies (although really only out of his extreme kindness and generosity) he constructed an XML format, sample.xml, and showed me a way to load the HTML of each individual entry into a small clip.
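Mike's actual sample.xml isn't reproduced in the post, so the format below is a hypothetical stand-in: each entry carries its date and permalink as attributes and its HTML inside a CDATA section, so the markup survives parsing as plain text. A minimal Python sketch of reading it; the Flash side would presumably do the equivalent in ActionScript.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical stand-in for sample.xml; element and attribute names are assumptions.
SAMPLE_XML = """<archive>
  <entry date="2006-11-02" permalink="http://example.com/2006/11/entry-1.html">
    <title>Stormy Blues For Alice In The Looking Glass</title>
    <content><![CDATA[<p>An entry's <em>HTML</em> rides along untouched.</p>]]></content>
  </entry>
  <entry date="2006-11-05" permalink="http://example.com/2006/11/entry-2.html">
    <title>Second entry</title>
    <content><![CDATA[<p>More markup.</p>]]></content>
  </entry>
</archive>"""


@dataclass
class Entry:
    date: str
    permalink: str
    title: str
    html: str  # raw HTML, handed as-is to whatever clip displays it


def load_entries(xml_text: str) -> list[Entry]:
    root = ET.fromstring(xml_text)
    return [
        Entry(
            date=e.get("date"),
            permalink=e.get("permalink"),
            title=e.findtext("title"),
            html=e.findtext("content"),  # CDATA comes back as ordinary text
        )
        for e in root.findall("entry")
    ]


entries = load_entries(SAMPLE_XML)
for entry in entries:
    print(entry.date, entry.title, entry.permalink)
```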
The great thing about using the HTML of each entry in the previous example is that it would allow the archives to build completely dynamically: any changes Alex made to an archived post would be reflected in real time in the Flash file. Unfortunately, this doesn't cut down on load time, and I can't coax the videos and animated .gifs (of which there are a considerable number) to appear. Here is an example of one entry pulled into the Flash file with HTML. CSS can be incorporated, but it's obviously slow to load.
Mike brought up something I'd wondered about too: are we going to have one XML file for the entire archive? It seems to make more sense for each month to have its own.
So, after a few weeks, I caught up with Future of the Book's expert developer Eddie Tejeda, and we decided to put an XML document within each month. On an exciting note, Eddie devised a great scheme (script) to take screen shots of all of ITIN place's entries. He's working on getting the image size down, so as to minimize loading time.
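Splitting the archive into one XML document per month could look roughly like this minimal Python sketch. The entry records, the `thumb` attribute (standing in for Eddie's screen shots), and the `month`/`entry` element names are assumptions for illustration, not the script Eddie actually wrote.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical entry records; in practice these would come from the blog's database.
ENTRIES = [
    {"date": "2006-10-30", "title": "Last of October", "thumb": "thumbs/1.jpg"},
    {"date": "2006-11-02", "title": "Stormy Blues", "thumb": "thumbs/2.jpg"},
    {"date": "2006-11-05", "title": "Second entry", "thumb": "thumbs/3.jpg"},
]


def monthly_documents(entries):
    """Group entries by YYYY-MM and build one small XML document per month."""
    by_month = defaultdict(list)
    for entry in entries:
        by_month[entry["date"][:7]].append(entry)  # "2006-11-02" -> "2006-11"

    docs = {}
    for month, items in sorted(by_month.items()):
        root = ET.Element("month", {"id": month})
        for item in items:
            node = ET.SubElement(root, "entry", {"date": item["date"], "thumb": item["thumb"]})
            node.text = item["title"]
        docs[month] = ET.tostring(root, encoding="unicode")
    return docs


for month, xml_text in monthly_documents(ENTRIES).items():
    print(month, xml_text)
```

Each month's file stays small, so the Flash file only has to fetch the months a visitor actually browses; the trade-off is one request per month instead of one for the whole archive.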
Eddie's screen shots would load much faster than pure HTML, but they could cost us some of the dynamism. This would build something like this, only faster.
Most of the hard coding of the archive is done. Design matters remain: at the moment, the entries load in rather like a retro computer solitaire game, and the drop-down menus are disconnected and unskinned. It's a task to go back and forth between designing and developing; I'm just cutting my teeth on some of this, and the dryness of programming can dilute creative inspiration (if this is anything to go by). The archive is very close to complete; it will be a thrill to use this gentler beast.
if not rdf, then what?: part II 03.30.2006, 1:24 PM
I had an exchange about my previous post with an RDF expert who explained to me that APIs are not like RDF and that it would be incorrect to try to equate them. She's right: APIs do not replace the need for RDF, nor do they replicate the functionality of RDF. APIs do provide access to data, but that data can be in many forms, including RDF serialized as XML. This is one of the pleasures and privileges of writing on this blog: the audience contributes at a very high level of discourse and brings extremely deep knowledge of the topics under discussion.
I want to reiterate my point with a new inflection. By suggesting that APIs were an alternative to RDF, I was trying to get at a point that had more to do with adoption than functionality. I admit I did not make the point well. So let me make a second attempt: APIs are about data access, and that is currently (from my anecdotal experience) where the value proposition lies for the new breed of web services. You have your data in someone's database. That data is accessible to developers, who can manipulate it and represent it back to you in new, innovative, and useful ways. Most of the attention in the webdev community is turning towards the development of new interfaces, not towards the development of new tools to manage and enrich the data (again, anecdotal evidence only). Yes, there are people still interested in semantic data; we are indebted to them for continuing to improve the way our systems interact at a data level. But the focus of development has shifted to the interface. APIs make the gathering of data as simple as setting parameters, leaving only the work of designing the front-end experience.
Another note on RDF from my exchange: it was pointed out that RDF practitioners prefer not to read it as XML but to use Notation 3 (N3), which is undeniably easier to read. I don't know enough about N3 to make a proper example, but I think you can get the idea if you look at the examples here and here.
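To give a rough idea of what N3 looks like, here is a minimal Python sketch that writes a few of the kitchen-table statements from the previous post in Turtle, a simpler subset of N3. It's a toy serializer: the `IRI` class, the prefix table, and the function names are made up for this example, and a real project would use an RDF library instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    """Marks a value as an identifier rather than a plain string literal."""
    value: str


# Prefix table for compact names; kt is the made-up schema from the previous post.
PREFIXES = {"kt": "http://www.jdwilbur.fake/furniture#"}


def term(t) -> str:
    """Write one subject, predicate, or object in Turtle/N3 syntax."""
    if isinstance(t, IRI):
        for prefix, ns in PREFIXES.items():
            if t.value.startswith(ns):
                return f"{prefix}:{t.value[len(ns):]}"
        return f"<{t.value}>"
    if isinstance(t, int):
        return str(t)  # a bare integer is read as an xsd:integer literal
    escaped = t.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_n3(subject: IRI, pairs) -> str:
    """Serialize several statements about one subject, joined with ';'."""
    header = [f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items()]
    body = " ;\n    ".join(f"{term(p)} {term(o)}" for p, o in pairs)
    return "\n".join(header + [f"{term(subject)}\n    {body} ."])


KT = PREFIXES["kt"]
TABLE = IRI("http://www.jdwilbur.fake/furniture/kitchen-table")
PAIRS = [
    (IRI(KT + "height"), 34),
    (IRI(KT + "seller"), IRI("http://www.komenda.fake/Joseph%20Komenda#")),
    (IRI(KT + "sellit"), "Never ever ever"),
]

print(to_n3(TABLE, PAIRS))
```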
if not rdf, then what? 03.28.2006, 11:35 AM
I posted about RDF and the difficulty the web development community has had fully adopting RDF and ontologies as a method of metadata organization. I said that one of the reasons was the relative complexity of RDF and the cost of generating useful metadata (as opposed to just enough information to solve the current problem). Simon St. Laurent has a nice redux of the matter. I won't try to duplicate that, but I do want to explain some of the details about RDF. Though I made a case for how complex RDF is when used to create fully relational data sets, I didn't do a very good job of explaining how simple RDF is in principle. RDF proponents believe they are building the future. I'm not entirely convinced, but I want to take a close look at RDF before I consider other solutions.
RDF seems overwhelming, but in the inimitable words of Squire Patsy, "It's only a model!" A model, in this case, that can represent digital and real things and their relationships. The promise of RDF is that it can describe everything using a combination of unique identifiers, properties and property values.
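Stripped of syntax, that model is just a set of three-part statements: identifier, property, value. A minimal Python sketch, using the kitchen-table URIs from the example later in this post; the `material` property and the `values` helper are hypothetical additions.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # the unique identifier (a URI) of the thing described
    predicate: str  # the property, itself named by a URI from some schema
    obj: str        # the property value: a literal, or another thing's URI


KT = "http://www.jdwilbur.fake/furniture#"  # made-up schema namespace
TABLE = "http://www.jdwilbur.fake/furniture/kitchen-table"

graph = {
    Triple(TABLE, KT + "height", "34"),
    Triple(TABLE, KT + "material", "wood"),  # hypothetical property
    Triple(TABLE, KT + "seller", "http://www.komenda.fake/Joseph%20Komenda#"),
}


def values(graph: set, subject: str, predicate: str) -> list[str]:
    """Everything the graph says about one property of one thing."""
    return sorted(t.obj for t in graph if t.subject == subject and t.predicate == predicate)


print(values(graph, TABLE, KT + "height"))  # ['34']
```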
The heart of RDF is the unique identifier. Your name is a unique identifier, but only as long as there is no one else in the room who answers to [your-name-here]. This, clearly, is not a good way to create a universal identification system. Your social security number is a unique identifier in this country, but it doesn't signify much in China, and the system is not extensible (we'd run out of numbers if we tried to SSN the Chinese). Your email address is a unique identifier on the Internet, and it works pretty well. A Uniform Resource Identifier (URI) is a little more extensible and, since it can be longer and more structured than an email address, can carry more information. You can use a URI to identify something even if it can't be retrieved through the web. A product at Amazon.com, for example, could have a unique URI, even though you still need a truck to bring it to you.
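Part of what makes a URI more informative than a name or a number is that it has structure. A small Python sketch using the standard library's `urllib.parse.urlsplit`; the first two identifiers come from the kitchen-table example, and the `mailto:` address is a made-up placeholder showing that an email address can itself be written as a URI. None of these has to resolve to a real page to work as a name.

```python
from urllib.parse import urlsplit


def describe(uri: str) -> dict:
    """Break a URI into the parts that carry information."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,     # how to interpret the rest (http, mailto, ...)
        "authority": parts.netloc,  # who minted the name, e.g. a domain
        "path": parts.path,         # the name within that authority
        "fragment": parts.fragment,
    }


print(describe("http://www.jdwilbur.fake/furniture/kitchen-table"))
print(describe("http://www.komenda.fake/Joseph%20Komenda#"))
print(describe("mailto:someone@example.com"))
```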
If we look at objects in the real world, they have physical properties, like size, color, and hardness. An example: my kitchen table. It's a three-dimensional object, so it has height, width, and length. It's made of wood, and it has been stained. It also has informational properties: the date I purchased it, the person who sold it to me, the area of the country it came from, the level of personal attachment I have for the thing. Each of these properties can be put into RDF by linking it to a schema that defines the property in a normative fashion. It'll make a little more sense when I give an example. But for that to happen I need to describe...
Property values are the names, numbers, and dates that make properties make sense. My kitchen table is 78" long x 28" wide x 34" tall, dark-walnut stained, and soft (as wood goes). I bought it in February, 2002 from Joe Komenda, and I'm never going to part with it (even though it isn't really NYC apartment sized). Property values are the easy part of the metadata. Associating property values to properties, and properties to normative schemas, that's when things get tricky.
Here's the example I promised (bound in an XML format):
<kt:seller rdf:resource="http://www.komenda.fake/Joseph%20Komenda#" />
<kt:sellit>Never ever ever</kt:sellit>
http://www.jdwilbur.fake/furniture/kitchen-table: The URI of my kitchen table
kt:height: The property height from my schema defined here: http://www.jdwilbur.fake/furniture#
34: The property value that tells me how tall my table is. I would infer from the schema that the value is in inches, not millimeters or light years.
For the purposes of this example, I've made up my own fake schema (which would be a bunch of lines of XML similar to the example above) and included three real ones: Dublin Core (dc), Geomap 2d (geom2d) for mapping coordinates, and map to relate the coordinates to physical locations. My schema, kt (which stands for "kitchen table"), includes some special properties like seller and sellit. The seller, Joe Komenda, has his own URI (it appears after rdf:resource). The others are fairly standard but have a specific meaning in my personal context. The only other tricky part is the geographic coordinates, because I'm using three different schemas to define a geographic point. (It's just an example taken from mapbureau; it could resolve to the middle of the Pacific Ocean for all I know.)
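For a fuller picture of the syntax, here is a minimal Python sketch that builds a complete, well-formed RDF/XML description of the table with the standard library's ElementTree, then reads the statements back out. It covers only `kt:height`, `kt:seller`, and `kt:sellit`; the Dublin Core and geographic properties are left out, and the `q`, `build_description`, and `triples` helpers are illustrative names.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
KT = "http://www.jdwilbur.fake/furniture#"  # the post's made-up schema
TABLE = "http://www.jdwilbur.fake/furniture/kitchen-table"

ET.register_namespace("rdf", RDF)
ET.register_namespace("kt", KT)


def q(ns: str, local: str) -> str:
    """ElementTree's {namespace}local notation for a qualified name."""
    return f"{{{ns}}}{local}"


def build_description() -> ET.Element:
    root = ET.Element(q(RDF, "RDF"))
    desc = ET.SubElement(root, q(RDF, "Description"), {q(RDF, "about"): TABLE})
    ET.SubElement(desc, q(KT, "height")).text = "34"  # a literal value
    ET.SubElement(desc, q(KT, "seller"),              # a pointer to another resource
                  {q(RDF, "resource"): "http://www.komenda.fake/Joseph%20Komenda#"})
    ET.SubElement(desc, q(KT, "sellit")).text = "Never ever ever"
    return root


def triples(root: ET.Element):
    """Yield (subject, predicate, object) statements from the XML."""
    for desc in root.iter(q(RDF, "Description")):
        subject = desc.get(q(RDF, "about"))
        for prop in desc:
            # Predicate URI = namespace + local name; object = resource or text.
            predicate = prop.tag.replace("{", "").replace("}", "")
            yield subject, predicate, prop.get(q(RDF, "resource"), prop.text)


xml_text = ET.tostring(build_description(), encoding="unicode")
print(xml_text)
for statement in triples(ET.fromstring(xml_text)):
    print(statement)
```

Even in this tiny case, the namespaces, the `rdf:about` and `rdf:resource` attributes, and the difference between a literal (`34`) and a resource (the seller's URI) all have to be exactly right.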
The obvious point here is that writing RDF is hard. We need automated tools to help us compose in this syntax, which is convoluted but requires perfection to work. Humans are not perfect; RDF is not our language. RDF also requires front-loading: developing schemas, choosing terms and URIs, and finding prior art so that terms can be reused. We need tools to help us manage that aspect. And we need applications that demand RDF. Currently, the demand for RDF is low because it is mostly for the sake of maintaining the richness of a data set for some future application, not the ones I work with every day.
So if RDF, syntactically difficult but conceptually easy, cannot get adopted, what is the alternative? The web API. A wide variety of new web applications and services are accompanied by an API. It seems like you can hardly be part of Web 2.0 without one. What does the API have that RDF doesn't? Simplicity. Familiarity. You cannot interact with an API unless you follow the rules. Fine. Same with RDF. But the rules of an API fall into the familiar realm of setting parameters, calling previously named functions, and following the documentation. This is like a caffeinated beverage for developers: they instinctively know how to consume it. More than that, APIs mean that people can innovate on an interface level, even if they don't have serious coding chops. I've seen the Google API implemented in twenty minutes. This is a more fluid way to develop, one that feels more comfortable even if it sacrifices information richness. We'll get to RDF one day, maybe in Web 3.5, but until then we will take small steps towards data sharing and interoperability with APIs.
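Here is what "setting parameters" amounts to, as a minimal Python sketch. The endpoint and the parameter names `q`, `count`, and `format` are hypothetical, standing in for whatever a given service documents; `urllib.parse.urlencode` is from the standard library.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, standing in for any Web 2.0 service's documented API.
ENDPOINT = "https://api.example.com/v1/search"


def build_request(query: str, count: int = 10, fmt: str = "json") -> str:
    """Fill in a few documented parameters; the service does the data work."""
    params = {"q": query, "count": count, "format": fmt}
    return f"{ENDPOINT}?{urlencode(params)}"


print(build_request("kitchen table"))
# https://api.example.com/v1/search?q=kitchen+table&count=10&format=json
```

There's no schema to design and no vocabulary to reconcile; the developer fills in documented blanks and gets data back.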
explosion 11.22.2005, 2:10 PM
A Nov. 18 post on Adam Green's Darwinian Web makes the claim that the web will "explode" (does he mean implode?) over the next year. According to Green, RSS feeds will render many websites obsolete:
The explosion I am talking about is the shifting of a website's content from internal to external. Instead of a website being a "place" where data "is" and other sites "point" to, a website will be a source of data that is in many external databases, including Google. Why "go" to a website when all of its content has already been absorbed and remixed into the collective datastream.
Does anyone agree with Green? Will feeds bring about the restructuring of "the way content is distributed, valued and consumed?" More on this here.
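Green's scenario rests on feeds: a site's content leaves as RSS and is pooled with everyone else's. A minimal Python sketch of that pooling, assuming two made-up RSS 2.0 feeds; a real aggregator would also have to cope with Atom, missing dates, and duplicate items, which this does not.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Two tiny, hypothetical RSS 2.0 feeds.
FEED_A = """<rss version="2.0"><channel><title>Site A</title>
<item><title>A1</title><link>http://a.example/1</link>
<pubDate>Fri, 18 Nov 2005 09:00:00 GMT</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Site B</title>
<item><title>B1</title><link>http://b.example/1</link>
<pubDate>Sat, 19 Nov 2005 12:30:00 GMT</pubDate></item>
<item><title>B0</title><link>http://b.example/0</link>
<pubDate>Thu, 17 Nov 2005 08:15:00 GMT</pubDate></item>
</channel></rss>"""


def items(feed_xml: str):
    """Yield each item of one feed, tagged with the site it came from."""
    channel = ET.fromstring(feed_xml).find("channel")
    source = channel.findtext("title")
    for item in channel.findall("item"):
        yield {
            "source": source,
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            # RSS dates use the RFC 822 format that email.utils understands.
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        }


def merged_stream(*feeds: str):
    """Pool every feed's items into one newest-first 'datastream'."""
    pooled = [item for feed in feeds for item in items(feed)]
    return sorted(pooled, key=lambda i: i["published"], reverse=True)


for item in merged_stream(FEED_A, FEED_B):
    print(item["published"].date(), item["source"], item["title"])
```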
Posted by lisa lynch at 2:10 PM
ted nelson & the ideologies of documents 10.31.2005, 4:59 PM
I. Nelson's criticism
Ted Nelson (introduced last week by Ben) is a lonely revolutionary marching a lonely march, and whenever he's in the news mockery is heard. Some of this is with good reason: nobody's willing to dismantle the Internet we have for his improved version of the Internet (which doesn't quite work yet). You don't have to poke around too long on his website to find things that reek of crackpottery. But the problems that Nelson has identified in the electronic world are real, even if the solutions he's proposing prove to be untenable. I'd like to expand on one particular aspect of Nelson's thought prominent in his latest missive: his ideas about the inherent ideologies of document formats. While this sounds very blue-sky, I think his ideas do have some repercussions for what we're doing at the Institute, and it's worth investigating them, even without necessarily buying into Xanadu.
Nelson starts from the position that attempting to simulate paper with computers is a mistaken idea. (He's not talking about e-ink & the idea of electronic paper, though a related criticism could be made of that: e-ink by itself won't solve the problem of reading on screens.) This is correct: we could do many more things with virtual space than we can with a static page. Look at this Flash demonstration of Jef Raskin's proposed zooming interface (previously discussed here), for example. But we don't usually go that far because we tend to think of electronic space in terms of the technology that preceded it – paper space. This has carried over into the way in which we structure documents for online reading.
There are two major types of electronic documents online. In one, the debt to paper space is explicit: PDFs, one of the major formats currently used for electronic books, descend from PostScript, a page description language designed to tell a printer exactly what should be on a printed page. While a PDF has more functionality than a printed page – you can search it, for example, and if you're tricky you can embed hyperlinks and tables of contents in it – it's built on the same paradigm. A PDF is also like a printed page in that it's a finalized product: while content in a PDF can be written over with annotations, it's difficult to make substantial changes to it. A PDF is designed to be an electronic reproduction of the printed page. More functionality has been welded onto it by Adobe, which created the format, but at its heart it is attempting to maintain fidelity to the printed page.
The other dominant paradigm is that of the markup language. A quick, not-too-technical introduction: a markup language is a way of embedding, in a text, instructions for how that text is to be structured and formatted. HTML is a markup language; so is XML. This web page is created in a markup language; if you look at it with the "View Source" option in your browser, you'll see that it's a text file divided up by a lot of HTML tags, which are specially designed to format web pages: putting <i> and </i> around a word, for example, makes it italic. XML is a broader concept than HTML: it's a specification that allows people to create their own tags to do other things, and some people are using their own versions of XML to represent ebooks.
There's a lot of excitement about XML – it's a technology that can be (and is) bent to many different uses. Many of the configuration files on your computer, for example, probably use some flavor of XML, even if you've never thought of composing an XML document. Nelson's point, however, is that there's a central premise to all XML: that all information can be divided up into a logical hierarchy – an outline, if you will. A lot of documents do work this way: a book is divided into chapters; a chapter is divided into paragraphs; paragraphs are divided into words. A newspaper is divided into stories; each story has a headline and body copy; the body copy is divided into paragraphs; a paragraph is divided into sentences; a sentence is divided into words; and words are divided into letters, the atom of the markup universe.
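The hierarchical premise is built into the parser itself: an XML document always parses into a tree, so it can always be printed as an outline. A minimal Python sketch with a hypothetical `book`/`chapter`/`paragraph` vocabulary:

```python
import xml.etree.ElementTree as ET

# A hypothetical book marked up the way the paragraph above describes.
BOOK = """<book>
  <chapter title="One">
    <paragraph>It was a dark night.</paragraph>
    <paragraph>The end.</paragraph>
  </chapter>
  <chapter title="Two">
    <paragraph>Morning came.</paragraph>
  </chapter>
</book>"""


def outline(element: ET.Element, depth: int = 0) -> list[str]:
    """Walk the tree; every element has exactly one parent, so this always works."""
    label = element.tag + (f" {element.get('title')}" if element.get("title") else "")
    lines = ["  " * depth + label]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(ET.fromstring(BOOK))))
```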
II. A Victorian example
But while this is the dominant way we arrange information, this isn't necessarily a natural way to arrange things, Nelson points out, or the only way. It's one way of many possible ones. Consider this spread of pages:
This is a title page from a book printed by William Morris, another self-identified humanist. We mostly think of William Morris (when we're not confusing him with the talent agency) as a source of wallpaper, but his work as a book designer can't be overvalued. The book was printed in 1893; it's entitled The Tale of King Florus and the Fair Jehane. Like all of Morris's books, it's sumptuous to the point of being unreadable: Morris was dead set on bringing beauty back into design's balance of aesthetics & utility, and maybe over-corrected to offset the Victorian fixation on the latter.
I offer this spread of pages as an example because the elements that make up the page don't break down easily into hierarchical units. Let's imagine that we wanted to come up with an outline for what's on these pages – let's consider how we would structure them if we wanted to represent them in XML. I'm not interested in how we could represent this on the Web or somewhere else – it's easy enough to do that as an image. I'm more interested in how we would make something like this if we were starting from scratch & wanted to emulate Morris's type and woodcuts – a more theoretical proposition.
First, we can look at the elements that make up the pages. Each page is clearly important in its own right. Each page has a text box framed by decorative grapevines; on the first page, the title has the text box to itself; on the second page, the title is repeated, followed by two body paragraphs separated by a fleuron. The first paragraph gets an illustrated dropcap. Each word, if you want to go down that far, is composed of letters.
But if you look closer, you'll find that the elements on the page don't decompose into categories quite so neatly. If you look at the left-hand page, you can see that the title's not all there – this is the second title page in the book. The title isn't contained by the page, as XML would almost certainly assume; rather, title and page are overlapping units. And the page backgrounds aren't mirror images of each other: each has been created uniquely. Look at the title at the top of the right-hand page: it's followed by seven fleurons because it takes seven of them to nicely fill the space. Everything here's been minutely adjusted by hand. Notice the characters in the title on the right and how they interact with the flourishes around them: the two A's are different, as are the two F's, the two N's, the two R's, and the two E's. You couldn't replicate this lettering with a font. You can't really build a schema to represent what's on these two pages. A further argument: to make this spread of pages rigorous, as you'd have to in order to represent it in XML, would be to ruin it aesthetically. The vines are the way they are because the letters are the way they are: they've been created together.
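The overlap problem shows up directly when parsing. In this minimal Python sketch (element names hypothetical), a title nested inside a page is fine, but a title that starts on one page and ends on the next, as on Morris's spread, is rejected, because XML tags must close in the reverse order they were opened.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Nested: the title sits entirely inside one page.
NESTED = "<spread><page><title>King Florus</title></page></spread>"

# Overlapping: the title begins on one page and ends on the next.
OVERLAPPING = ("<spread><page><title>The Tale of King Florus</page>"
               "<page>and the Fair Jehane</title></page></spread>")

# Workaround: an empty "milestone" element marks the page break inside the title.
MILESTONE = "<spread><title>The Tale of King Florus<pb/>and the Fair Jehane</title></spread>"

print(is_well_formed(NESTED))       # True
print(is_well_formed(OVERLAPPING))  # False
print(is_well_formed(MILESTONE))    # True
```

Encoding schemes such as TEI use milestone elements like `<pb/>` for exactly this, marking where a page begins without containing anything. That keeps the tree intact, at the cost of no longer treating the page as a unit the markup can describe.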
The inability of XML to adequately handle what's shown on these pages isn't a function of the screen environment. It's a function of the way we build electronic documents right now. Morris could build pages this way because he didn't have to answer to the particular constraints we face now.
III. The ideologies of documents
Let's go back to Ted:
Nearly every form of electronic document- Word, Acrobat, HTML, XML- represents some business or ideological agenda. Many believe Word and Acrobat are out to entrap users; HTML and XML enact a very limited kind of hypertext with great internal complexity. All imitate paper and (internally) hierarchy.
For years, hierarchy simulation and paper simulation have been imposed throughout the computer world and the world of electronic documents. Falsely portrayed as necessitated by "technology," these are really just the world-view of those who build software. I believe that for representing human documents and thought, which are parallel and interpenetrating– some like to say "intertwingled"– hierarchy and paper simulation are all wrong.
It's possible to imagine software that would let us follow our fancy and create on the screen pages that look like William Morris's – a tool that would let a designer make an electronic woodcut with ease. Certainly there are approximations. But the sort of tool I imagine doesn't exist right now. This is the sort of tool we should have – there's no reason not to have it already. Ted again:
I propose a different document agenda: I believe we need new electronic documents which are transparent, public, principled, and freed from the traditions of hierarchy and paper. In that case they can be far more powerful, with deep and rich new interconnections and properties- able to quote dynamically from other documents and buckle sideways to other documents, such as comments or successive versions; able to present third-party links; and much more.
Most urgently: if we have different document structures we can build a new copyright realm, where everything can be freely and legally quoted and remixed in any amount without negotiation.
Ben does a fine job of going into the ramifications of Nelson's ideas about transclusion, which he proposes as a solution. I think it's an interesting idea which will probably never be implemented on a grand scale because there's not enough of an impetus to do so. But again: just because Nelson's work is unpragmatic doesn't mean that his critique is baseless.
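To make transclusion slightly more concrete, here is a toy Python sketch: a document made of plain text plus references to spans of other documents, resolved when the document is read rather than copied in when it's written. The `Transclusion` class, `render`, and the character-offset addressing are hypothetical simplifications, not Xanadu's design.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Transclusion:
    """A live reference to a span of another document, not a copy of it."""
    source_id: str
    start: int
    end: int


Piece = Union[str, Transclusion]


def render(pieces: list[Piece], library: dict[str, str]) -> str:
    """Resolve references at read time, so quotes show the source's current text."""
    out = []
    for piece in pieces:
        if isinstance(piece, Transclusion):
            out.append(library[piece.source_id][piece.start:piece.end])
        else:
            out.append(piece)
    return "".join(out)


library = {"nelson": "hierarchy and paper simulation are all wrong."}
comment = ["Nelson argues that ", Transclusion("nelson", 0, 44), "."]
print(render(comment, library))
```

Character offsets go stale as soon as the source is edited; a real system needs addresses that survive revisions, which this sketch doesn't attempt.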
I feel there's something similar in the grandiosity of Nelson's ideas and Morris's beautiful but unreadable pages. William Morris wasn't just a designer: he saw his program of arts and crafts (of which his books were a part) as a way to emphasize the beauty of individual creation as a course correction to the increasingly mechanized & dehumanized Victorian world. Walter Benjamin declares (in "The Author as Producer") that there is "a difference between merely supplying a production apparatus and trying to change the production apparatus". You don't have to make books exactly like William Morris's or implement Ted Nelson's particular production apparatus to have your thinking changed by them. Morris, like Nelson, was trying to change the production apparatus because he saw that another world was possible.
And a postscript: as mentioned around here occasionally, the Institute's in the process of creating new tools for electronic book-making. I'm writing an introduction to Sophie (to be posted soon) that does its best to justify the need for something new in an overcrowded world. Nelson's statement dovetailed neatly with my own thinking on why we need something new: so that we have the opportunity to make things in other ways. Sophie won't be quite as radical as Nelson's vision, but we will have something out next year. It would be nice if Nelson could do the same.
google blog search - still a long way to go 09.14.2005, 5:01 PM
Google's new blog search engine reminds me of how far we still have to go with blog search. The engine works much the same way as Google's general web search - with keywords and page ranking - only here it's searching RSS feeds. Recent posts with keyword matches fill the column, and a few links to related blogs come up at the top. But there's the rub. These so-called "related" blogs are only related by direct keyword matches in their titles or taglines. I just searched "poetry" and came up with only three related blogs. C'mon. A search for "gossip" turns up only one related blog - "Starbucks Gossip". There has to be some kind of promotion going on here, though their "about" page mentions nothing of the kind.
A good engine would be capable of searching blogs by their subject, their preoccupation, their obsession. Many blogs could be considered "general," but just as many have a special focus, and readers are often searching with a particular theme in mind. They don't just want a list of transient posts, but whole sites that might potentially become regular destinations. Many blogs are valuable publications that prove themselves day after day. But blog search hasn't yet grown beyond the trendy "what's the latest chatter on the blogosphere" mode.
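One crude way to search by preoccupation rather than by keyword: score whole blogs by the share of their posts that touch a subject, and only then rank. A minimal Python sketch with made-up blogs and posts; it illustrates the idea and is not how Google or Technorati actually rank anything.

```python
# Hypothetical archives: blog name -> list of post texts.
BLOGS = {
    "verse daily": ["a new poetry collection", "poetry and meter", "poetry readings tonight"],
    "tech chatter": ["phones", "a poetry app launches", "chips"],
    "cafe gossip": ["new menu", "barista gossip"],
}


def subject_score(posts: list[str], term: str) -> float:
    """Share of a blog's posts that mention the term: a rough measure of obsession."""
    hits = sum(1 for post in posts if term in post.lower().split())
    return hits / len(posts) if posts else 0.0


def rank_blogs(blogs: dict[str, list[str]], term: str) -> list[tuple[str, float]]:
    """Blogs that mention the term at all, most preoccupied first."""
    scores = {name: subject_score(posts, term) for name, posts in blogs.items()}
    return sorted(((n, s) for n, s in scores.items() if s > 0),
                  key=lambda pair: pair[1], reverse=True)


print(rank_blogs(BLOGS, "poetry"))
```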
I do have to give credit to Technorati. Glitchy as it is, they're trying to think of creative ways - tagging, author-determined keywords - to help readers find interesting blogs and authors find their audience. Then again, my greatest finds have usually been from other blogs. Humans will always be the smartest aggregators.
People out there, what do you use?