Listing entries tagged with RDF
if not rdf, then what?: part II
03.30.2006, 1:24 PM
I had an exchange about my previous post with an RDF expert who explained to me that API's are not like RDF and it would be incorrect to try to equate them. She's right: API's do not replace the need for RDF, nor do they replicate the functionality of RDF. API's do provide access to data, but that data can be in many forms, including XML-bound RDF. This is one of the pleasures and privileges of writing on this blog: the audience contributes at a very high level of discourse, and is endowed with extremely deep knowledge about the topics under discussion.
I want to reiterate my point with a new inflection. By suggesting that API's were an alternative to RDF, I was trying to get at a point that had more to do with adoption than functionality. I admit, I did not make the point well. So let me make a second attempt: API's are about data access, and that, currently (and from my anecdotal experience) is where the value proposition lies for the new breed of web services. You have your data in someone's database. That data is accessible to developers to manipulate and represent back to you in new, innovative, and useful ways. Most of the attention in the webdev community is turning towards the development of new interfaces—not towards the development of new tools to manage and enrich the data (again, anecdotal evidence only). Yes, there are people still interested in semantic data; we are indebted to them for continuing to improve the way our systems interact at a data level. But the focus of development has shifted to the interface. API's make the gathering of data as simple as setting parameters, leaving only the work of designing the front-end experience.
Another note on RDF from my exchange: it was pointed out that practitioners of RDF prefer not to read it in XML, but instead use Notation 3 (N3), which is undeniably easier to read than XML. I don't know enough about N3 to make a proper example, but I think you can get the idea if you look at the examples here and here.
Posted by jdwilbur at 01:24 PM
tags: N3 , RDF , api , metadata , notation_3 , web_development , xml
if not rdf, then what?
03.28.2006, 11:35 AM
I posted about RDF and the difficulty the web development community has had fully adopting RDF and ontologies as a method of metadata organization. I said that one of the reasons was the relative complexity of RDF and the cost of generating useful metadata (as opposed to just enough information to solve the current problem). Simon St. Laurent has a nice redux of the matter. I won't try to duplicate that, but I do want to explain some of the details about RDF. Though I made a case for how complex RDF is when used to create fully relational data sets, I didn't do a very good job of explaining how simple RDF is in principle. RDF proponents believe they are building the future. I'm not entirely convinced, but I want to take a close look at RDF before I consider other solutions.
RDF seems overwhelming, but in the inimitable words of Squire Patsy, "It's only a model!" A model, in this case, that can represent digital and real things and their relationships. The promise of RDF is that it can describe everything using a combination of unique identifiers, properties and property values.
Unique Identifiers
The heart of RDF is the unique identifier. Your name is a unique identifier, but only as long as there is no one else in the room who answers to [your-name-here]. This, clearly, is not a good way to create a universal identification system. Your social security number is a unique identifier in this country, but it doesn't signify much in China, and the system is not extensible (we'd run out of numbers if we tried to SSN the Chinese). Your email address is a unique identifier on the Internet, and it works pretty well as one. A Uniform Resource Identifier (URI) is a little more extensible, and, since it's longer than an email address, can provide more information. You can use a URI to identify something, even if it can't be retrieved through the web. A product at Amazon.com, for example, could have a unique URI, even though you still need a truck to bring it to you.
Properties
If we look at objects in the real world, they have physical properties, like size, color, and hardness. An example: my kitchen table. It's a three dimensional object, so it has height, width, length. It's made of wood, it has been stained. It also has informational properties: the date I purchased it, the person who sold it to me, the area of the country it came from, the level of personal attachment I have for the thing. Each of these properties can be put into RDF, by linking it to a schema that defines the property in a normative fashion. It'll make a little more sense when I give an example. But for that to happen I need to describe...
Property Values
Property values are the names, numbers, and dates that make properties make sense. My kitchen table is 78" long x 28" wide x 34" tall, dark-walnut stained, and soft (as wood goes). I bought it in February, 2002 from Joe Komenda, and I'm never going to part with it (even though it isn't really NYC apartment sized). Property values are the easy part of the metadata. Associating property values to properties, and properties to normative schemas, that's when things get tricky.
Here's the example I promised (bound in an XML format):
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:kt="http://www.jdwilbur.fake/furniture#"
  xmlns:geom2d="http://nurl.org/0/geom2d/1.0/"
  xmlns:map="http://nurl.org/0/geography/map/1.0/"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.jdwilbur.fake/furniture/kitchen-table">
    <kt:height>34</kt:height>
    <kt:width>28</kt:width>
    <kt:length>78</kt:length>
    <kt:price>150</kt:price>
    <kt:month>February</kt:month>
    <kt:year>2002</kt:year>
    <dc:coverage>
      <geom2d:Point>
        <map:srs rdf:resource="http://nurl.org/0/geography/SRSCatalog/wgs84" />
        <geom2d:x>-123.817</geom2d:x>
        <geom2d:y>46.183</geom2d:y>
      </geom2d:Point>
    </dc:coverage>
    <kt:seller rdf:resource="http://www.komenda.fake/Joseph%20Komenda#" />
    <kt:sellit>Never ever ever</kt:sellit>
  </rdf:Description>
</rdf:RDF>
http://www.jdwilbur.fake/furniture/kitchen-table: The URI of my kitchen table
kt:height: The property height from my schema defined here: http://www.jdwilbur.fake/furniture#
34: The property value that tells me how tall my table is. I would infer from the schema that the value is in inches, not millimeters or light years
For the purposes of this example, I've made up my own fake schema (which would be a bunch of lines of XML similar to the example above) and included three real ones: Dublin Core (dc), Geomap 2d (geom2d) for mapping coordinates, and map to relate the coordinates to physical locations. My schema, kt (which stands for kitchen table), includes some special properties like seller and sellit. The seller, Joe Komenda, has his own URI (it appears after rdf:resource). The others are fairly standard, but have a specific meaning in my personal context. The only other tricky part is the geographic coordinates, because I'm using three different schemas to define a geographic point. (It's just an example taken from mapbureau. It could resolve to the middle of the Pacific Ocean for all I know.)
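To make the "identifier, property, value" idea concrete, here is a minimal sketch that pulls those three parts out of a trimmed copy of the kitchen-table description. It assumes Python and its standard-library xml.etree.ElementTree, which is a generic XML parser and not a real RDF toolkit; the function name flat_triples is my own invention, and it only handles flat properties (literal text or rdf:resource), not nested nodes like the geographic point.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# A trimmed copy of the kitchen-table description (flat properties only).
RDF_XML = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:kt="http://www.jdwilbur.fake/furniture#">
  <rdf:Description rdf:about="http://www.jdwilbur.fake/furniture/kitchen-table">
    <kt:height>34</kt:height>
    <kt:width>28</kt:width>
    <kt:length>78</kt:length>
    <kt:seller rdf:resource="http://www.komenda.fake/Joseph%20Komenda#" />
    <kt:sellit>Never ever ever</kt:sellit>
  </rdf:Description>
</rdf:RDF>"""


def flat_triples(rdf_xml):
    """Return (identifier, property, value) tuples for flat descriptions.

    Hypothetical helper: handles literal text and rdf:resource values only.
    """
    triples = []
    root = ET.fromstring(rdf_xml)
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree spells tags as "{namespace}local"; joining the two
            # halves gives the full property URI.
            namespace, local = prop.tag[1:].split("}")
            value = prop.get(f"{{{RDF_NS}}}resource") or (prop.text or "").strip()
            triples.append((subject, namespace + local, value))
    return triples


for triple in flat_triples(RDF_XML):
    print(triple)
```

Each printed tuple is one statement about the table: the table's URI, the full URI of a property from the kt schema, and the value. The namespace prefix is only shorthand; the property's real name is the schema URI with the local name appended.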
The obvious point here is that writing RDF is hard. We need automated tools to help us compose in this syntax, which is convoluted but requires perfection to work. Humans are not perfect; RDF is not our language. RDF also requires front-loading: developing schemas, choosing terms and URI's, and finding prior art so that terms can be reused. We need tools to help us manage that aspect. And we need applications that demand RDF. Currently, the demand for RDF is low because it is mostly for the sake of maintaining the richness of a data set for some future application, not the ones I work with every day.
So if RDF, syntactically difficult but conceptually easy, cannot get adopted, what is the alternative? The web API. A wide variety of new web applications and services are accompanied by an API. It seems like you can hardly be part of Web 2.0 without one. What does the API have that RDF doesn't? Simplicity. Familiarity. You cannot interact with an API unless you follow the rules. Fine. Same with RDF. But the rules of an API fall into the familiar realm of setting parameters, grabbing previously named functions, and following the documentation. This is like a caffeinated beverage for developers: they instinctively know how to consume it. More than that, API's mean that people can innovate on an interface level, even if they don't have serious coding chops. I've seen the Google API implemented in twenty minutes. This is a more fluid way to develop; one that feels more comfortable even if it sacrifices information richness. We'll get to RDF one day, maybe in Web 3.5, but until then we will take small steps towards data sharing and interoperability with API's.
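For contrast, here is roughly what "setting parameters" looks like from the developer's side. This is only a sketch in Python: the endpoint api.example.fake and the parameter names q, max, and format are made up for illustration and do not belong to any real service.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, invented for illustration.
BASE_URL = "http://api.example.fake/search"


def build_request_url(query, max_results=10, output_format="xml"):
    """Compose a request URL by setting named parameters."""
    params = {"q": query, "max": max_results, "format": output_format}
    return BASE_URL + "?" + urlencode(params)


url = build_request_url("kitchen table", max_results=5)
print(url)
```

There is no schema to design and no vocabulary to choose. The documentation names the parameters, the developer fills them in, and the meaning of what comes back is whatever the service says it is.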
Posted by jdwilbur at 11:35 AM
tags: RDF , api , data , dublin_core , interoperability , property , schema , syntax , uri , value , web_2.0 , xml
RDF = bigger piles
03.06.2006, 4:30 PM
Last week at a meeting of all the Mellon-funded projects I heard a lot of discussion about RDF as a key technology for interoperability. RDF (Resource Description Framework) is a data model for machine-readable metadata and a necessary but not sufficient requirement for the semantic web. On top of this data model you need applications that can read RDF. On top of the applications you need the ability to understand the meaning in the RDF-structured data. This is the really hard part: matching the meaning of two pieces of data from two different contexts still requires human judgement. There are people working on the complex algorithmic gymnastics to make this easier, but so far, it's still in the realm of the experimental.
So why pursue RDF? The goal is to make human knowledge, implicit and explicit, machine readable. Not only machine readable, but automatically shareable and reusable by applications that understand RDF. Researchers pursuing the semantic web hope that by precipitating an integrated and interoperable data environment, application developers will be able to innovate in their business logic and provide better services across a range of data sets.
Why is this so hard? Well, partly because the world is so complex, and although RDF is theoretically able to model an entire world's worth of data relationships, doing it seamlessly is just plain hard. You can spend time developing an RDF representation of all the data in your world, then someone else will come along with their own world, with their own set of data relationships. Being naturally friendly, you take in their data and realize that they have a completely different view of the categories "Author," "Creator," "Keywords," etc. Now you have a big, beautiful dataset, with a thousand similar, but not equivalent pieces. The hard part is determining the relationships between the data.
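A small sketch of the mismatch, in Python. The two records, their field names, and the crosswalk are all hypothetical, invented to illustrate the point; the crosswalk stands in for the decision a person has to make about which fields correspond.

```python
# Two hypothetical records describing the same book in different vocabularies.
record_a = {"Author": "Herman Melville", "Keywords": "whales; sea"}
record_b = {"Creator": "Melville, Herman", "Subject": "Whaling"}

# A hand-written crosswalk: a person decided that these fields correspond.
CROSSWALK_B_TO_A = {"Creator": "Author", "Subject": "Keywords"}


def translate(record, crosswalk):
    """Rename fields according to the crosswalk; unmapped fields pass through."""
    return {crosswalk.get(field, field): value for field, value in record.items()}


translated_b = translate(record_b, CROSSWALK_B_TO_A)

# The field names now line up, but the values still do not match exactly.
for field in record_a:
    print(field, "|", record_a[field], "|", translated_b.get(field))
```

Renaming the fields is the mechanical part. Whether "Creator" really means the same thing as "Author" in both contexts, and whether "Melville, Herman" is the same person as "Herman Melville," are the similar-but-not-equivalent questions that still need a human.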
We immediately considered how RDF and Sophie would work. RDF importing/exporting in Sophie could provide value by preparing Sophie for integration with other RDF-capable applications. But, as always, the real work is figuring out what it is that people could do with this data. Helping users derive meaning from a dataset raises the question: what kind of meaning are we trying to help them discover? A universe of linguistic analysis? Literary theory? Historical accuracy? I think a dataset that enabled all of these would be 90% metadata and 10% data. This raises another huge issue: entering semantic metadata requires skill and time, and is therefore relatively rare.
In the end, RDF creates bigger, better piles of data, intact with provenance and other unique characteristics derived from the originating context. This metadata is important information that we'd rather hold on to than irrevocably discard, but it leaves us stuck with a labyrinth of data, until we create the tools to guide us out. RDF is ten years old, yet it hasn't achieved the acceptance of other solutions, like XML Schemas or DTD's. They have succeeded because they solve limited problems in restricted ways and require relatively simple effort to implement. RDF's promise is that it will solve much larger problems with solutions that have more richness and complexity; but ultimately the act of determining meaning or negotiating interoperability between two systems is still a human function. The undeniable fact remains: it's easy to put everyone's data into RDF, but that just leaves the hard part for last.
Posted by jdwilbur at 04:30 PM
tags: Libraries, Search and the Web , Mellon , RDF , Sophie , interoperability , semantic_web , the_networked_book




