RDF = bigger piles 03.06.2006, 4:30 PM
posted by jdwilbur
Last week at a meeting of all the Mellon-funded projects I heard a lot of discussion about RDF as a key technology for interoperability. RDF (Resource Description Framework) is a data model for machine-readable metadata and a necessary, but not sufficient, requirement for the semantic web. On top of this data model you need applications that can read RDF. On top of the applications you need the ability to understand the meaning in the RDF-structured data. This is the really hard part: matching the meaning of two pieces of data from two different contexts still requires human judgement. There are people working on the complex algorithmic gymnastics to make this easier, but so far it remains in the realm of the experimental.
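To make the data model concrete, here is a minimal sketch of what RDF boils down to, using plain Python tuples rather than any real RDF library. Each statement is a (subject, predicate, object) triple, and a graph is just a set of them; the book URI and its values are made up for illustration, though the predicate URIs follow the real Dublin Core element set.

```python
# A minimal sketch of the RDF data model: a graph is a set of
# (subject, predicate, object) triples. The subject URI and literal
# values here are invented; the predicates are Dublin Core elements.

triples = {
    ("http://example.org/book/1",
     "http://purl.org/dc/elements/1.1/title", "An Example Book"),
    ("http://example.org/book/1",
     "http://purl.org/dc/elements/1.1/creator", "Some Author"),
    ("http://example.org/book/1",
     "http://purl.org/dc/elements/1.1/date", "2006"),
}

def objects(graph, subject, predicate):
    """Return every object asserted for a given subject and predicate."""
    return [o for (s, p, o) in graph if s == subject and p == predicate]

print(objects(triples, "http://example.org/book/1",
              "http://purl.org/dc/elements/1.1/creator"))  # → ['Some Author']
```

Applications that "read RDF" are, at bottom, doing lookups like this one; the hard part the post describes starts when two graphs use different predicates for the same idea.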
So why pursue RDF? The goal is to make human knowledge, implicit and explicit, machine-readable. Not only machine-readable, but automatically shareable and reusable by applications that understand RDF. Researchers pursuing the semantic web hope that by precipitating an integrated and interoperable data environment, application developers will be able to innovate in their business logic and provide better services across a range of data sets.
Why is this so hard? Partly because the world is so complex: although RDF is theoretically able to model an entire world's worth of data relationships, doing it seamlessly is just plain hard. You can spend time developing an RDF representation of all the data in your world; then someone else comes along with their own world and their own set of data relationships. Being naturally friendly, you take in their data and realize that they have a completely different view of categories like "Author," "Creator," and "Keywords." Now you have a big, beautiful dataset with a thousand similar, but not equivalent, pieces. The hard part is determining the relationships between the data.
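The "similar, but not equivalent" problem can be sketched in a few lines. The predicate names and values below are invented; the point is that merging the piles is a one-line set union, while the equivalence table that makes the merged pile queryable has to be asserted by a person.

```python
# Two collections describe the same field with different predicates
# ("Author" vs. "Creator"). All names here are hypothetical examples.

ours = {("book:1", "ex:Author", "A. Writer")}
theirs = {("doc:9", "other:Creator", "B. Scribe")}

merged = ours | theirs  # a bigger pile, one set union away

# The hard part: a human must decide these predicates mean the same
# thing. This mapping is hand-coded judgement, not something the data
# model supplies on its own.
same_as = {"other:Creator": "ex:Author"}

def normalize(graph, mapping):
    """Rewrite predicates through the human-supplied equivalence table."""
    return {(s, mapping.get(p, p), o) for (s, p, o) in graph}

authors = sorted(o for (s, p, o) in normalize(merged, same_as)
                 if p == "ex:Author")
print(authors)  # → ['A. Writer', 'B. Scribe']
```

Without the `same_as` table, a query for authors over the merged graph silently misses half the data; that table is exactly the human function the post keeps returning to.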
We immediately considered how RDF and Sophie would work together. RDF importing/exporting in Sophie could provide value by preparing Sophie for integration with other RDF-capable applications. But, as always, the real work is figuring out what people could do with this data. Helping users derive meaning from a dataset raises the question: what kind of meaning are we trying to help them discover? A universe of linguistic analysis? Literary theory? Historical accuracy? I think a dataset that enabled all of these would be 90% metadata and 10% data. This raises another huge issue: entering semantic metadata requires skill and time, and is therefore relatively rare.
In the end, RDF creates bigger, better piles of data, each piece intact with provenance and other unique characteristics derived from its originating context. This metadata is important information that we'd rather hold on to than irrevocably discard, but it leaves us stuck in a labyrinth of data until we create the tools to guide us out. RDF is ten years old, yet it hasn't achieved the acceptance of other solutions, like XML Schemas or DTDs. Those have succeeded because they solve limited problems in restricted ways and are relatively simple to implement. RDF's promise is that it will solve much larger problems with solutions that have more richness and complexity; but ultimately the act of determining meaning, or negotiating interoperability between two systems, is still a human function. The undeniable fact remains: it's easy to put everyone's data into RDF, but that just leaves the hard part for last.
bowerbird on March 6, 2006 6:38 PM:
> I think a dataset that enabled all of these
> would be 90% metadata, and 10% data.
> This raises another huge issue:
> entering semantic metadata
> requires skill and time, and
> is therefore relatively rare.
and once we've spent all that skill and time
entering that semantic metadata, then we
_still_ have to "do the hard part" after that,
which is build tools that can _process_ it...
and once we've succeeded in building _those_,
we'll realize those tools are smart enough
that they could've understood the "data"
even if we _hadn't_ coded its "metadata",
so the skill and time we spent was wasted.
indeed, the most interesting aspect of those
tools will be the way they point to the _flaws_
in the simplistic and blindered way that we'd
originally coded all of our data's "metadata",
drawing attention to our cognitive blindspots.
Christian Wach on March 7, 2006 7:25 AM:
Dan Brickley, who works on RDF and Semantic Web technology at the W3C, has just written an interesting article on his blog called The Persian for London is Tehran - in essence, he points out the dark side of this technology when used by governments and others to prevent access to information:
"Resource description is a double-edged sword; we can use it to find and locate relevant content, and to personalise, filter and prioritise incoming information. But the same technology can rather easily be used in ways that damage the Web as an international and universal communications platform"
I think it's a point well made - we may end up doing the censor's job for them by making it easier for them to identify content they don't want us to see. It gives a whole new context to what you're suggesting when you say: "the act of determining meaning or negotiating interoperability between two systems is still a human function".