Posted by: Roger King
Dublin Core, MODS, namespaces, Semantic Web, the Metadata Object Description Schema
As we have seen, namespaces are a core element of the emerging Semantic Web. By posting namespaces on the Web, we can share precise vocabularies that will hopefully enable us to automate the process of searching the Web.
Searching with today’s search engines, like Google, is an inaccurate and highly iterative process. Searches are based on matching our search words with words in the documents that have been found and indexed in advance by the search engine. It can be a very painstaking process: we have to click on the URLs that are returned, and for each one, make a decision as to whether or not the page is relevant. We typically end up changing our search words gradually, as we hone our search criteria.
Namespaces are intended as a key element of a long term goal to make search engines of the future smarter. If the terms we used to formulate our searches came from widely-adopted, standardized namespaces, there would be far less painstaking iteration involved in finding the right webpages. We would accompany our search requests with links to the namespaces that define terms we are using. And in fact, searching would become at least partly automatic, with the browser able to narrow the set of returned URLs by making use of its knowledge of namespaces.
The Dublin Core.
Let’s take a look at one of the most widely known namespaces. It’s called the Dublin Core. But, as it turns out, it proved too simple and has since been eclipsed, at least in part, by a somewhat more sophisticated namespace called the Metadata Object Description Schema.
To get started, here’s another way to look at a namespace: it is used to create metadata that describes some data source. In particular, the Dublin Core was engineered to provide metadata for resources that can be found on the Web, including text-based documents, images, and video, and in particular, web pages. Want to know what a web page is all about? Look at its metadata, specified with the Dublin Core standard.
By the way, the namespace is named after Dublin, Ohio, not the other Dublin. The namespace was the result of a workshop held in Dublin in 1995. It is not an XML extension, like SMIL, the language used for building multimedia presentations. However, the Dublin Core can be used to create metadata for documents that are specified with XML or one of its many extensions.
So, what is in the Dublin Core? Basically it is a set of terms such as Contributor, Publisher, and Language. Some of the terms generally refer to very simple values, like Contributer, which is the person or organization that created a document.
To look at one of the potentially more complex Dublin Core terms, Coverage can describe the 3D (x,y,z) coordinates, or the time period, or the nation referenced by the document being described. It could refer to all of these. Note that this is not the time the document was written, or where it was written. Coverage refers specifically to the content of the document itself.
So, if we tell a smart browser of the future to find all documents that pertain to the year 1865, it will not return documents that were written in 1865, but are about the year 1012.
One drawback of the Dublin Core is that it is very loosely defined. So, it often fails in its true purpose: to provide precisely-defined terms that all of us can use, and where we can be confident they will be uniformly interpreted.
A More Sophisticated Standard: MODS.
A newer proposed standard, called the Meta Object Description Schema, or MODS, is an XML language that has been very actively promoted as a successor to the Dublin Core. MODS has more terms, and more precisely-defined terms. Since it leverages the ability of XML to express nested or embedded structures, it can convey much more information than a list of Dublin Core terms can convey.
Here’s a little piece of MODS:
This only gives a hint of the rich metadata that can be specified by using MODS. (The MODS website provides some far more detailed examples.)
Still, compare this to the Dublin Core Contributor term, which might have the value “Bugs King”. Is this a human name? Is it a pest control company?
But – even though it seems like an odd name, in the MODS example, we know that this is a person who goes by the name Bugs King.
Dublin Core might die and blow away – but it will always be recognized as a pivotal point in the development of the Semantic Web.