The Semantic Web. We introduced it in the entry before this one, which was the first entry in this blog.
This is a very aggressive goal, but if the Semantic Web were ever to exist, the Web would become something far more powerful than it is now. Today, we can only search the Web manually and interactively. We pull up Google, type in some keywords, and see what comes back. Then we begin to iterate. There is one obvious problem with this and two that might not be quite so obvious – and all three of them would be fixed if the Semantic Web really existed.
The key is that word “semantic”. It means that programs that search the Web, i.e., search engines of the future, would be able to search by the meaning or semantic content of the information we are looking for, and not simply by looking for keywords in the text of pages indexed by the search engine.
What are the three problems?
First, obviously, we would be able to perform a search with little or no iterating. This would radically reduce the need for a human to be in the loop, constantly guiding the search engine with more and more refined keyword searches. On the Semantic Web, a search engine would simply go out there and find whatever it is we need, and then deliver it up. Today, if we search the Web by keywords and are looking for a treatment for tapeworms, we might not get the right results, because we don’t know enough medical terminology to realize that what we are really looking for is the treatment for a disease caused by tapeworms, a disease called taeniasis.
Second, not so obviously, the Semantic Web could come a lot closer to assuring us that our search was complete, in that the information returned was not only relevant, but that there wasn’t anything important that had been missed. If we are searching for treatments for tapeworms, and we find four possible treatments, how do we know there isn’t a fifth one out there that is more effective, quicker acting, and safer than the other four?
Third, and even more subtly, the Semantic Web would largely solve the huge problem of heterogeneity of data, of mixing information that isn’t truly comparable – essentially of mixing apples and oranges. Right now, when we search the Web iteratively with Google, we might find one site that says that a “high end” notebook computer costs $3000, while another site says that a high end notebook computer costs $2300. Wouldn’t it be nice if the search engine could automatically ensure that we are comparing two computers that are truly similar in all significant ways?
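To make the first of these problems a little more concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the two "pages", the concept table, and the function names are assumptions, and real search engines are vastly more sophisticated. It only shows why a keyword match for "tapeworm" misses a page that says "taeniasis", while a search by shared meaning finds both.

```python
# Toy corpus: two made-up pages about the same disease, worded differently.
PAGES = {
    "page_a": "A tapeworm infection needs medical treatment.",
    "page_b": "Taeniasis is treated with prescription medication.",
}

# Hypothetical concept table: maps surface words to one shared concept id.
CONCEPTS = {
    "tapeworm": "concept:taeniasis",
    "tapeworms": "concept:taeniasis",
    "taeniasis": "concept:taeniasis",
}


def words(text):
    """Split text into lowercase words with surrounding punctuation removed."""
    return [w.strip(".,;:!?").lower() for w in text.split()]


def keyword_search(query):
    """Return pages whose text contains the literal query word."""
    q = query.lower()
    return sorted(name for name, text in PAGES.items() if q in words(text))


def concept_search(query):
    """Return pages containing any word that maps to the query's concept."""
    target = CONCEPTS.get(query.lower())
    if target is None:
        return []
    return sorted(
        name
        for name, text in PAGES.items()
        if any(CONCEPTS.get(w) == target for w in words(text))
    )


print(keyword_search("tapeworm"))
print(concept_search("tapeworm"))
```

The keyword search returns only the page that happens to use our word; the concept search returns both pages, because both words point at the same meaning.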
So what’s behind the Semantic Web? What will power it if it ever emerges? A keystone technology will be that of “namespaces”. The idea is simple, and while it is only a partial solution, it is surprisingly powerful, given its simplicity. Essentially, a namespace is a collection of terms that multiple people agree to share, along with specific, agreed-upon meanings for those terms. The Web, as it turns out, provides a powerful way of sharing namespaces: we can plant them on websites, and anyone who wants to use those terms knows where to find them, along with their meanings.
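Here is a minimal Python sketch of that idea. The namespace address, the terms, and their meanings are all hypothetical, invented just for this example. The point is only that a term prefixed with the address of its namespace becomes a name that two independent sites can share without ambiguity.

```python
# Hypothetical namespace address, used purely for illustration.
SHOP_NS = "http://example.org/shop-terms/"

# The namespace itself: shared terms plus the meanings everyone agrees on.
SHOP_TERMS = {
    "price": "Retail price in US dollars, before tax",
    "screenSize": "Diagonal screen size in inches",
}


def qualify(namespace, term):
    """Build the globally unambiguous name for a term."""
    return namespace + term


def describe(qualified_name):
    """Look up the agreed meaning of a qualified name, or None if unknown."""
    if not qualified_name.startswith(SHOP_NS):
        return None
    return SHOP_TERMS.get(qualified_name[len(SHOP_NS):])


# Two independent sites describe a notebook computer using the shared term.
site_one = {qualify(SHOP_NS, "price"): 3000}
site_two = {qualify(SHOP_NS, "price"): 2300}

# Because both sites use the same qualified name, a program can tell that
# the two numbers mean the same kind of thing.
shared = set(site_one) & set(site_two)
for name in shared:
    print(name, "means:", describe(name))
```

A bare word like "price" could mean anything; the qualified name carries its definition with it, because anyone can follow the address and read what was agreed.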
One of the first namespaces to take off on the Web is called the Dublin Core. (Sorry, but the name refers not to the Dublin in Ireland, but to the Dublin in Ohio, where a group of people met to establish this namespace.) It is a collection of terms that can be used to describe resources that can be found on the Web, or in paper libraries, or in any other place where we store information. These terms include Contributor, Date, Publisher, Subject, and many more. And if you want to find the Dublin Core, it is publicly available at:
We’ll look a lot more at namespaces in future entries in this blog. We’ll also consider such technologies as XML – the standard for specifying namespaces.
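As a small preview, here is a sketch of how a program might read Dublin Core terms out of an XML record, using Python's standard xml.etree.ElementTree module. The record itself is made up, and the namespace address in it is the one commonly published for the Dublin Core element set; treat it as an assumption here.

```python
import xml.etree.ElementTree as ET

# Assumed namespace address for the Dublin Core element set.
DC = "http://purl.org/dc/elements/1.1/"

# A made-up record that borrows three Dublin Core terms.
RECORD = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:publisher>Example Press</dc:publisher>
  <dc:date>2009-01-15</dc:date>
  <dc:subject>Semantic Web</dc:subject>
</record>"""

root = ET.fromstring(RECORD)

fields = {}
for child in root:
    # ElementTree expands a prefixed tag to the form "{namespace}localname".
    ns_uri, _, local = child.tag[1:].partition("}")
    if ns_uri == DC:
        fields[local] = child.text

print(fields)
```

The "dc:" prefix is only local shorthand. What identifies the terms is the namespace address it stands for, which is why the code compares against that address and not the prefix.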