Buzz’s Blog: On Web 3.0 and the Semantic Web

Aug 16 2009   4:16AM GMT

Dangers of the Semantic Web: Assertions, Inferences, and Surrogates

Roger King

This blog deals with advanced Web technology. Each posting should be quite understandable on its own, but the blog as a whole is a continuing story. We’ve been looking at the Semantic Web, which is a global effort to automate the searching of the Web, so that applications (we might call them smart search engines) can find, interpret, interrelate, and aggregate information stored in multiple, independent websites.

Assertions and Inferences.

A key concept is that of an “inference”, a fact that is created by putting together two or more pieces of information that we might call “assertions”. We used the following example in a previous posting. The two assertions might be posted on the Web somewhere.

Assertion 1: THE BALL is ORANGE.
Assertion 2: ORANGE is an UGLY COLOR.
An inference created by putting the two assertions together: THE BALL is an UGLY COLOR.
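To make the mechanics concrete, here is a minimal Python sketch of how a program might put the two assertions together. The (subject, predicate, object) layout and the function name infer are our own assumptions for illustration, not part of any Semantic Web standard, and the sketch performs only a single chaining step.

```python
# Hypothetical illustration: each assertion is a (subject, predicate, object) triple.
def infer(assertions):
    """One chaining step: if A is B and B is C, infer A is C."""
    inferred = set()
    for s1, p1, o1 in assertions:
        for s2, p2, o2 in assertions:
            # The object of the first assertion must be the subject of the second.
            if p1 == "is" and p2 == "is" and o1 == s2:
                inferred.add((s1, "is", o2))
    # Report only facts that were not already asserted.
    return inferred - set(assertions)

assertions = [
    ("THE BALL", "is", "ORANGE"),
    ("ORANGE", "is", "an UGLY COLOR"),
]

inferences = infer(assertions)
for subject, predicate, obj in inferences:
    print(subject, predicate, obj)
```

Notice that the match depends on the string ORANGE being spelled identically in both assertions. That is the programming version of the requirement that terms be carefully defined and widely shared.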

We have also discussed the fact that terminology used in inferences must be very carefully defined and widely shared.

What is a Surrogate?

The word surrogate, in the programming world, refers to a measure or model that is being used to approximate the “real” measure or model. If I am trying to estimate the depth of the ocean at some point, but don’t have a direct way of measuring the distance to the ocean floor, I might judge the depth by using a table that associates the distance from the shore with the depth of the ocean. The assumption is that all points that are a particular distance from the shore will have more or less the same depth.

Here’s the important point for us: The Semantic Web will make very heavy use of surrogates. Let’s be precise about this. We’re not talking about approximations. An approximation would look like this: we might search the Web for all banks that provide accounts that earn 5%, and our smart search engine might point us to banks that, on average over the past two years, have paid at least 5.0% on their accounts. A surrogate is something different. Suppose we wanted to find all banks that never cheated their customers. This might be impossible to answer precisely, so we might look for banks that are in the bottom 10% when it comes to the number of formal complaints filed against them. That would be a surrogate.
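The difference between the two kinds of query can be sketched in a few lines of Python. Everything here is invented for illustration: the bank names, the interest rates, the complaint counts, and the function names are all hypothetical. The first function approximates the question we actually asked (does the account pay about 5%?). The second answers a different, measurable question (who has the fewest complaints?) as a stand-in for one we cannot measure (who never cheats?).

```python
from statistics import mean

# Hypothetical records: (name, yearly rates over two years in %, formal complaints).
banks = [
    ("Bank A", [5.2, 5.1], 40),
    ("Bank B", [4.8, 4.9], 3),
    ("Bank C", [5.0, 5.4], 75),
    ("Bank D", [4.5, 4.7], 12),
    ("Bank E", [5.3, 5.5], 9),
    ("Bank F", [4.9, 5.0], 51),
    ("Bank G", [5.6, 5.2], 28),
    ("Bank H", [4.2, 4.4], 17),
    ("Bank I", [5.1, 5.3], 66),
    ("Bank J", [4.6, 4.8], 33),
]

def pays_about_5_percent(banks, target=5.0):
    """Approximation: same quantity we asked about, averaged over two years."""
    return [name for name, rates, _ in banks if mean(rates) >= target]

def fewest_complaints(banks, fraction=0.10):
    """Surrogate: complaint counts stand in for 'never cheated a customer'."""
    ranked = sorted(banks, key=lambda bank: bank[2])
    keep = max(1, int(len(ranked) * fraction))  # always return at least one bank
    return [name for name, _, _ in ranked[:keep]]

approximate_answer = pays_about_5_percent(banks)
surrogate_answer = fewest_complaints(banks)
print(approximate_answer)
print(surrogate_answer)
```

A bank in the second list has not been shown to be honest. It has only been shown to score well on the thing we were able to count.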

Surrogates on the New Web.

Now, let’s consider the Web. It doesn’t matter if we are talking about the Web today or the emerging Semantic Web.

In fact, what we are concerned with here applies to computing in general: when we take a chore normally performed by a human using an interactive interface and turn that chore over to a computer program, we often turn a real-world decision into a decision based on very simplified surrogates. A human can look at a bunch of information and, although it may take a very, very long time, make a “perfect” decision based on that data. But computer programs cannot think like a human. We can only crudely simulate in software the thinking that goes on in the mind of a real person.

Now, back to the Web, the new Semantic Web. Suppose we build a next generation website and use an official namespace (which is a structured set of terms) to specify assertions using terms from this namespace. What we’re doing is providing a surrogate for the smart search engine to use so that it can do the filtering of URLs and the integrating of information from multiple sites.

Consider our two assertions from above, along with the inference derived from them:

Assertion 1: THE BALL is ORANGE.
Assertion 2: ORANGE is an UGLY COLOR.
An inference created by putting the two assertions together: THE BALL is an UGLY COLOR.

Maybe we are shopping for a ball online. We might have to follow hundreds of URLs and search hundreds of websites to find just the right ball. But who said the ball is orange? It’s an approximation made by the vendor of the ball in question. It has been labeled orange. But maybe it’s a shade of orange that we would actually have liked if we had looked at the picture of the ball ourselves instead of leaving it to the search engine.

Well, we might argue that the word orange, if it is precisely defined, won’t be confused with some other color. We can be confident that our notion of orange is the same as the vendor’s notion of orange. We do know how to express colors very precisely by using numbers.
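As a rough sketch of what “expressing colors by numbers” could look like, the Python below treats a color as an RGB triple and calls it orange if it lies close enough to a reference value. The reference is the CSS named color orange, RGB (255, 165, 0). The tolerance of 60, the sample colors, and the function name is_orange are our own assumptions, chosen only for illustration.

```python
import math

# The CSS named color "orange" is RGB (255, 165, 0).
REFERENCE_ORANGE = (255, 165, 0)

def is_orange(rgb, tolerance=60.0):
    """True if rgb is within `tolerance` of the reference orange.

    Distance is plain Euclidean distance in RGB space; the tolerance
    is an arbitrary assumption, not a standard value.
    """
    return math.dist(rgb, REFERENCE_ORANGE) <= tolerance

vendor_ball = (255, 140, 0)  # hypothetical color published by a vendor
plain_red = (255, 0, 0)      # for contrast

print(is_orange(vendor_ball))
print(is_orange(plain_red))
```

With numbers like these, two parties really can agree on whether a given shade counts as orange, provided they also agree on the reference value and the tolerance.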

So, let’s change the assertions and the inference a bit:

Assertion 1: DOROTHY THE DOLL is a PRETTY DOLL.
Assertion 2: WE want a PRETTY DOLL.
An inference created by putting the two assertions together: WE might want DOROTHY THE DOLL.

Now, how could the notion of pretty ever be globally and uniformly defined?

It cannot.

Maybe we should shop for our own dolls and not leave it to a next generation search engine.

The Lesson.

The Semantic Web will trade accuracy for speed. No way around it.
