This is the third in a continuing series of blogs about the Semantic Web and Web 2.0/3.0. Our focus here is on the Semantic Web.
Let’s look carefully at that word. What do we mean by “semantic”?
Even though it is still far from fully realized, the Semantic Web effort is a number of years old now. But the heavy use of this word in computing is far older, dating back to at least the late 1970s.
So, what do we mean when we use this word, in particular, with regard to the Semantic Web?
Like a human or “natural” language, a programming language has two key aspects: syntax and semantics. The syntax of a language refers to the structural rules that tell us what constitutes a legal program, just as the syntax of English tells us how to speak correctly. But syntax ignores the meaning of the program or English statement. The semantic rules of a language are what tell us the meaning.
Interestingly, a human statement can be syntactically correct, while its semantics might be ambiguous. If “Time flies”, does it mean that time goes by quickly, or that your buddy, Freddy Time, likes to fly his plane on weekends? But in general, a computer program must have only one set of semantics; otherwise, the computer doesn’t know what to do with it.
There is a broader, and far more ill-defined, use of the word “semantics” in computing. It’s used heavily, especially by researchers writing academic papers, as a sort of bragging term. We like to claim that our way of representing data captures more of the “semantics” of the data. In other words, the more expressive our way of representing data, the more semantics can be deduced from its structure, and this is clearly a good thing.
Very important: when we look at the structure of the data, it includes all the terms used to describe the data. If I have a relational table called “Insurance Claims”, with a character attribute called “Subscriber Name”, and an integer attribute called “Amount Charged”, can a human with a modest knowledge of insurance deduce what it means?
Yes, in fact.
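To make this concrete, here is a minimal sketch of that table using Python's built-in sqlite3 module. The choice of SQLite and the exact column types (TEXT, INTEGER) are my assumptions for illustration. The point is that we can read back the structure alone, with no rows in the table, and the names and types already suggest what the data means.

```python
import sqlite3

# Hypothetical schema for the "Insurance Claims" table described above.
# SQLite and the TEXT/INTEGER types are assumptions made for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "Insurance Claims" ('
    '"Subscriber Name" TEXT, '
    '"Amount Charged" INTEGER)'
)

# Read back only the structure; no rows have been inserted.
# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
schema = [
    (name, col_type)
    for _cid, name, col_type, *_rest in conn.execute(
        'PRAGMA table_info("Insurance Claims")'
    )
]

for name, col_type in schema:
    print(f"{name}: {col_type}")
```

Everything a reader needs in order to guess the meaning is in the table name, the attribute names, and the attribute domains.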
In the computing world, we are constantly creating new and more powerful ways of representing information in computers. Java and C# and C++ use object structures to represent data. MySQL and Oracle and Microsoft SQL Server use relational schemas to represent data; these consist of “relations” (also known as “tables”), along with “attributes” (also known as “columns”), along with other properties, like “primary keys”. With XML, we use things called “elements” and “attributes”, and other constructs, to model data.
It’s not really accurate for me to say “more powerful”; really, we just mean different. So, more precisely, our claim is that our way of representing data, given the sorts of data we are manipulating, makes it easier for us to deduce its meaning from its structure, i.e., its semantics from its syntax. XML documents are inherently very different from relational tables; they are used to model very different stuff. Neither is really more powerful than the other.
Note that we do not include the data itself when we talk about the ability of the syntax to imply the semantics of the data. The rows in a relational table are irrelevant when we are judging the power of the relational model to represent data. And often, we don’t include whatever code or logic is used to manipulate the data. When I described the relational table above, I didn’t say what SQL queries are used to manipulate the Insurance Claims table. But certainly, we could have, and it would have made perfect sense to consider this part of its structure. In fact, we include the methods of an object-oriented class in its structural definition, and of course, the syntax of Java specifies how to write legal methods. And so, the methods of a Java class are part of what we use to deduce the semantics of the data represented by that class.
So here’s one way to look at the Semantic Web: we try to use ways of structuring data that are so powerful, so rich in the way they can be used to imply the semantics of the data, that this interpretation can be done largely automatically. This would make the web far more powerful.
Let’s step back for a moment and consider the terms that are used to specify the name of a relational table (“Insurance Claims”), the names of the attributes (“Subscriber Name” and “Amount Charged”), and the names of the domains of those attributes (characters and integers). In the previous blog in this series, we looked at namespaces. We could consider these terms from our relational schema to form a namespace.
Importantly, namespaces are a major aspect of the Semantic Web, and are aimed at giving us web-wide standards for using terms as a way of describing part of the structure of data. In my relational database, I might use terms that tend to be common across insurance companies, but are not necessarily universal. And sometimes, the terms might have conflicting meanings from one insurance company to another.
But on the Semantic Web, we would specify a namespace and ask that all insurance companies use these same terms with the same meanings.
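Here is a rough sketch of what that buys us. Two hypothetical companies store the same kind of value under different local column names, and each maps its local name onto one shared, namespace-qualified term. The namespace URL, the term "amountCharged", the column name "Billed", and the helper functions are all invented for illustration; they are not taken from any published vocabulary.

```python
# Hypothetical shared namespace; example.org is a placeholder, not a real vocabulary.
INSURANCE_NS = "http://example.org/insurance#"


def qualify(namespace, term):
    """Build a globally unique name by prefixing a term with its namespace."""
    return namespace + term


# Each company keeps its own local column names...
company_a_record = {"Amount Charged": 1200}
company_b_record = {"Billed": 950}

# ...and a mapping from those local names to the shared terms.
company_a_mapping = {"Amount Charged": "amountCharged"}
company_b_mapping = {"Billed": "amountCharged"}


def to_shared(record, mapping):
    """Re-key a local record so every field uses the shared, qualified term."""
    return {
        qualify(INSURANCE_NS, mapping[local_name]): value
        for local_name, value in record.items()
    }


shared_a = to_shared(company_a_record, company_a_mapping)
shared_b = to_shared(company_b_record, company_b_mapping)

# Both records now describe their value with exactly the same term.
print(shared_a.keys() == shared_b.keys())
```

Once both records are keyed by the same qualified term, a program can compare them without knowing anything about either company's private naming habits.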
What about the rest of the definition of data on the Semantic Web? How do we put terms together in a way that is analogous to putting terms together to form a relational schema? One large research community thinks we should all use “triples”. Here’s one: <Tolstoy> <author> <War and Peace>. We’ve taken three terms and put them into a triple.
Here’s the exciting part: The left node could consist of a URL that points to a website dedicated to Tolstoy. The middle part could consist of a URL that contains a set of agreed-upon terms for describing books, in other words, a namespace. The right part could consist of a URL that has the text of War and Peace on it.
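As a minimal sketch, the Tolstoy triple might look like this once each part is a URL. I keep the same left, middle, right ordering used above. Every URL here is a made-up placeholder under example.org, and the Triple type is just a convenient way to name the three positions, not a standard API.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """Three terms glued together: left node, middle term, right node."""
    subject: str
    predicate: str
    object: str


# Hypothetical namespace of agreed-upon terms for describing books.
BOOKS_NS = "http://example.org/books#"

tolstoy_wrote_war_and_peace = Triple(
    subject="http://example.org/people/tolstoy",      # a site dedicated to Tolstoy
    predicate=BOOKS_NS + "author",                    # a term from the shared namespace
    object="http://example.org/texts/war-and-peace",  # a page holding the text
)

print(tolstoy_wrote_war_and_peace.predicate)
```

Note that only the middle term comes from the shared namespace; the left and right parts simply point at things that live somewhere on the web.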
In other words, we can use namespaces, combined with triples, to glue together data on the World Wide Web. Then, we could imagine that a program could go out on this new “Semantic Web” and find the authors of a large set of books. One critical subtlety is that we would be guaranteed that “author” means the same thing in each case, because it has been taken from a shared namespace that is used by any site that represents books and their authors on the web.
This is a key aspect of why the Semantic Web could be so powerful: shared namespaces guarantee common usage of terms, and triples can be used to glue information together into pieces that could be located automatically, i.e., without a human having to interactively verify and interpret every piece of data returned.
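The sketch below imagines such a program in miniature. It assumes the triples have already been gathered from two hypothetical sites into one in-memory list; all URLs, the "born" term, and the authors_of helper are invented for illustration. Because both sites use the same "author" term from the same namespace, a plain equality check is enough to pick out the right triples.

```python
# Hypothetical shared term; both sites are assumed to use exactly this URL.
AUTHOR = "http://example.org/books#author"

# Triples as (left, middle, right), pretend-collected from two different sites.
triples = [
    ("http://example.org/people/tolstoy", AUTHOR,
     "http://example.org/texts/war-and-peace"),
    ("http://example.org/people/tolstoy", AUTHOR,
     "http://example.org/texts/anna-karenina"),
    ("http://example.net/writers/austen", AUTHOR,
     "http://example.net/texts/emma"),
    # A triple using some other term, which the search should ignore.
    ("http://example.net/writers/austen", "http://example.net/other#born", "1775"),
]


def authors_of(all_triples, books):
    """Map each requested book URL to the list of its author URLs."""
    wanted = set(books)
    found = {}
    for left, middle, right in all_triples:
        # The match relies only on the shared term, not on human interpretation.
        if middle == AUTHOR and right in wanted:
            found.setdefault(right, []).append(left)
    return found


result = authors_of(
    triples,
    ["http://example.org/texts/war-and-peace", "http://example.net/texts/emma"],
)
for book, authors in result.items():
    print(book, "<-", authors)
```

A real system would have to fetch and parse the triples from the web first; this sketch only shows why the lookup itself needs no human in the loop once the terms are shared.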