Posted by: Roger King
data, information, knowledge, ontologies, RDF, the Semantic Web, triples
This posting is a continuation of the previous posting. We are discussing RDF, the “triples” language that is serving as a cornerstone of the Semantic Web effort. The goal of the Semantic Web is to partly automate the searching of the Web, by using RDF to capture deeper semantics of information and SPARQL to query that information. This is in comparison to today’s search engine technology, which does not allow us to do much more than search for individual words in the text of webpages.
Let’s step back for a moment.
Just how universal is this notion of RDF-style triples? Will we ever have something substantially more useful, more powerful in the semantics it can express?
Data, Information, Knowledge, and Ontologies.
Academic and industrial researchers in computing like to trivialize big words. Let’s briefly look at the problem. “Data” is an old word, and most of us have a sense that virtually anything stored digitally can be considered data. This includes applications and other pieces of software, too. If you back up some applications to free up space on your hard drive, you’ve just turned applications into data, right?
“Information” is a word that came into play when researchers wanted something that was smarter than data. The word was broader, and vaguer, but information was essentially data that was ready to be used by interactive users. If I pull down a page from the Encyclopedia Britannica site, it’s filled with information.
Then, there were demands for an even richer word, one that suggests data that is beyond information, stuff that is rich in semantics that can be easily extracted. Often, knowledge was data or information that had been interconnected, turned into trees or graphs. Traversing the links in the structure told us how various things were interrelated and thereby exposing powerful semantics. The Web in a sense is knowledge. I can follow links between pages to discover how various pages on the Web are interrelated. I can follow connections on the Britannica site to connect a scientific discovery to the story of the discoverer’s life.
Here’s something significant. This blog and all its postings are related to new web technology, such as the Semantic Web. Our central concern has been the partial automation of the searching of the Web, so that users aren’t limited to typing words into Google and getting back stuff no richer than pages that happen to have these words in them. As it turns out, the term “knowledge” dates way back before the days of the Web, but back then, our notion of what it meant to be knowledge and not just data or information was pretty much the same as it is now. Knowledge can be processed by programs, thereby automating the task of finding the right knowledge and applying it to our problem domain.
Then came “ontology”. This is a relatively new word, but it’s perhaps the most embarrassing. The word, until recently, was reserved for philosophers to use. An ontological argument is an argument about the existence of something. Over the centuries, one common subject of ontological discussions has been the existence of God.
The same old, same old.
Flash forward to the Internet age: Computer researchers use the term to refer to a precise specification of the objects and properties (of these objects) in some well studied domain. I guess the idea is to suggest that we can capture the true nature of the existence of some domain.
These domains could be large, like banking, health insurance, or the stock market. Laying out all of the objects involved in one of these is a daunting task. Consider an insurance claim and all of its properties: type of claim, provider of medical service, patient name, etc., and then imagine laying this all out for insurance policies, underwriting tables, actuarial data, etc. To include all of the objects and properties involved in building software for an insurance company would lead us to thousands of interconnected terms. Triples, in other words.
Or our ontology could be the specification of a pencil object, which has properties like being made of wood and graphite and metal, of having yellow paint and a little pink eraser. Triples like this:
The pencil has a pink eraser.
The pencil is painted yellow.
This characterizes the nature of the challenge we have taken on in our efforts to build ontologies. We take on the problems of scale, not the problems involved in really capturing, in some formal fashion, the nature of the world around us. We build gigantic, but very simple, models of the things that concern us in the software world.
We have trivialized this term, ontology. In fact, for the most part, we’re simply referring to the same old, same old modeling construct: triples. Yes, that simple tool called RDF can be used to build a vast “ontology”.
There is something about the nature of triples that has conquered computing. It is a concept that, as we have seen in previous postings of this blog, underlies object-oriented data structures. It predates object-oriented languages, going back to the early days of AI and the attempts to model the real world.
So, what is an ontology?
An ontology is supposed to be the end of the Semantic Web rainbow: our ability to fully automate the specification and searching of the real world. But the next time some computer person tries to impress you by tossing this term at you, remember to just shake your head and say “Quit being a puff toad. You’re just talking about triples.”