Information archives - Buzz’s Blog: On Web 3.0 and the Semantic Web


Oct 18 2009   10:37PM GMT

Personal Information Management Applications and Web 3.0



Posted by: Roger “Buzz” King
advanced Web apps, databases, information, media applications, Multimedia, note-taking, notebooks, rich internet apps, tagging, Web 2.0, Web 3.0, web applications

This blog is devoted to the discussion of Semantic Web and Web 2.0/3.0 technology.

Managing personal and small group information.

When it comes to so-called Web 2.0 and 3.0 technology, one of the most prolific marketplaces involves the explosion of applications for managing information for individuals and small groups. Looking only at applications developed for Macs, we see an array of information management technologies.

Notebooks.

One of the most popular formats for managing information uses the paradigm of a notebook. The user can create a notebook, often selecting from multiple canned formats, such as a diary, class notes, or a novel, complete perhaps with a notebook cover and a spiral wire down the left side. The application creates a table of contents, and users can create sections and pages - and stuff virtually any kind of information on each page. Two very good examples of this approach are NoteShare and Notebook.

Interestingly, and perhaps because many of the applications in this category have been around for a number of years, these tend to not be true web applications. Often you can share notebooks, including full read/write access, via a URL and a simple browser interface, and you can publish a notebook at a URL. But the products are primarily for single-user, desktop use.

A good example of a notebook application that is a true web application is Zoho Notebook. (Zoho actually provides a large set of web based applications, of which the note program is just one.)

Buckets.

The other very popular note format uses the bucket or folder approach. The application may or may not support the nesting of these buckets and/or the creation of conceptual buckets, so that a given note can exist in more than one bucket. Two very good applications that use this approach are SOHO Notes and Yojimbo. These two applications are desktop-based, although most applications in this category support the synching of notes over multiple machines, using the Apple web-synching technology.

A hybrid desktop/web application is Evernote, which has elegant desktop applications for Windows machines, Macs, and a variety of handhelds and cell phones. It also has a very effective web interface. The user can sync multiple Evernote desktop instances via Evernote’s web server. Users can thus avoid ever using the web interface.

Outlines.

One specialized sort of information management application involves the creation of embedded outlines and bulleted lists. These applications, such as OmniOutliner, actually provide a full notebook functionality as well. OmniOutliner notebooks can be published on the web, but it is very definitely a desktop application.

Task lists.

An even more specialized class of information management applications supports To-Do lists. Great examples are Zenbe Lists (they also provide integrated email and collaborative software) and rememberthemilk.com. These are web applications.

Photos and video.

There is a rapidly growing number of applications that allow users to collect, sort, tag, edit, and share photographs and video. Apple’s iPhoto is a great example. It is very much a desktop app, although applications in this class typically support the publication of images and video on the web, and sometimes even read/write access via the web.

Stories, scripts, novels, and storyboards.

There are a number of highly specialized applications that support the development of fiction, including Final Draft and Montage (scripts), Scrivener and StoryMill (fiction prose), and Toon Boom storyboard (which is actually an impressive drawing program). Again, users can often publish to the web. Interestingly, many of these applications can easily be used as full blown, generic note applications, and can manage many forms of media.

Diary Applications.

Perhaps the most popular diary application on Macs is MacJournal (by the Montage and StoryMill folks). An interesting twist is that it is also an excellent blogging program. I use it to write this blog. This is, of course, one of the most widely used vehicles for sharing information on the web, and you can expect other sorts of personal information management systems to have blogging capabilities added to them.

Small, forms-based database management systems.

These applications are desktop apps. Apple’s Bento is a very good example. It actually is a sort of hybrid database/spreadsheet application. The most recent release allows multiple instances of Bento to share databases running on computers on a shared network.

Mind-Mapping.

The “circles and lines” applications have become highly specialized. The best known one is MindManager, and there are versions for Windows machines and Macs. These are desktop apps. The vendor, MindJet, recently introduced both web interfaces for sharing and updating desktop mind maps and a web-based application that has a fresh, smooth interface and provides team collaboration tools. Many forms of media can be placed in MindManager, including data from a wide variety of relational database management systems.

Screen and audio capture.

There are a number of applications that allow users to capture desktop video, along with audio voice-overs. Camtasia (which has Windows and Mac products) and Screenium are popular products.

These applications are, in a way, successors to slide applications like Microsoft PowerPoint and Apple Keynote. More and more presentations are being engineered with screen capture and audio applications, and these applications often support text and image data, as well as the insertion of video capture of the speaker. Sometimes, PowerPoint slides can be imported.

Conferencing apps.

There are several applications that provide hybrid desktop/browser live communication, including video, sound, and collaborative white-boarding. The best known one is probably Cisco WebEx, which comes in varieties for Macs and Windows machines. Skype supports a similar, limited product - which is free. One of the nice things about these products is that they come with their own voice lines. Other products, like Adobe ConnectNow, require the use of a cell phone to carry voice. With most of these products, a conference can be recorded for later use.

Finally…

Importantly, we note that in this rapidly-exploding marketplace, the borders between these various categories are being broken down, and applications often support a number of these capabilities at once. A good example is Curio, a desktop application that supports notes, lists, video, audio, white-boarding, mind-mapping, and limited web publishing.

Oct 11 2009   11:07PM GMT

Making information management scale: leveraging metadata on the new Web



Posted by: Roger “Buzz” King
3D modeling, automating Web searches, databases, DB2, information, Multimedia, MySQL, Oracle, PostgreSQL, RDF, Semantic Web, Video, Web 3.0, Web development frameworks, Web3.0

Previous postings of this blog.

This blog is dedicated to advanced Web development tools and concepts. Previous blog postings have focused on the emerging Semantic Web, which promises to make the Web radically easier to search and to greatly enhance the value of the vast sea of currently-disconnected information spread across the Web. We have also looked at Web 3.0 efforts, which promise to make multimedia websites highly usable and capable of conveying far more information than the current generation of websites. Previous postings describe the breadth and depth of cutting-edge Web technology.

Metadata: making that ratio small.

Here’s something that’s very important: Much of the ongoing research and development that is loosely categorized as Semantic Web and Web 3.0 efforts is focused on a specific technical goal, one that has been at the core of information management technology since the mainframe era that was epitomized by the IBM 360 series. That goal is to leverage metadata as much as possible.

It’s our best weapon against the truly staggering amount of information on the Web. This includes traditional text-based and numeric data, as well as books, medical advice, photographs, entertainment and training videos, music and recorded books, investment information, educational materials, scientific materials, e-government information, etc., etc. How can we possibly organize information and then search it in a way that scales? The Web is far from a closed world. In traditional data processing environments like banking, insurance, and credit card processing, we could get our arms around all of the data, as vast as it may have seemed. But the world of information today is an open world, effectively infinite in size.

Very informally, if you look at the size of the metadata divided by the size of the data itself, the smaller that fraction the better. In traditional relational databases (built with database management systems, such as Oracle, MS SQL Server, MySQL, PostgreSQL, or DB2), the extreme focus on minimizing this ratio has enabled the fast processing of extremely large volumes of data. The tradeoff is that the table definitions (or the “schema”), which form the heart of the metadata, are very, very simplistic.

The old days: relational database schemas.

An insurance claim may be defined as a table with such columns as Subscriber_Name, Medical_Provider, etc., and thus, may consist of little or no more than a series of simple character and numeric fields. But if we need to process fifty thousand of them tonight, we must be able to bring many such table rows into memory at once, and quickly move through them. The database world was an extension of the paper world: a row in an insurance claim table was effectively an electronic successor to the traditional claim form.
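
To make the ratio concrete, here is a minimal sketch using Python's standard sqlite3 module. Only the Subscriber_Name and Medical_Provider columns come from the example above; the Claims table name, the Amount column, the generated rows, and the use of character counts as a stand-in for "size" are all assumptions made for illustration.

```python
import sqlite3

# In-memory database; the Claims table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
schema = (
    "CREATE TABLE Claims ("
    "Subscriber_Name TEXT, Medical_Provider TEXT, Amount REAL)"
)
conn.execute(schema)

# Fifty thousand simple rows, all described by the same one-line schema.
rows = [(f"Subscriber {i}", f"Provider {i % 100}", 100.0 + i) for i in range(50_000)]
conn.executemany("INSERT INTO Claims VALUES (?, ?, ?)", rows)

# Very rough size estimates: characters of schema text vs. characters of row data.
metadata_size = len(schema)
data_size = sum(
    len(name) + len(provider) + len(str(amount))
    for name, provider, amount in conn.execute("SELECT * FROM Claims")
)

ratio = metadata_size / data_size
print(f"metadata/data ratio: {ratio:.6f}")
```

The schema is written once and never grows, while the data grows with every claim, which is why the ratio shrinks toward zero as the table fills up.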

Today: a far more challenging problem.

But on the new Web, information can be far more complex in nature, making the metadata to data ratio far larger. We’ve looked at some of the emerging technology and technical trends for embedding metadata in advanced forms of data (and for processing that metadata); this data includes books, images, video, modeling and animation, and sound. This new generation of information formats makes up our personal health records and medical record images, industrial training materials, university “distance” courses, and the like. Each instance of these tends to be far more distinctive than an individual insurance claim form. And it takes a lot of metadata to properly convey their “meaning”.

The challenge.

What we’re struggling with right now is to succinctly specify the meaning of modern media assets and to automate searching based on this metadata. This is our only hope for leveraging that ratio of metadata size divided by data size.


Oct 3 2009   9:12PM GMT

Multimedia: The Problem of Subtle Semantics



Posted by: Roger “Buzz” King
3D animation, 3D modeling, advanced Web apps, automating Web searches, blob data, continuous data, databases, information, Multimedia, rich internet apps, Semantic Web, smart search engines, tagging, Text, Web 2.0, Web 3.0, web applications, Web development, Web development frameworks, XML

The challenge of the Semantic Web.

We’ve looked at the emerging Semantic Web technology in the previous postings of this blog. The idea is to have a far, far smarter Web, one where the process of finding and interpreting and making use of far flung information can be largely automated. This is in sharp contrast with today’s Web, where these things have to be done in a painful, extremely time-consuming fashion.

So that is the key challenge. It has to do with searching the kinds of information that are important to us in our daily lives. This information, as it turns out, is very difficult to process automatically. Why is this?

The complexity of modern multimedia.

I teach a very basic 3D animation class to mostly computer science students. We use Maya, arguably the most popular 3D animation application, one that is used in the making of many animated features. The interesting thing about animation is that it is truly multimedia. It can give us a lot of insight into what we need the new Web to do for us.

That’s because the number and diversity of applications that are used for drawing, documenting, modeling, animating, motion capture, texturing, video rendering, video editing, video conversion and compression, sound editing, in even small projects, can be very impressive. Correspondingly, the wide variety and complexity of media formats involved in an animation project can be overwhelming.

What happens in an animation project? The workflow might begin with vector storyboard drawings to break the story down into scenes. In a typical animation project, 3D models in a variety of proprietary formats are used. Models must be transformed as they are exported from one application and imported into the next. Multiple video renders of animated models are made, and they must be edited together, along with multiple sound files. Multiple video and audio formats might be used. 2D images are used for textures; photographs of butterfly wings can be used to make an animated butterfly very realistic, and a checkerboard image made with Photoshop can be used to make a Linoleum floor. And along the way, a variety of note taking, screen capture, and conferencing software might be used to facilitate group communication.

There is also a heavy focus on reuse in an animation project. Building every model, editing every texture, creating every environment and background, recording every sound from scratch is frequently intractable. If existing assets cannot be tailored and reused, the project would be far too expensive and time consuming, and would demand too wide a variety of professionals to always be available. This raises the multimedia stakes, as assets of widely differing forms must be constantly reconfigured and used in concert in new ways.

But what’s the real problem? We aren’t all trying to produce complex animated videos. But very interestingly, in our everyday lives we essentially face the animator’s challenge when we try to find and use information on the Web. That’s because we’re often looking for things whose meaning, whose interpretation, demands focused human thought. We are looking not for business data, but for pieces of media, and the problem is that today, most of our searching has to be based on tags or brief textual descriptions that are associated with pieces of media, and not on the true meaning of the media itself.

The needs of the business world are not our needs.

It’s the subjective nature of media assets - this is what is at the heart of the problem facing us. Existing technology for searching the web is based on keywords and very short pieces of text.

There is other technology, though, under active development, stuff that serves as the information storage backbone of most commercial websites. It’s the technology that has for decades been used in-house (not on the Web) by businesses when they process large databases. But this stuff was designed to handle traditional business data forms, like integers, character strings, real numbers, dates, timestamps, and full text.

There is more, though. All of the major database management systems, along with tools for building and searching advanced websites are being retrofitted (or in some cases, built from the ground up) to manage more than keywords and text, more than standard business data.

But up to now, the focus has not been on supporting the kinds of information you and I are most interested in. The focus has been on extending database and Web technology to support XML documents, as well as more complex data objects, like those inside a Java program, as well as other forms of data found inside programs. This includes arrays and lists and short pieces of textual data, like the names of diseases.

In other words, we’ve been busy extending our support of the business world, so they can store complex business data in databases and make that information processable over the Web. You and I have largely been left out.

Finally, we are attacking our needs.

But there are now many ongoing efforts to extend database and Web technology to make it useful to us. The new focus is on supporting blob and continuous media like images, video, and audio. This is extremely hard to do.

Why? Because the strongest means by which we deduce the meaning of business data is by looking at its internal structure and the terms that are used to describe that structure. A relational table named Prescriptions, with character attributes Patient Name, Doctor’s Name, and Medication, and with a numeric attribute Dosage, is pretty easy to interpret.

But what do we do with a photograph, which is just a grid of pixels with no internal structure? Or a long series of images, along with a sound track, put together to form a piece of video?
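
The contrast can be sketched in a few lines of Python with the standard sqlite3 module. The column names are adapted from the Prescriptions example (spaces and the apostrophe replaced so they are valid identifiers), and the tiny "photo" is an invented grid of grayscale values, so treat this as an illustration rather than a real image pipeline.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Prescriptions ("
    "Patient_Name TEXT, Doctor_Name TEXT, Medication TEXT, Dosage REAL)"
)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, default value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Prescriptions)")]
print(columns)

# A hypothetical 2x3 grayscale "photo": just numbers, with nothing that names their meaning.
photo = [
    [12, 200, 34],
    [90, 91, 255],
]
print(len(photo), "rows of", len(photo[0]), "pixels")
```

The table can describe itself to a program through its column names and types. The pixel grid offers nothing comparable: any meaning has to come from outside, through tags or through image analysis.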

The U.S. military has been pumping money into image processing for several decades, and so all is not lost. There is a vast body of mathematical research and software development that allows us to write programs that can find a particular face in a crowd and search satellite photos for airplane runways. But in general, we cannot at this time write a program that can process an arbitrary photo or video clip and tell us what it means. That means we can’t quickly search vast media databases for useful pieces of information.

The goal behind the Semantic Web effort is to build a new generation of websites whose information can be searched automatically, and where information from multiple sites can be automatically integrated. To do this with numeric and character based data is quite doable. But when it comes to multimedia, like images and sound and video and 3D models and engineering designs, well, we have a long way to go. The meaning - in other words, the semantics - of these forms of data is complex and subtle, and highly dependent upon an individual’s interpretation of that media.

So, we see that we have only just begun our journey to create the new Web.


Sep 25 2009   11:31PM GMT

Semantics and the new Web: Built out of very old ideas.



Posted by: Roger “Buzz” King
automating Web searches, inferences, information, knowledge, Semantic Web, Web development

Describing the real world in computers.

The word “semantic” has been a buzzword in computer science for decades. The youthful Artificial Intelligence world invented these things called Semantic Networks or Semantic Nets a half century ago. The idea was to come up with a crisp, formal language for representing real world things inside a computer. This took the form of a small set of constructs that would be general purpose, in that they could be applied to almost any sort of information. Further, these constructs would somehow be intuitive and natural, in that they would get to the heart of what it means to describe everything from horses to insurance claims to marriages to the contents of the Bill of Rights.

Basic, long-standing, core concepts.

What emerged has certainly stood the test of time. Big time. Opinions differ widely on just what constitutes the core constructs. Different people have used different names for these terms, and, although the idea was to specify something formal, the definitions of these constructs were generally sloppy. But here is a reasonable specification, in its most rudimentary form:

There are objects (which might also be called entities, things, or concepts). Objects have unique names.

Objects are interrelated by attributes (which might also be called relationships or properties). Attributes are directional, and they have names.

In other words, things in the world can be represented as a simple directed graph. We could say that there are objects called Chickens that have an attribute called Are. The value of this might be an object called Birds. Birds might have an attribute called Lives-In, which links Birds to the object Barnyard. There might be an object called Mr. Fried, which has an attribute called IS, which connects Mr. Fried to the object Chickens.
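
As a rough sketch, the barnyard example can be written down as a directed graph in Python. The dictionary layout and the follow helper are illustrative choices of mine, not part of any semantic network standard.

```python
# Each object name maps to its outgoing attributes: {attribute name: target object}.
graph = {
    "Mr. Fried": {"IS": "Chickens"},
    "Chickens": {"Are": "Birds"},
    "Birds": {"Lives-In": "Barnyard"},
    "Barnyard": {},
}

def follow(start, *attributes):
    """Walk the graph from `start`, following one named attribute per step."""
    node = start
    for attribute in attributes:
        node = graph[node][attribute]
    return node

print(follow("Mr. Fried", "IS", "Are", "Lives-In"))  # prints Barnyard
```

Because attributes are directional, the walk only works from Mr. Fried toward Barnyard; nothing in this structure leads back the other way.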

There are many popular variations of this basic idea that have emerged, and they tend to be of the following nature:

One idea is to make a sharp distinction between the notion of a subtype (or sub-kind or subset) and other attributes. So, our attribute Are might become a core concept itself, and we might name it Is-A. Chickens Is-A Birds, People Is-A Biped, etc. Other attributes like Lives-In would be considered inherently different from Is-A.

We could introduce another generalization. A general term for attributes Lives-In and other similar attributes might be Has-A. In fact, we could stop using special words for attributes in general, and just use the terms Is-A and Has-A. We would then say that Marriages Has-A Wife, as well as a Husband, as well as a Date.
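
Here is a minimal Python sketch of the two link types. One common reason for treating Is-A specially, which the text above does not spell out and which is my assumption here, is that a subtype can inherit the Has-A values of its supertypes; the all_has_a helper shows that idea.

```python
# Is-A links form a subtype chain; Has-A links list what an object has.
is_a = {"Chickens": "Birds", "People": "Biped"}
has_a = {
    "Birds": ["Barnyard"],  # the old Lives-In attribute, now a generic Has-A
    "Marriages": ["Wife", "Husband", "Date"],
}

def ancestors(name):
    """Return every supertype reachable through Is-A, nearest first."""
    chain = []
    while name in is_a:
        name = is_a[name]
        chain.append(name)
    return chain

def all_has_a(name):
    """Collect Has-A values for an object and for everything it Is-A."""
    values = []
    for kind in [name] + ancestors(name):
        values.extend(has_a.get(kind, []))
    return values

print(all_has_a("Chickens"))
```

Collapsing every attribute into Has-A loses the attribute's name (Lives-In becomes just "has a Barnyard"), which is the price of the generalization.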

These general ideas are actually old, and actually significantly predate computing. We have been struggling with the problem of describing real world objects (like Cows), real world concepts (like Marriages and Respects), and their interrelationships and categories since the emergence of the earliest philosophers. Aristotle distinguished between objects and their attributes, and carefully studied and described many animals and plants.

What does it all mean for the new Web?

So, what does all this mean to us, today, and what does it have to do with modern Web technology? Well, first of all, these concepts of objects and attributes have spread throughout all of computer science.

There have been some significant extensions, like distinguishing between an attribute that we might call a relationship, which interconnects complex objects or notions (like a driver owning a car) and attributes that interconnect complex objects and notions with atomic or simple things (like a car having a color or a driver having a name). Generally, these latter, simple kinds of attributes are now what we call attributes, and are considered inherently different from (and simpler than) relationships.

Another extension that has become a core concept in programming languages is something we might call an object identifier, which is a unique number or other identifier for individual objects; this allows us to carefully distinguish between two people who have the same mother, and two people who have mothers who just happen to have the same name.

Programming languages also introduced the concept of methods, or little programs that can give life to objects. You might be able to tell a marriage object to tell us the names of the husband and wife.
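
Both extensions are easy to see in Python, where every object has an identity separate from its contents. The Person and Marriage classes and all the names below are made up for illustration.

```python
class Person:
    def __init__(self, name, mother=None):
        self.name = name
        self.mother = mother

class Marriage:
    def __init__(self, husband, wife):
        self.husband = husband
        self.wife = wife

    def names(self):
        """A method: the marriage object itself reports who is in it."""
        return (self.husband.name, self.wife.name)

# Two different mothers who just happen to have the same name.
mother_a = Person("Mary Smith")
mother_b = Person("Mary Smith")

ann = Person("Ann", mother=mother_a)
bob = Person("Bob", mother=mother_a)
cal = Person("Cal", mother=mother_b)

# `is` compares object identity; `==` on the names compares values only.
same_mother = ann.mother is bob.mother                 # True: one and the same object
same_name_only = (ann.mother.name == cal.mother.name
                  and ann.mother is not cal.mother)    # True: equal names, different objects

marriage = Marriage(Person("John"), Person("Jane"))
print(marriage.names())
```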

But basic concepts have not changed. There seems to be something natural and fundamental about them.

Building a new world out of old concepts.

And the Web? A revolution is happening today. We are developing languages that allow Web designers to embed machine-readable specifications in Web-resident information. This will largely automate the process of searching the Web, as well as the integration of information at multiple sites. This will in turn lead to the discovery of knowledge by putting together diverse information from across the Web. We have discussed these emerging technologies in the previous postings of this blog; they are heavily and deliberately built on top of ideas that date back to the 1950s, and in fact can trace their roots to ancient Greece.


Sep 9 2009   6:02PM GMT

Real-World Look at the Semantic Web, part 2



Posted by: Roger “Buzz” King
assertions, inferences, information, namespaces, RDF, SPARQL, triples, URI's, wikis

This blog is dedicated to the study of emerging Web technology, in particular, ongoing research and development aimed at building software tools that will underlie the emerging Semantic Web. Last time, we looked at DBpedia, something that a former graduate student at my university, Greg Ziebold, pointed me toward.

The Semantic MediaWiki.

In this posting, we look at the Semantic MediaWiki, something else that Greg told me about. It is an extension of MediaWiki, the application that the Wikipedia is built out of. You can learn all about it at the Semantic MediaWiki website. The idea behind Semantic MediaWiki is to provide a more powerful wiki tool, namely one that supports more than just human-readable things like text and images.

RDF and namespaces: creating machine-readable, web-based information.

The idea is to allow entries in wikis that contain machine-readable information, so that searching can be performed in a largely automatic fashion. Specifically, the Semantic MediaWiki allows users to export information from a wiki in RDF format. An RDF specification consists of “triples” that form “assertions”. Consider the following:

Assertion 1: Joe is tall.
Assertion 2: Tall People should try out for Basketball.

The idea is for terms in triples (“Joe”, “tall”, “is”, “Tall People”, etc.) to be taken from predefined and globally accessible namespaces. This would ensure that everyone who uses a given term (like “tall” or “Should try out for”) will have the same meaning in mind. In this way, rather than having to painfully search for information that pertains to Tall People, for example, a smart search engine could do the searching for us.
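
A minimal Python sketch of the two assertions as triples follows. The namespace URIs are hypothetical (example.org is reserved for examples), and treating “tall” and “Tall People” as one shared term, TallPeople, is my assumption so that the two triples can later be chained.

```python
# Hypothetical namespace URIs; these are not real published vocabularies.
PEOPLE = "http://example.org/people#"
BASKETBALL = "http://example.org/basketball#"

def term(namespace, local_name):
    """Build a globally unique term by prefixing a local name with its namespace."""
    return namespace + local_name

# Each assertion is a (subject, predicate, object) triple.
assertion_1 = (
    term(PEOPLE, "Joe"),
    term(BASKETBALL, "is"),
    term(BASKETBALL, "TallPeople"),
)
assertion_2 = (
    term(BASKETBALL, "TallPeople"),
    term(BASKETBALL, "shouldTryOutFor"),
    term(BASKETBALL, "Basketball"),
)

for triple in (assertion_1, assertion_2):
    print(triple)
```

Because every term is a full URI, two people who write TallPeople from the same namespace are guaranteed to be using the same term, wherever their triples happen to be stored.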

Building locally, growing globally.

There is more to this. These namespaces can be available on the Web, and RDF statements can point to the relevant namespaces. This means that software searching the Web, and processing these triples, can easily find the relevant namespaces.

Also, the things in the right and left side of a triple (like “Joe” and “tall”) can themselves be Web-based resources. This means that information scattered around the Web can be interconnected - but all the work can be done locally. No one has to manually integrate millions of websites. The job can be done little by little, in a quiet way, as people start to store their information in an RDF compatible fashion.

This is how the Semantic Web will scale. Everyone will use shared namespaces and shared protocols like RDF. This will, in essence, turn the Web into one big website that can be searched in a partly automatic fashion.

SPARQL: querying RDF-based information.

How will we interrelate data scattered around the Web?

There is a query language out there, called SPARQL, that can be used to search the Web. SPARQL can follow RDF connections around the globe. How is this done? It has to do with being able to “infer” new things. Consider a fact that can be automatically deduced from the two assertions above:

A new inference: Joe should try out for Basketball.

Assertion 1 could be on a server in Detroit, and assertion 2 could be on a server in Miami, and SPARQL could do the job of making the leap that leads to the new inference.

This means that we could figure out what Joe should be doing right now without having to find the two pieces of information manually (the fact that he is tall, and that tall people should play basketball), and without having to make the inference ourselves.
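
The leap itself can be sketched in plain Python. This is a toy, not how a SPARQL engine works internally: in practice the chaining is expressed as a query that joins the two triples, or is handled by a rule or reasoning layer on top of the RDF data. The two "servers" are just sets, the short names stand in for full URIs, and the single hard-coded rule is my assumption.

```python
# Triples as they might be stored at two sites.
detroit = {("Joe", "is", "TallPeople")}
miami = {("TallPeople", "shouldTryOutFor", "Basketball")}

def infer(triples):
    """If X is G, and G <predicate> Y, conclude X <predicate> Y."""
    inferred = set()
    for subject, predicate, group in triples:
        if predicate != "is":
            continue
        for group_subject, group_predicate, target in triples:
            if group_subject == group and group_predicate != "is":
                inferred.add((subject, group_predicate, target))
    return inferred

merged = detroit | miami  # gather both sites' triples in one place
new_facts = infer(merged)
print(new_facts)
```

Neither site alone yields anything new; the inference only appears once the triples from both are considered together.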

This is a big deal. This sort of automation is what the Semantic Web is all about.

So what do real people do with the Semantic MediaWiki? We’ll look at this next.


Aug 31 2009   3:40AM GMT

A Real-World Look at the Semantic Web, part 1



Posted by: Roger “Buzz” King
assertions, databases, inferences, information, knowledge, namespaces, RDF, Semantic Web, SPARQL, triples, wikis, ontologies

This blog is dedicated to the study of emerging Web technology, in particular, ongoing research and development aimed at building software tools that will underlie the emerging Semantic Web. In this posting, we look at a little-known website that has the potential of setting the pace for the developers of the Semantic Web.

DBpedia.

It’s called DBpedia. A former graduate student at my university, Greg Ziebold, pointed me toward it. The goal of the DBpedia is to transform data from the Wikipedia into a chunk of the Semantic Web. To do this, DBpedia is using RDF technology, something we have discussed in past postings of this blog. Behind RDF is an extremely simple concept, but one that has proven extremely powerful and versatile.

The general idea is to break knowledge up into “triples” that describe relationships between pieces of information. These triples can be chained together to discover new relationships. And, importantly, triples must make use of widely shared sets of terminology, called namespaces, in order for knowledge from different places on the Web to be properly chained together.

RDF, triples, assertions, and inferences.

A thorough example can be found in a previous posting of this blog.

Here is a very simple example of triples (also known as “assertions”) and how they can be put together into “inferences”.

Assertion 1: Joe is tall.
Assertion 2: Tall People should try out for Basketball.
A new inference: Joe should try out for Basketball.

Keep in mind that we would want to make sure that the words used in these assertions have precise, global meanings. We might take the terms in these two assertions from a basketball namespace, one that would carefully dictate exactly what “tall” means in the basketball world. Certainly, it would be quite different from the meaning of “tall” in a kindergarten namespace.
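
Here is a small Python sketch of why the namespace matters. The two namespace URIs and the height thresholds (in centimetres) are invented for illustration; a real namespace would define its terms in its own published vocabulary.

```python
# Hypothetical namespaces, each giving "tall" its own precise meaning.
BASKETBALL = "http://example.org/basketball#"
KINDERGARTEN = "http://example.org/kindergarten#"

NAMESPACES = {
    BASKETBALL: {"tall": 198},    # invented threshold, in cm
    KINDERGARTEN: {"tall": 120},  # invented threshold, in cm
}

def is_tall(height_cm, namespace):
    """Interpret "tall" according to the namespace the term is drawn from."""
    return height_cm >= NAMESPACES[namespace]["tall"]

print(is_tall(130, KINDERGARTEN), is_tall(130, BASKETBALL))
```

The same height gets two different answers, which is exactly the ambiguity a shared namespace is meant to remove.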

More on DBpedia.

There’s a fancy word for sets of triples that use namespaces and represent various areas of knowledge. They are called “ontologies”, taken from the term used by philosophers to argue about the existence of various things, like God. The DBpedia is essentially a vast ontology, formed from triples and namespaces. Most of the knowledge defined by this ontology comes from the Wikipedia. The folks behind the DBpedia have been given direct access to the flow of information into the Wikipedia, so that the DBpedia can stay current.

One way to look at the DBpedia is that it takes the Wikipedia and reforms it into something that can be searched far more effectively. Right now, to search the Wikipedia, most of us simply type in terms (either into Google/Yahoo or into the Wikipedia search page). We try various terms and follow links inside the Wikipedia until we find what we think we are looking for. With the DBpedia, users can search with SPARQL, a language based on the structure of SQL and engineered specifically for searching large bases of triples. SPARQL allows us to traverse networks that consist of triples linked by inferences.

That way, if we were a coach looking for promising candidates for our team, we would use SPARQL to make the connection between Joe being tall and the fact that tall people should try out for basketball. This is clearly much faster and more accurate than googling things like “tall”, “basketball”, etc, until we happened to find Joe in one of the web pages that pop up.
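
To give a feel for the coach's query, the sketch below shows roughly what it might look like in SPARQL, followed by a tiny pure-Python pattern matcher that mimics how such a query is answered. The ex: prefix, the term names, and the extra person Ann are hypothetical, the SPARQL text is shown for reference and is not executed, and the matcher is a toy stand-in for a real SPARQL engine.

```python
# Roughly equivalent SPARQL, for reference only (not executed here).
SPARQL_QUERY = """
PREFIX ex: <http://example.org/>
SELECT ?person WHERE {
  ?person ex:is ?group .
  ?group ex:shouldTryOutFor ex:Basketball .
}
"""

triples = [
    ("Joe", "is", "TallPeople"),
    ("Ann", "is", "ShortPeople"),
    ("TallPeople", "shouldTryOutFor", "Basketball"),
]

def match(patterns, bindings=None):
    """Yield variable bindings satisfying every pattern; names starting with '?' are variables."""
    bindings = bindings or {}
    if not patterns:
        yield bindings
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(bindings)
        for want, have in zip(first, triple):
            if want.startswith("?"):
                # Bind the variable, or reject if it is already bound to something else.
                if candidate.setdefault(want, have) != have:
                    break
            elif want != have:
                break
        else:
            yield from match(rest, candidate)

candidates = [
    b["?person"]
    for b in match([
        ("?person", "is", "?group"),
        ("?group", "shouldTryOutFor", "Basketball"),
    ])
]
print(candidates)  # prints ['Joe']
```

The shared variable ?group is what links the two patterns, much as a join column links two tables in SQL.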

The DBpedia website, by the way, claims to have a triple base that consists of 274 million RDF triples.

More on this in the next posting.


Aug 5 2009   9:39PM GMT

The Semantic Web: RDF and SPARQL, part 5



Posted by: Roger “Buzz” King
the Semantic Web, ontologies, RDF, triples, knowledge, information, data

This posting is a continuation of the previous posting. We are discussing RDF, the “triples” language that is serving as a cornerstone of the Semantic Web effort. The goal of the Semantic Web is to partly automate the searching of the Web, by using RDF to capture deeper semantics of information and SPARQL to query that information. This is in comparison to today’s search engine technology, which does not allow us to do much more than search for individual words in the text of webpages.

Let’s step back for a moment.

Just how universal is this notion of RDF-style triples? Will we ever have something substantially more useful, more powerful in the semantics it can express?

Data, Information, Knowledge, and Ontologies.

Academic and industrial researchers in computing like to trivialize big words. Let’s briefly look at the problem. “Data” is an old word, and most of us have a sense that virtually anything stored digitally can be considered data. This includes applications and other pieces of software, too. If you back up some applications to free up space on your hard drive, you’ve just turned applications into data, right?

“Information” is a word that came into play when researchers wanted something that was smarter than data. The word was broader, and vaguer, but information was essentially data that was ready to be used by interactive users. If I pull down a page from the Encyclopedia Britannica site, it’s filled with information.

Then, there were demands for an even richer word, one that suggests data that is beyond information, stuff that is rich in semantics that can be easily extracted. That word was “knowledge”. Often, knowledge was data or information that had been interconnected, turned into trees or graphs. Traversing the links in the structure told us how various things were interrelated, thereby exposing powerful semantics. The Web in a sense is knowledge. I can follow links between pages to discover how various pages on the Web are interrelated. I can follow connections on the Britannica site to connect a scientific discovery to the story of the discoverer’s life.
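Following links to see how things are related is easy to sketch in code. The example below assumes a small, invented set of pages and links (they are not taken from the Britannica site), and find_path is a hypothetical helper that uses an ordinary breadth-first search to connect a discovery to the discoverer's life story.

```python
from collections import deque

# Hypothetical pages and the links between them, invented for this sketch.
LINKS = {
    "penicillin": ["antibiotics", "Alexander Fleming"],
    "antibiotics": ["bacteria"],
    "Alexander Fleming": ["Fleming biography"],
    "bacteria": [],
    "Fleming biography": [],
}


def find_path(links, start, goal):
    """Breadth-first search: shortest chain of links from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in links.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(find_path(LINKS, "penicillin", "Fleming biography"))
# ['penicillin', 'Alexander Fleming', 'Fleming biography']
```

The path itself is the "knowledge" here: it tells us how the two pages are related, which is something a bare list of pages cannot do.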

Here’s something significant. This blog and all its postings are related to new web technology, such as the Semantic Web. Our central concern has been the partial automation of the searching of the Web, so that users aren’t limited to typing words into Google and getting back stuff no richer than pages that happen to have these words in them. As it turns out, the term “knowledge” dates way back before the days of the Web, but back then, our notion of what it meant to be knowledge and not just data or information was pretty much the same as it is now. Knowledge can be processed by programs, thereby automating the task of finding the right knowledge and applying it to our problem domain.

Then came “ontology”. This is a relatively new word in computing, but it’s perhaps the most embarrassing. The word, until recently, was reserved for philosophers to use. An ontological argument is an argument about the existence of something. Over the centuries, one common subject of ontological discussions has been the existence of God.

Hmm.

The same old, same old.

Flash forward to the Internet age: Computer researchers use the term to refer to a precise specification of the objects, and the properties of these objects, in some well-studied domain. I guess the idea is to suggest that we can capture the true nature of the existence of some domain.

These domains could be large, like banking, health insurance, or the stock market. Laying out all of the objects involved in one of these is a daunting task. Consider an insurance claim and all of its properties: type of claim, provider of medical service, patient name, etc., and then imagine laying this all out for insurance policies, underwriting tables, actuarial data, etc. To include all of the objects and properties involved in building software for an insurance company would lead us to thousands of interconnected terms. Triples, in other words.

Or our ontology could be the specification of a pencil object, which has properties like being made of wood, graphite, and metal, and having yellow paint and a little pink eraser. Triples like this:

The pencil has a pink eraser.
The pencil is painted yellow.
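Written down as data, the pencil might look like the sketch below. The predicate names ("has", "paintedColor", "madeOf") are invented for this example and do not come from any standard vocabulary, and describe is a hypothetical helper that gathers everything we know about one subject.

```python
from collections import defaultdict

# The pencil facts as (subject, predicate, object) tuples.
# The predicate names are assumptions made up for this sketch.
PENCIL = [
    ("pencil", "has", "pink eraser"),
    ("pencil", "paintedColor", "yellow"),
    ("pencil", "madeOf", "wood"),
    ("pencil", "madeOf", "graphite"),
    ("pencil", "madeOf", "metal"),
]


def describe(triples, subject):
    """Collect every property of one subject into a predicate -> values dict."""
    props = defaultdict(list)
    for s, p, o in triples:
        if s == subject:
            props[p].append(o)
    return dict(props)


print(describe(PENCIL, "pencil"))
```

Five tuples and one loop: that is the whole "ontology" of the pencil.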

This characterizes the nature of the challenge we have taken on in our efforts to build ontologies. We take on the problems of scale, not the problems involved in really capturing, in some formal fashion, the nature of the world around us. We build gigantic, but very simple, models of the things that concern us in the software world.

We have trivialized this term, ontology. In fact, for the most part, we’re simply referring to the same old, same old modeling construct: triples. Yes, that simple tool called RDF can be used to build a vast “ontology”.

There is something about the nature of triples that has conquered computing. It is a concept that, as we have seen in previous postings of this blog, underlies object-oriented data structures. It predates object-oriented languages, going back to the early days of AI and the attempts to model the real world.

So, what is an ontology?

An ontology is supposed to be the end of the Semantic Web rainbow: our ability to fully automate the specification and searching of the real world. But the next time some computer person tries to impress you by tossing this term at you, remember to just shake your head and say “Quit being a puff toad. You’re just talking about triples.”