Previous postings of this blog.
This blog is dedicated to advanced Web development tools and concepts. Previous blog postings have focused on the emerging Semantic Web, which promises to make the Web radically easier to search and to greatly enhance the value of the vast sea of currently-disconnected information spread across the Web. We have also looked at Web 3.0 efforts, which promise to make multimedia websites highly usable and capable of conveying far more information than the current generation of websites. Previous postings describe breadth and depth of cutting edge Web technology.
Metadata: making that ratio small.
Here’s something that’s very important: Much of the ongoing research and development that is loosely categorized as Semantic Web and Web 3.0 efforts is focused on a specific technical goal, one that has been at the core of information management technology since the mainframe era that was epitomized by the IBM 360 series. That goal is to leverage metadata as much as possible.
It’s our best weapon against the truly staggering amount of information on the Web. This includes traditional text-based and numeric data, as well as books, medical advice, photographs, entertainment and training videos, music and recorded books, investment information, educational materials, scientific materials, e-government information, etc., etc. How can we possibly organize information and then search it in a way that scales? The Web is far from a closed world. In traditional data processing environments like banking, insurance, and credit card processing, we could get our arms around all of the data, as vast as it may have seemed. But the world of information today is an open world, effectively infinite in size.
Very informally, if you look at the size of the metadata divided by the size of the data itself, the smaller that fraction the better. In traditional relational databases (built with database management systems, such as Oracle, MS SQL Server, MySQL, PostgreSQL, or DB2), the extreme focus on minimizing this ratio has enabled the fast processing of extremely large volumes of data. The tradeoff is that the table definitions (or the “schema”), which form the heart of the metadata are very, very simplistic.
The old days: relational database schemas.
An insurance claim may be defined as a table with such columns as Subscriber_Name, Medical_Provider, etc., and thus, may consist of little or no more than a series of simple character and numeric fields. But if we need to process fifty thousand of them tonight, we must be able to bring many such table rows into memory at once, and quickly move through them. The database world was an extension of the paper world: a row in an insurance claim table was effectively an electronic successor to the traditional claim form.
Today: a far more challenging problem.
But on the new Web, information can be far more complex in nature, making the metadata to data ratio far larger. We’ve looked at some of the emerging technology and technical trends for embedding metadata in advanced forms of data (and for processing that metadata); this data includes books, images, video, modeling and animation, and sound. This new generation of information formats make up our personal health records and medical records images, industrial training materials, university “distance” courses, and the like. Each instance of these tends to be far more unique than individual insurance claim forms. And, it takes a lot of metadata to properly convey their “meaning”.
What we’re struggling with right now is to succinctly specify the meaning of modern media assets and to automate searching based on this metadata. This is our only hope for leveraging that ratio of metadata size divided by data size.