Relational databases: the dominant technology.
Relational database management systems, such as MySQL, Oracle, MS SQL Server, DB2, and Postgresql, support the relational model. A database is broken up into tables, and each table consists of rows. Each row is a series of values. A row in a table called Insured Drivers in a motor vehicle database might consist of:
Fred, 2010 Toyota Prius, State Farm Insurance, 1112233444.
1112233444 might be a unique identifier that the government assigns to each driver. This would be the “primary key” for the table Insured Drivers. The point is that human names are not at all unique, and so in relational databases, we introduce artificial keys in order to disambiguate queries. We still need the value Fred in the row because we want to know how to address him with a letter or email.
Problems with relational databases.
There are a few critical points to note with this approach. First, such a simple way of representing data allows the database to quickly deliver large sets of rows from this table to the memory of a computer, so that they can be effectively searched in bulk. We might want to know the names of all people who drive a Toyota Prius and are insured by State Farm, for example.
Another thing is that we might like to be able to put more complex items in a row. We might want to have another value in a row, one that gives a driver’s address. But an address has a few parts to it, and is not itself a simple value like a name or a car model or the name of an insurance company.
It is important to also note, however, that relational databases do indeed support the creation of more complex values, such as an address. But the more complex values we put in rows in tables, the harder it is to read in a large number of rows at once.
In fact, we could create a value that represents a very complex object, one that refers to rows in other tables. For example, we might want to replace the value Fred with a reference to a row in another table called Licensed Drivers, because there is a lot we might want to know about Fred, other than just his name. But then it would become very difficult to read in lots of rows of a single table quickly.
It might be that if we follow a link to another table that describes drivers, these rows might themselves have links in them, thus allowing a value in a row to actually consist of an object, like we would in Java or C++. And in general, these links between tables could be chained together, and extend arbitrarily far. Do we chase all of these linked references down for every row of Insured Drivers, or do we not follow any of these links so we can read in a large number of rows? Then we would worry later about getting more information on each driver.
Importantly, relational databases are still very much the dominant database technology in use in businesses and other organizations, as well as on the Web. We need to keep in mind that we have already aggressively extended them by supporting values that have internal structure (like addresses) and with the ability to create complex objects (like drivers). How far do we go in extending them?
Where we stand today.
Indeed, the extensions we have already made to relational databases have created a serious optimization problem.
But it’s worse than that. Here’s something else to consider. Relational databases were born into a world where flat business data was pretty much the only game in town. However, relational databases are being asked to manage far more sophisticated forms of data, like photos and video clips and voice tracks. There are a couple of problems that crop up. First, a row with a video clip as a field could be huge. We might only be able to read in a single row at a time and this could make searching an entire table intractable. Worse, how do we even search for rows that contain certain pieces of video? How can we search for all video clips that show Fred getting into a car accident?
Where to go from here.
In previous postings of this blog we have looked at media databases, and in particular, at techniques that can be used to tag complex forms of blob and continuous media (like photos and video clips). What’s important to note, though, is that there is a major dilemma right now in the world of database software. Can we continue to shoehorn more and more complex forms of data into relational databases, or do we need to throw in the towel and start over?
More on this next time…