<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Buzz’s Blog: On Web 3.0 and the Semantic Web &#187; SQL</title>
	<atom:link href="http://itknowledgeexchange.techtarget.com/semantic-web/tag/sql/feed/" rel="self" type="application/rss+xml" />
	<link>http://itknowledgeexchange.techtarget.com/semantic-web</link>
	<description>Defining the necessary skills for future software professionals</description>
	<lastBuildDate>Sun, 16 Dec 2012 04:42:23 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
		<item>
		<title>The Challenge of Complex Media in a Relational World, Part 2</title>
		<link>http://itknowledgeexchange.techtarget.com/semantic-web/the-challenge-of-complex-media-in-a-relational-world-part-2/</link>
		<comments>http://itknowledgeexchange.techtarget.com/semantic-web/the-challenge-of-complex-media-in-a-relational-world-part-2/#comments</comments>
		<pubDate>Tue, 20 Apr 2010 19:13:52 +0000</pubDate>
		<dc:creator>Roger King</dc:creator>
				<category><![CDATA[continuous data]]></category>
		<category><![CDATA[databases]]></category>
		<category><![CDATA[namespaces]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[tagging]]></category>

		<guid isPermaLink="false">http://itknowledgeexchange.techtarget.com/semantic-web/the-challenge-of-complex-media-in-a-relational-world-part-2/</guid>
		<description><![CDATA[In the previous posting of this blog, we looked at SQL-based relational databases and why they are not well suited to managing advanced forms of media, like images, language, video, and sound. Searching by semantics. Here, we look closely at one specific issue related to managing complex media: How to categorize and search advanced forms [...]]]></description>
				<content:encoded><![CDATA[<p>In the <a href="http://itknowledgeexchange.techtarget.com/semantic-web/the-challenge-of-complex-media-in-a-relational-world-part-1/">previous posting</a> of this blog, we looked at SQL-based relational databases and why they are not well suited to managing advanced forms of media, like images, language, video, and sound.  </p>
<p><strong>Searching by semantics. </strong></p>
<p>Here, we look closely at one specific issue related to managing complex media: how to categorize and search advanced forms of media by their “meaning” or “semantics”.  This is extraordinarily difficult; in the general case, it is impossible.  This is why we usually rely on relatively low-level heuristics and can only simulate search-by-semantics in simplistic ways.</p>
<p>Consider a library of soundless video clips.  Let’s assume there are many thousands of them, and they vary in length from seconds to hours.  First of all, the only clips we can afford to download and actually view in real time are the ones that are only seconds or minutes in length, and we can do this only if we are somehow able to limit the search space to a small handful of candidates.  Keep in mind that a video can consist of twenty to forty images per second.</p>
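<p>To put those numbers in perspective, here is a back-of-the-envelope sketch in Python. It only multiplies duration by frame rate, using the twenty-to-forty frames per second quoted above. The function name is hypothetical.</p>

```python
def frame_count(duration_seconds: float, frames_per_second: int) -> int:
    """Number of still images in a clip of the given length (rounded down)."""
    return int(duration_seconds * frames_per_second)


# A ten-second clip versus a two-hour clip, at both ends of the 20-40 fps range.
for fps in (20, 40):
    short_clip = frame_count(10, fps)
    long_clip = frame_count(2 * 60 * 60, fps)
    print(f"{fps} fps: 10 s -> {short_clip} frames, 2 h -> {long_clip} frames")
# 20 fps: 10 s -> 200 frames, 2 h -> 144000 frames
# 40 fps: 10 s -> 400 frames, 2 h -> 288000 frames
```

<p>A single two-hour clip at forty frames per second already amounts to 288,000 still images. This is why viewing candidate clips directly does not scale.</p>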
<p>So what do we do?</p>
<p><strong>Searching previews.</strong></p>
<p>We could search tiny samples of our video clips, perhaps taken from the beginning, the middle, and the end of each clip, but this doesn’t actually work well, either.  We need something automated that can scale.</p>
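<p>Here is a minimal sketch of the preview idea. It assumes that frames can be addressed by index, and it picks a short window of consecutive frames at the beginning, middle, and end of a clip. The function and its window parameter are illustrative and do not come from any particular video library.</p>

```python
def preview_frame_indices(total_frames: int, window: int) -> list[int]:
    """Pick up to `window` consecutive frame indices at the start, middle and end."""
    if total_frames == 0:
        return []
    window = min(window, total_frames)
    starts = (0, (total_frames - window) // 2, total_frames - window)
    indices = []
    for start in starts:
        for i in range(start, start + window):
            if i not in indices:  # windows can overlap in very short clips
                indices.append(i)
    return indices


print(preview_frame_indices(100, 3))  # [0, 1, 2, 48, 49, 50, 97, 98, 99]
```

<p>The sketch also shows the weakness described above. It inspects at most three windows regardless of how long the clip is, so it never sees anything that happens between those windows.</p>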
<p><strong>Searching tags.  </strong></p>
<p>The dominant technique is to automatically extract information about low-level attributes of the video clips (such as their format and pixel count), and then have experts add more tagging information by using widely adopted, formal <a href="http://itknowledgeexchange.techtarget.com/semantic-web/the-challenge-of-complex-media-in-a-relational-world-part-1/">namespaces</a>.  We might use a geography namespace to mark clips as having rivers and mountains in them.</p>
<p>These two forms of tagging information might be encoded together using the very popular <a href="http://itknowledgeexchange.techtarget.com/semantic-web/the-dublin-core-and-the-metadata-object-description-schema-a-look-at-namespaces/">MPEG-7</a> language.  This creates a very indirect way of searching video clips.  We don’t actually search them.  We search the hierarchically constructed MPEG-7 tag sets that describe the videos.  This at least allows us to use SQL in a reasonably straightforward way to do the searching. </p>
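<p>To illustrate this indirect, tag-based search, here is a sketch that uses Python’s built-in sqlite3 module. It assumes a hypothetical two-table schema: one table holds the automatically extracted, low-level attributes, and the other holds the expert-assigned tags. Real MPEG-7 descriptions are hierarchical XML documents. Flattening them into (namespace, term) rows, and naming the namespace geo, are both simplifying assumptions made for this example.</p>

```python
import sqlite3

# Hypothetical schema: low-level attributes in `clip`,
# expert-assigned vocabulary tags in `clip_tag`.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE clip (
        clip_id   INTEGER PRIMARY KEY,
        format    TEXT,
        width_px  INTEGER,
        height_px INTEGER
    );
    CREATE TABLE clip_tag (
        clip_id   INTEGER REFERENCES clip(clip_id),
        namespace TEXT,
        term      TEXT
    );
    """
)
conn.executemany(
    "INSERT INTO clip VALUES (?, ?, ?, ?)",
    [(1, "mp4", 1920, 1080), (2, "mp4", 640, 480), (3, "avi", 1920, 1080)],
)
conn.executemany(
    "INSERT INTO clip_tag VALUES (?, ?, ?)",
    [
        (1, "geo", "river"), (1, "geo", "mountain"),
        (2, "geo", "river"),
        (3, "geo", "mountain"), (3, "geo", "river"),
    ],
)


def clips_tagged_with_all(conn, namespace, terms, min_width_px):
    """IDs of clips at least `min_width_px` wide that carry every term in `terms`."""
    placeholders = ", ".join("?" for _ in terms)
    sql = f"""
        SELECT c.clip_id
        FROM clip AS c
        JOIN clip_tag AS t ON t.clip_id = c.clip_id
        WHERE c.width_px >= ?
          AND t.namespace = ?
          AND t.term IN ({placeholders})
        GROUP BY c.clip_id
        HAVING COUNT(DISTINCT t.term) = ?
        ORDER BY c.clip_id
    """
    params = [min_width_px, namespace, *terms, len(set(terms))]
    return [row[0] for row in conn.execute(sql, params)]


print(clips_tagged_with_all(conn, "geo", ["river", "mountain"], 1280))  # [1, 3]
```

<p>The query never touches the video data. It only consults the attributes and tags that describe each clip.</p>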
<p><strong>Searching for specific images.  </strong></p>
<p>There is very good technology for detecting fixed, pixel-based subcomponents of images, such as individual faces.  We can also search for video clips that have any faces at all in them.</p>
<p>In general, it’s easier to search for things made by people because they tend to be more angular and regular in shape.  These include specific buildings and types of aircraft.  </p>
<p><strong>Searching for colors and shapes.</strong></p>
<p>We can also search for more abstract subcomponents of images, like polygons, circles, and the like.  Despite the fact that video images are pixel-based (or “raster”), there is good technology for isolating the lines that form the boundaries of subcomponents. </p>
<p>And we can look for colors and compare the relative location and dominance of various colors, for example, finding images in which 63% of the pixels are a particular shade of orange.</p>
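<p>Here is a minimal sketch of the color-dominance test. It assumes that an image is available as a flat list of RGB pixels. The per-channel tolerance is a deliberately crude stand-in for a real color match. More careful approaches may compare color histograms instead, often in a color space closer to human perception.</p>

```python
Pixel = tuple[int, int, int]  # (red, green, blue), each 0-255


def color_fraction(pixels: list[Pixel], target: Pixel, tolerance: int) -> float:
    """Fraction of pixels whose every channel is within `tolerance` of `target`."""
    if not pixels:
        return 0.0
    matches = sum(
        1
        for p in pixels
        if all(abs(channel - goal) <= tolerance for channel, goal in zip(p, target))
    )
    return matches / len(pixels)


image = [(255, 140, 0)] * 63 + [(0, 0, 255)] * 37  # toy 100-pixel image
print(color_fraction(image, target=(250, 135, 5), tolerance=10))  # 0.63
```

<p>Under this sketch, an image would count as “mostly orange” if the returned fraction reaches whatever threshold we choose, such as 0.63.</p>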
<p><strong>Searching for change over time.</strong></p>
<p>We can also search for pattern changes in the series of images that make up a video clip.</p>
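<p>One simple way to detect pattern changes over time is frame differencing. Each frame is compared with the one before it, and large jumps are flagged, since these often correspond to cuts or sudden motion. This sketch assumes grayscale frames stored as equal-length lists of pixel intensities. The threshold is an illustrative value that would need tuning.</p>

```python
def mean_abs_difference(frame_a: list[int], frame_b: list[int]) -> float:
    """Average per-pixel change between two grayscale frames of equal size."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same number of pixels")
    if not frame_a:
        return 0.0
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def change_points(frames: list[list[int]], threshold: float) -> list[int]:
    """Indices i where frame i differs sharply from frame i - 1."""
    return [
        i
        for i in range(1, len(frames))
        if mean_abs_difference(frames[i - 1], frames[i]) > threshold
    ]


clip = [[10, 10, 10, 10], [12, 10, 11, 10], [200, 210, 190, 205], [201, 209, 191, 204]]
print(change_points(clip, threshold=50))  # [2]
```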
<p>But none of this has much to do with the real meaning or semantics of images and the video clips they form.  Taking this next step is a huge challenge.</p>
<p><strong>Semantics.</strong></p>
<p>How can we look for a setting sun or a ball moving across a tennis court, without knowing the details of the sunset or the particular tennis court in advance?</p>
<p>We can use the colors and shapes approach to look for a big orange ball descending below a possibly jagged horizontal line.  We could look for a small, white or yellow spherical object moving across a big green rectangle.</p>
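<p>The sunset heuristic can be sketched in the same style. The sketch finds the orange pixels in each frame, tracks the average row they occupy, and checks that this centroid sinks steadily toward and past a given horizon row. The orange thresholds, the horizon row, and the function names are all assumptions made for illustration. This is a heuristic over colors and shapes, not an understanding of what a sunset is.</p>

```python
Pixel = tuple[int, int, int]
Frame = list[list[Pixel]]  # rows of pixels; row 0 is the top of the image


def is_orange(p: Pixel) -> bool:
    """Crude, hand-picked orange test (an assumption, not a standard)."""
    r, g, b = p
    return r >= 200 and 80 <= g <= 170 and b <= 60


def orange_centroid_row(frame: Frame):
    """Average row index of the orange pixels, or None if there are none."""
    rows = [y for y, row in enumerate(frame) for p in row if is_orange(p)]
    return sum(rows) / len(rows) if rows else None


def looks_like_sunset(frames: list[Frame], horizon_row: int) -> bool:
    """True if an orange blob starts above the horizon and sinks steadily to or below it."""
    centroids = [orange_centroid_row(f) for f in frames]
    visible = [c for c in centroids if c is not None]
    if len(visible) < 2:
        return False
    sinking = all(later > earlier for earlier, later in zip(visible, visible[1:]))
    starts_above = visible[0] < horizon_row
    # The sun may also have dropped out of view entirely in the last frame.
    ends_low = visible[-1] >= horizon_row or centroids[-1] is None
    return sinking and starts_above and ends_low
```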
<p>One way to raise the bar a bit is to use domain-specific knowledge about the images being processed.  It’s a whole lot easier to spot that tennis court if we know that’s what we’re looking for.  Then we can fill our searching software with lots of detailed information about the various sorts of tennis courts.  We can also more easily isolate the tennis court in a larger image if we know it’s there somewhere.  This gives us an extra edge, so we can perhaps find the court, even if it turns out to be brown and not green, or if the surrounding terrain is almost the same color as the court.</p>
<p>We of course never get away from searching by heuristics that only simulate the process of determining the true meaning of a series of images.  We can never truly search by semantics.</p>
<p><strong>But we can do something else: we can get humans into the loop and train our software to do a better job.  We’ll look at this next.</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://itknowledgeexchange.techtarget.com/semantic-web/the-challenge-of-complex-media-in-a-relational-world-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Challenge of Complex Media in a Relational World, Part 1</title>
		<link>http://itknowledgeexchange.techtarget.com/semantic-web/the-challenge-of-complex-media-in-a-relational-world-part-1/</link>
		<comments>http://itknowledgeexchange.techtarget.com/semantic-web/the-challenge-of-complex-media-in-a-relational-world-part-1/#comments</comments>
		<pubDate>Tue, 13 Apr 2010 17:57:19 +0000</pubDate>
		<dc:creator>Roger King</dc:creator>
				<category><![CDATA[blob data]]></category>
		<category><![CDATA[continuous data]]></category>
		<category><![CDATA[databases]]></category>
		<category><![CDATA[Multimedia]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[tagging]]></category>
		<category><![CDATA[Video]]></category>

		<guid isPermaLink="false">http://itknowledgeexchange.techtarget.com/semantic-web/the-challenge-of-complex-media-in-a-relational-world-part-1/</guid>
		<description><![CDATA[Relational databases: the dominant technology. Relational database management systems, such as MySQL, Oracle, MS SQL Server, DB2, and PostgreSQL, support the relational model. A database is broken up into tables, and each table consists of rows. Each row is a series of values. A row in a table called Insured Drivers in a motor vehicle [...]]]></description>
				<content:encoded><![CDATA[<p><strong>Relational databases: the dominant technology.</strong></p>
<p>Relational database management systems, such as MySQL, Oracle, Microsoft SQL Server, DB2, and PostgreSQL, support the relational model.  A database is broken up into tables, and each table consists of rows.  Each row is a series of values.  A row in a table called Insured Drivers in a motor vehicle database might consist of:</p>
<p>Fred, 2010 Toyota Prius, State Farm Insurance, 1112233444.</p>
<p>1112233444 might be a unique identifier that the government assigns to each driver.  This would be the “primary key” for the table Insured Drivers.  The point is that human names are not at all unique, and so in relational databases, we introduce artificial keys in order to disambiguate queries.  We still need the value Fred in the row because we want to know how to address him with a letter or email.</p>
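<p>Here is a minimal sketch of the Insured Drivers table that uses Python’s built-in sqlite3 module, with the government identifier as the primary key. The snake_case table and column names are adaptations, and the second driver and the rejected duplicate row are made up for illustration.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE insured_drivers (
        driver_id  TEXT PRIMARY KEY,  -- government-assigned, unique per driver
        first_name TEXT NOT NULL,     -- kept so we can address a letter or email
        car        TEXT,
        insurer    TEXT
    )
    """
)
insert = "INSERT INTO insured_drivers VALUES (?, ?, ?, ?)"
conn.execute(insert, ("1112233444", "Fred", "2010 Toyota Prius", "State Farm Insurance"))
# A second, unrelated Fred is fine: the name column does not have to be unique.
conn.execute(insert, ("5556677888", "Fred", "2008 Honda Civic", "Another Insurance Co."))

# Reusing an identifier, however, is rejected by the primary key constraint.
try:
    conn.execute(insert, ("1112233444", "Frederick", "2012 Ford Focus", "Another Insurance Co."))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

freds = conn.execute(
    "SELECT driver_id FROM insured_drivers WHERE first_name = 'Fred' ORDER BY driver_id"
).fetchall()
print(duplicate_rejected, freds)  # True [('1112233444',), ('5556677888',)]
```

<p>The database stores two drivers named Fred without complaint, but it refuses a second row with the same identifier. That refusal is what makes the key useful for disambiguating queries.</p>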
<p><strong>Problems with relational databases.</strong></p>
<p>There are a few critical points to note with this approach.  First, such a simple way of representing data allows the database to quickly deliver large sets of rows from this table to the memory of a computer, so that they can be effectively searched in bulk.  We might want to know the names of all people who drive a Toyota Prius and are insured by State Farm, for example.</p>
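<p>The Prius and State Farm question is the kind of bulk query over flat rows that relational databases handle well. Here is a sketch that again uses sqlite3. Every row except Fred’s is invented for the example.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE insured_drivers "
    "(driver_id TEXT PRIMARY KEY, first_name TEXT, car TEXT, insurer TEXT)"
)
conn.executemany(
    "INSERT INTO insured_drivers VALUES (?, ?, ?, ?)",
    [
        ("1112233444", "Fred", "2010 Toyota Prius", "State Farm Insurance"),
        ("2223344555", "Maria", "2010 Toyota Prius", "Another Insurance Co."),
        ("3334455666", "Ken", "2009 Toyota Prius", "State Farm Insurance"),
        ("4445566777", "Ann", "2007 Ford Focus", "State Farm Insurance"),
    ],
)

# Every row is a short list of simple values, so the filter can be applied
# to the whole table in one pass.
rows = conn.execute(
    """
    SELECT first_name
    FROM insured_drivers
    WHERE car LIKE '%Toyota Prius'
      AND insurer = 'State Farm Insurance'
    ORDER BY first_name
    """
).fetchall()
prius_state_farm = [name for (name,) in rows]
print(prius_state_farm)  # ['Fred', 'Ken']
```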
<p>Second, we might like to be able to put more complex items in a row.  We might want to have another value in a row, one that gives a driver’s address.  But an address has a few parts to it, and is not itself a simple value like a name or a car model or the name of an insurance company.</p>
<p>It is important to also note, however, that relational databases do indeed support the creation of more complex values, such as an address.  But the more complex values we put in rows in tables, the harder it is to read in a large number of rows at once.  </p>
<p>In fact, we could create a value that represents a very complex object, one that refers to rows in other tables.  For example, we might want to replace the value Fred with a reference to a row in another table called Licensed Drivers, because there is a lot we might want to know about Fred, other than just his name.  But then it would become very difficult to read in lots of rows of a single table quickly.  </p>
<p>If we follow a link to another table that describes drivers, the rows in that table might themselves contain links, so that a value in a row effectively becomes an object, as it would in Java or C++.  In general, these links between tables can be chained together and extend arbitrarily far.  Do we chase all of these linked references down for every row of Insured Drivers?  Or do we follow none of them, so that we can read in a large number of rows quickly, and worry later about getting more information on each driver?</p>
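<p>Here is a sketch of the two options using sqlite3. It assumes a hypothetical Licensed Drivers table that Insured Drivers references through a foreign key. The first option reads only the flat rows and follows the reference later, one driver at a time. The second option resolves the reference for every row up front with a join. The address columns and their values are invented for the example.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(
    """
    CREATE TABLE licensed_drivers (
        license_id TEXT PRIMARY KEY,
        first_name TEXT,
        street     TEXT,
        city       TEXT
    );
    CREATE TABLE insured_drivers (
        license_id TEXT PRIMARY KEY REFERENCES licensed_drivers(license_id),
        car        TEXT,
        insurer    TEXT
    );
    INSERT INTO licensed_drivers VALUES ('1112233444', 'Fred', '12 Elm St', 'Springfield');
    INSERT INTO insured_drivers VALUES ('1112233444', '2010 Toyota Prius', 'State Farm Insurance');
    """
)

# Option 1: read only the flat rows now; every value is a simple string.
flat_rows = conn.execute("SELECT license_id, car, insurer FROM insured_drivers").fetchall()


def driver_details(conn, license_id):
    """Follow the reference later, one driver at a time, only when needed."""
    return conn.execute(
        "SELECT first_name, street, city FROM licensed_drivers WHERE license_id = ?",
        (license_id,),
    ).fetchone()


# Option 2: chase the reference for every row up front with a join.
joined_rows = conn.execute(
    """
    SELECT i.license_id, d.first_name, d.city, i.car
    FROM insured_drivers AS i
    JOIN licensed_drivers AS d ON d.license_id = i.license_id
    """
).fetchall()
print(joined_rows)  # [('1112233444', 'Fred', 'Springfield', '2010 Toyota Prius')]
```

<p>With one row the difference is invisible. With millions of rows, and with chains of references several tables deep, the choice between these two strategies becomes the optimization problem described below.</p>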
<p>Importantly, relational databases are still very much the dominant database technology in use in businesses and other organizations, as well as on the Web.  We need to keep in mind that we have already aggressively extended them by supporting values that have internal structure (like addresses) and with the ability to create complex objects (like drivers).  How far do we go in extending them?  </p>
<p><strong>Where we stand today.</strong></p>
<p>Indeed, the extensions we have already made to relational databases have created a serious optimization problem.</p>
<p>But it’s worse than that.  Here’s something else to consider.  Relational databases were born into a world where flat business data was pretty much the only game in town.  Today, however, relational databases are being asked to manage far more sophisticated forms of data, like photos and video clips and voice tracks.  There are a couple of problems that crop up.  First, a row with a video clip in one of its fields could be huge.  We might only be able to read in a single row at a time, and this could make searching an entire table intractable.  Worse, how do we even search for rows that contain certain pieces of video?  How can we search for all video clips that show Fred getting into a car accident?</p>
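<p>Here is a sketch of the second problem. It assumes a hypothetical table that stores each video clip as a BLOB next to a few ordinary columns. SQL can filter on the ordinary columns, but to the database the clip itself is just an opaque sequence of bytes.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE video_clips (
        clip_id    INTEGER PRIMARY KEY,
        title      TEXT,
        duration_s REAL,
        clip       BLOB   -- the encoded video itself
    )
    """
)
fake_video = bytes(5_000_000)  # 5 MB of zero bytes standing in for a real clip
conn.execute(
    "INSERT INTO video_clips (title, duration_s, clip) VALUES (?, ?, ?)",
    ("Morning commute", 42.0, fake_video),
)

# SQL can filter on the ordinary columns and report the blob's size
# without returning the blob itself to Python...
summary = conn.execute(
    "SELECT clip_id, title, length(clip) FROM video_clips WHERE duration_s > 30"
).fetchall()
print(summary)  # [(1, 'Morning commute', 5000000)]

# ...but nothing in the query language lets us ask "does this clip show a
# car accident?" because the BLOB column carries no searchable structure.
```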
<p><strong>Where to go from here.</strong></p>
<p>In previous postings of this blog we have looked at <a href="http://itknowledgeexchange.techtarget.com/semantic-web/multimedia-what-is-it-why-do-we-care/">media databases</a>, and in particular, at techniques that can be used to <a href="http://itknowledgeexchange.techtarget.com/semantic-web/tag/dublin-core/">tag</a> complex forms of blob and continuous media (like photos and video clips).  What’s important to note, though, is that there is a major dilemma right now in the world of database software.  Can we continue to shoehorn more and more complex forms of data into relational databases, or do we need to throw in the towel and start over?</p>
<p><strong>More on this next time&#8230;</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://itknowledgeexchange.techtarget.com/semantic-web/the-challenge-of-complex-media-in-a-relational-world-part-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Parallel Worlds of Media Databases and Media Metadata</title>
		<link>http://itknowledgeexchange.techtarget.com/semantic-web/the-parallel-worlds-of-media-databases-and-media-metadata/</link>
		<comments>http://itknowledgeexchange.techtarget.com/semantic-web/the-parallel-worlds-of-media-databases-and-media-metadata/#comments</comments>
		<pubDate>Wed, 17 Feb 2010 00:38:03 +0000</pubDate>
		<dc:creator>Roger King</dc:creator>
				<category><![CDATA[3D modeling]]></category>
		<category><![CDATA[blob data]]></category>
		<category><![CDATA[continuous data]]></category>
		<category><![CDATA[databases]]></category>
		<category><![CDATA[MODS]]></category>
		<category><![CDATA[Multimedia]]></category>
		<category><![CDATA[namespaces]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[tagging]]></category>
		<category><![CDATA[Text]]></category>
		<category><![CDATA[the Metadata Object Description Schema]]></category>
		<category><![CDATA[Video]]></category>

		<guid isPermaLink="false">http://itknowledgeexchange.techtarget.com/semantic-web/the-parallel-worlds-of-media-databases-and-media-metadata/</guid>
		<description><![CDATA[Searching traditional business data: straight-forward. Managing advanced forms of media, such as images, sound, video, natural language text, and animated models, has been discussed a number of times in this blog in the past.  Traditional information systems, such as relational databases, have been engineered largely to handle the sorts of data we have in business applications, [...]]]></description>
				<content:encoded><![CDATA[<p><strong>Searching traditional business data: straight-forward.</strong></p>
<p>Managing advanced forms of <a href="http://itknowledgeexchange.techtarget.com/semantic-web/mega-media-apps-a-huge-challenge-for-web-30/">media</a>, such as images, sound, video, natural language text, and animated models, has been discussed a number of times in this blog in the past.  Traditional information systems, such as relational databases, have been engineered largely to handle the sorts of data we have in business applications, primarily simple numeric and character string data.  To the SQL database programmer, the nice part is that the data speaks for itself.  If a field is called Name, and the value is Buzz King, the semantics of &#8220;Buzz King&#8221; is pretty obvious, and it can be processed in a largely automatic fashion.  The same goes for a field called Age, with a value of &#8220;97&#8221;.</p>
<p><strong>Searching advanced media: far, far more difficult.</strong></p>
<p>But modern media is far more complex than this.  &#8220;Blob&#8221; data like images, and continuous data, like sound, video, and natural language text, are very difficult to search and interpret automatically.  There are two approaches that have been taken to resolve this dilemma.</p>
<p><strong>Tagging: the simple approach.</strong></p>
<p>The first is <a href="http://itknowledgeexchange.techtarget.com/semantic-web/the-dublin-core-and-the-metadata-object-description-schema-a-look-at-namespaces/">tagging</a>.  Descriptive terms, often taken from large, shared vocabularies, are attached to pieces of media.  These vocabularies can be very domain-specific, dedicated to areas like medicine, law, and engineering.</p>
<p><strong>Intelligent processing software: the second approach.</strong></p>
<p>The second technique is the automatic processing of pieces of media using image processing, natural language, and other highly intelligent software.  These applications are very sophisticated and understood only by experts.  And, these applications often demand a lot of processing time, and this makes bulk processing impossible.  It’s also true that the results can be haphazard.  Some pieces of media can be interpreted precisely, others not so precisely &#8211; and dramatic mistakes are frequent.  A tennis court might be mistaken for an airplane runway.  There&#8217;s a huge trust factor involved in cranking up image or sound processing software or natural language software.  </p>
<p>Often, we can provide feedback so that these applications can learn, over time, the way we want media to be interpreted.  We can help the software learn the difference between a tennis player and a member of a ground crew on a small runway.  All of this is hugely expensive, in terms of the cost of developing the software, and in terms of the physical resources needed to run the software.</p>
<p><strong>A middle ground?  Not really.</strong></p>
<p>So, is there some middle ground?  Something simple, yet more &#8220;intelligent&#8221;?  Yes, and the answer is to take a sophisticated approach to what otherwise might be very simple tagging techniques.  However, the core problem with tagging remains: we search and process tags &#8211; and not the actual data.  It is an indirect, but fast process.  The goal is to come as close as we can to simulating the results of such things as image processing, but to do it with a simple, yet comprehensive, accurate tag-based technology.</p>
<p>We&#8217;ve looked at some of the solutions that have been proposed.  They include <a href="http://itknowledgeexchange.techtarget.com/semantic-web/the-dublin-core-and-the-metadata-object-description-schema-a-look-at-namespaces/">Dublin Core, MODS, and MPEG-7</a>.  The first is very simplistic.  The second is more sophisticated, in that the terminology used is broader and far more precise.  The third is very aggressive in that it supports the complex structuring of tag data elements.  </p>
<p><strong>So, what are we really doing?</strong></p>
<p>In essence, we build a hierarchy of metadata and then instantiate it for every piece of media we want to catalogue and later search.  What we are doing is creating a parallel database, one where every piece of blob or continuous data is accompanied by a possibly very large tree of structured tagging information.  The parallel database has its own schema and an instance of it is created for every piece of media in the original media database.</p>
<p>The end result?  Instead of creating some sort of media-centric query language, like an SQL-for-video, we give up on trying to search the media database itself.  The query language remains largely ignorant of the nature of blob and continuous media.  We can continue to refine and expand the schema of the parallel database until search results are satisfactory.</p>
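<p>Here is a minimal sketch of such a parallel database in Python. It assumes that each media item is identified by a string ID and described by a nested dictionary of tags. The schema, with its technical and content branches and its geography and people terms, is hypothetical and far simpler than Dublin Core, MODS, or MPEG-7.</p>

```python
# Hypothetical metadata trees, one per media item. The media bytes live
# elsewhere and are never opened by the search below.
metadata = {
    "clip-001": {
        "technical": {"format": "mp4", "width": 1920, "height": 1080},
        "content": {"geography": ["river", "mountain"], "people": ["Fred"]},
    },
    "clip-002": {
        "technical": {"format": "avi", "width": 640, "height": 480},
        "content": {"geography": ["river"], "people": []},
    },
}


def lookup(tree: dict, path: str):
    """Follow a slash-separated path such as 'content/geography' through a tree."""
    node = tree
    for key in path.split("/"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def search(catalog: dict, path: str, value) -> list[str]:
    """IDs of items whose value at `path` equals `value`, or is a list containing it."""
    hits = []
    for item_id, tree in catalog.items():
        found = lookup(tree, path)
        if found == value or (isinstance(found, list) and value in found):
            hits.append(item_id)
    return sorted(hits)


print(search(metadata, "content/geography", "mountain"))  # ['clip-001']
```

<p>The search consults only the tag trees. If the results are not good enough, the remedy in this approach is to refine the tag schema, not to look harder at the media.</p>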
<p><strong>More later&#8230;</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://itknowledgeexchange.techtarget.com/semantic-web/the-parallel-worlds-of-media-databases-and-media-metadata/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Semantic Web: RDF and SPARQL, part 4</title>
		<link>http://itknowledgeexchange.techtarget.com/semantic-web/the-semantic-web-rdf-and-sparql-part-4/</link>
		<comments>http://itknowledgeexchange.techtarget.com/semantic-web/the-semantic-web-rdf-and-sparql-part-4/#comments</comments>
		<pubDate>Wed, 29 Jul 2009 02:17:34 +0000</pubDate>
		<dc:creator>Roger King</dc:creator>
				<category><![CDATA[RDF]]></category>
		<category><![CDATA[SPARQL]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[the Semantic Web]]></category>
		<category><![CDATA[triples]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://itknowledgeexchange.techtarget.com/semantic-web/the-semantic-web-rdf-and-sparql-part-4/</guid>
		<description><![CDATA[This posting is a continuation of the previous posting. We are discussing RDF, the &#8220;triples&#8221; language that is serving as a cornerstone of the Semantic Web effort. In this posting, we will look at SPARQL, the web language designed to search data that has been specified as RDF triples. The goal of the Semantic Web [...]]]></description>
				<content:encoded><![CDATA[<p>This posting is a continuation of the previous posting. We are discussing RDF, the &#8220;triples&#8221; language that is serving as a cornerstone of the Semantic Web effort. In this posting, we will look at SPARQL, the web language designed to search data that has been specified as RDF triples. The goal of the Semantic Web is to partly automate the searching of the Web, by using RDF to capture deeper semantics of information and SPARQL to query that information. This is in comparison to today&#8217;s technology, which does not allow us to do much more than search for individual words in the text of webpages.</p>
<p><strong>From the last posting.</strong></p>
<p>Here is a piece of the RDF code from the previous posting:</p>
<p>&lt;rdf:RDF<br />
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"<br />
xmlns:zx="http://www.someurl.org/zx/"&gt;</p>
<p>&lt;rdf:Description rdf:about="http://www.awebsite.org/index.html"&gt;</p>
<p>&lt;zx:created-by&gt;http://www.anotherurl.org/buzz&lt;/zx:created-by&gt;</p>
<p>&lt;/rdf:Description&gt;</p>
<p>&lt;/rdf:RDF&gt;</p>
<p>This can be read as: the webpage at awebsite.org/index.html was created by Buzz.</p>
<p><strong>Another representation of RDF-based information: three triples.</strong></p>
<p>We see from the above that RDF simply represents triples. We could simplify it even more as:</p>
<p>http://awebsite.org/index.html was created by Buzz</p>
<p>Part of the reason that the original RDF code above is so much more complex is that the full syntax lets us specify that we are using terms that are defined at specific web addresses. This allows people to use standardized terms and greatly enhances the specificity of an RDF specification. The full syntax also allows us to reference pieces of information that reside on the Web. (See the previous three postings, <a href="http://itknowledgeexchange.techtarget.com/semantic-web/the-semantic-web-rdf-and-sparql-part-1/">1</a>, <a href="http://itknowledgeexchange.techtarget.com/semantic-web/the-semantic-web-rdf-and-sparql-part-3/">2</a>, <a href="http://itknowledgeexchange.techtarget.com/semantic-web/the-semantic-web-rdf-and-sparql-part-3/">3</a>.)</p>
<p>Before we launch into a SPARQL example, we need to make an important distinction between syntax and semantics. The code above is written in a particular syntax for RDF, one that uses XML. Because syntax needs to be very precise, it tends to be verbose, and this can obscure the conceptual simplicity of the underlying semantics, or meaning.</p>
<p>But this isn&#8217;t the only way to specify RDF triples. Let&#8217;s look at some information that is much simpler, and at the same time, let&#8217;s look at using a different syntax for specifying RDF-like triples. Here are three triples:</p>
<p>&lt;http://awebsite.org/&gt; was-created-by "Buzz"</p>
<p>&lt;http://awebsite.org/&gt; was-created-by "Suzy"</p>
<p>&lt;http://anotherwebsite.org/&gt; was-created-by "Alice"</p>
<p>This is a very simple example. It consists of two triples that say that a website named awebsite was created by Buzz and Suzy, and a third triple that says that Alice created a website called anotherwebsite. We are not saying that was-created-by is a widely used term; it may have been invented only for this particular RDF specification, and its meaning would therefore not be precise. We can only interpret it from our general understanding of English words. We also have no idea who Buzz, Suzy, and Alice are, and we have no other information about them.</p>
<p><strong>SPARQL: searching triples distributed across the Web.</strong></p>
<p>Now, here is a piece of code:</p>
<p>prefix website1: &lt;http://awebsite.org/&gt;<br />
SELECT ?x<br />
WHERE<br />
{ website1:was-created-by ?x }</p>
<p>We&#8217;re getting very close to real SPARQL, by the way, and if you know SQL, you can see the strong similarity. But syntax is not our issue here. We&#8217;re trying to look at concepts.</p>
<p>This code will find the creators of http://awebsite.org. You could imagine that there are actually many thousands of these triples, and that they tell us who built a large number of different websites. Now, we see the power of this query. It will search through all of these triples and find the two of interest to us, and then pluck off the names of the creators.</p>
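<p>To show the concept rather than the syntax, here is a sketch of basic triple-pattern matching in Python. A pattern is a triple in which terms starting with ? are variables. Each stored triple that fits the pattern produces a binding for those variables. This is only a toy model of what a SPARQL engine does with a single pattern, and the function names are illustrative.</p>

```python
# The three example triples as (subject, predicate, object) tuples.
TRIPLES = [
    ("http://awebsite.org/", "was-created-by", "Buzz"),
    ("http://awebsite.org/", "was-created-by", "Suzy"),
    ("http://anotherwebsite.org/", "was-created-by", "Alice"),
]


def is_variable(term: str) -> bool:
    """Terms written like ?x are variables; everything else must match exactly."""
    return term.startswith("?")


def match_pattern(pattern, triples):
    """Yield a {variable: value} binding for every triple that fits the pattern."""
    for triple in triples:
        binding = {}
        for pattern_term, value in zip(pattern, triple):
            if is_variable(pattern_term):
                # A variable used twice in one pattern must bind to the same value.
                if binding.setdefault(pattern_term, value) != value:
                    break
            elif pattern_term != value:
                break
        else:  # no break: every position matched
            yield binding


def select(variable, pattern, triples):
    """Rough analogue of SELECT ?x WHERE { pattern } for a single pattern."""
    return [binding[variable] for binding in match_pattern(pattern, triples)]


print(select("?x", ("http://awebsite.org/", "was-created-by", "?x"), TRIPLES))
# ['Buzz', 'Suzy']
```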
<p>In fact, these triples could be distributed all around the Web, and we could imagine a search engine taking this query and running it everywhere on the Web where was-created-by triples are stored, and then having it bring back all the creators of awebsite, even if there are a hundred developers, and even if these names are spread around the Internet.</p>
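<p>The distributed scenario can be sketched the same way. The sketch assumes several hypothetical triple stores, each asked the same question, with the answers merged afterward. Plain Python lists stand in for the stores. Real federated querying involves network protocols and query planning, and this sketch ignores both.</p>

```python
# Hypothetical triple stores held at different sites; in practice each would
# be queried over the network rather than read from a local list.
STORES = {
    "store-a.example": [
        ("http://awebsite.org/", "was-created-by", "Buzz"),
        ("http://anotherwebsite.org/", "was-created-by", "Alice"),
    ],
    "store-b.example": [
        ("http://awebsite.org/", "was-created-by", "Suzy"),
        ("http://awebsite.org/", "was-created-by", "Buzz"),  # also stored elsewhere
    ],
}


def creators_everywhere(subject: str, stores: dict) -> dict[str, list[str]]:
    """Ask every store the same question and merge answers, noting where each came from."""
    found: dict[str, list[str]] = {}
    for store_name, triples in stores.items():
        for s, p, o in triples:
            if s == subject and p == "was-created-by":
                found.setdefault(o, []).append(store_name)
    return found


print(creators_everywhere("http://awebsite.org/", STORES))
# {'Buzz': ['store-a.example', 'store-b.example'], 'Suzy': ['store-b.example']}
```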
<p><strong>Next, the bigger issue.</strong></p>
<p>In the next posting, we&#8217;ll look more closely at SPARQL. One thing we will consider is why it does look so much like SQL. There is a powerful reason for this that has to do with searching information in general.</p>
]]></content:encoded>
			<wfw:commentRss>http://itknowledgeexchange.techtarget.com/semantic-web/the-semantic-web-rdf-and-sparql-part-4/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
