<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Buzz’s Blog: On Web 3.0 and the Semantic Web &#187; smart search engines</title>
	<atom:link href="http://itknowledgeexchange.techtarget.com/semantic-web/tag/smart-search-engines/feed/" rel="self" type="application/rss+xml" />
	<link>http://itknowledgeexchange.techtarget.com/semantic-web</link>
	<description>Defining the necessary skills for future software professionals</description>
	<lastBuildDate>Sun, 16 Dec 2012 04:42:23 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
		<item>
		<title>Multimedia: The Problem of Subtle Semantics</title>
		<link>http://itknowledgeexchange.techtarget.com/semantic-web/multimedia-the-problem-of-subtle-semantics/</link>
		<comments>http://itknowledgeexchange.techtarget.com/semantic-web/multimedia-the-problem-of-subtle-semantics/#comments</comments>
		<pubDate>Sat, 03 Oct 2009 21:12:46 +0000</pubDate>
		<dc:creator>Roger King</dc:creator>
				<category><![CDATA[3D animation]]></category>
		<category><![CDATA[3D modeling]]></category>
		<category><![CDATA[advanced Web apps]]></category>
		<category><![CDATA[automating Web searches]]></category>
		<category><![CDATA[blob data]]></category>
		<category><![CDATA[continuous data]]></category>
		<category><![CDATA[databases]]></category>
		<category><![CDATA[information]]></category>
		<category><![CDATA[Multimedia]]></category>
		<category><![CDATA[rich internet apps]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[smart search engines]]></category>
		<category><![CDATA[tagging]]></category>
		<category><![CDATA[Text]]></category>
		<category><![CDATA[Web 2.0]]></category>
		<category><![CDATA[Web 3.0]]></category>
		<category><![CDATA[web applications]]></category>
		<category><![CDATA[Web development]]></category>
		<category><![CDATA[Web development frameworks]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://itknowledgeexchange.techtarget.com/semantic-web/multimedia-the-problem-of-subtle-semantics/</guid>
		<description><![CDATA[The challenge of the Semantic Web. We’ve looked at the emerging Semantic Web technology in the previous postings of this blog. The idea is to have a far, far smarter Web, one where the process of finding, interpreting, and using far-flung information can be largely automated. This is in sharp contrast [...]]]></description>
				<content:encoded><![CDATA[<p><strong>The challenge of the Semantic Web.</strong></p>
<p>In previous postings of this blog, we’ve looked at emerging Semantic Web technology. The idea is to have a far, far smarter Web, one where the process of finding, interpreting, and using far-flung information can be largely automated. This is in sharp contrast with today’s Web, where these tasks have to be done in a painful, extremely time-consuming fashion.</p>
<p>That is the goal. The key challenge lies in searching the kinds of information that are important to us in our daily lives, because this information, as it turns out, is very difficult to process automatically. Why?</p>
<p><strong>The complexity of modern multimedia.</strong></p>
<p>I teach a very basic 3D animation class, mostly to computer science students. We use Maya, arguably the most popular 3D animation application and one used in the making of many animated features. The interesting thing about animation is that it is truly multimedia, so it can give us a lot of insight into what we need the new Web to do for us.</p>
<p>That’s because even small projects can involve an impressive number and diversity of applications for drawing, documenting, modeling, animating, motion capture, texturing, video rendering, video editing, video conversion and compression, and sound editing. Correspondingly, the variety and complexity of media formats involved in an animation project can be overwhelming.</p>
<p>What happens in an animation project? The workflow might begin with vector storyboard drawings that break the story down into scenes. In a typical animation project, 3D models in a variety of proprietary formats are used, and they must be transformed as they are exported from one application and imported into the next. Multiple video renders of animated models are made, and they must be edited together, along with multiple sound files. Multiple video and audio formats might be used. 2D images are used for textures: photographs of butterfly wings can make an animated butterfly very realistic, and a checkerboard image made with Photoshop can be used to make a linoleum floor. And along the way, a variety of note-taking, screen capture, and conferencing software might be used to facilitate group communication.</p>
<p>There is also a heavy focus on reuse in an animation project. Building every model, editing every texture, creating every environment and background, and recording every sound from scratch is frequently intractable. If existing assets could not be tailored and reused, the project would be far too expensive and time-consuming, and would demand that too wide a variety of professionals always be available. This raises the multimedia stakes, as assets of widely differing forms must be constantly reconfigured and used together in new ways.</p>
<p>But why does this matter to the rest of us? Most of us aren’t trying to produce complex animated videos. Interestingly, though, in our everyday lives we face essentially the animator’s challenge when we try to find and use information on the Web. That’s because we’re often looking for things whose meaning, whose interpretation, demands focused human thought. We are looking not for business data but for pieces of media, and today most of our searching has to be based on tags or brief textual descriptions associated with those pieces of media, not on the true meaning of the media itself.</p>
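<p>To make the contrast concrete, here is a minimal Python sketch of tag-based media search. The asset names and tags are hypothetical. The search can only match the words someone attached to each file; it never looks at the pixels or audio samples themselves, so an untagged video of butterflies is simply invisible to it.</p>

```python
# A minimal sketch of tag-based media search (hypothetical assets and tags).
# Each asset is just a file name plus the keywords a person attached to it.
media_library = {
    "wing_texture_01.png": {"butterfly", "wing", "texture"},
    "floor_checker.png": {"checkerboard", "texture"},
    "meadow_render.mp4": {"render", "scene3"},  # shows butterflies, but nobody tagged it
}


def search_by_tag(library, keyword):
    """Return the assets whose tags contain the keyword, sorted by name.

    Only the tags are consulted; the media content itself is never examined.
    """
    return sorted(name for name, tags in library.items() if keyword in tags)


print(search_by_tag(media_library, "butterfly"))
# ['wing_texture_01.png']  (the untagged video is missed)
```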
<p><strong>The needs of the business world are not our needs.</strong></p>
<p>The subjective nature of media assets is at the heart of the problem facing us. Existing technology for searching the Web is based on keywords and very short pieces of text.</p>
<p>There is other technology under active development, though: the database technology that serves as the information storage backbone of most commercial websites. Businesses have used it in-house (not on the Web) for decades to process large databases. But it was designed to handle traditional forms of business data, such as integers, character strings, real numbers, dates, timestamps, and full text.</p>
<p>There is more, though. All of the major database management systems, along with tools for building and searching advanced websites, are being retrofitted (or in some cases, built from the ground up) to manage more than keywords and text, more than standard business data.</p>
<p>But up to now, the focus has not been on supporting the kinds of information you and I are most interested in. The focus has been on extending database and Web technology to support XML documents, complex data objects like those inside a Java program, and other forms of data found inside programs, such as arrays, lists, and short pieces of textual data like the names of diseases.</p>
<p>In other words, we’ve been busy extending our support of the business world, so that businesses can store complex business data in databases and make that information processable over the Web. You and I have largely been left out.</p>
<p><strong>Finally, we are attacking our needs.</strong></p>
<p>But there are now many ongoing efforts to extend database and Web technology to make it useful to us. The new focus is on supporting blob (binary large object) data and continuous media like images, video, and audio. This is extremely hard to do.</p>
<p>Why? Because the strongest means by which we deduce the meaning of business data is by looking at its internal structure and the terms that are used to describe that structure. A relational table named Prescriptions, with character attributes Patient Name, Doctor’s Name, and Medication, and a numeric attribute Dosage, is pretty easy to interpret.</p>
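<p>Why is such a table easy to interpret? Because its structure is stored alongside the data and can itself be queried. A minimal sketch using Python’s built-in sqlite3 module, assuming we simplify the attribute names above into legal column identifiers (PatientName, DoctorName, and so on):</p>

```python
import sqlite3

# Build the Prescriptions table described above in an in-memory SQLite database.
# Column names are adapted from the text (spaces and apostrophes removed).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Prescriptions (
           PatientName TEXT,
           DoctorName  TEXT,
           Medication  TEXT,
           Dosage      REAL
       )"""
)

# The schema describes itself: a program can ask for the column names and
# declared types without any human explaining what the table holds.
# PRAGMA table_info yields (cid, name, type, notnull, default, pk) per column.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Prescriptions)")]
print(columns)
# [('PatientName', 'TEXT'), ('DoctorName', 'TEXT'), ('Medication', 'TEXT'), ('Dosage', 'REAL')]
```

<p>The column names and types are exactly the “terms used to describe that structure” that make business data comparatively easy for software to interpret.</p>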
<p>But what do we do with a photograph, which is just a grid of pixels with no internal structure?  Or a long series of images, along with a sound track, put together to form a piece of video?  </p>
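<p>Compare that with a photograph. In a minimal sketch (the pixel values are invented), an image is just a grid of numbers: a program can compute its size or average brightness, but nothing in the data says what it shows.</p>

```python
# A toy 3x3 grayscale "photograph": nothing but brightness values from 0 to 255.
# The values are made up for illustration.
photo = [
    [12, 200, 198],
    [15, 210, 205],
    [11, 190, 201],
]

# The only structure a program can read directly is the grid's shape and its numbers.
# Unlike the Prescriptions table, there is no column named "subject" or "caption":
# what the image depicts is not stored anywhere.
height, width = len(photo), len(photo[0])
mean_brightness = sum(sum(row) for row in photo) / (height * width)
print(height, width, round(mean_brightness, 1))  # 3 3 138.0
```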
<p>The U.S. military has been pumping money into image processing for several decades, so all is not lost. There is a vast body of mathematical research and software development that allows us to write programs that can find a particular face in a crowd or search satellite photos for airplane runways. But in general, we cannot yet write a program that can process an arbitrary photo or video clip and tell us what it <em>means</em>. That means we can’t quickly search vast media databases for useful pieces of information.</p>
<p>The goal behind the Semantic Web effort is to build a new generation of websites whose information can be searched automatically, and where information from multiple sites can be automatically integrated. With numeric and character-based data, this is quite doable. But when it comes to multimedia, like images, sound, video, 3D models, and engineering designs, well, we have a long way to go. The meaning (in other words, the semantics) of these forms of data is complex and subtle, and highly dependent upon an individual’s interpretation of that media.</p>
<p><strong>So, we see that we have only just begun our journey to create the new Web.</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://itknowledgeexchange.techtarget.com/semantic-web/multimedia-the-problem-of-subtle-semantics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Dynamic pages, hidden data, and inferred information: the danger of scale.</title>
		<link>http://itknowledgeexchange.techtarget.com/semantic-web/dynamic-pages-hidden-data-and-infered-information-the-danger-of-scale/</link>
		<comments>http://itknowledgeexchange.techtarget.com/semantic-web/dynamic-pages-hidden-data-and-infered-information-the-danger-of-scale/#comments</comments>
		<pubDate>Fri, 18 Sep 2009 03:02:28 +0000</pubDate>
		<dc:creator>Roger King</dc:creator>
				<category><![CDATA[assertions]]></category>
		<category><![CDATA[databases]]></category>
		<category><![CDATA[dynamic pages]]></category>
		<category><![CDATA[hidden web content]]></category>
		<category><![CDATA[inferences]]></category>
		<category><![CDATA[next generation search engines]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[smart search engines]]></category>
		<category><![CDATA[static pages]]></category>
		<category><![CDATA[triples]]></category>

		<guid isPermaLink="false">http://itknowledgeexchange.techtarget.com/semantic-web/dynamic-pages-hidden-data-and-infered-information-the-danger-of-scale/</guid>
		<description><![CDATA[The good and bad sides of the powerful Semantic Web. So what happens when the Semantic Web is here? It’s supposed to largely automate the process of searching the Web by allowing us to attach machine-readable assertions (perhaps by using RDF) to information posted on the Web. Then, instead of us poor flailing humans having [...]]]></description>
				<content:encoded><![CDATA[<p><strong>The good and bad sides of the powerful Semantic Web.</strong></p>
<p>So what happens when the <a href="http://itknowledgeexchange.techtarget.com/semantic-web/what-do-we-mean-by-semantic-web/">Semantic Web</a> is here? It’s supposed to largely automate the process of searching the Web by allowing us to attach machine-readable assertions (perhaps by using <a href="http://itknowledgeexchange.techtarget.com/semantic-web/the-semantic-web-rdf-and-sparql-part-1/">RDF</a>) to information posted on the Web. Then, instead of us poor flailing humans having to painstakingly chase down countless URLs until we get what we want, smart search engines would be able to find precisely what we want in a single shot.</p>
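<p>To make “machine-readable assertions” less abstract: RDF expresses each assertion as a triple of subject, predicate, and object. The sketch below is plain Python, not an RDF library, and every URI in it is a hypothetical example; it only shows how a program could answer a question by matching triples instead of a human following links.</p>

```python
# RDF-style assertions as (subject, predicate, object) triples in plain Python.
# Every URI below is a hypothetical example, not a real vocabulary.
EX = "http://example.org/vocab#"

triples = {
    ("http://example.org/people/joe", EX + "plays", "basketball"),
    ("http://example.org/people/joe", EX + "heightCm", "201"),
    ("http://example.org/people/ann", EX + "plays", "tennis"),
}


def subjects_with(graph, predicate, value):
    """Return, sorted, every subject asserted to have this predicate and value."""
    return sorted(s for (s, p, o) in graph if p == predicate and o == value)


print(subjects_with(triples, EX + "plays", "basketball"))
# ['http://example.org/people/joe']
```

<p>A real site would publish such triples in a standard RDF syntax, and a search engine would more likely query them with SPARQL; the matching idea is the same.</p>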
<p>There is an obvious danger to all of this. The new Web will scale, in both good ways and bad. I am certainly not the first person to point out that the smarter the Web, the easier it will be for software to peruse the Web and dig up personal information about us. There will be software that carefully crafts ads in spam mail to target our vulnerabilities and our preferences. Websites will dynamically create webpages that target us individually, as well. When we shop online, when we read news, when we make social connections online, the Web will be disarmingly efficient and effective, and this leaves lots of room for fraud and manipulation.</p>
<p>This is already happening to a significant degree, and most of us are aware of it.</p>
<p><strong>The no-longer-hidden database factor.</strong></p>
<p>There is something more subtle about all of this, however. One of the most difficult things to do with traditional Web technology is to expose the content of databases to Web visitors. That’s because the pages that deliver content pulled from databases are highly dynamic, so it is very hard for web designers to make search engines (like Google) find and index the content of these databases. There are simple and somewhat effective things web designers can do, like creating static pages that contain terms meant to draw web visitors to their sites. These pages are not “destination” pages; rather, they exist only as a way of advertising the information contained in databases.</p>
<p>In the future, RDF assertions (and other machine-readable content) will be added to websites, and they will serve as far more effective draws.</p>
<p>But what about privacy?  Will web designers inadvertently facilitate fraud and identity theft by enabling the automatic cross-referencing of detailed information existing in databases that have been built and deployed on the Web in isolation?  This capability is at the heart of the Semantic Web effort.  Information that right now can only be obtained by individual users manipulating individual web interfaces will be discoverable by smart search engines.  </p>
<p><strong>The real problem: it will scale.</strong></p>
<p>This is a big deal. It’s not just that previously hidden information will now be <a href="http://itknowledgeexchange.techtarget.com/semantic-web/the-semantic-web-revealing-hidden-data/">discoverable</a>. Because standardized terms and assertions will be used to describe information in databases, smart search engines will be able to automatically interrelate data from otherwise unrelated database systems. When information from multiple places is integrated, new information is effectively created.</p>
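<p>A minimal sketch of that cross-referencing, using two hypothetical, independently built SQLite databases. The assumption doing all the work is that both sites identify people the same way, which is the role standardized terms play in the argument above; the names, zip codes, and birth years are invented.</p>

```python
import sqlite3

# Two hypothetical, independently built databases (invented names and values).
site_a = sqlite3.connect(":memory:")
site_a.execute("CREATE TABLE residents (person TEXT, zip TEXT)")
site_a.executemany("INSERT INTO residents VALUES (?, ?)",
                   [("Joe Smith", "80302"), ("Ann Lee", "80305")])

site_b = sqlite3.connect(":memory:")
site_b.execute("CREATE TABLE members (person TEXT, birth_year INTEGER)")
site_b.executemany("INSERT INTO members VALUES (?, ?)",
                   [("Joe Smith", 1990), ("Bob Ray", 1985)])

# Assumption: both sites identify people the same way, so a program can
# merge their records with no human involved.
zips = dict(site_a.execute("SELECT person, zip FROM residents"))
births = dict(site_b.execute("SELECT person, birth_year FROM members"))

# The merged profile exists in neither database on its own: new information.
profiles = {p: (zips[p], births[p]) for p in zips if p in births}
print(profiles)  # {'Joe Smith': ('80302', 1990)}
```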
<p>For a moment, let’s forget about databases and look at a simple example of information that might be stored statically on two websites. Here is an example adapted from the previous posting of this blog:</p>
<p>Assertion 1: Joe <em><strong>is</strong></em> tall for an athlete.<br />
Assertion 2: Tall athletes <em><strong>should try out for</strong></em> basketball.</p>
<p>A new inference: Joe <em><strong>should try out for</strong></em> basketball.  </p>
<p>The point here is that this new fact can be inferred automatically, without the intervention of a human being.</p>
<p>We noted in the previous posting that the information about Joe and the information about basketball might be on different websites. These websites could easily have been built independently. But a key notion, the semantics of the word “tall” in the context of basketball, is what allows this information to be automatically integrated. Another site might point out that Timmy is tall for a kindergarten student, but this would not trigger the suggestion that Timmy try out for the NBA.</p>
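<p>The Joe example, and why Timmy is left out, can be sketched in a few lines of Python. This is an illustrative toy rule, not RDF or any real reasoner, and the predicate names are invented; the point is that the rule fires only when the context attached to “tall” matches.</p>

```python
# A minimal sketch of the inference above, using plain (subject, predicate, object)
# triples. The predicate names are hypothetical, not from any real vocabulary.
site_a = {("Joe", "isTallFor", "athlete"), ("Timmy", "isTallFor", "kindergartner")}
site_b = {("athlete", "tallOnesShouldTryOutFor", "basketball")}


def infer(facts):
    """Apply one rule: if X isTallFor C, and tall members of C should try out
    for S, then X shouldTryOutFor S. The context C must match exactly."""
    new_facts = set()
    for x, p1, context in facts:
        if p1 != "isTallFor":
            continue
        for c, p2, sport in facts:
            if p2 == "tallOnesShouldTryOutFor" and c == context:
                new_facts.add((x, "shouldTryOutFor", sport))
    return new_facts


print(infer(site_a | site_b))  # {('Joe', 'shouldTryOutFor', 'basketball')}
```

<p>Timmy produces no inference because no site asserts anything about tall kindergartners; the two websites never had to know about each other, only share the same terms.</p>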
<p>Now, let’s get back to database systems, which can contain countless terabytes of personal information. Perhaps there is a database at one site containing information about many thousands of athletes. Perhaps there are hundreds or thousands of such sites. The Semantic Web would allow us to find tall athletes without having to know in advance which databases around the world hold this sort of data, data that previously could only have been extracted through tedious, time-consuming human/computer interaction. Now a high school counselor, or a sports agent looking for new clients, can be far more effective at their jobs.</p>
<p>Or maybe it’s a drug company matching potential customers with expensive drugs targeted at specific diseases, singling out people who might have vague symptoms of various diseases and who might be easily convinced they are sick. Or a con artist looking to scam elderly people who are likely to have dementia.</p>
<p>Or, well, you get the idea. The Semantic Web will <em><strong>scale</strong></em> because it will have access to huge databases, and not just a World Wide Web of static pages. That’s the danger.</p>
]]></content:encoded>
			<wfw:commentRss>http://itknowledgeexchange.techtarget.com/semantic-web/dynamic-pages-hidden-data-and-infered-information-the-danger-of-scale/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Dangers of the Semantic Web: Assertions, Inferences, and Surrogates</title>
		<link>http://itknowledgeexchange.techtarget.com/semantic-web/assertions-inferences-and-surrogates/</link>
		<comments>http://itknowledgeexchange.techtarget.com/semantic-web/assertions-inferences-and-surrogates/#comments</comments>
		<pubDate>Sun, 16 Aug 2009 04:16:26 +0000</pubDate>
		<dc:creator>Roger King</dc:creator>
				<category><![CDATA[assertions]]></category>
		<category><![CDATA[inferences]]></category>
		<category><![CDATA[namespaces]]></category>
		<category><![CDATA[next generation search engines]]></category>
		<category><![CDATA[smart search engines]]></category>
		<category><![CDATA[surrogates]]></category>
		<category><![CDATA[the Semantic Web]]></category>

		<guid isPermaLink="false">http://itknowledgeexchange.techtarget.com/semantic-web/assertions-inferences-and-surrogates/</guid>
		<description><![CDATA[This blog deals with advanced Web technology. Each posting should be quite understandable on its own, but the blog as a whole is a continuing story. We&#8217;ve been looking at the Semantic Web, which is a global effort to automate the searching of the Web, so that applications (we might call them smart search engines) [...]]]></description>
				<content:encoded><![CDATA[<p>This blog deals with advanced Web technology. Each posting should be quite understandable on its own, but the blog as a whole is a continuing story. We&#8217;ve been looking at the Semantic Web, which is a global effort to automate the searching of the Web, so that applications (we might call them smart search engines) can find, interpret, interrelate, and aggregate information stored in multiple, independent websites.</p>
<p><strong>Assertions and Inferences.</strong></p>
<p>A key concept is that of an &#8220;inference&#8221;, a fact that is created by putting together two or more pieces of information that we might call &#8220;assertions&#8221;. We used the following example in a <a href="http://itknowledgeexchange.techtarget.com/semantic-web/the-semantic-web-rdf-and-sparql-part-3/">previous posting</a>. The two assertions might be posted somewhere on the Web.</p>
<p>Assertion 1: THE BALL is ORANGE.<br />
Assertion 2: ORANGE is an UGLY COLOR.<br />
An inference created by putting the two assertions together: THE BALL is an UGLY COLOR.</p>
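<p>The step from two assertions to an inference is mechanical, which is exactly why software can do it. A minimal forward-chaining sketch in Python, assuming we treat &#8220;is&#8221; as a transitive relation (a simplification; real Semantic Web reasoners work from formally defined vocabularies):</p>

```python
# A minimal forward-chaining sketch: treat "is" as transitive and keep
# combining assertions until no new facts appear. Terms are from the example.
assertions = {("THE BALL", "ORANGE"), ("ORANGE", "UGLY COLOR")}


def infer_all(facts):
    """Return the facts plus every (a, d) derivable from (a, b) and (b, d)."""
    known = set(facts)
    while True:
        derived = {(a, d) for (a, b) in known for (c, d) in known if b == c}
        if derived.issubset(known):
            return known  # fixed point: nothing new can be derived
        known |= derived


inferred = infer_all(assertions) - assertions
print(inferred)  # {('THE BALL', 'UGLY COLOR')}
```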
<p>We have also discussed the fact that <a href="http://itknowledgeexchange.techtarget.com/semantic-web/the-dublin-core-and-the-metadata-object-description-schema-a-look-at-namespaces/">terminology</a> used in inferences must be very carefully defined and widely shared.</p>
<p><strong>What is a Surrogate?</strong></p>
<p>The word surrogate, in the programming world, refers to a measure or model that is used to approximate the &#8220;real&#8221; measure or model. If I am trying to estimate the depth of the ocean at some point, but don&#8217;t have a direct way of measuring the distance to the ocean floor, I might judge the depth by using a table that associates distance from the shore with ocean depth. The assumption is that all points a particular distance from the shore will have more or less the same depth.</p>
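<p>The ocean example can be sketched as a lookup table with linear interpolation. The table values below are invented, and interpolation is one assumed way to read between rows; what matters is that the function never looks at the actual sea floor, only at distance from shore.</p>

```python
import bisect

# Hypothetical survey table: distance from shore (km) and typical depth (m).
# The numbers are invented for illustration.
DISTANCES_KM = [0, 10, 50, 100]
DEPTHS_M = [0, 60, 200, 3000]


def estimated_depth(distance_km):
    """Estimate ocean depth from distance to shore alone.

    This is the surrogate: we never measure the floor, we assume all points
    at a given distance share roughly one depth and interpolate linearly.
    """
    distance_km = max(distance_km, DISTANCES_KM[0])
    if distance_km >= DISTANCES_KM[-1]:
        return DEPTHS_M[-1]
    i = bisect.bisect_right(DISTANCES_KM, distance_km)
    x0, x1 = DISTANCES_KM[i - 1], DISTANCES_KM[i]
    y0, y1 = DEPTHS_M[i - 1], DEPTHS_M[i]
    return y0 + (y1 - y0) * (distance_km - x0) / (x1 - x0)


print(estimated_depth(30))  # 130.0
```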
<p>Here&#8217;s the important point for us: the Semantic Web will make very heavy use of surrogates. Let&#8217;s be precise about this. We&#8217;re not talking about approximations. We might search the Web for all banks that provide accounts that earn 5%, and our smart search engine might point us to banks that, on average over the past two years, have paid at least 5.0% on their accounts. That is an approximation: it measures the very thing we asked about, just imprecisely. A surrogate is something different. Suppose we wanted to find all banks that never cheated their customers. This might be impossible to answer precisely, so we might look for banks that are in the bottom 10% when it comes to the number of formal complaints filed against them. That would be a surrogate.</p>
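<p>The difference between the two bank queries can be sketched side by side. All bank names and figures below are invented. The first filter measures the property we care about (the rate paid); the second substitutes a measurable stand-in (complaint counts) for a property we cannot measure at all.</p>

```python
# Hypothetical data: (bank, average rate paid over two years in %, formal complaints).
BANKS = [
    ("Alder", 5.2, 14), ("Birch", 4.1, 3), ("Cedar", 5.0, 27), ("Dogwood", 3.9, 8),
    ("Elm", 5.6, 41), ("Fir", 4.8, 19), ("Grove", 5.1, 6), ("Hazel", 4.4, 33),
    ("Ivy", 3.5, 11), ("Juniper", 4.9, 22),
]

# Approximation: "earns 5%" is measured directly, just averaged over time.
pays_5_percent = [name for name, rate, _ in BANKS if rate >= 5.0]

# Surrogate: "never cheated customers" cannot be measured, so we substitute
# "in the bottom 10% by number of formal complaints".
cutoff = max(1, len(BANKS) // 10)
by_complaints = sorted(BANKS, key=lambda bank: bank[2])
passes_surrogate = [name for name, _, _ in by_complaints[:cutoff]]

print(pays_5_percent)    # ['Alder', 'Cedar', 'Elm', 'Grove']
print(passes_surrogate)  # ['Birch']
```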
<p><strong>Surrogates on the New Web.</strong></p>
<p>Now, let&#8217;s consider the Web. It doesn&#8217;t matter if we are talking about the Web today or the emerging Semantic Web.</p>
<p>In fact, what we are concerned with here is global to computing in general: when we take a chore normally performed by a human using an interactive interface and turn that chore over to a computer program, we often turn a real-world decision into a decision based on very simplified surrogates. A human can look at a bunch of information and, although it may take a very, very long time, make a &#8220;perfect&#8221; decision based on that data. But computer programs cannot think like humans. With software, we can only crudely simulate the thinking that goes on in the mind of a real person.</p>
<p>Now, back to the Web, the new Semantic Web. Suppose we build a next-generation website and use an official namespace (a structured set of terms) to specify its assertions. What we&#8217;re doing is providing a surrogate for the smart search engine to use so that it can filter URLs and integrate information from multiple sites.</p>
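<p>A minimal sketch of that arrangement, assuming a hypothetical &#8220;official&#8221; color namespace (the URI and term list are invented). The vendor may only use terms from the namespace, and the search engine filters on those terms, never on the product itself.</p>

```python
# A hypothetical "official" namespace: a fixed, shared set of terms that
# both the vendor and the search engine agree on.
COLOR_NS = "http://example.org/colors#"
COLOR_TERMS = {"orange", "red", "blue"}


def make_assertion(item, color):
    """Build an (item, hasColor, term) assertion, rejecting terms outside the namespace."""
    if color not in COLOR_TERMS:
        raise ValueError(f"{color!r} is not a term in {COLOR_NS}")
    return (item, COLOR_NS + "hasColor", COLOR_NS + color)


catalog = [make_assertion("ball-1", "orange"), make_assertion("ball-2", "blue")]

# The search engine filters on the shared term, never on the product photo:
# the term is the surrogate it relies on.
matches = [item for item, _, term in catalog if term == COLOR_NS + "orange"]
print(matches)  # ['ball-1']
```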
<p>Consider our two assertions from above, along with the inference derived from them:</p>
<p>Assertion 1: THE BALL is ORANGE.<br />
Assertion 2: ORANGE is an UGLY COLOR.<br />
An inference created by putting the two assertions together: THE BALL is an UGLY COLOR.</p>
<p>Maybe we are shopping for a ball online. On our own, we might have to follow hundreds of URLs and search hundreds of websites to find just the right ball, so we let a smart search engine do it. But who said the ball is orange? It&#8217;s an approximation made by the vendor of the ball in question. It has been labeled orange. But maybe it&#8217;s a shade of orange that we would actually have liked if we had looked at the picture of the ball ourselves instead of leaving it to the search engine.</p>
<p>Well, we might argue that the word orange, if it is precisely defined, won&#8217;t be confused with some other color. We can be confident that our notion of orange is the same as the vendor&#8217;s notion of orange. We do know how to express colors very precisely by using numbers.</p>
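<p>Here is what a numeric definition of a color term might look like, as a minimal sketch using Python&#8217;s standard colorsys module. The 20 to 45 degree hue range for &#8220;orange&#8221; is an assumed, illustrative choice; the point is that any two parties sharing the range will agree on every color, which is exactly what cannot be done for a word like &#8220;pretty&#8221;.</p>

```python
import colorsys

# Assumption: a shared numeric definition of "orange" as a hue between
# 20 and 45 degrees. The exact bounds are an illustrative choice.
ORANGE_HUE_RANGE = (20.0, 45.0)


def is_orange(r, g, b):
    """Classify an 8-bit RGB color as orange by its hue angle."""
    h, _, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    hue_degrees = h * 360
    low, high = ORANGE_HUE_RANGE
    return low <= hue_degrees <= high


print(is_orange(255, 165, 0), is_orange(255, 0, 0), is_orange(255, 255, 0))
# True False False
```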
<p>So, let&#8217;s change the assertions and the inference a bit:</p>
<p>Assertion 1: DOROTHY THE DOLL is PRETTY.<br />
Assertion 2: WE want a PRETTY DOLL.<br />
An inference created by putting the two assertions together: WE might want DOROTHY THE DOLL.</p>
<p>Now, how could the notion of pretty ever be globally and uniformly defined?</p>
<p>It cannot.</p>
<p>Maybe we should shop for our own dolls and not leave it to a next-generation search engine.</p>
<p><strong>The Lesson.</strong></p>
<p>The Semantic Web will trade accuracy for speed. No way around it.</p>
<p><br class="final-break" /></p>
<!-- wpms-network-global-inserts -->]]></content:encoded>
			<wfw:commentRss>http://itknowledgeexchange.techtarget.com/semantic-web/assertions-inferences-and-surrogates/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
