URI's archives - Buzz’s Blog: On Web 3.0 and the Semantic Web

Sep 9 2009   6:02PM GMT

Real-World Look at the Semantic Web, part 2



Posted by: Roger “Buzz” King
assertions, inferences, information, namespaces, RDF, SPARQL, triples, URI's, wikis

This blog is dedicated to the study of emerging Web technology, in particular the ongoing research and development aimed at building the software tools that will underlie the Semantic Web. Last time, we looked at DBpedia, something that a former graduate student at my university, Greg Ziebold, pointed me toward.

The Semantic MediaWiki.

In this posting, we look at the Semantic MediaWiki, something else that Greg told me about. It is an extension of MediaWiki, the software that Wikipedia runs on. You can learn all about it at the Semantic MediaWiki website. The idea behind Semantic MediaWiki is to provide a more powerful wiki tool, namely one that supports more than just human-oriented content like text and images.

RDF and namespaces: creating machine-readable, web-based information.

The idea is to allow wiki entries to contain machine-readable information, so that searching can be performed in a largely automatic fashion. Specifically, the Semantic MediaWiki allows users to export information from a wiki in RDF format. An RDF description consists of “triples”, each of which makes one “assertion” by linking a subject to an object through a predicate. Consider the following two assertions:

Assertion 1: Joe is tall.
Assertion 2: Tall People should try out for Basketball.

The idea is for the terms in triples (“Joe”, “is”, “tall”, “Tall People”, etc.) to be taken from predefined and globally accessible namespaces. This would ensure that everyone who uses a given term (like “tall” or “should try out for”) has the same meaning in mind. In this way, rather than our having to painfully search for information that pertains to Tall People, for example, a smart search engine could do the searching for us.
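
To make the idea of shared namespaces concrete, here is a minimal Python sketch (plain tuples, no RDF library) that writes the two assertions as triples of prefixed terms and expands each prefix into a full URI. The namespace URIs under example.org and term names such as `sport:shouldTryOutFor` are hypothetical placeholders, not real vocabularies.

```python
# Hypothetical namespaces, standing in for "predefined and globally
# accessible" vocabularies; the post does not name real ones.
NAMESPACES = {
    "people": "http://example.org/people/",
    "sport": "http://example.org/sport#",
}


def expand(term):
    """Expand a prefixed term such as 'sport:Tall' into a full URI."""
    prefix, _, local = term.partition(":")
    return NAMESPACES[prefix] + local  # raises KeyError for an unknown prefix


# The two assertions, written with prefixed terms.
assertions = [
    ("people:Joe", "sport:is", "sport:Tall"),
    ("sport:TallPeople", "sport:shouldTryOutFor", "sport:Basketball"),
]

# Each triple now consists of globally unique identifiers.
triples = [tuple(expand(term) for term in a) for a in assertions]
for subject, predicate, obj in triples:
    print(subject, predicate, obj)
```

Anyone who writes `sport:Tall` against the same namespace ends up with the identical identifier `http://example.org/sport#Tall`, which is what lets software treat their statements as being about the same thing.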

Building locally, growing globally.

There is more to this. These namespaces can be available on the Web, and RDF statements can point to the relevant namespaces. This means that software searching the Web, and processing these triples, can easily find the relevant namespaces.

Also, the things on the left and right sides of a triple (like “Joe” and “tall”) can themselves be Web-based resources. This means that information scattered around the Web can be interconnected, but all the work can be done locally. No one has to manually integrate millions of websites. The job can be done little by little, in a quiet way, as people start to store their information in an RDF-compatible fashion.
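
The "build locally, grow globally" idea can be sketched in a few lines of Python, assuming two sites that each publish a small set of triples on their own (all URIs here are hypothetical). Because both sites name the same team with the same URI, merging their data is just a set union, and a program can then follow links from Joe across the site boundary without anyone integrating the two sites by hand.

```python
from collections import defaultdict

# Hypothetical triples published independently by two different sites.
VOCAB = "http://example.org/vocab#"
SITE_A = {
    ("http://site-a.example/Joe", VOCAB + "memberOf", "http://site-b.example/Team42"),
}
SITE_B = {
    ("http://site-b.example/Team42", VOCAB + "basedIn", "http://site-b.example/Detroit"),
}


def merge(*graphs):
    """Every term is a global URI, so combining graphs is plain set union."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def reachable(graph, start):
    """Follow outgoing links from `start` and collect every triple reached."""
    outgoing = defaultdict(list)
    for s, p, o in graph:
        outgoing[s].append((p, o))
    seen, stack, found = set(), [start], []
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for p, o in outgoing.get(node, []):
            found.append((node, p, o))
            stack.append(o)  # the object may be a subject elsewhere
    return found


web = merge(SITE_A, SITE_B)
for triple in reachable(web, "http://site-a.example/Joe"):
    print(triple)
```

With only Site A's data, the walk stops at the team; once Site B's triples are merged in, the same walk also reaches where the team is based.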

This is how the Semantic Web will scale. Everyone will use shared namespaces and shared standards like RDF. This will, in essence, turn the Web into one big website that can be searched in a partly automatic fashion.

SPARQL: querying RDF-based information.

How will we interrelate data scattered around the Web?

There is a query language out there, called SPARQL, that can be used to search the Web. SPARQL can follow RDF connections around the globe. How is this done? It has to do with being able to “infer” new things. Consider a fact that can be automatically deduced from the two assertions above:

A new inference: Joe should try out for Basketball.

Assertion 1 could be on a server in Detroit, and assertion 2 could be on a server in Miami, and SPARQL could do the job of making the leap that leads to the new inference.
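
Here is a minimal Python sketch of that leap. It assumes assertion 2 can be recast as a triple saying that whatever is `Tall` should try out for `Basketball`, uses a hypothetical `example.org` namespace, and simply merges the Detroit and Miami triples into one set before matching. The docstring shows a SPARQL CONSTRUCT query with the same graph pattern; the Python loop is a hand-rolled stand-in for what a SPARQL engine would do, not SPARQL itself.

```python
EX = "http://example.org/sport#"  # hypothetical namespace

# Assertion 1, as it might be published on a server in Detroit.
detroit = {(EX + "Joe", EX + "is", EX + "Tall")}
# Assertion 2, recast as data on a server in Miami:
# "whoever is Tall should try out for Basketball".
miami = {(EX + "Tall", EX + "shouldTryOutFor", EX + "Basketball")}


def infer(graph):
    """Hand-rolled equivalent of this SPARQL CONSTRUCT query:

    PREFIX ex: <http://example.org/sport#>
    CONSTRUCT { ?x ex:shouldTryOutFor ?sport }
    WHERE     { ?x ex:is ?quality . ?quality ex:shouldTryOutFor ?sport . }
    """
    derived = set()
    for (x, p1, quality) in graph:
        if p1 != EX + "is":
            continue
        # Join: the object of the first triple is the subject of the second.
        for (q, p2, sport) in graph:
            if q == quality and p2 == EX + "shouldTryOutFor":
                derived.add((x, EX + "shouldTryOutFor", sport))
    return derived - graph  # keep only genuinely new triples


print(infer(detroit | miami))  # the new inference about Joe
print(infer(detroit))          # nothing can be inferred from Detroit alone
```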

This means that we could figure out what Joe should be doing right now without having to find the two pieces of information manually (the fact that he is tall, and that tall people should try out for basketball), and without having to make the inference ourselves.

This is a big deal. This sort of automation is what the Semantic Web is all about.

So what do real people do with the Semantic MediaWiki? We’ll look at this next.

Jul 10 2009   2:35AM GMT

The Semantic Web: RDF and SPARQL, part 2



Posted by: Roger “Buzz” King
the Semantic Web, RDF, triples, XML, URI's

This posting is a continuation of the previous posting. We are discussing RDF, the “triples” language that is serving as a cornerstone of the Semantic Web effort. In the previous posting, we looked at a simple RDF program, which creates a relationship between a web-based resource and the term “funstuff”; the relationship is called “topic”, thus telling us that the resource located at the given URL is something fun.

RDF and URI’s.

One interesting fact is that, although we only used URI’s for two parts of the RDF triple embedded in this RDF program (the resource and the “topic” relationship), we could have used URI’s for all three pieces of the triple. For example, the program from the previous blog posting (immediately below) might be changed to look like the second program below, which now has two triples in it:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:zx="http://www.someurl.org/zx/">

  <rdf:Description rdf:about="http://www.awebsite.org/index.html">
    <zx:topic>funstuff</zx:topic>
  </rdf:Description>

</rdf:RDF>

————-

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:zx="http://www.someurl.org/zx/">

  <rdf:Description rdf:about="http://www.awebsite.org/index.html">
    <zx:topic>funstuff</zx:topic>
    <!-- rdf:resource makes the value a URI reference rather than a plain string -->
    <zx:created-by rdf:resource="http://www.anotherurl.org/buzz"/>
  </rdf:Description>

</rdf:RDF>
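
To see the triples that the second program encodes, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. It embeds a well-formed copy of the second program (with `xmlns` namespace declarations, and `rdf:resource` marking the URI-valued object) and handles only the simple `rdf:Description` pattern used in this post; RDF/XML allows many other forms, so a real RDF parser would be needed in practice.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# A well-formed copy of the second program above.
DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:zx="http://www.someurl.org/zx/">
  <rdf:Description rdf:about="http://www.awebsite.org/index.html">
    <zx:topic>funstuff</zx:topic>
    <zx:created-by rdf:resource="http://www.anotherurl.org/buzz"/>
  </rdf:Description>
</rdf:RDF>"""


def qname_to_uri(tag):
    """Turn ElementTree's '{namespace}local' form into one full URI."""
    ns, _, local = tag[1:].partition("}")
    return ns + local


def extract_triples(xml_text):
    """Read (subject, predicate, object) triples from simple RDF/XML."""
    root = ET.fromstring(xml_text)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:  # each child element is one property
            predicate = qname_to_uri(prop.tag)
            resource = prop.get(f"{{{RDF_NS}}}resource")
            if resource is not None:
                triples.append((subject, predicate, ("uri", resource)))
            else:
                triples.append((subject, predicate, ("literal", prop.text)))
    return triples


for triple in extract_triples(DOC):
    print(triple)
```

The predicate `zx:topic` comes out as the single URI `http://www.someurl.org/zx/topic`, which is how the namespace declared at the top of the program turns short element names into globally unique terms.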

RDF and decentralized information.

As a reminder, the triple expressed in the first program can be stated as:

http://www.awebsite.org/index.html <topic> funstuff

So, what did we add in the second program? A new triple, which can be roughly stated as:

http://www.awebsite.org/index.html <created-by> http://www.anotherurl.org/buzz

In other words, our vocabulary defined at http://www.someurl.org/zx/ apparently has another standardized term called “created-by”. The added triple in our second program says that the resource found at http://www.awebsite.org/index.html was created by someone who is identified by the URL http://www.anotherurl.org/buzz.

We see that the value in the first triple, which gives the “topic” of our resource, is a character string, but the value in the second triple, which says who our resource was “created-by”, is actually a URL.

This is big. It shows us that all three parts of an RDF triple can be URI’s, and the resources they identify can be distributed around the Internet. This means that the information embedded in the triple is highly decentralized.
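
One way to make the string-versus-URI distinction visible is to write the two triples in N-Triples, the W3C's line-based RDF syntax, where URIs appear in angle brackets and plain strings in double quotes. The Python sketch below is a minimal serializer under that assumption; the `URI` and `Literal` classes are hypothetical helpers, not part of any RDF library, and the escaping covers only backslashes and double quotes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


def term(t):
    """Render one term: URIs in <angle brackets>, strings in "quotes"."""
    if isinstance(t, URI):
        return f"<{t.value}>"
    # Minimal escaping only; newlines and other control characters are not handled.
    escaped = t.value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    """One line per triple, each terminated by ' .'."""
    return "\n".join(" ".join(term(t) for t in triple) + " ." for triple in triples)


ZX = "http://www.someurl.org/zx/"
page = URI("http://www.awebsite.org/index.html")
triples = [
    (page, URI(ZX + "topic"), Literal("funstuff")),
    (page, URI(ZX + "created-by"), URI("http://www.anotherurl.org/buzz")),
]

print(to_ntriples(triples))
for triple in triples:
    print(sum(isinstance(t, URI) for t in triple), "of 3 parts are URIs")
```

The first triple has two URI parts and one string; the second has URIs in all three positions, each of which could point to a different server.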

The bottom line

This illustrates the power of RDF. It can be used to express information that is not controlled in any centralized fashion. RDF is thus the glue that can bring diverse pieces of information together. And it can use standardized, shared terminologies to precisely dictate the semantics of the triples in RDF programs. In our example, the resource is identified by one URI, the kind of relationship is identified by another URI, and the value of that relationship is identified by yet another URI.

We will continue this in the next posting.