Posted by: Roger King
assertions, inferences, information, namespaces, RDF, SPARQL, triples, URI's, wikis
This blog is dedicated to the study of emerging Web technology, in particular, ongoing research and development aimed at building software tools that will underlie the emerging Semantic Web. Last time, we looked at DBpedia, something that a former graduate student at my university, Greg Ziebold, pointed me toward.
The Semantic MediaWiki.
In this posting, we look at the Semantic MediaWiki, something else that Greg told me about. It is an extension of MediaWiki, the application that the Wikipedia is built out of. You can learn all about it at the Semantic MediaWiki website. The idea behind Semantic MediaWiki is to provide a more powerful wiki tool, namely one that supports more than just human-readable things like text and images.
RDF and namespaces: creating machine-readable, web-based information.
The idea is to allow entries in wikis that contain machine-readable information, so that searching can be performed in a largely automatic fashion. Specifically, the Semantic MediaWiki allows users to export information from a wiki in RDF format. An RDF specification consists of “triples” that form “assertions”. Consider the following
Assertion 1: Joe is tall.
Assertion 2: Tall People should try out for Basketball.
The idea is for terms in triples (“Joe”, “tall”, “is”, “Tall People”, etc.) to be taken from predefined and globally accessible namespaces. This would ensure that everyone who uses a given term (like “tall” or “Should try out for”) will have the same meaning in mind. In this way, rather than having to painfully search for information that pertains to Tall People, for example, a smart search engine could do the searching for us.
Building locally, growing globally.
There is more to this. These namespaces can be available on the Web, and RDF statements can point to the relevant namespaces. This means that software searching the Web, and processing these triples, can easily find the relevant namespaces.
Also, the things in the right and left side of a triple (like “Joe” and “tall”) can themselves be Web-based resources. This means that information scattered around the Web can be interconnected – but all the work can be done locally. No one has to manually integrate millions of websites. The job can be done little by little, in a quiet way, as people start to store their information in an RDF compatible fashion.
This is how the Semantic Web will scale. Everyone will use shared namespaces and shared protocols like RDF. This will, in essence, turn the Web into one big website that can be searched in a partly automatic fashion.
SPARQL: querying RDF-based information.
How will we interrelate data scattered around the Web?
There is a query language out there, called SPARQL, that can be used to search the Web. SPARQL can follow RDF connections around the globe. How is this done? It has to do with being able to “infer” new things. Consider a fact that can be automatically deduced from the two assertions above:
A new inference: Joe should try out for Basketball.
Assertion 1 could be on a server in Detroit, and assertion 2 could be on a server in Miami, and SPARQL could do the job of making the leap that leads to the new inference.
This means that we could figure out what Joe should be doing right now without having to find the two pieces of information manually (the fact that he is tall, and that tall people should play basketball), and without having to make the inference ourselves.
This is a big deal. This sort of automation is what the Semantic Web is all about.
So what do real people do with the Semantic MediaWiki? We’ll look at this next.