Buzz’s Blog: On Web 3.0 and the Semantic Web

Dec 1 2009   3:29AM GMT

Web privacy and the Vast Machine: You ain’t seen nothing yet.

Roger King

This blog is dedicated to the Semantic Web and Web 2.0/3.0 technology. In this posting, we consider privacy and the Semantic Web.

The Traveler had it easy.

There is a series of three science fiction novels by John Twelve Hawks. They concern a “Traveler” who battles the “Vast Machine”, which is a global grid of security cameras, governmental and corporate databases, and computers that collect information on people, track them, and manipulate society. They are very popular novels.

But these books are not all that imaginative.

Why not? If and when the Semantic Web ever emerges (please see previous postings of this blog), there will be a lot more than security camera footage and passive database systems out there. In his books, Twelve Hawks describes programmers working for the Vast Machine who pull information out of databases and plant information in databases, and who somehow locate and integrate information from many sources. It’s not clear how they do it.

The problem is tractability. Extracting the meaning of data (its “semantics”) is extremely difficult, and given today’s Web, it is a highly manual, painstaking, and ultimately intractable problem. Twelve Hawks’ Vast Machine isn’t all that much of a threat.

Consider, however, the emerging Semantic Web.

The whole idea of the Semantic Web is to make databases proactive, to let them announce their content by using globally accepted standards. In this blog, we have looked at one proposed standard, called RDF, which is based on “triples” that interrelate information, and a Web-hopping query language called SPARQL that can chain together triples defined at diverse, independently created websites, thus inferring new information. We’ve looked at the beginnings of this technology as it is taking form on the Web.
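For readers who want something concrete, here is a rough sketch of what a triple store and a single query pattern might look like. It is plain Python rather than real RDF or SPARQL, and every name in it (the people, the predicates, the `match` helper) is made up for illustration. Each triple is a (subject, predicate, object) tuple, and a pattern with a blank slot plays roughly the role of a SPARQL variable.

```python
# Hypothetical triples; each one is (subject, predicate, object).
triples = [
    ("Bob", "OWNS", "ConfederateStamps"),
    ("Alice", "OWNS", "PennyBlacks"),
    ("ConfederateStamps", "IS_A", "StampCollection"),
]

def match(store, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

# "Who owns what?" is a pattern with only the predicate fixed.
owned = match(triples, p="OWNS")
print(owned)
```

A real SPARQL engine does far more than this, but the basic move is the same: describe the shape of the triples you want and let the engine fill in the blanks.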

In other words, it might not be long at all before the least of our problems would be dastardly hackers who break into databases and pluck information – because the finding, integrating, and interpreting of data from highly divergent sources will become, in large part, automatic.

It will make the intractable quite tractable.

Okay, I confess…

It is not as simple as that, of course, and I am grossly overstating the danger. Presumably, private databases belonging to corporations and governments will not be loaded up with this sort of semantic metadata and placed on the open Web. And the sorts of inferences that can be made by unifying metadata from multiple sites will be fairly low-level, leaving a lot of difficult work for any Vast Machine that wants to manipulate our every move and thought.


But it is true that the potential for misuse will increase sharply. There will indeed be many isolated instances where innocently posted information from two or more sites will be automatically linked together because of uniformly specified metadata. If one triple at one site has data marked up as “People OWN Kinds-of-StampCollections”, and another site says that “Kinds-of-StampCollections HAVE Certain-Values”, a thief who knows little about philately might learn that Bob owns stamps from the Southern Confederacy, and that stamps from the Southern Confederacy are worth hundreds of thousands of dollars…
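The stamp example can be sketched in a few lines. This is a toy illustration in plain Python, not real RDF tooling; the two "sites", the predicate names `OWNS` and `HAS_VALUE`, and the `link` function are all assumptions invented for the sketch. The point is only that once both sites use the same identifier for the collection, joining them is mechanical.

```python
# Hypothetical triples published independently by two sites.
site_a = [("Bob", "OWNS", "Confederate-Stamps")]
site_b = [("Confederate-Stamps", "HAS_VALUE", "hundreds of thousands of dollars")]

def link(owner_triples, value_triples):
    """Join ownership triples to value triples on the shared collection name."""
    values = {s: o for s, p, o in value_triples if p == "HAS_VALUE"}
    return [
        (person, thing, values[thing])
        for person, p, thing in owner_triples
        if p == "OWNS" and thing in values
    ]

# Neither site says that Bob owns something valuable; the join does.
inferred = link(site_a, site_b)
print(inferred)
```

Nobody had to break into anything here. The only requirement is that the two sites agree on what to call the thing they are both describing, which is exactly what shared metadata standards are meant to ensure.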

Just a thought for the next sci-fi writer.

1  Comment on this Post

  • Ebwolf
    I've always wanted a way to find out what kind of things have a difference in value. Warren Buffett is the king of value - he has a tap on undervalued companies. Antiques Roadshow attracts viewers because of the occasional person who comes in with the $100K antique that they almost threw away... I'd like to be able to go to a yard sale and connect the database of stuff for sale with a database of values and search for differences. One way now is to get eBay to start publishing triples. Then the problem is indexing someone's yard full of junk... The first time I went to the used book store on 95th Street in Louisville I had a rude surprise. They price all of their books based on Amazon's used books. So those compelling used bookstore finds there actually cost a small mint (for used books, anyway). And I'm not worried about thieves putting together this intelligence - I'm more worried about Skynet putting it together... But that's another sci-fi novel.