This blog is dedicated to the Semantic Web and Web 2.0/3.0 technology. In this posting, we consider privacy and the Semantic Web.
The Traveler had it easy.
There is a series of three science fiction novels by John Twelve Hawks. They concern a “Traveler” who battles the “Vast Machine”, which is a global grid of security cameras, governmental and corporate databases, and computers that collect information on people, track them, and manipulate society. They are very popular novels.
But these books are not all that imaginative.
Why not? If and when the Semantic Web ever emerges (please see previous postings of this blog), there will be a lot more than security camera footage and passive database systems out there. In his books, Twelve Hawks describes programmers working for the Vast Machine who pull information out of databases and plant information in databases, and who somehow locate and integrate information from many sources. It’s not clear how they do it.
The problem is tractability. Extracting the meaning of data (its “semantics”) is extremely difficult. On today’s Web it is a highly manual, painstaking process, and at any real scale it is ultimately intractable. So Twelve Hawks’ Vast Machine isn’t all that much of a threat.
Consider, however, the emerging Semantic Web.
The whole idea of the Semantic Web is to make databases proactive, to let them announce their content by using globally accepted standards. In this blog, we have looked at one proposed standard, called RDF, which is based on “triples” that interrelate information, and a Web-hopping query language called SPARQL that can join triples defined at diverse, independently created websites, thus inferring new information. We’ve looked at the beginnings of this technology as it is taking form on the Web.
In other words, it might not be long at all before the least of our problems will be dastardly hackers who break into databases and pluck information – because the finding, integrating, and interpreting of data from highly divergent sources will become, in large part, automatic.
It will make the intractable quite tractable.
Okay, I confess…
It is not as simple as that, of course, and I am grossly overstating the danger. Presumably, private databases belonging to corporations and governments will not be loaded up with this sort of semantic metadata and placed on the open Web. And the sorts of inferences that can be made by unifying metadata from multiple sites will be fairly low-level, leaving a lot of difficult work for any Vast Machine that wants to manipulate our every move and thought.
But it is true that the potential for misuse will increase sharply. There will indeed be many isolated instances where innocently posted information from two or more sites will be automatically linked together because of uniformly specified metadata. If one triple at one site has data marked up as “People OWN Kinds-of-StampCollections”, and another site says that “Kinds-of-StampCollections HAVE Certain-Values”, a thief who knows little about philately might learn that Bob owns stamps from the Southern Confederacy, and that stamps from the Southern Confederacy are worth hundreds of thousands of dollars…
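To make the stamp example concrete, here is a minimal sketch in plain Python of how that linkage could happen. It is not real RDF or SPARQL. The two "sites", the predicate names OWNS and HAS_VALUE, and the sample people and collections are all made up for illustration. The only point is that once both sites use the same identifier for a kind of stamp collection, joining their triples takes a few lines.

```python
# Hypothetical triples as (subject, predicate, object), standing in for
# metadata published independently by two unrelated websites.
site_a = [
    ("Bob", "OWNS", "ConfederateStamps"),
    ("Alice", "OWNS", "CommonStamps"),
]
site_b = [
    ("ConfederateStamps", "HAS_VALUE", "hundreds of thousands of dollars"),
]


def link_owners_to_values(ownership_triples, value_triples):
    """Join the two sets wherever the object of an OWNS triple
    is the subject of a HAS_VALUE triple."""
    value_of = {s: o for s, p, o in value_triples if p == "HAS_VALUE"}
    return [
        (person, kind, value_of[kind])
        for person, p, kind in ownership_triples
        if p == "OWNS" and kind in value_of
    ]


inferred = link_owners_to_values(site_a, site_b)
for person, kind, value in inferred:
    print(f"{person} owns {kind}, worth {value}")
```

Neither site says anything about what Bob's collection is worth. That fact only appears when the two sets of triples are joined on the shared name, which is roughly the kind of work a SPARQL query would do across real sites.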
Just a thought for the next sci-fi writer.