This posting is a continuation of the previous posting. We are discussing RDF, the “triples” language that is serving as a cornerstone of the Semantic Web effort.
In the previous two postings, we looked at RDF, which is an excellent example of solid software technology: It serves an important purpose. It is easy to use. And, even if you don’t write any RDF yourself, it is easy to understand what it does, and therefore, how it will impact your life.
RDF, in its simple, quiet way, allows us to interconnect any resources that exist on the Web, and at the same time, make use of standardized terminologies. This provides a highly flexible and semantically expressive way of building the new Semantic Web.
SPARQL: what is it?
RDF is great stuff, but it’s only half the story. If knowledge on the emerging Semantic Web is going to be glued together into RDF triples, how will that information be searched? It doesn’t do any good to have a book that will solve all your problems if you can’t read it or search through it.
SPARQL is a recursive acronym: it stands for SPARQL Protocol and RDF Query Language, and it is pronounced “sparkle”. Interestingly, when something is called a “query” language, we start thinking in terms of SQL, that largely declarative relational language that is the core of almost all successful relational database management systems. Indeed, as we will see in a later blog posting about XQuery (the language for searching XML-based data), SQL has served as the model for SPARQL.
A blast from the past.
There’s something about triples that we should look at before moving on. It has to do with the fact that triples are also known as “assertions”, and that assertions can be chained together to make “inferences”. Here are two triples/assertions, specified very informally: THE BALL is ORANGE. ORANGE is an UGLY COLOR. The inference we can make is THE BALL is an UGLY COLOR.
Or, getting back to the Web and RDF, below are two triples specified in RDF; the first one comes from the previous posting of this blog.
This first one can be interpreted as the webpage at awebsite.org/index.html was created by Buzz.
Here is the second RDF triple:
This one can be interpreted as Buzz is the guy described at yetanotherurl.org/professor.
We can chain them together to deduce that the guy who built the page at awebsite.org/index.html is Buzz the professor.
This is an inference.
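The chaining described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the triples are plain tuples rather than real RDF, and the predicate names "created-by" and "described-at" are hypothetical labels chosen for this sketch, not terms from any actual vocabulary.

```python
# Triples as (subject, predicate, object) tuples. The predicate names are
# illustrative assumptions, not terms from a real vocabulary.
triples = [
    ("awebsite.org/index.html", "created-by", "Buzz"),
    ("Buzz", "described-at", "yetanotherurl.org/professor"),
]


def chain(triples):
    """Join every pair of triples where one's object is the other's subject."""
    inferred = []
    for s1, p1, o1 in triples:
        for s2, p2, o2 in triples:
            if o1 == s2:
                # Keep both predicates so the path that was followed stays visible.
                inferred.append((s1, (p1, p2), o2))
    return inferred


inferences = chain(triples)
for subject, path, obj in inferences:
    print(subject, " -> ".join(path), obj)
```

The single inferred link connects the page directly to the professor description, because "Buzz" is the object of the first triple and the subject of the second.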
The point is that if you take a bunch of RDF statements and chain them together, you get what looks a lot like an object-oriented graph of related objects, somewhat like you see in Java. In a sense, RDF takes an object representation and breaks it down into triples. There’s really nothing new in RDF, other than the fact that any part of an RDF assertion (triple) can be something found on the Web.
Back to SPARQL.
So, what is SPARQL? It is a language that can be used to traverse graphs that consist of RDF triples that are chained together into an object network.
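To make the idea of traversing a graph of triples a little more concrete, here is a rough Python sketch of triple-pattern matching. It is not SPARQL and makes no claim about how a real SPARQL engine works; the function name match, the use of None as a wildcard, and the predicate names are all assumptions made for this illustration.

```python
# A tiny in-memory "graph" of triples. Predicate names are hypothetical.
triples = [
    ("awebsite.org/index.html", "created-by", "Buzz"),
    ("awebsite.org/index.html", "topic", "funstuff"),
    ("Buzz", "described-at", "yetanotherurl.org/professor"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for s, p, o in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# Step 1: who created the page?
creators = [o for _, _, o in match(triples, "awebsite.org/index.html", "created-by")]
# Step 2: follow the graph one hop further to see where each creator is described.
descriptions = [o for c in creators for _, _, o in match(triples, c, "described-at")]
print(creators, descriptions)
```

Each call to match fixes some parts of a triple and leaves the others open; stringing two such calls together is a hand-written version of walking from one node of the graph to the next.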
We will look at some SPARQL code in the next posting.
This posting is a continuation of the previous posting. We are discussing RDF, the “triples” language that is serving as a cornerstone of the Semantic Web effort. In the previous posting, we looked at a simple RDF program, which creates a relationship between a web-based resource and the term “funstuff”; the relationship is called “topic”, thus telling us that the resource located at the given URL is something fun.
RDF and URI’s.
One interesting fact is that, although we only used URI’s for two parts of the RDF triple embedded in this RDF program, we could have used URI’s for all three pieces of the triple. Thus, the program from the previous blog posting (immediately below) might be changed to look like the second program below, which now has two triples in it:
RDF and decentralized information.
As a reminder, the triple expressed in the first program can be stated as:
www.awebsite.org/index.html <topic> funstuff
So, what did we add in the second program? There is a new triple that has been added. It can be roughly stated as:
www.awebsite.org/index.html <created-by> http://www.anotherurl.org/buzz
In other words, our vocabulary defined at http://www.someurl.org/zx apparently has another standardized term called “created-by”. The added triple in our second program says that the resource found at www.awebsite.org/index.html was created by someone who is identified by the URL http://www.anotherurl.org/buzz.
We see that the value in the first triple, which concerns the “topic” of our resource, consists of a character string, but the value in the second triple, which concerns the “created-by” of our resource, is actually a URL.
This is big. It shows us that all three parts of a triple in RDF can be URI’s, and they can be distributed around the Internet. This means that the information embedded in the triple is highly decentralized.
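As a small sketch of this point, the two triples can be written out in Python and inspected to see which values are plain strings and which are URIs, and how many different hosts a single triple touches. The full predicate URIs (the vocabulary address with "#topic" and "#created-by" appended) and the "http://" prefix on the resource are assumptions made for the sketch; the helper names is_uri and hosts are hypothetical.

```python
from urllib.parse import urlparse

# The two triples discussed above, with every URI spelled out in full.
# The "#topic" and "#created-by" forms are assumed for illustration.
triples = [
    ("http://www.awebsite.org/index.html",
     "http://www.someurl.org/zx#topic",
     "funstuff"),
    ("http://www.awebsite.org/index.html",
     "http://www.someurl.org/zx#created-by",
     "http://www.anotherurl.org/buzz"),
]


def is_uri(value):
    """Treat a value as a URI only if it has an http(s) scheme and a host."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def hosts(triple):
    """Collect the distinct hosts mentioned anywhere in one triple."""
    return {urlparse(part).netloc for part in triple if is_uri(part)}


kinds = ["URI" if is_uri(obj) else "literal" for _, _, obj in triples]
print(kinds)
print(sorted(hosts(triples[1])))
```

The second triple draws on three different hosts, one for each of its parts, which is the decentralization the text describes.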
The bottom line
This illustrates the power of RDF. It can be used to express information which is not controlled in any centralized fashion. RDF is thus the glue that can be used to bring diverse pieces of information together. And it can use standardized, shared terminologies to precisely dictate the semantics of the triples in RDF programs. In our example, the resource is defined by one URI, the kind of relationship is defined by another URI, and the value of that relationship is defined by yet another URI.
We will continue this in the next posting.
This blog is dedicated to advanced and emerging Web technology. Each posting is meant to be understandable and informative on its own, but the blog as a whole tells a continuing story.
The Semantic Web.
In this posting, we will focus on the Semantic Web, which is a global effort at radically improving our ability to search the Web.
Currently, to search the web, we type keywords into a search engine like Google, which then searches its vast index of webpages for pages that have these keywords in them. Because this sort of search is very low-level, and not at all tied to the true meaning or purpose of the information stored in webpages, searching is painfully iterative and interactive. A user must chase down countless URLs returned by a search engine to see if any of them are relevant. Quite frequently, they are not. And so, the user must refine the set of keywords and try again. It might take many attempts before a satisfactory result is obtained.
One of the primary goals of the Semantic Web is to automate the process of searching the Web. There are two stages to this. First, people who post information on the Web must capture knowledge about the meaning of their information; this knowledge is commonly called “metadata”. The metadata is then stored with the posted information.
The second stage happens when users search the Web. Rather than using the low level keyword search approach, the search is at least partly automated. The iterative process is sharply reduced by employing a smart search engine that knows how to find relevant information by searching for metadata that pertains precisely to whatever it is that the user is seeking.
The bottom line.
The Semantic Web would be able to ease the burden of searching for information, as well as find vast stores of “hidden data” that reside in databases that are accessible via webpages, but whose contents right now are not seen by search engines.
Ultimately, we would want the Web to be entirely searchable by software, without any humans guiding the process. This would be the true Semantic Web.
Namespaces and triples.
In past postings of this blog, we have discussed a handful of key approaches to implement the Semantic Web. One idea is to tag information with standardized sets of terminology called “namespaces”.
We have also looked at the idea of embedding these tags in things called “triples”. In this posting, we look at this concept more closely and consider an existing language that would allow people to specify these triples.
RDF and SPARQL.
The most well-known standard for specifying triples is RDF, which stands for the Resource Description Framework. SPARQL is a query language, heavily influenced by SQL, that can be used to search data that has been structured using RDF.
This is the first of a series of blog postings in which we will first look at RDF, and then at SPARQL. Then, we’ll consider the big issue: will RDF and SPARQL enable the development of the true Semantic Web?
So, what is RDF? At its highest level, RDF is used to describe anything that can be found on the Web. RDF has an XML syntax; in other words, RDF can be written as an XML program, using a set of predefined “element” and “attribute” tags. (XML and XML languages were discussed in an earlier posting of this blog, as were XML and declarative languages.)
We might remember that on its own, XML is impotent. It is not in itself a programming language. It is simply a language standard for taking a set of tags and using them as “elements” and “attributes” in declarative, data-intensive languages. A good example is SMIL, which is used to define multimedia presentations.
Here is a fragment in RDF, using its XML syntax. Note that XML languages are nested languages, with opening tags written as <tag> and closing tags written as </tag>.
This looks complicated, but it’s not. This simple example illustrates the power of RDF. It uses a set of standardized RDF-specific tags, and the second line of code tells us where these tags come from: the w3.org site, which contains a vast store of information about advanced web technology. In other words, we can go to w3.org to find the precise definition of RDF specific tags.
RDF is engineered to also use other sets of tags, in particular, domain-specific tags. In this example, these tags come from a (non-existing) url called someurl.org. The tags themselves are prefaced with “zx:” in the rest of the code, so we know which tags are native RDF and which come from a domain-specific set of tags (called a namespace).
The XML “element” called Description is an RDF-specific tag that tells us we are giving the description of some resource on the Web, namely one at a (non-existing) website called awebsite.org.
The whole piece of code is one triple: It says that the topic of the resource at www.awebsite.org/index.html is funstuff. Here it is as a triple, with all the XML syntax and the namespace information removed:
www.awebsite.org/index.html <topic> funstuff.
Let’s overview this again. RDF is an XML language, so it uses the syntax of XML. One of the primary concepts in XML is that of an “element”, and Description is an XML element, one defined in the RDF standard. The piece of code begins with two namespace statements, one telling us which RDF specification we are using, and the second telling us that we will also be using some tags from another, domain-specific specification, which includes the tag “topic”. Then there is the guts of the triple, telling us that we are listing the topic of a Web-resident resource.
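As a hedged illustration of the walkthrough above, the Python sketch below embeds an RDF/XML fragment of the shape just described and pulls the one triple back out of it using the standard library XML parser. The fragment is a reconstruction written for this sketch, not necessarily the exact code from the posting: the RDF namespace address is the standard one published at w3.org, while the "zx" namespace address with a trailing "#", the "http://" prefix on the resource, and the function name extract_triples are assumptions.

```python
import xml.etree.ElementTree as ET

# The standard RDF namespace, published at w3.org.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# A reconstructed fragment: two namespace declarations, one Description
# element, and one domain-specific "topic" tag. The zx address is assumed.
rdf_xml = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:zx="http://www.someurl.org/zx#">
  <rdf:Description rdf:about="http://www.awebsite.org/index.html">
    <zx:topic>funstuff</zx:topic>
  </rdf:Description>
</rdf:RDF>"""


def extract_triples(xml_text):
    """Read simple Description elements and return (subject, predicate, value)."""
    root = ET.fromstring(xml_text)
    found = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree stores a qualified tag as "{namespace}localname";
            # dropping the braces yields the full predicate URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            found.append((subject, predicate, (prop.text or "").strip()))
    return found


parsed = extract_triples(rdf_xml)
for triple in parsed:
    print(triple)
```

This handles only the simplest shape of RDF/XML, a Description with literal-valued child tags, which is all the example in this posting needs.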
More on this in the next posting…
This blog is dedicated to advanced Web technology. But today, we’ll take a little break and look at something very different. (Please look at previous blog postings for information about the Semantic Web and Web 2.0/3.0.)
Professional training in computing.
I’m back home now, but happen to have spent the last three weeks in India, visiting a large multinational corporation based in India, and with offices all over the world. It is called Infosys. The home office is in Bangalore, but I was in Mysore, which is a couple of hours away. This Infosys site is the home of their Global Education Center, where they bring thousands of young university graduates from around India to be trained in advanced computing skills. The scale and quality of what’s going on here is very impressive, enough to lend credence to the often repeated refrain that India will soon surpass the West in its development of cutting edge software technology.
Today: universities in the US.
U.S. companies are far less likely to make this sort of investment in their young people. I am a professor of computer science, and over the years, I have seen a wide gulf develop between the narrowly-scoped, somewhat formal computing courses offered by universities and the vast world of complex software components that a modern programmer must master. Very little is built from scratch today in the software world, but university students see little of that vision. They also don’t really learn that very few development efforts sit nicely inside traditional computing areas like programming languages, databases, algorithms, or distributed computing. They don’t see the interdisciplinary nature of emerging computing applications, in areas like medicine, science, and engineering.
In sum, computing students certainly need a strong, conceptual foundation so that they can develop an intuitive understanding of just how to attack computing problems, but academic computing continues to move further and further away from the real world.
Yesterday: my training.
When I was a young college graduate, long ago, I had a BA in Mathematics, with a sprinkling of computing courses, and had very little in the way of marketable skills. I was hired by EDS (Electronic Data Systems), which at the time was still owned and run by its founder, Ross Perot. I went through an intense training program, something that was admittedly too applied and lacked a sound, broad-based abstract substrate. I learned to program, not to understand the world of computing technology as a whole. I lacked the big picture. It was the opposite problem of what we see in universities today.
But that applied training at EDS was still critical in getting me started in computing, and later when I went back to grad school to finally get some formal training, my applied experience, knowledge, and intuition developed in the EDS training program served me very well as I entered the research world.
Today, U.S. companies are finding it too expensive to make substantive investments in training their young people. Graduates from undergraduate computer science programs are expected to hit the ground running – or rather, coding. Coding, coding, coding, attacking real problems and building real solutions, often in team environments and using a broad swath of existing software technology they often have never seen before. My students come back from job interviews telling me they were grilled on software technology, and often expected to sit right down and build a solution to a real problem – as part of the interview process itself.
But Infosys is doing what U.S. companies are finding it hard to do. Here’s the background. India is covered with engineering schools. Hundreds of them. Some say a thousand or more. It’s an industry there, with young engineers being pumped out by private colleges far faster than India can make use of them. These students are bright, determined, respectful, and extremely hard-working.
But no one is training Indian students to be software people, at least not in the universities. The engineering schools don’t have the faculties to teach computing.
So Infosys has decided to recruit the top graduates from the top engineering schools in India. Then Infosys sends them to Mysore, to the GEC, as it’s called, where they are given several months of nonstop, day and night training. I was there helping to train some of their instructors on the process of teaching database management and related technologies.
The GEC facility sits on a many-acre campus that is beautiful. It is densely landscaped with countless species of tropical trees and bushes. The grounds are manicured nonstop. The building I taught in is reputed to be the largest single building built in India since it got its freedom from Great Britain. The ceilings are vaulted, the floors are polished granite, there is a several stories high atrium in the center, and the outside has arches and pillars and a dome. Inside and out, it looks like an oversized palace. It is elegant, and not at all garish.
Again, Infosys is doing things that aren’t done that much anymore in the West. They have built a campus and a building that are truly works of art. I lived on campus in a beautiful room, ate a couple of hundred yards away at a fine restaurant, went to an equally close gym every morning, and rode golf carts to work. The students live and work in this same environment. Okay, they don’t get to ride golf carts, I admit it.
Visitors are blown away by what has been built there.
Infosys has tapped a large, impressive generation of young people, and as a result, India is building up a vast, powerful human technology machine. Meanwhile, in the U.S., we have much better computing programs in universities, but we are struggling to get students interested in computing and to enroll in our classes. And we neglect the hands-on side of software education. And, compared to Infosys, U.S. companies are not investing anywhere near as heavily in grooming the next generation of computing professionals.
Imagine what will happen when Indian engineering colleges start adding true computing curriculums. Combined with Infosys’s efforts, these will be extraordinarily skilled young people. And there are thousands and thousands of them.
India will do big, big things.
This blog concerns advanced Web technologies. Each posting should be readable on its own, but the series of blogs as a whole tells a continuous story.
In this posting, we look at the Duct Tape Phenomenon.
As a researcher, I have worked with biologists in the past. Big biologists, not microbiologists, the folks who tinker with DNA. The folks I worked with study macroscopic things mostly, species, in particular. They search for as-yet undocumented species. They tend to have appointments at major universities around the world, and then take extended field trips to study life. Most of them go to rain forests because that’s where biodiversity is at its greatest.
Each scientist has a chunk of the world and a kind of animal they specialize in. I know the butterfly man of Costa Rica, a fellow who has documented several thousand varieties of butterflies, some of which have wing spans of several inches. I know the bug man of the Amazon, who builds long tunnel-like things from the floor of the forest up to the canopy, fills the tunnels with bug killer, and then looks among the dead for bugs that are yet unheard-of.
Here’s the interesting part, at least from a computing perspective: a lot of the scientists I came into contact with store their data in Excel. This is a phenomenon that crosscuts the entire spectrum of computer users. They had to learn Excel at some point, maybe in school or at some workplace, and the next time they needed an application to do something, they found a way to make Excel do the job. For most people, learning the “right” application to use is far too much work, even if it’s hard to query Excel the way we would a database, even if Excel spreadsheets get way out of control size-wise, given the large amount of data many of us collect.
Excel, in many ways, is the duct tape of desktop and notebook computing.
Firefox (or your favorite browser).
But what about developers of desktop apps? What do they use as a design paradigm when building the interface to an app, even if it’s not meant for the Web?
Indeed, there is a merging of desktop GUI and web app interface technologies, and now you could sit down in front of a running app and not be sure which of the two you are seeing. In fact, the design impact is not the end of it. We actually use browsers now to interface with some desktop apps, but not often, not yet. However, at least as a user interface paradigm, the browser is becoming the duct tape of GUI design.
For developers of interfaces, Firefox has become a sort of duct tape.
The new Web.
These are the two things that underlie much of computing: the need to store and compute (as with Excel) and the need to interface (as with Firefox). But when the new Web (in the form of the Semantic Web and truly advanced Web 3.0 apps) begins to arrive, will a new paradigm emerge?
Perhaps they will be extra smart browsers that can process code written with xml and namespace and other semantic technology, so they can do more than just look for pages according to the English keywords on them.
In other words, we could imagine them as extensions of what our browsers do for us now. They’re very stupid now, really. They’re not at all smart like Excel.
How does it work now? Crawlers commissioned by search engines like Google constantly search the Web and “invert” every static page they find by building an index on every word in them. And then later, we can search this gigantic index store according to the words that appear on the pages that the crawler has found. Once we find URLs of interest, we click on them and go visit the actual pages. These searchers are far, far less than “semantic” in nature.
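The “inverting” step can be sketched in Python as follows. This is a toy version under obvious assumptions: the two pages and their text are made up for the illustration, the function names build_index and search are hypothetical, and a real search engine does far more (ranking, stemming, and so on).

```python
from collections import defaultdict

# Two made-up pages standing in for what a crawler might have fetched.
pages = {
    "http://www.awebsite.org/index.html": "fun stuff about butterflies",
    "http://www.anotherurl.org/buzz": "Buzz teaches databases and fun puzzles",
}


def build_index(pages):
    """Invert the pages: map each word to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs of pages that contain every keyword in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(pages)
hits = search(index, "fun puzzles")
print(hits)
```

Notice that the search only checks whether the words appear on a page. Nothing in the index records what a page means, which is exactly the limitation the text is pointing at.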
Our smart browsers will also have to let us build up organized libraries of specialized web content we have found, including documents, images, video, sound, animation, and such specialized data as medical treatment advice. We might maintain these in virtual space, or we might download frozen copies of pages to store on our machines. Our smart browsers could constantly look for updated versions of pages we have copied and downloaded.
These smart browsers will also have to interrelate data of a wide variety of sorts, so that a description of certain symptoms can be accurately hooked up with the specifics of a diagnosis and a medical treatment plan. Our browsers will have to isolate conflicting information, as well.
So, in the future, we’ll need browsers with smarts. We’ll look at this much more carefully in a future posting of this blog, but for now, here’s the lesson: there are two things that applications do for us. They let us store and search things, and they let us compute things.
And what about viewing all this information? How will so much complex, multimedia information be presented? Not as simple webpages with images, text, and things you can click on. Perhaps the new browsers will lay out multimedia presentations of complex, integrated information that has been synthesized from many, many different sources.
So, what does this imply? That these two things underlie computing apps of almost all sorts: 1, storing and searching, and 2, viewing and manipulating.
And they will underlie the most complex and sophisticated end-user applications of the future.
In a vague, somewhat analogous fashion, most apps are a blend of Excel and Firefox.
Things change radically over time. And things never really change at all.
This blog concerns advanced Web technologies that can be roughly described as being part of the Web 2.0 and the Semantic Web efforts. Most recently, we’ve looked at technology that will either buttress new Web development technology or take advantage of it. In particular, in the last posting of this blog, we looked at the Internet of Things and ubiquitous computing, and how they might interface with advanced Web applications to produce a combined, more powerful computing environment. We’ve also looked at New Songdo City – the u-city – and how it will at least indirectly serve as a testing ground for new Web technology.
Ambient Intelligence: A Powerful Enhancer of Advanced Web Technology.
In this blog entry, we’ll look at another new technology and how it might dovetail with the new Web. It’s called “ambient intelligence”. Like other software advances, although it is not directly related to the Web, it will dovetail beautifully with new Web technology.
We consider how ambient intelligence will make the Web radically better at serving individuals.
Ambient Intelligence: Just What Is It?
The term refers to computerized devices that tailor their behavior according to the nature of each user. First of all, though, we should make it clear that this is not a particularly new term, that it does not have a highly specific definition, and there are lots of other terms that have been used to describe similar concepts. But there is something focused that is emerging under the banner of this name.
Ambient intelligence is commonly discussed in the context of embedded devices, machines that have processors in them and that perform specific information-based tasks, as opposed to being general purpose programmable computers. Embedded computers are in cell phones, our automobiles, and “smart cards”. Sometimes, they can indeed be programmed to do almost anything, like the ones inside cell phones. But even then, it’s assumed that very few people will do so. The point is that they generally do not have displays, keyboards, or mice dedicated to their use. They are found inside small and large devices, as well as in the smarts of complex systems, like assembly lines. Mass produced, but sophisticated items like insulin meters have computers in them.
As an example, you could imagine that the vending machine you put money into tomorrow might already know that you drink nothing but 20 ounce Pepsis. Maybe every vending machine in your complex at work knows your habits. Maybe if you switch to Sprite on one machine, it will tell the rest. Maybe the machines will offer you one or the other until a new pattern seems to emerge and it appears that you will never again drink Pepsi. Or you might be able to enter your “favorites” on the corporate website, and declare what you prefer to drink. The machines will know – and so will the company that services those machines. All of this could happen without human intervention.
Ambient devices don’t have to specifically target individuals. You could imagine a computing system in an airport that can smoothly transition between human languages, customs, and regulations, to better serve a global audience. We’re very close to this sort of thing right now, actually.
Ambient Intelligence at the Fingertips of the Web.
But wait. Let’s get back to that vending machine. How do they communicate with each other to pass on the critical news that you’re a Sprite person now? How do you enter your favorites? How does the vending machine company get the news so they know what to order?
The Web. Those ambient vending machines use the Web.
On the Web, embedded devices can be engaged by web applications and web services. (Remember that web services are programmatic interfaces to services; i.e., they don’t have to be activated by a human using a browser.) Embedded machines can also initiate web services, as well as trigger “push” tasks, whereby a user on a client machine somewhere is told that something is happening and it’s time to get to work. The embedded device and the user could be on opposite sides of the world, thanks to the Web.
RFID Technology: Tracking Things.
We’ve already looked at RFID technology.
As a reminder, the goal of RFID-based systems is to help us coordinate and carefully control the use of various objects. Of particular interest are mobile objects. One of the key components behind this idea is the RFID tag. RFID stands for “radio frequency identification”. A tag can be attached to almost anything. After tags are deployed, an RFID reader can send out a signal, which is picked up by the RFID tags, and they then respond. As things move around, as things are used in concert to perform tasks, they can be carefully tracked and managed.
There’s another aspect of ambient intelligence. When people talk about a device that has ambient intelligence, often they are referring to a dedicated device with a simple display, not a general purpose computer. By this quality, the soda machine example is a bit rudimentary, in that it probably doesn’t have any true native display at all, and the indirect way of accessing it, at least according to our example, is too general purpose – a website that is accessed with a full blown computer.
Consider something that is a major topic of discussion now, and a subject we will return to in this blog in the near future: electronic health records. The idea is that we would have life-long electronic medical information bases that would be accessible to medical providers (with our approval). This way, the fact that I had some disease as a child that makes me vulnerable to some other disease later in life would become apparent to my family doctor, and the necessary screening exam would be scheduled periodically. Otherwise, how am I supposed to know about the consequences of something that happened when I was a toddler? My “EHR” would also hold prescription records, imaging data, and anything else related to my health. It would, of course, be a web-based app.
But various sorts of doctors – not to mention non-medical types like me – need information displayed and abstracted in special ways. My family doc might want to see everything in its raw form, if for no other reason than my doctor would be expected to know my medical history, if it were readily available. (And yes, if I had a chronic disease or were the caregiver for someone with a chronic disease, the immense size of the EHR would be truly overwhelming. I imagine that doctors might be afraid of being expected to process huge EHRs belonging to new patients.)
Now, consider an emergency room doctor. If I were lying on a bed in an emergency room, having just collapsed after complaining of terrible pain from a horrendous headache and of nausea, and now unconscious and unable to answer questions, the doctor would need data fast. The display that the ER doc uses would not be on a general purpose desktop computer, would not provide that massive raw data view, and would present information in a highly readable form.
Most importantly, that computer would have to be instantly adaptable to suit the needs of an emergency, and then later, go back to a non-emergency mode, to be of help in further treatment.
Or, it might be that the web server and not the machine in the ER, contains the ambient software. The machine in the ER might be a very simple client. But either way, the combined web application and local client would have to be capable of searching my online EHR, to look for possible problems, and to display them. It might deliver up the fact that just this morning, I had minor surgery on the baby finger on my left hand – and since I was so squeamish, I was given general anesthesia.
Boom. The ER doc figures out that my headache is from high blood pressure, which, along with nausea, is a common side effect of anesthesia, and it can hit hours later. The doc now knows that if I’m given a blood pressure reducing drug, I’ll be fine. But I might have to first be given an anti-nausea drug, and obviously, I wouldn’t be able to swallow that and keep it down, and so it would be administered at the other end of my food processing subsystem.
Wait, one more thing. What about RFID tags? Maybe I have one around my neck, and that’s how the doc figured out who I was in the first place, since I was stumbling around with no driver’s license. The machine in the ER scanned the tag – and voila.
The Reach of the Web.
If you think about it, by leveraging the Web, ambient devices can be empowered in incredible ways – and in the years to come, we’ll see a new generation of such web applications emerge.
(Finally, if my medical scenario is ridiculous, and you are a medical professional, then I’m sorry.)
The Internet of Things, ubiquitous computing, and something amazing.
The goal is for every posting of this blog to be understandable as a standalone posting, and so let’s review a few things quickly.
In the last posting of this blog, we looked at the Internet of Things, and how it might interface with advanced Web applications.
As a reminder, the Internet of Things refers to the use of tracking technology such as RFID tags and other wave-based devices. Computers can easily be programmed to coordinate the use of everyday objects and track their movements. RFID tags, which are cheap and can be mass produced, have been used all over the world to track products and components in factories and warehouses. They are used to catch shoplifters, as well.
In the previous blog posting, we also took a quick look at “ubiquitous computing”, which refers to the spread of computing technology into every aspect of our lives.
Multilevel integration of computerized tasks.
What’s the lesson?
The world of advanced web apps is merging with the worlds of ubiquitous computing and the Internet of Things. We see a future emerging where computing is deeply integrated into every facet of life. This includes such things as law enforcement, supply chains, manufacturing processes, retail shopping, and education.
Importantly, with multiple levels of computing working in unison, complex tasks will be performed online, and humans will not have to intercede to keep things going. As an example, from the initiation of a shopping session by a web user, to management of the tiny parts that make up a complex product that has just been ordered, the process of shopping will be automated.
All of this will deliver great power to the lone web user.
But is this future a faraway dream, one that depends on technology that has not yet been developed, or is it being built right now?
New Songdo City: the U-City.
The answer is that it’s being built right now.
There’s a new, model city being built in South Korea. Much of it is already in existence. It is called New Songdo City, and it is touted as the “u-city” of the future. The u stands for ubiquitous. There are other ongoing developments and many other planned developments throughout the Middle East and Asia.
The concept of a u-city is disturbing to some people in the west, because of privacy concerns, but it’s probably unavoidable.
Importantly, New Songdo City isn’t an existing, older urban area that is being “computerized”. Lots of existing cities around the world (including the United States) are introducing city-wide wifi, adding dynamically-changeable subway and bus routing, and providing information kiosks for visitors and businesspeople.
But New Songdo City will be a mid-sized city, built from the ground up according to a design paradigm that has as one of its primary goals the smooth introduction of computing wherever it seems useful. This city will be the home of several tens of thousands of people, will employ a few hundred thousand, will have a high tech centralized business district, and will have several cultural facilities. There will be a high tech hospital and a golf course.
And computing technology will be everywhere, visibly and invisibly.
For example, a single, integrated smart card, armed with an embedded microprocessor, will get residents, visitors, and workers rides on the subway, time on parking meters, and access to movie theatres. A smart card will even get the free loan of a city bicycle.
In New Songdo City, there will be an Internet of Things that serves every citizen. It’s not clear exactly what will exist when the city is complete (perhaps in a few years), but one often-repeated promise from the New Songdo developers is the use of RFID technology that will credit people every time they toss a bottle into a public recycling bin.
The U-City as a laboratory.
The potential uses of RFID technology are almost endless, and in fact, New Songdo, with its ubiquitous computing infrastructure, will actually serve as a giant Internet of Things experimental platform. People, city services, and countless RFID tagged objects will be part of the real world laboratory.
Injecting Smarts into the Semantic Web and Web 2.0/3.0.
In our continuing series on advanced web technology, we’ve looked at the difference between the Semantic Web and Web 2.0/3.0. We’ve also looked closely at the Semantic Web, and in particular, we’ve discussed what we mean by that word “semantic”. And with respect to Web 2.0/3.0, we’ve considered just what constitutes an advanced web app. And we’ve looked at some specific advanced apps.
But one thing has stood out above all else: the new world of web applications depends on our ability to make web apps smarter. At the core of this are a handful of key technological advances: namespaces, XML languages, full text searching, and web services. Still, as we have seen, we can only crudely mimic intelligence, which we do largely by using a complex mixture of standards, heuristics, and pre-made components.
Importantly, this issue of being smart is very old, and has been a far off goal of the folks who build software development tools since the very early days of computing. In truth, some of the things that seem new and exciting to us have actually been around for a long time, and have existed under multiple names.
But this base of intelligence-injecting technology, could it be used to give the Semantic Web and Web 2.0/3.0 a shot in the arm? Can we leverage the greater world of smart technology to make the new web even more powerful?
Let’s focus on just one technology that has been around a while, but is still vibrant and rapidly growing.
The Internet of Things.
This concept is centered around the idea that the objects in our world would serve us a lot better if computers could coordinate their use. Of particular interest are mobile objects. One of the key components behind this idea is the RFID tag. RFID stands for “radio frequency identification”. A tag can be attached to almost anything. After the tags are deployed, an RFID reader can send out a signal, which is picked up by the RFID tags, which then respond. As things move around, as things are used in concert to perform tasks, they can be carefully tracked and managed.
Other technologies for tracking objects can be employed, too, and RFID is just one example of something that is fairly cheap and very dependable.
It’s also true that objects can respond with more than a “Yo, I’m here.” In particular, they are likely to tell us exactly where they are, and whether they are in use. But for the most part, these things tend to be fairly inert when it comes to intelligence. They might be warehouse items or objects in retail stores. Volume is a key factor. RFID tags are cheap enough that an organization can tag tens of thousands or hundreds of thousands of items.
Immobile Things, but Mobile Users.
We can use the Internet of Things concept in another mode. The objects might be immobile, but the users might be highly mobile, and they might be carrying the tags. The objects might have computing capabilities in them, as well. If I work in a secure facility, and if I use a variety of computing devices in the course of the workday, I can be carefully tracked. And every machine could be engineered to allow me to perform only those functions for which I have been authorized. The computers could also track suspicious trends that involve multiple machines and multiple users over a period of time.
The Internet of Things and the Internet of Web Apps.
What does this all have to do with the Internet we are concerned with in this blog, the one that hosts next generation web apps? The two worlds could be blended together.
Consider this. When we buy things on the web, we normally use one of two retail models. If the object is software or data or in any downloadable electronic form, the website can ensure that by the end of the shopping session, our credit card has been charged and we have received the goods. This makes both the seller and the user happy.
Or, if the object is physical, like a printed book, the website will ensure that by the end of the session, our credit card has been charged, and we have been given a shipping number, a shipping date, or some other piece of information that gives us some assurance that we will get what we paid for. In this mode, the seller is likely to be quite happy, and the buyer might not be quite so happy.
But there’s another way. At the end of a retail session, the buyer of a physical product could be given the ID of the particular object being purchased, and then, via the retail website, track that object nonstop from the moment the session ends until the moment it arrives. The buyer could even track the construction of a purchased object out of many subcomponents.
The Bigger Picture.
Here’s something to think about, something else that can be used in concert with advanced web technology and the Internet of Things concept. It’s called “ubiquitous computing”, and it is a concept that has been around for many years. It refers to the expansion of computing technology into every aspect of our lives.
Putting all of this technology together means that the new web is working its way into law enforcement, supply chains, manufacturing processes, retail shopping, education, etc., etc., etc.
This will have a huge impact over the next decade.
The Hidden Web.
The Semantic Web – a primary topic of this continuing blog series – will help us search the web with greater ease. One of the things it will (hopefully) do is expose a vast sea of information that is currently invisible to our web browsers. In fact, some say that right now, we can see less than 1% of what’s out there. I cannot vouch for this number, but I can say that what we cannot see right now includes large volumes of extremely valuable data.
Perhaps you have heard of the mysterious “Hidden Web”? So, what is this stuff and where is it?
Forms, Databases, and Interactive Interfaces.
The Hidden Web refers to data that is out there on the web, publicly accessible – but only via webpage interfaces that are opaque to the indexing software of search engines like Google.
Let’s step back for a moment.
The way search engines work, in case you don’t know, is by constantly searching the web, looking for new webpages. When a new page is found, it is added to the search engine’s index, meaning that now, when people search the web with Google, they might get the URL for that page in their search results.
The important thing to note is that the primary source of information that Google uses when it indexes a page is the page itself. What words are on it?
This sounds great for static webpages that are stored as-is on websites and delivered as-is to the Google user.
But suppose we want Google to find dynamic pages? A typical dynamic page has content that isn’t known until an interactive user types some words into a web “form”. A web form is a page where the browser user fills in blanks and then lets the browser send the completed page back to the server. There, the information in the form is used to select other information, which is plugged into a “dynamically” created page that is sent to the client machine and viewed by the browser user.
So, I might visit Amazon. I navigate to their search page, which is a form, and I type in the title of the book I want. That information goes back to the server. A description of this book, including its cost, is plugged into a dynamically created page, which is then downloaded to my machine so that I can read the material with my browser.
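To make the difference between static and dynamic pages concrete, here is a minimal sketch in Python. It is a toy model, not how Google or Amazon actually work: the page names, the book data, and the functions `build_index` and `render_search_result` are all made up for illustration. The point is simply that an index built from static page text never sees what sits behind the form.

```python
# Toy model of a site with static pages and one form-driven dynamic page.
# Every page name, book record, and function name here is hypothetical.

STATIC_PAGES = {
    "/index.html": "welcome to our bookstore search for books here",
    "/about.html": "we sell new and used books",
}

# This data lives only in the server-side database.
BOOK_DATABASE = {
    "moby dick": {"author": "Herman Melville", "price": "9.99"},
}


def render_search_result(title):
    """Build a page on the fly from what the user typed into the form."""
    book = BOOK_DATABASE.get(title.lower())
    if book is None:
        return "no book found"
    return f"{title.lower()} by {book['author'].lower()} costs {book['price']}"


def build_index(pages):
    """Map each word to the set of static page URLs that contain it."""
    index = {}
    for url, text in pages.items():
        for word in text.split():
            index.setdefault(word, set()).add(url)
    return index


# The indexer only ever sees the static pages.
index = build_index(STATIC_PAGES)
```

In this sketch, looking up "books" in `index` finds both static pages, but "melville" is nowhere in the index, even though `render_search_result("Moby Dick")` would happily produce a page about the book for a human who fills in the form.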
Indexing Dynamic Pages.
So, if I have information that is not sitting in static pages, how can I get Google to index this information? There are multiple ways. For example, if the primary job of your website is to create large volumes of dynamically created pages, you might want to create a special directory page for your site – a static page – loaded with all the right words, and that contains links to the pages and forms you want the user to discover.
On the future Semantic Web, you might want to make sure that those magic words come at least in part from globally accessible namespaces, so that people who are using next-generation browsers, and who will be using these namespaces as a source of search keywords, will find your static page. As we have discussed, namespaces will provide us with detailed sets of terms, which will be tied to specific domains. This will make the search for static pages far more efficient than it is now.
As an example, a namespace concerning books might have words like ISBN-10 and ISBN-13. If the web designer uses these terms to describe static pages about books, and if the user of the browser can specify that they are looking for ISBN numbers, the browser will have a much more detailed idea of what is meant by those 10 and 13 digit numbers the user types in.
Here’s the critical part. Right now, Amazon lets you search by these numbers on their specialized web form page, but imagine if you could at any time tell your browser to look for ISBN numbers on whatever webpages it searches.
An example of a namespace that is used to describe documents on the web is the Dublin Core, by the way.
So, that’s one way to make your dynamic pages somewhat visible. Create a web page that is static and leads to the pages you want users to see, and to make it all the more powerful, use terms from a globally accepted namespace like the Dublin Core. This is something that is already partly doable. The Dublin Core, along with other namespaces, is in wide use.
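Here is a rough sketch of why namespace terms help, again in Python and again only a toy. The Dublin Core element namespace URL is real, but the book namespace, the page names, the ISBN value, and both search functions are assumptions made up for this illustration. Each static page carries metadata keyed by a full namespace term, so a search can ask for "an ISBN-13 equal to this number" rather than "this string of digits anywhere".

```python
# Metadata for static directory pages, keyed by namespace-qualified terms.
DC = "http://purl.org/dc/elements/1.1/"    # Dublin Core element namespace
BOOKS = "http://example.org/book-terms/"   # hypothetical book namespace

PAGES = {
    "/directory/melville.html": {
        DC + "title": "Moby Dick",
        DC + "creator": "Herman Melville",
        BOOKS + "isbn13": "9780000000002",   # made-up placeholder number
    },
    "/directory/phone-list.html": {
        # The same digits appear here, but not as an ISBN.
        DC + "title": "Store phone numbers 9780000000002",
    },
}


def keyword_search(pages, text):
    """Plain keyword search: match the text anywhere in any metadata value."""
    return sorted(
        url for url, meta in pages.items()
        if any(text in value for value in meta.values())
    )


def term_search(pages, term, value):
    """Namespace-aware search: match only the named term."""
    return sorted(url for url, meta in pages.items() if meta.get(term) == value)
```

With this data, `keyword_search(PAGES, "9780000000002")` returns both pages, while `term_search(PAGES, BOOKS + "isbn13", "9780000000002")` returns only the page that is actually about the book.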
Where Does that Information Come From?
Is there a better way, though? This technique will only point users to our static web directory, which will then enable interactive users to find our web forms. The users must then use our forms to get detailed data. Could the searching for dynamic pages be made more automatic?
Well, where does data in dynamic pages come from? Often from large databases built with such database management systems as Oracle, SQL Server, MySQL, PostgreSQL, and DB2. This is why some folks conjecture that the amount of information in the Hidden Web is vastly bigger than the web we see today. Databases can be BIG.
Imagine that all the information on the ancient Pharaohs, genetic diseases, investments, philosophy, and countless other topics is sitting inside databases that right now are only accessible via web forms. Right now, we Google keywords like “pharaoh” and the first things we see are static, highly condensed Wikipedia pages, and perhaps some static pages posted by museums and academics.
What Will the Semantic Web Do?
A primary challenge for the Semantic Web will be to let us ask for information, and know that the search space will include information tucked away in databases dotted all around the globe.
This is a very complex problem. Right now, we need a human sitting at the keyboard of the client machine to navigate to the correct URL and then type terms into a web form. In the future, web designers will need ways of capturing information about what is contained in databases, and to specify that information in a fashion that browsers can access. And this information will have to be very detailed, sometimes very intricate.
The browser will also have to take information specified by the user and match it up with the information that describes databases on the web. This means that we will need some automatic way to search databases without a user interactively and incrementally screening tens or hundreds or thousands of URLs. In an earlier blog posting in this series we described one possible technique called “triples” that might, combined with namespaces, provide a partial solution to this problem.
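As a very rough sketch of that matching step, suppose each database on the web were described by a handful of triples. Everything below is hypothetical: the database names, the predicates "covers-topic" and "query-url", the example.org URLs, and the two functions. Real RDF and a real query language would look different; this only illustrates the idea of matching a user's topic against descriptions of databases without a human clicking through URLs.

```python
# Each triple is (subject, predicate, object). All names are made up.
TRIPLES = [
    ("db:egypt-museum", "covers-topic", "pharaohs"),
    ("db:egypt-museum", "query-url", "http://example.org/egypt/search"),
    ("db:gene-bank", "covers-topic", "genetic diseases"),
    ("db:gene-bank", "query-url", "http://example.org/genes/search"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every part that is not None."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


def databases_for(triples, topic):
    """Find the query URLs of every database described as covering a topic."""
    urls = []
    for subject, _, _ in match(triples, predicate="covers-topic", obj=topic):
        for _, _, url in match(triples, subject=subject, predicate="query-url"):
            urls.append(url)
    return urls
```

Here `databases_for(TRIPLES, "pharaohs")` chains two triples together, first finding which database covers the topic and then finding where that database can be queried.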
We will look at this again, more closely, in a future blog posting.
The Semantic Web.
This blog concerns advanced Web technology, in particular, Web 2.0/3.0 and the Semantic Web. Each blog entry should be fully understandable on its own, but the blog as a whole tells a continuing story.
Very roughly, we’ve defined the Web 2.0/3.0 as the class of emerging web applications that are highly responsive, to the point of being competitive with desktop apps. Another characteristic is that they can manage large volumes of very complex media, like images, sound, and animation, as well as interconnected forms of media. We’ve looked at some specific advanced web applications.
Our concern here, in this blog entry, is the Semantic Web, which we have also roughly defined. The Semantic Web is something that does not yet exist, but would meet the very aggressive goal of supporting largely automatic web searches, freeing us from excruciatingly interactive, manual Google and Yahoo sessions. And we’ve seen that we would use such things as shared namespaces, intelligent full text searching, and XML-based markup languages to embed information in websites that could be used by smart browsers to perform far more accurate searches.
Web services would help a lot, too, by taking humans out of the loop when providing powerful web-based capabilities; one website can now provide a vast amount of information, for example, by silently using web services to collect information from many other web-based sources.
(By the way, we have also looked at precisely what we mean by “semantic” in the Semantic Web.)
The way we pay.
This all sounds very good. The Web would be far more useful, with automatically searchable Semantic Web-sites. But there’s a bad side to all of this, and it has to do with how we often pay for Web use.
The problem is that we often do not pay at all. At least not directly, with money. We pay by putting up with ads. Free email services, such as those hustled by Yahoo, Hotmail, AOL, and Mail.com, are generally accessed via web browsers, and we find the main pages of these email accounts stuffed with ads.
Some free email accounts even stick ads in your outgoing mail!
Often, the only way to get the ads stripped from a web mail interface is to pay a fee. We might also get more than just ad-free web mail pages; paying sometimes allows users to access their email with POP or IMAP protocols, via desktop clients (like Outlook and Apple Mail), thus avoiding ads in another way.
(As an aside, there are free email sites that either have no ads in them, or only very subtle ones. Try Gmail.com and Inbox.com. My favorite, with its clean interface and growing set of accompanying capabilities, is GMX.com.)
As it turns out, folks looking to buy ad space online find that they have a vast array of choices, and this drives down the cost of ad space. But these two things, an ever-growing list of free online services and cheap ad space, are related. This is because it is all too easy to build useful web applications. Like browsers, bulletin boards, calendar apps, blogging services, and stickies applications, email servers are cheap to build and maintain. Vendors can use canned, largely free software components.
And, transmission costs on the Internet are effectively free, and the bandwidth is huge. Free email accounts often offer a gigabyte or several gigabytes of storage, because disk space is dirt cheap, too.
There is a lot of rebranding going on, too, where someone seems to be offering free email (or some other service), but it is actually being provided by a large email provider.
So, the way things have shaken out, free web apps like email servers look like NASCAR racing cars, covered with colorful ads. Many of these ads consist of video, and so we have to battle distracting, flashing colors so we can focus on our mail.
The trick behind online ads.
There is something happening in the online ad world: folks who provide these free, pay-for-it-with-ads services are learning to carefully target ads. There is specialized software available for this, and by plugging in some smarts, folks can make the ads that appear on your screen far more likely to be of interest to you.
How is this done? By watching what you type into search engines, by taking advantage of personal information you supply when you sign up for free email accounts and other services, and by carefully examining the content of the messages you send and receive, that’s how it’s done.
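A minimal sketch of the kind of simple heuristic involved might look like the Python below. This is not how any real ad network works; the ads, their keyword sets, and the `pick_ad` function are invented for illustration. It just counts how many of an ad's keywords show up in text the user has typed, whether that text is a search, a signup form, or a message.

```python
# Each hypothetical ad is described by a set of keywords.
ADS = {
    "hiking boots sale": {"hiking", "boots", "trail", "outdoors"},
    "discount flights": {"flights", "travel", "airline", "vacation"},
    "guitar lessons": {"guitar", "music", "lessons"},
}


def pick_ad(ads, user_text):
    """Pick the ad whose keywords overlap most with the user's words.

    Returns None when no ad shares any keyword with the text.
    """
    words = set(user_text.lower().split())
    scores = {name: len(keywords & words) for name, keywords in ads.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Given the text "Looking for a trail map and new hiking boots", this sketch picks the hiking boots ad, because three of its keywords appear in what the user wrote.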
It’s important to point out that this works. The “click through” rate on ads can be radically improved, just by using some simple heuristics in choosing your ads. Folks who pay for ads love this, and it has allowed individuals who don’t even provide free web applications to turn themselves into ad space sellers. Your blog, your specialized website, can now host ads carefully targeted toward the visitors to your blog or your website.
But just wait for the Semantic Web.
But it will really kick in when the Semantic Web is here. The same technology that would make browsers far, far smarter about finding good URLs for you will make the targeting of ads at you extremely precise.
This slowly-emerging technology is badly needed by the folks who sell ad space and by the people who buy that ad space. That’s because you and I are starting to get used to this world of NASCAR websites. We are looking through or past or around the ads. They need to be made a lot smarter, in order to get our attention back.
But by using Semantic Web technology to radically increase click-through rates, by getting us interested in ads again, impulse shopping on the Web might skyrocket. It’s very easy to go from seeing an ad for a product you have never heard of before to having bought it.
Like little kids watching commercials for sugar-heavy cereals on Saturday cartoon shows, we will be manipulated like we have never imagined before. That’s the bad side to the Semantic Web.