Sunday, May 6, 2007

Beyond the structured web

Imagine your city library without a computerized catalog and without any aisle assigned to a specific category of books: just a computer that has indexed all the text in all the books. And once you found something, you could put it back anywhere you wanted.

The web is not a whole lot different today. How do you find a fiction book by "Nora Roberts" in this library? If you were looking for information about marriage, would it pull books on "monogamy", "polygamy", and "civil union"? This is where Google's keyword search becomes ineffective (besides its other problems).

As a first step, SemWeb is an attempt to enable web applications to categorize and describe the information they hold in a parlance that computers can chew on. As a second step, it goes further, inside the information itself, something even the latest library cataloging systems could not do. For example, you could not search a library for cookbooks containing a recipe with chicken and eggplant as ingredients, could you? With the information structured on the web, that should be pretty trivial.
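To make the cookbook example concrete, here is a minimal sketch in Python of what "going inside the information" could look like. The `Recipe` and `Cookbook` classes, the `cookbooks_with_recipe` function, and the sample titles are all made up for illustration; they are not any real SemWeb vocabulary or API.

```python
from dataclasses import dataclass, field


@dataclass
class Recipe:
    name: str
    ingredients: frozenset[str]  # structured ingredient list, not free text


@dataclass
class Cookbook:
    title: str
    recipes: list[Recipe] = field(default_factory=list)


def cookbooks_with_recipe(cookbooks, required):
    """Return titles of cookbooks where a SINGLE recipe uses every required ingredient."""
    required = {item.lower() for item in required}
    matches = []
    for book in cookbooks:
        # The check is per recipe, which a full-text index of the book cannot express.
        if any(required <= {i.lower() for i in r.ingredients} for r in book.recipes):
            matches.append(book.title)
    return matches


# Hypothetical sample data.
LIBRARY = [
    Cookbook("Mediterranean Table", [
        Recipe("Chicken and eggplant bake", frozenset({"chicken", "eggplant", "tomato"})),
    ]),
    Cookbook("Weeknight Basics", [
        Recipe("Roast chicken", frozenset({"chicken", "lemon"})),
        Recipe("Grilled eggplant", frozenset({"eggplant", "olive oil"})),
    ]),
]

print(cookbooks_with_recipe(LIBRARY, {"chicken", "eggplant"}))
# prints ['Mediterranean Table']
```

A plain text index would return both books, since both mention chicken and eggplant somewhere. Only the structured version can tell that the second book never puts them in the same recipe.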

All of this is still about arranging the available information in a syntax. But how do you make sense of it? Isn't the "semantic" part of the semantic web supposed to imply that? Drawing meaning out of information? When does syntax become semantics?

Tim Berners-Lee's vision of the semantic web:

I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A ‘Semantic Web’, which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The ‘intelligent agents’ people have touted for ages will finally materialize.

Intelligence, unfortunately, has varying connotations. If you refer to my earlier post, the "intelligence" of finding an available golf course, ensuring its rating is above a certain level, and getting busy/free data from the invitees' shared calendars is just a set of operations pieced together by the human who authored such an agent. Real intelligence must possess the ability to reason deductively, the power of inference, and the ability to match such reasoning with an objective.

Meaning comes out of information through the relationship of one piece of data with another, resulting in a pattern that "intelligence" can piece together. That is something we humans are good at. Computers suck at it. Beyond its structure, which can be used to build useful applications such as the golf tee time scheduler or my personal shopping agent, this set of information still remains rather meaningless.
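As a rough illustration of relating one piece of data to another, here is a small Python sketch that stores facts as (subject, predicate, object) triples and applies a single deduction rule over them. The `kind_of` predicate, the sample facts, and the function names are assumptions made up for this example. Note that the rule itself is still authored by a human, so this is mechanical inference, not the kind of intelligence discussed above.

```python
# Facts as (subject, predicate, object) triples. All names are illustrative.
FACTS = {
    ("monogamy", "kind_of", "marriage"),
    ("polygamy", "kind_of", "marriage"),
    ("marriage", "kind_of", "social institution"),
}


def infer_transitive(facts, predicate):
    """Apply one rule until nothing new appears: (a p b) and (b p c) => (a p c)."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        # Snapshot the current links so we can add to `known` while looping.
        links = [(s, o) for (s, p, o) in known if p == predicate]
        for a, b in links:
            for b2, c in links:
                if b == b2 and (a, predicate, c) not in known:
                    known.add((a, predicate, c))
                    changed = True
    return known


def narrower_terms(facts, concept, predicate="kind_of"):
    """List every subject that is, directly or by inference, a kind of `concept`."""
    inferred = infer_transitive(facts, predicate)
    return sorted(s for (s, p, o) in inferred if p == predicate and o == concept)


print(narrower_terms(FACTS, "marriage"))
# prints ['monogamy', 'polygamy']
print(("monogamy", "kind_of", "social institution") in infer_transitive(FACTS, "kind_of"))
# prints True
```

The fact that monogamy is a kind of social institution was never stated; it falls out of the relationship between two other facts. A search for "marriage" backed by such relationships could also pull in the narrower terms from the library example.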

This is where the first word of the phrase "semantic web" gets confusing. There are two stages to SemWeb. The first is concerned with providing structure to the World Wide Web and making it appear more like a worldwide database to computers. The second is to develop technologies that comb through the rather chaotic stream of structured information and find patterns in it. The second stage overlaps with areas of artificial intelligence, and much of it is still in research labs.

Regardless of where the "semantic" in semantic web lands, one thing is very clear: we need a "structured" web to build the next generation of applications, something that is gaining ground in new startup companies. Freebase, launched by the parallel computing pioneer Danny Hillis, is attempting to collaboratively categorize and connect the existing web in a structured format. Cylive, co-founded by me, is a social publishing platform that lets an author categorize each piece of content, describe its metadata, and store it under multiple attributes. That enables structured publishing, something that can easily be chewed up by other semantic web agents.

In subsequent posts, I will share my thoughts on the challenges of SemWeb from both an implementation and a business point of view.
