Thursday, May 8, 2008

Power, gas, water, and now an internet meter

America is a global leader in so many ways, including technology. But when it comes to high-technology consumption, it stands behind the little country of Estonia, at 24th place in the world. I am talking about the internet, a technology that was born and raised here. To make matters worse, the business cartels (yeah, I am talking about the same set of guys who pretty much control oil, energy, cellular, and every other lucrative industry with a nexus to politicians) have now turned to the internet. One thing we were proud of as a true democracy is now going to be controlled by industry behemoths.

The latest is a so-called experiment by Time Warner Cable, which sells broadband through its Road Runner service. If you live in Beaumont, Texas, there is an additional meter you are going to be watching in your house: the internet meter. If you have plans to download and watch lots of movies on the internet, then apart from the bills iTunes rings you up with, be ready to pay overage charges for all those gigabytes beyond your allowed limit. Yeah, kinda like a cell phone plan.

If we look closely, Time Warner's move may not really be targeted at making money off the heaviest users of its broadband service. Time Warner, after all, is a media company and would rather people watch paid television on their set-top boxes than watch it on the internet and hand the revenue to, say, Apple iTunes. So there appears to be a dark side to this move, a move that Time Warner justifies as charging for premium usage.

You think it's happening there and not going to happen in your 'hood? Think again! Comcast is already toying with the idea. How would the overage charges look? We don't need to look further than our neighbor. Bell Canada, which meters service on some plans, charges its customers as much as $7.50 per additional gigabyte of download. At roughly four gigabytes per high-def movie, one movie past your cap will cost you something like $30.
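To put that rate in perspective, here is a back-of-the-envelope sketch in Python. The $7.50-per-gigabyte rate is Bell Canada's figure from above; the 40 GB monthly cap is a hypothetical allowance I picked for illustration.

```python
# Back-of-the-envelope overage math. The $7.50/GB rate is Bell
# Canada's; the 40 GB monthly cap is a hypothetical figure.
OVERAGE_RATE = 7.50    # dollars per GB beyond the monthly allowance
MONTHLY_CAP_GB = 40    # hypothetical plan allowance

def overage_charge(usage_gb):
    """Dollars owed for usage beyond the monthly allowance."""
    return max(0.0, usage_gb - MONTHLY_CAP_GB) * OVERAGE_RATE

# One 4 GB high-def movie past the cap: 4 * 7.50 = $30
print(overage_charge(44))  # 30.0
```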

Some time back, we were struggling with just the concept of "net neutrality". What if AT&T starts routing VoIP calls to its own CallVantage service at a higher quality of service than calls to, say, a competitor like Vonage? AT&T is already accused of slowing peer-to-peer traffic on its networks. The FCC isn't talking yet.

Just like the cell phone industry, which takes close to $2,400 from me over a contracted two-year service with nothing we can do about it, do consumers need to sit back and watch the industry take control of the internet too?

Thursday, October 18, 2007

Standards

From VHS versus Betamax, to DVD-R versus DVD+R, to Blu-ray versus HD DVD, the industry has a tendency to not agree on standards easily. It always takes the test of time and market adoption to force that agreement.

When it comes to web standards, thankfully, there is a body, the W3C, leading the effort to get the majority of the industry to agree. Large initiatives such as DBpedia (providing RDF access to Wikipedia), WordNet (Princeton's lexical resource in RDF), and Geonames (geographic information about the world) are already live, following standards such as RDF and OWL.
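To make that concrete, here is a minimal sketch of RDF and SPARQL in action using the Python rdflib library. The tiny graph below is invented for illustration; real data of this shape is what DBpedia and Geonames publish.

```python
# A toy RDF graph in Turtle, queried with SPARQL via rdflib
# (pip install rdflib). All names under example.org are made up.
from rdflib import Graph

turtle_data = """
@prefix ex:   <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ex:Tallinn rdfs:label "Tallinn" ;
           ex:isCapitalOf ex:Estonia .
ex:Estonia rdfs:label "Estonia" .
"""

g = Graph()
g.parse(data=turtle_data, format="turtle")

# SPARQL walks the relationships, not just the keywords.
results = g.query("""
    PREFIX ex:   <http://example.org/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?city ?country WHERE {
        ?c  ex:isCapitalOf ?co .
        ?c  rdfs:label     ?city .
        ?co rdfs:label     ?country .
    }
""")
for city, country in results:
    print(f"{city} is the capital of {country}")
```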

However, when you look around, you will find that APIs from major web content providers are plentiful and follow all kinds of conventions. Web sites becoming web services is good news; the only problem is that it is probably not going to help much in meeting the big visions of the SemWeb. Some examples of such API flavors are REST, JSON, extended RSS, serialized PHP arrays, etc.

Another, simpler approach to information annotation has gained some momentum lately and is loosely described as microformats. The idea is very simple: embed markup within HTML and let an application read that part in a structured fashion. For example, your name, address, and email address could be annotated on a page, and by just handing that page to a service, it could extract all the information it needed. A popular microformat is hCard, which describes contact information about people, organizations, and places.
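Here is a minimal sketch of that extraction using only Python's standard library. The HTML snippet and the contact details in it are hypothetical; a real consumer would fetch a live page and handle nested hCard properties more carefully.

```python
# Pull hCard properties out of an HTML page with the stdlib parser.
from html.parser import HTMLParser

page = """
<div class="vcard">
  <span class="fn">Nora Roberts</span>
  <span class="email">nora@example.com</span>
  <span class="locality">Beaumont</span>
</div>
"""

class HCardParser(HTMLParser):
    """Collects text inside elements whose class names an hCard property."""
    PROPS = {"fn", "email", "locality"}

    def __init__(self):
        super().__init__()
        self.current = None   # the property we are inside, if any
        self.card = {}

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "")
        hits = self.PROPS.intersection(classes.split())
        if hits:
            self.current = hits.pop()

    def handle_data(self, data):
        if self.current:
            self.card[self.current] = data.strip()
            self.current = None

parser = HCardParser()
parser.feed(page)
print(parser.card)
# {'fn': 'Nora Roberts', 'email': 'nora@example.com', 'locality': 'Beaumont'}
```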

This is simple to implement but obviously does not scale beyond small, page-level annotations.

Next step and the big picture

What I imagine is that as the business models around the structured web become clearer, we will see a convergence of APIs and formats. RSS 2.0 is extensible to describe whatever you want and is a big contender for a unified API that stays simple. RSS is well known and is possibly the most widely used XML format on the web today.
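As an illustration of that extensibility, here is a sketch of a standard RSS 2.0 item carrying extra, namespaced elements. The "recipe" namespace and its fields are invented for the example; ordinary RSS readers would simply ignore them.

```python
# Build an RSS 2.0 item extended with elements in a made-up namespace.
import xml.etree.ElementTree as ET

ET.register_namespace("recipe", "http://example.org/recipe-ns")
NS = "{http://example.org/recipe-ns}"

rss = ET.Element("rss", version="2.0")
channel = ET.SubElement(rss, "channel")
item = ET.SubElement(channel, "item")

ET.SubElement(item, "title").text = "Chicken and Eggplant Stew"
ET.SubElement(item, "link").text = "http://example.org/recipes/42"
# Extension elements live in their own namespace, so plain RSS
# readers skip them while aware agents can consume them.
ET.SubElement(item, NS + "ingredient").text = "chicken"
ET.SubElement(item, NS + "ingredient").text = "eggplant"
ET.SubElement(item, NS + "cookTimeMinutes").text = "45"

print(ET.tostring(rss, encoding="unicode"))
```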

While the business models are being worked out, the W3C initiatives on Semantic Web standards are maturing as well. RDF, SPARQL, and OWL will see wide adoption in academic and research institutions very soon.

What I definitely see is that there will always be competing standards, but that is not a problem at all. Once the data is structured, it can be transformed from one format to another on the fly.
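For instance, the hypothetical extended-RSS recipe item from the sketch above can be mapped to JSON in a few lines; the namespace and field names are the same invented ones.

```python
# Transform an extended RSS item into JSON on the fly.
import json
import xml.etree.ElementTree as ET

NS = "{http://example.org/recipe-ns}"  # same made-up namespace as above

rss_item = """
<item xmlns:recipe="http://example.org/recipe-ns">
  <title>Chicken and Eggplant Stew</title>
  <link>http://example.org/recipes/42</link>
  <recipe:ingredient>chicken</recipe:ingredient>
  <recipe:ingredient>eggplant</recipe:ingredient>
</item>
"""

item = ET.fromstring(rss_item)
print(json.dumps({
    "title": item.findtext("title"),
    "link": item.findtext("link"),
    "ingredients": [e.text for e in item.findall(NS + "ingredient")],
}))
```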

Regardless of which standards solidify in the market, a big step towards the structured web has already been taken. The next web will be a precursor to the semantic web dreams of Tim Berners-Lee. And for all of us, it will continue to bring challenges to create new products and services that make businesses run more smoothly and our lives better.

Sunday, May 13, 2007

User-powered content and SemWeb

As usually happens with emergent long-term concepts, people tend to go overboard, attempt to boil the ocean, and lose touch with reality. The semantic web, in its current direction, appears to be laden with fantastic descriptions but offers little incentive for businesses to agree on one standard.

Obviously, all new technologies take time to mature and get adopted by businesses, but every successful adoption comes with a hundred other candidates that died a natural death in the lab. Let's look at the issues that might prevent web applications from using other web applications to create a fantastic, meaningful web.

Inherent problems with user-powered content

How user-contributed content is described in terms of its metadata plays a very significant role in the SemWeb. Metadata is what structured search engines would use to locate relevant information. Unfortunately, there are inherent problems [Metacrap].

People are lazy
Let's face it, most of us are lazy. An effective SemWeb is only possible if a complicated set of metadata, in terms of taxonomies and conceptual relationships, is described correctly. Creating and maintaining these schemas, or adapting a pre-existing one, is no trivial task. The solution is not to hand this job to users, because a user has no incentive to take the pain of going beyond a couple of clicks.

People lie
It's no wonder that your inbox is inundated with spam whose subject line reads "Here is the information you requested". It is equally no wonder that a web search for "enterprise application integration" can return hundreds of press releases from companies that have nothing to do with EAI.

Lying to compete for business exists in the real world as much as in the online world. How do we ensure that metadata is reliable? This is one place where democracy does not really work.

People don't agree on terminology
You can't force people to describe something your way. I might like to call porn an erotic thriller, and I can't stop you from calling a nude beach a naturist shorefront. Your wish! However, this creates a big problem for the SemWeb, because computers cannot reconcile such vocabularies reliably.


Sunday, May 6, 2007

Beyond the structured web

Imagine your city library without computerized catalogs, without any specific aisle assigned to a specific category of books, with just a computer that has the full text of every book indexed. And assuming you found something, you could put it back anywhere you wanted.

The web is not a whole lot different today. How do you find a fiction book by Nora Roberts in this library? If you were looking for information about marriage, would a search pull books on "monogamy", "polygamy", and "civil union"? This is where Google's keyword search becomes totally ineffective (besides its other problems).

As a first step, the SemWeb is an attempt to enable web applications to categorize and describe the information they hold in a parlance that computers can chew on. As a second step, it goes further, inside the information itself, something even the latest library cataloging systems cannot do. For example, you could not ask a library for cookbooks containing a recipe with chicken and eggplant as ingredients, could you? With information structured on the web, that should be pretty trivial.
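Here is a minimal sketch of that query over structured data. The records are made up and live in one list; on the real SemWeb they would be published by many sites in a shared format, but the query logic would be just as simple.

```python
# Once recipes are structured records, ingredient search is trivial.
recipes = [
    {"book": "Weeknight Suppers", "title": "Chicken and Eggplant Stew",
     "ingredients": {"chicken", "eggplant", "tomato"}},
    {"book": "Grill Basics", "title": "Lemon Chicken",
     "ingredients": {"chicken", "lemon"}},
]

def books_with(*wanted):
    """Cookbooks containing a recipe that uses all the given ingredients."""
    return {r["book"] for r in recipes if set(wanted) <= r["ingredients"]}

print(books_with("chicken", "eggplant"))  # {'Weeknight Suppers'}
```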

All of this is still about arranging the available information in a syntax, but how do you make sense of it? Isn't the "semantic" part of the semantic web supposed to imply that, drawing meaning out of information? When does syntax become semantics?

Tim Berners-Lee's vision of the semantic web:

I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A ‘Semantic Web’, which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The ‘intelligent agents’ people have touted for ages will finally materialize.

Intelligence, unfortunately, has varying connotations. If you refer to my earlier post, the "intelligence" of finding an available golf course, ensuring its rating is above a specific level, and getting free/busy data from invitees' shared calendars is just a set of operations pieced together by the human who authored the agent. Real intelligence must possess the ability to reason deductively, the power of inference, and the capacity to match such reasoning against an objective.

Meaning comes from the relationship of one piece of data with another, forming a pattern that "intelligence" can piece together. That is something we humans are good at. Computers suck at it. Beyond the structure of the information, which can be used to build useful applications such as the golf tee-time scheduler or my personal shopping agent, this information still remains rather meaningless.

This is where the first word of the phrase "semantic web" gets confusing. There are two stages to the SemWeb: the first is concerned with giving structure to the world wide web, making it appear to computers more like a worldwide database. The second is to develop technologies that comb the rather chaotic stream of structured information and find patterns in it. This second stage overlaps with artificial intelligence and remains largely in research labs.

Regardless of where the "semantic" in semantic web lands, one thing is very clear: we need a "structured" web to build the next generation of applications, and that is taking ground in new startup companies. Freebase, launched by the parallel computing pioneer Danny Hillis, is attempting to collaboratively categorize and connect the existing web in a structured format. Cylive, co-founded by me, is a social publishing platform that lets an author categorize, describe metadata for, and store every piece of content with multiple attributes, enabling structured publishing that can be easily chewed up by other semantic web agents.

In subsequent posts, I will share my thoughts on the challenges of the SemWeb from both an implementation and a business point of view.

Tuesday, May 1, 2007

Data interchange

Since time immemorial, the computer representation of data for information interchange has gone through millions of versions of formats and standards. Almost every company, group, and consortium came up with its own data interchange format, touting it, obviously, as the best of the lot. Now that the semantic web is the talk of the town (at least in the web 2.0 town), we are thankfully well positioned with ubiquitous XML, resurrected RDF, and esoteric OWL.

A gentle primer on the semantic web

Much of today's web evolved around the Mosaic and Netscape browsers. It was an obvious choice for web-based information providers to build everything to cater to these browsers. The result was astounding; the explosive growth is all in front of us.

Yet there was one problem. Computers in this ocean screamed "water, water everywhere, not a drop to drink". The web got littered with information everywhere. Computers became faster, bandwidth grew exponentially, storage became cheaper. Yet all the tall promises of the early web, such as the refrigerator that would order groceries when the bread ran out, remained promises.

Sooner or later, the web had to become structured in a way that lets one computer discover information, explore capabilities, and create relationships among all the possible information on the web. The semantic web is the direction the web has to evolve in.

How does this direction affect the applications of tomorrow? Let's take an example. Suppose I wanted to create a web application that schedules a tee time with my friends at a top-rated golf course within drivable distance. The application would have to check free/busy data from all the different calendaring applications my friends use, query a sports directory for golf courses within 50 miles of my home, and consult multiple sites to filter that list down to courses with an overall user rating of 4 or more. Finally, it would have to check each golf course's site for availability over the next 7 days.
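Here is a sketch of that agent, assuming every service involved exposed structured, machine-readable data. All the functions and the data they return are hypothetical stand-ins; no such common interfaces exist today, which is exactly the point.

```python
# A tee-time agent over imagined structured services.
from datetime import date, timedelta

def friends_free(friends, day):
    """Stub for querying each friend's calendar free/busy feed."""
    return day.weekday() >= 5          # pretend weekends are free

def courses_near(zipcode, miles):
    """Stub for a structured sports-directory query."""
    return [{"name": "Pine Ridge", "rating": 4.3},
            {"name": "City Links", "rating": 3.1}]

def open_tee_times(course, day):
    """Stub for the course's own structured availability feed."""
    return ["8:00", "10:30"] if course["rating"] >= 4.0 else []

def schedule(friends, zipcode="77701"):
    # Keep only top-rated courses from the directory results.
    candidates = [c for c in courses_near(zipcode, 50) if c["rating"] >= 4.0]
    # Walk the next 7 days looking for a slot everyone can make.
    for offset in range(7):
        day = date.today() + timedelta(days=offset)
        if not friends_free(friends, day):
            continue
        for course in candidates:
            slots = open_tee_times(course, day)
            if slots:
                return course["name"], day.isoformat(), slots[0]
    return None

print(schedule(["Al", "Bo"]))
```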

It is not only a nightmare to build an application like this today; it is practically impossible. Why? Because the information it assumes is out there, calendars, user ratings, and so on, is not published in a way my application could interrogate. To accomplish such a task today, one of the friends has to spend hours (if not days) arranging it from information scattered across the web, or start a flurry of emails trying to gather it.

Or suppose you wanted to buy a used camera today. You would spend countless hours surfing, trying to find a deal. What if you had a software agent instead? You could just give it the make and model of the camera you were interested in, and it could find a couple on eBay, one on craigslist, and three different e-commerce dealers ready to ship it to you at the lowest price available anywhere. The agent would know your zip code, estimate shipping and tax, and perform the comparison all by itself. Is there an incentive for eBay to publish its information in a structured fashion? Absolutely. Everybody wants a bigger audience.
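The comparison step of such an agent is the easy part once listings arrive as structured records. The listings below and the flat tax rate are made up for illustration.

```python
# Compare total landed cost across structured listings.
TAX_RATE = 0.0825  # hypothetical sales-tax rate for the buyer's zip code

listings = [
    {"source": "eBay",       "price": 310.00, "shipping": 12.00},
    {"source": "craigslist", "price": 295.00, "shipping": 0.00},
    {"source": "dealer",     "price": 289.00, "shipping": 15.00},
]

def landed_cost(listing):
    """Out-the-door price: item + estimated tax + shipping."""
    return listing["price"] * (1 + TAX_RATE) + listing["shipping"]

best = min(listings, key=landed_cost)
print(best["source"], round(landed_cost(best), 2))
```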

The semantic web promises to make building such applications much easier, perhaps even letting users design them themselves.