Thursday, August 12, 2010

Google: Undisputed King of Disruptive Technology

Recently I came across some really funny lines about Google.

One of them goes like this - “I’ll hit you so hard that you’ll fall in a place where even Google will not be able to find you”

A board outside a church says, "Come in, Google doesn't have all the answers!"

All this really pushed me to go back & do a quick Google on “Google”.

Google can undisputedly be considered one of the most disruptive innovations of the 21st century. Google was founded in 1998 with initial funding of just $100,000, and by 2004 it was valued at $23 billion.

The Google web search engine is the company's most popular service and has done wonders for it.

Very few people know that early in 1999, while still graduate students, Brin and Page (Google's founders) decided that the search engine they had developed was taking up too much of their time and keeping them from their academic pursuits. They went to Excite CEO George Bell and offered to sell it to him for $1 million. He rejected the offer, and later threw Vinod Khosla, one of Excite's venture capitalists, out of his office after Khosla had negotiated Brin and Page down to $750,000. Today that former Excite CEO must be hitting his head really hard against the wall for letting go of the biggest opportunity of his life.

So how does Google work? You can find millions of web results on Google on this very topic; I have tried to put the information together in a simple way. Of course, the Google search engine is a lot more complex than this, but the post below should give you some idea. Just how complex? Google's patented PageRank algorithm ranks web pages by considering more than 500 million variables and 2 billion terms.

Google search can be divided into three parts (a toy sketch of all three follows the list):

1. Googlebot (or spiders) – a web crawler (a software program) that finds and retrieves pages on the web and hands them off to the Google indexer. Googlebot runs on a distributed network of thousands of low-cost computers and can therefore carry out fast parallel processing. When Googlebot fetches a page, it culls all the links appearing on the page and adds them to a queue for subsequent crawling. By harvesting links from every page it encounters, Googlebot can quickly build a list of links covering broad reaches of the web. This technique, known as deep crawling, also allows Googlebot to probe deep within individual sites. Because of their massive scale, deep crawls can reach almost every page on the web. Because the web is vast, this takes time, so some pages may be crawled only once a month.

2. The indexer - sorts every word on every page and stores the resulting index of words in a huge database. Googlebot gives the indexer the full text of the pages it finds, and these pages are stored in Google's index database. Google uses a complex data structure to store all this information, which allows rapid access to documents that contain user query terms. To improve search performance, Google ignores (doesn't index) common words called stop words (such as the, is, on, or, of, how, why, as well as certain single digits and single letters). Stop words are so common that they do little to narrow a search, and therefore they can safely be discarded.

3. The query processor - compares your search query to the index and returns the documents that it considers most relevant. The query processor has several parts, including the user interface (search box), the "engine" that evaluates queries and matches them to relevant documents, and the results formatter.
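
To make the three parts above concrete, here is a toy, self-contained Python sketch. The "web" here is just an in-memory dictionary of made-up pages (page_a, page_b, page_c are hypothetical), and the crawler, indexer, and query processor are reduced to a few lines each; this is only an illustration of the idea, not how Google actually implements any of these components.

```python
# Toy sketch of the three parts: a crawler, an indexer, and a query processor.
# The "web" is an in-memory dict of hypothetical pages -- an illustration only.
import re
from collections import defaultdict

# Hypothetical pages: URL -> (page text, outgoing links)
WEB = {
    "page_a": ("google is a search engine for the web", ["page_b", "page_c"]),
    "page_b": ("the web is vast and full of pages", ["page_c"]),
    "page_c": ("a search engine indexes pages on the web", ["page_a"]),
}

STOP_WORDS = {"the", "is", "on", "or", "of", "how", "why", "a", "and", "for"}

def crawl(seed):
    """Part 1: the crawler. Follow links breadth-first, collecting pages."""
    queue, seen, fetched = [seed], set(), {}
    while queue:
        url = queue.pop(0)
        if url in seen or url not in WEB:
            continue
        seen.add(url)
        text, links = WEB[url]
        fetched[url] = text
        queue.extend(links)          # harvest links for subsequent crawling
    return fetched

def build_index(pages):
    """Part 2: the indexer. Map each non-stop word to the pages containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            if word not in STOP_WORDS:   # stop words are discarded
                index[word].add(url)
    return index

def query(index, terms):
    """Part 3: the query processor. Return pages containing all query terms."""
    terms = [t for t in terms.lower().split() if t not in STOP_WORDS]
    if not terms:
        return set()
    results = index[terms[0]].copy()
    for t in terms[1:]:
        results &= index[t]
    return results

pages = crawl("page_a")
index = build_index(pages)
print(query(index, "search engine"))   # -> {'page_a', 'page_c'} (order may vary)
```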

PageRank is Google’s system for ranking web pages. A page with a higher PageRank is deemed more important and is more likely to be listed above a page with a lower PageRank. Google considers over a hundred factors in computing a PageRank and determining which documents are most relevant to a query, including the popularity of the page, the position and size of the search terms within the page, and the proximity of the search terms to one another on the page.
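
To make the idea of link-based ranking concrete, here is a minimal PageRank sketch in Python. It uses power iteration on the same kind of tiny, made-up link graph as the sketch above, with the damping factor of 0.85 from the published PageRank formulation; it is only an illustration of the core idea, not Google's production ranking, which weighs the many additional signals mentioned above.

```python
# Minimal PageRank sketch using power iteration on a toy link graph.
# Follows the simplified published formulation (damping factor d = 0.85);
# Google's real ranking also mixes in hundreds of other signals.
LINKS = {
    "page_a": ["page_b", "page_c"],
    "page_b": ["page_c"],
    "page_c": ["page_a"],
}

def pagerank(links, d=0.85, iterations=50):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}   # start from a uniform distribution
    for _ in range(iterations):
        new_rank = {}
        for p in pages:
            # sum the rank passed on by every page that links to p
            incoming = sum(rank[q] / len(links[q]) for q in pages if p in links[q])
            new_rank[p] = (1 - d) / n + d * incoming
        rank = new_rank
    return rank

print(pagerank(LINKS))
# page_c collects links from both other pages, so it ends up with the highest rank
```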

Indexing the full text of the web allows Google to go beyond simply matching single search terms. Google gives more priority to pages that have search terms near each other and in the same order as the query.
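
As a rough illustration of why indexing the full text (with word positions) matters, here is a small, self-contained Python sketch that scores a page higher when the query terms appear close together and in query order. The scoring formula, the inverse of the smallest gap between consecutive query terms, is made up for illustration; Google's actual proximity scoring is not public.

```python
# Toy proximity scoring with a positional index: pages where the query terms
# appear close together, in query order, score higher. The formula is invented
# for illustration, not Google's.
def positions(text):
    """Map each word to the list of positions where it occurs."""
    pos = {}
    for i, word in enumerate(text.lower().split()):
        pos.setdefault(word, []).append(i)
    return pos

def proximity_score(text, terms):
    """Score = 1 / (smallest total gap between consecutive query terms), 0 if any term is missing."""
    pos = positions(text)
    terms = terms.lower().split()
    if any(t not in pos for t in terms):
        return 0.0
    best = float("inf")
    for start in pos[terms[0]]:
        gap, current, ok = 0, start, True
        for t in terms[1:]:
            nxt = [p for p in pos[t] if p > current]
            if not nxt:
                ok = False
                break
            gap += nxt[0] - current
            current = nxt[0]
        if ok:
            best = min(best, gap)
    return 0.0 if best == float("inf") else 1.0 / best

print(proximity_score("disruptive technology from google", "disruptive technology"))       # 1.0
print(proximity_score("disruptive and very clever technology", "disruptive technology"))   # 0.25
```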
