Google Caffeine
Google has now launched a new web search index called ‘Caffeine’, which it says provides 50 percent fresher results for web searches than its previous index and is designed to cope with the rapidly growing amount of content on the web.

(Google’s graphic designed to show the difference between Caffeine and the old index.)
With this new indexing system, Google updates its web index continuously and incrementally, rather than rebuilding it from a crawl of the web carried out over a period of about two weeks.
Previously, Google updated its search results using a batch process: it would crawl pages, process them and then index them. This process ran continuously, but every document in a batch had to wait until the whole batch had been processed before it could appear in results. With Caffeine, when Google crawls a page, it processes it through the entire indexing pipeline and pushes it live almost immediately.
This new indexing system doesn’t necessarily mean that pages will be crawled faster, but it does mean that once pages have been crawled, they will be available to searchers sooner than before.
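To make the difference between batch and incremental indexing concrete, here is a toy model in Python. It is only a sketch under simplifying assumptions and not how Google's pipeline actually works: time is measured in arbitrary "ticks", every page takes the same processing time, pages are processed in parallel, and the function names and example URLs are hypothetical.

```python
# Toy model contrasting batch indexing with Caffeine-style incremental indexing.
# Assumptions (not Google's real pipeline): time is in arbitrary "ticks", every
# page takes the same processing time, and pages are processed in parallel.


def batch_index(crawl_times: dict[str, int], process_cost: int) -> dict[str, int]:
    """Return when each page goes live if the whole batch is published at once."""
    # The batch cannot be published until its last page has been crawled and processed.
    batch_done = max(crawl_times.values()) + process_cost
    return {url: batch_done for url in crawl_times}


def incremental_index(crawl_times: dict[str, int], process_cost: int) -> dict[str, int]:
    """Return when each page goes live if it is pushed live as soon as it is processed."""
    return {url: crawled + process_cost for url, crawled in crawl_times.items()}


def mean_delay(crawl_times: dict[str, int], live_times: dict[str, int]) -> float:
    """Average time between a page being crawled and it becoming searchable."""
    return sum(live_times[u] - crawl_times[u] for u in crawl_times) / len(crawl_times)


if __name__ == "__main__":
    # Hypothetical crawl schedule: URL -> tick at which the page was crawled.
    crawled = {"a.example/news": 0, "b.example/blog": 5, "c.example/shop": 12}
    for name, build in (("batch", batch_index), ("incremental", incremental_index)):
        live = build(crawled, process_cost=1)
        print(f"{name:<12} live at {live}  mean delay {mean_delay(crawled, live):.2f}")
```

In this model the last page crawled goes live at the same tick under both approaches; the gain comes entirely from no longer holding earlier pages back until the batch finishes, which is the point made above: crawling isn't necessarily faster, but crawled pages reach searchers sooner.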
Caffeine can process hundreds of thousands of pages in parallel every second, and according to Google it takes up nearly 100 million gigabytes of storage in one database and adds new information at a rate of hundreds of thousands of gigabytes per day.