The revolutionary update of Google’s infrastructure
By the time of the terrorist attacks of September 11, 2001, at the latest, Google had recognized the problem: its index contained only “outdated” content, and it was hardly possible to get websites with current content into the index quickly, let alone in real time. The entire infrastructure had to be updated.
The new indexing system, Google Caffeine, was launched on 8 June 2010.
However, the frequently used term Google Caffeine Update is misleading, as Caffeine is not a conventional update like Panda or Penguin. It is not an algorithm change that directly influences the ranking in the search engine result pages (SERPs). Rather, it is a largely new infrastructure that has fundamentally changed the way Google’s indexing process works.
It was the first step in a process to increase the size, accuracy and completeness of Google’s own index and its indexing speed.
From an interview with Matt Cutts, head of Google’s Web Spam Team:
“The Caffeine update isn’t about making some UI changes here or there. Currently, even power users won’t notice much of a difference at all. This update is primarily under the hood: we’re rewriting the foundation of some of our infrastructure. But some of the search results do change, so we wanted to open up a preview so that power searchers and web developers could give us feedback.” (Cutts, 2009)
This means that the new infrastructure only changes things “under the hood” of the Google search engine, so most users do not notice any difference in how the search engine is used or in the search results.
Web developers, website operators and power searchers should, however, notice some differences. For this reason, a preview was made available at the beginning of August 2009 so that webmasters and power searchers could test the changes and send feedback to Google for further adjustments.
The development of the Google Index and its infrastructure
In 2000, Google renewed its index every 4 months. In the worst case, a website operator therefore had to wait a full 4 months for modified content or new web pages to be included in the index. By the time of the terrorist attacks of September 11, 2001, at the latest, Google had recognized the problem: its index contained only “outdated” content, and it was hardly possible to get websites with current content into the index quickly, let alone in real time. In response, Google has since tried to update its index ever faster. At the end of 2001, the interval between index updates was reduced to one month.
Other reasons for Google’s actions were the lack of responsiveness to current events, the ever-growing number of websites, and the increasing diversity of web page content.
Whereas there were still only around 200 million registered websites in 2009, there were already almost 1 billion in 2014. (Source: Statista, 2015)
To meet the increasing expectations of search engine users and website operators, Google finally had to act and create a way to get the new content into the index as quickly as possible and thus also make it available to users.
Immediately before the introduction of the comprehensive infrastructure update, Google’s search index consisted of several layers, which were updated at different speeds. The main layer, for example, was updated every few weeks. To update a layer of the old index, the entire web had to be analyzed and compared with the existing index. As a result, there was a significant delay between the time a page was discovered and the time it became visible in the index: a page that had already been found was parked in a queue until the entire web had been crawled. (Carrie Grimes, Google Software Engineer, 2010)
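The delay caused by this layer-by-layer refresh can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Google's actual implementation: the class `BatchLayer`, the 21-day crawl length and the example URLs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BatchLayer:
    """Toy model of one index layer that is refreshed only after a full crawl."""
    crawl_days: int                                # hypothetical length of one crawl cycle
    visible: set = field(default_factory=set)      # URLs that are searchable
    queue: list = field(default_factory=list)      # (url, day discovered), not yet searchable

    def discover(self, url: str, day: int) -> None:
        # A discovered page is parked in the queue; it is not searchable yet.
        self.queue.append((url, day))

    def finish_crawl(self) -> dict:
        # Only when the whole crawl is done does the queue reach the index.
        # Returns the number of days each page waited after discovery.
        delays = {url: self.crawl_days - day for url, day in self.queue}
        self.visible.update(url for url, _ in self.queue)
        self.queue.clear()
        return delays


layer = BatchLayer(crawl_days=21)                  # assumed three-week cycle
layer.discover("example.com/early", day=1)
layer.discover("example.com/late", day=20)
searchable_mid_crawl = "example.com/early" in layer.visible
delays = layer.finish_crawl()
```

In this model, the page found on day 1 waits 20 days before it becomes searchable, even though it was discovered almost immediately.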
Google Caffeine was thus an “update” with a view to the future: it was meant to serve as a robust foundation for all future web search updates and to make an even faster and more comprehensive Google search engine possible. The new search index was based on an incremental, continuous crawling and indexing process, in which hundreds of thousands of pages were processed in parallel every second. Since then, the index has been able to grow with the increase in information on the web and deliver even more relevant search results.
A look at the hardware requirements: Google’s index uses more than 100 million gigabytes of storage and is growing by several hundred thousand gigabytes a day as it is constantly updated. Google’s storage capacity and flexibility have had to increase dramatically as a result.
With Google Caffeine the search engine was completely renewed. The way Google finds websites and adds them to the index has been completely restructured.
The web is now analyzed in small portions rather than all at once. This makes it possible to crawl each website separately and index new content and web pages shortly after publication. New pages and content can be added to the index immediately after discovery instead of waiting in a queue for the next full refresh.
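The incremental principle can be sketched with a very small inverted index in which each page is indexed on its own, as soon as it has been crawled. This is a simplified illustration and not Google's actual system; the class `IncrementalIndex`, its methods and the example URL are assumptions made for this sketch.

```python
from collections import defaultdict


class IncrementalIndex:
    """Toy inverted index that is updated one page at a time."""

    def __init__(self):
        self.postings = defaultdict(set)   # term -> set of URLs containing it
        self.terms_by_url = {}             # url -> terms currently indexed for it

    def add_or_update(self, url, text):
        # Index a single page immediately, without touching any other page.
        new_terms = set(text.lower().split())
        old_terms = self.terms_by_url.get(url, set())
        for term in old_terms - new_terms:   # terms that vanished from the page
            self.postings[term].discard(url)
        for term in new_terms - old_terms:   # terms that are new on the page
            self.postings[term].add(url)
        self.terms_by_url[url] = new_terms

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


index = IncrementalIndex()
index.add_or_update("example.com/news", "breaking story today")
first_hit = index.search("breaking")        # searchable right after this one crawl
index.add_or_update("example.com/news", "updated story today")
after_update = index.search("breaking")     # the old term is gone after the update
```

Each call to `add_or_update` changes only the entries for that one page, so a new or modified page becomes searchable without re-analyzing the rest of the index.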
Google Caffeine offers benefits for all stakeholders
Not only Google, but also search engine users and website operators benefit from the new infrastructure and its ability to crawl and index current information within seconds:
“Caffeine benefits both searchers and content owners because it means that all content (and not just content deemed ‘real time’) can be searchable within seconds after its crawled” (Matt Cutts, 2010)
The advantages for search engine users extend to the present day
Google Caffeine enables faster searches and offers around 50% fresher search results than the previous index. Thanks to the larger index, up to twice as many results can be returned. Since the infrastructure change, not only traditional website content is indexed, but also various other kinds of content such as news, news feeds, blog posts and forum postings. The result is the largest collection of web content Google has ever been able to offer. Because the index is constantly refreshed, searchers can find relevant content much sooner after it has been published. In addition, searchers can generally be presented with better search results, as there are more results in the index overall. (Carrie Grimes, Google Software Engineer, 2010)
Advantages for website operators
In addition to faster indexing, website operators gain more freedom to create high-quality content for their web pages, gain new opportunities to attract attention, and are even rewarded for using multimedia elements on their sites.
Google Caffeine also encourages website operators to keep their websites up to date; otherwise they must expect ranking losses. The search result lists give priority to websites that offer more up-to-date content than others. However, this does not affect sites on topics where there is no news.
More info on the Caffeine Update