September 24, 2004

Google's index size has plateaued

Google's index should be growing exponentially to keep up with the growing size of the web. Google's main page indicates that it searches 4,285,199,774 web pages, and this number has remained the same for the last seven months, since around February 8-14, 2004, according to the Internet Archive's Wayback Machine. Previously, it had been growing rapidly: 1,326,920,000 on Feb. 1, 2001; 2,073,418,204 on Feb. 6, 2002; 3,083,324,652 on Feb. 8, 2003; 4,285,199,774 on Feb. 14, 2004. In other words, it had been growing at a rate of roughly 50% per year until it reached its present plateau. Extrapolating, by now it should be over 5 billion.
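The growth estimate and the extrapolation can be checked with a few lines of arithmetic. The sketch below uses only the four dated counts quoted above. The function names, the 365.25-day year, and the choice to extrapolate from the Feb. 14, 2004 count to the date of this post are illustrative assumptions, not anything Google has published.

```python
from datetime import date

# Index sizes quoted in the post: (date, pages reported on Google's main page).
counts = [
    (date(2001, 2, 1), 1_326_920_000),
    (date(2002, 2, 6), 2_073_418_204),
    (date(2003, 2, 8), 3_083_324_652),
    (date(2004, 2, 14), 4_285_199_774),
]


def annual_growth(d0, n0, d1, n1):
    """Annualized growth rate between two observations (0.5 means 50% per year)."""
    years = (d1 - d0).days / 365.25  # assumption: average year length
    return (n1 / n0) ** (1 / years) - 1


def extrapolate(d0, n0, rate, d1):
    """Project a count forward from d0 to d1 at a constant annual growth rate."""
    years = (d1 - d0).days / 365.25
    return n0 * (1 + rate) ** years


# Growth rate between each pair of consecutive observations.
rates = [annual_growth(*a, *b) for a, b in zip(counts, counts[1:])]

# Project the last observed count to the date of this post at 50% per year.
last_date, last_count = counts[-1]
projected = extrapolate(last_date, last_count, 0.50, date(2004, 9, 24))

for (d0, _), (d1, _), r in zip(counts, counts[1:], rates):
    print(f"{d0} -> {d1}: {r:.1%} per year")
print(f"Projected for 2004-09-24 at 50% per year: {projected:,.0f}")
```

The three year-over-year rates are not identical (they drift downward from the mid-50s to the high-30s percent), so "roughly 50%" is an average rather than a constant. Even so, the projection from the last count lands above 5 billion.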

I bet they are suffering from a 4-byte limit with their URL identifiers. With 4 bytes, which is the natural word size for the inexpensive IA-32/x86-compatible processors they are using, they can store 32 bits, and that means 2^32 different values, or 4,294,967,296. They may be using some of the values for special purposes, and so haven't reached the absolute maximum, yet they are within about 0.23% of the maximum. Adding another bit or byte to store more URL ID numbers would probably slow things down because it would require their CPUs to do much more work when manipulating the IDs. I suspect they have decided that they are in an engineering sweet spot and 4.285 billion URLs are enough for a while. So they won't increase the index size until they switch to using 64-bit processors, which would provide enough bits to easily manipulate 2^64, or 18,446,744,073,709,551,616, URL IDs (that's over 18.4 quintillion).
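The headroom figure is easy to reproduce. This is a small sketch of the arithmetic only; the idea that Google stores URL IDs in a 4-byte field is the guess made above, not a confirmed detail, and the variable names are made up for illustration.

```python
# Assumption from the post: URL IDs fit in one 32-bit (4-byte) word.
ID_BITS = 32
reported_pages = 4_285_199_774  # count shown on Google's main page

capacity = 2 ** ID_BITS                   # distinct values a 32-bit ID can hold
headroom = capacity - reported_pages      # IDs left before the space runs out
headroom_pct = 100 * headroom / capacity  # headroom as a percentage of capacity

capacity_64 = 2 ** 64                     # ID space with a 64-bit word

print(f"32-bit capacity: {capacity:,}")
print(f"Unused IDs:      {headroom:,} ({headroom_pct:.3f}% of capacity)")
print(f"64-bit capacity: {capacity_64:,}")
```

The reported count leaves 9,767,522 unused values, which would be plenty for a handful of reserved or special-purpose IDs.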

As a result of the plateau, there is an exponentially growing number of "unimportant" (as measured by their PageRank) web sites that are not in Google's index. An increasing number of web site owners and web site searchers will be rather unhappy with Google because of this, and the situation might not improve for months if not years.

December 17, 2004 Update:
Google finally increased their index size in mid-November, so it was only another month or two after this post that things improved.

Posted by seander at September 24, 2004 12:02 AM
Comments