The name "Google Dance" was formerly used to describe the period during which a major index update of the Google search engine was being implemented. These major index updates occurred on average every 36 days, or about 10 times per year. They were most easily identified by significant changes in search results and by an update of Google's cache of all indexed pages. These changes would be evident from one minute to the next. However, the update did not move from one index to the other in a single step, like the flip of a switch. In fact, it took several days to complete the update of the index.
Technical Background on Google
The Google search engine pulls its results from more than 10,000 servers, which are simple Linux PCs that Google uses for reasons of cost. Naturally, an index update cannot be carried out on all of those servers at the same time. The new index has to be rolled out to one server after another.
Many webmasters think that, during the Google Dance, Google is somehow able to control whether a server with the new index or a server with the old index responds to a search query. However, since Google's index is an inverted index, this would be very complicated. As we will show below, there is no such control within the system. In fact, the reason for the Google Dance is the way Google uses the Domain Name System (DNS).
Google Dance and Domain Name System (DNS)
Not only is Google's index spread over more than 10,000 servers; these servers are also, at the time of writing, located in eight different data centers. These data centers are mainly located in the US, although in June 2002 Google's first European data center, in Zurich, Switzerland, went online. More data centers are very likely to follow, possibly spread across the whole world. However, the two data centers Google brought online in January and April 2003 are again located in the US.
In order to direct traffic to all these data centers, Google could theoretically receive all queries centrally and then forward them to the data centers. But this would obviously be inefficient. Instead, each data center has its own IP address (numerical address on the Internet), and which of these IP addresses a user reaches is managed by the Domain Name System.
Basically, the Domain Name System works like this: on the Internet, data transfers always take place between IP addresses. The information about which domain resolves to which IP address is provided by the name servers of the DNS. When a user enters a domain into his browser, a locally configured name server obtains the IP address for that domain by contacting the name server that is authoritative for that domain. The IP address is then cached by the local name server, so that it does not have to contact the authoritative name server each time a connection to the domain is established.
The records for a domain at the authoritative name server specify how long they may be cached by a caching name server. This period is the Time To Live (TTL) of the record. As soon as the TTL expires, the caching name server has to fetch the record for the domain again from the authoritative name server. Quite often, the TTL is set to one or more days. In contrast, the TTL for the domain www.google.com is only five minutes. So a name server may cache Google's IP address for only five minutes and then has to look it up again.
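The caching behaviour described above can be sketched as a small TTL cache. This is a minimal illustration under stated assumptions, not real resolver code: the class `CachingNameServer`, the `Record` type and the injected `clock` are hypothetical names, the authoritative name server is simulated by a plain function, and 192.0.2.1 is a reserved documentation address rather than one of Google's.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Record:
    ip: str
    ttl: int  # seconds a caching name server may keep this record


class CachingNameServer:
    """Hypothetical caching name server that honours record TTLs."""

    def __init__(self, authoritative: Callable[[str], Record],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.authoritative = authoritative  # stands in for the authoritative name server
        self.clock = clock
        self.cache: Dict[str, Tuple[str, float]] = {}  # domain -> (ip, expiry time)
        self.upstream_queries = 0  # how often the authoritative server was asked

    def resolve(self, domain: str) -> str:
        now = self.clock()
        entry = self.cache.get(domain)
        if entry is not None and now < entry[1]:
            return entry[0]  # TTL not yet expired: answer from the cache
        record = self.authoritative(domain)  # unknown or expired: fetch again
        self.upstream_queries += 1
        self.cache[domain] = (record.ip, now + record.ttl)
        return record.ip


# Usage with a fake clock (in seconds) and a fixed five-minute TTL.
fake_now = [0.0]
resolver = CachingNameServer(lambda domain: Record("192.0.2.1", 300),
                             clock=lambda: fake_now[0])
resolver.resolve("www.google.com")  # first lookup: asks the authoritative server
fake_now[0] = 299.0
resolver.resolve("www.google.com")  # still within the TTL: served from the cache
fake_now[0] = 300.0
resolver.resolve("www.google.com")  # TTL expired: asks the authoritative server again
print(resolver.upstream_queries)    # 2
```

With a TTL of one or more days, the authoritative server would be asked far less often; the short five-minute TTL is what lets a caching name server pick up a different answer every few minutes.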
Each time Google's name server is contacted, it returns the IP address of only one data center. By varying its DNS records in this way, Google directs queries to different data centers. On the one hand, the DNS records may be based on the load of the individual data centers. In that case, Google would achieve a simple form of load balancing through its use of the DNS. On the other hand, the geographical location of a caching name server may affect how often it receives each data center's IP address, so that the distance data has to travel can be shortened. To illustrate the DNS records of the domain www.google.com, we present them here as seen by one caching name server.
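The idea that the name server hands out exactly one data center per query, influenced by load and by the location of the asking name server, can be sketched as follows. The text only says the answer *may* depend on these factors; the cost function used here (load plus a distance penalty), the coordinates, the data center names and the `distance_weight` parameter are all hypothetical assumptions for illustration, not Google's actual method. The IP addresses are reserved documentation addresses.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple


@dataclass
class DataCenter:
    name: str
    ip: str
    load: float                     # hypothetical utilisation, 0.0 (idle) to 1.0 (full)
    location: Tuple[float, float]   # hypothetical planar coordinates


def answer_query(data_centers: List[DataCenter],
                 resolver_location: Tuple[float, float],
                 distance_weight: float = 0.01) -> Tuple[str, int]:
    """Return (ip, ttl) for exactly one data center.

    Assumed cost model: current load plus a penalty proportional to the
    distance between the data center and the querying caching name server.
    """
    def cost(dc: DataCenter) -> float:
        dx = dc.location[0] - resolver_location[0]
        dy = dc.location[1] - resolver_location[1]
        return dc.load + distance_weight * hypot(dx, dy)

    best = min(data_centers, key=cost)
    return best.ip, 300  # five-minute TTL, as described for www.google.com


dcs = [
    DataCenter("us-east", "192.0.2.10", load=0.9, location=(0, 0)),
    DataCenter("us-west", "192.0.2.20", load=0.4, location=(40, 0)),
    DataCenter("zurich", "192.0.2.30", load=0.3, location=(70, 10)),
]
print(answer_query(dcs, resolver_location=(5, 0)))   # ('192.0.2.20', 300)
print(answer_query(dcs, resolver_location=(68, 10)))  # ('192.0.2.30', 300)
```

A caching name server near the heavily loaded "us-east" site is sent to "us-west" instead, while one near Zurich is sent to Zurich. Because loads change over time, the same caching name server can receive a different data center after each five-minute TTL expires.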
How data centers, DNS and the Google Dance are related is easy to explain. During the Google Dance, the data centers do not receive the new index at the same time. Instead, the new index is transferred to one data center after another. When a user queries Google during the Google Dance, he may get results from a data center that still has the old index at one point in time, and from a data center that already has the new index a few minutes later. From the user's perspective, the index update appears to have happened within a few minutes. But of course, the process may also run in reverse, so that Google seems to switch back and forth between the old and the new index.
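The effect described above can be reproduced with a tiny simulation. It assumes, purely for illustration, three hypothetical data centers ("dc-a", "dc-b", "dc-c") that receive the new index one day apart, and a hypothetical sequence of DNS answers that sends one user's queries to a different data center every five minutes. Times are in minutes; none of these values come from Google.

```python
from typing import Dict, List, Tuple


def index_version(dc: str, t: float, rollout_time: Dict[str, float]) -> str:
    """'new' once the data center has received the new index, else 'old'."""
    return "new" if t >= rollout_time[dc] else "old"


def simulate(queries: List[Tuple[float, str]],
             rollout_time: Dict[str, float]) -> List[str]:
    """For each (time, data center chosen via DNS), return the index version seen."""
    return [index_version(dc, t, rollout_time) for t, dc in queries]


# Hypothetical staggered rollout: one data center per day (in minutes).
rollout = {"dc-a": 0, "dc-b": 1440, "dc-c": 2880}

# Hypothetical DNS answers: a new data center after each five-minute TTL.
user_queries = [(60, "dc-b"), (65, "dc-a"), (70, "dc-c"), (75, "dc-a")]

print(simulate(user_queries, rollout))  # ['old', 'new', 'old', 'new']
```

Nothing in the sketch decides per query which index to use; the alternation between "old" and "new" results only from the staggered rollout combined with changing DNS answers, which matches the explanation given in the text.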