“…Large HDFS instances run on a cluster of computers that commonly spread across many racks. Communication between two nodes in different racks has to go through switches. In most cases, network bandwidth between machines in the same rack is greater than network bandwidth between machines in different racks…”.
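The idea is that the NameNode can determine the rack id of each DataNode (through the process the documentation calls Hadoop Rack Awareness) and then use it to apply a rack-aware replica placement policy, for example keeping the replicas of a block on more than one rack so that the failure of a whole rack does not lose the data.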
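To make the rack id lookup more concrete, here is a minimal sketch of the kind of topology script that Hadoop can call to map hosts to racks. Such a script is typically registered through the `net.topology.script.file.name` property (`topology.script.file.name` in older releases); it receives host names or IP addresses as arguments and prints one rack path per argument. The host table, the rack names and the `resolve_racks` helper are assumptions made up for illustration, not something taken from the documentation quoted above.

```python
#!/usr/bin/env python3
"""Sketch of a rack topology script (illustrative only)."""
import sys

# Hypothetical host-to-rack table; a real cluster would load this
# from its own inventory or a configuration file.
RACK_BY_HOST = {
    "10.1.1.11": "/rack1",
    "10.1.1.12": "/rack1",
    "10.1.2.21": "/rack2",
}

# Fallback rack path for hosts that are not in the table.
DEFAULT_RACK = "/default-rack"


def resolve_racks(hosts):
    """Return one rack path per host, in the same order as the input."""
    return [RACK_BY_HOST.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # The hosts to resolve are passed as command-line arguments.
    for rack in resolve_racks(sys.argv[1:]):
        print(rack)
```

Run by hand as `python3 topology.py 10.1.1.11 10.1.2.21`, the script prints one rack path for each address, which is the information a replica placement policy needs in order to tell whether two nodes share a rack.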
For more information, have a look at the <a href="http://hadoop.apache.org/common/docs/current/hdfs_design.html#Data+Replication">Data Replication</a> section of the documentation.