I came across an interesting article describing a theory of how Google finds duplicate content. The theory comes from an ex-Google employee who worked there in 2000, so he himself is not sure whether it is actually used in practice. Check it out: http://www.cs.umd.edu/~pugh/google/Duplicates.pdf Source: http://www.seomoz.org/blogdetail.php?ID=1516