I checked the logs for my fonts site today, and Googlebot has been downloading my fonts (zip files) all night long. I quickly changed robots.txt to disallow the downloads and added rel="nofollow" to the links. Have a look at some of the logs:

66.249.72.1 - - [19/May/2006:07:34:43 +0200] "GET /fonts/keyboard_light_ssi_light.zip HTTP/1.1" 200 34108 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.72.1 - - [19/May/2006:07:34:44 +0200] "GET /fonts/handelgotdbol.zip HTTP/1.1" 200 31566 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

This has been going on for 7 hours now. Is this normal behaviour? Does Google now index what is inside .zip files?
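For anyone wanting to double-check a rule like this before waiting on the next crawl, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `example.com` host and the `is_allowed` helper are assumptions made for illustration; only the `/fonts/` path and the file name come from the log lines above. Note that the standard-library parser matches plain path prefixes, so the sketch blocks the whole directory rather than using a wildcard pattern.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. The /fonts/ path is taken from the log lines;
# the rule itself is an assumption, not the poster's actual file.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /fonts/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com stands in for the real site, which the post does not name.
zip_url = "http://example.com/fonts/handelgotdbol.zip"
page_url = "http://example.com/index.html"

print("zip allowed: ", is_allowed(ROBOTS_TXT, "Googlebot", zip_url))
print("page allowed:", is_allowed(ROBOTS_TXT, "Googlebot", page_url))
```

This only checks what the rules say. A crawler that has already queued the URLs may keep requesting them until it next re-reads robots.txt.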
Yes indeed! But what possible reason could they have for doing it? Would some people set up their web server to return text or HTML when a .zip file is requested? If so, you'd expect Google to be smart enough to stop once it has downloaded a couple of zips and seen that they are all real zip files.
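To make the "real zip or not" distinction concrete, here is a rough sketch of how a client could tell a genuine archive from an HTML page served under a .zip URL. It uses Python's standard `zipfile.is_zipfile`; the helper names and sample data are made up for illustration, and nothing here is known about how Googlebot actually decides.

```python
import io
import zipfile


def looks_like_zip(body: bytes) -> bool:
    """Return True if the downloaded bytes form a readable zip archive."""
    return zipfile.is_zipfile(io.BytesIO(body))


def make_zip(name: str, data: bytes) -> bytes:
    """Build a small in-memory zip so the example needs no files on disk."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, data)
    return buf.getvalue()


# Hypothetical response bodies: one real archive, one HTML page
# that a misconfigured server might return for a .zip URL.
real_zip = make_zip("font.ttf", b"not a real font, just sample bytes")
fake_zip = b"<html><body>Download page</body></html>"

print("real archive:", looks_like_zip(real_zip))
print("html page:   ", looks_like_zip(fake_zip))
```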
Bah, you know, perhaps I said a very stupid thing... Google is getting smarter every day at indexing videos, pictures and documents; perhaps now they are going after archive files and trying to find a way to index them... On the other hand, I think they want to gather as much information as possible about each site so they know who its owner is. Imagine a site owner who kept lots of pictures of a man molesting kids in an archive file: how could you know that kind of picture is on his site without looking inside the archive? Eheheheh....