Has anyone noticed a new Googlebot lurking around? I'm getting hit by two different kinds. The normal one:

66.249.64.47 - - [15/Sep/2004:18:59:12 -0700] "GET /robots.txt HTTP/1.0" 404 1227 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"

and also this one:

66.249.66.129 - - [15/Sep/2004:18:12:51 -0700] "GET / HTTP/1.1" 200 38358 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Aside from the slightly different user agent, it's also making HTTP/1.1 requests. The IP address it uses is in a block that's normally used only for Mediapartners (the AdSense spider), but it's spidering a site without any AdSense on it. The spidering pattern is different too: instead of using multiple IPs and grabbing groups of pages at a time, this one seems to do a slower, steady crawl, multiple levels deep in a single pass.
This is the spider that G has developed that will read JavaScript and pull URLs out of it, and it can also kind of read Flash content. It's also been seen logging as googlebot/new. So all you JavaScript spammers, beware.
How on earth does it read Flash? (Or "kind of" read Flash?) Just looked at my log files and I see it. I didn't look too far back, but it came this morning about 40 minutes after the normal Gbot.
I was gonna post a similar thread. Initially I thought "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" was just someone who had switched their user agent. That was until it grabbed 6,000 pages. I got suspicious and checked the IP, and oddly enough it's in Google's IP range.
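If anyone wants to sanity-check a hit like that, one quick test is a reverse DNS lookup on the IP. Here's a rough sketch in PHP - the IP is the one from Shawn's log above, and the googlebot.com check is just my assumption about how Google names its crawler hosts, so treat it as illustrative:

<?php
// Reverse-DNS check on a suspicious "Googlebot" IP (sketch only).
// The googlebot.com match is an assumption about Google's naming.
$ip   = '66.249.66.129';
$host = gethostbyaddr($ip);     // reverse lookup
$fwd  = gethostbyname($host);   // forward-confirm the hostname
if (preg_match('/googlebot\.com$/i', $host) && $fwd == $ip) {
    echo "$ip looks like a real Googlebot ($host)\n";
} else {
    echo "$ip does NOT verify as Googlebot ($host)\n";
}
?>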
I had several visits from this new Googlebot a couple of days ago. I don't remember the exact IP addresses (about 15-20 of them), but here are the IP ranges (I did write them down on a piece of paper): 66.249.78.*, 66.249.64.*, 66.249.79.*
Yeah, just checked my log files and noticed it too. Old Welsh Guy, how do we know that it can read JavaScript?
One of my sites normally gets hit by Googlebot at the same time each day, but for the last 3 days I've been getting two hits, with the second coming about 15 minutes after the first. I thought it strange but hadn't had time to investigate; now I've looked in my stats, and I'm also getting both Googlebots, as Shawn described.
This one hasn't grabbed any JavaScript the way the Googlebot/Test bot did, but it is using HTTP/1.1 like Googlebot/Test is/was. I just wish they would grab files compressed when available now (since 1.1 supports it).
You can set up your server to compress (basically gzip) your HTML documents before sending them to a browser (if the browser supports HTTP/1.1 it's an option; it's not an option for 1.0). For example, this forum compresses the HTML sent to you. The bandwidth savings on this are pretty big: this forum's main index page (when I just tested it) is 44,007 bytes, but since it's sent out compressed (which the client side decompresses), the bandwidth used is only 9,099 bytes.
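If you want a rough idea of the savings on one of your own pages, here's a quick sketch in PHP - the URL is just a placeholder, and it assumes allow_url_fopen is enabled:

<?php
// Rough before/after comparison of gzip compression (sketch only).
// Swap the placeholder URL for one of your own pages.
$html = file_get_contents('http://www.example.com/');
$gz   = gzencode($html, 1);   // level 1 = cheapest on the CPU
printf("raw: %d bytes, gzipped: %d bytes (%.0f%% saved)\n",
       strlen($html), strlen($gz),
       100 * (1 - strlen($gz) / strlen($html)));
?>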
I didn't think so, but then I remembered that the server of mine it's spidering right now didn't have it turned on. So I just turned it on, waited for it, and lo and behold, it *is* using compression now! That is bad ass, and something I was wishing for.
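For anyone who wants to check their own server the same way, here's one rough way to do it in PHP: request a page with an Accept-Encoding: gzip header and see whether the response comes back with Content-Encoding: gzip. The hostname and path below are placeholders, so this is a sketch rather than anything definitive:

<?php
// Check whether a server will gzip a page for clients that ask for it.
$host = 'www.example.com';
$fp = fsockopen($host, 80, $errno, $errstr, 10);
if ($fp) {
    fputs($fp, "GET / HTTP/1.1\r\n"
             . "Host: $host\r\n"
             . "Accept-Encoding: gzip\r\n"
             . "Connection: close\r\n\r\n");
    while (!feof($fp)) {
        $line = trim(fgets($fp, 1024));
        if ($line == '') break;   // blank line = end of headers
        echo $line . "\n";        // look for "Content-Encoding: gzip"
    }
    fclose($fp);
}
?>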
I have a few questions about this if you don't mind - I really don't know anything about it.
1 - So, are there duplicates of each file sitting on your server, or does the server recognise the HTTP/1.1 request and then serve the file accordingly with compression?
2 - Does it put a lot more stress on servers if you are running it?
3 - Does it increase loading times in the user's browser - does it put more stress on the user's CPU? (I guess the difference would be negligible if it does.)
I did think of more questions, but I'm sure I could find the answers if I looked hard enough.
It does not replicate data... it compresses it on the fly. Whether it's worth turning on really depends on whether your server is more bandwidth-limited or CPU-limited. I run it at the lowest compression level so it doesn't stress the CPU (my servers get a lot of traffic). Loading time should actually be a little faster for the user because they have less data to download; it really just depends on how fast their computer can decompress the file compared to downloading a larger one. A simple way to turn it on for PHP files only would be to add this to your .htaccess file:

php_value zlib.output_compression 1
php_value zlib.output_compression_level 1

The higher the compression_level number, the better the compression (but the more CPU overhead).
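If you can't (or don't want to) touch .htaccess, one alternative - just a sketch, not necessarily what Shawn is doing - is to turn on PHP's gzip output handler per script at the top of the page:

<?php
// Per-script alternative to the .htaccess directives above (sketch).
// ob_gzhandler only compresses for clients that send Accept-Encoding,
// and it shouldn't be combined with zlib.output_compression - pick one.
ob_start('ob_gzhandler');
echo '<html><body>... your page here ...</body></html>';
?>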
Thanks for that, Shawn. So if I wanted to find out a little more about it, what would be the correct terminology to use in a search? And how would that .htaccess file be used in reference to a .cfm extension?
The .htaccess thing is just for PHP files. Look at mod_gzip for server-wide compression with Apache. You can find the mod_gzip project at: http://sourceforge.net/projects/mod-gzip/
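For anyone who goes the mod_gzip route, the httpd.conf directives look roughly like this. This is from memory, so double-check the exact directive names against the mod_gzip documentation; the include rules are just an example, but matching on the text/html MIME type would also cover .cfm output:

# mod_gzip sketch (verify directive names against the mod_gzip docs)
mod_gzip_on Yes
mod_gzip_dechunk Yes
mod_gzip_item_include file \.html$
mod_gzip_item_include mime ^text/html
mod_gzip_item_exclude mime ^image/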