Has anyone noticed a new Googlebot lurking around? I'm getting hit by two different kinds. The normal one:

66.249.64.47 - - [15/Sep/2004:18:59:12 -0700] "GET /robots.txt HTTP/1.0" 404 1227 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"

and also this one:

66.249.66.129 - - [15/Sep/2004:18:12:51 -0700] "GET / HTTP/1.1" 200 38358 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Aside from the slightly different user agent, it's also making HTTP/1.1 requests. The IP address it uses is in a block that's normally used only for Mediapartners (the AdSense spider), but it's spidering a site without any AdSense on it. The spidering pattern is different too: instead of using multiple IPs and grabbing groups of pages at a time, this one seems to do a slower, steady crawl, multiple levels deep in a single pass.
This is the spider that G has developed that will read JavaScript and pull URLs out of it, and it can also kind of read Flash content. It's also been seen logging as googlebot/new. So all you JavaScript spammers, beware.
How on earth does it read Flash? (Or "kind of" read Flash?) Just looked at my log files and I see it. I didn't look too far back, but it came this morning about 40 minutes after the normal Gbot.
I was gonna post a similar thread. Initially I thought "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" was just someone who had switched their user agent. That was until it grabbed 6,000 pages. I got suspicious and checked the IP, and oddly enough it's in Google's IP range.
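If anyone wants to sanity-check a hit like that, one quick test is a reverse DNS lookup on the IP. Here's a rough sketch in PHP - the IP is the one from Shawn's log above, and the googlebot.com check is just my assumption about how Google names its crawler hosts, so treat it as illustrative:

<?php
// Reverse-DNS check on a suspicious "Googlebot" IP (sketch only).
// The googlebot.com match is an assumption about Google's naming.
$ip   = '66.249.66.129';
$host = gethostbyaddr($ip);     // reverse lookup
$fwd  = gethostbyname($host);   // forward-confirm the hostname
if (preg_match('/googlebot\.com$/i', $host) && $fwd == $ip) {
    echo "$ip looks like a real Googlebot ($host)\n";
} else {
    echo "$ip does NOT verify as Googlebot ($host)\n";
}
?>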
I had several visits from this new Googlebot a couple of days ago. I don't remember the exact IP addresses (about 15-20 of them), but here are the IP ranges (I did write them down on a piece of paper): 66.249.78.*, 66.249.64.*, 66.249.79.*
Yeah, just checked my log files and noticed it too. Old Welsh Guy, how do we know that it can read JavaScript?
One of my sites normally gets hit by Googlebot at the same time each day, but for the last 3 days I've been getting two hits, with the second coming about 15 minutes after the first. I thought it strange but hadn't had time to investigate; now I've looked in my stats, and I'm also getting both Googlebots, as Shawn described.
This one hasn't grabbed any JavaScript the way the Googlebot/Test bot did, but it is using HTTP/1.1 like Googlebot/Test is/was. I just wish they would grab files compressed when available now (since 1.1 supports it).
You can set up your server to compress (basically gzip) your HTML documents before sending them to a browser (if the browser supports HTTP/1.1 it's an option; it's not an option for 1.0). For example, this forum compresses the HTML sent to you. The bandwidth savings on this are pretty big: this forum's main index page (when I just tested it) is 44,007 bytes, but since it's sent out compressed (which the client side decompresses), the bandwidth used is only 9,099 bytes.
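If you want a rough idea of the savings on one of your own pages, here's a quick sketch in PHP - the URL is just a placeholder, and it assumes allow_url_fopen is enabled:

<?php
// Rough before/after comparison of gzip compression (sketch only).
// Swap the placeholder URL for one of your own pages.
$html = file_get_contents('http://www.example.com/');
$gz   = gzencode($html, 1);   // level 1 = cheapest on the CPU
printf("raw: %d bytes, gzipped: %d bytes (%.0f%% saved)\n",
       strlen($html), strlen($gz),
       100 * (1 - strlen($gz) / strlen($html)));
?>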
I didn't think so, but then I remembered that the server of mine it's spidering right now didn't have it turned on. So I just turned it on, waited for it, and lo and behold, it *is* using compression now! That is bad ass, and something I was wishing for.
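For anyone who wants to check their own server the same way, here's one rough way to do it in PHP: request a page with an Accept-Encoding: gzip header and see whether the response comes back with Content-Encoding: gzip. The hostname and path below are placeholders, so this is a sketch rather than anything definitive:

<?php
// Check whether a server will gzip a page for clients that ask for it.
$host = 'www.example.com';
$fp = fsockopen($host, 80, $errno, $errstr, 10);
if ($fp) {
    fputs($fp, "GET / HTTP/1.1\r\n"
             . "Host: $host\r\n"
             . "Accept-Encoding: gzip\r\n"
             . "Connection: close\r\n\r\n");
    while (!feof($fp)) {
        $line = trim(fgets($fp, 1024));
        if ($line == '') break;   // blank line = end of headers
        echo $line . "\n";        // look for "Content-Encoding: gzip"
    }
    fclose($fp);
}
?>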
I have a few questions about this if you don't mind - I really don't know anything about it.
1 - So, are there duplicates of each file sitting on your server, or does the server recognise the HTTP/1.1 request and then serve the file accordingly with compression?
2 - Does it put a lot more stress on servers if you are running it?
3 - Does it increase loading times in the user's browser - does it put more stress on the user's CPU? (I guess the difference would be negligible if it does.)
I did think of more questions, but I'm sure I could find the answers if I looked hard enough.
It does not replicate data... it compresses it on the fly. Whether it's worth turning on really depends on whether your server is more bandwidth-limited or CPU-limited. I run it at the lowest compression level so it doesn't stress the CPU (my servers get a lot of traffic). Loading time should actually be a little faster for the user because they have less data to download; it really just depends on how fast their computer can decompress the file compared to downloading a larger one. A simple way to turn it on for PHP files only would be to add this to your .htaccess file:

php_value zlib.output_compression 1
php_value zlib.output_compression_level 1

The higher the compression_level number, the better the compression (but the more CPU overhead).
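If you can't (or don't want to) touch .htaccess, one alternative - just a sketch, not necessarily what Shawn is doing - is to turn on PHP's gzip output handler per script at the top of the page:

<?php
// Per-script alternative to the .htaccess directives above (sketch).
// ob_gzhandler only compresses for clients that send Accept-Encoding,
// and it shouldn't be combined with zlib.output_compression - pick one.
ob_start('ob_gzhandler');
echo '<html><body>... your page here ...</body></html>';
?>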
Thanks for that, Shawn. So if I wanted to find out a little more about it, what would be the correct terminology to use in a search? And how would that .htaccess file be used in reference to a .cfm extension?
The .htaccess thing is just for PHP files. Look at mod_gzip for server-wide compression with Apache. You can find the mod_gzip project at: http://sourceforge.net/projects/mod-gzip/
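For anyone who goes the mod_gzip route, the httpd.conf directives look roughly like this. This is from memory, so double-check the exact directive names against the mod_gzip documentation; the include rules are just an example, but matching on the text/html MIME type would also cover .cfm output:

# mod_gzip sketch (verify directive names against the mod_gzip docs)
mod_gzip_on Yes
mod_gzip_dechunk Yes
mod_gzip_item_include file \.html$
mod_gzip_item_include mime ^text/html
mod_gzip_item_exclude mime ^image/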