New Googlebot?

Discussion in 'Google' started by digitalpoint, Sep 15, 2004.

  1. #1
    Has anyone noticed a new Googlebot lurking around?

    I'm getting hit by two different kinds. The normal one:

    66.249.64.47 - - [15/Sep/2004:18:59:12 -0700] "GET /robots.txt HTTP/1.0" 404 1227 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"

    and also this one:

    66.249.66.129 - - [15/Sep/2004:18:12:51 -0700] "GET / HTTP/1.1" 200 38358 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

    Aside from the slightly different user agent, it's also using HTTP 1.1. The IP address it uses is in a block that is normally only used for Mediapartners (the AdSense spider), but it's spidering a site without any AdSense.

    Also, the spidering pattern is different. Instead of using multiple IPs and getting groups at a time, this one seems to be a slower, steady spidering, multiple levels deep in a single pass.
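    For anyone who wants to pick the two variants out of their own logs, here is a minimal Python sketch. It assumes Apache's combined log format (which the two lines above appear to use); the names `LOG_PATTERN` and `classify_googlebot` are made up for this example, and the "classic" / "mozilla-style" labels are just shorthand, not anything official.

```python
import re

# Combined log format: ip, identity, user, [time], "request", status, bytes,
# "referer", "user agent". Assumed layout; adjust if your server logs differently.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def classify_googlebot(line):
    """Return ip/protocol/variant for a Googlebot hit, or None for anything else."""
    match = LOG_PATTERN.match(line)
    if match is None or "Googlebot" not in match.group("agent"):
        return None
    agent = match.group("agent")
    # Hypothetical labels: the new agent string starts with "Mozilla/5.0".
    variant = "mozilla-style" if agent.startswith("Mozilla/5.0") else "classic"
    return {
        "ip": match.group("ip"),
        "protocol": match.group("protocol"),
        "variant": variant,
    }


# The two log lines quoted in this post.
OLD_LINE = ('66.249.64.47 - - [15/Sep/2004:18:59:12 -0700] "GET /robots.txt HTTP/1.0" '
            '404 1227 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"')
NEW_LINE = ('66.249.66.129 - - [15/Sep/2004:18:12:51 -0700] "GET / HTTP/1.1" '
            '200 38358 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; '
            '+http://www.google.com/bot.html)"')

if __name__ == "__main__":
    for sample in (OLD_LINE, NEW_LINE):
        print(classify_googlebot(sample))
```

    Keep in mind the user agent alone proves nothing, since anyone can send that string; the IP still needs checking.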
     
    digitalpoint, Sep 15, 2004 IP
  2. Old Welsh Guy

    Old Welsh Guy Notable Member

    #2
    This is the spider that G has developed to read JavaScript and pull URLs out of it. It can also kind of read Flash content. It is also logging as googlebot/new.

    So all you JavaScript spammers beware :)
     
    Old Welsh Guy, Sep 16, 2004 IP
  3. fluke

    fluke Guest

    #3
    How on earth does it read flash? (or "kind of" read flash?)

    Just looked at my log files, and I see it. I didn't look too far back, but it came this morning about 40 minutes after the normal Gbot.
     
    fluke, Sep 16, 2004 IP
  4. Arnica

    Arnica Peon

    #4
    I had the new one crawl around a week or so ago, but not since.

    Mick
     
    Arnica, Sep 16, 2004 IP
  5. SEbasic

    SEbasic Peon

    #5
    Thanks for the heads up Shawn...
     
    SEbasic, Sep 16, 2004 IP
  6. xml

    xml Peon

    #6
    I was gonna post a similar thread.

    Initially I thought "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" was just someone who switched their user-agent.

    That was until it grabbed 6000 pages. I got suspicious and checked the IP, and oddly enough it's in Google's IP range.
     
    xml, Sep 16, 2004 IP
  7. Redleg

    Redleg Raider

    #7
    I had several visits by this new googlebot a couple of days ago.
    Don't remember the exact IP addresses (about 15-20 of them) but here's the IP ranges. (I did write them down on a piece of paper):
    66.249.78.* 66.249.64.* 66.249.79.*
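    A quick way to test an address against those three ranges is Python's standard `ipaddress` module. This is only a sketch: it treats each `66.249.x.*` pattern as a /24 network, the names `REPORTED_RANGES` and `in_reported_ranges` are invented for the example, and the list covers only the ranges written down here, not everything Google may crawl from.

```python
import ipaddress

# The three ranges reported in this post, written as /24 networks (assumption:
# "66.249.78.*" means every address from 66.249.78.0 to 66.249.78.255).
REPORTED_RANGES = [
    ipaddress.ip_network("66.249.64.0/24"),
    ipaddress.ip_network("66.249.78.0/24"),
    ipaddress.ip_network("66.249.79.0/24"),
]


def in_reported_ranges(ip):
    """True if the address falls inside one of the ranges listed above."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in REPORTED_RANGES)


if __name__ == "__main__":
    # Addresses quoted elsewhere in this thread.
    for candidate in ("66.249.64.47", "66.249.66.129", "66.249.65.212"):
        print(candidate, in_reported_ranges(candidate))
```

    Two of the addresses quoted by other posters fall outside these three ranges, so treat the list as incomplete rather than as a whitelist.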
     
    Redleg, Sep 16, 2004 IP
  8. a389951l

    a389951l Must Create More Content

    #8
    Yeah just checked my log files and noticed it too.

    Old Welsh Guy, how do we know that it can read JavaScript?
     
    a389951l, Sep 16, 2004 IP
  9. nadlay

    nadlay Guest

    #9
    One of my sites normally gets hit by Googlebot at the same time each day, but for the last 3 days, I've been getting two hits, with the second coming about 15 minutes after the first.

    I thought it strange, but hadn't had time to investigate, but now I look in my stats, and I'm also getting both GoogleBots, as Shawn described.
     
    nadlay, Sep 16, 2004 IP
  10. flawebworks

    flawebworks Tech Services

    #10
    I've been getting this one all night: 66.249.65.212
     
    flawebworks, Sep 16, 2004 IP
  11. digitalpoint

    digitalpoint Overlord of no one Staff

    #11
    This one hasn't grabbed any JavaScript as the Googlebot/Test bot did, but it is HTTP 1.1 like Googlebot/Test is/was. Just wish they would grab files compressed when available now (since 1.1 supports it).
     
    digitalpoint, Sep 16, 2004 IP
  12. SEbasic

    SEbasic Peon

    #12
    Could you clarify that please. Not too sure what you mean.
     
    SEbasic, Sep 16, 2004 IP
  13. digitalpoint

    digitalpoint Overlord of no one Staff

    #13
    You can set up your servers to compress (basically gzip) your HTML documents before sending them to a browser (if the browser supports HTTP 1.1, it's an option... it's not an option for 1.0). For example, this forum compresses the HTML sent to you. The bandwidth savings on this are pretty big. For example, this forum's main index page (when I just tested it) is 44,007 bytes, but since it's sent out compressed (which the client side decompresses), the bandwidth used is 9,099 bytes.
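    To get a feel for the mechanism, here is a rough Python sketch of what a server does: it looks at the request's `Accept-Encoding` header and gzips the body only when the client lists `gzip`. The function names and the sample page are made up for illustration, and the header parsing is simplified (it ignores `q=` weights), so this is not how any particular web server implements it.

```python
import gzip


def client_accepts_gzip(accept_encoding):
    """Simplified check of an Accept-Encoding header value (ignores q-values)."""
    codings = [part.split(";")[0].strip().lower()
               for part in accept_encoding.split(",")]
    return "gzip" in codings


def build_response(html, accept_encoding):
    """Return (body, extra_headers), compressing only if the client allows it."""
    if client_accepts_gzip(accept_encoding):
        return gzip.compress(html, compresslevel=1), {"Content-Encoding": "gzip"}
    return html, {}


# Hypothetical page: repetitive markup, much like a forum index.
page = (b"<html><body>"
        + b"<div class='thread'><a href='/t'>New Googlebot?</a></div>\n" * 500
        + b"</body></html>")

if __name__ == "__main__":
    body, headers = build_response(page, "gzip, deflate")
    print(len(page), "bytes uncompressed,", len(body), "bytes sent", headers)
```

    The exact savings depend on the page; repetitive HTML like a thread list compresses especially well.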
     
    digitalpoint, Sep 16, 2004 IP
  14. SEbasic

    SEbasic Peon

    #14
    WOW, that's a pretty big difference.

    And the new GoogleBot doesn't take advantage of that then?
     
    SEbasic, Sep 16, 2004 IP
  15. digitalpoint

    digitalpoint Overlord of no one Staff

    #15
    I didn't think so, but I just remembered that the server of mine it's spidering right now didn't have it turned on. So I just turned it on and waited for it, and lo and behold, it *is* using compression now!

    That is bad ASS, and something I was wishing for.
     
    digitalpoint, Sep 16, 2004 IP
  16. SEbasic

    SEbasic Peon

    #16
    I have a few questions about this if you don't mind - I really don't know anything about it.

    1- So, are there duplicates of each file sitting on your server then, or does the server recognise HTTP 1.1 and then serve the file with compression accordingly?

    2- Does it put a lot more stress on servers if you are running it?

    3- Does it increase loading times in the user's browser? Does it put more stress on the user's CPU (I guess the difference would be negligible if it does)?

    I did think of more questions, but I'm sure I could find the answers if I looked hard enough.
     
    SEbasic, Sep 16, 2004 IP
  17. digitalpoint

    digitalpoint Overlord of no one Staff

    #17
    It does not replicate data... it compresses it on the fly. It really depends on if your server is more bandwidth limited or CPU limited if it's worth turning on or not. I run it at the lowest compression level so it doesn't stress the CPU (my servers get a lot of traffic). Loading time should actually be a little faster for the user because they have less data to download. Really just depends on how fast their computer can decompress the file, compared to downloading a larger one.

    A simple way to turn it on for PHP files only would be to add this to your .htaccess file:

    php_value zlib.output_compression 1
    php_value zlib.output_compression_level 1
    The higher the compression_level number, the better the compression (but more CPU overhead).
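    If you want to see the level trade-off for yourself, Python's `zlib` module exposes the same kind of 1 to 9 level setting, so it can stand in for a quick experiment. This is a sketch under that assumption: `sizes_by_level` and the sample page are invented for the example, and real numbers will depend on your own pages and hardware.

```python
import time
import zlib


def sizes_by_level(data, levels=(1, 6, 9)):
    """Compressed size in bytes for each zlib level."""
    return {level: len(zlib.compress(data, level)) for level in levels}


def seconds_to_compress(data, level, repeats=20):
    """Rough wall-clock cost of compressing the same data `repeats` times."""
    start = time.perf_counter()
    for _ in range(repeats):
        zlib.compress(data, level)
    return time.perf_counter() - start


# Hypothetical page content; swap in one of your own HTML files.
sample_page = (b"<html><body>"
               + b"<p>Post by digitalpoint, Sep 16, 2004</p>\n" * 1000
               + b"</body></html>")

if __name__ == "__main__":
    print("original:", len(sample_page), "bytes")
    for level, size in sizes_by_level(sample_page).items():
        cost = seconds_to_compress(sample_page, level)
        print("level", level, "->", size, "bytes,", round(cost, 4), "s for 20 runs")
```

    On a busy server, the extra bytes saved by the higher levels are often not worth the added CPU time, which is the reasoning behind running at level 1.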
     
    digitalpoint, Sep 16, 2004 IP
  18. SEbasic

    SEbasic Peon

    #18
    Thanks for that, Shawn.

    So if I wanted to find out a little more about it, what would be the correct terminology to use in a search?

    How would that .htaccess file be used with a .cfm extension?
     
    SEbasic, Sep 16, 2004 IP
  19. digitalpoint

    digitalpoint Overlord of no one Staff

    #19
    digitalpoint, Sep 16, 2004 IP
  20. SEbasic

    SEbasic Peon

    #20
    Thanks for that. I'll look into it. Could save some cash...
     
    SEbasic, Sep 16, 2004 IP