Need to send Google my log

Discussion in 'Traffic Analysis' started by PedstersPlanet, Jan 23, 2006.

  1. #1
    I need to send Google my log, as Googlebot is eating more of my bandwidth than usual. I sent them a copy-and-paste of my Webalizer report, but they replied that they need the detailed page entries from my Apache log. How can I do this without wasting time going through a 20 MB log by hand?

    Here's my January webalizer log for GBot:

    Any ideas?
     
    PedstersPlanet, Jan 23, 2006
  2. just-4-teens

    just-4-teens Peon

    Messages:
    3,967
    Likes Received:
    168
    Best Answers:
    0
    Trophy Points:
    0
    #2
    You should first check that you haven't got a dodgy page or script somewhere (a forum with session IDs?) that is trapping Googlebot in a loop.

    I'm not sure how you would check, but I'm sure someone can help.
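
One rough way to check for a session-ID trap is to count how many Googlebot requests carry a session-like parameter in the URL. This is only a sketch: it assumes Apache's combined log format (request path in the seventh whitespace-separated field), and the parameter names in `SESSION_PATTERN` are guesses that would need adjusting to whatever your forum software actually uses.

```shell
#!/bin/sh
# Hypothetical parameter names; adjust to match your own scripts.
SESSION_PATTERN='[?&;](sid|s|phpsessid|sessionid)=[0-9a-f]{8,}'

# Print the number of Googlebot requests whose URL contains a session-like ID.
# $1 = path to the Apache access log (combined format assumed).
count_session_hits() {
    grep 'Googlebot' "$1" \
        | awk '{ print $7 }' \
        | grep -Eic "$SESSION_PATTERN"
}

# List the scripts behind those requests (query string stripped),
# most frequently crawled first.
session_paths() {
    grep 'Googlebot' "$1" \
        | awk '{ print $7 }' \
        | grep -Ei "$SESSION_PATTERN" \
        | sed 's/[?].*//' \
        | sort | uniq -c | sort -rn | head -n 20
}
```

For example, `count_session_hits access.log` prints a single number. If it is a large share of Googlebot's total hits, and `session_paths access.log` shows the same one or two scripts over and over, the bot is probably re-crawling identical pages under different session IDs.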
     
    just-4-teens, Jan 23, 2006
  3. tstaut

    tstaut Active Member

    Messages:
    408
    Likes Received:
    18
    Best Answers:
    0
    Trophy Points:
    90
    #3
    If you can figure out where it's using all that bandwidth, robots.txt would be a good solution on your side (without having to depend on G).
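
As a sketch of what that could look like, the helper below prints a robots.txt block that keeps Googlebot out of whatever path prefixes you pass it. The function name and the example path are made up for illustration; only the `User-agent` and `Disallow` lines themselves are standard robots.txt syntax.

```shell
#!/bin/sh
# Print robots.txt rules that disallow Googlebot from each given path prefix.
# Arguments: one or more URL path prefixes, e.g. /forum/search.php
robots_rules() {
    printf 'User-agent: Googlebot\n'
    for prefix in "$@"; do
        printf 'Disallow: %s\n' "$prefix"
    done
}
```

Something like `robots_rules /forum/search.php >> robots.txt` (hypothetical path) would append the rules to an existing file in your web root. Look over the result before relying on it, since a too-broad prefix will drop pages you do want indexed.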
     
    tstaut, Jan 24, 2006
  4. dave487

    dave487 Peon

    Messages:
    701
    Likes Received:
    20
    Best Answers:
    0
    Trophy Points:
    0
    #4
    Find out which pages Google was visiting, then decide whether you want to allow it to visit all of them.
     
    dave487, Jan 24, 2006
  5. blinxdk

    blinxdk Peon

    Messages:
    660
    Likes Received:
    27
    Best Answers:
    0
    Trophy Points:
    0
    #5
    grep Googlebot access.log >> fileforgoogle.txt

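    This writes every line of access.log that contains the string Googlebot (normally in the user-agent field) to a separate file. Note that >> appends, so delete fileforgoogle.txt before running it a second time.

    It requires Apache's combined log format, which includes the user agent in the main log file.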
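
To see where the bandwidth is actually going, the same lines can be totalled per URL. The following is a minimal sketch, again assuming the combined log format, where the seventh whitespace-separated field is the request path and the tenth is the response size in bytes. The function name is made up, and URLs containing literal spaces would throw the field positions off.

```shell
#!/bin/sh
# Summarize Googlebot traffic per URL as "bytes hits url", biggest first.
# $1 = access log (or the already-filtered fileforgoogle.txt)
# $2 = number of rows to show (optional, defaults to 20)
googlebot_summary() {
    grep 'Googlebot' "$1" \
        | awk '{
              bytes = ($10 == "-") ? 0 : $10   # "-" means no body was sent
              hits[$7]++
              total[$7] += bytes
          }
          END {
              for (url in hits)
                  printf "%.0f %d %s\n", total[url], hits[url], url
          }' \
        | sort -rn \
        | head -n "${2:-20}"
}
```

Running `googlebot_summary access.log 10` lists the ten URLs that cost the most bytes, which should be enough to decide whether the raw lines are worth sending to Google at all.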
     
    blinxdk, Jan 24, 2006
  6. blinxdk

    #6
    Btw, I don't think that hit count sounds too bad if you have a few pages; I usually get 1,500+ hits a day from Googlebot.
     
    blinxdk, Jan 24, 2006
  7. PedstersPlanet

    PedstersPlanet Peon

    Messages:
    195
    Likes Received:
    11
    Best Answers:
    0
    Trophy Points:
    0
    #7
    D'oh, I didn't think of that, lol. Thanks. This will tell me whether it's worth contacting Google again. I have a 100 GB monthly limit (server-wide), but my server currently hosts 4 sites, so I have to keep an eye on the bandwidth.

    Thanks for the help :)
     
    PedstersPlanet, Jan 24, 2006
  8. PedstersPlanet

    #8
    I have more dynamic pages than static pages, something like 10,000+ (18,600 indexed by G). Anyway, I'll use the grep suggestion and then decide.
     
    PedstersPlanet, Jan 24, 2006
  9. dave487

    #9
    I think if you have 18,000 pages you should expect/want G to look at them all at least once a month.
     
    dave487, Jan 25, 2006
  10. PedstersPlanet

    #10
    True, I do. However, most pages (even the dynamic ones) do not change content; it's my home page that changes on every page load (RSS feed stuff). So I'm not sure why Googlebot keeps visiting those "static" pages. Unless, of course, the COOP Network links count as page changes?
     
    PedstersPlanet, Jan 25, 2006
  11. dave487

    #11
    If the page changes, even slightly, between crawls then G will increase the frequency of crawls as it thinks you are getting fresh content.

    The more frequently G spiders your site the more important it must think it is.
     
    dave487, Jan 25, 2006