Dos and Don'ts of the Googlebot

Discussion in 'Site & Server Administration' started by Diddy1, Jul 4, 2006.

  1. #1
    Google AdSense relies on the AdSense spider, which is closely related to (or the same as) Googlebot, to deliver relevant ads. So let's explore this spider and find out how to improve our sites through it. What most webmasters don't realize is that Googlebot is a lot smarter than most people think; it has been coded to cope with the rising number of spam websites out there. I've compiled a list of things it likes and things that may get your site dropped by Google, which, needless to say, will leave your Google AdSense profits out of the loop. Let's start with the negatives, the things you shouldn't do:

    First, duplicate content is an immediate turn-off. Google doesn't like it when 1,000 websites all say the same thing, so think twice before republishing articles from article directories.
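    Google doesn't publish how it detects duplicate content. A common textbook technique for comparing two texts is Jaccard similarity over word "shingles" (runs of consecutive words); this minimal sketch uses it to check a draft against an article-directory copy. The 3-word shingle size, the function names, and the sample sentences are illustrative assumptions, not Google's method.

```python
import re

def shingles(text, size=3):
    """Return the set of overlapping word n-grams ("shingles") in a text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def jaccard_similarity(text_a, text_b, size=3):
    """Shared shingles divided by all distinct shingles: 1.0 means identical wording."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical example: your draft vs. a copy from an article directory.
mine = "Google AdSense relies on the spider to deliver relevant ads to your pages."
copied = "Google AdSense relies on the spider to deliver relevant ads to your site."
print(round(jaccard_similarity(mine, copied), 2))  # prints 0.83
```

    A score close to 1.0 means the two texts share almost all of their wording; rewriting an article in your own words drives the score down.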

    Second, avoid nested tables, that is, tables placed inside the cells of other tables. What you see is orderly content, but underneath it's a lot of code. My advice: stay away from nested tables and stick with normal tables.
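    Spotting nested tables in your own templates can be automated. A minimal sketch using Python's standard html.parser that reports how many levels deep <table> elements sit inside each other; the class and function names and the sample markup are hypothetical.

```python
from html.parser import HTMLParser

class TableDepthParser(HTMLParser):
    """Track how deeply <table> elements are nested inside each other."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lowercase.
        if tag == "table":
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1

def max_table_nesting(html):
    """Return 0 for no tables, 1 for plain tables, 2+ for nested tables."""
    parser = TableDepthParser()
    parser.feed(html)
    parser.close()
    return parser.max_depth

page = "<table><tr><td><table><tr><td>Menu</td></tr></table></td></tr></table>"
print(max_table_nesting(page))  # prints 2
```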

    Third, the so-called doorway pages, which basically involve loading a page with keywords and then redirecting visitors to another page. Again, a major no-no.

    Fourth, dynamic URLs, which change depending on the information entered (the parameters after the "?" in the URL). Too many of these make the spider spend more time than necessary on your website, which brings your ranking down.
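    To see how quickly dynamic URLs multiply, you can group a list of URLs (for example, ones pulled from your server logs) by path and count the distinct query strings behind each script. A minimal sketch; the URLs are made up, and treating reordered parameters as the same page is an illustrative assumption.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl

def variants_per_path(urls):
    """Group URLs by path and count how many distinct query strings each path has."""
    variants = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        # Sort parameters so ?a=1&b=2 and ?b=2&a=1 count as the same variant.
        params = tuple(sorted(parse_qsl(parts.query)))
        variants[parts.path].add(params)
    return {path: len(queries) for path, queries in variants.items()}

urls = [
    "http://www.example.com/products.php?cat=1&sort=price",
    "http://www.example.com/products.php?sort=price&cat=1",
    "http://www.example.com/products.php?cat=1&sort=name",
    "http://www.example.com/products.php?cat=2&sort=price&sessionid=abc",
    "http://www.example.com/about.html",
]
print(variants_per_path(urls))  # prints {'/products.php': 3, '/about.html': 1}
```

    One script producing many URL variants is exactly what keeps a spider busy; parameters such as session IDs are the usual culprits.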

    Fifth, the HTML, PHP, or whatever code you use needs to be kept to a minimum. More code than actual content is bad for your website, since Googlebot just sees text.
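    One way to put a number on this is a text-to-HTML ratio measured on the page as it is served (after any PHP has run). A minimal sketch using Python's standard html.parser; the ratio is an illustrative metric, the post gives no target value, so treat it only as a rough comparison between your own pages.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip > 0:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)

def text_to_html_ratio(html):
    """Non-whitespace visible characters divided by total characters of markup."""
    if not html:
        return 0.0
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join("".join(parser.chunks).split())
    return len(text) / len(html)

page = "<html><head><style>p { color: red; }</style></head><body><p>Hello spider</p></body></html>"
print(round(text_to_html_ratio(page), 2))
```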

    Sixth is keyword overloading, a scam many use to get a higher ranking; it only works for a while and then gets you completely dropped from Google. Putting nothing but keywords on a page will rank you high the first time the bot crawls your site, but as time goes on it learns, and you pay.

    Seventh, of course, is self-promotion: putting too many links to your homepage on your subpages. It sounds like it should work for you, but it doesn't, because Googlebot knows which links come from which domain.
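    Telling a page's own links from outside links only takes the host part of each resolved URL. A minimal sketch, assuming "internal" means exactly the same host name as the page (so www.example.com and example.com count as different); the function names, page URL, and links are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def classify_links(html, page_url):
    """Count links pointing at the page's own host vs. any other host."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    own_host = urlsplit(page_url).hostname
    counts = {"internal": 0, "external": 0}
    for href in parser.hrefs:
        # Resolve relative links first, then compare host names.
        host = urlsplit(urljoin(page_url, href)).hostname
        counts["internal" if host == own_host else "external"] += 1
    return counts

html = ('<a href="/">Home</a><a href="page2.html">Next</a>'
        '<a href="http://www.example.com/">Home again</a><a href="http://other.org/">Partner</a>')
print(classify_links(html, "http://www.example.com/articles/page1.html"))
# prints {'internal': 3, 'external': 1}
```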

    Eighth is leaving the same content on a page forever. One of the fundamental rules of a good webmaster is to keep your website fresh at all times.

    Ninth is iframes. These frames do nothing short of confusing Googlebot, since it indexes each frame as a separate page, leaving your content in disarray. That's also why Google ads only respond to the content of the iframe they are placed in.

    That's about it for the negatives. Check that you don't use any of these methods; this will keep you in the neutral zone. But we want more: we want to go positive and make sure Googlebot leaves our website with enough information to satisfy Google AdSense and earn a good SEO ranking. So here is a list of things to do to get ahead in the spider game:

    First things first: updated content is a must. Update your website daily if you can. Ever wonder why blogs and forums do so well in search engines? Because their content is constantly changing, not to mention original.

    Second, use a tool from Google called Google Sitemaps. A sitemap is a complete layout of your website that you submit to Google, so it already knows where to crawl, which makes Googlebot's job much easier. If you haven't already, sign up at www.google.com/webmasters/sitemaps/login. It's free and easy.
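    A sitemap is a plain XML file. A minimal sketch that writes one in the sitemaps.org protocol format (urlset, url, loc, lastmod), which Google supports; the function name, page URLs, and dates are hypothetical, and the file still has to be submitted to Google separately.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Build sitemap XML from (url, lastmod) pairs; lastmod is 'YYYY-MM-DD' or None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & automatically, as the protocol requires.
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

print(build_sitemap([
    ("http://www.example.com/", "2006-07-04"),
    ("http://www.example.com/list.php?cat=1&page=2", None),
]))
```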

    Third, keep your code free of errors and to a minimum. There are a lot of tools out there that check your website's code for errors. Even if your content is generated automatically, check it just to make sure; machines aren't perfect.

    Fourth, make sure you have a lot of relevant backlinks with different anchor texts. Your website shouldn't always be linked as "All About Google Adsense"; someone else can link to it under a different title, like "Google Adsense Info Blog". This keeps the Google spider from mistaking your backlinks for link-farm backlinks, which do more harm than good.
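    A quick way to see whether your backlinks all use the same title is to tally their anchor texts. A minimal sketch, assuming you already have a list of anchor texts (the data and function name here are made up); the post gives no cut-off, so the share is only something to watch over time.

```python
from collections import Counter

def dominant_anchor(anchors):
    """Return the most common anchor text (case-insensitive) and its share of all backlinks."""
    if not anchors:
        return None, 0.0
    counts = Counter(a.strip().lower() for a in anchors)
    text, n = counts.most_common(1)[0]
    return text, n / len(anchors)

# Hypothetical anchor texts of links pointing at your site.
backlinks = ["All About Google Adsense"] * 3 + ["Google Adsense Info Blog", "AdSense tips"]
print(dominant_anchor(backlinks))  # prints ('all about google adsense', 0.6)
```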

    Fifth, keep your keyword density normal; about 3-7% should be good enough. Remember: moderation in all things. You don't want too much or too little.
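    Keyword density is usually computed as the number of words taken up by the keyword divided by the page's total word count, as a percentage. A minimal sketch of that formula; counting every word of a multi-word phrase is one common convention rather than the only one, and the 3-7% band comes from the post above.

```python
import re

def keyword_density(text, keyword):
    """Percentage of the text's words taken up by occurrences of the keyword phrase."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    phrase = re.findall(r"[a-z0-9']+", keyword.lower())
    if not words or not phrase:
        return 0.0
    n = len(phrase)
    # Count every position where the full phrase appears.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == phrase)
    return 100.0 * hits * n / len(words)

page_text = "AdSense " + "filler " * 19  # 20 words, 1 of them the keyword
print(keyword_density(page_text, "adsense"))  # prints 5.0
```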

    Sixth, make sure each of your images has an ALT attribute so Googlebot understands what the image stands for, since it can only read text.
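    Missing ALT attributes are easy to find automatically. A minimal sketch with Python's standard html.parser that lists the src of every <img> without a non-empty alt; an empty alt="" is a legitimate choice for purely decorative images, which this sketch still flags for review. The class name and sample markup are hypothetical.

```python
from html.parser import HTMLParser

class MissingAltFinder(HTMLParser):
    """Record the src of every <img> whose alt is absent or blank."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag == "img":
            attr_map = dict(attrs)
            alt = attr_map.get("alt")
            if alt is None or not alt.strip():
                self.missing.append(attr_map.get("src", "(no src)"))

html = '<img src="logo.gif" alt="Site logo"><img src="banner.jpg"><img src="spacer.gif" alt="" />'
finder = MissingAltFinder()
finder.feed(html)
finder.close()
print(finder.missing)  # prints ['banner.jpg', 'spacer.gif']
```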

    Seventh, always keep your webpages static unless it's unavoidable. Too much generated content is bad for spiders.

    If you follow all of these, your website should be one of the most Googlebot-friendly websites out there, keeping your Google AdSense ads relevant and giving you a higher Google PageRank (PR). But make sure you keep up to date with new developments in the Google spider field.

    Thank You
     
    Diddy1, Jul 4, 2006
  2. MikeSwede

    #2
    I have been watching Googlebot for a couple of days, and from what I can see it is all confused.
    This is one page it tried to access:
    /search/index.php/food_companies.php
    I have both pages, but there is no rhyme or reason why it tries to access food_companies.php that way. I even have fully qualified paths in my menu, which is where I think the bot gets the URL, PLUS a sitemap, so what the heck is going on here?
    My site got hundreds of pages moved to supplemental hell about a week ago just because Googlebot got totally out of whack: it tried to get to pages in subfolders of subfolders that weren't even there, and nothing in my paths would even lead it to try to crawl them.
    So my guess is that Googlebot is just screwed up, and I say that since Yahoo and MSN are crawling the same site without problems!! :mad:
     
    MikeSwede, Jul 4, 2006
  3. nddb

    #3
    You can have 500 pages of PHP code; the bot never sees PHP, it only sees the HTML after the PHP has been processed.

    Googlebot should have no idea what is generated by PHP and what is static, beyond the file extension, which of course you can change.
     
    nddb, Jul 4, 2006
  4. vlasta

    #4
    Well, Googlebot might be broken, but this could also be your problem: a problem with relative URLs...
    If you have the page www.yoursite.com/search/ and there is a link like <a href="index.php"> instead of <a href="/index.php">, it'll have these symptoms. It happened to me once too. :mad:
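    Python's urljoin resolves a relative link against the page's URL much as a browser or crawler does, so it shows the difference between the two links directly. A minimal sketch using the example page from the post.

```python
from urllib.parse import urljoin

page = "http://www.yoursite.com/search/"

# A relative link is resolved against the current directory (/search/) ...
print(urljoin(page, "index.php"))   # prints http://www.yoursite.com/search/index.php
# ... while a root-relative link always starts from the site root.
print(urljoin(page, "/index.php"))  # prints http://www.yoursite.com/index.php
```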
     
    vlasta, Jul 4, 2006
  5. jquindlen

    #5
    Nice post Diddy. Some good tips in there.
     
    jquindlen, Jul 5, 2006
  6. MikeSwede

    #6
    So why does this happen all of a sudden? This is not the worst case. There are even worse things, with subdirectories and pages nested. It just seems like a mess, and I don't know where they get all these pages from. The site is 6 years old and has never had any problems with paths!!
     
    MikeSwede, Jul 5, 2006
  7. xeno

    #7
    Yes, very informative post. More than I ever thought I needed to know about the Googlebot. Thanks
     
    xeno, Jul 6, 2006
  8. vlasta

    #8
    Could be that someone else linked to your page with a trailing slash. Some webmasters think that if a URL has no extension, it must be a directory, and add the slash to the link themselves. Once Googlebot reads the page with this URL, it tries to fetch it from your site, and you return the page with relative links, which are correct without the trailing slash but wrong in this case. And it could propagate.
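    The trailing-slash effect and the propagation can both be reproduced with urljoin. A minimal sketch, assuming the server ignores the extra path after index.php and returns the same page (and therefore the same relative links) for every such URL; the domain www.example.com, the link "more/", and the function name are hypothetical, while the food_companies.php path is the one from this thread.

```python
from urllib.parse import urljoin

# The same relative link, resolved from the intended URL and from a copy with a trailing slash.
print(urljoin("http://www.example.com/search/index.php", "food_companies.php"))
# prints http://www.example.com/search/food_companies.php
print(urljoin("http://www.example.com/search/index.php/", "food_companies.php"))
# prints http://www.example.com/search/index.php/food_companies.php

def follow_relative_link(start_url, relative_link, steps):
    """Resolve the same relative link repeatedly, as a crawler would if every
    resulting URL served the same page (assumed server behaviour)."""
    urls = [start_url]
    for _ in range(steps):
        urls.append(urljoin(urls[-1], relative_link))
    return urls

for url in follow_relative_link("http://www.example.com/search/index.php/", "more/", 3):
    print(url)
```

    Each crawl step adds another level of fake subfolders, which matches the "subfolders of subfolders" symptom described earlier in the thread. Root-relative links (starting with "/") do not drift this way.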
     
    vlasta, Jul 6, 2006