Google Bots can't crawl URLs of my directory properly! Is it .htaccess or???

Discussion in 'Directories' started by jhnrang, Oct 30, 2007.

  1. #1
    For the last few weeks my directory has been losing indexed pages on Google. Out of 3500+ pages indexed previously, only around 125 are indexed now. :(

    I have tried a lot to get the pages re-indexed, from getting quality links to submitting an XML sitemap through Google Webmaster Tools.

    And yes, the directory appears to be penalized :( as it does not rank even for its own title.

    But I have checked many other penalized and non-penalized directories. Many penalized directories still have a good number of indexed pages, while some non-penalized directories have also lost indexed pages.

    My question is "why are Google bots doing this?"

    I am currently using phpLD version 3.1. :(

    Here is a screenshot of Googlebot crawling a few hours ago, taken from my latest visitors list:

    [Screenshot of Googlebot requests in the visitor list; image not preserved]
     
    jhnrang, Oct 30, 2007 IP
  2. mikey1090

    mikey1090 Moderator Staff

    Messages:
    15,869
    Likes Received:
    1,055
    Best Answers:
    0
    Trophy Points:
    445
    Digital Goods:
    2
    #2
    mikey1090, Oct 30, 2007 IP
  3. an0n

    an0n Prominent Member

    Messages:
    5,688
    Likes Received:
    915
    Best Answers:
    0
    Trophy Points:
    360
    #3
    Those URLs are 404s, and they should be.

    Obviously the bot has crawled an incorrect URL, so look at the 404 and don't worry about it.

    Also, check your sitemap to be sure you aren't giving those URLs to Google.
     
    an0n, Oct 30, 2007 IP
    jhnrang likes this.
  4. Fastian

    Fastian Peon

    Messages:
    2,085
    Likes Received:
    235
    Best Answers:
    0
    Trophy Points:
    0
    #4
    If you have made any recent changes (a new mod or something), check whether that's causing the trouble.

    Take a backup, undo the changes, and see whether anything improves.
     
    Fastian, Oct 30, 2007 IP
  5. SilkySmooth

    SilkySmooth Well-Known Member

    Messages:
    1,583
    Likes Received:
    269
    Best Answers:
    0
    Trophy Points:
    180
    #5
    This is actually a flaw in either phpLD or the template design. I can't remember which, but I do remember having the same problem and fixing it on Directory Share.

    I will use Allinfodir as an example to explain what happens. Let's say someone gives the owner a link to his Games category. The link should look like this:

    allinfodir.com/Games/

    But, the linker makes a mistake and links to:

    allinfodir.com/Game/

    Now copy and paste that URL into your browser. You will get a 404 header and be shown the homepage of the site, but mouse over any of the category links and you will see that they have /Game/ included, like so:

    allinfodir.com/Game/

    allinfodir.com/Game/Arts_and_Humanities/
    allinfodir.com/Game/Arts_and_Humanities/Fashion_Houses/
    allinfodir.com/Game/Arts_and_Humanities/Animation/

    And so on, thus creating a whole subset of 404s which Google crawls through and reports. This will work for anything you type in. Try:

    allinfodir.com/this-dont-exist/
    allinfodir.com/anything-here/

    So bad inbound links, renaming a category that has already been crawled, and so on will all cause these 404s to appear in Google Webmaster Tools.
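
    To make the mechanism concrete, here is a minimal Python sketch of how a browser or crawler resolves a category link against a mistyped URL. It assumes the category links are written without a leading slash, which is what the mouse-over behaviour suggests. The domain `directory.example` and the helper name `resolve` are hypothetical, used only for illustration.

```python
from urllib.parse import urljoin

# Hypothetical mistyped URL that answers with a 404 header plus the homepage markup.
bad_page = "http://directory.example/Game/"

# A category link written without a leading slash (path-relative).
relative_href = "Arts_and_Humanities/"

# The same link written with a leading slash (root-relative), for comparison.
root_relative_href = "/Arts_and_Humanities/"


def resolve(page_url, href):
    """Resolve href against page_url using standard relative-reference rules."""
    return urljoin(page_url, href)


# The path-relative link inherits the bogus /Game/ prefix.
print(resolve(bad_page, relative_href))
# http://directory.example/Game/Arts_and_Humanities/

# The root-relative link ignores the bogus prefix.
print(resolve(bad_page, root_relative_href))
# http://directory.example/Arts_and_Humanities/
```

    Every link on the bogus page inherits the wrong prefix, so one bad inbound link is enough to expose a whole tree of non-existent URLs.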

    Note: I deliberately didn't link to these URLs. Please do not quote this post and activate these 404s!

    This problem exists on most directories: Alive, Aviva, DirJournal, Directory Dump, etc.

    As I said I can't remember exactly how I fixed it, but I will see if I can find what I did and post the solution here.
     
    SilkySmooth, Nov 1, 2007 IP
    robjones likes this.
  6. mikey1090

    mikey1090 Moderator Staff

    Messages:
    15,869
    Likes Received:
    1,055
    Best Answers:
    0
    Trophy Points:
    445
    Digital Goods:
    2
    #6
    I think version 3.2 of phpLD doesn't have that problem, Silky. Mine works fine :)
     
    mikey1090, Nov 1, 2007 IP
  7. jhnrang

    jhnrang Notable Member

    Messages:
    4,107
    Likes Received:
    436
    Best Answers:
    0
    Trophy Points:
    225
    #7
    Thanks Rob (an0n) and Silky for your advice. I reckon it's another problem set up by Google to disturb directories. I started making some corrections yesterday, and judging by how I see Googlebot crawling my pages in cPanel, I hope to have 6K+ indexed pages by the end of November.
    Please wish me luck. :)
     
    jhnrang, Nov 1, 2007 IP
  8. SilkySmooth

    SilkySmooth Well-Known Member

    #8
    If you were referring to Zorg Directory then I'm afraid that it does:

    zorg-directory.com/sdfs/

    Mouse over the category structure.

    I have seen this problem on most versions of phpLD, including 3.2.
     
    SilkySmooth, Nov 1, 2007 IP
  9. jhnrang

    jhnrang Notable Member

    #9
    Hi Silky.
    Can you please check my PR6 directory, if you know it? :( Otherwise, please tell me whether I should send the URL by PM. This is an industry-wide problem, not only mine. If we can solve it, the industry will be a better place again. :)
     
    jhnrang, Nov 1, 2007 IP
  10. enQuira

    enQuira Peon

    Messages:
    1,584
    Likes Received:
    250
    Best Answers:
    0
    Trophy Points:
    0
    #10
    As long as those pages return a 404, I don't think there is a problem. I don't think Google follows URLs in a 404 page.
    One old version of phpLD had a problem with the setting that forces 404/200 OK headers, but I don't remember the exact issue.

    The problem with this propagation is that the URLs are relative. They should be absolute. Then, if there is a problem with one URL, you at least won't generate all those wrong URLs.

    There is a more serious flaw, where the URLs return 200 OK headers and can be followed by the crawlers...
    http://forums.digitalpoint.com/showthread.php?t=295586
     
    enQuira, Nov 1, 2007 IP
  11. SilkySmooth

    SilkySmooth Well-Known Member

    #11
    OK, I just tested this on another directory and it is a template issue.

    Open up your main.tpl. (This file is usually the same across most templates, with only a few minor changes, which explains why the problem is so widespread.)

    Locate the lines which start:

    
    <a href="{if $smarty.const.ENABLE_REWRITE}
    
    Code (markup):
    Just add a leading slash to each of them, like so:

    
    <a href="/{if $smarty.const.ENABLE_REWRITE}
    
    Code (markup):
    Then retry a 404 URL and you shouldn't have the problem anymore.

    Sorry I can't give line numbers as it varies from template to template.
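
    Since the affected lines differ between templates, one way to confirm that none were missed is to scan a rendered page for links that are still path-relative. The following is a rough Python sketch, not part of phpLD. The names `RelativeLinkFinder` and `find_relative_links` and the sample markup are hypothetical, and it only inspects `<a href>` attributes.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class RelativeLinkFinder(HTMLParser):
    """Collect <a href> values that are neither absolute nor root-relative."""

    def __init__(self):
        super().__init__()
        self.relative_hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        parts = urlsplit(href)
        # Skip absolute URLs (scheme or host present), root-relative paths,
        # and same-page fragments; anything left is path-relative.
        if parts.scheme or parts.netloc or href.startswith(("/", "#")):
            return
        self.relative_hrefs.append(href)


def find_relative_links(html):
    """Return the path-relative hrefs found in the given HTML string."""
    finder = RelativeLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.relative_hrefs


# Hypothetical rendered output: one unpatched link, one patched, one absolute.
sample = """
<a href="Arts_and_Humanities/">Arts</a>
<a href="/Games/">Games</a>
<a href="http://directory.example/Business/">Business</a>
"""

print(find_relative_links(sample))  # ['Arts_and_Humanities/']
```

    An empty list for the homepage markup would suggest that every category link now starts from the site root.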

    HTH
     
    SilkySmooth, Nov 1, 2007 IP
    mikey1090 and enQuira like this.
  12. SilkySmooth

    SilkySmooth Well-Known Member

    #12
    Umm, not strictly true, because the 404 page is a replica of the homepage, including meta tags, and a vast proportion of them have the following...

    
    <meta name="robots" content="index, follow" />
    
    Code (markup):
    Anyway, the fix worked for me. I haven't had any 404s reported by Google for months.
     
    SilkySmooth, Nov 1, 2007 IP
    jhnrang likes this.
  13. enQuira

    enQuira Peon

    #13
    Yep, that makes the paths absolute.
    The second issue I mentioned is more serious.
     
    enQuira, Nov 1, 2007 IP
    jhnrang and SilkySmooth like this.
  14. enQuira

    enQuira Peon

    #14
    Yes, but I think that when there is a 404 header the crawler won't read the rest of the file at all. Anyway, that's not the issue, as absolute paths do the job.
     
    enQuira, Nov 1, 2007 IP
  15. jhnrang

    jhnrang Notable Member

    #15
    Thanks, Silky mate, it was exactly as you described. I have added the / and hope it won't disturb my ongoing project of restructuring all the URLs to make them unique.
     
    jhnrang, Nov 1, 2007 IP