is it possible?

Discussion in 'robots.txt' started by shailendra, Jan 29, 2009.

  1. #1
    Hello friends,

    Suppose I create a robots.txt file with the following entries:

    User-Agent: *
    Allow: /
    Disallow: /index.html


    Will this stop the spider from crawling the home page, or will it still be crawled?
    Someone told me that http://www.xyz.com/ and http://www.xyz.com/index are two different URLs.

    Thanks & Regards
    Shailendra
     
    shailendra, Jan 29, 2009 IP
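
For reference, here is a minimal Python sketch of how the three lines in #1 would be evaluated, assuming the crawler uses longest-match precedence (the most specific matching rule wins, and Allow wins a tie), which is what Google documents and what RFC 9309 specifies. The function name `is_allowed` and the `(directive, pattern)` rule list are hypothetical, made up for this illustration. Wildcards are not handled, and other crawlers may evaluate rules differently.

```python
def is_allowed(path, rules):
    """Return True if `path` may be crawled under longest-match precedence.

    `rules` is a list of (directive, pattern) pairs such as ("allow", "/").
    Only plain prefix patterns are supported (no * or $ wildcards).
    """
    best_len = -1
    allowed = True  # a path matched by no rule is allowed by default
    for directive, pattern in rules:
        # An empty pattern matches nothing; skip rules that do not apply.
        if not pattern or not path.startswith(pattern):
            continue
        is_allow = directive.lower() == "allow"
        # The longest pattern wins; on equal length, Allow wins.
        if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
            best_len = len(pattern)
            allowed = is_allow
    return allowed


# The rules from post #1.
rules = [("allow", "/"), ("disallow", "/index.html")]

for path in ("/", "/index.html", "/about.html"):
    print(path, is_allowed(path, rules))
```

Under that assumption only the URL /index.html is blocked, while / itself stays crawlable. The sketch says nothing about what the server does when / is requested: if / redirects to /index.html, the crawler would end up at a blocked URL.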
  2. manish.chauhan

    manish.chauhan Well-Known Member

    Messages:
    1,682
    Likes Received:
    35
    Best Answers:
    0
    Trophy Points:
    110
    #2
    Yes, it'll stop the spider from crawling the home page.

    xyz.com and xyz.com/index.html are physically the same page; however, Google treats them as two different pages.
     
    manish.chauhan, Jan 29, 2009 IP
  3. shailendra

    shailendra Peon

    Messages:
    1,225
    Likes Received:
    18
    Best Answers:
    0
    Trophy Points:
    0
    #3
    Yes, they are physically the same page. But if it stops the home page from being crawled, why do we get two entries in the sitemap for the home page, i.e. one without index.html and one with index.html? How does PR get distributed between the two?
     
    shailendra, Jan 29, 2009 IP
  4. manish.chauhan

    #4
    Google considers these two separate pages, so your PR is also split between them.

    To avoid this, I suggest you canonicalize your website's URLs so that only one version of the home page is used.
     
    manish.chauhan, Jan 29, 2009 IP
  5. shailendra

    #5
    I have done that... but will what I wrote in the robots.txt file do any good?
     
    shailendra, Jan 29, 2009 IP
  6. manish.chauhan

    #6
    Sorry??
     
    manish.chauhan, Jan 29, 2009 IP
  7. ggmittal

    ggmittal Guest

    Messages:
    27
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #7

    Hello friend... I did the same with my website, and the result was that Google showed an error saying it was unable to crawl the homepage. The best solution for this problem is to edit .htaccess and redirect yoursite.com/index.html to yoursite.com... it will definitely work.
     
    ggmittal, Feb 13, 2009 IP
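
The redirect suggested in #7 would normally be written as rewrite rules in .htaccess. The logic behind it can be sketched in Python as below, assuming the index file is named index.html and that every directory's index file should be folded into the directory URL. The function name `canonical_redirect` and the `(status, location)` return shape are hypothetical, and query strings are ignored. This is an illustration of the idea, not a drop-in server configuration.

```python
INDEX_FILE = "index.html"  # assumed name of the directory index file


def canonical_redirect(path):
    """Return (status, location) for a requested URL path.

    A path ending in /index.html gets a permanent (301) redirect to the
    bare directory URL; any other path is served as-is (200).
    """
    if path.endswith("/" + INDEX_FILE):
        # Strip the file name but keep the trailing slash.
        return 301, path[: -len(INDEX_FILE)]
    return 200, path


for path in ("/index.html", "/", "/blog/index.html", "/about.html"):
    print(path, canonical_redirect(path))
```

With a permanent redirect in place, both forms of the home page resolve to a single URL, so there is no need to block /index.html in robots.txt.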