Hello friends, suppose I create a robots.txt file with the following entries:
User-Agent: *
Allow: /
Disallow: /index.html
Will this stop the spider from crawling the home page, or will it still be crawled? Someone told me that http://www.xyz.com/ and http://www.xyz.com/index are two different URLs.
Thanks & Regards, Shailendra
Yes, it will stop the spider from crawling the home page at /index.html. xyz.com and xyz.com/index.html are physically the same page; however, Google treats them as two different pages.
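For what it's worth, here is a rough Python sketch of how those two rules would be evaluated under the longest-match rule described in RFC 9309 (the most specific matching rule wins, and Allow wins a tie). The `is_allowed` helper and the `RULES` list are made up for illustration, and it only does plain prefix matching with no `*` or `$` wildcards, so treat it as an approximation and not as how any real crawler is implemented.

```python
def is_allowed(path, rules):
    """Decide whether `path` may be crawled.

    `rules` is a list of (directive, pattern) pairs for one user-agent group.
    Hypothetical helper: plain prefix matching only, no wildcard support.
    """
    best_len = -1
    allowed = True  # if no rule matches, crawling is allowed
    for directive, pattern in rules:
        if pattern and path.startswith(pattern):
            is_allow = directive.lower() == "allow"
            # The longest matching pattern wins; on equal length, Allow wins.
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len = len(pattern)
                allowed = is_allow
    return allowed


# The two rules from the question, in the same order.
RULES = [("Allow", "/"), ("Disallow", "/index.html")]

print(is_allowed("/", RULES))            # True
print(is_allowed("/index.html", RULES))  # False
```

Under these assumptions only the /index.html URL is blocked, while the bare / URL stays crawlable, which fits the point that the two are treated as separate URLs.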
Yes, they are physically the same page. But if it stops the home page from being crawled, why do we get two entries in the sitemap for the home page, i.e. one without index.html and one with index.html? How does PR get distributed between the two?
Google considers these two separate pages, so your PR is also split between them. To avoid this, I suggest you do canonical optimization of your website, so that only one of the two URLs is treated as the home page.
Hello friend, I did the same with my website, and the result was that Google reported an error saying it was unable to crawl the homepage. The best solution for this problem is to edit .htaccess and redirect yoursite.com/index.html to yoursite.com. It will definitely work.
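To make the redirect idea concrete, here is a small Python sketch of the URL mapping such a redirect performs: any URL whose path ends in index.html is mapped to the same URL without the file name. This is not .htaccess syntax, only an illustration of the logic; the `canonical_url` function and its `index_names` parameter are hypothetical names chosen for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url, index_names=("index.html",)):
    """Return `url` with a trailing index file name dropped from its path.

    Hypothetical helper: `index_names` lists the file names to treat as
    the directory index (an assumption, adjust for your own site).
    """
    parts = urlsplit(url)
    path = parts.path or "/"  # treat an empty path as the site root
    head, _, last = path.rpartition("/")
    if last in index_names:
        path = head + "/"  # keep the directory, drop the file name
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(canonical_url("http://www.xyz.com/index.html"))  # http://www.xyz.com/
print(canonical_url("http://www.xyz.com/"))            # http://www.xyz.com/
```

Both forms of the home page end up at the same address, which is the effect a permanent redirect in .htaccess is meant to have for visitors and crawlers.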