What are the advantages of using robots.txt? I read a post by Shawn (AKA digitalpoint) saying that he completely removed it from this site. Why? His reasoning was: "The more content Google has, the better, so I figure it's just one of those things with running a forum..." http://forums.digitalpoint.com/showthread.php?t=2150 What do you guys think?
The obvious advantage is that you can keep search engines from indexing parts of your site (you know... top secret stuff, personal business, stuff you don't want to see on Google).
You can stop search engines from indexing specific files or folders... or you can help them find the pages you want indexed fast...
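As a rough illustration of blocking a folder and a single file, the sketch below feeds a small robots.txt to Python's standard-library urllib.robotparser and asks which URLs a compliant crawler may fetch. The paths (/private/, /drafts/old-page.html) and the example.com URLs are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one folder and one individual file are blocked.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/old-page.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in [
    "http://example.com/index.html",            # allowed: matches no rule
    "http://example.com/private/report.pdf",    # blocked: inside /private/
    "http://example.com/drafts/old-page.html",  # blocked: the exact file
    "http://example.com/drafts/new-page.html",  # allowed: a different file
]:
    print(url, parser.can_fetch("Googlebot", url))
```

Disallow values are matched as path prefixes, so "Disallow: /private" without the trailing slash would also block a page such as /private-notes.html.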
robots.txt is very important for business or company sites: you can use this file to stop search engines from indexing the confidential parts of your website.
In order for Google's robots to index a "private part" of your site, there must first be a link (on your site or elsewhere) that points to the "private part". If I create a website and slip the file "private.php" into the root directory, Google isn't going to magically know it's there. This is just my personal opinion, not trying to flame.
Just to let you know, a robots.txt file only tells well-behaved spiders not to index (i.e. to exclude) parts of your website. The robots.txt file does NOT offer any form of security or protection. Many spiders out there completely ignore the robots.txt file and will index everything. phplife
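To make the point concrete: robots.txt only has an effect if the crawler chooses to read it. The sketch below uses Python's urllib.robotparser with a hypothetical robots.txt that bans a made-up "BadBot" completely; the respect_robots flag (also hypothetical) stands in for whether a given spider is well-behaved. Nothing in the file itself stops the requests when the flag is off.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "BadBot" is banned everywhere,
# every other crawler is only kept out of /cgi-bin/.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /cgi-bin/
"""

def urls_to_crawl(urls, user_agent, respect_robots):
    """Return the URLs a crawler would actually request.

    Checking robots.txt is the crawler's own decision: with
    respect_robots=False the file is never even read.
    """
    if not respect_robots:
        return list(urls)
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]

site = ["http://example.com/index.html", "http://example.com/cgi-bin/admin.cgi"]
print(urls_to_crawl(site, "Googlebot", respect_robots=True))  # ['http://example.com/index.html']
print(urls_to_crawl(site, "BadBot", respect_robots=True))     # []
print(urls_to_crawl(site, "BadBot", respect_robots=False))    # both URLs
```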
Exactly... there are some spammy robots that do not respect robots.txt instructions. Even if you have blocked them in robots.txt, they will still crawl every page of your site. In this case you can find their IP addresses in your traffic logs and then block them using .htaccess. This will also help you cut down your bandwidth usage, since these spammy bots can use up as much as half of your website's bandwidth if you don't block them.
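A minimal sketch of the log-then-.htaccess approach, assuming an Apache server that writes the standard "combined" log format and Apache 2.4 access control (Require not ip inside a RequireAll block), which only works in .htaccess if authorization overrides are enabled (AllowOverride AuthConfig or All). The bot name "SpamBot", the sample log lines and the hit threshold are all hypothetical; with a real log you would review the IPs before blocking them.

```python
import re
from collections import Counter

# Apache "combined" log format: IP, identd, user, [time], "request",
# status, size, "referer", "user-agent". Only IP and user-agent are captured.
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"')

def suspicious_ips(log_lines, bot_name, min_hits):
    """Return IPs whose user-agent contains bot_name at least min_hits times."""
    hits = Counter()
    for line in log_lines:
        match = LOG_RE.match(line)
        if match and bot_name.lower() in match.group(2).lower():
            hits[match.group(1)] += 1
    # Plain string sort, good enough for a short review list.
    return sorted(ip for ip, count in hits.items() if count >= min_hits)

def htaccess_rules(ips):
    """Build an Apache 2.4 .htaccess block that denies the given IPs."""
    lines = ["<RequireAll>", "    Require all granted"]
    lines += [f"    Require not ip {ip}" for ip in ips]
    lines.append("</RequireAll>")
    return "\n".join(lines)

# Hypothetical log excerpt (IPs from documentation-only address ranges).
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Oct/2008:13:55:36 -0700] "GET /a.html HTTP/1.1" 200 512 "-" "SpamBot/2.0"',
    '192.0.2.10 - - [10/Oct/2008:13:55:37 -0700] "GET /b.html HTTP/1.1" 200 734 "-" "SpamBot/2.0"',
    '198.51.100.7 - - [10/Oct/2008:13:56:01 -0700] "GET /a.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

bad = suspicious_ips(SAMPLE_LOG, "SpamBot", min_hits=2)
print(bad)  # ['192.0.2.10']
print(htaccess_rules(bad))
```

Matching on the user-agent string only catches bots that identify themselves; bots that fake a browser user-agent have to be spotted some other way (for example by request volume per IP).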
Another advantage is that it can be used to block search engine spiders from crawling part or all of your website, saving valuable bandwidth.
Don't you think you need to keep some parts of your website out of the reach of search engines? I use robots.txt to restrict certain parts of my website from being indexed or known to the public, e.g. the bin directory, database, etc.
The primary reason for using it is not to save bandwidth, but to keep our private files out of search engines.
Just don't use this to exclude your downloads area: anyone can read robots.txt, so someone could find the path and download your stuff.
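Why this matters: robots.txt has to be publicly readable at /robots.txt for crawlers to use it, so any visitor or script can read the list of paths you asked robots to skip. The sketch below (Python, standard library only) pulls the Disallow paths out of a hypothetical robots.txt; in practice the text would simply be downloaded from the site's /robots.txt URL. The /admin/ and /downloads/ entries are invented for the example.

```python
# Hypothetical robots.txt that "hides" an admin panel and a downloads area.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /downloads/   # paid files
"""

def disallowed_paths(robots_txt):
    """List every non-empty Disallow value, ignoring comments and field-name case."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths

print(disallowed_paths(ROBOTS_TXT))  # ['/admin/', '/downloads/']
```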
Thanks rvitgroup, I am glad you liked my reply. Please drop me a message with your queries so that I can assist you further. Looking forward to hearing from you soon.
The robots.txt file is a simple text file (no HTML) that is placed in your website’s root directory to tell the search engines which pages to crawl and which to skip. Many webmasters use this file to help the search engines index the content of their websites. If webmasters can tell the search engine spiders to skip pages that they do not consider important enough to be crawled (e.g. printable versions of pages, .pdf files, etc.), then they have a better chance of getting their most valuable pages featured in the search engine results pages. In short, robots.txt is a simple way of steering the spiders toward the content that matters, which helps them return the most relevant search results. That being said, I have seen many occasions where robots.txt has not been used in the best way possible. Webmasters are prone to make mistakes when setting up robots.txt, and the repercussions can be severe. There is a simple instruction that stops all search engine spiders from crawling the entire site:
User-agent: *
Disallow: /
Without the forward slash (i.e. with an empty "Disallow:" line), search engines are granted access to the entire site. So the inclusion of this one character in robots.txt can prevent a website from showing up in the search engines. There could be reasons why webmasters would do this intentionally (the website is still relatively new and they may still want to tweak certain pages for keyword density, etc.), but more often than not it is a mistake, and it is usually only noticed when the site hasn’t shown up in the search engine indexes for months. Errors aside, another benefit of having a robots.txt is that you can specify the location of the Google .xml or Yahoo sitemap with this simple instruction:
Sitemap: http://www.client.com/sitemap.xml
(this assumes the XML sitemap is located at the root of the domain). This also makes the site easier for the search engines to crawl.
Of course, even though this is a small aspect of the search engine optimization process, if utilized correctly, a robots.txt can be a significant benefit.
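The one-character difference between "Disallow: /" and an empty "Disallow:" line, plus the Sitemap directive, can be checked with Python's standard-library urllib.robotparser (site_maps() needs Python 3.8 or later). The two robots.txt variants and the product URL are hypothetical; www.client.com is reused from the post above.

```python
from urllib.robotparser import RobotFileParser

def make_parser(robots_txt):
    """Parse robots.txt text the way a compliant crawler would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

# With the slash: every URL is off limits.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Without the slash: an empty Disallow blocks nothing.
ALLOW_ALL = """\
User-agent: *
Disallow:
Sitemap: http://www.client.com/sitemap.xml
"""

url = "http://www.client.com/products/widget.html"
blocked = make_parser(BLOCK_ALL)
allowed = make_parser(ALLOW_ALL)
print(blocked.can_fetch("Googlebot", url))  # False
print(allowed.can_fetch("Googlebot", url))  # True
print(allowed.site_maps())                  # ['http://www.client.com/sitemap.xml']
```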
robots.txt gives you control to allow and deny crawling of specific content on your site. To learn the full syntax properly, please search Google for robots.txt.
Using robots.txt you can list your sitemap URL, and you can also allow or disallow specific URLs from being crawled by Google, etc...
robots.txt is very useful when you don't want search engines to crawl your site. One of its biggest advantages is that with robots.txt you can keep your admin panel from being crawled, so it helps you fight the hackers.