Hi DP Members, today I visited this site and found it has a robots.txt file - springflex.com/robots.txt. Because of this, the Google spider cannot reach the site, yet the site is cached by Yahoo and Bing. Does the robots.txt file work only for Google? If yes, why? And if no, why is only Google blocked? Please help me solve this problem. I am waiting for your solutions. Thanks
The robots.txt file is used by many bots (Alexa, Yahoo, MSN, etc.), not just Google. Check out this page: http://www.mcanerin.com/en/search-engine/robots-txt.asp It's a robots.txt generator, an awesome tool that will also tell you which bots follow the file's rules.
Robots.txt is for all bots, not only for a single search engine. Using it, you can restrict or allow any single bot, or multiple bots, as you wish. You can find full information about robots.txt at: http://www.robotstxt.org/robotstxt.html Have a look at that reference. If you still have questions after going through it, post them here in the forum.
The robots.txt file is used by all bots, such as Google, Bing, Yahoo, Ask, and AltaVista, as well as local search engines. Whenever a spider or crawler (from any search engine) comes to a site, it first tries to load the robots.txt file to find the rules given there. A robots.txt file has the following format:

User-agent: *
Disallow:

Here "User-agent" specifies which bots must follow the rule. "User-agent: *" means the rule is common to every search engine bot. If you want to target a specific search engine, give the name of that bot instead. For example, to address only Google's crawler you would use "User-agent: Googlebot", followed by a "Disallow: /" line if you actually want to block it from indexing your site (a User-agent line with no Disallow rule blocks nothing). So the robots.txt file is applicable to every search engine.
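As a concrete illustration of the format described above, here is a sketch of a robots.txt that blocks only Googlebot while leaving every other crawler unrestricted (the site name and paths are just examples):

```
# Block only Google's crawler from the whole site
User-agent: Googlebot
Disallow: /

# Every other bot may crawl everything (empty Disallow = no restriction)
User-agent: *
Disallow:
```

Note that more specific User-agent groups take precedence, so Googlebot follows its own group and ignores the `*` group.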
Hi, thanks for the information. But this site uses this file:

User-agent: *
Disallow:

Does this format disallow only Google? If not, why is the site being crawled by Bing and Yahoo but not Google?
The robots.txt file says: for all user agents, disallow nothing! In other words, this robots.txt file tells all of the search engines that they can index any page on the site. It ALLOWs everything to be indexed. It does NOT restrict the bots at all.

If you wanted to block an entire site, you would use:

User-agent: *
Disallow: /

If the site is not indexed at Google, then 1) Google just hasn't crawled it yet, 2) Google crawled it but decided not to index it, or 3) it could be banned.

I can tell you that the robots.txt file on this site is TOTALLY invalid. They have their User-agent: and Disallow: directives on the same line, which is invalid. There is all kinds of error text showing up when I access it (it looks like an ATTEMPT to call a PHP program that builds a sitemap from their robots.txt). I would highly suggest fixing the robots.txt. It might be that, since the robots.txt is totally screwed up, Google has no clue what you are trying to block, so it is erring on the safe side and not indexing anything.

The User-agent: directive should be on one line, the Disallow: directive on the next line, followed by a blank line and then your Sitemap: directive. Instead, I get the following when I access their robots.txt:
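If you want to sanity-check this yourself, Python's standard library ships a robots.txt parser. Here is a minimal sketch (the page URLs are just examples) showing that a correctly formatted "allow everything" file, with each directive on its own line, permits every bot to fetch every page:

```python
# Check what a robots.txt actually permits, using Python's
# standard-library parser (urllib.robotparser).
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# The correctly formatted rules: each directive on its own line.
rp.parse([
    "User-agent: *",
    "Disallow:",
])

# An empty Disallow blocks nothing, so every bot may fetch every page.
print(rp.can_fetch("Googlebot", "http://springflex.com/any-page.html"))  # True
print(rp.can_fetch("Bingbot", "http://springflex.com/"))                 # True
```

You can also point `RobotFileParser` at a live file with `set_url(...)` and `read()` to test a real site instead of a hand-written rule list.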