I have submitted a Google sitemap and Yahoo feeds for 3 of my sites. I have not included a robots.txt file in the root directory, as I want all my pages indexed in search engines. My question is: is it OK without a robots.txt file, or should I include a robots.txt file such as

User-agent: *
Disallow:

Thanks
If you do not upload a robots.txt file to your Google Sitemaps account, then you get errors from them. You should use the file, because that is what tells the spiders to go back and read your meta tags.
To avoid later security problems from hackers on your site, I would disallow any path leading into:

- admin folders
- cgi-bin
- configuration folders
- any logfile/statistic pages (unless cleanly configured, stats have URLs into all private pages such as cgi-bin, admin, etc., hence online stats are a potential security risk for the site owner)
- any folders or sub-folders containing non-content pages

This keeps your number of indexed pages limited to the real content and at least partially prevents hackers from finding, via Google, paths to particular scripts known at a given time to have a security issue. Google is the prime resource for cyber criminals and the easiest way for hackers to find sites open for abuse (I had the lesson of being a victim several times last winter; each time the hacker attempt was initiated via a Google search result). Hence a typical robots.txt might look like:

User-agent: *
Disallow: /cgi-bin
Disallow: /logs
Disallow: /any-software/admin
Disallow: /your_blog/trackback.php
Disallow: /some-software/include
Disallow: /your-scripts/templates

Like almost all pages and formats on the web, robots.txt has a validator (http://www.searchengineworld.com/cgi-bin/robotcheck.cgi) and a home page (http://www.robotstxt.org/).
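As a rough illustration of how a well-behaved crawler reads rules like those, here is a minimal sketch using Python's standard urllib.robotparser. The paths and example.com URLs are hypothetical, taken from the sample robots.txt above, and parse() is used so the rules can be tested without fetching anything from a live site.

from urllib.robotparser import RobotFileParser

# The example rules from the post above (hypothetical paths).
rules = """User-agent: *
Disallow: /cgi-bin
Disallow: /logs
Disallow: /any-software/admin
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)  # parse() takes the file content as a list of lines

# A polite crawler checks each URL against the rules before fetching it.
print(rp.can_fetch("*", "http://example.com/cgi-bin/stats.cgi"))    # False - disallowed
print(rp.can_fetch("*", "http://example.com/articles/page1.html"))  # True - crawlable

As the next post points out, this only models polite crawlers; a bot that ignores robots.txt can still request the disallowed paths directly.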
Hi racer22,

I disagree with the two other posts. Your site is perfect for search engines without robots.txt.

robots.txt is used to suggest to robots that they not visit some pages. That's all! If you want the robots to visit all pages, there is no need for a robots.txt file.

If there is no robots.txt on your site, the robots will still try to find it and there will be "404 not found" messages in your log file. This is not a problem for the robots, but you might prefer to avoid this with a robots.txt such as:

User-agent: *
Disallow:

Regarding the other posts:
Not true: robots.txt is used to disallow things, not to allow them.
Not true: robots.txt only contains "suggestions", which are respected by polite robots. They are not respected by badly intentioned robots.

Jean-Luc
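A quick sketch of the situation Jean-Luc describes, again with Python's standard urllib.robotparser and a hypothetical example.com site: when robots.txt is missing (the server answers 404) or contains only an empty Disallow line, a polite crawler treats every page as fetchable, and the only trace is the 404 entry in your log.

from urllib.robotparser import RobotFileParser

# Hypothetical site; substitute your own domain to try it.
rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # a 404 response here is treated by the parser as "allow everything"

# With no robots.txt, or with "User-agent: *" plus an empty "Disallow:", every URL is allowed.
print(rp.can_fetch("*", "https://example.com/any/page.html"))  # True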
WTF? Why do you even post if you have absolutely no idea of what you are talking about??? Just put up a blank robots.txt file so you don't get any errors in your error log. If you don't mind about that, then you are fine without a robots.txt file at all.
The key source for all cyber crime known to me is Google. Google strictly respects robots.txt, hence excluding the admin and script sections of a site in robots.txt also successfully disables the major information source of cyber criminals. All hacker attacks known to me came via Google search; no single other source is known to me in the past 9 years of full-time web publishing. All major SEs such as MSN, Yahoo, Google and ask.com fully respect robots.txt, and no other SE is of any significance for obtaining correct and current cyber-crime-relevant information.