I have been unable to find out whether this will work. Assume I have links to the following pages on my site:

/folder/file1.php
/folder/file1.php?action=something
/folder/file1.php?action=something&do=else
/folder/file2.php?action=something2&do=ornothing
/folder/file2.php?action=something2&do=mighthappen
/folder/filex.html
/folder/filey.html
/folder/filez.html

I'm assuming these would all be indexed separately, as they all have different content. I want to disallow spiders from accessing ANY .php files in that directory. Is this a valid approach?

User-agent: *
Disallow: /folder/*.php*

Also, am I correct in assuming that unless there is an href link to a page somewhere, a spider will not crawl it?
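Worth noting: the `*` wildcard is an extension honored by the major crawlers (Googlebot, Bingbot), not part of the original 1994 robots.txt convention, so older or minor bots may ignore it. Under the extended semantics, matching is prefix-based, so the trailing `*` in `/folder/*.php*` is redundant. A minimal Python sketch of that matching logic, assuming Google's documented wildcard semantics (the function names here are my own, not from any library):

```python
import re

def robots_pattern_to_regex(pattern):
    """Translate a Google-style robots.txt path pattern to a regex.

    '*' matches any run of characters; a trailing '$' anchors the
    match to the end of the URL. Otherwise matching is prefix-based.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then restore '*' as a wildcard.
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.compile("^" + regex + ("$" if anchored else ""))

def is_disallowed(rule, url_path):
    """True if a Disallow rule with the given pattern blocks url_path."""
    return robots_pattern_to_regex(rule).match(url_path) is not None

# '/folder/*.php' already blocks query-string variants, because the
# match only has to cover a prefix of the URL:
rule = "/folder/*.php"
print(is_disallowed(rule, "/folder/file1.php"))                   # True
print(is_disallowed(rule, "/folder/file1.php?action=something"))  # True
print(is_disallowed(rule, "/folder/filex.html"))                  # False
```

One caveat: Python's own `urllib.robotparser` follows the original standard and does not understand these wildcards, so don't use it to verify wildcard rules.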
Try this tutorial: http://www.freefind.com/library/howto/robots/

As for your second question: a bot will crawl any page linked from your own site OR from ANY other site on the web - including links picked up from erroneous sitemaps, published log files/stats, etc.