Quick robots.txt disallow question

Discussion in 'Google' started by White40thGT, Jul 13, 2007.

  1. #1
    I have been unable to find out whether this will work...

    Assume I have links to the following pages on my site:

    /folder/file1.php
    /folder/file1.php?action=something
    /folder/file1.php?action=something&do=else

    /folder/file2.php?action=something2&do=ornothing
    /folder/file2.php?action=something2&do=mighthappen

    /folder/filex.html
    /folder/filey.html
    /folder/filez.html

    I'm assuming these would all be indexed separately, since they all have different content. I want to stop the spider from accessing ANY .php file in that directory. Is this a valid approach?

    User-agent: *
    Disallow: /folder/*.php*
    Also, am I correct in assuming that a spider will not crawl a new page unless there is an href link to it?
     
    White40thGT, Jul 13, 2007
  2. hans
    #2
    Try this tutorial:
    http://www.freefind.com/library/howto/robots/
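
    For the first question, here is a rough way to check the pattern yourself. It is only a sketch and assumes the crawler honors the `*` and `$` wildcard extension (Googlebot documents this; the original robots.txt convention is plain prefix matching, so other bots may ignore it). The helper names `rule_to_regex` and `is_blocked` are made up for this illustration and are not part of any library.

```python
import re

def rule_to_regex(rule_path: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex.

    Assumed wildcard semantics: '*' matches any run of characters,
    a trailing '$' anchors the end of the URL, everything else is
    literal, and the rule is matched from the start of the path.
    """
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in rule_path.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(rule_path: str, url_path: str) -> bool:
    # re.match anchors at the start only, which mirrors prefix matching.
    return rule_to_regex(rule_path).match(url_path) is not None

RULE = "/folder/*.php*"
URLS = [
    "/folder/file1.php",
    "/folder/file1.php?action=something",
    "/folder/file1.php?action=something&do=else",
    "/folder/file2.php?action=something2&do=ornothing",
    "/folder/file2.php?action=something2&do=mighthappen",
    "/folder/filex.html",
    "/folder/filey.html",
    "/folder/filez.html",
]

for url in URLS:
    print("blocked" if is_blocked(RULE, url) else "allowed", url)
```

    Under those assumptions every .php URL in the list is blocked and the .html ones stay crawlable. Because rules already match as prefixes, the trailing `*` adds nothing: `Disallow: /folder/*.php` should behave the same way, query strings included.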

    As for your second question:
    A bot crawls pages linked from your own site OR from ANY other site on the web, including erroneous sitemaps, published log files/stats, etc.
     
    hans, Jul 13, 2007