robot.txt help

Discussion in 'robots.txt' started by ashiezai, May 31, 2005.

  1. #1
    Hi there, I'm running a link exchange directory and I think it is being hit by the dup filter ...

    Currently, the directory generated by the script I use (Duncan Carver's LMA) has been dropped by Google.

    Basically, the URL looks like this:
    http://www.xxx.com/directory/Alternative/index.html

    But when I ran a site: command in Google, I found that the indexed page is
    http://www.xxx.com/directory/Alternative/ (without index.html).
    The problem is that without index.html the page is empty. That is, I have nothing but empty pages indexed, and they finally hit the dup filter.

    I've checked that all of the links in the pages generated by the script end with index.html. I did not know that the version without index.html was being indexed :confused:

    Is there any way to prohibit Googlebot from crawling the page without index.html using robot.txt or .htaccess?

    Thanks in advance for any help
     
    ashiezai, May 31, 2005
  2. noppid

    noppid gunnin' for the quota

    #2
    Use robots.txt, not robot.txt, and yes, you can limit access there.
     
    noppid, May 31, 2005
  3. ashiezai

    ashiezai Peon

    #3
    But I do not know the code. Can anyone help me?

    I've searched for it but couldn't find anything.

    I want directory/a/index.html to be indexed but not directory/a/

    None of the tutorials I found do this.

    Is it possible to 301 redirect them instead?
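
On the 301 idea, here is a minimal sketch of the mapping such a redirect would perform. It only illustrates the logic: on a real site this would normally live in the web server configuration (for example .htaccess) rather than in Python. The function name `redirect_target` and the assumption that every URL under /directory/ ending in a slash should go to its index.html are hypothetical, not something LMA provides.

```python
from typing import Optional


def redirect_target(path: str) -> Optional[str]:
    """Return the URL path to 301-redirect to, or None if no redirect applies.

    Assumption: any path under /directory/ that ends in "/" is a bare
    directory URL and should be sent to the index.html inside it.
    """
    if path.startswith("/directory/") and path.endswith("/"):
        return path + "index.html"
    # Paths that already name a file, or that live elsewhere, are left alone.
    return None


# The bare directory URL gets a redirect target; the index.html URL does not,
# so the redirect cannot loop back onto itself.
print(redirect_target("/directory/Alternative/"))
print(redirect_target("/directory/Alternative/index.html"))
```

If something like this were in place, the empty URL would stop returning its own content and would point crawlers at the index.html version instead.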
     
    ashiezai, May 31, 2005
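
A possible reason the tutorials do not cover this case: a plain robots.txt Disallow rule is a prefix match, so blocking directory/a/ also blocks directory/a/index.html. Googlebot additionally understands `*` (any run of characters) and a trailing `$` (end of URL), which makes it possible to block only the bare directory URL. The sketch below imitates that matching to show the difference. It is an approximation under stated assumptions: it ignores Allow rules and rule precedence, the helper names are made up for illustration, and crawlers that do not support these extensions will treat such rules differently.

```python
import re
from typing import Iterable

# Hypothetical robots.txt body for the site in this thread:
#   User-agent: Googlebot
#   Disallow: /directory/*/$


def rule_to_regex(rule: str) -> "re.Pattern[str]":
    """Translate one Disallow path into a regex.

    '*' matches any run of characters; a trailing '$' anchors the rule to
    the end of the URL path. Without '$' the rule is a plain prefix match.
    """
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(pattern + (r"\Z" if anchored else ""))


def is_blocked(path: str, disallow_rules: Iterable[str]) -> bool:
    """True if any Disallow rule matches the path from its start."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)


prefix_rule = ["/directory/Alternative/"]  # plain prefix: blocks both URLs
anchored_rule = ["/directory/*/$"]         # only URLs that end in a slash

for path in ("/directory/Alternative/", "/directory/Alternative/index.html"):
    print(path, is_blocked(path, prefix_rule), is_blocked(path, anchored_rule))
```

With the anchored rule, only the URL ending in the slash is reported as blocked, while the index.html URL stays crawlable, which is the behaviour asked for in post #3.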