
Robots.txt

Discussion in 'robots.txt' started by Jetlag, Jan 11, 2005.

  1. #1
    Hello
    I'm using the Able2know mod on my forum, and Google is indexing all the Disallows from my robots.txt:
    Disallow: forums/post-*.html$
    Disallow: forums/updates-topic.html*$
    Disallow: forums/stop-updates-topic.html*$
    Disallow: forums/ptopic*.html$
    Disallow: forums/ntopic*.html$
    www.canadianpwc.com/post-301.html and www.canadianpwc.com/pwc-193.html are the same post, and both are listed in Google. Should I remove the "$" from my robots.txt?
    Thanks
    Jetlag
     
    Jetlag, Jan 11, 2005 IP
  2. Jayess

    Jayess Peon

    #2
    It validates, but has some bad style. The validator flags the same warning on lines 55-59: "Possible Misplaced Wildcard. Although Google supports wildcards in the Disallow field, it is nonstandard." The flagged lines are:

    Disallow: /post-*.html$
    Disallow: /updates-topic.html*$
    Disallow: /stop-updates-topic.html*$
    Disallow: /ptopic*.html$
    Disallow: /ntopic*.html$

    That is according to

    www.searchengineworld.com/cgi-bin/robotcheck.cgi
     
    Jayess, Jan 11, 2005 IP
  3. EdenView

    EdenView Peon

    #3
    EdenView, Jan 11, 2005 IP
  4. reppy

    reppy Peon

    #4
    Very nice resource. But does anyone know how to fix it? I'm using the same robots.txt :)
     
    reppy, Jan 11, 2005 IP
  5. protesto

    protesto Peon

    #5
    I have the same problem with the mod.
     
    protesto, Jan 11, 2005 IP
  6. Jetlag

    Jetlag Active Member

    #6
    Thanks Jayess,
    I like that link you posted.
     
    Jetlag, Jan 12, 2005 IP
  7. Owlcroft

    Owlcroft Peon

    #7
    The robots.txt standard says that bots compare the requested path against each pattern as a literal string, and skip the file if the path matches the pattern all the way out to the end of the pattern. Your patterns thus need to start with a root slash.

    You also cannot use wildcards or regex symbols in specifications. The only "workaround" here is that every spec has an implicit wildcard at its end. That is,
    /forums/ntopic
    would match--
    /forums/ntopic27.shtml
    /forums/ntopics/stuff.php
    /forums/ntopical33.htm
    and so on.
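[Editor's note: the prefix rule Owlcroft describes can be sketched in a few lines of Python. This is a simplified illustration of the original standard's matching behaviour, not a full robots.txt parser.]

```python
def disallowed(path: str, patterns: list[str]) -> bool:
    """Classic robots.txt matching: each Disallow value is a plain
    literal prefix; a path is blocked if it starts with any pattern.
    There are no wildcards or anchors -- the wildcard is implicit
    at the end of every pattern."""
    return any(path.startswith(p) for p in patterns)

rules = ["/forums/ntopic"]
print(disallowed("/forums/ntopic27.shtml", rules))     # True
print(disallowed("/forums/ntopics/stuff.php", rules))  # True
print(disallowed("/forums/other.html", rules))         # False
```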

    A robots.txt file needs to be organized so:

    User-agent: thisone
    User-agent: thatone
    User-agent: totherone
    Disallow: somespec
    Disallow: someotherspec

    User-agent: aspecial
    User-agent: anotherspecial
    Disallow: /hotstuff

    That is, directive blocks must have no blank lines within them--a blank line ends any block. Within a block, you can stack as many User-agent declarations as the specs in that block will apply to, and as many Disallow declarations as you need. (There is no generally recognized Allow declaration, though a few bots are said to recognize it; I'd advise not relying on it.)

    You can use a bare asterisk * as a wildcard in a User-agent declaration, where it will mean "all user agents". You can use a blank Disallow to mean "block nothing".

    Note that bots will seek their matches in order, down the file. That matters, because you need to place all particularly restricted (by user agent) blocks before any more general (that is, "all agents") blocks, or the particular bots may find their match in the general block and thus never get down to what you intended for them. So--
    User-agent: knowncreep
    Disallow: /

    User-agent: *
    Disallow:
    --will keep knowncreep out of everything, while letting every other bot into anything, whereas if you had those blocks reversed, knowncreep would also get into everything.
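[Editor's note: the block structure and first-match ordering described above can be sketched as follows. This is a simplified sketch, not part of the original thread and not a production parser: blocks are split on blank lines, and the first block whose User-agent lines match wins.]

```python
def first_matching_block(robots_txt: str, agent: str) -> list[str]:
    """Split robots.txt into blank-line-separated blocks and return
    the Disallow values from the first block whose User-agent lines
    name `agent` (or `*`). Later blocks are never reached, which is
    why specific blocks must come before the general `*` block."""
    for block in robots_txt.strip().split("\n\n"):
        agents, disallows = [], []
        for line in block.splitlines():
            field, _, value = line.partition(":")
            field, value = field.strip().lower(), value.strip()
            if field == "user-agent":
                agents.append(value)
            elif field == "disallow":
                disallows.append(value)
        if agent in agents or "*" in agents:
            return disallows
    return []

robots = """User-agent: knowncreep
Disallow: /

User-agent: *
Disallow:
"""
print(first_matching_block(robots, "knowncreep"))  # ['/']  -- blocked from everything
print(first_matching_block(robots, "anybot"))      # ['']   -- blocked from nothing
```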

    What you probably want--but you should work it out for yourself, knowing your file structure--is something like:

    Disallow: /forums/post-
    Disallow: /forums/updates-topic.html
    Disallow: /forums/stop-updates-topic.html
    Disallow: /forums/ptopic
    Disallow: /forums/ntopic
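[Editor's note: a quick sanity check of the suggested prefixes against the URLs from the original question, assuming the forum lives under /forums as in Owlcroft's suggestion. The /forums/viewtopic.html path is a hypothetical page that should stay crawlable.]

```python
prefixes = [
    "/forums/post-",
    "/forums/updates-topic.html",
    "/forums/stop-updates-topic.html",
    "/forums/ptopic",
    "/forums/ntopic",
]

def blocked(path: str) -> bool:
    # Standard prefix matching, as explained in post #7.
    return any(path.startswith(p) for p in prefixes)

print(blocked("/forums/post-301.html"))   # True  -- duplicate post URL is excluded
print(blocked("/forums/viewtopic.html"))  # False -- ordinary pages stay indexable
```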
     
    Owlcroft, Jan 14, 2005 IP
  8. Jetlag

    Jetlag Active Member

    #8
    Thanks Owlcroft,
    I added what you posted to my robots.txt.
    I have my forum in the root directory, so I removed "/forums"; now I'll just wait.
    Thanks again
    Jetlag
     
    Jetlag, Jan 16, 2005 IP
  9. Owlcroft

    Owlcroft Peon

    #9
    You don't have to wait very long. Check this thread.

    Meanwhile, remember that my suggested contents were only that: suggested. You should work out the consequences yourself, to be sure what you use will do what you want.
     
    Owlcroft, Jan 16, 2005 IP