Hello,

I'm using the Able2know mod on my forum, and Google is indexing all the pages I disallow in my robots.txt:

Disallow: forums/post-*.html$
Disallow: forums/updates-topic.html*$
Disallow: forums/stop-updates-topic.html*$
Disallow: forums/ptopic*.html$
Disallow: forums/ntopic*.html$

www.canadianpwc.com/post-301.html and www.canadianpwc.com/pwc-193.html are the same post, and both are listed in Google. Should I remove the "$" in my robots.txt?

Thanks,
Jetlag

It validates, but has some bad style:

55 warning Possible Missplaced Wildcard. Although Google supports wildcards in the Disallow field, it is nonstandard.
Disallow: /post-*.html$
56 warning Possible Missplaced Wildcard. Although Google supports wildcards in the Disallow field, it is nonstandard.
Disallow: /updates-topic.html*$
57 warning Possible Missplaced Wildcard. Although Google supports wildcards in the Disallow field, it is nonstandard.
Disallow: /stop-updates-topic.html*$
58 warning Possible Missplaced Wildcard. Although Google supports wildcards in the Disallow field, it is nonstandard.
Disallow: /ptopic*.html$
59 warning Possible Missplaced Wildcard. Although Google supports wildcards in the Disallow field, it is nonstandard.
Disallow: /ntopic*.html$

That is according to www.searchengineworld.com/cgi-bin/robotcheck.cgi

The robots.txt standard says that a bot compares the path it wants against each pattern it finds, and does not take the file if the path matches all the way to the end of the pattern as written. Your patterns therefore need to start with a root slash. The standard also does not allow wildcards or regex symbols in specifications. The only "workaround" is that every spec has an implicit wildcard at its end. That is, /forums/ntopic would match:

/forums/ntopic27.shtml
/forums/ntopics/stuff.php
/forums/ntopical33.htm

and so on.

A robots.txt file needs to be organized like this:

User-agent: thisone
User-agent: thatone
User-agent: totherone
Disallow: somespec
Disallow: someotherspec

User-agent: aspecial
User-agent: anotherspecial
Disallow: /hotstuff

That is, directive blocks must have no blank lines within them; a blank line ends a block. Within a block, you can stack a User-agent declaration for every bot the block's specs should apply to, and as many Disallow declarations as you need. (There is no generally recognized Allow declaration, though a few bots are said to recognize it; I'd advise not relying on it.)

You can use a bare asterisk * as a wildcard in a User-agent declaration, where it means "all user agents". You can use a blank Disallow to mean "block nothing".

Note that bots will seek their matches in order, down the file. That matters, because you need to place every block restricted to particular user agents before any more general ("all agents") block, or the particular bots may find their match in the general block and never get down to what you intended for them. So:

User-agent: knowncreep
Disallow: /

User-agent: *
Disallow:

will keep knowncreep out of everything while letting every other bot into anything, whereas if you had those blocks reversed, knowncreep would also get into everything.

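For anyone who wants to experiment with the two rules above (plain prefix matching, and first matching block wins), here is a minimal Python sketch. It models only the behavior described in this post, not any particular crawler; the function names `is_blocked` and `first_matching_block` and the path `/forums/viewtopic.php` are made up for illustration.

```python
def is_blocked(path, disallow_specs):
    """True if path starts with any non-empty Disallow spec.

    An empty spec (a blank Disallow) blocks nothing; every other
    spec is treated as a plain prefix, with no wildcards or regex.
    """
    return any(spec and path.startswith(spec) for spec in disallow_specs)


def first_matching_block(agent, blocks):
    """Return the Disallow specs of the first block that names this agent.

    blocks is a list of (agent_names, disallow_specs) pairs in file order.
    This follows the "in order, down the file" reading described above.
    """
    for agent_names, specs in blocks:
        names = [name.lower() for name in agent_names]
        if "*" in names or agent.lower() in names:
            return specs
    return []  # no block applies, so nothing is disallowed


# Prefix matching: the spec acts as if it had a wildcard at its end.
specs = ["/forums/ntopic"]
for path in ("/forums/ntopic27.shtml", "/forums/ntopics/stuff.php",
             "/forums/ntopical33.htm", "/forums/viewtopic.php"):
    print(path, "blocked" if is_blocked(path, specs) else "allowed")

# Block order: the specific block must come before the general one.
good_order = [(["knowncreep"], ["/"]), (["*"], [""])]
bad_order = [(["*"], [""]), (["knowncreep"], ["/"])]
for label, blocks in (("good order", good_order), ("bad order", bad_order)):
    specs_for_creep = first_matching_block("knowncreep", blocks)
    print(label, "-> knowncreep blocked:", is_blocked("/anything", specs_for_creep))
```

Real crawlers may choose their block differently (some pick the most specific User-agent match regardless of position), so treat this as an illustration of the conservative reading rather than a description of what any given bot does.
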
What you probably want (but you should work it out for yourself, knowing your file structure) is something like:

Disallow: /forums/post-
Disallow: /forums/updates-topic.html
Disallow: /forums/stop-updates-topic.html
Disallow: /forums/ptopic
Disallow: /forums/ntopic

Thanks, Owlcroft. I added what you posted to my robots.txt. My forum is in the root directory, so I removed "/forums". Now I'll just wait.

Thanks again,
Jetlag

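Rather than only waiting, the root-directory version of the rules can be checked locally. The sketch below uses Python's standard `urllib.robotparser`, which does plain prefix matching. It assumes the rules sit under a `User-agent: *` line (the thread does not show that line) and uses "Googlebot" only as an example agent name; `build_parser` is a helper invented here.

```python
from urllib.robotparser import RobotFileParser

# Assumed file contents: the suggested rules with "/forums" removed,
# under a catch-all User-agent line that is not shown in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /post-
Disallow: /updates-topic.html
Disallow: /stop-updates-topic.html
Disallow: /ptopic
Disallow: /ntopic
"""


def build_parser(text):
    """Parse robots.txt text from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The two URLs from the original question, which point at the same post.
for url in ("http://www.canadianpwc.com/post-301.html",
            "http://www.canadianpwc.com/pwc-193.html"):
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(url, verdict)
```

If the rules do what was intended, the post-301.html copy is reported as blocked and the pwc-193.html copy as allowed. This only shows how a standards-style parser reads the file; it says nothing about when a search engine will drop pages it has already indexed.
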
You don't have to wait very long. Check this thread. Meanwhile, remember that my suggested contents were only that: suggested. You should work out the consequences yourself, to be sure what you use will do what you want.