View Full Version : Robots.txt


Jetlag
Jan 11th 2005, 1:25 pm
Hello,
I'm using the Able2know mod on my forum, and Google is indexing all the URLs I have disallowed in my robots.txt:
Disallow: forums/post-*.html$
Disallow: forums/updates-topic.html*$
Disallow: forums/stop-updates-topic.html*$
Disallow: forums/ptopic*.html$
Disallow: forums/ntopic*.html$
www.canadianpwc.com/post-301.html and www.canadianpwc.com/pwc-193.html are the same post, and both are listed in Google. Should I remove the "$" from my robots.txt?
Thanks
Jetlag

Jayess
Jan 11th 2005, 1:27 pm
It validates, but has some bad style.

55 warning Possible Missplaced Wildcard. Although Google supports wildcards in the Disallow field, it is nonstandard.

Disallow: /post-*.html$
56 warning Possible Missplaced Wildcard. Although Google supports wildcards in the Disallow field, it is nonstandard.

Disallow: /updates-topic.html*$
57 warning Possible Missplaced Wildcard. Although Google supports wildcards in the Disallow field, it is nonstandard.

Disallow: /stop-updates-topic.html*$
58 warning Possible Missplaced Wildcard. Although Google supports wildcards in the Disallow field, it is nonstandard.

Disallow: /ptopic*.html$
59 warning Possible Missplaced Wildcard. Although Google supports wildcards in the Disallow field, it is nonstandard.

Disallow: /ntopic*.html$

That is according to www.searchengineworld.com/cgi-bin/robotcheck.cgi

EdenView
Jan 11th 2005, 5:40 pm
Nice resource!

reppy
Jan 11th 2005, 11:06 pm
Very nice resource. But does anyone know how to fix it? I'm using the same robots.txt :)

protesto
Jan 11th 2005, 11:09 pm
I have the same problem with the mod.

Jetlag
Jan 12th 2005, 10:02 pm
Thanks, Jayess.
I like that link you posted.

Owlcroft
Jan 14th 2005, 2:53 am
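Under the robots.txt standard, a bot compares the path it wants to fetch against each Disallow pattern and skips the file if the path matches the pattern, character for character, through the end of the pattern as written. Request paths always begin with a slash, so your patterns need to start with a root slash too.

The standard also does not allow wildcards or regex symbols in a path spec. Google supports wildcards as a nonstandard extension, as the validator output above notes, but other bots may treat them as literal characters. The only "workaround" here is that all specs have an implicit wildcard at their end. That is,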
/forums/ntopic
would match--
/forums/ntopic27.shtml
/forums/ntopics/stuff.php
/forums/ntopical33.htm
and so on.
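
Here is a minimal sketch of that prefix rule in Python, assuming a plain starts-with comparison as the original standard describes. The function name is_disallowed is made up for illustration, and the sketch ignores the wildcard extensions some crawlers add.

```python
def is_disallowed(path, disallow_specs):
    """Return True if path starts with any non-empty Disallow spec.

    Hypothetical helper: plain prefix matching only, no wildcards.
    """
    return any(spec and path.startswith(spec) for spec in disallow_specs)


specs = ["/forums/ntopic"]

# Every path that begins with the spec is blocked.
print(is_disallowed("/forums/ntopic27.shtml", specs))
print(is_disallowed("/forums/ntopics/stuff.php", specs))
print(is_disallowed("/forums/ntopical33.htm", specs))

# A spec without the root slash cannot match under this rule,
# because request paths always begin with "/".
print(is_disallowed("/forums/ntopic27.shtml", ["forums/ntopic*.html$"]))
```

The first three calls report a match and the last one does not, which is consistent with the original patterns having no effect on a bot that follows the standard literally.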

A robots.txt file needs to be organized so:

User-agent: thisone
User-agent: thatone
User-agent: totherone
Disallow: somespec
Disallow: someotherspec

User-agent: aspecial
User-agent: anotherspecial
Disallow: /hotstuff

That is, directive blocks must have no blank lines within them--a blank line ends any block. Within a block, you can stack as many User-agent declarations as the specs in that block will apply to, and as many Disallow declarations as you need. (There is no generally recognized Allow declaration, though a few bots are said to recognize it; I'd advise not relying on it.)

You can use a bare asterisk * as a wildcard in a User-agent declaration, where it will mean "all user agents". You can use a blank Disallow to mean "block nothing".
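
To make the block structure concrete, here is a rough Python sketch that groups a robots.txt file into blocks the way described above: a blank line ends a block, and each block collects its User-agent and Disallow lines. The name parse_blocks and the sample paths are my own assumptions; this is not how any particular crawler is implemented.

```python
def parse_blocks(text):
    """Group robots.txt lines into (user_agents, disallows) blocks.

    Hypothetical helper for illustration: a blank line ends the
    current block; comment-only lines are skipped without ending it.
    """
    blocks = []
    agents, disallows = [], []
    for raw in text.splitlines() + [""]:  # trailing "" flushes the last block
        if not raw.strip():
            if agents:
                blocks.append((agents, disallows))
            agents, disallows = [], []
            continue
        line = raw.split("#", 1)[0].strip()  # drop any trailing comment
        if not line:
            continue  # comment-only line, not a block boundary
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            agents.append(value)
        elif field == "disallow":
            disallows.append(value)
    return blocks


sample = """\
User-agent: thisone
User-agent: thatone
Disallow: /somespec
Disallow: /someotherspec

User-agent: aspecial
Disallow: /hotstuff
"""

for agents, disallows in parse_blocks(sample):
    print(agents, disallows)
```

If you accidentally leave a blank line between a User-agent line and its Disallow lines, this sketch shows the Disallow lines ending up with no agent attached, which is the failure the no-blank-lines rule is meant to prevent.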

Note that bots will seek their matches in order, down the file. That matters, because you need to place all particularly restricted (by user agent) blocks before any more general (that is, "all agents") blocks, or the particular bots may find their match in the general block and thus never get down to what you intended for them. So--
User-agent: knowncreep
Disallow: /

User-agent: *
Disallow:
--will keep knowncreep out of everything, while letting every other bot into anything, whereas if you had those blocks reversed, knowncreep would also get into everything.
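
If you have Python handy, you can check a file laid out like that with the standard library's urllib.robotparser. This is only a sketch: the host www.example.com and the agent name someotherbot are placeholders, and this parser always treats the "*" block as a fallback, so it confirms the recommended order but says nothing about how a given crawler handles the reversed one.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: knowncreep
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())  # parse from memory, no network fetch

# knowncreep is matched by its own block and kept out of everything.
print(parser.can_fetch("knowncreep", "http://www.example.com/index.html"))

# Any other agent falls through to the "*" block, which blocks nothing.
print(parser.can_fetch("someotherbot", "http://www.example.com/index.html"))
```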

What you probably want--but you should work it out for yourself, knowing your file structure--is something like:

Disallow: /forums/post-
Disallow: /forums/updates-topic.html
Disallow: /forums/stop-updates-topic.html
Disallow: /forums/ptopic
Disallow: /forums/ntopic
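
One way to work out the consequences is to run the rules through Python's urllib.robotparser before uploading them. This is a sketch under some assumptions: I have added a "User-agent: *" line so the block is complete, the host www.example.com and the agent name anybot are placeholders, and the test paths are guesses at what the forum's URLs look like.

```python
from urllib.robotparser import RobotFileParser

suggested = """\
User-agent: *
Disallow: /forums/post-
Disallow: /forums/updates-topic.html
Disallow: /forums/stop-updates-topic.html
Disallow: /forums/ptopic
Disallow: /forums/ntopic
"""

checker = RobotFileParser()
checker.parse(suggested.splitlines())  # parse from memory, no network fetch

# Hypothetical paths: two that should be blocked, one that should not.
for path in ["/forums/post-301.html",
             "/forums/ptopic12.html",
             "/forums/pwc-193.html"]:
    allowed = checker.can_fetch("anybot", "http://www.example.com" + path)
    print(path, "allowed" if allowed else "blocked")
```

Swap in your own paths, including a few pages you do want indexed, to make sure a prefix is not catching more than you meant.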

Jetlag
Jan 16th 2005, 7:10 pm
Thanks, Owlcroft.
I added what you posted to my robots.txt:
Disallow: /post-
Disallow: /updates-topic.html
Disallow: /stop-updates-topic.html
Disallow: /ptopic
Disallow: /ntopic
My forum is in the root directory, so I removed "/forums". Now I'll just wait.
Thanks again,
Jetlag

Owlcroft
Jan 16th 2005, 7:51 pm
You don't have to wait very long. Check this thread (http://forums.digitalpoint.com/showthread.php?t=8043).

Meanwhile, remember that my suggested contents were only that: suggested. You should work out the consequences yourself, to be sure what you use will do what you want.