View Full Version : Checking robots.txt?


Tuning
Jun 7th 2005, 9:09 pm
Hi Dp Folks,

Is there any way I can check whether my robots.txt is working?

I recently changed my robots.txt and I'm expecting a particular result, but I'm not sure the current settings will work. :confused:

Can anyone suggest something?

Regards,
Tuning

noppid
Jun 7th 2005, 9:12 pm
Validate robots.txt (http://www.searchengineworld.com/cgi-bin/robotcheck.cgi)

That should help.

Tuning
Jun 7th 2005, 10:12 pm
Thanks noppid,

But it seems that's not what I'm looking for. I want to see how search engines view my pages when they follow the robots.txt instructions.

Do you know of any tools?

noppid
Jun 7th 2005, 10:23 pm
I don't understand exactly what you mean? What site is the file at? The source will tell.

Tuning
Jun 8th 2005, 4:12 am
This is the site :

forums.matrixweb.org

The pages got dropped from the Google index. It turned out that my robots.txt was wrong, so I updated it, but I'm not sure whether it will work. :confused:

The problem is duplicate content: the same pages have 3 URLs.
User-agent: *
Disallow: /post-*.html$
Disallow: /updates-topic.html*$
Disallow: /stop-updates-topic.html*$
Disallow: /ptopic*.html$
Disallow: /ntopic*.html$

Thanks,
Tuning :)
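
One way to see how a wildcard-aware crawler might read the rules above is to translate each one into a regular expression and test a few paths against it. This is only a sketch: it assumes the crawler treats `*` as "any run of characters" and a trailing `$` as "end of URL" (an extension some crawlers such as Googlebot support, not part of the original robots.txt convention). The function names and the sample paths are made up for illustration.

```python
import re

# The Disallow paths from the post above.
RULES = [
    "/post-*.html$",
    "/updates-topic.html*$",
    "/stop-updates-topic.html*$",
    "/ptopic*.html$",
    "/ntopic*.html$",
]

def rule_to_regex(rule):
    """Translate a wildcard rule into a regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Without '$' the rule behaves as a prefix match.
    """
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))

def is_blocked(path, rules=RULES):
    """Return True if any rule matches the path from its first character."""
    return any(rule_to_regex(rule).match(path) for rule in rules)

# Hypothetical paths, only to exercise the rules.
for path in ["/post-123.html", "/ptopic12.html", "/viewtopic.php?t=5"]:
    print(path, "blocked" if is_blocked(path) else "allowed")
```

A crawler that does not support the wildcard extension would read the same rules very differently, so a result here says nothing about those bots.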

noppid
Jun 8th 2005, 6:01 am
IIRC, you can't use wildcards in the paths. :)
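
For what it's worth, the original robots.txt convention matches a Disallow value as a plain prefix of the URL path, so `*` and `$` are just literal characters to a bot that follows it strictly. Some crawlers (Googlebot among them) accept `*` and `$` as an extension, so the rules may still work for those. A minimal sketch of the strict prefix behaviour, with a hypothetical function name and sample paths:

```python
def blocked_by_prefix(path, disallow_rules):
    """Original convention: a rule blocks every path that starts with the rule text.

    An empty Disallow value blocks nothing.
    """
    return any(rule != "" and path.startswith(rule) for rule in disallow_rules)

wildcard_rules = ["/post-*.html$", "/ptopic*.html$"]
plain_rules = ["/post-", "/ptopic"]

# '*' and '$' are compared literally, so the wildcard rule never matches.
print(blocked_by_prefix("/post-123.html", wildcard_rules))

# A plain prefix rule matches under the strict reading.
print(blocked_by_prefix("/post-123.html", plain_rules))
```

Plain prefixes like `/post-` are understood by both kinds of crawler, at the cost of also blocking any other path that happens to start with the same text.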

Tuning
Jun 8th 2005, 7:59 am
But noppid, this is the exact code I got from the able2know mod:
#
#-----[ OPEN ]------------------------------------------
#

robots.txt

Disallow: forums/post-*.html$
Disallow: forums/updates-topic.html*$
Disallow: forums/stop-updates-topic.html*$
Disallow: forums/ptopic*.html$
Disallow: forums/ntopic*.html$

And as far as I can understand (sorry for my n00bness :o), they built this mod for www.domain.com/forums/

My forum is on a subdomain, so I removed the "forums" part.

Regards,
Tuning :)

noppid
Jun 8th 2005, 8:17 am
Big discussion at DP: http://forums.digitalpoint.com/showthread.php?t=6894

I have no clue why they made it that way. Wildcards don't work in the path. There are many, many places to verify that, for example http://www.aim-pro.com/helpfiles/robots-txt.html

I dunno on that one.

Also, depending on how your server does the redirect for the subdomain, the robots.txt file may not be found in the subdomain folder. Bots may be looking for it in the root folder. You can probably tell which is getting hit in the control panel to sort that out.
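
To pin down which URL to look for in the logs: crawlers request robots.txt from the root of each host, so a subdomain is checked at its own `/robots.txt`, separately from the main domain. Which folder on disk answers that request depends on the server setup. A small sketch that builds the robots.txt URL for any page (the function name and the sample page path are hypothetical):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    """Return the robots.txt URL a crawler would request for this page's host."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("http://forums.matrixweb.org/ptopic12.html"))
```

Opening the printed URL in a browser shows exactly what a bot receives for that host, whichever folder the file is served from.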

Tuning
Jun 8th 2005, 8:55 am
Thanks noppid. That's great info. I will check the cPanel and see what is in there.

Thanks for the help. :)

noppid
Jun 8th 2005, 9:02 am
Glad to help. I learned a little too. It's not like I knew all that without some research. :D