robots.txt - How to ban everyone except Google


lim (x² - 5x³) = -∞
Mar 27th 2008, 9:21 am
Hello, I want to make a robots.txt that allows only Google to index my web site. It would look like this:

User-agent: Google
Allow: /

User-agent: *
Disallow: /

But I am not sure whether it will allow access to ALL Google bots.

Do I need to add a separate rule for each type of bot? Like:

User-agent: Mediapartners-Google
Allow: /

User-agent: Googlebot-Image
Allow: /

etc.

And if so, can you give me a list of all Google bots, by User-agent?

PS: Yes, I understand that many bots ignore robots.txt.
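
One way to try out the first file from this post offline is Python's standard-library parser. This is only a sketch: `urllib.robotparser` compares user-agent tokens by substring, which may not be what Google's own crawlers do, and the `allowed` helper and the example.com URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The first file proposed in the post above.
ROBOTS_TXT = """\
User-agent: Google
Allow: /

User-agent: *
Disallow: /
"""


def allowed(robots_txt, agent, url="http://example.com/page.html"):
    """Return True if this parser lets `agent` fetch `url` (hypothetical URL)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for agent in ("Googlebot", "Googlebot-Image", "Mediapartners-Google", "Slurp", "msnbot"):
    print(agent, allowed(ROBOTS_TXT, agent))
```

With this parser the three Google agent names come out allowed, because "Google" is a substring of each, and the other two are blocked. That is a property of this particular parser and not proof of how any real crawler reads the file.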

lim (x² - 5x³) = -∞
Apr 1st 2008, 3:35 pm
digital bump

ssandecki
Apr 1st 2008, 3:59 pm
You must make a separate entry for each of Google's web robots, so yes, you must add entries for the other two. The real question is why you would want to block Yahoo!, Live Search & MSN. :confused:

lim (x² - 5x³) = -∞
Apr 2nd 2008, 7:07 am
This is one of my experiments; I don't need other traffic sources. Can anyone help me make a list of bots?

The Stealthy One
Apr 2nd 2008, 8:05 am
But will that work with the *, too? Wouldn't that override the first Allow?

enriquerojas
Apr 2nd 2008, 8:27 am
It should work. Logically, an agent is matched when it hits the Google entry; if it doesn't match, it continues down the list.
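
The matching order described above can be sketched in a few lines of Python. It assumes, as the original robots.txt convention describes, that the `*` record is only a default for robots that matched no other record. `select_group` and the substring comparison are illustrative choices, not any crawler's actual code.

```python
def select_group(groups, crawler_name):
    """Pick the rules for crawler_name: a named group wins, "*" is the fallback.

    `groups` is a list of (user_agents, rules) pairs in file order.
    """
    name = crawler_name.lower()
    fallback = None
    for agents, rules in groups:
        for agent in agents:
            if agent == "*":
                # Remember the default group, but keep looking for a named match.
                if fallback is None:
                    fallback = rules
            elif agent.lower() in name:
                return rules
    return fallback if fallback is not None else []


groups = [
    (["Google"], ["Allow: /"]),
    (["*"], ["Disallow: /"]),
]
swapped = list(reversed(groups))

print(select_group(groups, "Googlebot"))   # named group
print(select_group(swapped, "Googlebot"))  # same result with "*" listed first
print(select_group(groups, "Slurp"))       # falls back to "*"
```

Under that assumption the `*` record does not override the Google record, whichever one comes first in the file.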

vitalous
Apr 2nd 2008, 8:32 am
Google spiders sites that do well in Live & Yahoo...
Aren't you shooting yourself in the foot?

windtalker
Apr 2nd 2008, 11:03 am
That will not work; you need to address each individual agent, for example:

User-agent: Google
Allow: /

User-agent: Slurp
Disallow: /

User-agent: Msn
Disallow: /

ssandecki
Apr 2nd 2008, 11:09 am
"This is one of experiments, I don't need other traffic sources. Anyone can help to make a list of bots?"

http://www.user-agents.org/

guidyy
Apr 2nd 2008, 11:19 am
I don't know what kind of experiment you are doing, but some bad robots will not follow the directives in your robots.txt.
If you want to block all bots except Google, you will also need a bad bot trap.

ashisharora_83
Apr 7th 2008, 3:21 am
This looks ok to me... thanks windtalker for sharing...

--Ashish

manish.chauhan
Apr 7th 2008, 7:44 am
No, you do not need to allow each Google bot separately; just allow Google, and all its bots will be allowed automatically.
Here is a list of Google bots (http://www.iplists.com/), and you can find spammy bots here (http://seocrazy.blogspot.com/2008/04/spammy-robots-list.html), which you should block to reduce your bandwidth usage and cut your costs.

manish.chauhan
Apr 7th 2008, 7:47 am
"PS yes, I understand that many bots ignore robots.txt"
In that case you can block them by IP address using .htaccess.
I hope this helps you :)

tekz999
Apr 7th 2008, 7:53 am
you have a very 31337 us3rn3m3, ph34r j00.

lim (x² - 5x³) = -∞
Apr 7th 2008, 8:54 am
"No you do not need to allow separate Google bots, just allow only Google, all its bots will be allowed automatically..."

That's nice, thanks for the information. Do you remember where you read about it?

manish.chauhan
Apr 7th 2008, 11:08 pm
You can find more info at
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40364

lim (x² - 5x³) = -∞
Apr 8th 2008, 8:39 am
"You can find more info at
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40364"

Thanks, I think that's what I need.

manish.chauhan
Apr 9th 2008, 3:15 am
"you have a very 31337 us3rn3m3, ph34r j00."

????:confused:

jasoncdu
Apr 9th 2008, 9:09 pm
"????:confused:"

He means "You have a very elite username, fear you." Kinda iffy on that last part. He typed it in 1337 speak because the thread starter's username is all numbers, symbols, etc.

Jason

manish.chauhan
Apr 10th 2008, 3:56 am
Thanks for making it clear.

jasp
Jun 2nd 2008, 10:57 am
"That will not work, you need to address each individual agent"

This is not true if the robot is standards compliant. I'd be surprised if the big 5 didn't understand the official robots.txt standard.

To allow only specific robots, like those from Google, you can use:

User-agent: *
Disallow: /

User-agent: Googlebot
User-agent: Googlebot-Mobile
User-agent: Googlebot-Image
User-agent: Mediapartners-Google
User-agent: Adsbot-Google
Disallow:



Remember that the 'Allow' directive is non-standard.
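
The file above can also be checked offline with Python's standard-library `urllib.robotparser`. Treat this as a rough sketch: it shows how that one parser reads the grouped User-agent lines and the empty Disallow, and real crawlers may behave differently. The URL is a made-up placeholder.

```python
from urllib.robotparser import RobotFileParser

# Block everyone by default, then one group that names several
# Google crawlers and disallows nothing (empty Disallow).
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Googlebot
User-agent: Googlebot-Mobile
User-agent: Googlebot-Image
User-agent: Mediapartners-Google
User-agent: Adsbot-Google
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URL, used only to have something to test against.
URL = "http://example.com/some/page.html"

agents = ["Googlebot", "Googlebot-Image", "Adsbot-Google", "Slurp", "msnbot"]
results = {agent: parser.can_fetch(agent, URL) for agent in agents}

for agent, ok in results.items():
    print(f"{agent}: {'allowed' if ok else 'blocked'}")
```

Here the listed Google agents are reported as allowed and the others as blocked, with no 'Allow' line needed.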

dermax
Jun 2nd 2008, 1:49 pm
I can confirm jasp's solution.