Can you check for me if this robots.txt file is right or wrong, and if possible explain why it is right or wrong? http://www.tripontop.com/robots.txt Thanks in advance.
A little over the top, I think. See if you can keep it fairly simple.
-----------------------------------------------
# Begin block Bad-Robots from robots.txt
User-agent: asterias
Disallow: /
User-agent: BotALot
Disallow: /
User-agent: BuiltBotTough
Disallow: /
User-agent: BunnySlippers
Disallow: /
User-agent: Cegbfeieh
Disallow: /
User-agent: CheeseBot
Disallow: /
User-agent: CherryPicker
Disallow: /
User-agent: CopyRightCheck
Disallow: /
User-agent: cosmos
Disallow: /
User-agent: Crescent
Disallow: /
User-agent: DittoSpyder
Disallow: /
User-agent: EmailCollector
Disallow: /
User-agent: EmailSiphon
Disallow: /
User-agent: EmailWolf
Disallow: /
User-agent: EroCrawler
Disallow: /
User-agent: ExtractorPro
Disallow: /
User-agent: Foobot
Disallow: /
User-agent: hloader
Disallow: /
User-agent: httplib
Disallow: /
User-agent: humanlinks
Disallow: /
User-agent: InfoNaviRobot
Disallow: /
User-agent: JennyBot
Disallow: /
User-agent: LexiBot
Disallow: /
User-agent: LinkextractorPro
Disallow: /
User-agent: LinkWalker
Disallow: /
User-agent: LNSpiderguy
Disallow: /
User-agent: lwp-trivial
Disallow: /
User-agent: MIIxpc
Disallow: /
User-agent: moget
Disallow: /
User-agent: NetAnts
Disallow: /
User-agent: NICErsPRO
Disallow: /
User-agent: Openfind
Disallow: /
User-agent: ProWebWalker
Disallow: /
User-agent: RepoMonkey
Disallow: /
User-agent: RMA
Disallow: /
User-agent: SiteSnagger
Disallow: /
User-agent: SpankBot
Disallow: /
User-agent: spanner
Disallow: /
User-agent: suzuran
Disallow: /
User-agent: Teleport
Disallow: /
User-agent: TeleportPro
Disallow: /
User-agent: Telesoft
Disallow: /
User-agent: TheNomad
Disallow: /
User-agent: TightTwatBot
Disallow: /
User-agent: Titan
Disallow: /
User-agent: True_Robot
Disallow: /
User-agent: turingos
Disallow: /
User-agent: VCI
Disallow: /
User-agent: WebAuto
Disallow: /
User-agent: WebBandit
Disallow: /
User-agent: WebCopier
Disallow: /
User-agent: WebEnhancer
Disallow: /
User-agent: WebmasterWorldForumBot
Disallow: /
User-agent: WebSauger
Disallow: /
User-agent: WebStripper
Disallow: /
User-agent: WebZip
Disallow: /
User-agent: Wget
Disallow: /
User-agent: WWW-Collector-E
Disallow: /
User-agent: Xenu's
Disallow: /
User-agent: Zeus
Disallow: /

# Begin Exclusion From Directories from robots.txt
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /wp-login.php
Disallow: /wp-register.php

Sitemap: http://www.tripontop.com/sitemap.xml.gz
----------------------------------------
Why people insist on adding version and build numbers ("4.01" or "/1.0") is beyond me .. but it's been done for years, and bots will blow on by without giving version numbers a second glance. Fact of the matter is that most of the bots listed above, whatever the people running them are up to, won't pay any attention to your robots.txt file anyway.

Google? Well, truth be known .. if you have any Google ads on your site, then blocking Mediapartners-Google or related Google ad bots won't work .. they'll come on in anyway.

Oh, and unzip your sitemap.xml ... it'll give the search engines one less hoop to jump through while indexing your site.

If you are really, really serious about blocking site scrapers, spam-bots, and nosey-nates, then I'd suggest you do all of the blocking through your .htaccess file .. stops them cold .. guaranteed (see the sketch at the end of this post). Things change fast on the net ... and I try to keep my tools as up-to-date as possible.
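For the .htaccess route, something like this is one way to do it .. a minimal sketch, assuming Apache 2.4 with mod_setenvif enabled, and the bot names here are just illustrative placeholders, not a vetted list:
-----------------------------------------------
# Flag any request whose User-Agent matches a known bad bot
# (case-insensitive regex; extend the alternation as needed)
BrowserMatchNoCase "SiteSnagger|EmailCollector|WebStripper|WebZip" bad_bot

# Refuse flagged requests at the server (Apache 2.4 syntax)
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>
-----------------------------------------------
Unlike robots.txt, which a bot is free to ignore, this is enforced by the server itself, so misbehaving crawlers get turned away whether they read your robots.txt or not.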
To get a robots.txt that will validate, you may want to visit our robots.txt tool here: http://www.webshoppesolutions.com/bottxt_generator.htm
I simply write:

User-agent: *
Sitemap: http://www.myurl/sitemap.xml

Is that not correct? Should I disallow all the bots like above?
Sure .. your way could work, provided you specify Allow or Disallow:

User-agent: *
Disallow: /

or

User-agent: *
Allow: /

The "*" refers to "all" robots and parsing agents. If you want only Google to visit you, you can make an exception for Google this way:

User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
I usually use this robots.txt file and it works well for me:

Sitemap: http://www.domain.com/sitemap.xml

User-Agent: *
Allow: /
Sherone, refer to this: http://www.fleiner.com/bots/ I think you should disallow some of them rather than allowing all of the robots.
Guessing here, but isn't it shorter to Allow all the browser user-agents instead of blocking the zillion spiders?