View Full Version : Do I Need ROBOT.TXT on my website???
sunchy
Dec 28th 2005, 1:30 am
Do I Need ROBOT.TXT on my website???
I see that spider visit my new site and I see that he get code 404.
I check and it is no robot.txt on my website.
If I need, how to build best robot.txt file
All The Best
Sunchy
monRa
Dec 28th 2005, 1:40 am
I think you must add one! I had the same problem when I saw in my stats that unsuccesfull hits where done to it so I created one!
here is mine
User-agent: WebmasterWorldForumBot
Disallow: /
User-agent: URL_Spider_Pro
Disallow: /
User-agent: CherryPicker
Disallow: /
User-agent: EmailCollector
Disallow: /
User-agent: EmailSiphon
Disallow: /
User-agent: WebBandit
Disallow: /
User-agent: EmailWolf
Disallow: /
User-agent: ExtractorPro
Disallow: /
User-agent: CopyRightCheck
Disallow: /
User-agent: Crescent
Disallow: /
User-agent: SiteSnagger
Disallow: /
User-agent: ProWebWalker
Disallow: /
User-agent: CheeseBot
Disallow: /
User-agent: LNSpiderguy
Disallow: /
User-agent: Black Hole
Disallow: /
User-agent: Titan
Disallow: /
User-agent: WebStripper
Disallow: /
User-agent: NetMechanic
Disallow: /
User-agent: CherryPicker
Disallow: /
User-agent: EmailCollector
Disallow: /
User-agent: EmailSiphon
Disallow: /
User-agent: WebBandit
Disallow: /
User-agent: EmailWolf
Disallow: /
User-agent: ExtractorPro
Disallow: /
User-agent: CopyRightCheck
Disallow: /
User-agent: Crescent
Disallow: /
User-agent: Wget
Disallow: /
User-agent: SiteSnagger
Disallow: /
User-agent: ProWebWalker
Disallow: /
User-agent: CheeseBot
Disallow: /
User-agent: mozilla/4
Disallow: /
User-agent: mozilla/5
Disallow: /
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows NT)
Disallow: /
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 95)
Disallow: /
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 98)
Disallow: /
User-agent: Teleport
Disallow: /
User-agent: TeleportPro
Disallow: /
User-agent: MIIxpc
Disallow: /
User-agent: Telesoft
Disallow: /
User-agent: Website Quester
Disallow: /
User-agent: WebZip
Disallow: /
User-agent: moget/2.1
Disallow: /
User-agent: WebZip/4.0
Disallow: /
User-agent: WebSauger
Disallow: /
User-agent: WebCopier
Disallow: /
User-agent: NetAnts
Disallow: /
User-agent: Mister PiX
Disallow: /
User-agent: WebAuto
Disallow: /
User-agent: TheNomad
Disallow: /
User-agent: WWW-Collector-E
Disallow: /
User-agent: RMA
Disallow: /
User-agent: libWeb/clsHTTP
Disallow: /
User-agent: asterias
Disallow: /
User-agent: httplib
Disallow: /
User-agent: turingos
Disallow: /
User-agent: spanner
Disallow: /
User-agent: InfoNaviRobot
Disallow: /
User-agent: Harvest/1.5
Disallow: /
User-agent: Bullseye/1.0
Disallow: /
User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
Disallow: /
User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
Disallow: /
User-agent: CherryPickerSE/1.0
Disallow: /
User-agent: CherryPickerElite/1.0
Disallow: /
User-agent: WebBandit/3.50
Disallow: /
User-agent: NICErsPRO
Disallow: /
User-agent: Microsoft URL Control - 5.01.4511
Disallow: /
User-agent: DittoSpyder
Disallow: /
User-agent: Foobot
Disallow: /
User-agent: SpankBot
Disallow: /
User-agent: BotALot
Disallow: /
User-agent: lwp-trivial/1.34
Disallow: /
User-agent: lwp-trivial
Disallow: /
User-agent: Wget/1.6
Disallow: /
User-agent: BunnySlippers
Disallow: /
User-agent: Microsoft URL Control - 6.00.8169
Disallow: /
User-agent: URLy Warning
Disallow: /
User-agent: Wget/1.5.3
Disallow: /
User-agent: LinkWalker
Disallow: /
User-agent: cosmos
Disallow: /
User-agent: moget
Disallow: /
User-agent: hloader
Disallow: /
User-agent: humanlinks
Disallow: /
User-agent: LinkextractorPro
Disallow: /
User-agent: Offline Explorer
Disallow: /
User-agent: Mata Hari
Disallow: /
User-agent: LexiBot
Disallow: /
User-agent: Web Image Collector
Disallow: /
User-agent: The Intraformant
Disallow: /
User-agent: True_Robot/1.0
Disallow: /
User-agent: True_Robot
Disallow: /
User-agent: BlowFish/1.0
Disallow: /
User-agent: JennyBot
Disallow: /
User-agent: MIIxpc/4.2
Disallow: /
User-agent: BuiltBotTough
Disallow: /
User-agent: ProPowerBot/2.14
Disallow: /
User-agent: BackDoorBot/1.0
Disallow: /
User-agent: toCrawl/UrlDispatcher
Disallow: /
User-agent: WebEnhancer
Disallow: /
User-agent: TightTwatBot
Disallow: /
User-agent: suzuran
Disallow: /
User-agent: VCI WebViewer VCI WebViewer Win32
Disallow: /
User-agent: VCI
Disallow: /
User-agent: Szukacz/1.4
Disallow: /
User-agent: QueryN Metasearch
Disallow: /
User-agent: Openfind data gathere
Disallow: /
User-agent: Openfind
Disallow: /
User-agent: Xenu's Link Sleuth 1.1c
Disallow: /
User-agent: Xenu's
Disallow: /
User-agent: Zeus
Disallow: /
User-agent: RepoMonkey Bait & Tackle/v1.01
Disallow: /
User-agent: RepoMonkey
Disallow: /
User-agent: Zeus 32297 Webster Pro V2.9 Win32
Disallow: /
User-agent: Webster Pro
Disallow: /
User-agent: EroCrawler
Disallow: /
User-agent: LinkScan/8.1a Unix
Disallow: /
User-agent: Keyword Density/0.9
Disallow: /
User-agent: Kenjin Spider
Disallow: /
User-agent: Cegbfeieh
Disallow: /
A big forums was using this robots.txt so I thought it was good!
Digitalpoint has this robots
User-agent: *
Disallow: /tools/suggestion/?
Disallow: /search.php
Disallow: /go.php
Disallow: /~shawn/scripts/
User-agent: Googlebot
Disallow: /~shawn/ebay_
Jean-Luc
Dec 28th 2005, 2:07 am
Do I Need ROBOT.TXT on my website???
I see that spider visit my new site and I see that he get code 404.
Hi,
No web robot looks for a robot.txt file. Robots search for the robots.txt file.
Do you need it ? Many sites do not need it. Do not add one if you do not know why you add it. The only advantage would be that the "error 404" codes would disappear from your stats. This "error" does not mean that there is a problem with your site. It just means that you allow all robots to visit and index all pages. So, for web robots, that's a perfect answer.
You should use a robots.txt file only if you want to disallow access to some parts of your site to robots. Why would you do that :
- to reduce the bandwidth consumption;
- to avoid that search engines visit the same page several times through different addresses.
Never use a robots.txt file to "protect" sensible informations.
Jean-Luc
blissallstar
Dec 28th 2005, 2:22 am
It really depends. Is there any folders on your website that you don't want spiders to spider?
Like the person above me has said:
You should use a robots.txt file only if you want to disallow access to some parts of your site to robots.
robots.txt file isn't good to tell where to spider, just basicly tells it where not to spider. I wouldn't depend on it though, because spiders don't have to follow robots.txt, it is just that it's web ediquct <sp>
Perrow
Dec 28th 2005, 2:26 am
To get rid of the 404s in your stats you can use an empty robots.txt. Search for and read up on the robots.txt specification, then form your own opinion about what to include in it. Check your logs and see if some robot is hitting you unreasonably and try to exclude it (some robots dont giva a rats a-- about robots.txt so be prepared to fail).
sunchy
Dec 28th 2005, 3:36 am
Thanks for informations Guys :)
It is help me a lot ...
vBulletin® v3.6.8, Copyright ©2000-2008, Jelsoft Enterprises Ltd.