How to put robots.txt


nonflasher
Feb 9th 2006, 9:49 am
Hey all,

I've got a little problem. At the moment I only have FTP access to the subdirectory on my server that my site is linked to.

Example:

www.example.com/template/ <- here are all my files, like index.htm ...

But I have to put the robots.txt into

www.example.com/ <---

How can I do that? The server runs Confixx.

jrd1mra
Feb 9th 2006, 10:22 am
You have to get FTP access to either the www/ folder or the root folder. You can do this through your site admin or your cPanel, or ask your host to do it for you.
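
Once you do have FTP access to the root, the upload itself could be scripted. This is only a minimal sketch using Python's standard ftplib: the helper name upload_robots, the default remote directory "/" and the host and login in the comment are all placeholders, not anything from this thread, and your web root may sit at a different path.

```python
import io


def upload_robots(ftp, rules, remote_dir="/"):
    """Upload robots.txt content into remote_dir over an open FTP session.

    `ftp` is expected to be a logged-in ftplib.FTP object.
    `remote_dir` is an assumption; the web root may be another path.
    """
    ftp.cwd(remote_dir)
    payload = io.BytesIO(rules.encode("ascii"))
    ftp.storbinary("STOR robots.txt", payload)


# Example use (placeholder host and credentials, not run here):
#
#   from ftplib import FTP
#   with FTP("ftp.example.com") as ftp:
#       ftp.login("username", "password")
#       upload_robots(ftp, "User-agent: *\nDisallow:\n")
```

The two-line file in the comment allows every bot everywhere, so it is a safe starting point to edit later.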

JEET
Feb 9th 2006, 11:09 am
I think this is how it works.

Example.com already has a robots.txt.
Now you have created a folder in their root.
If they don't disallow your folder named "template" in robots.txt, then search engines will not have problems indexing your site.

An empty robots.txt is treated as "allow all" until a Disallow rule is added.

MSN did not ask for a robots file when I created a folder in my root. It used the robots.txt of the main domain, in your case example.com.
Check the error logs. If they show a request for robots.txt, then you need to put one there; otherwise the bots found it.

I am not very sure about what I said above; it's just what I saw in my logs.
Let others comment.

Regards
jeet
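
The two points in the post above (bots ask for robots.txt at the host root even for pages in a subfolder, and an empty file allows everything) can be tried out with Python's standard urllib.robotparser. This is just a sketch: the helper names robots_url_for and allowed_by_empty_file are made up for illustration, and the parser only shows how a standards-following bot should behave, not what any particular search engine actually does.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url_for(page_url):
    """Return the robots.txt URL a well-behaved bot requests for page_url."""
    parts = urlsplit(page_url)
    # Only scheme and host are kept; the page's own folder is ignored.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def allowed_by_empty_file(page_url, agent="*"):
    """Check page_url against an empty robots.txt."""
    parser = RobotFileParser()
    parser.parse([])  # no lines at all, i.e. an empty file
    return parser.can_fetch(agent, page_url)


page = "http://www.example.com/template/index.htm"
print(robots_url_for(page))
print(allowed_by_empty_file(page))
```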

mussolinihitler
May 7th 2006, 2:52 am
Hmm... so no robots.txt means index everything. JEET, does that mean we don't need a robots.txt if there is nothing to disallow on our website?

concord
May 7th 2006, 9:55 am
Check whether a robots.txt is already there. If not, you typically don't need one unless you want to get sophisticated about which bots to block and which to allow.

DP Most
May 7th 2006, 10:00 am
Can anyone tell me its benefits, please?

dizyn
May 12th 2006, 10:14 pm
Can anyone share a sample robots.txt file?

bentong
May 13th 2006, 6:14 am
DP Most wrote: "Can anyone tell me its benefits, please?"

Sometimes you have to block some robots because they are only a waste of your bandwidth: email harvester bots, etc.

@dizyn
http://www.seyq.com/robots.txt <- sample

Just look at the robots.txt file of some popular sites, copy it, and upload it to your server. ;)

jrd1mra
May 13th 2006, 8:17 am
User-agent: *
Disallow:

User-agent: Turn It In
Disallow: /

User-agent: grub-client
Disallow: /

User-agent: grub
Disallow: /

User-agent: looksmart
Disallow: /

User-agent: WebZip
Disallow: /

User-agent: larbin
Disallow: /

User-agent: b2w/0.1
Disallow: /

User-agent: psbot
Disallow: /

User-agent: Python-urllib
Disallow: /

User-agent: NetMechanic
Disallow: /

User-agent: URL_Spider_Pro
Disallow: /

User-agent: CherryPicker
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: EmailSiphon
Disallow: /

User-agent: WebBandit
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: CopyRightCheck
Disallow: /

User-agent: Crescent
Disallow: /

User-agent: SiteSnagger
Disallow: /

User-agent: ProWebWalker
Disallow: /

User-agent: CheeseBot
Disallow: /

User-agent: LNSpiderguy
Disallow: /


User-agent: Teleport
Disallow: /

User-agent: TeleportPro
Disallow: /

User-agent: MIIxpc
Disallow: /

User-agent: Telesoft
Disallow: /

User-agent: Website Quester
Disallow: /

User-agent: moget/2.1
Disallow: /

User-agent: WebZip/4.0
Disallow: /

User-agent: WebStripper
Disallow: /

User-agent: WebSauger
Disallow: /

User-agent: WebCopier
Disallow: /

User-agent: NetAnts
Disallow: /

User-agent: Mister PiX
Disallow: /

User-agent: WebAuto
Disallow: /

User-agent: TheNomad
Disallow: /

User-agent: WWW-Collector-E
Disallow: /

User-agent: RMA
Disallow: /

User-agent: libWeb/clsHTTP
Disallow: /

User-agent: asterias
Disallow: /

User-agent: httplib
Disallow: /

User-agent: turingos
Disallow: /

User-agent: spanner
Disallow: /

User-agent: InfoNaviRobot
Disallow: /

User-agent: Harvest/1.5
Disallow: /

User-agent: Bullseye/1.0
Disallow: /

User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
Disallow: /

User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
Disallow: /

User-agent: CherryPickerSE/1.0
Disallow: /

User-agent: CherryPickerElite/1.0
Disallow: /

User-agent: WebBandit/3.50
Disallow: /

User-agent: NICErsPRO
Disallow: /

User-agent: Microsoft URL Control - 5.01.4511
Disallow: /

User-agent: DittoSpyder
Disallow: /

User-agent: Foobot
Disallow: /

User-agent: WebmasterWorldForumBot
Disallow: /

User-agent: SpankBot
Disallow: /

User-agent: BotALot
Disallow: /

User-agent: lwp-trivial/1.34
Disallow: /

User-agent: lwp-trivial
Disallow: /

User-agent: BunnySlippers
Disallow: /

User-agent: Microsoft URL Control - 6.00.8169
Disallow: /

User-agent: URLy Warning
Disallow: /

User-agent: /1.6
Disallow: /

User-agent: /1.5.3
Disallow: /

User-agent:
Disallow: /

User-agent: LinkWalker
Disallow: /

User-agent: cosmos
Disallow: /

User-agent: moget
Disallow: /

User-agent: hloader
Disallow: /

User-agent: humanlinks
Disallow: /

User-agent: LinkextractorPro
Disallow: /

User-agent: Offline Explorer
Disallow: /

User-agent: Mata Hari
Disallow: /

User-agent: LexiBot
Disallow: /

User-agent: Web Image Collector
Disallow: /

User-agent: The Intraformant
Disallow: /

User-agent: True_Robot/1.0
Disallow: /

User-agent: True_Robot
Disallow: /

User-agent: BlowFish/1.0
Disallow: /

User-agent: JennyBot
Disallow: /

User-agent: MIIxpc/4.2
Disallow: /

User-agent: BuiltBotTough
Disallow: /

User-agent: ProPowerBot/2.14
Disallow: /

User-agent: BackDoorBot/1.0
Disallow: /

User-agent: toCrawl/UrlDispatcher
Disallow: /

User-agent: WebEnhancer
Disallow: /

User-agent: suzuran
Disallow: /

User-agent: VCI WebViewer VCI WebViewer Win32
Disallow: /

User-agent: VCI
Disallow: /

User-agent: Szukacz/1.4
Disallow: /

User-agent: QueryN Metasearch
Disallow: /

User-agent: Openfind data gathere
Disallow: /

User-agent: Openfind
Disallow: /

User-agent: Xenu's Link Sleuth 1.1c
Disallow: /

User-agent: Xenu's
Disallow: /

User-agent: Zeus
Disallow: /

User-agent: RepoMonkey Bait & Tackle/v1.01
Disallow: /

User-agent: RepoMonkey
Disallow: /

User-agent: Microsoft URL Control
Disallow: /

User-agent: Openbot
Disallow: /

User-agent: URL Control
Disallow: /

User-agent: Zeus Link Scout
Disallow: /

User-agent: Zeus 32297 Webster Pro V2.9 Win32
Disallow: /

User-agent: Webster Pro
Disallow: /

User-agent: EroCrawler
Disallow: /

User-agent: LinkScan/8.1a Unix
Disallow: /

User-agent: Keyword Density/0.9
Disallow: /

User-agent: Kenjin Spider
Disallow: /

User-agent: Iron33/1.0.2
Disallow: /

User-agent: Bookmark search tool
Disallow: /

User-agent: GetRight/4.2
Disallow: /

User-agent: FairAd Client
Disallow: /

User-agent: Gaisbot
Disallow: /

User-agent: Aqua_Products
Disallow: /

User-agent: Radiation Retriever 1.1
Disallow: /

User-agent: Flaming AttackBot
Disallow: /

User-agent: Oracle Ultra Search
Disallow: /

User-agent: MSIECrawler
Disallow: /

User-agent: PerMan
Disallow: /

User-agent: searchpreview
Disallow: /

User-agent: aipbot
Disallow: /

User-agent: abot
Disallow: /
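
If you want to see how a file like the one above is read, Python's standard urllib.robotparser can evaluate it. The sketch below uses only a three-record excerpt and assumes the first record (User-agent: *) is meant to allow everything, so it is written with an empty Disallow line; the helper name is_allowed and the test URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Short excerpt of the list above; the empty Disallow means "allow all".
EXCERPT = """\
User-agent: *
Disallow:

User-agent: EmailCollector
Disallow: /

User-agent: WebZip
Disallow: /
"""


def is_allowed(agent, url="http://www.example.com/index.htm"):
    """Return True if the excerpt lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(EXCERPT.splitlines())
    return parser.can_fetch(agent, url)


for agent in ("Googlebot", "EmailCollector", "WebZip/4.0"):
    print(agent, is_allowed(agent))
```

Bots that are not named fall through to the * record and stay allowed, while the named ones are asked to keep out of the whole site.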

sam1
Jun 3rd 2006, 4:24 am
Hmmm... very good info.

MaxPowers
Jun 3rd 2006, 7:47 am
robots.txt needs to go in your web root: mydomain.com/robots.txt

It is an exclusion standard that kindly asks bots not to visit an area or a page, but it does not 'block' bad bots. It merely suggests to 'honest' bots that they shouldn't spider an area.

I have a script on my site specifically for email harvesters that links to page after page of fake email addresses generated at random. The robots.txt lists this page as Disallowed for the sake of 'honest' bots like Google, MSN, Yahoo, (and the AutoMapIt.com spider), but 'bad' bots that ignore the robots.txt will go hog wild harvesting thousands or even millions of random, gibberish email addresses.

Robots.txt doesn't block bad bots, so it's best to use that file to help good bots stay out of traps and admin areas. As for bad bots, if you find a particular one visiting you, it's best to use .htaccess to block it by IP, user-agent, or other identifying info. An .htaccess rule is enforced by the server; robots.txt is just a suggestion.
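
As a rough idea of what blocking by user-agent in .htaccess might look like, here is a small Python sketch that writes the rules out for a list of bot names. It assumes Apache with mod_rewrite enabled and .htaccess overrides allowed on your host; the function name htaccess_block_rules is made up, and only simple one-word names such as the ones in the usage line were considered.

```python
import re


def htaccess_block_rules(user_agents):
    """Build mod_rewrite lines that answer 403 Forbidden to the given bots."""
    agents = list(user_agents)
    if not agents:
        # Without any condition the rule would forbid every visitor.
        return ""
    lines = ["RewriteEngine On"]
    for index, agent in enumerate(agents):
        # [NC] ignores case; [OR] chains every condition except the last.
        flags = "[NC]" if index == len(agents) - 1 else "[NC,OR]"
        pattern = re.escape(agent)
        lines.append(f"RewriteCond %{{HTTP_USER_AGENT}} {pattern} {flags}")
    # "-" leaves the URL alone; [F] sends 403 and [L] stops further rules.
    lines.append("RewriteRule .* - [F,L]")
    return "\n".join(lines)


print(htaccess_block_rules(["EmailSiphon", "WebZip"]))
```

Unlike a robots.txt entry, these rules are applied by the server on every request, although a bot that fakes its user-agent string would still slip past them.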

dizyn
Jun 7th 2006, 10:37 pm
@bentong: thanks for sharing the sample.