I have a site where the script is around 5 MB of code, plus 1 or 2 MB of images. Yesterday the following IP ate up 2 GB of my bandwidth! I saw it in AWStats:

h46.11.19.98.ip.windstream.net

Is it a bot? A joke? Does anyone have experience with this type of traffic? How can I block that bot in my robots.txt? My current file contains the following lines:

### START FILE ###
# Bots not allowed:
User-agent: Gigabot
User-agent: Pioneer
User-agent: InternetSeer
User-agent: BBot
User-agent: Walhello appie
User-agent: WebZip
User-agent: larbin
User-agent: b2w/0.1
User-agent: psbot
User-agent: Python-urllib
User-agent: URL_Spider_Pro
User-agent: CherryPicker
User-agent: EmailCollector
User-agent: EmailSiphon
User-agent: WebBandit
User-agent: EmailWolf
User-agent: ExtractorPro
User-agent: CopyRightCheck
User-agent: Crescent
User-agent: SiteSnagger
User-agent: ProWebWalker
User-agent: CheeseBot
User-agent: LNSpiderguy
User-agent: Alexibot
User-agent: Teleport
User-agent: TeleportPro
User-agent: MIIxpc
User-agent: Telesoft
User-agent: Website Quester
User-agent: moget/2.1
User-agent: WebZip/4.0
User-agent: WebStripper
User-agent: WebSauger
User-agent: WebCopier
User-agent: NetAnts
User-agent: Mister PiX
User-agent: WebAuto
User-agent: TheNomad
User-agent: WWW-Collector-E
User-agent: RMA
User-agent: libWeb/clsHTTP
User-agent: asterias
User-agent: httplib
User-agent: turingos
User-agent: spanner
User-agent: InfoNaviRobot
User-agent: Harvest/1.5
User-agent: Bullseye/1.0
User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
User-agent: CherryPickerSE/1.0
User-agent: CherryPickerElite/1.0
User-agent: WebBandit/3.50
User-agent: NICErsPRO
User-agent: Microsoft URL Control - 5.01.4511
User-agent: DittoSpyder
User-agent: Foobot
User-agent: WebmasterWorldForumBot
User-agent: SpankBot
User-agent: BotALot
User-agent: lwp-trivial/1.34
User-agent: lwp-trivial
User-agent: BunnySlippers
User-agent: Microsoft URL Control - 6.00.8169
User-agent: URLy Warning
User-agent: Wget/1.6
User-agent: Wget/1.5.3
User-agent: Wget
User-agent: LinkWalker
User-agent: cosmos
User-agent: moget
User-agent: hloader
User-agent: humanlinks
User-agent: LinkextractorPro
User-agent: Offline Explorer
User-agent: Mata Hari
User-agent: LexiBot
User-agent: Web Image Collector
User-agent: The Intraformant
User-agent: True_Robot/1.0
User-agent: True_Robot
User-agent: BlowFish/1.0
User-agent: JennyBot
User-agent: MIIxpc/4.2
User-agent: BuiltBotTough
User-agent: ProPowerBot/2.14
User-agent: BackDoorBot/1.0
User-agent: toCrawl/UrlDispatcher
User-agent: WebEnhancer
User-agent: suzuran
User-agent: VCI WebViewer VCI WebViewer Win32
User-agent: VCI
User-agent: Szukacz/1.4
User-agent: QueryN Metasearch
User-agent: Openfind data gathere
User-agent: Openfind
User-agent: Xenu's Link Sleuth 1.1c
User-agent: Xenu's
User-agent: Zeus
User-agent: RepoMonkey Bait & Tackle/v1.01
User-agent: RepoMonkey
User-agent: Microsoft URL Control
User-agent: Openbot
User-agent: URL Control
User-agent: Zeus Link Scout
User-agent: Zeus 32297 Webster Pro V2.9 Win32
User-agent: Webster Pro
User-agent: EroCrawler
User-agent: LinkScan/8.1a Unix
User-agent: Keyword Density/0.9
User-agent: Kenjin Spider
User-agent: Iron33/1.0.2
User-agent: Bookmark search tool
User-agent: GetRight/4.2
User-agent: FairAd Client
User-agent: Gaisbot
User-agent: Aqua_Products
User-agent: Radiation Retriever 1.1
User-agent: Flaming AttackBot
User-agent: Oracle Ultra Search
User-agent: MSIECrawler
User-agent: PerMan
User-agent: searchpreview
User-agent: baiduspider
User-agent: ia_archiver
Disallow: /

# Optional sitemap URL:
Sitemap: http://www.mydomain.com/sitemap.xml
### END FILE ###

Thank you!
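From what I understand, robots.txt can only match on the User-agent string a crawler announces, not on an IP address, so the first step would be finding out what that windstream visitor sends in the raw access log. If it turned out to identify itself as, say, "HungryBot" (a made-up token just for illustration), the entry would look like:

# Hypothetical token - replace with the real string from your access log
User-agent: HungryBot
Disallow: /

Of course, that only helps if the visitor announces itself and honours robots.txt at all.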
If that is a badly behaving bot, it will most likely not obey your robots.txt file; in that case you should block it with .htaccess:

order allow,deny
deny from 123.456.789.255
allow from all

*Where 123.456.789.255 is the IP address of the abuser.
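If the abuser hops between IP addresses but keeps the same User-Agent string, you can also deny by user-agent in the same file. Here is a sketch in the same Apache 2.2 syntax as the order/deny lines above; "BadBot" is a placeholder for whatever string actually shows up in your raw access log:

# Flag any request whose User-Agent header contains "BadBot" (placeholder)
SetEnvIfNoCase User-Agent "BadBot" block_bot
order allow,deny
deny from env=block_bot
allow from all

Keep in mind a determined scraper can fake its User-Agent, so the IP block above is still the surer option.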
Since it's the topic of discussion, is there any way to block these universally from crawling your files, or do you have to catch each IP?
Mostly you will have to catch each IP. There are some programs out there that cost money, but if you are very concerned about bots stealing your content, they may be worth the price.
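One manual shortcut: Apache's deny directive accepts partial addresses and CIDR ranges as well as single IPs, so once a few abusive addresses from the same network show up, you can block the whole range at once. A sketch (the 46.11.19.* range just mirrors the windstream host from the first post, so treat it as an illustration, not a recommendation):

order allow,deny
# Single address
deny from 46.11.19.98
# Partial address - blocks every IP starting with 46.11.19
deny from 46.11.19
# Equivalent CIDR form
deny from 46.11.19.0/24
allow from all

Be careful with ranges on a consumer ISP, though: you can easily lock out legitimate visitors who happen to share the network.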