I have a site where the script is around 5 MB of code, plus 1 or 2 MB of images. Yesterday the following IP ate up 2 GB of my bandwidth! I saw it in AWStats:

h46.11.19.98.ip.windstream.net

Is it a bot? A joke? Does anyone have experience with this type of traffic? How can I block that bot in my robots.txt? My current file contains the following lines:

### START FILE ###
# Bots not allowed:
User-agent: Gigabot
User-agent: Pioneer
User-agent: InternetSeer
User-agent: BBot
User-agent: Walhello appie
User-agent: WebZip
User-agent: larbin
User-agent: b2w/0.1
User-agent: psbot
User-agent: Python-urllib
User-agent: URL_Spider_Pro
User-agent: CherryPicker
User-agent: EmailCollector
User-agent: EmailSiphon
User-agent: WebBandit
User-agent: EmailWolf
User-agent: ExtractorPro
User-agent: CopyRightCheck
User-agent: Crescent
User-agent: SiteSnagger
User-agent: ProWebWalker
User-agent: CheeseBot
User-agent: LNSpiderguy
User-agent: Alexibot
User-agent: Teleport
User-agent: TeleportPro
User-agent: MIIxpc
User-agent: Telesoft
User-agent: Website Quester
User-agent: moget/2.1
User-agent: WebZip/4.0
User-agent: WebStripper
User-agent: WebSauger
User-agent: WebCopier
User-agent: NetAnts
User-agent: Mister PiX
User-agent: WebAuto
User-agent: TheNomad
User-agent: WWW-Collector-E
User-agent: RMA
User-agent: libWeb/clsHTTP
User-agent: asterias
User-agent: httplib
User-agent: turingos
User-agent: spanner
User-agent: InfoNaviRobot
User-agent: Harvest/1.5
User-agent: Bullseye/1.0
User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
User-agent: CherryPickerSE/1.0
User-agent: CherryPickerElite/1.0
User-agent: WebBandit/3.50
User-agent: NICErsPRO
User-agent: Microsoft URL Control - 5.01.4511
User-agent: DittoSpyder
User-agent: Foobot
User-agent: WebmasterWorldForumBot
User-agent: SpankBot
User-agent: BotALot
User-agent: lwp-trivial/1.34
User-agent: lwp-trivial
User-agent: BunnySlippers
User-agent: Microsoft URL Control - 6.00.8169
User-agent: URLy Warning
User-agent: Wget/1.6
User-agent: Wget/1.5.3
User-agent: Wget
User-agent: LinkWalker
User-agent: cosmos
User-agent: moget
User-agent: hloader
User-agent: humanlinks
User-agent: LinkextractorPro
User-agent: Offline Explorer
User-agent: Mata Hari
User-agent: LexiBot
User-agent: Web Image Collector
User-agent: The Intraformant
User-agent: True_Robot/1.0
User-agent: True_Robot
User-agent: BlowFish/1.0
User-agent: JennyBot
User-agent: MIIxpc/4.2
User-agent: BuiltBotTough
User-agent: ProPowerBot/2.14
User-agent: BackDoorBot/1.0
User-agent: toCrawl/UrlDispatcher
User-agent: WebEnhancer
User-agent: suzuran
User-agent: VCI WebViewer VCI WebViewer Win32
User-agent: VCI
User-agent: Szukacz/1.4
User-agent: QueryN Metasearch
User-agent: Openfind data gathere
User-agent: Openfind
User-agent: Xenu's Link Sleuth 1.1c
User-agent: Xenu's
User-agent: Zeus
User-agent: RepoMonkey Bait & Tackle/v1.01
User-agent: RepoMonkey
User-agent: Microsoft URL Control
User-agent: Openbot
User-agent: URL Control
User-agent: Zeus Link Scout
User-agent: Zeus 32297 Webster Pro V2.9 Win32
User-agent: Webster Pro
User-agent: EroCrawler
User-agent: LinkScan/8.1a Unix
User-agent: Keyword Density/0.9
User-agent: Kenjin Spider
User-agent: Iron33/1.0.2
User-agent: Bookmark search tool
User-agent: GetRight/4.2
User-agent: FairAd Client
User-agent: Gaisbot
User-agent: Aqua_Products
User-agent: Radiation Retriever 1.1
User-agent: Flaming AttackBot
User-agent: Oracle Ultra Search
User-agent: MSIECrawler
User-agent: PerMan
User-agent: searchpreview
User-agent: baiduspider
User-agent: ia_archiver
Disallow: /

# Optional sitemap URL:
Sitemap: http://www.mydomain.com/sitemap.xml
### END FILE ###

Thank you!
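From what I understand, robots.txt can only match on the User-agent string a crawler announces, not on an IP address, so the first step would be finding out what that windstream visitor sends in the raw access log. If it turned out to identify itself as, say, "HungryBot" (a made-up token just for illustration), the entry would look like:

# Hypothetical token - replace with the real string from your access log
User-agent: HungryBot
Disallow: /

Of course, that only helps if the visitor announces itself and honours robots.txt at all.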
If that is a badly behaving bot, it will most likely not obey your robots.txt file; in that case you should block it with .htaccess:

order allow,deny
deny from 123.456.789.255
allow from all

*Where 123.456.789.255 is the IP address of the abuser.
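If the abuser hops between IP addresses but keeps the same User-Agent string, you can also deny by user-agent in the same file. Here is a sketch in the same Apache 2.2 syntax as the order/deny lines above; "BadBot" is a placeholder for whatever string actually shows up in your raw access log:

# Flag any request whose User-Agent header contains "BadBot" (placeholder)
SetEnvIfNoCase User-Agent "BadBot" block_bot
order allow,deny
deny from env=block_bot
allow from all

Keep in mind a determined scraper can fake its User-Agent, so the IP block above is still the surer option.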
Since it's the topic of discussion, is there any way to block these universally from crawling your files, or do you have to catch each IP?
Mostly you will have to catch each IP. There are some programs out there that cost money, but if you are very concerned about bots stealing your content, they may be worth the price.
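One manual shortcut: Apache's deny directive accepts partial addresses and CIDR ranges as well as single IPs, so once a few abusive addresses from the same network show up, you can block the whole range at once. A sketch (the 46.11.19.* range just mirrors the windstream host from the first post, so treat it as an illustration, not a recommendation):

order allow,deny
# Single address
deny from 46.11.19.98
# Partial address - blocks every IP starting with 46.11.19
deny from 46.11.19
# Equivalent CIDR form
deny from 46.11.19.0/24
allow from all

Be careful with ranges on a consumer ISP, though: you can easily lock out legitimate visitors who happen to share the network.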