This is the second time I've had this happen to one of the forums I maintain (SMF based) - hundreds (climbing towards thousands) of guest logins from different IP addresses in the 74.6.*.* and 69.147.*.* address ranges... which resolve to guess who?

OrgName: Inktomi Corporation
OrgID: INKT
Address: 701 First Ave
City: Sunnyvale
StateProv: CA
PostalCode: 94089
Country: US
NetRange: 74.6.0.0 - 74.6.255.255
CIDR: 74.6.0.0/16
NetName: INKTOMI-BLK-6
NetHandle: NET-74-6-0-0-1
Parent: NET-74-0-0-0-0
NetType: Direct Allocation
NameServer: NS1.YAHOO.COM
NameServer: NS2.YAHOO.COM
NameServer: NS3.YAHOO.COM
NameServer: NS4.YAHOO.COM
NameServer: NS5.YAHOO.COM
Comment:
RegDate: 2006-02-13
Updated: 2007-03-09

I ended up having to ban both ranges, as in the past 48 hours those two ranges have pulled close to twenty gigs of bandwidth - and the rate was steadily increasing as more and more 'guests' from those ranges targeted my site... and if I watched the "who's online" I'd see different IPs in those ranges accessing the same threads over and over, as if multiple indexing spiders were going through the site SIMULTANEOUSLY - checking and rechecking the same data over and over again. Is this 'normal'?
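For anyone wanting to check their own guest IPs against those two ranges, here is a minimal Python sketch using the standard `ipaddress` module. It assumes both wildcard ranges are /16 blocks: 74.6.0.0/16 comes straight from the whois record, while 69.147.0.0/16 is only inferred from the "69.147.*.*" wildcard in the post. The function name and the sample addresses are made up for illustration.

```python
import ipaddress

# Assumption: both ranges are treated as /16 blocks.
# 74.6.0.0/16 is the CIDR from the whois record in the post;
# 69.147.0.0/16 is inferred from the "69.147.*.*" wildcard only.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("74.6.0.0/16"),
    ipaddress.ip_network("69.147.0.0/16"),
]


def is_blocked(ip: str) -> bool:
    """Return True if the address falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    # Hypothetical sample addresses, not taken from any real log.
    for ip in ("74.6.22.101", "69.147.90.5", "192.0.2.10"):
        print(ip, is_blocked(ip))
```

The same membership test could be run over the "who's online" list or an access log to see how much of the guest traffic comes from those ranges before deciding on a ban.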
I've noticed on all my sites a sharp increase in the crawl rate of Yahoo's crawler, as I imagine everyone has, but there is a point where it just gets beyond a joke, which you have obviously just showcased.
Yeah, I've had a lot of hits from Yahoo spiders as well... it started last week and I've seen a lot of people getting the same problem.
Yahoo has started to index my sites better, but they need to do it a lot more efficiently and maybe take a few pointers from Google.
Maybe these entries were visitors from Y! search results, with a spider attached. They do collect info to improve ranking, like how long a visitor stays on the site for a particular keyword and so on. This is just my guess; I don't know what this is.
There are 29 Yahoo spiders on my site right now, and the highest logged so far has been 52 online at the same time. I just checked my site's bandwidth usage stats and I am fine (I have enough bandwidth and my site is pretty new), so I won't ban them. If it gets to a point where it is ridiculous and slows down my site, then I will consider banning the whole range altogether, but for now I won't.
Yahoo is getting desperate to come up with better search results and they think that adding more spiders will somehow make their algorithm better (it won't). And, hey, if a couple of small businesses get lagged out of business, that doesn't concern them...
Wow, that's a lot of bots. How many visits do you get from Google etc? What is the PR and backlink count of this site?
Usually Google is in and out so quick I never even notice them - and since the site comes up #1 for relevant searches, that's fine by me. Usually MSN counts for four logins at once for about an hour a day. "Normally" I see about 20-30 guests total, and of those I'd say half are search engines... PR 5 / 51407 backlinks - which is pretty good for a niche site about a board/miniatures game (that's spawned a card game, a dozen video games, a series of over 40 novels, a spinoff miniatures game, and a really crappy cartoon). 'Normal' traffic these days is 6 gigs/20,000 visits/140,000 "pages" a day (+20%/-10% for the peaks) - so when Yahoo up and decides to chew up over double the traffic of everyone else accessing the server, it's fairly noticeable.
There was another user here on DP a few months back who had a very similar experience. He ended up tracking why it happened. Apparently he used YSM and had wiped all his campaigns and started them up again. YSM checks all campaign sites and sent quite a lot of spiders to that webmaster's site as a result. Just curious if you used YSM for this site?
Forums get ransacked bad because they have so many pages. I had like 170 on at one time and my forum was getting seriously slowed down, to the point where I was changing what servers my sites were on and uninstalling mods. I guess it prepared me for heavier loads...
Nope - that pay per click rubbish has no place on this type of website. (Nor do advertising banners of any sort - we have our own product, why on earth advertise other people's rubbish?) But then - I consider 99% of the 'marketing' and advertising rubbish on the web a nonsensical scam.
How long did you have the robots.txt disallowing Slurp? Most times that change can take up to 24 hours for the bots to pick up again. Also, did you try to just slow the bot down by using a crawl delay of 120 or 240 in robots.txt?
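To make the crawl-delay suggestion concrete, here is a small Python sketch that parses a hypothetical robots.txt with the standard library's `urllib.robotparser` and reads back the delay that a crawler identifying as Slurp would see. The robots.txt content and the `/index.php` path are assumptions for illustration, and `Crawl-delay` is a non-standard directive, so this only helps if the crawler actually honors it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: slow Slurp down instead of banning it outright.
# The value 120 is one of the two delays suggested in the post.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 120

User-agent: *
Disallow:
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Delay requested of Slurp, and whether it may still fetch pages.
print(rules.crawl_delay("Slurp"))               # 120
print(rules.can_fetch("Slurp", "/index.php"))   # True
```

Unlike a `Disallow: /` rule or an IP ban, this keeps the site in the index while asking the crawler to space out its requests.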
Since I wiped the server three weeks ago and started with a fresh install of Debian my way instead of my provider's way. Which is why I was surprised when they started hammering us last week - it's like it took them a while to realize that I wasn't blocking them by IP any more. Really annoying was that within a minute of my blocking the 74.6 range, the 69.147 range 'took over'. Quite literally, prior to my blocking 74.6.* there were no 69.147.* logged in. I ban the first, the second one starts hammering almost immediately. I swear, between them leeching five times the bandwidth of my entire user base combined, trying to push that steaming pile of crap framework as good web design, touting standards with the biggest pile of web-rot out there, lack of dynamic fonts in completely absurd baby-sizes making their sites nigh inaccessible unless I zoom in 50%... Somebody put a fork in 'em. If it wasn't for the ignorance of Joe Sixpack I think Yahoo would have faded into obscurity YEARS ago.