Yahoo spider damned near DoS Attack?

Discussion in 'Yahoo' started by deathshadow, Apr 1, 2008.

  1. #1
    This is the second time I've had this happen to one of the forums I maintain (SMF-based) - hundreds (climbing towards thousands) of guest logins from different IP addresses in the 74.6.*.* and 69.147.*.* address ranges... which resolve to guess who?

    OrgName: Inktomi Corporation
    OrgID: INKT
    Address: 701 First Ave
    City: Sunnyvale
    StateProv: CA
    PostalCode: 94089
    Country: US

    NetRange: 74.6.0.0 - 74.6.255.255
    CIDR: 74.6.0.0/16
    NetName: INKTOMI-BLK-6
    NetHandle: NET-74-6-0-0-1
    Parent: NET-74-0-0-0-0
    NetType: Direct Allocation
    NameServer: NS1.YAHOO.COM
    NameServer: NS2.YAHOO.COM
    NameServer: NS3.YAHOO.COM
    NameServer: NS4.YAHOO.COM
    NameServer: NS5.YAHOO.COM
    Comment:
    RegDate: 2006-02-13
    Updated: 2007-03-09

    I ended up having to ban both ranges, as in the past 48 hours those two IP ranges have pulled close to twenty gigs of bandwidth - and the rate was steadily increasing as more and more 'guests' from those ranges targeted my site... and if you watched the "who's online" list you'd see different IPs in those ranges accessing the same threads over and over, as if multiple indexing spiders were going through the site SIMULTANEOUSLY - checking and rechecking the same data over and over again.
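For anyone wanting to check which "guests" fall inside the two ranges, here is a minimal Python sketch. The 74.6.0.0/16 block comes from the whois output above; 69.147.0.0/16 is only an assumption inferred from the "69.147.*.*" pattern in the post, and the function names are made up for illustration.

```python
import ipaddress

# 74.6.0.0/16 is taken from the whois record quoted in the post.
# 69.147.0.0/16 is an ASSUMPTION inferred from the "69.147.*.*" pattern.
SUSPECT_RANGES = [
    ipaddress.ip_network("74.6.0.0/16"),
    ipaddress.ip_network("69.147.0.0/16"),
]


def in_suspect_range(ip_text):
    """Return True if the address falls inside one of the ranges above."""
    ip = ipaddress.ip_address(ip_text)
    return any(ip in net for net in SUSPECT_RANGES)


def count_by_range(ip_texts):
    """Tally how many addresses fall in each range; other addresses are ignored."""
    counts = {str(net): 0 for net in SUSPECT_RANGES}
    for ip_text in ip_texts:
        ip = ipaddress.ip_address(ip_text)
        for net in SUSPECT_RANGES:
            if ip in net:
                counts[str(net)] += 1
    return counts
```

Feeding `count_by_range` the guest IPs from a "who's online" list or an access log would show how much of the guest traffic comes from each block before deciding on a ban.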

    Is this 'normal'?
     
    deathshadow, Apr 1, 2008 IP
  2. deathshadow

    deathshadow Acclaimed Member

    Messages:
    9,732
    Likes Received:
    1,999
    Best Answers:
    253
    Trophy Points:
    515
    #2
    Oh, BTW it seemed to ignore the entry in my robots.txt disallowing SLURP.
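One way to rule out a typo in the robots.txt itself is to run it through Python's standard `urllib.robotparser`. This is only a sketch: the file contents below are hypothetical (the poster's actual robots.txt is not shown), and `example.com` is a placeholder. It shows only what the file says; whether a crawler obeys it is up to the crawler.

```python
from urllib.robotparser import RobotFileParser

# HYPOTHETICAL robots.txt; the real file from the post is not shown.
DISALLOW_ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /

User-agent: *
Disallow:
"""


def allowed(robots_text, user_agent, url):
    """Return True if robots_text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)
```

With the sample file, `allowed(DISALLOW_ROBOTS_TXT, "Slurp", "http://example.com/index.php")` is false while the same call for `"Googlebot"` is true, which is what a correctly written Slurp-only ban should report.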
     
    deathshadow, Apr 2, 2008 IP
  3. benjaminp

    benjaminp Guest

    Messages:
    1,212
    Likes Received:
    16
    Best Answers:
    2
    Trophy Points:
    230
    #3
    I've noticed a sharp increase in the crawl rate of Yahoo's crawler on all my sites, as I imagine everyone has, but there is a point where it just gets beyond a joke, which you have obviously just showcased.
     
    benjaminp, Apr 2, 2008 IP
  4. Germz

    Germz Peon

    Messages:
    1,109
    Likes Received:
    39
    Best Answers:
    0
    Trophy Points:
    0
    #4
    Yeah, I've had a lot of hits from Yahoo spiders as well... it started last week and I've seen a lot of people getting the same problem.
     
    Germz, Apr 2, 2008 IP
  5. benjaminp

    benjaminp Guest

    Messages:
    1,212
    Likes Received:
    16
    Best Answers:
    2
    Trophy Points:
    230
    #5
    Yahoo has started to index my sites better, but they need to do it a lot more efficiently and maybe take a few pointers from Google.
     
    benjaminp, Apr 2, 2008 IP
  6. Loonm

    Loonm Peon

    Messages:
    744
    Likes Received:
    23
    Best Answers:
    0
    Trophy Points:
    0
    #6
    Maybe these entries were visitors from Y! search results, with a spider attached. They do collect info to improve ranking, like how long a visitor stays on the site for a particular keyword and so on.
    This is just my guess; I don't know what this is.
     
    Loonm, Apr 2, 2008 IP
  7. rep-

    rep- Active Member

    Messages:
    83
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    93
    #7
    There are 29 Yahoo spiders on my site right now, and the highest logged so far has been 52 online at the same time. I just checked my site's bandwidth usage stats and I am fine (I have enough bandwidth and my site is pretty new), so I won't ban them for now. If it gets to the point where it is ridiculous and slows down my site, I will consider banning the whole range altogether.
     
    rep-, Apr 2, 2008 IP
  8. kenbrower

    kenbrower Well-Known Member

    Messages:
    574
    Likes Received:
    28
    Best Answers:
    0
    Trophy Points:
    120
    #8
    Yahoo is getting desperate to come up with better search results and they think that adding more spiders will somehow make their algorithm better (it won't). And, hey, if a couple small businesses get lagged out of business, that doesn't concern them ..
     
    kenbrower, Apr 2, 2008 IP
  9. mhmdkhamis

    mhmdkhamis Well-Known Member

    Messages:
    1,097
    Likes Received:
    12
    Best Answers:
    0
    Trophy Points:
    145
    #9
    Yahoo spiders visit my site more than Yahoo search visitors do.
     
    mhmdkhamis, Apr 2, 2008 IP
  10. domainer_10

    domainer_10 Peon

    Messages:
    1,720
    Likes Received:
    24
    Best Answers:
    0
    Trophy Points:
    0
    #10
    Wow, that's a lot of bots. How many visits do you get from Google etc.? What is the PR and backlink count of this site?
     
    domainer_10, Apr 2, 2008 IP
  11. godsofchaos

    godsofchaos Peon

    Messages:
    2,595
    Likes Received:
    124
    Best Answers:
    0
    Trophy Points:
    0
    #11
    Yeah, this is a "hot topic" lately, everyone seems to be getting DoS-ed by Yahoo!, which is odd!
     
    godsofchaos, Apr 2, 2008 IP
  12. deathshadow

    deathshadow Acclaimed Member

    Messages:
    9,732
    Likes Received:
    1,999
    Best Answers:
    253
    Trophy Points:
    515
    #12
    Usually Google is in and out so quick I never even notice them - and since the site comes up #1 for relevant searches, that's fine by me. Usually MSN counts for four logins at once for about an hour a day. "Normally" I see about 20-30 guests total, and of those I'd say half are search engines...

    PR 5 / 51407 backlinks - which is pretty good for a niche site about a board/miniatures game (that's spawned a card game, a dozen video games, a series of over 40 novels, a spinoff miniatures game, and a really crappy cartoon). 'Normal' traffic these days is 6 gigs/20,000 visits/140,000 "pages" a day (+20%/-10% for the peaks) - so when Yahoo up and decides to chew up over double the traffic of everyone else accessing the server - it's fairly noticeable.
     
    deathshadow, Apr 2, 2008 IP
  13. Claudek

    Claudek Well-Known Member

    Messages:
    1,379
    Likes Received:
    81
    Best Answers:
    0
    Trophy Points:
    165
    #13
    There was another user here on DP a few months back who had a very similar experience. He ended up tracking why it happened. Apparently he used YSM and had wiped all his campaigns and started them up again. YSM checks all campaign sites and sent quite a lot of spiders to that webmaster's site as a result.

    Just curious if you used YSM for this site?
     
    Claudek, Apr 2, 2008 IP
  14. ninjashoes

    ninjashoes Well-Known Member

    Messages:
    1,401
    Likes Received:
    35
    Best Answers:
    0
    Trophy Points:
    138
    #14
    Forums get ransacked badly because they have so many pages. I had like 170 on at one time and my forum was getting seriously slowed down, to the point where I was changing what servers my sites were on and uninstalling mods. I guess it prepared me for heavier loads...
     
    ninjashoes, Apr 2, 2008 IP
  15. deathshadow

    deathshadow Acclaimed Member

    Messages:
    9,732
    Likes Received:
    1,999
    Best Answers:
    253
    Trophy Points:
    515
    #15
    Nope - that pay per click rubbish has no place on this type of website. (nor does advertising banners of any sort - we have our own product why on earth advertise other people's rubbish?)

    But then - I consider 99% of the 'marketing' and advertising rubbish on the web a nonsensical scam.
     
    deathshadow, Apr 3, 2008 IP
  16. masterson5

    masterson5 Banned

    Messages:
    24
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #16
    I've been getting the same problems myself. Looks like it's time for dedicated hosting
     
    masterson5, Apr 4, 2008 IP
  17. FightRice

    FightRice Peon

    Messages:
    1,082
    Likes Received:
    28
    Best Answers:
    0
    Trophy Points:
    0
    #17
    How long did you have the robots.txt disallowing Slurp? Most times a change like that can take up to 24 hours for the bots to pick up.

    Also, did you try to just slow the bot down by using a Crawl-delay of 120 or 240 in robots.txt?
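A delay rule like the one suggested here can be sanity-checked with Python's standard `urllib.robotparser` before uploading it. This is a sketch under assumptions: the robots.txt text is hypothetical, 120 is just one of the two values mentioned above, and `Crawl-delay` is a non-standard directive, so how (or whether) a given crawler honors it is up to that crawler.

```python
from urllib.robotparser import RobotFileParser

# HYPOTHETICAL robots.txt using one of the delay values suggested above.
DELAY_ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 120
"""


def crawl_delay_for(robots_text, user_agent):
    """Return the Crawl-delay that applies to user_agent, or None if none is set."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.crawl_delay(user_agent)
```

Here `crawl_delay_for(DELAY_ROBOTS_TXT, "Slurp")` gives back the configured value, and an agent with no matching section gets `None`, which confirms the rule is scoped to Slurp only.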
     
    FightRice, Apr 4, 2008 IP
  18. snowbird

    snowbird Notable Member

    Messages:
    3,036
    Likes Received:
    395
    Best Answers:
    0
    Trophy Points:
    290
    #18
    I have the same problem as well. Yahoo is just eating up bandwidth...
     
    snowbird, Apr 4, 2008 IP
  19. strokemymouse

    strokemymouse Peon

    Messages:
    29
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #19
    I must be lucky :D Yahoo isn't bothering me :D
     
    strokemymouse, Apr 4, 2008 IP
  20. deathshadow

    deathshadow Acclaimed Member

    Messages:
    9,732
    Likes Received:
    1,999
    Best Answers:
    253
    Trophy Points:
    515
    #20
    Since I wiped the server three weeks ago and started with a fresh install of Debian my way instead of my provider's way.

    Which is why I was surprised when they started hammering us last week - it's like it took them a while to realize that I wasn't blocking them by IP any more.

    Really annoying was that within a minute of my blocking the 74.6 range - that's when the 69.147 range 'took over'. Quite literally prior to my blocking 74.6.* there were no 69.147.* logged in. I ban the first, the second one starts hammering almost immediately.

    I swear, between them leeching five times the bandwidth of my entire user base combined, trying to push that steaming pile of crap framework as good web design, touting standards with the biggest pile of web-rot out there, lack of dynamic fonts in completely absurd baby-sizes making their sites nigh inaccessible unless I zoom in 50%...

    Somebody put a fork in 'em. If it wasn't for the ignorance of Joe Sixpack I think Yahoo would have faded into obscurity YEARS ago.
     
    deathshadow, Apr 4, 2008 IP