Excessive spidering of site

Discussion in 'Bing' started by mudnik, Feb 18, 2005.

  1. #1
    I have a big problem. The MSN and Yahoo spiderbots have been spidering a few pages of my site rather excessively. The pages are pretty useless:

    http://www.internetmlm.net

    /Members_List-index-letter-D-sortby-uname-authid-584ee0d621f48c9ab0b4a7d8241daaf5.html

    /Members_List-index-letter-O-sortby-uname-authid-329dd717c93377ceb91190d411e82a0c.html

    The spidering is so bad that my webhost even shut down my entire site for consuming too much CPU. I have since relocated my site to another webhost.

    Top Process %CPU 17.0 [www.internetmlm.net] [/Members_List-index-letter-L-sortby-url-authid-557f90d9d3aa]
    Top Process %CPU 14.0 [www.internetmlm.net] [/Members_List-index-letter-All-sortby-url-authid-c14de1784c]
    Top Process %CPU 12.8 [www.internetmlm.net] [/Members_List-index-letter-X-sortby-url-authid-eda30773091f]

    How do I stop them from accessing these pages?

    Tried disallow members* but it didn't do the trick.
     
    mudnik, Feb 18, 2005 IP
  2. Web Gazelle

    Web Gazelle Well-Known Member

    #2
    Disallow the spiders from going to those pages in your robots.txt file.
     
    Web Gazelle, Feb 18, 2005 IP
  3. honey

    honey Prominent Member

    #3
    Disallowing via robots.txt should work.
     
    honey, Feb 18, 2005 IP
  4. mudnik

    mudnik Peon

    #4
    What's the exact text to use?
    I tried disallow members* but it didn't work. Can I use wildcards?
     
    mudnik, Feb 18, 2005 IP
  5. Web Gazelle

    Web Gazelle Well-Known Member

    #5
    You can also use robots meta tags to keep spiders from indexing pages, though a spider still has to fetch a page to read its tag.
     
    Web Gazelle, Feb 18, 2005 IP
  6. daboss

    daboss Guest

    #6
    Try putting this inside the <head> of your webpage:

    <meta name="robots" content="noindex">

    That should tell search engine robots not to index that particular page.
     
    daboss, Feb 18, 2005 IP
  7. Chrissicom

    Chrissicom Guest

    #7
    In robots.txt:

    User-agent: *
    Disallow: /members/

    Use /members.htm or whatever path applies; each Disallow line takes a single path prefix.
     
    Chrissicom, Feb 19, 2005 IP
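
Under the original robots.txt convention, a Disallow value is matched as a case-sensitive prefix of the URL path, so a rule can cover all the Members_List pages without a wildcard. The sketch below checks such a rule with Python's standard `urllib.robotparser`. The rule `Disallow: /Members_List` is an assumption derived from the URLs quoted in post #1, and `is_blocked` is a hypothetical helper name. It only shows how a parser following the original convention reads the rule, not how any particular spider behaves.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: the prefix is taken from the URLs in post #1.
ROBOTS_TXT = """\
User-agent: *
Disallow: /Members_List
"""

SITE = "http://www.internetmlm.net"


def is_blocked(path, agent="msnbot"):
    """Return True if the rules above forbid `agent` from fetching `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return not parser.can_fetch(agent, SITE + path)


if __name__ == "__main__":
    page = "/Members_List-index-letter-D-sortby-uname-authid-584ee0d621f48c9ab0b4a7d8241daaf5.html"
    print(is_blocked(page))           # member list page
    print(is_blocked("/index.html"))  # any other page
```

Note the leading slash and the capital letters: the prefix has to match the start of the URL path exactly as the server emits it.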
  8. Sirxl

    Sirxl Peon

    #8
    thanks for info, guys
     
    Sirxl, Feb 27, 2005 IP
  9. nfzgrld

    nfzgrld Peon

    #9
    Are those session ID strings there? If so, that could be your problem. Session IDs can sometimes make spiders get stuck, especially if the ID changes on every hit, because each new ID looks like a new URL to crawl. Find a way to turn off the session IDs when the bots hit and this problem might just go away that fast.
     
    nfzgrld, Feb 27, 2005 IP
  10. mudnik

    mudnik Peon

    #10
    How do I turn off session IDs?
     
    mudnik, Mar 1, 2005 IP
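
How to switch off session IDs depends on the software that builds the links, which the thread does not name. As a rough illustration of the idea in post #9 only, the sketch below drops the `-authid-<hex>` segment from a link when the request comes from a known crawler. The helper names (`is_bot`, `link_for`) and the list of user-agent substrings are assumptions for illustration, not part of any real CMS, and the segment pattern is inferred from the URLs in post #1.

```python
import re

# Assumed substrings that appear in crawler User-Agent headers.
BOT_MARKERS = ("msnbot", "slurp", "googlebot")

# Matches the "-authid-<hex>" segment seen in the URLs from post #1.
AUTHID = re.compile(r"-authid-[0-9a-f]+")


def is_bot(user_agent):
    """Return True if the User-Agent header looks like a known crawler."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def link_for(path, user_agent):
    """Return `path` without its authid segment when the visitor is a crawler."""
    return AUTHID.sub("", path) if is_bot(user_agent) else path


if __name__ == "__main__":
    url = "/Members_List-index-letter-D-sortby-uname-authid-584ee0d621f48c9ab0b4a7d8241daaf5.html"
    print(link_for(url, "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"))
    print(link_for(url, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))
```

With the ID removed, a crawler sees one stable URL per page instead of a fresh one on every visit.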
  11. Cyclops

    Cyclops sensei

    #11
    Sorry if this is a little off topic, but is the MSN bot the same as the Yahoo bot?
    In my site's admin stats the Yahoo bot is constantly there under multiple IP addresses, gobbling up heaps of bandwidth. I never see any reference to the MSN bot.

    However, in my cPanel stats Yahoo doesn't show up at all but MSN does.

    The Google bot has been showing up twice a day for the past two months.
     
    Cyclops, Mar 1, 2005 IP
  12. Web Gazelle

    Web Gazelle Well-Known Member

    #12
    Yahoo and MSN are different bots.
     
    Web Gazelle, Mar 1, 2005 IP