Unauthorized bots?

Discussion in 'robots.txt' started by girbaud, Jan 17, 2006.

  1. #1
    Do you know of any unauthorized bots?

    Try looking at this one: Robots blog
     
    girbaud, Jan 17, 2006 IP
  2. minstrel

    minstrel Illustrious Member

    Messages:
    15,082
    Likes Received:
    1,243
    Best Answers:
    0
    Trophy Points:
    480
    #2
    There are loads of them floating around. What is your question exactly?

    I can tell you that Brett Tabke's solution (ban them all) is a bad idea, as he discovered.

    I can also tell you that trying to build a huge robots.txt file in an effort to ban all bad bots is a major waste of time. The bad bots don't even read robots.txt, unless they're trying to get a list of directories and files you'd rather not have anyone see so they can zip in and have a really close look at them.
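
To illustrate the point about robots.txt doubling as a list of things you'd rather hide: the file is public, so any client can read its Disallow lines. Here is a minimal Python sketch of that, using a made-up robots.txt (the paths are hypothetical and not taken from any real site):

```python
# Hypothetical robots.txt content; the paths are invented for illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /private/reports.html  # not for public eyes
Disallow:
"""


def disallowed_paths(robots_txt):
    """Return every non-empty Disallow path, in file order."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        # An empty Disallow value means "nothing is disallowed", so skip it.
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


print(disallowed_paths(SAMPLE_ROBOTS_TXT))
```

Anything listed this way is only a polite request; it does nothing to protect the paths themselves.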
     
    minstrel, Jan 17, 2006 IP
    GRIM likes this.
  3. girbaud

    girbaud Peon

    Messages:
    293
    Likes Received:
    5
    Best Answers:
    0
    Trophy Points:
    0
    #3
    Can you give me an example of a bad bot?
     
    girbaud, Jan 18, 2006 IP
  4. Smyrl

    Smyrl Tomato Republic Staff

    Messages:
    13,740
    Likes Received:
    1,702
    Best Answers:
    78
    Trophy Points:
    510
    #4
    Those that harvest e-mail addresses are among the collection of bad bots.

    Shannon
     
    Smyrl, Jan 18, 2006 IP
  5. minstrel

    minstrel Illustrious Member

    #5
    See bad bots and rogue bots.

    For example:

     
    minstrel, Jan 18, 2006 IP
  6. wrmineo

    wrmineo Peon

    Messages:
    3,087
    Likes Received:
    379
    Best Answers:
    0
    Trophy Points:
    0
    #6
    Awesome resources and information there Minstrel - thanks!

    Ironic reference source there, girbaud. Here's a specific exclusion I run on many of my sites:

    User-agent: WebmasterWorldForumBot
    Disallow: /
     
    wrmineo, Jan 18, 2006 IP
  7. minstrel

    minstrel Illustrious Member

    #7
    That's hilarious!! :D
     
    minstrel, Jan 18, 2006 IP
  8. minstrel

    minstrel Illustrious Member

    #8
    Yes, but keep it in perspective: Most people will never see most of those.
     
    minstrel, Jan 18, 2006 IP
  9. girbaud

    girbaud Peon

    #9
    Thanks a lot for those nice responses.
    minstrel always gives good resources :D

    Thanks a lot people!
     
    girbaud, Jan 19, 2006 IP
  10. wrmineo

    wrmineo Peon

    #10
    Yes, Minstrel always has good answers and resources; he's obviously well organized, and that's the idiosyncratic trait that sets him above others.

    Now ... all ass-kissing and joking aside, I do use some very specific exclusions of what I consider bad bots.

    Here's a lengthy example.
     
    wrmineo, Jan 19, 2006 IP
  11. minstrel

    minstrel Illustrious Member

    #11
    These will block some good bots too, including Googlebot/2.1:

    This blocks a popular dead link checker which I and many other people use to update websites and delete dead links. It generates reports when it's done, and generally, unless it's a very special site, if an error message is generated it is simpler for me to delete the link than to investigate why it's generating an error. Thus, if I had a link to you on my site, after I ran Xenu link checker, you'd lose a backlink, which is probably not what you want.

    Most of the rest are bots that are almost certainly going to ignore the robots.txt file anyway. For that reason, I always advise AGAINST this type of robots.txt file -- you are not going to stop the bad bots and you may well inadvertently block some good ones.
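
One way to check for this kind of collateral damage is to run a robots.txt through Python's standard `urllib.robotparser` and see which well-behaved agents it turns away. This is only a sketch: the robots.txt below is a hypothetical stand-in (not the file quoted in this thread), and the `Xenu` and `EmailCollector` tokens are assumptions chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; NOT the file discussed in this thread.
SAMPLE_ROBOTS_TXT = """\
User-agent: Xenu
Disallow: /

User-agent: EmailCollector
Disallow: /
"""

# Agents you would normally want to let in (names are illustrative).
GOOD_AGENTS = ["Googlebot", "msnbot", "Xenu Link Sleuth"]


def blocked_agents(robots_txt, agents, url="/"):
    """Return the agents that this robots.txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


print(blocked_agents(SAMPLE_ROBOTS_TXT, GOOD_AGENTS))
```

Note that this only tells you what a bot that honours robots.txt would do; a bot that ignores the file is unaffected either way.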
     
    minstrel, Jan 19, 2006 IP
  12. wrmineo

    wrmineo Peon

    #12
    More phenomenal information, Minstrel, thanks!

    This is a lengthy example just for smart minds like yours to pick apart and advise on :)

    This robot file is actually from a client site that had some specific concerns and issues so we "appeased" them with an eye candy robots file to subdue some fears.

    Luckily it hasn't stopped the site from getting good results thus far.

    Googlebot 138+37 1.01 MB 19 Jan 2006 - 01:31 
    MSNBot 125+41 1.69 MB 19 Jan 2006 - 02:48 
    Inktomi Slurp 79+72 594.98 KB 19 Jan 2006 - 01:52 
    Unknown robot (identified by 'crawl') 32+40 402.34 KB 17 Jan 2006 - 12:58 
    Unknown robot (identified by hit on 'robots.txt') 0+53 244.14 KB 19 Jan 2006 - 01:28 
    Unknown robot (identified by 'spider') 10+5 138.46 KB 19 Jan 2006 - 01:05 
    Harvest 9+1 102.24 KB 07 Jan 2006 - 20:07 
    AskJeeves 2+2 34.97 KB 14 Jan 2006 - 05:30 
    Alexa (IA Archiver) 2+2 34.97 KB 16 Jan 2006 - 23:00 
    Walhello appie 1+1 14.68 KB 18 Jan 2006 - 15:22 
    Unknown robot (identified by 'robot') 1+1 4.61 KB 19 Jan 2006 - 02:41 
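
For anyone wanting to work with a listing like this programmatically, here is a rough Python sketch that splits each line into fields. It assumes the report follows the AWStats-style convention where `138+37` means 138 ordinary hits plus 37 hits on robots.txt; that reading is an assumption, as are the field names.

```python
import re

# Three lines copied from the listing above.
SAMPLE = """\
Googlebot 138+37 1.01 MB 19 Jan 2006 - 01:31
Unknown robot (identified by hit on 'robots.txt') 0+53 244.14 KB 19 Jan 2006 - 01:28
Harvest 9+1 102.24 KB 07 Jan 2006 - 20:07
"""

# name, "hits+robots.txt hits" (assumed meaning), bandwidth, last visit.
LINE_RE = re.compile(
    r"^(?P<name>.+?)\s+(?P<hits>\d+)\+(?P<robots_hits>\d+)\s+"
    r"(?P<size>[\d.]+)\s+(?P<unit>KB|MB|GB)\s+(?P<last_visit>.+?)\s*$"
)


def parse_line(line):
    """Parse one report line into a dict, or return None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return {
        "name": match.group("name"),
        "hits": int(match.group("hits")),
        "robots_txt_hits": int(match.group("robots_hits")),
        "bandwidth": f"{match.group('size')} {match.group('unit')}",
        "last_visit": match.group("last_visit"),
    }


rows = [parse_line(line) for line in SAMPLE.splitlines()]
for row in rows:
    print(row)
```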
    
    The real concern was spam and DoS attacks, as it's a political campaign website, but I will take a further look at the information you provided and reinitiate the conversation with the candidate.

    Whether they are missing potential traffic would be the primary concern, and the point that might make them reconsider.

    Thanks.
     
    wrmineo, Jan 19, 2006 IP
  13. blue_angel

    blue_angel Well-Known Member

    Messages:
    1,174
    Likes Received:
    8
    Best Answers:
    0
    Trophy Points:
    130
    #13
    Search for "Unknown robot" in a search engine; there are many articles related to that.
     
    blue_angel, Jun 8, 2009 IP
  14. linkdealer

    linkdealer Active Member

    Messages:
    138
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    90
    #14
    You can check a few at seocrazy.blogspot.com/2008/04/spammy-robots-list.html
     
    linkdealer, Jun 8, 2009 IP