Dealing with bad bots...

Discussion in 'Site & Server Administration' started by SERPalert, Oct 4, 2005.

  1. #1
    I've built some of my own stats software.

    My site is quite niche; no visitor would ever, ever, ever come to my site every day for a whole month.

    I'm seeing some odd behavior.

    Take last month as an example.

    65.36.241.75 - 30 days, 707 page hits total. Only ever hit my index page.
    wfp2.almaden.ibm.com - 30 days, 30 hits total. Only ever hit my index page.
    66.155.231.209 - 22 days, 44 hits total. Only ever hit my index page.
    66-194-6-84.gen.twtelecom.net - 16 days, 17 hits total. Only ever hit my index page.
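A per-IP summary like the one above (days seen, total hits, which pages were requested) takes only a few lines. This is a minimal Python sketch, not the poster's actual stats software. It assumes the log has already been parsed into (ip, timestamp, path) tuples, and `summarize` and the sample records (documentation-range IPs) are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def summarize(records):
    """Summarize (ip, timestamp, path) records per IP.

    Returns {ip: (distinct_days, total_hits, sorted_paths)}.
    """
    days = defaultdict(set)   # calendar dates on which each IP was seen
    hits = defaultdict(int)   # total requests per IP
    paths = defaultdict(set)  # distinct pages requested per IP
    for ip, when, path in records:
        days[ip].add(when.date())
        hits[ip] += 1
        paths[ip].add(path)
    return {ip: (len(days[ip]), hits[ip], sorted(paths[ip])) for ip in hits}


if __name__ == "__main__":
    # Hypothetical sample data using documentation-range addresses.
    sample = [
        ("203.0.113.5", datetime(2005, 9, 1, 0, 51), "/"),
        ("203.0.113.5", datetime(2005, 9, 1, 12, 0), "/"),
        ("203.0.113.5", datetime(2005, 9, 2, 0, 51), "/"),
        ("198.51.100.7", datetime(2005, 9, 1, 9, 30), "/about.html"),
    ]
    for ip, (n_days, n_hits, seen) in summarize(sample).items():
        print(f"{ip} - {n_days} days, {n_hits} hits, only index: {seen == ['/']}")
```

An IP whose path set is just `["/"]` is the "only ever hit my index page" case.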

    All the above IPs are in the US; I'm in the UK.

    Who are they, and why are they coming to my site? Why should I let them continue if they don't serve a purpose? I assume they're wasting bandwidth.

    Any advice or suggestions?
     
    SERPalert, Oct 4, 2005
  2. mcfox

    #2
    I can't say for sure, but they look like spoofed IPs to me, for example the subdomain wfp2.almaden.ibm.com.

    I don't know the purpose. Perhaps someone else can shed some light?
     
    mcfox, Oct 4, 2005
  3. minstrel

    #3
    The Almaden one is a legit bot that indexes information for businesses and intranets.

    The others are ISPs, I think - not bots but visitors? Several different visitors?

    65.36.241.75
    66.155.231.209
    66.194.6.84
     
    minstrel, Oct 4, 2005
  4. SERPalert

    #4
    Seems odd they'd only visit my index page though, and not navigate anywhere else.
     
    SERPalert, Oct 4, 2005
  5. minstrel

    #5
    Maybe not what they were looking for?

    My girlfriend uses AOL. When she starts it up, it loads a portal page that features certain hot news items and sites. It's possible (I'm guessing here, because I don't use those ISPs, if that's what they are) that subscribers are clicking on a link, getting to the home page, and deciding it's not what they're interested in?
     
    minstrel, Oct 4, 2005
  6. SERPalert

    #6
    Idea! I'll log the user agent too. Never thought of that, duh.
     
    SERPalert, Oct 4, 2005
  7. SERPalert

    #7
    OK, this is perhaps shedding more light(?):

    65.36.241.75 - - [04/Oct/2005:00:51:05 +0100] "HEAD / HTTP/1.1" 200 0 "-" "InternetSeer.com"

    66.155.231.209 - - [04/Oct/2005:04:54:01 +0100] "GET /robots.txt HTTP/1.1" 302 209 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; MSIECrawler)"
    66.155.231.209 - - [04/Oct/2005:04:54:02 +0100] "GET / HTTP/1.1" 200 21047 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; MSIECrawler)"
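Those lines are in Apache's "combined" log format: host, ident, user, [time], "request", status, bytes, "referer", "user agent". A minimal Python parser for that format might look like this; the regex and the `parse_line` name are illustrative assumptions, and a line in any other format simply returns None.

```python
import re
from datetime import datetime

# Apache "combined" format: host ident user [time] "method path proto" status bytes "referer" "agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Parse one combined-format log line into a dict, or return None if it doesn't match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # e.g. "04/Oct/2005:00:51:05 +0100" -> timezone-aware datetime (English month names assumed)
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])  # "-" means no body sent
    return rec
```

Grouping the parsed records by `host` and collecting `agent` values is enough to see which user agents each IP presents.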

    Can't find the last one in my logs...

    InternetSeer? MSIECrawler?

    Smells dodgy...

    <edit>
    Wow, MSIECrawler is what shows up when someone adds your site to their favourites. Awesome!
    http://www.webmasterworld.com/forum11/2360.htm
     
    SERPalert, Oct 4, 2005
  8. minstrel

    #8
    There you go! :D
     
    minstrel, Oct 4, 2005
  9. minstrel

    #9
    InternetSeer is a website uptime/downtime checker; you must have subscribed to its free service at some point.
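Once a few user-agent tokens have been identified, as InternetSeer and MSIECrawler were here, labelling them can be automated. A hypothetical Python sketch: the `KNOWN_AGENTS` table and `classify` function are made up for illustration, and because the User-Agent header is self-reported, a match is only a hint.

```python
# Hypothetical lookup table: User-Agent substrings identified in this thread.
KNOWN_AGENTS = {
    "internetseer": "InternetSeer uptime/downtime checker",
    "msiecrawler": "IE favourites crawler",
}


def classify(agent):
    """Label a User-Agent string by case-insensitive substring match; 'unknown' if nothing matches.

    The header is supplied by the client, so anyone can send any of these strings.
    """
    lowered = agent.lower()
    for token, label in KNOWN_AGENTS.items():
        if token in lowered:
            return label
    return "unknown"
```

A plain MSIE string with no extra token comes back as "unknown", which is the case that needs the IP-level investigation.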
     
    minstrel, Oct 4, 2005
  11. cgo85

    #11
    Will this hurt my indexing if I put this in my robots.txt:

    User-agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)
    Disallow: /

    That bot is killing my bandwidth.
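One way to sanity-check a robots.txt record is Python's standard `urllib.robotparser`. In this sketch the `allowed` helper is hypothetical, and Python's matching is only one implementation (each search engine's crawler applies its own). A record keyed on the full MSIE string leaves Googlebot and msnbot allowed. Python's parser compares only the product token before the first "/", so that record does not even match the MSIE string itself, while a record keyed on a crawler token such as `msnbot` does take effect. Either way, a client that never requests robots.txt is unaffected.

```python
from urllib.robotparser import RobotFileParser

# The record from the post above, keyed on a full browser User-Agent string.
FULL_UA_RULE = """\
User-agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)
Disallow: /
"""

# For comparison: a record keyed on a crawler's product token.
TOKEN_RULE = """\
User-agent: msnbot
Disallow: /
"""

IE_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"


def allowed(robots_txt, agent, url="http://www.example.com/"):
    """Ask Python's robots.txt parser whether `agent` may fetch `url` under `robots_txt`."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "msnbot", IE_UA):
        print(f"{agent}: full-UA rule -> {allowed(FULL_UA_RULE, agent)}, "
              f"token rule -> {allowed(TOKEN_RULE, agent)}")
```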
     
    cgo85, Nov 1, 2005
  12. minstrel

    #12
    What bot is that?
     
    minstrel, Nov 1, 2005
  13. cgo85

    #13
    I don't know... all I know is that this agent, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)", has been killing my site's bandwidth for the last couple of days. Is there any way to block it without blocking visitors or the MSN, Google, or Yahoo! bots? I'm fairly new to robots.txt issues, so bear with me.
     
    cgo85, Nov 1, 2005
  14. minstrel

    #14
    But how do you know it's a bot, or even a single "visitor"?

    It may be several human visitors using a version of MSIE, no?
     
    minstrel, Nov 1, 2005
  15. cgo85

    #15
    Because it's all coming from one host: 68.58.242.24

    Should I just block that IP? If so, how would I do it?
     
    cgo85, Nov 1, 2005
  16. minstrel

    #16
    http://www.whois.sc/68.58.242.24

    It's an ISP, not a bot.

    What kind of numbers are you getting from there and what files are being requested?

    If the number of hits is extraordinary, you might want to contact the ISP and ask them what's going on with one or more of their subscribers.
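To put a number on "extraordinary", one option is to flag any address that makes more than a chosen number of requests within a sliding time window. A minimal Python sketch, assuming (ip, timestamp) pairs from parsed logs; the `heavy_hitters` name and whatever `limit` you choose are assumptions, not a recommended threshold.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def heavy_hitters(records, limit, window=timedelta(hours=1)):
    """Return the set of IPs that made more than `limit` requests inside any `window`.

    `records` is an iterable of (ip, timestamp) pairs in any order.
    """
    recent = defaultdict(deque)  # per-IP timestamps still inside the window
    flagged = set()
    for ip, when in sorted(records, key=lambda r: r[1]):
        q = recent[ip]
        q.append(when)
        # Drop timestamps that have slid out of the window.
        while when - q[0] > window:
            q.popleft()
        if len(q) > limit:
            flagged.add(ip)
    return flagged
```

Keeping one deque per IP means each timestamp is appended and popped at most once, so the scan stays linear after the sort.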
     
    minstrel, Nov 1, 2005
  17. cgo85

    #17
    They've requested thousands of basic content pages today... Should I block them? Is that robots.txt rule I posted above valid, or will it hurt me? Thanks for your help.
     
    cgo85, Nov 1, 2005
  18. minstrel

    #18
    Are you on a Linux/Unix server running Apache? Can you create or edit an .htaccess file?

    If so, add these lines to the top of the .htaccess file:

    <Limit GET POST>
    order allow,deny
    deny from 68.58.242.24
    allow from all
    </Limit>
    But I'd still suggest you contact Comcast and alert them that something is going on with traffic from their IP address.
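The deny line above works at the Apache level. On a site served by a Python WSGI application instead, the same idea can be sketched as middleware. `BlockIPs` and `hello_app` are hypothetical names, and behind a reverse proxy `REMOTE_ADDR` holds the proxy's address, so the check would need adjusting there.

```python
# Hypothetical blocklist; the address is the one discussed in this thread.
BLOCKED = {"68.58.242.24"}


class BlockIPs:
    """WSGI middleware that answers 403 Forbidden to listed client addresses."""

    def __init__(self, app, blocked):
        self.app = app
        self.blocked = frozenset(blocked)

    def __call__(self, environ, start_response):
        # REMOTE_ADDR is the direct peer; behind a proxy it is the proxy's address.
        if environ.get("REMOTE_ADDR") in self.blocked:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return self.app(environ, start_response)


def hello_app(environ, start_response):
    """Stand-in application for the example."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


app = BlockIPs(hello_app, BLOCKED)
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves `app` on port 8000.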
     
    minstrel, Nov 1, 2005
  19. frankm

    #19
    If they're killing your bandwidth, they probably won't obey robots.txt. Just reject the IP address; that's what I do :) No single user will pull thousands of pages from my site in a couple of hours, and if it can't identify itself as a nice robot (Googlebot et al.), I ignore that IP address.
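Checking whether a visitor really is "a nice robot" can't rely on the user agent alone, since that header is trivially faked. Google documents a forward-confirmed reverse DNS check for Googlebot: reverse-resolve the IP, check the hostname's domain, then resolve that hostname forward and confirm it maps back to the same IP. A Python sketch, assuming the `.googlebot.com`/`.google.com` suffixes (check each search engine's current documentation for its domains); `is_verified_crawler` is a hypothetical name, and the resolver parameters exist only so it can be tested without network access.

```python
import socket


def is_verified_crawler(ip, suffixes=(".googlebot.com", ".google.com"),
                        reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a self-declared crawler.

    1. Reverse-resolve the IP to a hostname.
    2. Require the hostname to end in one of the operator's domains.
    3. Resolve that hostname forward and require the original IP among the results.
    """
    try:
        host = reverse(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    if not host.lower().endswith(suffixes):
        return False
    try:
        addresses = forward(host)[2]  # (hostname, aliases, ip_list)
    except OSError:
        return False
    return ip in addresses
```

An ISP address like the one in this thread fails at step 2, because its reverse DNS name belongs to the ISP, not to a search engine.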
     
    frankm, Nov 1, 2005
  20. cgo85

    #20
    Yeah... I have an .htaccess file in use right now because I have mod_rewrite going. How can I "reject"/block the IP address?



    This is what I've been dealing with:

    Every request below has Date: Nov 01, Http Code: 200, Http Version: HTTP/1.1, Referer: -, and
    Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)

    Time      Size in Bytes  Page
    20:44:17  23387          /directory-forclosure-AR.html
    20:44:19  23282          /directory-forclosure-AK.html
    20:44:20  26761          /directory-apartment-VA.html
    20:44:21  26649          /directory-apartment-UT.html
    20:44:22  23073          /directory-apartment-TX.html
    20:44:25  23179          /directory-apartment-TN.html
    20:44:25  29412          /city-carloan-IL-Waukegan.html
    20:44:27  23309          /directory-apartment-SC.html
    20:44:29  26859          /directory-apartment-RI.html



    But there are thousands of them just today!
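The timestamps in that excerpt already show the pace: nine requests between 20:44:17 and 20:44:29 span 12 seconds, or one request every 1.5 seconds on average. Sustained, that would be 2,400 pages an hour, which fits "thousands of them just today". A small Python sketch of that arithmetic, using the times and sizes from the log excerpt; `rate_summary` is a hypothetical helper that assumes the entries are in time order.

```python
from datetime import datetime

# (time, bytes) pairs from the Nov 01 log excerpt.
HITS = [
    ("20:44:17", 23387), ("20:44:19", 23282), ("20:44:20", 26761),
    ("20:44:21", 26649), ("20:44:22", 23073), ("20:44:25", 23179),
    ("20:44:25", 29412), ("20:44:27", 23309), ("20:44:29", 26859),
]


def rate_summary(hits):
    """Return (mean seconds between requests, total bytes served); hits must be time-ordered."""
    times = [datetime.strptime(t, "%H:%M:%S") for t, _ in hits]
    span = (times[-1] - times[0]).total_seconds()
    mean_gap = span / (len(hits) - 1)  # n requests have n - 1 gaps
    total_bytes = sum(size for _, size in hits)
    return mean_gap, total_bytes


if __name__ == "__main__":
    gap, total = rate_summary(HITS)
    print(f"one request every {gap:.1f} s, about {3600 / gap:.0f} pages/hour if sustained")
    print(f"{total} bytes for {len(HITS)} pages")
```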
     
    cgo85, Nov 1, 2005