Block all the bad bots from your site!!!

Discussion in 'Apache' started by ketan9, Oct 31, 2007.

  1. #1
    If you would like to block all the unwanted user agents that scrape your site, or the bots that you don't want accessing your site, add the following statements to your httpd.conf file. Below I have added a big list of user agents that I deny access; you may want to edit the list depending on your requirements.
    Hope this helps.
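    As a rough illustration of how a User-Agent denylist works, here is a minimal Python sketch that checks the User-Agent header against case-insensitive substrings. The pattern list and the function names are hypothetical examples, not the list from this post. Because the header is supplied by the client, a scraper can get past this kind of check by sending an ordinary browser user agent.

```python
import re
from typing import Iterable, Optional

# Hypothetical example patterns for illustration; edit to suit your own site.
BLOCKED_AGENT_PATTERNS = ["HTTrack", "WebZIP", "EmailSiphon", "Wget"]


def compile_denylist(patterns: Iterable[str]) -> "re.Pattern[str]":
    """Build one case-insensitive regex that matches any pattern as a literal substring."""
    escaped = [re.escape(p) for p in patterns if p]
    if not escaped:
        return re.compile(r"(?!x)x")  # matches nothing when the list is empty
    return re.compile("|".join(escaped), re.IGNORECASE)


DENYLIST = compile_denylist(BLOCKED_AGENT_PATTERNS)


def is_blocked(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent header contains a denylisted substring."""
    if not user_agent:
        return False  # policy choice: requests without a User-Agent are let through
    return DENYLIST.search(user_agent) is not None


print(is_blocked("Mozilla/4.0 (compatible; HTTrack 3.0x; Windows 98)"))  # True
print(is_blocked("Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"))  # False
```

    Combining the patterns into one compiled regex keeps the per-request cost to a single search, but it is still work done on every request.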
     
    ketan9, Oct 31, 2007
  2. KalvinB

    #2
    If I'm going to use a bot to scrape a site, I set its user agent to a valid Internet Explorer user agent string. Lists like this don't stop much of anything; they just add extra processing time to every request on your web server.

    If someone is scraping your site, you're better off blocking their IP at the network level. I used to do that with my Windows 2000 server: I just routed the offending IPs to never-never land using Windows' built-in functions for that sort of thing. It's a lot more efficient than making Apache do it. Linux can also block IPs at the network level, for example with iptables.

    You could even have your site keep track of which IPs are downloading what and automatically block an IP if you felt so inclined.
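    One way to do that automatic blocking is a per-IP sliding window: remember the timestamps of each IP's recent requests and block the IP once it exceeds a threshold. The minimal Python sketch below uses hypothetical names and thresholds, takes timestamps as arguments so the logic is easy to test, and only records the block; actually dropping the traffic (a firewall rule or null route) is left to the caller.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Set


class SlidingWindowBlocker:
    """Block an IP that makes more than max_requests within window_seconds (hypothetical helper)."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.history: Dict[str, Deque[float]] = defaultdict(deque)
        self.blocked: Set[str] = set()

    def record(self, ip: str, now: float) -> bool:
        """Log one request from ip at time now; return True if ip is blocked."""
        if ip in self.blocked:
            return True
        times = self.history[ip]
        times.append(now)
        # Drop timestamps that have fallen out of the window.
        while times and times[0] <= now - self.window_seconds:
            times.popleft()
        if len(times) > self.max_requests:
            self.blocked.add(ip)
            del self.history[ip]  # no need to keep counting a blocked IP
            return True
        return False


blocker = SlidingWindowBlocker(max_requests=3, window_seconds=10.0)
for t in (0.0, 1.0, 2.0, 3.0):
    # prints False three times, then True on the fourth request inside 10 seconds
    print(blocker.record("203.0.113.5", now=t))
```

    A deque per IP keeps trimming cheap, since old timestamps always leave from the front.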
     
    KalvinB, Oct 31, 2007
  3. Pixelrage

    #3
    I think this can be done in robots.txt as well; I saw a robots.txt generator that did something along those lines.
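    For reference, robots.txt can single out a bot by name with its own User-agent group. Python's standard urllib.robotparser module shows how a compliant crawler interprets such a file; the file contents and bot names below are examples only. Keep in mind that robots.txt is advisory: well-behaved crawlers honor it, but a scraper can simply ignore it.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: shut out one named bot entirely, keep everyone else out of /private/.
ROBOTS_TXT = """\
User-agent: HTTrack
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("HTTrack", "/index.html"))        # False
print(parser.can_fetch("Googlebot", "/index.html"))      # True
print(parser.can_fetch("Googlebot", "/private/a.html"))  # False
```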
     
    Pixelrage, Oct 31, 2007
  4. ketan9

    #4
    I agree with you, although finding the IPs and blocking them manually takes a lot of time and effort. I am looking for a way to do it automatically: if someone consumes too much bandwidth, stop them from using the site altogether. I couldn't find a better way to do it, so let me know if you have a better approach.
     
    ketan9, Oct 31, 2007
  5. KalvinB

    #5
    I believe Apache 2 has bandwidth throttling.
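    Bandwidth throttling is commonly built on a token bucket: a per-client budget of bytes that refills at a fixed rate and allows short bursts up to a cap. The Python sketch below is a generic token bucket with hypothetical rate and capacity values; it is not Apache's implementation.

```python
class TokenBucket:
    """Allow bursts up to `capacity` bytes, refilling at `rate` bytes per second."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.last = now

    def _refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)

    def try_consume(self, nbytes: float, now: float) -> bool:
        """Spend nbytes of budget if available; False means delay or reject the send."""
        self._refill(now)
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


bucket = TokenBucket(rate=1000.0, capacity=5000.0)
print(bucket.try_consume(5000, now=0.0))  # True: burst uses the whole bucket
print(bucket.try_consume(1, now=0.0))     # False: nothing left until it refills
```

    A throttling server would typically keep one bucket per client and pause sending while `try_consume` returns False, rather than dropping the connection.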

    I also design my sites so that everything except images and JS goes through index.php, even downloads. That way, if I feel the need, I can log per-IP usage and automatically issue the Windows command to reroute an IP if it uses more bandwidth per day than allowed.
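    The per-IP daily accounting described above could look roughly like the Python sketch below. The class name, the quota value, and the on_block hook are hypothetical; the hook is where something like the Windows rerouting command would be issued, which is not shown here.

```python
from datetime import date
from typing import Callable, Dict, Set, Tuple


class DailyBandwidthQuota:
    """Track bytes served per IP per day; call on_block once when an IP exceeds the quota."""

    def __init__(self, max_bytes_per_day: int, on_block: Callable[[str], None]) -> None:
        self.max_bytes_per_day = max_bytes_per_day
        self.on_block = on_block  # hypothetical hook, e.g. add a null route or firewall rule
        self.usage: Dict[Tuple[str, date], int] = {}
        self.blocked: Set[str] = set()

    def log_transfer(self, ip: str, nbytes: int, day: date) -> bool:
        """Add nbytes to ip's total for day; return True if ip is blocked."""
        if ip in self.blocked:
            return True
        key = (ip, day)
        self.usage[key] = self.usage.get(key, 0) + nbytes
        if self.usage[key] > self.max_bytes_per_day:
            self.blocked.add(ip)
            self.on_block(ip)  # called only the first time the quota is exceeded
            return True
        return False


blocked_ips = []
quota = DailyBandwidthQuota(max_bytes_per_day=1_000_000, on_block=blocked_ips.append)
quota.log_transfer("203.0.113.9", 600_000, date(2007, 11, 1))
quota.log_transfer("203.0.113.9", 500_000, date(2007, 11, 1))
print(blocked_ips)  # ['203.0.113.9']
```

    In this sketch a block is permanent and old days' totals are never discarded; a real setup would probably expire blocks and prune past dates.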
     
    KalvinB, Nov 1, 2007