Do You Block Spiders?

Discussion in 'All Other Search Engines' started by dscurlock, Apr 2, 2016.

  1. #1
    Do you block certain spiders?
    If so, why?

    I have been getting hits like:

    Hostname: spider-199-21-99-199.yandex.com

    Hostname: ec2-52-4-176-40.compute-1.amazonaws.com

    Hostname: static.227.10.9.176.clients.your-server.de

    Montego Bay, Jamaica (multiple hits)
    Hostname: 207.204.122.220

    (France)
    Hostname: 195-154-240-246.rev.poneytelecom.eu

    (More foreign than US, and I have no
    real need for foreign traffic, as my target would be the US...)
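
One way to tell which of the hostnames above belong to a recognized search engine is to compare the reverse-DNS name against the domains the engines use for their crawlers, then confirm it with a forward lookup. The sketch below is only an illustration: the suffix table is a small, non-exhaustive assumption, and the function names `classify_hostname` and `verify_crawler_ip` are made up for this example.

```python
import socket

# Illustrative, non-exhaustive table of crawler hostname suffixes (assumption).
KNOWN_CRAWLER_SUFFIXES = {
    ".googlebot.com": "Google",
    ".google.com": "Google",
    ".search.msn.com": "Bing",
    ".yandex.com": "Yandex",
    ".yandex.ru": "Yandex",
    ".yandex.net": "Yandex",
}


def classify_hostname(hostname):
    """Return the search engine name for a reverse-DNS hostname, or None."""
    host = hostname.lower().rstrip(".")
    for suffix, engine in KNOWN_CRAWLER_SUFFIXES.items():
        if host.endswith(suffix):
            return engine
    return None


def verify_crawler_ip(ip):
    """Forward-confirmed reverse DNS check. Needs network access."""
    try:
        # Reverse lookup: IP address -> hostname.
        hostname = socket.gethostbyaddr(ip)[0]
        engine = classify_hostname(hostname)
        if engine is None:
            return None
        # Forward lookup: the hostname must resolve back to the same IP,
        # otherwise the reverse record could have been faked.
        forward_ips = socket.gethostbyname_ex(hostname)[2]
    except (socket.herror, socket.gaierror):
        return None
    return engine if ip in forward_ips else None
```

With this table, `classify_hostname("spider-199-21-99-199.yandex.com")` reports Yandex, while the amazonaws.com and your-server.de names return `None`, which only means they are not in the table, not that they are hostile.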
     
    dscurlock, Apr 2, 2016 IP
  2. billzo

    billzo Well-Known Member

    Messages:
    961
    Likes Received:
    278
    Best Answers:
    15
    Trophy Points:
    113
    #2
    I block bots that are not recognized search engines because I neither need nor want them consuming server resources. Why should any webmaster pay to have bots crawl their site when they get nothing out of it? No revenue, no search engine traffic, nothing.
     
    billzo, Apr 2, 2016 IP
  3. dscurlock

    dscurlock Prominent Member

    Messages:
    4,564
    Likes Received:
    260
    Best Answers:
    0
    Trophy Points:
    300
    #3
    I can understand that. However, I know the common bots like Google, MSN, Yahoo...
    What I do not know is what the other bots do, such as majestic12.co.uk (Germany),
    which was the last bot to hit the site. I guess I need to find a list of bots to ban...
    Otherwise, I could end up banning a bot that could be useful, or maybe not...
    I just don't know...

    I also noticed I get some traffic from Flipboard. I had never heard of it until
    I saw a bot that had flipboard in the title. Apparently it is similar to Pinterest:
    they spider your site for content snippets, and I am not sure if this is good
    or bad, since this is how content can spread. It sends a few visitors...
     
    Last edited: Apr 2, 2016
    dscurlock, Apr 2, 2016 IP
  4. Bitpalace

    Bitpalace Greenhorn

    Messages:
    53
    Likes Received:
    4
    Best Answers:
    0
    Trophy Points:
    13
    #4
    We block spiders on some of our own websites, but not on customer websites.

    Inserting this code into .htaccess, or better, directly into the VirtualHost configuration of your website, will block the useless bots and spiders listed in it, which only consume your traffic and slow down your website:

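    # Enable mod_rewrite (needed unless it is already switched on elsewhere)
    RewriteEngine On
    # RewriteBase is only valid in .htaccess or <Directory> context; omit it in a VirtualHost
    RewriteBase /

    # Answer 403 Forbidden when the User-Agent contains any of these names.
    # [NC] makes the match case-insensitive, so MJ12 also covers MJ12bot and MJ12Bot,
    # and Ahrefs also covers AhrefsBot.
    RewriteCond %{HTTP_USER_AGENT} (Ahrefs|spbot|DigExt|Sogou) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} (MegaIndex\.ru|majestic12|80legs|SISTRIX|HTTrack|Semrush) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} (MJ12|Ezooms|CCBot|TalkTalk) [NC]
    RewriteRule .* - [F]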
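
Before putting rules like these live, it can help to check which user agents they would actually catch. The following is a rough Python sketch of the same case-insensitive substring match, not something taken from Apache itself; `BLOCKED_AGENTS` and `is_blocked` are names invented for this example, and the dot in MegaIndex.ru is escaped so it only matches a literal dot.

```python
import re

# Same bot names as the RewriteCond lines, joined into one pattern.
# re.IGNORECASE plays the role of the [NC] flag.
BLOCKED_AGENTS = re.compile(
    r"AhrefsBot|spbot|DigExt|Sogou|MegaIndex\.ru|majestic12|80legs|SISTRIX|"
    r"HTTrack|Semrush|MJ12|Ezooms|CCBot|TalkTalk|Ahrefs",
    re.IGNORECASE,
)


def is_blocked(user_agent):
    """Return True if the rules would answer this user agent with 403."""
    return BLOCKED_AGENTS.search(user_agent) is not None


MJ12_UA = "Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)"
GOOGLE_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
YANDEX_UA = "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"

for agent in (MJ12_UA, GOOGLE_UA, YANDEX_UA):
    print("blocked" if is_blocked(agent) else "allowed", agent)
```

Because the match is a plain substring test, any browser or tool whose user agent happens to contain one of these names is refused as well, so it is worth running a sample of real log entries through a check like this first.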
     
    Bitpalace, Apr 3, 2016 IP
  5. dscurlock

    #5
    Certainly there are more, right?
    Unless these are the biggest bandwidth killers...
    I noticed several MJ spiders on earlier...

    Google can have sex with my site all they want...
    Other than Yahoo, MSN, etc., I do not think there are
    many more that are actually very useful, to be honest...
     
    dscurlock, Apr 3, 2016 IP
  6. Ditmar

    Ditmar Well-Known Member

    Messages:
    162
    Likes Received:
    6
    Best Answers:
    0
    Trophy Points:
    105
    #6
    That one is Yandex, a Russian search engine.
     
    Ditmar, Apr 3, 2016 IP
  7. dscurlock

    #7
    I saw 5+ on earlier, so I guess I should ban them...
    I do not see how Russian traffic would apply to a US market.
    That would be similar to me buying something in Russia
    through a search rather than buying it on a site in the US.
     
    dscurlock, Apr 3, 2016 IP
  8. sarahk

    sarahk iTamer Staff

    Messages:
    28,500
    Likes Received:
    4,460
    Best Answers:
    123
    Trophy Points:
    665
    #8
    Many years ago I had a site that tracked spiders and I'd research them and publish my findings. It quickly got out of hand and the project was abandoned.

    What I discovered (back then):
    • Large companies had their own internal search engines
    • Universities had experimental search engines
    • Google had bots crawling from a seemingly unlimited number of IPs
    • There were lots of countries who had their own search engines in their own language
    • Not all bots requested or complied with robots.txt
    • Not all bots had an info url in their useragent
    • None of the bots flooded my shared servers
    • I could waste a lifetime trying to keep track of all the bots and I'd be no further ahead
    If you've got so little bandwidth allocated to you that bots stealing it becomes a problem, then it's time to upgrade your hosting. On a cost-benefit analysis, the cost of the hosting will be far less than the value of your time wasted blocking bots. If you can't afford the hosting, then you need to review your business plan and see if the business is worth continuing with in the first place.
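
For the bots that do request and honor robots.txt, a few lines in that file are the cheapest block available. As a hedged illustration, the sketch below uses Python's standard `urllib.robotparser` to show how a compliant crawler would read such a file. The robots.txt content and the helper name `allowed` are assumptions made up for this example, and, as noted in the list above, nothing forces a bot to obey it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out two crawlers, allow everyone else.
ROBOTS_TXT = """\
User-agent: MJ12bot
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: *
Disallow:
"""


def allowed(user_agent, path="/"):
    """Return True if a compliant crawler may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


for bot in ("MJ12bot", "AhrefsBot", "Googlebot"):
    print(bot, "may crawl" if allowed(bot) else "is asked to stay out")
```

This costs no server-side processing per request, but it only works on well-behaved bots. The ones that ignore robots.txt still need a server-level block.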
     
    sarahk, Apr 3, 2016 IP
    dcristo likes this.
  9. dscurlock

    #9
    How much bw they are stealing was never my concern...
    But that does not mean I like wasting resources on worthless bots that provide zero value.
    I am not exactly sure what my bw is at the moment, but that never crossed my mind.

    I would imagine that if you simply neglect them, they take more and more, and
    as the site grows, so do the bots, unless you simply like giving away free resources...

    Why would anyone want to appear in foreign search engines if that is not their target? Regardless of how good the server is, I do not need any more traffic
    on the server than need be at this time. From a test, I know my site can take
    more than 125 concurrent visitors at any given time. However, if 100 of those
    visitors are from Russia because of Yandex, how does that benefit me? I need
    US traffic on my site, not traffic from useless foreign search engines...

     
    Last edited: Apr 3, 2016
    dscurlock, Apr 3, 2016 IP
  10. sarahk

    #10
    And that's the rub. I've been down at my local supermarket as they've hauled 5 bottles of wine out of a woman's backpack as she was trying to leave. They let her go, didn't call the cops etc because the value of the time taken just to handle things right there and then exceeded the benefit - add in the time dealing with lawyers, turning up in court etc and prosecution becomes really expensive.

    Think of the junk bots as shoplifters. You know it goes on, you don't like it but the cost in time of stopping the problem exceeds any benefit.
     
    sarahk, Apr 3, 2016 IP
  11. billzo

    #11
    When anyone is on a limited resource like a VPS or even a dedicated server and is nearing max capacity, one of the first places to look for savings is blocking those useless bots.
     
    billzo, Apr 3, 2016 IP
  12. dscurlock

    #12
    I am on shared hosting, and to be honest, I was not concerned about bw until @sarahk brought it up...
    Now I guess I have to go look, damn. I have other reasons to block evil bots than just bw. Shared
    hosting can take more than you think. Last year or the year before, I topped out at
    about 10 sites before I started having issues with the host. Then common sense told me that you
    just cannot effectively manage that many sites, and looking after one is far more effective. I have been
    on a VPS also, and even a VPS can take a good beating. If you are on a VPS and you are near shutdown,
    then something really must be wrong, because I know even a VPS can take quite a bit...
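
For anyone who does want to go and look, here is a rough sketch of adding up bandwidth per user agent from an access log. It assumes the host exposes logs in the Apache combined log format; the sample lines are invented for illustration, and `bytes_by_agent` is a name made up for this example.

```python
import re
from collections import Counter

# Combined log format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (?P<bytes>\d+|-) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def bytes_by_agent(lines):
    """Sum response bytes per user agent, skipping lines that do not parse."""
    totals = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue
        size = match.group("bytes")
        # A "-" in the bytes field means no response body was sent.
        totals[match.group("agent")] += 0 if size == "-" else int(size)
    return totals


MJ12_UA = "Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)"
YANDEX_UA = "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; rv:45.0) Gecko/20100101 Firefox/45.0"

# Invented sample entries, not real traffic.
SAMPLE_LOG = [
    f'199.21.99.199 - - [02/Apr/2016:10:00:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "{YANDEX_UA}"',
    f'203.0.113.5 - - [02/Apr/2016:10:00:01 +0000] "GET /a.html HTTP/1.1" 200 2048 "-" "{MJ12_UA}"',
    f'203.0.113.5 - - [02/Apr/2016:10:00:02 +0000] "GET /b.html HTTP/1.1" 200 1024 "-" "{MJ12_UA}"',
    f'203.0.113.7 - - [02/Apr/2016:10:00:03 +0000] "GET /style.css HTTP/1.1" 304 - "http://example.com/" "{BROWSER_UA}"',
]

for agent, total in bytes_by_agent(SAMPLE_LOG).most_common():
    print(f"{total:>8} bytes  {agent}")
```

Run against a real log (for example, by passing an open file object as `lines`), the top of that list shows whether any bot takes enough bandwidth to be worth blocking at all.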

    Besides, I am watching bots live, so when I am not doing anything else, such as
    watching TV, it is no more work to ban them as they visit the site...

    Someone just ran a whois query on my site. I am OK with that, no problem...
    You see, I am not a complete bastard, you know...
     
    dscurlock, Apr 3, 2016 IP
    sarahk likes this.
  13. Ditmar

    #13
    Yandex.com is made for US traffic, not Russian, but it is not popular in the US.
     
    Ditmar, Apr 4, 2016 IP
  14. dscurlock

    #14
    Popular or not, if it is made for US traffic, then there is no real need to block it unless I find it abusing my site.
     
    dscurlock, Apr 4, 2016 IP
  15. Kaas

    Kaas Member

    Messages:
    37
    Likes Received:
    2
    Best Answers:
    0
    Trophy Points:
    43
    #15
    DuckDuckGo was at some point in partnership with Yandex. Have you heard about this engine?
     
    Kaas, May 7, 2016 IP