Getting flooded with Search Engine Spiders...

Discussion in 'Search Engine Optimization' started by dscurlock, Oct 14, 2017.

  1. #1
    My site is not even officially open for business...
    however, within 1-2 days of getting the base platform
    set up, the site is getting flooded with spiders from all
    over the world; I am not even promoting the site yet...

    Russia
    China
    Vietnam
    Switzerland
    Singapore
    Taiwan
    India
    Ireland
    and other countries I can not pronounce...

    I am not sure why any of these spiders would be of
    any value....

    WTF is going on, any ideas?
     
    dscurlock, Oct 14, 2017
  2. qwikad.com

    qwikad.com Illustrious Member Affiliate Manager

    #2
    Google "referral spam". It's a stupid practice that doesn't even work anymore, yet many still do it to plant fake backlinks in your server logs and boost their sites' SEO. Personally, I believe Google's algorithm caught up with these practices long ago and no longer counts those backlinks as legit.

    By the way, you can block them using something like this (put it at the very top of your site):

    
    <?php
    // ---------------------------------------------------------------------------------------------------------------
    // Banned IP Addresses and Bots - Redirects banned visitors who make it past the .htaccess and/or robots.txt files to a URL.
    // The $banned_ip_addresses array can contain both full and partial IP addresses, i.e. Full = 203.0.113.45, Partial = 203.0.113. or 203.0. or 203.
    // Use partial IP addresses to include all IP addresses that begin with that partial IP address. A partial IP address must end with a period.
    // The $banned_bots, $banned_unknown_bots, and $good_bots arrays should contain keyword strings found within the User Agent string.
    // The $banned_unknown_bots array is used to identify unknown robots (identified by 'bot' followed by a space or one of the following characters _+:,.;/\-).
    // The $good_bots array contains keyword strings used as exemptions when checking for $banned_unknown_bots. If you do not want to use the $good_bots array, i.e.
    // $good_bots = array(), then you must remove the keyword strings 'bot.','bot/','bot-' from the $banned_unknown_bots array or else the good bots will also be banned.

    $banned_ip_addresses = array('');
    $banned_bots = array('.ru','AhrefsBot','crawl','crawler','DotBot','linkdex','majestic','meanpath','PageAnalyzer','robot','rogerbot','semalt','SeznamBot','spider');
    $banned_unknown_bots = array('bot ','bot_','bot+','bot:','bot,','bot;','bot\\','bot.','bot/','bot-');
    $good_bots = array('Google','Googlebot','MSN','bing','bingbot','Slurp','Yahoo','DuckDuck');
    $banned_redirect_url = 'https://somesite.com';

    // Visitor's IP address and Browser (User Agent)
    $ip_address = $_SERVER['REMOTE_ADDR'];
    // The User-Agent header is optional, so fall back to an empty string when it is missing
    $browser = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

    // Declared Temporary Variables
    $ipfound = $piece = $botfound = $gbotfound = $ubotfound = '';

    // Checks for Banned IP Addresses and Bots
    if ($banned_redirect_url != '') {

      // Checks for Banned IP Address
      if (!empty($banned_ip_addresses)) {
        if (in_array($ip_address, $banned_ip_addresses)) { $ipfound = 'found'; }
        if ($ipfound != 'found') {
          // Build the prefixes 'a.', 'a.b.', 'a.b.c.', ... and look each one up in the banned list
          $ip_pieces = explode('.', $ip_address);
          foreach ($ip_pieces as $value) {
            $piece = $piece.$value.'.';
            if (in_array($piece, $banned_ip_addresses)) { $ipfound = 'found'; break; }
          }
        }
        if ($ipfound == 'found') { header("Location: $banned_redirect_url"); exit(); }
      }

      // Checks for Banned Bots (case-insensitive keyword match against the User Agent)
      if (!empty($banned_bots)) {
        foreach ($banned_bots as $bbvalue) {
          $pos1 = stripos($browser, $bbvalue);
          if ($pos1 !== false) { $botfound = 'found'; break; }
        }
        if ($botfound == 'found') { header("Location: $banned_redirect_url"); exit(); }
      }

      // Checks for Banned Unknown Bots (skipped when the User Agent matches a good bot keyword)
      if (!empty($good_bots)) {
        foreach ($good_bots as $gbvalue) {
          $pos2 = stripos($browser, $gbvalue);
          if ($pos2 !== false) { $gbotfound = 'found'; break; }
        }
      }
      if ($gbotfound != 'found') {
        if (!empty($banned_unknown_bots)) {
          foreach ($banned_unknown_bots as $bubvalue) {
            $pos3 = stripos($browser, $bubvalue);
            if ($pos3 !== false) { $ubotfound = 'found'; break; }
          }
          if ($ubotfound == 'found') { header("Location: $banned_redirect_url"); exit(); }
        }
      }
    }
    // ---------------------------------------------------------------------------------------------------------------
    ?>
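
    The partial-IP rule described in the comments of the script (a partial entry must end with a period and bans every address that begins with it) can be sketched as a small standalone function. This is only an illustration under assumptions: `ip_is_banned` is a hypothetical helper name that does not appear in the script, and the addresses are placeholders from the documentation ranges.

```php
<?php
// Hypothetical helper (not part of the script above): returns true when $ip
// equals a banned entry or begins with a banned partial entry ending in '.'.
function ip_is_banned(string $ip, array $banned): bool
{
    foreach ($banned as $entry) {
        if ($entry === '') {
            continue; // skip the empty placeholder entry
        }
        if ($ip === $entry) {
            return true; // full address match
        }
        // Partial entries must end with a period, e.g. '203.0.113.'
        if (substr($entry, -1) === '.' && strncmp($ip, $entry, strlen($entry)) === 0) {
            return true;
        }
    }
    return false;
}

// Placeholder addresses, chosen only for illustration.
$banned = array('198.51.100.7', '203.0.113.');
var_dump(ip_is_banned('203.0.113.45', $banned)); // bool(true)
var_dump(ip_is_banned('198.51.100.8', $banned)); // bool(false)
```

    The trailing period is what keeps a partial entry such as '20.' from also matching '203.0.113.45'.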
     
    qwikad.com, Oct 15, 2017
  3. phpmillion

    phpmillion Member

    #3
    As qwikad.com correctly explained, most of this is related to referrer spam. However, I would add some more points:

    This method actually still works, just not with Google. Unfortunately, some search engines still count these links as normal links, which helps the "referring" sites rank slightly better. Referrer spam still works because thousands of improperly configured servers around the world leave their stats pages open to anyone. No one in their right mind would make their stats public on purpose, but many website owners (especially on shared hosting) don't realize that opening theirdomain.com/stats/year/month/ (or a similar URL) displays a detailed traffic chart with clickable backlinks to the websites that "send traffic" to theirdomain.com. So if you write a script that sends such fake referrers to thousands of different domains, you instantly get thousands of backlinks. Sure, their value is very questionable, but for most spammers a crappy link is much better than no link.
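
    One way to refuse requests that carry a known spam referrer is to check the Referer header against a blocklist before the page runs. The sketch below is an assumption-laden illustration: `is_spam_referrer` and the domain `spam-referrer.example` are hypothetical, and a real list would have to be maintained by hand. Note that the request still appears in the raw server log; keeping the stats pages private is a server configuration matter and is not shown here.

```php
<?php
// Hypothetical helper: true when the Referer header's host is a blocked
// domain or a subdomain of one.
function is_spam_referrer(string $referrer, array $blockedDomains): bool
{
    $host = parse_url($referrer, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false; // missing or unparsable referrer: nothing to match
    }
    $host = strtolower($host);
    foreach ($blockedDomains as $domain) {
        $domain = strtolower($domain);
        // Match the domain itself or anything ending in '.<domain>'
        if ($host === $domain || substr($host, -strlen($domain) - 1) === '.' . $domain) {
            return true;
        }
    }
    return false;
}

$blocked = array('spam-referrer.example'); // placeholder domain
$referrer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
if (is_spam_referrer($referrer, $blocked)) {
    http_response_code(403); // refuse the request
    exit;
}
```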

    Another reason for such traffic is scanning for security exploits. For example, when I look for potential security exploits (my job is security-related), I write scripts that find and identify sites powered by some platform (let's say WordPress) and then scan them. However, I would never label my security scanner as "some super cool security tool, I'm going to find exploits in your domain" or similar, for a very obvious reason: any smart webmaster would ban my IPs immediately. Hence, I make the scripts rotate user agents at random and disguise them as spiders of small search engines. This way, I stay under the radar.
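
    Because a user agent string is so easy to fake, a keyword match alone cannot prove that a visitor is a real search engine spider. For Googlebot, Google documents a reverse DNS lookup followed by a forward lookup as the way to verify a crawler. A minimal sketch of that check follows, with assumptions: `is_verified_googlebot` is a hypothetical name, the two resolver parameters are added only so the logic can be tried without network access, and `gethostbynamel` returns IPv4 addresses only.

```php
<?php
// Hypothetical helper: confirms that an IP claiming to be Googlebot resolves
// to a Google crawler host name and that the host name resolves back to the
// same IP. The resolver parameters default to PHP's own DNS functions.
function is_verified_googlebot(string $ip, ?callable $reverse = null, ?callable $forward = null): bool
{
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    $host = $reverse($ip); // reverse lookup: IP -> host name
    if (!is_string($host) || $host === $ip) {
        return false; // lookup failed (gethostbyaddr returns the IP unchanged)
    }
    if (!preg_match('/\.(googlebot|google)\.com$/i', $host)) {
        return false; // host name is not in a Google crawler domain
    }
    $addresses = $forward($host); // forward lookup: host name -> IPv4 list
    return is_array($addresses) && in_array($ip, $addresses, true);
}

// Stub resolvers with made-up answers, used here instead of real DNS.
$fakeReverse = function ($ip) { return 'crawl-66-249-66-1.googlebot.com'; };
$fakeForward = function ($host) { return array('66.249.66.1'); };
var_dump(is_verified_googlebot('66.249.66.1', $fakeReverse, $fakeForward)); // bool(true)
```

    DNS lookups are slow, so a check like this is usually run only for visitors whose user agent claims to be a good bot, and the result is cached.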
     
    phpmillion, Oct 16, 2017
  4. dscurlock

    dscurlock Prominent Member

    #4
    Last night I installed a spam blocker, and it seems to be working... this morning I see only
    3 spiders from other countries hitting the site. I have also noticed something else: Wordfence
    seems to log human visitors. I can't say if they are real or fake hits, as they have
    come in every 1-3 hours, and I cannot tell where they are coming from, as each visitor
    hit is attributed to Google and lists Google's IP. So how can I tell where real visitors are
    coming from? Going from spider spam to real visitors is highly suspicious. If they are
    real, then maybe I am onto something; however, I should not be getting visitors right
    now: the site is not set to index, and I am not promoting it in any manner yet...
     
    dscurlock, Oct 16, 2017
  5. jr777

    jr777 Member

    #5
    I get the same thing. Search engines I've never even heard of, plus lots of "guests" who are obviously bots. I don't know what software you're using, but there are lots of plugins for banning bots that aren't explicitly allowed, so maybe you can find one that will work on your site.
     
    jr777, Oct 29, 2017