My site is not even officially open for business. However, within 1-2 days of getting the base platform set up, the site is getting flooded with spiders from all over the world, and I am not even promoting the site yet: Russia, China, Vietnam, Switzerland, Singapore, Taiwan, India, Ireland, and other countries I cannot pronounce. I am not sure why any of these spiders would be of any value. WTF is going on, any ideas?
Google "referral spam". It's a stupid practice that's not even working anymore, yet many still do it to create fake backlinks in your server logs to boost their sites' SEO. Personally, I believe Google's algorithm has long since caught up with these practices and does not count those backlinks as legit. By the way, you can block them using something like this (put it at the very top of your site):

<?php
// ---------------------------------------------------------------------------
// Banned IP Addresses and Bots - Redirects banned visitors who make it past
// the .htaccess and/or robots.txt files to a URL.
// The $banned_ip_addresses array can contain both full and partial IP addresses,
// i.e. Full = 123.456.789.101, Partial = 123.456.789. or 123.456. or 123.
// Use partial IP addresses to include all IP addresses that begin with a partial
// IP address. The partial IP addresses must end with a period.
// The $banned_bots, $banned_unknown_bots, and $good_bots arrays should contain
// keyword strings found within the User Agent string.
// The $banned_unknown_bots array is used to identify unknown robots (identified
// by 'bot' followed by a space or one of the following characters _+:,.;/\-).
// The $good_bots array contains keyword strings used as exemptions when checking
// for $banned_unknown_bots. If you do not want to utilize the $good_bots array,
// such as $good_bots = array(), then you must remove the keyword strings
// 'bot.','bot/','bot-' from the $banned_unknown_bots array or else the good bots
// will also be banned.

$banned_ip_addresses = array('');
$banned_bots = array('.ru','AhrefsBot','crawl','crawler','DotBot','linkdex','majestic','meanpath','PageAnalyzer','robot','rogerbot','semalt','SeznamBot','spider');
$banned_unknown_bots = array('bot ','bot_','bot+','bot:','bot,','bot;','bot\\','bot.','bot/','bot-');
$good_bots = array('Google','Googlebot','MSN','bing','bingbot','Slurp','Yahoo','DuckDuck');
$banned_redirect_url = 'https://somesite.com';

// Visitor's IP address and Browser (User Agent).
// Some bots send no User-Agent header at all, so fall back to an empty string.
$ip_address = $_SERVER['REMOTE_ADDR'];
$browser = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

// Declared Temporary Variables
$ipfound = $piece = $botfound = $gbotfound = $ubotfound = '';

// Checks for Banned IP Addresses and Bots
if ($banned_redirect_url != '') {

    // Checks for Banned IP Address
    if (!empty($banned_ip_addresses)) {
        if (in_array($ip_address, $banned_ip_addresses)) { $ipfound = 'found'; }
        if ($ipfound != 'found') {
            // Build up partial prefixes ("123.", "123.456.", ...) and test each one.
            $ip_pieces = explode('.', $ip_address);
            foreach ($ip_pieces as $value) {
                $piece = $piece.$value.'.';
                if (in_array($piece, $banned_ip_addresses)) { $ipfound = 'found'; break; }
            }
        }
        if ($ipfound == 'found') { header("location: $banned_redirect_url"); exit(); }
    }

    // Checks for Banned Bots
    if (!empty($banned_bots)) {
        foreach ($banned_bots as $bbvalue) {
            $pos1 = stripos($browser, $bbvalue);
            if ($pos1 !== false) { $botfound = 'found'; break; }
        }
        if ($botfound == 'found') { header("location: $banned_redirect_url"); exit(); }
    }

    // Checks for Banned Unknown Bots (good bots are exempt)
    if (!empty($good_bots)) {
        foreach ($good_bots as $gbvalue) {
            $pos2 = stripos($browser, $gbvalue);
            if ($pos2 !== false) { $gbotfound = 'found'; break; }
        }
    }
    if ($gbotfound != 'found') {
        if (!empty($banned_unknown_bots)) {
            foreach ($banned_unknown_bots as $bubvalue) {
                $pos3 = stripos($browser, $bubvalue);
                if ($pos3 !== false) { $ubotfound = 'found'; break; }
            }
            if ($ubotfound == 'found') { header("location: $banned_redirect_url"); exit(); }
        }
    }
}
// ---------------------------------------------------------------------------
?>
As qwikad.com correctly explained, most of this is related to referrer spam. However, I would add some more points.

This method still actually works, but not with Google, of course. Unfortunately, some search engines still count these links as normal links, which helps the "referring" sites rank slightly better. The reason referrer spam still works is that there are thousands of improperly configured servers around the world that allow their stats to be accessed by anyone. While no one in their right mind would allow the public to view their stats, many website owners (especially on shared hosting) don't even realize that opening theirdomain.com/stats/year/month/ (or a similar URL) will show a detailed traffic chart with clickable backlinks to websites that "send traffic" to theirdomain.com. So if you write a script that sends such fake referers to thousands of different domains, you instantly get thousands of backlinks. Sure, their value is very questionable, but for most spammers a crappy link is much better than no link.

Another reason for such traffic is security exploits. For example, when I look for potential security exploits (because my job is security-related), I write scripts that find and identify sites powered by some platform (let's say WordPress) and then scan them. However, I would never label my security scanner as "some super cool security tool, I'm going to find exploits in your domain" or similar, for very obvious reasons: any smart webmaster would ban my IPs immediately. Hence, I make my scripts randomly rotate user agents and mask them as spiders of small search engines. This way, I stay under the radar.
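Since the fake backlinks described above arrive in the Referer header, one option is to reject requests whose referer host is on a blocklist before they ever reach your stats. Here is a minimal sketch of that idea. The function name is_spam_referer and the domains in $blocked_domains are made up for illustration, so you would have to fill in the list from your own logs:

```php
<?php
// Hypothetical helper: returns true when the Referer header's host equals,
// or is a subdomain of, an entry in a blocklist of spam domains.
function is_spam_referer(string $referer, array $blocked_domains): bool
{
    if ($referer === '') {
        return false; // No Referer header was sent; nothing to judge.
    }
    $host = parse_url($referer, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false; // Malformed or scheme-less referer; leave it alone.
    }
    $host = strtolower($host);
    foreach ($blocked_domains as $domain) {
        $domain = strtolower($domain);
        // Match the domain itself or any subdomain of it.
        if ($host === $domain || substr($host, -strlen($domain) - 1) === '.' . $domain) {
            return true;
        }
    }
    return false;
}

// Placeholder blocklist; these names are illustrative only.
$blocked_domains = array('spam-example.test', 'fake-seo.example');

$referer = $_SERVER['HTTP_REFERER'] ?? '';
if (is_spam_referer($referer, $blocked_domains)) {
    http_response_code(403); // Refuse the request instead of logging the hit.
    exit();
}
```

Matching on the parsed host rather than on a raw substring avoids banning legitimate sites whose URLs merely contain a blocked word. It does nothing against spammers who rotate domains, so treat it as one layer, not a fix.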
Last night I installed a spam blocker, and it seems to be working: this morning I see only 3 spiders from other countries hitting the site. I have also noticed something else. Wordfence seems to log human visitors. I can't say if they are real or fake hits, as they have come in every 1-3 hours, and I cannot tell where they are coming from because each visitor hit is shown as Google and lists Google's IP. So how can I tell where real visitors are coming from? It is highly suspicious to go from spider spam to real visitors. If they are real, then maybe I am onto something. However, I should not be getting visitors right now: the site is not set to index, and I am not promoting the site in any manner yet.
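One way to check whether a hit that claims to come from Google is genuine is a reverse DNS lookup on the IP, followed by a forward lookup to confirm the hostname resolves back to the same IP. Below is a rough sketch, not a definitive implementation. The function name is_verified_googlebot is made up, the resolver parameters exist only so the logic can be tried without network access, and the accepted hostname suffixes (googlebot.com and google.com) are an assumption you should check against Google's current crawler documentation:

```php
<?php
// Sketch: forward-confirmed reverse DNS check for an IP claiming to be Googlebot.
// $reverse and $forward are injectable; by default they are PHP's
// gethostbyaddr() and gethostbynamel() (the latter returns IPv4 addresses only).
function is_verified_googlebot(string $ip, ?callable $reverse = null, ?callable $forward = null): bool
{
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    // Step 1: reverse lookup. gethostbyaddr() returns the IP unchanged when
    // no hostname is found, and false on malformed input.
    $host = $reverse($ip);
    if (!is_string($host) || $host === '' || $host === $ip) {
        return false;
    }
    $host = strtolower(rtrim($host, '.'));

    // Step 2: the hostname must end in an accepted crawler domain (assumed list).
    if (!preg_match('/\.(googlebot|google)\.com$/', $host)) {
        return false;
    }

    // Step 3: the forward lookup of that hostname must include the original IP.
    $ips = $forward($host);
    return is_array($ips) && in_array($ip, $ips, true);
}

// Usage: only run the DNS check when the User-Agent claims to be Googlebot.
$ua = $_SERVER['HTTP_USER_AGENT'] ?? '';
$ip = $_SERVER['REMOTE_ADDR'] ?? '';
if (stripos($ua, 'Googlebot') !== false && !is_verified_googlebot($ip)) {
    http_response_code(403); // Claims to be Googlebot but fails verification.
    exit();
}
```

DNS lookups are slow, so on a busy site you would probably want to cache the result per IP rather than resolve on every request.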
I get the same thing. Search engines I've never even heard of, plus lots of "guests" who are obviously bots. I don't know what software you're using, but there are lots of plugins for banning bots that aren't explicitly allowed so maybe you can find something like that that will work on your site.