OK, now here's what I don't get... Shouldn't it be much easier to shut these guys down? For instance, no one really cares about the MFA sites that are on page one thousand of the SERPs - they care about these types of sites that get 60 of the top 100 places (and have millions and millions and millions of content-less pages). So, wouldn't it be easy for Google to make three lists:

1. A list of all their AdSense publishers. This is the list the others work from.
2. Taking the first list, find the top 1,000 sites (picking an arbitrary number here) by number of pages indexed.
3. Make another list of the top 1,000 sites that have the highest results in their placement algorithms (PR, links, etc...) - basically, a list of sites that their system marks as the most respected/best for their search results (ignoring specific search terms - just an aggregate of all search queries - the sites with the most Google juice across the board).

Then, every day, Google assigns 1 person to visit every site on each list. It only takes a few seconds to see if a site is just spam - and if it is, they just remove it from the system. Heck, hire 10 people to do this and expand each list to the top 10,000. That wouldn't solve the MFA problem, but it would make their top search results about a thousand times better.

Cheers,
Bob
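Purely to make the three-list idea above concrete, here's a rough sketch in Python. Everything in it is a made-up placeholder (the field names, numbers, and domains are assumptions for illustration); Google's real publisher and ranking data obviously aren't public.

```python
# Rough sketch of the "three lists" idea. All data here is hypothetical.

def top_sites_by(metric, sites, n=1000):
    """Return the n sites with the highest value for the given metric."""
    return sorted(sites, key=lambda s: s[metric], reverse=True)[:n]

# List 1: all AdSense publisher sites (placeholder records).
adsense_publishers = [
    {"domain": "example-mfa.com", "indexed_pages": 4_500_000, "aggregate_rank": 87.2},
    {"domain": "legit-blog.org", "indexed_pages": 1_200, "aggregate_rank": 12.9},
    # ... millions more in reality
]

# List 2: the biggest publishers by number of pages indexed.
biggest_by_pages = top_sites_by("indexed_pages", adsense_publishers)

# List 3: the publishers the ranking system trusts most across all queries.
most_trusted = top_sites_by("aggregate_rank", adsense_publishers)

# Queue the union for a quick human look - a few seconds per site.
review_queue = {s["domain"] for s in biggest_by_pages + most_trusted}
for domain in sorted(review_queue):
    print("manual review:", domain)
```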
PS. And, I always thought this lapse in Google's logic was brilliant: Let's assume there are only two types of sites: black-hat (MFAs, etc...) and white-hat (everyone else). Google wants to get rid of the black-hats and keep the white-hats. So, what can we assume: The black-hats will ALL be designed perfectly - they'll have great SEO, tons of inbound links, etc... All of them! Because, every black-hat will be run by an expert SEO. The white-hats will be all over the board. Some will have terrible SEO, some won't. Some will be optimized, some won't. So, basically, you have the bad guys doing only A - and you have the good guys doing A and B. So, what does Google do - they punish all the sites that do B!
Not all spam relies on AdSense, though. Good idea. Or they could just do a manual check on the top 5000 Alexa results - I think there would be almost the same sites in both lists.
Yeah, I just said AdSense, cause Google wouldn't have access to anybody else's publishers. And, isn't that what Google would want? The quality of their results to go up - while everyone else's stay the same? ;o) And, the top 5000 from Alexa would work perfectly too. So, if in just a matter of minutes, we can figure out two extremely cheap and easy ways to get rid of the majority of the worst offenders - is Google really doing anything about the problem?...
Here comes another spammer... http://www.google.com/search?num=100&&q=site:sapo.pt 4,530,000 results!!! As soon as they hit a billion, it might be time to rinse and repeat over at Digg!! They got PR. Looks like they've been doing this for a while!!!
Is my motto this week to change my sites from: domain.com/page.html to page.domain.com? Is that the lesson I should take away from life this week?
That's the first lesson! Once completed, you can move on to lesson #2:

1.page.domain.com
2.page.domain.com
3.page.domain.com
...
1000000000.page.domain.com
1.page2.domain.com
2.page2.domain.com
3.page2.domain.com
...
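Just to spell out how fast lesson #2 blows up: one domain, a counter, and a handful of middle labels give an effectively unlimited URL space for a crawler to chew on. The names below are made up to mirror the pattern in the post above.

```python
# Hypothetical sketch of the sub-subdomain pattern; "domain.com" is a placeholder.
from itertools import count, islice

def spam_hostnames(domain="domain.com"):
    """Yield 1.page.domain.com, 2.page.domain.com, ... then 1.page2.domain.com, ..."""
    for label_idx in count(1):
        label = "page" if label_idx == 1 else f"page{label_idx}"
        for n in range(1, 1_000_000_001):  # up to 1000000000.pageN.domain.com
            yield f"{n}.{label}.{domain}"

# The first few of the billions of hostnames a crawler could be fed:
print(list(islice(spam_hostnames(), 5)))
```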
Personally, I wouldn't. Whatever advantage in indexing speed there was will probably be removed soon, and I don't think there was much advantage in the ranking of the sub-subdomains anyway.
Yeah, I agree it's a loophole waiting to be closed. I'd never make that change, for various reasons - not the least of which is that I'm lazy and it's too much work for little reward.
I don't think it's as simple as that - a lot of spammers fly under the radar by having networks of sites that on their own aren't large enough to attract as much attention as the billion-page guy, but nevertheless run into thousands of pages each. I think the thing that will stop spam dead in its tracks is when the average price of a domain exceeds the money that can be made from one MFA site. Right now, domains are dead cheap, but for an MFA to break even, it needs thousands of pages, each targeting a different set of keywords. So Google needs to pay extra attention to sites with thousands of pages. What I'm thinking is that a special algorithm kicks in at certain page thresholds to detect spamminess - the sort of algorithm that is probably too processor-intensive to run on the whole index. Like this:

1. An advanced dupe-content detector that parses out the most frequent keywords and examines what's left for similarity to other pages on the rest of the site that have also been stripped of their most frequent keywords.
2. A gobbledegook detector that searches for poor grammar and flags up pages with a high percentage of ungrammatical sentences.
3. Most sentences include at least some stop-words. An unusual lack of these suggests spam.
4. Something to look for misspellings as a percentage of text. A lot of spammy sites have long lists of these, so anything over 20% should trigger a filter.

On their own, none of these things necessarily means spam, but they could be used to find sites where it would be a good idea to throttle further indexing and flag them up for manual scrutiny.
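To make those checks concrete, here's a toy sketch of heuristics 1, 3, and 4. The thresholds, word lists, and similarity measure are all assumptions for illustration - nothing here is anything Google has said it actually uses - and the grammar check (heuristic 2) is left out because it would need a real parser.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
              "that", "for", "on", "with", "as", "was", "at", "by"}

# Stand-in dictionary; a real misspelling check would use a full word list.
KNOWN_WORDS = STOP_WORDS | {"page", "content", "site", "review", "best",
                            "cheap", "buy", "online", "free", "guide"}

def tokens(text):
    return re.findall(r"[a-z']+", text.lower())

def stop_word_ratio(text):
    words = tokens(text)
    return sum(w in STOP_WORDS for w in words) / max(len(words), 1)

def unknown_word_ratio(text):
    words = tokens(text)
    return sum(w not in KNOWN_WORDS for w in words) / max(len(words), 1)

def strip_top_keywords(text, n=5):
    """Drop the n most frequent non-stop-words, then return the remaining word set."""
    words = [w for w in tokens(text) if w not in STOP_WORDS]
    top = {w for w, _ in Counter(words).most_common(n)}
    return {w for w in words if w not in top}

def near_duplicate(page_a, page_b, threshold=0.8):
    """Jaccard similarity of two pages once their top keywords are stripped (heuristic 1)."""
    a, b = strip_top_keywords(page_a), strip_top_keywords(page_b)
    if not a or not b:
        return False
    return len(a & b) / len(a | b) >= threshold

def spam_flags(page, other_pages):
    """Return the heuristics a page trips; several flags = worth a manual look."""
    flags = []
    if stop_word_ratio(page) < 0.05:                               # heuristic 3
        flags.append("almost no stop words")
    if unknown_word_ratio(page) > 0.20:                            # heuristic 4
        flags.append("over 20% unknown/misspelled words")
    if any(near_duplicate(page, other) for other in other_pages):  # heuristic 1
        flags.append("near-duplicate of another page")
    return flags
```

In keeping with the post's point, something like this would only run on sites past a page-count threshold, which keeps the expensive checks off the bulk of the index.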
You know what would be simpler? When a site hits 1 million + indexed, just do a freaking manual check
Whoa hold on there.. Sapo.pt looks legit. Loosely translated: From 1994 to 2005, something something. Here is the history of the number 1 Portuguese portal. They've been in the top 1000 for a long long long time (+5 years), even as high as 200: http://www.alexaholic.com/sapo.pt
Sapo is a free hosting service similar to Hostrocket, where each site they offer is a subdomain off of the main one. Nintendo grabbed that from a thread in another forum here (at least I think he probably did). The page count is probably different from mine; it's near the end, post #55. The actual subdomain I was referring to is this one. Apparently Sapo gives you unlimited sub-sub domains as well. There are others; that one is just bothering me because it is a) cluttering up pages I'm competing with, b) one that I myself reported to G for spamming a while back, and c) at the time I reported it, it was taking up 17% of the listed SERPs. Since G seems to be relying on posts to find the spammers, I figured mentioning them might actually get something done. -Michael
Here's another bloody spammer: site:freett.com. Check out their Alexa ranking, it's insane. http://www.alexa.com/data/details/traffic_details?q=freett.com&url=freett.com
http://www.digg.com/technology/Results_of_a_MASSIVE_Google_Ban_-_7_BILLION_Pages IF this is da real Matt!!!! I thought he was on vacation!! What's he doing in my Digg article!!!
Uhm, pardon the question but would this in any way be related to domain kiting? See www.bobparsons.com
http://www.bobparsons.com/DomainKiting.html These domains were registered in May. I think that's way too long ago for them to still work without having been paid for. http://www.bobparsons.com/MayKiting.html The SPAM domains have been around way longer than five days!!!