Damn, that was like 13 hours. Way too long, guys; gotta move faster than that. Time 'em, Ninno. Here's another one: http://www.google.com/search?hl=en&q=site:48s2jg3mm.info (53,000 results at the time of this post). -Michael
There are too damn many of them. You can't get rid of them all. There are something like 36^20 possible 20-character .info addresses, so even if just a minute fraction of them are spammed, it will take years to find them all and delete them.
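A quick back-of-the-envelope check on that figure, assuming roughly 36 usable characters per position (a-z and 0-9) and ignoring the extra restrictions around hyphens, so treat it as an order-of-magnitude sketch only:

```python
# Rough count of possible 20-character .info names, assuming ~36 usable
# characters per position (a-z, 0-9) and ignoring hyphen placement rules.
combinations = 36 ** 20
print(f"{combinations:.2e}")  # about 1.34e+31 possible names
```

Even if only one name in a trillion is ever registered and spammed, that still leaves an effectively endless supply.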
Do you still get 53,000, or just a measly five? I get five on Google.com, but still 53,000 on the datacenters. Hopefully they are already starting to ban them.
grrr!! Hopefully "Results 1 - 5 of 5 English pages from 48s2jg3mm.info (0.27 seconds)" is what Google is working on making it!!!
If someone at Google who really does have the power to ban spam sites is reading and reacting to this thread, I think people should continue to focus on uncovering all of the sub-domain spammers out there. The more of these million-page sub-domain spammers we can get purged from Google's index, the better off all of us will be once the next round of PR updates takes place. After all, if PR really is a relative scale based on the total number of pages/links on the Internet, then the fewer sub-domain spam pages remain, the easier it will be to get higher PRs.
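If PR does work roughly like one common formulation of PageRank, the dilution is easy to see in a toy example: the (1-d)/N teleport share is spread over every indexed page, so every junk page added shaves a little off everyone's baseline. A rough sketch (hypothetical three-page link graph plus a ring of spam pages, damping factor 0.85; the real toolbar PR is log-scaled and Google's internals are unknown, so this is only directional):

```python
# Toy power-iteration PageRank to illustrate the "relative scale" point:
# the (1 - D) / n teleport term is shared across every indexed page, so
# inflating n with spam pages lowers the baseline legitimate pages get.
# Assumes every page has at least one outlink (no dangling-node handling).
D = 0.85  # damping factor


def pagerank(links, iterations=40):
    """links: dict mapping each page to the list of pages it links to."""
    n = len(links)
    pr = {page: 1.0 / n for page in links}
    for _ in range(iterations):
        pr = {
            page: (1 - D) / n
            + D * sum(pr[q] / len(links[q]) for q in links if page in links[q])
            for page in links
        }
    return pr


# Hypothetical mini-web, then the same web plus 200 interlinked spam pages.
base = {"a": ["b"], "b": ["a", "c"], "c": ["a"]}
spam_ring = {f"spam{i}": [f"spam{(i + 1) % 200}"] for i in range(200)}
print(pagerank(base)["a"])                   # page "a" in a clean index
print(pagerank({**base, **spam_ring})["a"])  # same page after the index is flooded
```

Page "a" hasn't lost a single inbound link in the second run; its score drops purely because the index got bigger.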
I think this is one of the biggest issues with PR. PR is an outcome of good linking, not a factor in determining it! There is almost no incentive to chase a high PR (even more so now that the co-op has been devalued!). I admit I used to look forward to each PR update because it meant exponentially more weight, but I've since pulled the co-op from all my sites. I still don't understand the rush for PR. Granted, it's useful as a performance indicator when selling SEO services, running link-building teams, or buying sites. But I end up with PR0 sites doing 1,000 pageviews per day that earn many times more than PR4+ sites doing 30,000/day.
Really, the problem with PR and Alexa ratings is that regardless of how relevant they actually are, you need good scores on both in order to sell your ad space directly. Many of us would prefer to ignore the two ratings, but we're forced to pay attention to them simply because we need to sell that ad space.
It looks like Google was banning 48s2jg3mm.info when they started showing just five results... i.e., within 40 minutes of the link being posted!!!
It didn't even mention Nintendo or link to this thread, but Digital Point does get a mention as breaking the story. It's worth a read just to get up to speed.

Pushing Bad Data - Google's Latest Black Eye
By Eric Lester (c) 2006-06-26

Google stopped counting, or at least publicly displaying, the number of pages it indexed in September of 2005, after a school-yard "measuring contest" with rival Yahoo. That count topped out around 8 billion pages before it was removed from the homepage. News broke recently through various SEO forums that Google had suddenly, over the past few weeks, added another few billion pages to the index. This might sound like a reason for celebration, but this "accomplishment" would not reflect well on the search engine that achieved it.

What had people buzzing was the nature of the fresh new few billion pages. They were blatant spam, containing Pay-Per-Click (PPC) ads and scraped content, and they were, in many cases, showing up well in the search results. They pushed out far older, more established sites in doing so. A Google representative responded via forums to the issue by calling it a "bad data push," something that met with various groans throughout the SEO community.

How did someone manage to dupe Google into indexing so many pages of spam in such a short period of time? I'll provide a high-level overview of the process, but don't get too excited. Like a diagram of a nuclear explosive, it isn't going to teach you how to make the real thing, and you're not going to be able to run off and do it yourself after reading this article. Yet it makes for an interesting tale, one that illustrates the ugly problems cropping up with ever-increasing frequency in the world's most popular search engine.

A Dark and Stormy Night

Our story begins deep in the heart of Moldova, sandwiched scenically between Romania and Ukraine. In between fending off local vampire attacks, an enterprising local had a brilliant idea and ran with it, presumably away from the vampires... His idea was to exploit how Google handled subdomains, and not just a little bit, but in a big way.

The heart of the issue is that currently, Google treats subdomains much the same way as it treats full domains: as unique entities. This means it will add the homepage of a subdomain to the index and return at some point later to do a "deep crawl." Deep crawls are simply the spider following links from the domain's homepage deeper into the site until it finds everything or gives up and comes back later for more.

Briefly, a subdomain is a "third-level domain." You've probably seen them before; they look something like this: subdomain.domain.com. Wikipedia, for instance, uses them for languages; the English version is "en.wikipedia.org", the Dutch version is "nl.wikipedia.org". Subdomains are one way to organize large sites, as opposed to multiple directories or even separate domain names altogether.

So, we have a kind of page Google will index virtually "no questions asked." It's a wonder no one exploited this situation sooner. Some commentators believe the reason for that may be that this "quirk" was introduced after the recent "Big Daddy" update. Our Eastern European friend got together some servers, content scrapers, spambots, PPC accounts, and some all-important, very inspired scripts, and mixed them all together thusly...

Five Billion Served - And Counting...
First, our hero here crafted scripts for his servers that would, when GoogleBot dropped by, start generating an essentially endless number of subdomains, all with a single page containing keyword-rich scraped content, keyworded links, and PPC ads for those keywords. Spambots are sent out to put GoogleBot on the scent via referral and comment spam to tens of thousands of blogs around the world. The spambots provide the broad setup, and it doesn't take much to get the dominos to fall. GoogleBot finds the spammed links and, as is its purpose in life, follows them into the network. Once GoogleBot is sent into the web, the scripts running the servers simply keep generating pages: page after page, all with a unique subdomain, all with keywords, scraped content, and PPC ads. These pages get indexed, and suddenly you've got yourself a Google index 3-5 billion pages heavier in under 3 weeks.

Reports indicate that, at first, the PPC ads on these pages were from AdSense, Google's own PPC service. The ultimate irony, then, is that Google benefits financially from all the impressions being charged to AdSense users as they appear across these billions of spam pages. The AdSense revenues from this endeavor were the point, after all: cram in so many pages that, by sheer force of numbers, people would find and click on the ads in those pages, making the spammer a nice profit in a very short amount of time.

Billions or Millions? What is Broken?

Word of this achievement spread like wildfire from the DigitalPoint Forums. It then spread like wildfire into the SEO community, to be specific. The "general public" is, as of yet, out of the loop, and will probably remain so. A response by a Google engineer appeared on a Threadwatch thread about the topic, calling it a "bad data push". Basically, the company line was that they have not, in fact, added 5 billion pages. Later claims include assurances that the issue will be fixed algorithmically. Those following the situation (by tracking the known domains the spammer was using) see only that Google is removing them from the index manually.

The tracking is accomplished using the "site:" command, a command that, theoretically, displays the total number of indexed pages from the site you specify after the colon. Google has already admitted there are problems with this command, and "5 billion pages", they seem to be claiming, is merely another symptom of it. These problems extend beyond the site: command to the displayed result counts for many queries, which some feel are highly inaccurate and in some cases fluctuate wildly. Google admits they have indexed some of these spammy subdomains, but so far haven't provided any alternate numbers to dispute the 3-5 billion shown initially via the site: command.

Over the past week the number of spammy domains and subdomains indexed has steadily dwindled as Google personnel remove the listings manually. There has been no official statement that the "loophole" is closed. This poses the obvious problem that, since the way has been shown, there will be a number of copycats rushing to cash in before the algorithm is changed to deal with it.

Conclusions

There are, at minimum, two things broken here: the site: command, and the obscure, tiny bit of the algorithm that allowed billions (or at least millions) of spam subdomains into the index. Google's current priority should probably be to close the loophole before they're buried in copycat spammers.
The issues surrounding the use or misuse of AdSense are just as troubling for those who might be seeing little return on their advertising budget this month. Do we "keep the faith" in Google in the face of these events? Most likely, yes. It is not so much whether they deserve that faith, but that most people will never know this happened. Days after the story broke there's still very little mention in the "mainstream" press. Some tech sites have mentioned it, but this isn't the kind of story that will end up on the evening news, mostly because the background knowledge required to understand it goes beyond what the average citizen is able to muster. The story will probably end up as an interesting footnote in that most esoteric and neoteric of worlds, "SEO History."
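For what it's worth, the wildcard-subdomain mechanics the article describes are not exotic. Once a wildcard DNS record points *.spamdomain.info at a single box, any web server that keys its response off the Host header can answer for an unlimited number of subdomains. A bare-bones illustrative sketch (toy code, hypothetical domain name, obviously not the spammer's actual scripts, which also layered on scraped content and PPC ads):

```python
# Toy illustration of wildcard-subdomain hosting: with a wildcard DNS
# record (*.spamdomain.info -> this server), every hostname a crawler
# requests is answered by the same handler, which keys the page it
# returns off whatever Host header arrives. A sketch of the mechanism
# described above, nothing more.
from http.server import BaseHTTPRequestHandler, HTTPServer


class WildcardHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        host = self.headers.get("Host", "unknown.spamdomain.info")
        label = host.split(".")[0]  # leftmost label, e.g. "48s2jg3mm"
        body = f"<html><body><h1>A unique page for {label}</h1></body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body.encode("utf-8"))


if __name__ == "__main__":
    # One process, one port, effectively unlimited "sites".
    HTTPServer(("0.0.0.0", 8080), WildcardHandler).serve_forever()
```

Which is exactly why the "unique entity per subdomain" assumption is so cheap to abuse: the marginal cost of the millionth subdomain is zero.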
There are about a million copies of this by now: http://www.google.com/search?source...&q=Pushing+Bad+Data+Google's+Latest+Black+Eye ... I stand corrected:
More like a measly 232. http://www.google.com/search?num=100&q="Pushing+Bad+Data+Google's+Latest+Black+Eye"