New SPAM sites...billions of results!!!!

Discussion in 'Google' started by Nintendo, Jun 17, 2006.

  1. lorien1973

    lorien1973 Notable Member

    Messages:
    12,206
    Likes Received:
    601
    Best Answers:
    0
    Trophy Points:
    260
    #721
    In all seriousness, here is a good lesson from this.

    There is no required stay in the sandbox; it happens because of a faulty (or perhaps reasonable) link strategy. If a site gets a ton of inbound links from different (sub)domains all at once, it can bypass the sandbox because it is considered to be something different (a new fad or whatever). Another example is the "numa numa" video: it hit Google immediately because it got a ton of inbounds all at once. As did these sites. Tons of inbounds (legitimate or not), instant ranking on Google. You didn't see this dude wondering when the sandbox would release him.

    So, IMHO, when starting a new site, it makes good sense (maybe not long-term sense, but who knows) to get as many inbound links as possible to avoid any "sandboxing".
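    Purely as a back-of-the-envelope sketch (nobody outside Google knows how, or even if, this actually works), the heuristic I'm describing would look something like this in Python -- every threshold and name below is invented:

    [code]
    # Hypothetical "fad detector": a site that suddenly gets links from many
    # DISTINCT domains within a short window gets treated as breaking news
    # and skips the sandbox. All numbers and names are made up.
    from collections import defaultdict

    WINDOW_DAYS = 7
    MIN_DISTINCT_DOMAINS = 500   # invented threshold

    links = defaultdict(list)    # site -> [(day, linking_domain), ...]

    def record_link(site, day, linking_domain):
        links[site].append((day, linking_domain))

    def bypasses_sandbox(site, today):
        recent = {dom for (day, dom) in links[site] if today - day <= WINDOW_DAYS}
        return len(recent) >= MIN_DISTINCT_DOMAINS

    # A "numa numa"-style burst: 600 different blogs link within one week.
    for i in range(600):
        record_link("newfad.example", day=3, linking_domain="blog%d.example" % i)
    print(bypasses_sandbox("newfad.example", today=5))   # True -> no sandbox
    [/code]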
     
    lorien1973, Jun 24, 2006 IP
    latehorn likes this.
  2. pjbrunet

    pjbrunet Peon

    Messages:
    16
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #722
    There's a website called "turnitin.com" or something like that, which teachers use to catch cut-and-paste plagiarism. Google would need to do that on a much larger scale, and faster. When looking for dupes, more credit should be given to the first occurrence of a phrase. This would encourage better writing and discourage scraping. I'm sure I'll see it eventually, at least in my lifetime ;)
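    To make the "credit the first occurrence" idea concrete, here is a rough Python sketch -- a toy, assuming simple 5-word shingles, where a real system would obviously need something far smarter:

    [code]
    # Phrase-level dupe detection: the first document seen gets credit for a
    # phrase; later documents are scored by how many phrases they recycle.
    SHINGLE_SIZE = 5   # words per phrase "shingle"

    def shingles(text):
        words = text.lower().split()
        return {tuple(words[i:i + SHINGLE_SIZE])
                for i in range(len(words) - SHINGLE_SIZE + 1)}

    first_seen = {}   # shingle -> doc id that used it first

    def dupe_ratio(doc_id, text):
        """Fraction of this doc's phrases already credited to an earlier doc."""
        s = shingles(text)
        if not s:
            return 0.0
        copied = sum(1 for sh in s if first_seen.setdefault(sh, doc_id) != doc_id)
        return copied / float(len(s))

    original = "the quick brown fox jumps over the lazy dog near the river bank"
    scraped = original + " today"
    print(dupe_ratio("site-a", original))   # 0.0 -- credited as first occurrence
    print(dupe_ratio("site-b", scraped))    # 0.9 -- mostly recycled phrases
    [/code]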

    I have also seen algorithms that will read some text and estimate the level of education of the writer. I wouldn't be surprised if Google has something like this already, at least on a rudimentary level.

    Google already checks for profanity, so searching for select grammar mistakes and blatant misspellings would seem feasible too. Google isn't just in the search biz--their biz is promoting the Internet too. That's why they want everyone to have Internet access: more people online = more money. The Internet would be more attractive if, on the whole, people were motivated to write well. Hell, I'll code it myself if I have to. A script to rank the blogosphere by level of education would be disruptive for sure. If you steal my idea, I want credit ;)
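    For what it's worth, here's roughly what that script could start from. The Flesch-Kincaid grade level formula is real and well known; the syllable counter below is a crude stand-in, so treat the whole thing as a sketch:

    [code]
    import re

    def count_syllables(word):
        # Approximation: count runs of consecutive vowels.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def fk_grade(text):
        """Flesch-Kincaid grade: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59"""
        sentences = max(1, len(re.findall(r"[.!?]+", text)))
        words = re.findall(r"[A-Za-z']+", text)
        n = max(1, len(words))
        syllables = sum(count_syllables(w) for w in words)
        return 0.39 * (n / float(sentences)) + 11.8 * (syllables / float(n)) - 15.59

    print(fk_grade("See Spot run. Spot runs fast."))   # low grade level
    print(fk_grade("Epistemological considerations complicate algorithmic evaluation."))  # high
    [/code]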
     
    pjbrunet, Jun 24, 2006 IP
  3. Obelia

    Obelia Notable Member

    Messages:
    2,083
    Likes Received:
    171
    Best Answers:
    0
    Trophy Points:
    210
    #723
    There are a couple of problems with this, but neither is insurmountable. People who write on really technical or niche topics are likely to include a lot of jargon, which a spellchecker would not recognise, so the threshold for misspellings needs to be fairly high. The other problem might be people who declare no language in their HTML, mix several languages in one document, or accidentally specify the wrong language encoding.
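    Something like this minimal sketch could handle the jargon case -- the tiny dictionary here is purely illustrative, and a real check would use a full wordlist:

    [code]
    # Flag a page only when the RATE of unknown words crosses a threshold,
    # so jargon-heavy pages aren't punished for a few unrecognised terms.
    DICTIONARY = {"the", "protein", "binds", "to", "receptor", "cells",
                  "buy", "cheap", "pills", "online", "now"}   # stand-in wordlist
    THRESHOLD = 0.5   # tolerate up to half unknown words (jargon, names, etc.)

    def looks_illiterate(text):
        words = [w.strip(".,!?").lower() for w in text.split()]
        words = [w for w in words if w]
        if not words:
            return False
        unknown = sum(1 for w in words if w not in DICTIONARY)
        return unknown / float(len(words)) > THRESHOLD

    print(looks_illiterate("The protein binds to CXCR4 receptor cells"))   # False: jargon tolerated
    print(looks_illiterate("Bye cheep pils onlin now frum us"))            # True: flagged
    [/code]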
     
    Obelia, Jun 24, 2006 IP
  4. pjbrunet

    pjbrunet Peon

    Messages:
    16
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #724
    This is what I was thinking of--something like a "Readability Test"

    http://juicystudio.com/services/readability.php?url=
     
    pjbrunet, Jun 24, 2006 IP
  5. Manish Pandey

    Manish Pandey Well-Known Member

    Messages:
    656
    Likes Received:
    43
    Best Answers:
    0
    Trophy Points:
    133
    #725
    Well, certainly this has to happen... and I guess the Google engineers now have to reconsider their algo and tweak it a little so that we don't have to fear this kind of spam any more...
     
    Manish Pandey, Jun 24, 2006 IP
  6. lorien1973

    lorien1973 Notable Member

    Messages:
    12,206
    Likes Received:
    601
    Best Answers:
    0
    Trophy Points:
    260
    #726
    Then we can fear a whole new type of spam!

    To quote an apropos Star Wars line:

     
    lorien1973, Jun 24, 2006 IP
    Obelia likes this.
  7. Nintendo

    Nintendo ♬ King of da Wackos ♬

    Messages:
    12,890
    Likes Received:
    1,064
    Best Answers:
    0
    Trophy Points:
    430
    #727
    Nintendo, Jun 24, 2006 IP
  8. anthonycea

    anthonycea Banned

    Messages:
    13,378
    Likes Received:
    342
    Best Answers:
    0
    Trophy Points:
    0
    #728
    anthonycea, Jun 24, 2006 IP
  9. lorien1973

    lorien1973 Notable Member

    Messages:
    12,206
    Likes Received:
    601
    Best Answers:
    0
    Trophy Points:
    260
    #729
    Zeezo at least has some content; I think it's a different animal. About.com makes subdomains too, but it has content as well.
     
    lorien1973, Jun 24, 2006 IP
  10. jimboot

    jimboot Active Member

    Messages:
    146
    Likes Received:
    5
    Best Answers:
    0
    Trophy Points:
    58
    #730
    Yep, and mine shows:
    Results 1 - 10 of about 23,920,000,000 for a. (0.20 seconds)
    Which is similar to the count before the site was removed.
     
    jimboot, Jun 24, 2006 IP
  11. anthonycea

    anthonycea Banned

    Messages:
    13,378
    Likes Received:
    342
    Best Answers:
    0
    Trophy Points:
    0
    #731

    Right, that was the first big-time scam ever discovered. Google took many of the back-to-back results out of the SERPs but left the sites indexed.

    I really don't need to get into it, because everyone who was here then remembers the story!
     
    anthonycea, Jun 24, 2006 IP
  12. markhutch

    markhutch Peon

    Messages:
    357
    Likes Received:
    22
    Best Answers:
    0
    Trophy Points:
    0
    #732
    Back to the "Bad Data Push" statement from last weekend: I have noticed a decrease in the number of pages from my web sites that had been corrupted up until that "push". They are now coming out of supplemental status as they are recrawled by Googlebot. I seem to have had a ton of pages with errors in the cached version of the page title. Sorry if this has already been covered in this thread; I've been working on other projects all week and don't have time to read this entire HUGE thread tonight.
     
    markhutch, Jun 24, 2006 IP
  13. minstrel

    minstrel Illustrious Member

    Messages:
    15,082
    Likes Received:
    1,243
    Best Answers:
    0
    Trophy Points:
    480
    #733
    I am slowly - VERY slowly - seeing some recovery too, Mark. Over the past few days, some of those non-existent pages have been dropping out and real, currently existing pages are being added.

    I don't know what they did but somebody is NOT going to get his Christmas bonus at the Googleplex this year. :eek:
     
    minstrel, Jun 24, 2006 IP
  14. old_expat

    old_expat Peon

    Messages:
    188
    Likes Received:
    9
    Best Answers:
    0
    Trophy Points:
    0
    #734
    How can this be so when there are so many domains/sites on shared IPs?
     
    old_expat, Jun 24, 2006 IP
  15. markhutch

    markhutch Peon

    Messages:
    357
    Likes Received:
    22
    Best Answers:
    0
    Trophy Points:
    0
    #735
    I'm glad to hear that I'm not the only one seeing improvement. When I say improvement, I'm talking about an additional 50 pages or so out of 1,000, but at least it is progress. I think Google had to push that bad data back into the index before Googlebot could recrawl it. More than likely, when their software detects errors it removes those pages from the index. I'll bet they had millions of pages with similar errors on them. These errors were not the fault of webmasters, but of how Google fetched and stored those pages to begin with. I guess the good news is that Googlebot is extremely fast at crawling, and I hope that things will get back to normal before too much longer.
     
    markhutch, Jun 24, 2006 IP
  16. Nintendo

    Nintendo ♬ King of da Wackos ♬

    Messages:
    12,890
    Likes Received:
    1,064
    Best Answers:
    0
    Trophy Points:
    430
    #736
    Nintendo, Jun 24, 2006 IP
  17. MikeSwede

    MikeSwede Peon

    Messages:
    601
    Likes Received:
    16
    Best Answers:
    0
    Trophy Points:
    0
    #737
    The strange part is actually that they had so MANY pages indexed.
    There are thousands of sites like that sneaking under the radar with 50-75,000 pages indexed per domain, and their operators have maybe hundreds of domains in the index. I have reported a lot of them because they have scraped content from my sites, but Google doesn't give a s**t about it. All they care about is being the leading SE with the most pages indexed, so the numbers look good and they can show their ads on spam sites and make some $$$.
    I don't understand WTF they are thinking when original sites with good content are deleted but sites with scraped content from my site are still there?? How weird is that???:confused:
     
    MikeSwede, Jun 24, 2006 IP
  18. Nintendo

    Nintendo ♬ King of da Wackos ♬

    Messages:
    12,890
    Likes Received:
    1,064
    Best Answers:
    0
    Trophy Points:
    430
    #738
    Oh yes they do...if it makes the index page of Digg!!! Then Spamoogle quickly bans them!!
     
    Nintendo, Jun 24, 2006 IP
  19. MikeSwede

    MikeSwede Peon

    Messages:
    601
    Likes Received:
    16
    Best Answers:
    0
    Trophy Points:
    0
    #739
    Ahhh.... but then there have to be a billion pages or so, right :)
     
    MikeSwede, Jun 24, 2006 IP
  20. Nintendo

    Nintendo ♬ King of da Wackos ♬

    Messages:
    12,890
    Likes Received:
    1,064
    Best Answers:
    0
    Trophy Points:
    430
    #740
    It looks like it. Apparently five million SPAM pages isn't enough!!!
     
    Nintendo, Jun 24, 2006 IP