Scraper site data on DP

Discussion in 'General Chat' started by adacprogramming, Feb 8, 2008.

  1. #1
    I thought this was interesting. This came up in my Google alerts. It looks like someone is scraping all of the places on DP where URLs were posted, and classifying them all as providers of paid links.
    http:// www. realranker. com/orient/vegas/paid.txt (remove the spaces)

    In case they move the file, here's a sample of it. The file is huge.
    
    www.sendmoneytransfer.com/@#@http://forums.digitalpoint.com/showthread.php?t=299998
    www.winn-and-sims.com/2004/11/winn-and-sims-anonymously-slanders-me.html@#@http://forums.digitalpoint.com/showthread.php?t=299993
    www.zoomcities.com@#@http://forums.digitalpoint.com/showthread.php?t=299993
    www.auctionadvice.com/getting_started/ebay-registration.php@#@http://forums.digitalpoint.com/showthread.php?t=299953
    www.yourfunnymedia.com@#@http://forums.digitalpoint.com/showthread.php?t=299953
    www.cambodiaxp.com/forum/@#@http://forums.digitalpoint.com/showthread.php?t=299953
    www.auctionadvice.com/@#@http://forums.digitalpoint.com/showthread.php?t=299950
    www.digitalpointing.com@#@http://forums.digitalpoint.com/showthread.php?t=299950
    www.jerlene.net@#@http://forums.digitalpoint.com/showthread.php?t=299950
    www.yourfunnymedia.com@#@http://forums.digitalpoint.com/showthread.php?t=299950
    www.cambodiaxp.com/forum/@#@http://forums.digitalpoint.com/showthread.php?t=299950
    www.alivedirectory.com/blog@#@http://forums.digitalpoint.com/showthread.php?t=299950
    www.alivedirectory.com@#@http://forums.digitalpoint.com/showthread.php?t=299950
    www.digitalpointing.com@#@http://forums.digitalpoint.com/showthread.php?t=299950
    www.jerlene.net@#@http://forums.digitalpoint.com/showthread.php?t=299950
    www.auction-registration.com/@#@http://forums.digitalpoint.com/showthread.php?t=299936
    blog.aplus.net/@#@http://forums.digitalpoint.com/showthread.php?t=299936
    carrospt.blogspot.com/@#@http://forums.digitalpoint.com/showthread.php?t=299936
    blog.aplus.net/@#@http://forums.digitalpoint.com/showthread.php?t=299936
    carrospt.blogspot.com/@#@http://forums.digitalpoint.com/showthread.php?t=299936
    www.elegantdirectory.com/blog@#@http://forums.digitalpoint.com/showthread.php?t=299936
    www.highstuff.com@#@http://forums.digitalpoint.com/showthread.php?t=299936
    www.affiliateuniverse.info@#@http://forums.digitalpoint.com/showthread.php?t=299936
    www.business-directory-northeast.co.uk@#@http://forums.digitalpoint.com/showthread.php?t=299936
    blog.aplus.net/@#@http://forums.digitalpoint.com/showthread.php?t=299936
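Each line in the sample appears to be a listed URL, the separator `@#@`, and then the DP thread where it was found. A minimal sketch of reading that format, assuming every record follows this pattern and that repeated lines can be collapsed (the function names `parse_line` and `group_by_host` are made up for illustration, not taken from the site):

```python
from collections import defaultdict
from urllib.parse import urlsplit

SEPARATOR = "@#@"  # delimiter seen in the sample lines


def parse_line(line):
    """Split one record into (listed_url, thread_url), or None if malformed."""
    line = line.strip()
    if SEPARATOR not in line:
        return None
    listed, thread = line.split(SEPARATOR, 1)
    return listed, thread


def group_by_host(lines):
    """Map each listed host to the set of thread URLs it appeared in."""
    hosts = defaultdict(set)
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        listed, thread = record
        # The listed URLs carry no scheme, so prepend "//" to let urlsplit find the host.
        host = urlsplit("//" + listed).hostname
        if host:
            hosts[host].add(thread)  # a set drops the repeated lines
    return dict(hosts)


sample = [
    "www.auctionadvice.com/getting_started/ebay-registration.php@#@http://forums.digitalpoint.com/showthread.php?t=299953",
    "www.auctionadvice.com/@#@http://forums.digitalpoint.com/showthread.php?t=299950",
    "www.jerlene.net@#@http://forums.digitalpoint.com/showthread.php?t=299950",
    "www.jerlene.net@#@http://forums.digitalpoint.com/showthread.php?t=299950",
]
grouped = group_by_host(sample)
```

Grouping by host rather than by full URL is a choice made here because the sample lists the same site under several paths; the real file may need different handling.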
    
    
    adacprogramming, Feb 8, 2008 IP
  2. scylla

    #2
    Looks to be harmless. He's probably hoping to get more keyword results, who knows...
     
    scylla, Feb 8, 2008 IP
  3. InfiniteTech

    #3
    Probably done by the Google index bot, to ban those domains?
     
    InfiniteTech, Feb 9, 2008 IP
  4. redhits

    #4
    Hello, I am the programmer who built that website, and yes, you are very fxxx right.


    I am classifying the websites :), and not only from DigitalPoint but from all over the web.

    1. We are creating a system similar to PageRank, to give each website a rank.

    2. The tool will also alert if a website was selling backlinks in the buy/sell area of DigitalPoint/SitePoint, or was participating in massive web directory submissions, link farms, etc.
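The alert in point 2 could be as simple as a lookup against the scraped records. This is only a rough sketch under assumptions: the `(host, thread_url)` record shape, the `www.` normalization, and the names `normalize`, `build_index`, and `paid_link_alert` are all hypothetical, and nothing here reflects how the actual tool decides that a thread is a buy/sell listing.

```python
def normalize(host):
    """Lowercase a host and drop a leading 'www.' so variants compare equal."""
    host = host.strip().lower()
    return host[4:] if host.startswith("www.") else host


def build_index(records):
    """Index (host, thread_url) pairs by normalized host."""
    index = {}
    for host, thread in records:
        index.setdefault(normalize(host), set()).add(thread)
    return index


def paid_link_alert(host, index):
    """Return the sorted threads a host was seen in, or [] if it is not flagged."""
    return sorted(index.get(normalize(host), ()))


# Hypothetical input: records already extracted from the scraped files.
records = [
    ("www.zoomcities.com", "http://forums.digitalpoint.com/showthread.php?t=299993"),
    ("www.highstuff.com", "http://forums.digitalpoint.com/showthread.php?t=299936"),
]
index = build_index(records)
```

Calling `paid_link_alert("ZoomCities.com", index)` returns the one thread recorded for that site, while an unknown host returns an empty list.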


    *** UPDATE
    There were many more files like that marking 'paid websites', so I decided to remove them so nobody else can use my database.
    It took me a few weeks of work to get that data (crawling the whole of DP alone took almost half a day, etc.).

    Anyway, you can still view the cached copy of that file for a few more days here:

    http://209.85.135.104/search?q=cach...id.txt+site:realranker.com&hl=en&ct=clnk&cd=5
     
    redhits, Feb 11, 2008 IP