Scraper Sites

Discussion in 'HTML & Website Design' started by -Achilles-, Nov 30, 2006.

  1. #1
    I have been doing some research on scraper sites. Basically, I want to experiment with them. I know, I'm a horrible person, but I know a lot of big-name guys have blogs, for example, that pull content from other sites and use it on their own. What programs do they use to set this up? Is it difficult to do? I'm not really interested in creating tons of spam sites, as I know it's against the AdSense terms and conditions, but I'm extremely curious as to how these programs and sites work. I have searched Google for the last hour or two and can't find any information on how they are actually set up. Could anyone help me out?

    I'm not really interested in ethics here, people, so please don't post saying it's not a wise thing to do, it's unethical, etc. I'm sure many of you have been curious about stuff that you know you shouldn't be :)

    Thanks for any help you guys can offer. It's appreciated!
     
    -Achilles-, Nov 30, 2006 IP
  2. Nick_Mayhem

    Nick_Mayhem Notable Member

    Messages:
    3,486
    Likes Received:
    338
    Best Answers:
    0
    Trophy Points:
    290
    #2
    Then why don't you create that site on localhost and test it out :)

    It can surely satisfy your curiosity.
     
    Nick_Mayhem, Nov 30, 2006 IP
  3. -Achilles-

    -Achilles- Banned

    Messages:
    166
    Likes Received:
    25
    Best Answers:
    0
    Trophy Points:
    0
    #3
    I never said I was going to make it a public site. I said I was just curious how they work. I have been searching the web and can't find anything on the specifics, like which tools are used, beyond the fact that scraper tools are involved. I know that much, lol. But specifically, which tools are the best for doing it? Are there any you guys personally like over others?
     
    -Achilles-, Dec 1, 2006 IP
  4. PayItForward

    PayItForward Peon

    Messages:
    752
    Likes Received:
    43
    Best Answers:
    0
    Trophy Points:
    0
    #4
    The first page of this link should answer most, if not all, of your questions.
     
    PayItForward, Dec 1, 2006 IP
  5. Nick_Mayhem

    #5
    Whenever you scrape a site, make sure you pass the content through a Markov chain algorithm.
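
    For anyone who hasn't seen one, here is a minimal sketch of a word-level Markov chain text generator, which is presumably the kind of thing meant here. The function names `build_chain` and `generate`, the `order` and `seed` parameters, and the sample sentence are all made up for illustration; this is not any particular tool's implementation.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` words to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return chain


def generate(chain, length=20, seed=None):
    """Walk the chain from a random starting state, emitting up to `length` words."""
    if not chain:
        return ""
    rng = random.Random(seed)  # a fixed seed makes the walk repeatable
    state = rng.choice(list(chain.keys()))
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)  # .get avoids adding empty entries
        if not followers:
            break  # dead end: this state only occurs at the end of the text
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output[:length])


if __name__ == "__main__":
    sample = "the cat sat on the mat and the dog sat on the rug"
    print(generate(build_chain(sample), length=10, seed=1))
```

    With `order=1` every adjacent word pair in the output also occurs somewhere in the input, so the result reads locally like the source but is usually nonsense as a whole. A higher `order` stays closer to the original wording.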
     
    Nick_Mayhem, Dec 1, 2006 IP
  6. -Achilles-

    #6
    Nope, actually it didn't mention much I didn't already know. Let me try to rephrase. As I've stated, I've been searching like crazy on the subject of scraper sites, and I understand how to create RSS feeds from a site by taking its content. But here's an example. If anyone listens to shoemoney's radio shows, he had a guest on (I can't quite remember who) who owns this site: http://www.ucrave.com. He said it's basically a scraper site. They pull info from another source and put it on this blog. So my question is, how did he make an RSS feed look like that? He can't be using a screen-scrape program that creates an RSS feed from another website... could he? I've never seen an RSS feed look like that, nor have I ever seen a program that lets you make an RSS feed look like that.
     
    -Achilles-, Dec 1, 2006 IP
  7. Nick_Mayhem

    #7
    Get an RSS-to-HTML parser. It will clear up your doubts.
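
    To give a rough idea of what such a parser does, here is a small sketch using only the Python standard library. It assumes a plain RSS 2.0 feed with `title`, `link` and `description` in each `item`; the `rss_to_html` name, the `post` CSS class and the sample feed are invented for the example, and this is not how ucrave or any specific script is known to work.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical feed used only for the demo below.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
      <description>Hello &amp; welcome</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
      <description>More text</description>
    </item>
  </channel>
</rss>"""


def rss_to_html(rss_xml):
    """Turn each <item> of an RSS 2.0 document into a small HTML block."""
    root = ET.fromstring(rss_xml)
    parts = []
    for item in root.iter("item"):
        title = item.findtext("title", default="(untitled)")
        link = item.findtext("link", default="#")
        summary = item.findtext("description", default="")
        # Escape everything so feed text cannot inject markup into the page.
        parts.append(
            '<div class="post">\n'
            f'  <h2><a href="{html.escape(link, quote=True)}">'
            f'{html.escape(title)}</a></h2>\n'
            f'  <p>{html.escape(summary)}</p>\n'
            '</div>'
        )
    return "\n".join(parts)


if __name__ == "__main__":
    print(rss_to_html(SAMPLE_FEED))
```

    Once the feed items are plain HTML blocks like this, the page around them can be styled however you like, which is why a feed-driven site does not have to look like a feed.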
     
    Nick_Mayhem, Dec 1, 2006 IP
  8. PayItForward

    #8
    It would be quite expensive to get a clone of the ucrave script created. It would take quite a bit of PHP knowledge if you were to try it yourself.
     
    PayItForward, Dec 1, 2006 IP
  9. dragnet

    dragnet Peon

    Messages:
    1
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #9
    Some content wants to be scraped and some doesn't. For example, drop shippers would love for you to pull their content onto your site. Look into the tools on this cool site.

    http://www.simplifiedsec.com/index.html

    This company has threads on DP.
     
    dragnet, Dec 1, 2006 IP