
Wikipedia Clones and Duplicate Content

Discussion in 'Content Management' started by lappy512, Jan 4, 2006.

  1. lappy512 (Peon)
    #1
    I just had a thought when I came across one of the many Wikipedia clones on the web.

    Isn't Google supposed to penalize duplicate content? If so, why are there so many websites like:
    www.informationblast.com
    www.answers.com
    www.biocrawler.com

    All from less than a minute of searching.

    Is it because Wikipedia's content changes quickly and these sites carry archived copies of it? Even so, wouldn't large sections appear to be plagiarised?
     
    lappy512, Jan 4, 2006 IP
  2. MattEvers (Notable Member)
    #2
    Is Answers.com a wiki scraper? Seems like a great domain to be a scraper site...
     
    MattEvers, Jan 4, 2006 IP
  3. mark1 (Peon)
    #3
    Yeah - when I was researching for a uni project, Answers.com came up a lot. It was interesting to see that its content was duplicated from Wikipedia, yet the Answers.com site was ranking higher in the results!
     
    mark1, Jan 5, 2006 IP
  4. cormac (Peon)
    #4
    Doesn't Wikipedia have index restrictions? I remember Google was desperate to index all of Wikipedia but was blocked until only recently.

    Those folks at answers.com must have had a difficult time copying & pasting all those articles :rolleyes:
     
    cormac, Jan 5, 2006 IP
  5. lappy512 (Peon)
    #5
    Wikipedia's robots.txt just sets a crawl delay (Crawl-delay: 1); Google is not blocked.

    http://download.wikimedia.org/
    Wikipedia's database is dumped here. (Oh noes! I've probably just fostered the creation of another Wikipedia clone!)
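    Anyone can verify the robots.txt part with a few lines of Python. A minimal sketch using the standard library's urllib.robotparser (the URL reflects Wikipedia's current layout, and the live file will of course differ from the 2006 one):

        # Check what Wikipedia's robots.txt actually says for Googlebot.
        from urllib import robotparser

        rp = robotparser.RobotFileParser()
        rp.set_url("https://en.wikipedia.org/robots.txt")
        rp.read()  # fetch and parse the live robots.txt

        page = "https://en.wikipedia.org/wiki/Main_Page"
        print("Googlebot allowed:", rp.can_fetch("Googlebot", page))
        print("Requested crawl delay:", rp.crawl_delay("Googlebot"))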
     
    lappy512, Jan 5, 2006 IP
  6. jwbond (Guest)
    #6
    Yes, but even Google's processing power is finite.

    Think of how many pages there are on the web. Now think of the processing power it would take to compare one page against the rest of the web. Now multiply that processing time by the number of pages on the web and you can see how much work it would be.

    Duplicate content is primarily penalized when it is within the same site. It is rarely checked from site to site, and even then only for a handful of sites.
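    To put rough numbers on it: with N pages, comparing every pair is about N(N-1)/2 comparisons, so the roughly 8 billion pages Google claimed to index at the time would mean on the order of 3 x 10^19 pairs. The cheaper approach is to hash each page once so identical copies collide in a lookup table. A minimal Python sketch of that idea, with a hypothetical pages dict standing in for a crawl (real engines use fuzzier fingerprints such as shingling, which this doesn't attempt):

        # Exact-duplicate detection in one pass via content hashing,
        # instead of the quadratic pairwise comparison described above.
        import hashlib

        def find_exact_duplicates(pages):
            seen = {}        # content hash -> first URL seen with that content
            duplicates = []  # (duplicate_url, original_url) pairs
            for url, text in pages.items():
                digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
                if digest in seen:
                    duplicates.append((url, seen[digest]))
                else:
                    seen[digest] = url
            return duplicates

        # Hypothetical crawl data: a clone carrying a byte-identical copy.
        pages = {
            "http://en.wikipedia.org/wiki/Example": "some article text",
            "http://some-clone.example.com/Example": "some article text",
        }
        print(find_exact_duplicates(pages))
        # One hash per page: O(N) work instead of O(N^2) comparisons.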
     
    jwbond, Jan 5, 2006 IP
  7. Seiya (Peon)
    #7
    A spider can do it easily, either directly or through the cache.
     
    Seiya, Jan 5, 2006 IP
  8. cormac (Peon)
    #8
    Sorry, I actually jumped the gun here without proper knowledge.

    Would I be right in saying that Google offered to host Wikipedia on their own servers?

    I was also wrong about answers.com copying & pasting :eek:

    http://meta.wikimedia.org/wiki/Wikimedia_partners_and_hosts
     
    cormac, Jan 5, 2006 IP