I just had a thought after stumbling across one of the many Wikipedia clones on the web. Isn't Google supposed to penalize duplicate content? If so, why are there so many sites like www.informationblast.com, www.answers.com and www.biocrawler.com? All found in less than a minute of searching. Is it because Wikipedia's content changes quickly and these sites are holding archives of it? Even so, wouldn't large sections appear to be plagiarised?
Yeah - when I was researching for a uni project, Answers came up a lot. It was interesting to see that it was duplicate content from Wikipedia, yet the Answers site was ranking higher in the results!
Doesn't Wikipedia have index restrictions? I remember Google was dying to index all of Wikipedia but was blocked until only recently. Those folks at answers.com must have had a hard time copying & pasting all those articles.
Wikipedia's robots.txt just sets Crawl-delay: 1 - Google is not blocked. Wikipedia's database dumps are published at http://download.wikimedia.org/ (Oh noes! I've probably just fostered the creation of another Wikipedia clone!)
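If you want to check this yourself, here's a rough Python sketch using the standard-library urllib.robotparser. The exact robots.txt URL and user-agent string are just illustrative, and the printed crawl delay depends on whatever the live robots.txt happens to say:

```python
import urllib.robotparser

# Read Wikipedia's robots.txt and see what it actually allows.
rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://en.wikipedia.org/robots.txt")
rp.read()

# Google is not blocked from the main article namespace.
print(rp.can_fetch("Googlebot", "https://en.wikipedia.org/wiki/Main_Page"))

# Prints the Crawl-delay value for this agent, or None if none is set.
print(rp.crawl_delay("Googlebot"))
```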
Yes, but even G's processing power is finite. Think of how many pages there are on the web, then think of the processing power it would take to compare one page against the rest of the web. Now multiply that by the number of pages on the web and you can see how expensive it gets. Duplicate content is primarily penalized when it is within the same site; it is rarely checked from site to site, and even then only for a handful of sites.
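To put some rough numbers on it: comparing every page against every other page is on the order of n*(n-1)/2 comparisons, which is hopeless at web scale, whereas spotting exact duplicates within a single site can be done in one pass by fingerprinting each page. Here's a toy Python sketch of that idea (the page texts are made up, and real engines use fuzzier fingerprints such as shingling rather than a plain MD5 of the text):

```python
import hashlib
from collections import defaultdict

# Toy within-site duplicate check: one fingerprint per page, single pass,
# instead of comparing every page against every other page.
pages = {
    "/about": "We sell widgets. Contact us for a quote.",
    "/about-us": "We sell widgets. Contact us for a quote.",   # duplicate
    "/products": "Our widget catalogue, updated weekly.",
}

groups = defaultdict(list)
for url, text in pages.items():
    # Normalise whitespace and case, then hash the page text.
    fingerprint = hashlib.md5(" ".join(text.lower().split()).encode()).hexdigest()
    groups[fingerprint].append(url)

for fingerprint, urls in groups.items():
    if len(urls) > 1:
        print("Duplicate content:", urls)
```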
Sorry, I jumped the gun here without proper knowledge. Would I be right in saying that Google offered to host Wikipedia on their own servers? I was also wrong about answers.com copying & pasting - see http://meta.wikimedia.org/wiki/Wikimedia_partners_and_hosts