I want to set up a niche wiki and seed it with data from Wikipedia. I know this is OK under the terms of the licence, but I'm looking for advice/help from anyone who knows how to do it. I understand there is a massive wiki dump with ALL the data, but I only want to use certain sections of it. Can anyone give me any advice, guidance, or a pointer in the right direction to scripts that will help me achieve this? Thanks in advance.
Thanks for that, but looking at it, it works on the whole dump, and there are already other scripts that do that. I know I can scrape the data, but that is not the done thing; I want to make use of it properly.
You can download the Wikipedia data dumps from Wikipedia: http://en.wikipedia.org/wiki/Wikipedia:Download

You then install MediaWiki on your own server and run the import script included with MediaWiki. The directions are on the Wikipedia site. The whole process took several days on my PIII 900 to import all the articles (there are over 1 million), but once it was loaded it seemed to work smoothly enough.

Since the dump is not organized by category, it'll be very difficult to do a niche version of it. If you only want certain pages, you'll most likely have to grab the page source manually and paste it into your own wiki pages.

You could just as well take the whole thing and then set up a blog on the front page that links to the articles you're interested in. Create categories for your blog entries to index the pages for your niche that way. Google loves the blog+wiki combo, and with a million-plus articles you'll do rather well in search engines if you can get Google to index more than just the articles you point to from the blog.
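If you want to script the import step rather than run it by hand, something along these lines should work. This is only a minimal sketch: the MediaWiki path and the dump filename are assumptions, and it just drives MediaWiki's own bundled maintenance scripts from Python.

```python
import subprocess

# Paths and filenames are assumptions -- adjust to your own install and dump.
MEDIAWIKI_DIR = "/var/www/mediawiki"
DUMP_FILE = "/tmp/enwiki-latest-pages-articles.xml.bz2"

# importDump.php ships with MediaWiki and, as far as I know, accepts a plain
# or compressed XML dump file as its argument.
subprocess.run(
    ["php", "maintenance/importDump.php", DUMP_FILE],
    cwd=MEDIAWIKI_DIR,
    check=True,
)

# importDump.php suggests rebuilding recent changes after a large import.
subprocess.run(
    ["php", "maintenance/rebuildrecentchanges.php"],
    cwd=MEDIAWIKI_DIR,
    check=True,
)
```

Expect the import of a full dump to run for a long time, as noted above.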
You can export all the pages you need via Special:Export: en.wikipedia.org/wiki/Special:Export

Then import them into your wiki. One problem is that you will have to grab the pictures manually and upload them to your own server. You can mass-export by going to a category page on Wikipedia, copying all the page titles listed in that category, and pasting them into the export form.
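If you'd rather not paste titles by hand, here's a rough sketch of scripting the export. The form field names ("pages", "curonly") are what the export form appears to use, so treat them as assumptions and check them against the live form; the titles are placeholders.

```python
import urllib.parse
import urllib.request

# Replace with the page titles you actually want (placeholders here).
titles = ["Example article one", "Example article two"]

# Special:Export accepts a POSTed, newline-separated list of page titles.
# Field names may differ between MediaWiki versions -- verify on your target wiki.
data = urllib.parse.urlencode({
    "pages": "\n".join(titles),
    "curonly": "1",   # current revision only
    "action": "submit",
}).encode("utf-8")

req = urllib.request.Request(
    "https://en.wikipedia.org/wiki/Special:Export",
    data=data,
    headers={"User-Agent": "niche-wiki-export-sketch/0.1"},
)

with urllib.request.urlopen(req) as resp, open("export.xml", "wb") as out:
    out.write(resp.read())
```

The resulting export.xml is in the same format as the dump, so it can be fed to Special:Import or importDump.php on your own wiki; images still have to be fetched separately, as noted above.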
Run that by me again? I visited that page, but it says it has been disabled.

How about this idea, then: does anyone know of a wiki scraper script I can use? I will download the dump, install it in MediaWiki on my OWN server, and then scrape the data from MY OWN copy of Wikipedia. I have emphasised "my own" because I have absolutely no intention of scraping live Wikipedia data, but I have a feeling this is the easiest way to do it. To that end I will mirror the wiki on a server of my own, so it is only my own bandwidth and resources that are being scraped.
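To make that concrete, here is roughly what I have in mind. Once the mirror is running I shouldn't even need a screen scraper, because MediaWiki ships with api.php and can hand back raw wikitext. This is only a sketch under my own assumptions: the API URL and the page title are placeholders for whatever my install ends up using.

```python
import json
import urllib.parse
import urllib.request

# Assumed location of api.php on my own mirror -- adjust to match the install.
API_URL = "http://localhost/wiki/api.php"

def fetch_wikitext(title):
    """Fetch the raw wikitext of one page from a MediaWiki api.php endpoint."""
    params = urllib.parse.urlencode({
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "titles": title,
        "format": "json",
    })
    with urllib.request.urlopen(f"{API_URL}?{params}") as resp:
        data = json.load(resp)
    # Page ids aren't known in advance, so take the first (and only) entry.
    page = next(iter(data["query"]["pages"].values()))
    # Older MediaWiki puts the text under "*"; newer releases want
    # rvslots=main and nest it under ["slots"]["main"]["*"].
    return page["revisions"][0]["*"]

# Placeholder title -- replace with a page that exists on the mirror.
print(fetch_wikitext("Example article")[:500])
```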
Why don't you just write a script that opens the file, parses the XML and then checks if the "text" element contains a reference to your niche's category?
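Something along these lines, as a rough sketch. It streams the dump with ElementTree so the whole file never has to sit in memory; the dump filename and the category string are placeholders you'd swap for your own.

```python
import bz2
import xml.etree.ElementTree as ET

DUMP = "enwiki-latest-pages-articles.xml.bz2"   # placeholder dump filename
CATEGORY = "[[Category:Your niche topic"        # placeholder category prefix

def local(tag):
    """Strip the XML namespace so tag names can be compared directly."""
    return tag.rsplit("}", 1)[-1]

matches = []
with bz2.open(DUMP, "rb") as fh:
    # iterparse lets us handle one <page> element at a time, then discard it.
    for event, elem in ET.iterparse(fh, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title = text = None
        for child in elem.iter():
            if local(child.tag) == "title":
                title = child.text or ""
            elif local(child.tag) == "text":
                text = child.text or ""
        if text and CATEGORY in text:
            matches.append(title)
        elem.clear()   # free memory used by the processed page

print(f"{len(matches)} pages reference the category")
```

From the matching titles you could then build a list to feed into Special:Export, or extend the script to write the matching page elements out to a smaller XML file for import.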