How can I create my own Search Engine?
misohoni
Sep 18th 2004, 3:35 am
I bet you're thinking, "Ah, it's been done"... but I want to create something for my country only and base it on META tags.
I've seen sites like Hoppa and searchpole and want to create something like that.
Could anyone suggest any software/advice on what I need to do?
Thanks guys
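Since the plan is to base the engine on META tags, here is a rough sketch of pulling the title, description and keywords out of a page using only Python's standard-library `html.parser`. The function name `extract_meta` and the choice to treat keywords as a comma-separated, lowercased list are assumptions for illustration. They are not taken from Hoppa, searchpole, or any script mentioned in this thread.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects <title> text and name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            # Attribute values can be None (e.g. <meta name>), so guard them.
            name = (attrs.get("name") or "").lower()
            content = attrs.get("content")
            if name and content is not None:
                self.meta[name] = content.strip()
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_meta(html):
    """Return title, description and keyword list for one HTML page."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    raw_keywords = parser.meta.get("keywords", "").split(",")
    keywords = [k.strip().lower() for k in raw_keywords if k.strip()]
    return {
        "title": parser.title.strip(),
        "description": parser.meta.get("description", ""),
        "keywords": keywords,
    }
```

Self-closing tags such as `<meta ... />` work too, because `HTMLParser` routes them through `handle_starttag` by default.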
Old Welsh Guy
Sep 18th 2004, 4:39 am
There are good scripts you can use out there, but when you say search engine, do you mean a full-blown search engine with its own spider that runs around adding sites, or a site with spidered sites included, but manually approved, that has a search engine algorithm?
I would recommend the Fluid Dynamics Search Engine. It is a good one, written in Perl, so it is limited in the volume of pages it can handle before getting slow. It has its own spider, and you can add weight in the algorithm to things like newness of page, words in META tags, copy repetition, etc.
http://www.xav.com/scripts/search/
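The kind of weighting described above (words in META tags, newness of page, copy repetition) can be sketched as a simple linear score. This is not the Fluid Dynamics Search Engine's actual algorithm or configuration; the `Page` fields, the weights, the repetition cap of 3 and the 30-day freshness half-life are all arbitrary assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    meta_words: list  # words from the META keywords/description
    body_words: list  # words from the visible page copy
    age_days: float   # days since the page was last modified


# Hypothetical weights; a real engine would expose these as admin settings.
WEIGHTS = {"meta": 3.0, "body": 1.0, "fresh": 2.0}
MAX_REPEATS = 3          # count a term at most this many times in the body
FRESH_HALF_LIFE = 30.0   # days until the freshness bonus halves


def score(page, query_terms, weights=WEIGHTS):
    terms = [t.lower() for t in query_terms]
    meta = [w.lower() for w in page.meta_words]
    body = [w.lower() for w in page.body_words]
    meta_hits = sum(1 for t in terms if t in meta)
    # Cap repetition so stuffing a word 100 times doesn't win outright.
    body_hits = sum(min(body.count(t), MAX_REPEATS) for t in terms)
    # Exponential decay: 1.0 for a brand-new page, 0.5 after one half-life.
    freshness = 0.5 ** (page.age_days / FRESH_HALF_LIFE)
    return (weights["meta"] * meta_hits
            + weights["body"] * body_hits
            + weights["fresh"] * freshness)


def rank(pages, query_terms):
    """Highest-scoring pages first."""
    return sorted(pages, key=lambda p: score(p, query_terms), reverse=True)
```

Capping repetition rather than ignoring it keeps some signal from a page that genuinely uses a word often, while limiting how much keyword stuffing can buy.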
misohoni
Sep 18th 2004, 5:31 am
Thanks, it's pretty good. Yep, I'm looking to manually approve sites and charge people if they want their submission reviewed quickly.
nriweb
Sep 22nd 2004, 11:20 pm
Curious... which country are you building the search engine for? I would suggest you download the dmoz.org directory and use the category info to filter out sites not related to your country.
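A minimal sketch of the category filter suggested above. It assumes the DMOZ dump has already been parsed into `(url, category)` pairs; that parsing step and the exact element names in the RDF files are not shown here. The function name `filter_by_region` and the example prefix `Top/Regional/Europe/United_Kingdom` are illustrative assumptions.

```python
def filter_by_region(entries, region_prefix):
    """Keep URLs whose DMOZ-style category sits under region_prefix.

    entries: iterable of (url, category) tuples (parsing the dump is assumed).
    region_prefix: e.g. "Top/Regional/Europe/United_Kingdom" (example only).
    """
    prefix = region_prefix.rstrip("/")
    seen = set()
    kept = []
    for url, category in entries:
        # Match the category itself or any subcategory, but not a sibling
        # that merely shares the same leading characters.
        in_region = category == prefix or category.startswith(prefix + "/")
        if in_region and url not in seen:
            seen.add(url)
            kept.append(url)
    return kept
```

Checking for `prefix + "/"` rather than a bare `startswith(prefix)` avoids accidentally matching a differently named sibling category.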
misohoni
Sep 24th 2004, 9:28 am
Hi, how can I download the DMOZ directory?
Do you mean I should copy the structure?
Sholva
Sep 24th 2004, 9:56 am
http://rdf.dmoz.org/
It's big... so I hope you have a fast connection. I'd recommend downloading it directly to your web server rather than to your desktop first ;) (if you can do that)
misohoni
Sep 29th 2004, 8:44 pm
Thanks for this, I'll give it a go! Any other tips on creating a search engine, promoting it, etc.? I've developed the rough code for it, so at least it's a start.
disgust
Sep 29th 2004, 9:15 pm
Sure, I'd try this:
Start with DMOZ's data and crawl each site there. Only keep the "main" domain pages, i.e. store www.google.com but no internal pages. Have your bot follow everything in the DMOZ mirror you set up. After that, have it continue to crawl. Maybe set up some sort of cap on how much space you want it to take before it gives up. Once you have that done, I'm sure you'd have enough to play around with.
if you allocated 2 gigs, and each page was 50KB, you could get something like 40,000 sites' main page indexed.
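The seed pass of that plan (main pages only, stop at a storage cap) might look roughly like the sketch below. The function names and the injectable `fetch` parameter are hypothetical; `fetch` is swappable so the logic can be tried without touching the network. It does not follow links beyond the seed list, and it does not check robots.txt, which a polite crawler should do (the standard library's `urllib.robotparser` can help).

```python
from urllib.parse import urlsplit
from urllib.request import urlopen


def homepage(url):
    """Reduce any URL to its site's main page, e.g. http://www.google.com/."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc.lower()}/"


def fetch_url(url, timeout=10):
    """Download one page as raw bytes (default fetcher)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def crawl_homepages(seed_urls, budget_bytes, fetch=fetch_url):
    """Fetch each distinct homepage once until the storage budget is used up."""
    store = {}
    used = 0
    for url in seed_urls:
        home = homepage(url)
        if home in store:
            continue  # already have this site's main page
        try:
            body = fetch(home)
        except OSError:
            continue  # skip sites that time out or refuse the connection
        if used + len(body) > budget_bytes:
            break  # cap reached: give up rather than overrun the allocation
        store[home] = body
        used += len(body)
    return store
```

With `budget_bytes=2 * 10**9` and homepages of about 50 KB, the loop would stop after roughly 2,000,000,000 / 50,000 = 40,000 pages, which matches the estimate above.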
donteatchicken
Apr 14th 2006, 10:07 am
Can you legally run a full blown spider on your server?
What scripts are out there to do this?
Thanks in advance.
eklim8
Apr 15th 2006, 10:03 pm
If my memory serves me right, I found a search engine script coded in ASP.NET/C++. It's great nevertheless. The programmer took one day to code it. He's an expert, by the way. :)
Mystique
Apr 15th 2006, 10:32 pm
This is the best search engine software I've ever tried:
www.webwizguide.com/asp/sample_scripts/site_search_script.asp
Users submit a site, you approve them manually, and surfers can rate the sites as well.
Only con: you need a Windows-based server because it is coded in ASP and uses an MS Access database.
misohoni
Apr 15th 2006, 10:41 pm
Thanks. I just wonder what the quality will be after a day's work... I can actually answer my own question now: I made my own search engine...
finaldestination
Apr 16th 2006, 2:31 pm
misohoni, please explain what you did.
Also, could you share the code with us?
frankcow
Apr 19th 2006, 11:04 am
I created and posted the ASP code to create a spider here: http://www.justin-cook.com/wp/2006/04/14/how-to-write-a-spiderbot-with-asp/
Psychotomus
Apr 20th 2006, 4:36 pm
if you allocated 2 gigs, and each page was 50KB, you could get something like 40,000 sites' main page indexed.
About 27,000 HTML pages of 40 KB each come to about 1 GB. It also took about 10 hours to download them all at about 200 kb/sec.
You need a lot of computers and bandwidth to build a good spider. I'd say 10 computers would be sufficient.
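The post gives "200 kb/sec" without saying whether that means kilobits or kilobytes per second. A back-of-the-envelope check, assuming decimal units (1 KB = 1,000 bytes) and a steady transfer rate, shows that the kilobit reading (about 12 hours) is much closer to the reported ~10 hours than the kilobyte reading (about 1.5 hours). Per-request overhead could also explain part of the gap, so treat this as an estimate only. The helper names are hypothetical.

```python
def storage_bytes(pages, page_kb):
    """Total size of `pages` pages of `page_kb` kilobytes each (1 KB = 1000 B)."""
    return pages * page_kb * 1000


def transfer_hours(total_bytes, rate_bytes_per_sec):
    """Hours needed to move total_bytes at a constant rate."""
    return total_bytes / rate_bytes_per_sec / 3600


total = storage_bytes(27_000, 40)                 # 1,080,000,000 bytes, about 1 GB
as_kilobytes = transfer_hours(total, 200_000)     # 200 kilobytes/s
as_kilobits = transfer_hours(total, 200_000 / 8)  # 200 kilobits/s = 25 KB/s
print(f"{total:,} bytes; {as_kilobytes:.1f} h at 200 KB/s; {as_kilobits:.1f} h at 200 kbit/s")
```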
Nick_Mayhem
Apr 22nd 2006, 10:00 am
There are many options, like:
mnoGoSearch, Datapark, ASPSeek, Sphider, phpDig, SMEMETA, Curryguide script, Jomo PPC Search.
I have used all of the above and have experience with them. If you are thinking of anything commercial, let me know. I can help you out.
paulmcdonald
May 7th 2006, 7:10 pm
Hi everyone. It looks like there are some pretty switched-on cookies here.
I'm looking to develop a search engine based on the post below:
users submit a site, you approve them manually and surfers can rate the sites as well.
Required to run on a Windows-based server using MySQL,
and to have additional modules added based on functionality Alexa has; for example, a user can request a reindex of their individual website.
Would anyone be interested in building such a search engine?
The finished product would be for everyone who was part of its development to use.
It must also be able to index selected domain extensions.
Anyone with information on prebuilt search engines, or anyone interested in helping me build this, is more than welcome to contact me.
:)
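The feature list above (submit, manual approval, visitor ratings, reindex requests, selected domain extensions) could be modelled roughly as below. This is an in-memory sketch only: the class and method names and the `.uk`-only extension list are hypothetical, and a real version of what the post asks for would keep this data in MySQL rather than in Python dictionaries.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit

ALLOWED_EXTENSIONS = (".uk",)  # hypothetical: only accept these domain extensions


@dataclass
class Site:
    url: str
    approved: bool = False
    ratings: list = field(default_factory=list)
    reindex_requested: bool = False

    def average_rating(self):
        return sum(self.ratings) / len(self.ratings) if self.ratings else None


class Directory:
    def __init__(self, allowed=ALLOWED_EXTENSIONS):
        self.allowed = allowed
        self.sites = {}  # url -> Site

    def submit(self, url):
        """Queue a site for manual review if its domain extension is allowed."""
        host = urlsplit(url).hostname or ""
        if not host.endswith(self.allowed):
            raise ValueError(f"domain extension not accepted: {host}")
        return self.sites.setdefault(url, Site(url))

    def pending(self):
        """Sites still waiting for an admin to approve them."""
        return [s for s in self.sites.values() if not s.approved]

    def approve(self, url):
        self.sites[url].approved = True

    def rate(self, url, stars):
        site = self.sites[url]
        if not site.approved:
            raise ValueError("only approved sites can be rated")
        if not 1 <= stars <= 5:
            raise ValueError("rating must be between 1 and 5")
        site.ratings.append(stars)

    def request_reindex(self, url):
        """Let a site owner ask for their site to be crawled again."""
        self.sites[url].reindex_requested = True

    def reindex_queue(self):
        return [s.url for s in self.sites.values()
                if s.approved and s.reindex_requested]
```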
Phynder
May 7th 2006, 8:02 pm
required to run on a Windows based server
I was interested until I saw this part of the post. Golly, I would be willing to set up a nice little Linux box if there were enough people interested.
But the reality is... what will YASE (yet another search engine) really do? What is the business model? Or is this a "fun" project? If it is a fun project, then I am all for it.
Nick_Mayhem
May 7th 2006, 8:48 pm
If you want users to submit sites, then it is not called a search engine. It is called a directory, where users browse and rate the listings. If you want it to run on Windows and want to go with ASP, then I can suggest UUDir, which is also developed by a DPer here. Or if you want to go with PHP, then you can get esyndicat, which is a free one.
It has Alexa and PageRank-type features built in.
But if you want registered members to rate and review, then there will be some custom coding involved. Approving and the other features can be done easily.
Are you interested commercially? If so, you can PM me; I can develop this kind of thing.
Phynder
May 8th 2006, 8:32 am
I think everyone at DP already has their own directory. I am not sure that just because people submit websites, it becomes a directory. Help me understand that distinction.
MrSupplier
May 8th 2006, 8:35 am
Search sf.net for open-source spider projects.
ajarchibald3
May 20th 2006, 7:21 pm
I'm currently using Nutch to develop a vertical search engine. Is anyone else out there using Nutch?
masterc19
Jul 2nd 2009, 2:06 am
Yes, I have a domain name, www.ineedtoknowitall.com, the perfect name for a search engine,
and I would love to create a full-blown search engine for it like google.com, but I do not know how to build one. Could anyone tell me how to do this? Thanks :confused:
Vilims
Jul 2nd 2009, 4:45 am
Thanks for the valuable information on search engine programming; I will try it too.
Thanks again
daringtakers
Jul 2nd 2009, 6:17 am
Creating a full-blown search engine isn't an easy task. It involves lots and lots of complexity: creating a spider, parsers, indexing/searching algorithms, and lots of other stuff.
Forget the search engine... developing an efficient spider alone is a big project.
I'd suggest not trying it, especially if you are doing it alone. Instead, use Google Custom Search and save yourself the trouble.
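To give a sense of just one of those pieces, here is a toy inverted index (each term maps to the set of documents containing it) with boolean AND search. It is only a minimal sketch: there is no ranking, stemming, compression or on-disk storage, which is where much of the real complexity lives. The class name and tokenizer are illustrative assumptions.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return TOKEN.findall(text.lower())


class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids
        self.docs = {}                    # document id -> original text

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every query term (boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return set()
        # .get() avoids inserting empty entries into the defaultdict.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

Intersecting posting sets is why lookups stay fast as the collection grows: a query only touches the documents listed under its terms, never the whole collection.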
vBulletin® v3.8.4, Copyright ©2000-2009, Jelsoft Enterprises Ltd.