I bet you are thinking, ah it's been done...but I want to create something for my country only and base it on META tags. I've seen sites like Hoppa and searchpole and want to create something like that. Could anyone suggest any software/advice on what I need to do? Thanks guys
There are good scripts out there, but when you say search engine, do you mean a full-blown search engine with its own spider that runs around adding sites, or a site whose listings are spidered but manually approved, with a search algorithm on top? I would recommend the Fluid Dynamics Search Engine. It is good, but it is written in Perl, so it is limited in the volume of pages it can handle before it gets slow. It has its own spider, and you can add weight in the algorithm to things like newness of page, words in the META tags, copy repetition, etc. http://www.xav.com/scripts/search/
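To give a feel for what weighting newness, META words and copy repetition can mean, here is a minimal Python sketch. It is not how the Fluid Dynamics Search Engine actually scores pages; the `Page` class, the `WEIGHTS` values and the repetition cap are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical record for one indexed page."""
    url: str
    meta_text: str   # text taken from the META keywords/description tags
    body: str        # visible page copy
    age_days: int    # days since the page was last changed


# Hypothetical weights; tune these to taste.
WEIGHTS = {"meta": 3.0, "body": 1.0, "newness": 2.0}
MAX_BODY_HITS = 5    # cap so that repeating a word endlessly stops paying off


def score(page: Page, term: str) -> float:
    """Score one page for a single search term."""
    term = term.lower()
    meta_hits = page.meta_text.lower().split().count(term)
    body_hits = page.body.lower().split().count(term)
    if meta_hits == 0 and body_hits == 0:
        return 0.0   # the term does not appear at all
    # Newness falls from 1.0 (changed today) towards 0 as the page ages.
    newness = 1.0 / (1.0 + page.age_days / 30.0)
    return (WEIGHTS["meta"] * meta_hits
            + WEIGHTS["body"] * min(body_hits, MAX_BODY_HITS)
            + WEIGHTS["newness"] * newness)


fresh = Page("http://a.example/", "dublin hotels", "hotels in dublin dublin", 0)
older = Page("http://b.example/", "dublin hotels", "hotels in dublin dublin", 30)
ranked = sorted([older, fresh], key=lambda p: score(p, "dublin"), reverse=True)
```

With these made-up weights the fresher page ranks first, since everything else about the two pages is identical.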
Thanks, it's pretty good. Yep, I'm looking to manually approve sites and charge people if they want their site added quickly.
Curious: which country are you building the search engine for? I would suggest you download the dmoz.org directory and use the category info to filter out sites not related to your country.
http://rdf.dmoz.org/ It's big, so I hope you have a fast connection. I'd recommend downloading it directly to your web server rather than to your desktop first (if you can do that).
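The category filter could look roughly like the Python sketch below. It assumes you have already pulled (category path, URL) pairs out of the dump; that parsing step is not shown, and the function names and sample entries are made up for illustration.

```python
def is_for_country(category: str, country: str) -> bool:
    """True if a dmoz-style category path sits under Regional for the country.

    Category paths are assumed to look like
    "Top/Regional/Europe/Ireland/Business".
    """
    parts = category.split("/")
    return "Regional" in parts and country in parts


def filter_sites(entries, country):
    """Keep only the URLs whose category belongs to the given country."""
    return [url for category, url in entries if is_for_country(category, country)]


# Made-up sample entries standing in for data extracted from the dump.
entries = [
    ("Top/Regional/Europe/Ireland/Business", "http://a.example/"),
    ("Top/Computers/Internet", "http://b.example/"),
    ("Top/Regional/Europe/France", "http://c.example/"),
]
irish_sites = filter_sites(entries, "Ireland")
```

Matching on whole path segments rather than substrings avoids false hits such as a category that merely contains the country name inside a longer word.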
Thanks for this, I'll give it a go! Any other tips on creating a search engine, promoting it, etc.? I've developed the rough code for it, so at least it's a start.
Sure, I'd try this: start with dmoz's data and crawl each site listed there. Only keep the "main" domain pages, i.e. store www.google.com but no internal pages. Have your bot follow everything in the dmoz mirror you set up. After that, have it continue to crawl. Maybe set some sort of cap on how much space it may take before it gives up. Once you have that done, I'm sure you'd have enough to play around with. If you allocated 2 GB and each page was 50 KB, you could get something like 40,000 sites' main pages indexed.
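The "main page only" rule and the space cap can be sketched in a few lines of Python. This only plans which home pages to fetch; it does no downloading. The function names are made up, and the sizes are the round decimal figures from the post (2 GB budget, 50 KB per page).

```python
from urllib.parse import urlsplit


def main_page(url: str) -> str:
    """Reduce an absolute URL to its site's home page.

    Assumes the URL includes a scheme (http://...); bare host names
    are not handled in this sketch.
    """
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc.lower()}/"


def max_pages(budget_bytes: int, avg_page_bytes: int) -> int:
    """How many average-sized pages fit in the storage budget."""
    return budget_bytes // avg_page_bytes


def crawl_plan(urls, budget_bytes, avg_page_bytes):
    """Return the unique home pages to fetch, stopping at the space cap."""
    cap = max_pages(budget_bytes, avg_page_bytes)
    planned = {}  # a dict keeps insertion order and removes duplicates
    for url in urls:
        if len(planned) >= cap:
            break  # budget used up, give up here
        planned[main_page(url)] = True
    return list(planned)


plan = crawl_plan(
    ["http://www.google.com/intl/en/about.html",
     "http://www.google.com/",
     "http://example.org/some/page.html"],
    budget_bytes=2_000_000_000,   # 2 GB
    avg_page_bytes=50_000,        # 50 KB
)
```

With those numbers `max_pages` gives 40,000, which matches the estimate in the post.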
Can you legally run a full blown spider on your server? What scripts are out there to do this? Thanks in advance.
If my memory serves me right, I found a search engine script coded in ASP.NET/C++. It's great nevertheless. The programmer took one day to code it. He's an expert, btw.
This is the best search engine software I've ever tried: www.webwizguide.com/asp/sample_scripts/site_search_script.asp Users submit a site, you approve it manually, and surfers can rate the sites as well. Only con: you need a Windows-based server because it is coded in ASP with an MS Access database.
Thanks, I just wonder what the quality will be after a day's work... I can actually answer my own question now: I made my own search engine.
I created and posted the ASP code to create a spider here: http://www.justin-cook.com/wp/2006/04/14/how-to-write-a-spiderbot-with-asp/
About 27,000 HTML pages at 40 KB each is roughly 1 GB. It also took about 10 hours to download them all at about 200 kb/sec. You need a lot of computers and bandwidth to create a good spider. I'd say 10 computers would be sufficient.
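The figures above can be checked with a bit of arithmetic. The Python sketch below works the download time out both ways, since it is not clear whether "200 kb/sec" means kilobytes or kilobits per second; decimal units are assumed (1 GB = 1,000,000 KB) and the helper name is made up.

```python
PAGES = 27_000
PAGE_KB = 40

total_kb = PAGES * PAGE_KB          # 1,080,000 KB, i.e. about 1.08 GB


def hours_to_download(size_kb: float, rate_kb_per_sec: float) -> float:
    """Pure transfer time in hours, ignoring per-request overhead."""
    return size_kb / rate_kb_per_sec / 3600


# Reading the rate as 200 kilobytes per second:
hours_if_kilobytes = hours_to_download(total_kb, 200)

# Reading the rate as 200 kilobits per second (8 bits per byte):
hours_if_kilobits = hours_to_download(total_kb, 200 / 8)

print(f"{total_kb / 1_000_000:.2f} GB total")
print(f"{hours_if_kilobytes:.1f} h at 200 KB/s, {hours_if_kilobits:.1f} h at 200 kbit/s")
```

The kilobit reading (12 hours) is the one closer to the reported 10 hours. DNS lookups, connection setup and slow servers would add to the transfer time under either reading.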
There are many options, like: mnoGoSearch, Datapark, ASPSeek, Sphider, phpDig, SMEMETA, Curryguide script, Jomo PPC Search. I have used all of the above and have experience with them. If you are thinking of anything commercial, let me know. I can help you out.
Hi everyone. It looks like there are some pretty switched-on cookies here. I'm looking to develop a search engine based on this earlier post: "users submit a site, you approve them manually and surfers can rate the sites as well." It is required to run on a Windows-based server using MySQL and to have additional modules based on functionality Alexa has; for example, a user can request a reindex of their individual website. Would anyone be interested in building such a search engine? The final product would be available to use for everyone who was part of the development. It also must be able to index selected domain extensions. Any information on pre-built search engines is welcome, and anyone interested in helping me build this is more than welcome to contact me.
I was interested till I saw this part of the post. Golly, I would be willing to set up a nice little Linux box if there were enough people interested. But the reality is... what will YASE (yet another search engine) really do? What is the business model? Or is this a "fun" project? If it is a fun project, then I am all for it.
If you are thinking of having users submit sites, then it is not called a search engine. It is called a directory, where users surf and rate the listings. If you want it to run on Windows with ASP, I can suggest UUDir, which was developed by a DPer here. Or if you want to go with PHP, you can get esyndicat, which is free. It has Alexa and PageRank type features in it. But if you want registered members to rate and review, there will be some custom coding involved. As for approval and other features, those can be done easily. Are you interested commercially? If yes, you can PM me; I can develop this kind of thing.
I think everyone at DP already has their own directory. I am not sure that people submitting web sites is what makes it a directory. Help me understand that distinction.