How can I create my own Search Engine?

Discussion in 'Programming' started by misohoni, Sep 18, 2004.

  1. #1
    I bet you are thinking, ah it's been done...but I want to create something for my country only and base it on META tags.
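    For the META-tag idea, a minimal sketch of pulling the title, description, and keywords out of a page using only the Python standard library. The names `MetaTagParser` and `extract_meta` are made up for illustration, and real pages will need more defensive handling (encodings, missing tags, spammy keywords).

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects <meta name=... content=...> pairs and the <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            # Attribute values can be None when the attribute has no value.
            name = (attr.get("name") or "").lower()
            content = (attr.get("content") or "").strip()
            if name and content:
                self.meta[name] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_meta(html):
    """Return the fields a META-based index would store for one page."""
    parser = MetaTagParser()
    parser.feed(html)
    raw_keywords = parser.meta.get("keywords", "")
    keywords = [k.strip().lower() for k in raw_keywords.split(",") if k.strip()]
    return {
        "title": parser.title.strip(),
        "description": parser.meta.get("description", ""),
        "keywords": keywords,
    }
```

    Calling `extract_meta(html_text)` on a fetched page gives a small dictionary that could be stored per URL and searched later.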

    I've seen sites like Hoppa and searchpole and want to create something like that.

    Could anyone suggest any software/advice on what I need to do?

    Thanks guys
     
    misohoni, Sep 18, 2004
  2. Old Welsh Guy
    #2
    There are good scripts out there, but when you say search engine, do you mean a full-blown search engine with its own spider that runs around adding sites, or a site that includes spidered sites, manually approved, with a search algorithm on top?

    I would recommend the Fluid Dynamics Search Engine. It is a good script, but it is written in Perl, so it is limited in the volume of pages it can handle before getting slow. It has its own spider, and you can add weight in the algo to things like newness of page, words in metas, copy repetition, etc.

    http://www.xav.com/scripts/search/
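    To illustrate what "adding weight" to signals like newness of page, words in metas, and copy repetition can mean, here is a rough sketch. It is not how the Fluid Dynamics script works internally; the weights, the 30-day freshness scale, the repetition cap, and the page dictionary layout are all assumptions chosen for the example.

```python
from datetime import date

# Hypothetical weights; a real engine would tune these.
WEIGHTS = {"meta": 3.0, "body": 1.0, "fresh": 2.0}
MAX_BODY_HITS = 5  # cap repetition so keyword stuffing does not win outright


def score_page(query, page, today, weights=WEIGHTS):
    """Score one page for a query from three weighted signals."""
    terms = query.lower().split()
    meta_words = {w.lower() for w in page["meta_keywords"]}
    body_words = page["body"].lower().split()

    # Words in metas: one point per query term found in the keywords.
    meta_hits = sum(1 for t in terms if t in meta_words)
    # Copy repetition: how often the terms occur in the body, capped.
    body_hits = min(sum(body_words.count(t) for t in terms), MAX_BODY_HITS)
    # Newness: 1.0 for a page modified today, 0.5 after 30 days, and so on.
    age_days = max((today - page["modified"]).days, 0)
    freshness = 1.0 / (1.0 + age_days / 30.0)

    return (weights["meta"] * meta_hits
            + weights["body"] * body_hits
            + weights["fresh"] * freshness)


def rank(query, pages, today):
    """Return pages sorted from best to worst score."""
    return sorted(pages, key=lambda p: score_page(query, p, today), reverse=True)
```

    Each page here is a plain dictionary with `meta_keywords` (a list), `body` (text), and `modified` (a `date`); changing the numbers in `WEIGHTS` shifts which signal dominates the ordering.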
     
    Old Welsh Guy, Sep 18, 2004
  3. misohoni
    #3
    Thanks, it's pretty good. Yep, I'm looking to manually approve sites and charge people who want their submission handled quickly.
     
    misohoni, Sep 18, 2004
  4. nriweb
    #4
    Curious... which country are you building the search engine for? I would suggest downloading the dmoz.org directory and using the category info to filter out sites not related to your country.
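    A sketch of the category filter, assuming the directory dump has already been parsed into `(url, topic)` pairs where the topic is a slash-separated category path. The example paths and the function names are hypothetical; check the actual category names in the dump before relying on them.

```python
def in_category(topic, prefix):
    """True if topic is the category itself or anything nested under it."""
    return topic == prefix or topic.startswith(prefix + "/")


def filter_by_country(entries, prefixes):
    """Keep only (url, topic) pairs whose topic falls under a wanted prefix."""
    return [
        (url, topic)
        for url, topic in entries
        if any(in_category(topic, p) for p in prefixes)
    ]
```

    Matching on `prefix + "/"` rather than a bare `startswith(prefix)` avoids accidentally keeping a sibling category whose name merely begins with the same letters.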
     
    nriweb, Sep 22, 2004
  5. misohoni
    #5
    hi, how can I download the Dmoz directory?

    Do you mean I should copy the structure?
     
    misohoni, Sep 24, 2004
  6. Sholva
    #6
    http://rdf.dmoz.org/

    It's big... so I hope you have a fast connection. I'd recommend downloading it directly to your web server rather than to your desktop first ;) (if you can do that)
     
    Sholva, Sep 24, 2004
  7. misohoni
    #7
    Thanks for this, I'll give it a go! Any other tips on creating a search engine, promoting it, etc.? I've developed the rough code for it, so at least it's a start.
     
    misohoni, Sep 29, 2004
  8. disgust
    #8
    Sure, I'd try this:

    Start with dmoz's data and crawl each site listed there. Only keep the "main" domain pages, i.e. store www.google.com but no internal pages. Have your bot follow everything in the dmoz mirror you set up. After that, have it continue to crawl. Maybe set some sort of cap on how much space you want it to take before it gives up. Once you have that done, I'm sure you'd have enough to play around with.

    If you allocated 2 GB and each page was 50 KB, you could get the main pages of something like 40,000 sites indexed.
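    The plan above can be sketched roughly like this: reduce every seed URL to its site's front page, fetch each one once, and stop when a storage cap is reached. The `fetch` callable is an assumption (something you supply that returns the page bytes or `None`), and the sketch leaves out following newly discovered links, robots.txt checks, and rate limiting, all of which a real spider needs.

```python
from urllib.parse import urlsplit


def homepage(url):
    """Reduce any URL to the front page of its site."""
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc.lower()}/"


def estimate_capacity(budget_bytes, avg_page_bytes):
    """How many pages of the given average size fit in the budget."""
    return budget_bytes // avg_page_bytes


def crawl_homepages(seed_urls, fetch, max_bytes):
    """Store one front page per site until the byte cap would be exceeded.

    fetch(url) is supplied by the caller and returns bytes, or None on failure.
    """
    stored = {}
    seen = set()
    used = 0
    for url in seed_urls:
        home = homepage(url)
        if home in seen:
            continue  # already handled this site
        seen.add(home)
        body = fetch(home)
        if body is None:
            continue  # fetch failed, move on
        if used + len(body) > max_bytes:
            break  # cap reached, give up as suggested
        stored[home] = body
        used += len(body)
    return stored, used
```

    With decimal units, `estimate_capacity(2 * 10**9, 50 * 10**3)` gives 40000, which matches the figure in the post.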
     
    disgust, Sep 29, 2004
  9. donteatchicken
    #9
    Can you legally run a full-blown spider on your server?

    What scripts are out there to do this?

    Thanks in advance.
     
    donteatchicken, Apr 14, 2006
  10. eklim8
    #10
    If memory serves me right, I found a search engine script coded using asp.net/c++. It's great nevertheless. The programmer took one day to code it. He's an expert, btw. :)
     
    eklim8, Apr 15, 2006
  11. Mystique
    #11
    This is the best search engine software I've ever tried:

    www.webwizguide.com/asp/sample_scripts/site_search_script.asp

    Users submit a site, you approve it manually, and surfers can rate the sites as well.

    Only con: you need a Windows-based server, because it is coded in ASP and uses an MS Access database.
     
    Mystique, Apr 15, 2006
  12. misohoni
    #12
    Thanks, I just wonder what the quality will be after a day's work... I can actually answer my own question now: I made my own search engine...
     
    misohoni, Apr 15, 2006
  13. finaldestination
    #13
    misohoni, could you please explain what you did?

    Also, could you share the code with us?
     
    finaldestination, Apr 16, 2006
  14. frankcow
    #14
    frankcow, Apr 19, 2006
  15. Psychotomus
    #15
    About 27,000 HTML pages at 40 KB each come to roughly 1 GB. It also took about 10 hours to download them all at about 200kb/sec.

    You need a lot of computers and bandwidth to create a good spider. I'd say 10 computers would be sufficient.
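    A quick check of those numbers. The post does not say whether "kb" means kilobytes or kilobits, so this sketch computes the download time both ways; the function name is made up, and decimal units are assumed (1 GB = 1,000,000 KB).

```python
def download_hours(pages, page_kb, rate_kb_per_s):
    """Hours needed to fetch `pages` pages of `page_kb` kilobytes each."""
    total_kb = pages * page_kb
    return total_kb / rate_kb_per_s / 3600


# Total size: 27,000 pages at 40 KB each, in decimal gigabytes.
total_gb = 27_000 * 40 / 1_000_000

# Reading "200kb/sec" as 200 kilobytes per second.
hours_if_kilobytes = download_hours(27_000, 40, 200)

# Reading "200kb/sec" as 200 kilobits per second, i.e. 25 kilobytes per second.
hours_if_kilobits = download_hours(27_000, 40, 200 / 8)

print(f"{total_gb:.2f} GB, {hours_if_kilobytes:.1f} h or {hours_if_kilobits:.1f} h")
```

    The kilobit reading (about 12 hours) is much closer to the reported 10 hours than the kilobyte reading (about 1.5 hours), though the real figure also depends on server response times and how many pages are fetched in parallel.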
     
    Psychotomus, Apr 20, 2006
  16. Nick_Mayhem
    #16
    There are many options like:

    mnoGoSearch, Datapark, ASPSeek, Sphider, phpDig, SMEMETA, Curryguide script, Jomo PPC Search.

    I have used all of the above and have experience with them. If you are thinking of anything commercial, let me know. I can help you out.
     
    Nick_Mayhem, Apr 22, 2006
  17. paulmcdonald
    #17
    Hi everyone. It looks like there are some pretty switched-on cookies here.
    I'm looking to develop a search engine based on the post below:

    Users submit a site, you approve them manually, and surfers can rate the sites as well.
    It is required to run on a Windows-based server using MySQL,
    and to have additional modules based on functionality Alexa has; for example, a user can request a reindex of their individual website.

    Would anyone be interested in building such a search engine?
    The end result would be available to everyone who was part of the development.
    It must also be able to index selected domain extensions.

    Any information on pre-built search engines is welcome, and anyone interested in helping me build this is more than welcome to contact me.


    :)
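    A rough sketch of the submit, approve, and rate workflow described in this post, including the domain-extension restriction. It uses Python's built-in `sqlite3` purely as a stand-in for MySQL so the example is self-contained; the table layout, function names, and the `ALLOWED_SUFFIXES` values are all hypothetical.

```python
import sqlite3
from urllib.parse import urlsplit

ALLOWED_SUFFIXES = (".jp",)  # hypothetical: the domain extensions to accept


def open_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS sites (
                      id INTEGER PRIMARY KEY,
                      url TEXT UNIQUE NOT NULL,
                      status TEXT NOT NULL DEFAULT 'pending')""")
    db.execute("""CREATE TABLE IF NOT EXISTS ratings (
                      site_id INTEGER NOT NULL,
                      stars INTEGER NOT NULL)""")
    return db


def submit_site(db, url, allowed_suffixes=ALLOWED_SUFFIXES):
    """Queue a site for manual approval; returns its id, or None if rejected."""
    host = (urlsplit(url).hostname or "").lower()
    if not host.endswith(tuple(allowed_suffixes)):
        return None  # wrong domain extension
    try:
        cur = db.execute("INSERT INTO sites (url) VALUES (?)", (url,))
    except sqlite3.IntegrityError:
        return None  # already submitted
    db.commit()
    return cur.lastrowid


def approve_site(db, site_id):
    """The manual approval step: an editor marks the site as approved."""
    db.execute("UPDATE sites SET status = 'approved' WHERE id = ?", (site_id,))
    db.commit()


def rate_site(db, site_id, stars):
    """Record a 1 to 5 star rating for an approved site."""
    row = db.execute("SELECT status FROM sites WHERE id = ?", (site_id,)).fetchone()
    if row is None or row[0] != "approved" or not 1 <= stars <= 5:
        return False
    db.execute("INSERT INTO ratings (site_id, stars) VALUES (?, ?)", (site_id, stars))
    db.commit()
    return True


def listing(db):
    """Approved sites with their average rating, best rated first."""
    return db.execute("""SELECT s.url, AVG(r.stars)
                         FROM sites s LEFT JOIN ratings r ON r.site_id = s.id
                         WHERE s.status = 'approved'
                         GROUP BY s.id
                         ORDER BY AVG(r.stars) DESC, s.url""").fetchall()
```

    A "request a reindex" feature could be added along the same lines, for instance as another status value that the spider checks, but that is left out here.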
     
    paulmcdonald, May 7, 2006
  18. Phynder
    #18
    I was interested till I saw this part of the post. Golly, I would be willing to set up a nice little Linux box if there were enough people interested.

    But, the reality is... What will YASE (yet another search engine) really do? What is the business model? Or is this a "fun" project? If it is a fun project, then I am all for it.
     
    Phynder, May 7, 2006
  19. Nick_Mayhem
    #19
    If you are thinking of having users submit sites, then it is not called a search engine. It is called a directory, and in it users browse and rate the listings. If you want it to run on Windows and want to go with ASP, I can suggest UUDir, which was developed by a DPer here. Or if you want to go with PHP, you can get esyndicat, which is free.

    It has Alexa and PageRank type features in it.

    But if you want registered members to rate and review, there will be some custom coding involved. As for approving and the other features, those can be done easily.

    Are you interested commercially? If yes, you can PM me; I can develop this kind of thing.
     
    Nick_Mayhem, May 7, 2006
  20. Phynder
    #20
    I think everyone at DP already has their own directory. I am not sure that people submitting web sites is, by itself, what makes it a directory. Help me understand that distinction.
     
    Phynder, May 8, 2006