How do you create a search engine?

Discussion in 'All Other Search Engines' started by TriKri, Jul 9, 2007.

  1. #1
    How do you create a search engine? How do they work? Which programming languages do you have to use? I suppose you have to use some kind of database like MySQL besides the programming language(s).

    What is a web crawler? What's the difference between a web crawler and a search engine?

    Thanks
     
    TriKri, Jul 9, 2007
  2. Soccerplayur012

    #2
    Lots and lots of PHP, I'd assume, and an algorithm for searches.

    A web crawler looks at your sites and indexes them (I believe).
     
    Soccerplayur012, Jul 9, 2007
  3. LinkBliss

    #3
    Depending on your needs, you might use one of the "Search Your Site" features from Google or Yahoo.

    What do you want this for?

    There is no requirement on what language to use, or what database (e.g. MySQL), or anything else. You could use anything.

    I think Google uses mostly Python and MySQL, but I'm certain Microsoft does not use either of these technologies to power http://search.msn.com :)

    A web crawler is the piece of software that goes out and finds data on the internet and feeds all the data it finds to a database. The search engine is commonly thought of as the interface that queries the database and returns the results, although technically the web crawler could be considered one element of the whole search engine system (with the interface called just the interface).
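
    As a rough illustration of the crawler half of that description, here is a minimal Python sketch, not how any real search engine does it. The names `crawl`, `LinkExtractor`, and `FAKE_WEB`, and the example.test URLs, are all made up for the example, and a small in-memory dictionary stands in for the internet so it runs without a network connection.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl. `fetch(url)` returns HTML text, or None on failure.

    Returns a dict mapping each visited URL to its HTML: the "database"
    that the search interface would later query.
    """
    store = {}
    queue = deque([start_url])
    seen = {start_url}
    while queue and len(store) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # page could not be fetched; skip it
        store[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link.startswith("http") and link not in seen:
                seen.add(link)
                queue.append(link)
    return store


# Stand-in for the network: a tiny in-memory "web" (hypothetical URLs).
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/">home</a> about crawlers',
    "http://example.test/b": '<a href="/a">A</a> <a href="/missing">x</a>',
}

pages = crawl("http://example.test/", FAKE_WEB.get)
```

    To crawl real sites you would pass a `fetch` function that downloads the page (for example with `urllib.request.urlopen`), and you would also want to respect robots.txt and rate-limit your requests, which this sketch leaves out.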

    I'm no expert in this, but if you want to find an existing free package for implementing a search engine, I would just search for "search engine software" in Google and you can see the result --

    http://www.google.com/search?q=search+engine+software

    I've heard of ht://Dig - that might be the way to go. Also I recognize a lot of other things that come up both within the organic and paid search results.

    In any case, tell us more about what you want to do; that might help us make a recommendation. Do you want to set up a little competitor to Ask, Yahoo Search, etc. and offer up the whole web, or do you just want to crawl and set up a search engine for an intranet, a small site, or a set of sites?

    best
    Eric
     
    LinkBliss, Jul 9, 2007
  4. TriKri

    #4
    Thanks for the answers!

    LinkBliss: I am just interested in how a search engine works: how to crawl the web, what information you should store, whether it is some combination of different kinds of information that is interesting, etc. Maybe I will learn something useful! Either way, different technologies are always interesting to learn and read about.
     
    TriKri, Jul 10, 2007
  5. Linknz

    #5
    Hi, well, it took three years to create my New Zealand-based search engine. In terms of code, we used mostly PHP, with a few files written in C++.
    We do use a large MySQL database (presently around sixty gigabytes in size).
    The web spider, or crawler, is the part of the search engine that visits websites and indexes them into the database, which is then sorted by the other software that forms the search engine.
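
    To make the "index, then sort" idea concrete, here is a small hedged sketch in Python (not the PHP/C++ code described above, which is not shown in this thread). It builds an inverted index from already-crawled text and ranks matches by a simple word count. The function names, the `PAGES` data, and the scoring rule are all assumptions chosen for illustration; real engines use far more elaborate ranking.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to {url: number of times the word occurs on that page}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word, count in Counter(tokenize(text)).items():
            index[word][url] = count
    return index


def search(index, query):
    """Return URLs containing every query word, highest total count first."""
    words = tokenize(query)
    if not words:
        return []
    postings = [index.get(w, {}) for w in words]
    # Only keep pages that contain all of the query words.
    matches = set(postings[0]).intersection(*postings[1:])
    scores = {url: sum(p[url] for p in postings) for url in matches}
    # Sort by score (descending), then by URL so ties are deterministic.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical crawled pages: URL -> extracted text.
PAGES = {
    "nz.test/kiwi": "kiwi birds of new zealand kiwi",
    "nz.test/fruit": "kiwi fruit recipes",
    "nz.test/rugby": "new zealand rugby",
}

index = build_index(PAGES)
results = search(index, "kiwi")
```

    Here `results` lists the page that mentions "kiwi" twice ahead of the page that mentions it once. In a database-backed engine the `index` dictionary would be a table keyed by word rather than an in-memory structure.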
     
    Linknz, Jul 11, 2007
  6. androomidaa

    #6
    Thanks for the info. It answered my questions indeed.
     
    androomidaa, Jul 12, 2007
  7. designerz

    #7
    A bunch of ready-made scripts are available. You can try any one of them.
     
    designerz, Jul 19, 2007
  8. webguy84

    #8
    Can anyone recommend a good free or paid search engine script?
     
    webguy84, Jul 20, 2007
  9. ala101

    #9
    Are there any books or tutorials that teach how to code a search engine?
    I am interested in creating my own search engine from scratch because I believe I can implement better algorithms than the existing ones, but I need something to start from.
     
    ala101, Jul 20, 2007
  10. MattEvers

    #10
    I would start with the different parts. First make the software, then the spider, etc.

    There is no guideline since this is a major undertaking.

    That being said, if you need a guideline, you probably aren't ready to do such a project.
     
    MattEvers, Jul 20, 2007
  11. seospider

    #11
    Well, this may not be a good idea unless you have lots of money to invest. The programming is not a big deal, but as the search engine grows, you will need a huge amount of hardware.
     
    seospider, Jul 20, 2007
  12. slashpix

    #12
    Why do you want to create a search engine?

    The market is tough! You will be successful only if you spend A LOT of money on promotion and have something unique to offer.
     
    slashpix, Jul 20, 2007
  13. slashpix

    #13
    BTW, Sphider is a very good search engine script with its own crawler, and it's free.
     
    slashpix, Jul 20, 2007
  14. codeber

    #14
    I've tried a few.

    They didn't do much of what I was after, so I wrote a mini search engine for specific website types.
    I can tell you that making the best possible search engine requires lots of memory, disk space, etc.

    Most won't be written in PHP because it is a lot more server-intensive than C.
     
    codeber, Jul 20, 2007
  15. tezza42

    #15
    I have to agree with seospider: creating a search engine that even comes close to the likes of Google, Yahoo and MSN would be expensive.
    Sticking to a search engine for a certain niche would be better.
     
    tezza42, Jul 22, 2007
  16. aidantrent

    #16
    Absolutely agreed. I wrote EveKnows myself using a Perl front-end and web crawler, a MySQL database, and a custom full-text indexer in C. By restricting the niche to adult web sites, I was able to keep the complexity in check (my crawler ignores anything that doesn't look like an adult gallery... makes spidering so much easier!). I can only imagine how difficult it would be to design and implement a general-purpose search engine that could compete with the likes of Google...

    I would like to point out that the hardware warnings people have been mentioning aren't really a hurdle when you get started. Just make sure you have a dedicated server with plenty of RAM and you should be fine. The important bit is to keep the full-text index living in memory; as long as you can do that, you should be alright. Once you have enough traffic to necessitate setting up a server farm, you'll also be raking in enough advertising revenue that the server costs will be negligible.
     
    aidantrent, Jul 27, 2007
  17. Badlands07

    #17
    If you would like to play around with a pretty good META search script, you may want to check out K Search (http://turn-k.net/k-search). I think the basic version is about $60. I have used it on one of my sites and it works pretty well (though it is sometimes a little slow).

    Also, another solution that looks pretty interesting (have not tried it yet myself), but will run a little more (around $250) is available at Chatologica search scripts (http://chatologica.com/site/index.php).
     
    Badlands07, Jul 27, 2007
  18. orbit7693

    #18
    It would require loads of PHP and a lot of MySQL. You would also have to have submitters with keywords that people might search. It would basically be a keyword search. I'm not exactly sure how the PHP and MySQL would all play out, but I know for a fact it would be a lot of money to get a custom one done.
     
    orbit7693, Jul 27, 2007
  19. aidantrent

    #19
    It depends on how you set things up. If you design something that can scale well, then the full-text indexer will be the big part, and it will need to be written in C or another compiled language. If you don't intend the engine to grow too large (say, fewer than 100k entries), you can just enable MySQL's full-text search feature for the columns with text; that's just a couple of SQL commands, and it won't add any complexity to your interface.
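
    For reference, those SQL commands might look roughly like the following, held here as Python strings. This is only a sketch: the `pages` table, its columns, and the index name `ft_pages` are hypothetical, and `ENGINE=MyISAM` is an assumption based on older MySQL versions, where FULLTEXT indexes were only available on MyISAM tables.

```python
# SQL is held as plain strings; the table and column names are hypothetical.
CREATE_TABLE = """
CREATE TABLE pages (
    id    INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    url   VARCHAR(255) NOT NULL,
    title VARCHAR(255) NOT NULL,
    body  TEXT NOT NULL
) ENGINE=MyISAM
"""

# One statement turns on full-text search for the text columns.
ADD_FULLTEXT = "ALTER TABLE pages ADD FULLTEXT INDEX ft_pages (title, body)"

# MATCH ... AGAINST returns a relevance score; %s is the placeholder
# style used by the common MySQL drivers for Python.
SEARCH = """
SELECT url, title,
       MATCH(title, body) AGAINST (%s) AS score
FROM pages
WHERE MATCH(title, body) AGAINST (%s)
ORDER BY score DESC
LIMIT 10
"""


def search_args(query):
    """Build the parameter tuple for SEARCH (the query text is bound twice)."""
    return (query, query)
```

    With a DB-API style MySQL driver, the search would then be something like `cursor.execute(SEARCH, search_args("web crawler"))`, assuming `cursor` comes from an open connection.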

    Your interface can be incredibly small. Mine is less than 500 lines of Perl, with another 1,000 lines for the web crawler, and the average search speed is less than a tenth of a second :)
     
    aidantrent, Jul 27, 2007
  20. PowerExtreme

    #20
    I don't think they use MySQL.
     
    PowerExtreme, Jul 27, 2007