How do you create a search engine? How do they work? Which programming languages do you have to use? I suppose you have to use some kind of database like MySQL besides the programming language(s). What is a web crawler? What's the difference between a web crawler and a search engine? Thanks
Lots and lots of PHP, I'd assume, and an algorithm for searches. A web crawler looks at your sites and indexes them (I believe).
Depending on your needs, you might use one of the "Search Your Site" features from Google or Yahoo. What do you want this for? There is no requirement on which language to use, or which database (e.g. MySQL), or anything else. You could use anything. I think Google uses mostly Python and MySQL, but I'm certain Microsoft does not use either of these technologies to power its own search (http://search.msn.com). A web crawler is the piece of software that goes out and finds data on the internet and feeds everything it finds into a database. The search engine is commonly thought of as the interface that queries the database and returns the results, although technically the web crawler could be considered one element of the whole search engine system (with the interface called just the interface). I'm no expert in this, but if you want to find an existing free package for implementing a search engine, I would just search for "search engine software" in Google and look at the results (http://www.google.com/search?q=search+engine+software). I've heard of ht://Dig; that might be the way to go. I also recognize a lot of the other things that come up in both the organic and paid search results. In any case, tell us more about what you want to do; that might help us make a recommendation. Do you want to set up a little competitor to Ask, Yahoo Search, etc. and offer up the whole web? Or do you just want to crawl and set up a search engine for an intranet, a small site, or a set of sites? Best, Eric
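To make the crawler/interface split described above concrete, here is a minimal Python sketch, not how any real engine does it. The `crawl` function plays the crawler and fills a plain dict standing in for the database; `search` plays the interface. The `FAKE_WEB` pages, the `example.test` URLs, and all function names are made up for illustration so the sketch runs offline; a real crawler would fetch over HTTP and respect robots.txt.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects href targets and visible text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; returns {url: page_text}, our stand-in database."""
    store = {}
    seen = {start_url}
    queue = deque([start_url])
    while queue and len(store) < max_pages:
        url = queue.popleft()
        html = fetch(url)  # fetch is injected; it returns None for unknown URLs
        if html is None:
            continue
        parser = LinkParser()
        parser.feed(html)
        parser.close()  # flush any text still buffered by the parser
        store[url] = " ".join(parser.text)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return store


def search(store, term):
    """The 'interface': URLs whose stored text contains the term."""
    term = term.lower()
    return sorted(url for url, text in store.items() if term in text.lower())


# A fake three-page web (hypothetical URLs) so the sketch needs no network.
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a> home page',
    "http://example.test/a": '<p>notes about crawlers</p> <a href="/">home</a>',
    "http://example.test/b": "<p>notes about databases</p>",
}

store = crawl("http://example.test/", FAKE_WEB.get)
print(search(store, "crawlers"))  # prints ['http://example.test/a']
```

Scanning every stored page on each query only works for a handful of pages; a larger engine would build an index at crawl time instead.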
Thanks for the answers! LinkBliss: I am just interested in how a search engine works: how to crawl the web, what information you should store, whether it is some combination of different kinds of information that is interesting, etc. Maybe I will learn something useful! In any case, different technologies are always interesting to learn and read about.
Hi. Well, it took three years to create my New Zealand based search engine. In terms of code, we used mostly PHP with a few files written in C++. We do use a large MySQL database (presently around 60 GB in size). The web spider, or crawler, is the part of the search engine that visits the websites and indexes them into the database, which is then sorted by the other software that forms the search engine.
Are there any books or tutorials that teach how to code a search engine? I am interested in creating my own search engine from scratch because I believe I can implement better algorithms than the existing ones, but I need something to start from.
I would start with the different parts. First make the software, then the spider, etc. There is no guideline since this is a major undertaking. That being said, if you need a guideline, you probably aren't ready to do such a project.
Well, this may not be a good idea unless you have lots of money to invest. The programming is not a big deal, but as the search engine grows, you will need a huge amount of hardware.
Why do you want to create a search engine? The market is tough! You will be successful only if you spend A LOT of money on promotion and have something unique to offer.
I've tried a few. They didn't do much of what I was after, so I wrote a mini search engine for specific website types. I can tell you they require lots of memory, disk space, etc. to make the best possible search engine. Most won't be written in PHP because PHP is a lot more server-intensive than C.
I have to agree with SEOSPIDER: creating a search engine that even comes close to the likes of Google, Yahoo and MSN would be expensive. Sticking to a search engine for a certain niche would be better.
Absolutely agreed. I wrote EveKnows myself using a Perl front-end and web crawler, MySQL database, and custom full-text indexer in C. By restricting the niche to adult web sites, I was able to keep the complexity in check (my crawler ignores anything that doesn't look like an adult gallery... makes spidering so much easier!). I can only imagine how difficult it would be to design and implement a general-purpose search engine that could compete with the likes of Google... I would like to point out that the hardware warnings people have been mentioning aren't really a hurdle when you get started. Just make sure you have a dedicated server with plenty of RAM and you should be fine. The important bit is to keep the full-text index living in memory; as long as you can do that, you should be alright. Once you have enough traffic to necessitate setting up a server farm, you'll also be raking in enough advertising revenue that the server costs will be negligible.
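For readers wondering what a full-text index living in memory might look like, here is a rough Python sketch of an inverted index (a map from each word to the set of documents containing it). This is a common technique chosen for illustration, not the C indexer described in the post; the class name, the tokenizer, and the sample documents are all hypothetical.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Toy in-memory full-text index: word -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    @staticmethod
    def tokenize(text):
        # Lowercase and split on anything that is not a letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id, text):
        for word in self.tokenize(text):
            self.postings[word].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every word in the query."""
        words = self.tokenize(query)
        if not words:
            return set()
        # Copy the first posting set so intersecting does not alter the index.
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result


index = InvertedIndex()
index.add(1, "Perl web crawler")
index.add(2, "MySQL database and crawler")
index.add(3, "Full-text indexer in C")

print(index.search("crawler"))        # prints {1, 2}
print(index.search("mysql crawler"))  # prints {2}
```

A lookup touches only the posting sets for the query words, which is why keeping this structure in RAM makes searches fast; the trade-off is that memory use grows with the vocabulary and the number of documents.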
If you would like to play around with a pretty good META search script, you may want to check out K Search (http://turn-k.net/k-search). I think the basic version is about $60. I have used it on one of my sites and it works pretty well (it is sometimes a little slow). Another solution that looks pretty interesting (I have not tried it myself yet), but will run a little more (around $250), is available at Chatologica search scripts (http://chatologica.com/site/index.php).
It would require loads of PHP, and a lot of MySQL. You would also have to have submitters with keywords that people might search for. It would basically be a keyword search. I'm not sure exactly how the PHP and MySQL would all play out, but I know for a fact it would cost a lot of money to get a custom one done.
It depends on how you set things up. If you design something that can scale well, then the full-text indexer will be the big part, and it will need to be written in C or another compiled language. If you don't intend the engine to grow too large (say, fewer than 100k entries), you can just enable MySQL's full-text search feature for the columns with text; that's just a couple of SQL commands, and it won't add any complexity to your interface. Your interface can be incredibly small. Mine is less than 500 lines of Perl, with another 1,000 lines for the web crawler, and the average search speed is less than a tenth of a second.
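As a rough idea of what "a couple of SQL commands" could look like, the Python sketch below holds one statement that adds a MySQL FULLTEXT index and one that queries it with MATCH ... AGAINST. The table name `pages`, the columns `url`, `title` and `body`, the index name `ft_pages`, and the `search_pages` helper are assumptions made up for this example, not the poster's schema. The `%s` placeholders assume a MySQL driver that uses that DB-API parameter style.

```python
# One-time setup: add a full-text index over the text columns.
# Table, column, and index names are hypothetical.
CREATE_INDEX_SQL = "ALTER TABLE pages ADD FULLTEXT INDEX ft_pages (title, body)"

# Query: rank rows by MySQL's relevance score for the search phrase.
SEARCH_SQL = (
    "SELECT url, title, "
    "MATCH(title, body) AGAINST (%s IN NATURAL LANGUAGE MODE) AS score "
    "FROM pages "
    "WHERE MATCH(title, body) AGAINST (%s IN NATURAL LANGUAGE MODE) "
    "ORDER BY score DESC LIMIT %s"
)


def search_pages(cursor, query, limit=10):
    """Run the full-text query on an open DB-API cursor and return all rows.

    The search phrase is passed as a parameter (twice, once per MATCH)
    rather than being pasted into the SQL string, which avoids SQL injection.
    """
    cursor.execute(SEARCH_SQL, (query, query, limit))
    return cursor.fetchall()
```

With a real connection, `CREATE_INDEX_SQL` would be executed once, after which `search_pages(cursor, "web crawler")` would return matching rows ordered by relevance. The columns listed in MATCH have to be the same ones covered by the FULLTEXT index.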