How would you build a search engine?

Discussion in 'Programming' started by Anveto, May 13, 2014.

  1. #1
    I recently wrote a short blog post on how to create a very simple search engine with PHP and MySQL.

    Of course, if you were actually building a search engine, you would put a lot more thought into the tools you use and, more importantly, into how you structure and store your data.

    What are your thoughts on this? What programming language(s) would you use? How would you index and rank websites? How would you rank results? How would you make money off of it?
     
    Anveto, May 13, 2014
  2. PoPSiCLe

    PoPSiCLe Illustrious Member

    #2
    I would simply find something else to do. Making a professional search engine in today's market would be more or less impossible. You'd need either a very specific niche to search in or a bottomless pit of cash to build it on. As for making money, it would probably be next to impossible to break even, let alone make a profit.
     
    PoPSiCLe, May 13, 2014
  3. Anveto

    Anveto Well-Known Member

    #3
    You bring up some good points. However, I would like to focus more on the programming aspects, although I admit that I did ask about profit.

    I figure you would basically have to profit from very targeted ads or from selling user information. I won't go into whether that is ethical or legal :)
     
    Anveto, May 13, 2014
  4. tylerman169

    tylerman169 Member

    #4
    I would recommend that you choose a niche for your search engine rather than trying to make the next "Google". Here are some examples of niche search engines you might choose. A recipe search engine, where you enter the ingredients you have and it returns a list of recipes you can make with them. Or a search engine for a specific file type, such as an MP3, SWF, or PDF search engine. However, many search engines like the examples I provided already exist, so if you want to make money or become popular, you will have to get creative and come up with a useful search engine that does not exist yet.
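    At its core, the recipe search idea above is a subset test: a recipe matches when every ingredient it needs is one the user already owns. A minimal Python sketch, assuming a small hypothetical in-memory recipe list (the recipe data and the `find_recipes` helper are illustrative only, not from any existing site):

```python
# Hypothetical recipe data: recipe name -> set of required ingredients.
RECIPES = {
    "pancakes": {"flour", "milk", "egg"},
    "omelette": {"egg", "butter"},
    "toast": {"bread", "butter"},
}


def find_recipes(pantry, recipes=RECIPES):
    """Return the names of recipes whose ingredients are all in the pantry."""
    # Normalize user input so "Egg " and "egg" count as the same ingredient.
    owned = {item.strip().lower() for item in pantry}
    # A recipe is makeable when its ingredient set is a subset of what we own.
    return sorted(name for name, needs in recipes.items() if needs <= owned)


print(find_recipes(["Egg", "butter", "bread"]))  # ['omelette', 'toast']
```

    A real service would probably also want near matches ("you are missing one ingredient"), which could be ranked by the size of `needs - owned`.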

    As for making a search engine, the easy way would be to create a Google Custom Search engine. However, if you want to build one from scratch, there are a few basic steps you need to complete:

    1) Choose a web spider (Nutch, Heritrix, GRUB). A spider is basically an internet bot: you give it a small list of seed sites, and it starts at those sites, follows each hyperlink on the page, and keeps crawling until it reaches the maximum depth you specified.

    2) Create an index of all the webpages your crawler visited.

    3) Create a user interface that lets the user enter a search term and searches the index you created for that term.
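    The crawling in step 1 is essentially a breadth-first traversal of the link graph that stops at a maximum depth. The sketch below is not how Nutch or Heritrix are implemented; it assumes a hypothetical `get_links(url)` function, stubbed here with an in-memory dictionary instead of real HTTP fetching and HTML parsing, so the depth-limited crawl logic can be shown on its own:

```python
from collections import deque

# Hypothetical link graph standing in for real pages; a real crawler would
# fetch each URL over HTTP and extract the <a href> links from the HTML.
FAKE_WEB = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["d.example"],
    "c.example": ["a.example", "d.example"],
    "d.example": ["e.example"],
    "e.example": [],
}


def get_links(url):
    """Return the outgoing links of a page (stubbed with FAKE_WEB)."""
    return FAKE_WEB.get(url, [])


def crawl(seeds, max_depth):
    """Breadth-first crawl from the seed URLs, up to max_depth link hops.

    Returns a dict mapping each visited URL to the depth it was found at.
    """
    depth_of = {url: 0 for url in seeds}
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        depth = depth_of[url]
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for link in get_links(url):
            if link not in depth_of:  # skip pages already discovered
                depth_of[link] = depth + 1
                queue.append(link)
    return depth_of


print(crawl(["a.example"], max_depth=2))
```

    With `max_depth=2`, the crawl reaches `d.example` but never `e.example`, which is three hops from the seed. The visited pages are what step 2 would then index.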

    However, the real challenge of a search engine is returning accurate results in a timely manner. This can be quite difficult, as popular search engines such as Google use very complex ranking algorithms.
     
    tylerman169, May 13, 2014
  5. ThePHPMaster

    ThePHPMaster Well-Known Member

    #5
    The language aspect is not what you should be looking at. If you want to create the next big thing, you need a better algorithm than Google/MSN/Yahoo. Once you have that and a proof of concept, you can easily rake in millions in investment and beat Google in the marketplace in no time.

    When Google started, what mattered wasn't the language they used or how much money they had, but how well it worked. That is what you should concentrate on. It will be pretty hard, since the biggest companies (Google/MSN/Yahoo) are constantly updating their algorithms to make them better, but it is definitely possible if you are smart enough.
     
    ThePHPMaster, May 17, 2014
  6. PoPSiCLe

    PoPSiCLe Illustrious Member

    #6
    The problem with creating a better or smarter algorithm isn't necessarily the idea; it's the implementation and the data sources the algorithm is supposed to use. You'll need a big sample to get decent results, and even if you use cloud services, the data mining and searching will need quite a bit of hardware to run properly.
    Not to mention that the current model of search is more or less owned by Google and its counterparts. What you should be thinking about is a whole new way of searching: something akin to Wolfram Alpha, perhaps, but with a more human approach (natural-language searches, smart data comparison, etc.).
     
    PoPSiCLe, May 18, 2014
  7. Anveto

    Anveto Well-Known Member

    #7
    Thanks for all the great replies.

    It would be interesting to look into concepts that differ from Google's. Image search engines like TinEye (http://www.tineye.com/) are one example, although I would not consider that approach better than Google's.

    When I wrote this, I was thinking more about what data you store (titles, meta descriptions, keywords, full source code, etc.) in order to rank search results better, but with all the data Google can store, that has probably already been done really well. Another question is how you store the data; the speed at which Google returns search results is really impressive, for example.
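    On the question of what to store and how to store it for fast lookups, the usual starting point is an inverted index: a map from each term to the pages that contain it, so a query only touches pages sharing its terms instead of scanning every page. A minimal Python sketch, assuming pages with only a title and a body, and a hypothetical weighting where a title match counts three times as much as a body match (a common heuristic, not Google's actual method):

```python
import re
from collections import defaultdict

TITLE_WEIGHT = 3  # assumption: title matches count more than body matches
BODY_WEIGHT = 1


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build term -> {url: weighted count} from {url: (title, body)}."""
    index = defaultdict(lambda: defaultdict(int))
    for url, (title, body) in pages.items():
        for term in tokenize(title):
            index[term][url] += TITLE_WEIGHT
        for term in tokenize(body):
            index[term][url] += BODY_WEIGHT
    return index


def search(index, query):
    """Score pages by summing the weights of every query term they contain."""
    scores = defaultdict(int)
    for term in tokenize(query):
        # .get avoids creating empty entries for unknown terms.
        for url, weight in index.get(term, {}).items():
            scores[url] += weight
    # Highest score first; ties broken alphabetically by URL.
    return sorted(scores, key=lambda url: (-scores[url], url))


pages = {
    "/php": ("PHP search tutorial", "Build a search engine with PHP and MySQL."),
    "/mysql": ("MySQL indexes", "Indexes make MySQL queries fast."),
}
index = build_index(pages)
print(search(index, "mysql search"))  # ['/php', '/mysql']
```

    In a PHP/MySQL setup, the same idea maps onto a table of (term, page, weight) rows with an index on the term column.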
     
    Anveto, May 18, 2014
  8. PoPSiCLe

    PoPSiCLe Illustrious Member

    #8
    Not to mention the ability to manage multiple data sources that are not necessarily directly linked. For instance, a natural-language search engine able to see patterns and create weighted results based on common questions would be great, and it does not really exist in today's market, mostly because it's still too complicated to achieve with current technology. As for niche searches like TinEye, they're good, but their sample size is too small, and Google's Image Search usually returns more and better results. That is often the problem: upstarts and alternatives to Google and the other big players are sometimes better, but suffer from a lack of resources.
     
    PoPSiCLe, May 18, 2014
  9. Alterview

    Alterview Member

    #9
    I would start by looking at how Google (or any other search engine) started out. Try to replicate it and improve on it in your own way. That makes starting easier than having to think of everything yourself, and you will bend everything to your own ideas anyway.

    And try to start small(ish). Choose one content type to start with (like webpages); images and other content types can be added later. Don't spend too much time choosing the "right" language. The right language is the one you know best (if it's suitable, of course). You can always switch or improve later!
     
    Alterview, Jun 11, 2014
  10. ETA

    ETA Notable Member

    #10
    I would do the opposite of Google and the other search engines and try to find a niche.

    DuckDuckGo has recently been getting a lot of attention due to its focus on user privacy.

    Maybe you could create a USP (unique selling point) for your search engine?
     
    ETA, Jun 11, 2014
  11. Arshid_K_V

    Arshid_K_V Active Member

    #11
    Search engines work in the following steps:
    - crawling content from websites
    - indexing content
    - ranking results on the SERP (search engine results page)

    All search engines look at backlinks and content.
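    Backlinks are usually turned into a score with a link-analysis algorithm. The best-known published one is PageRank from Google's early days (modern ranking uses many more signals). A minimal power-iteration sketch in Python, assuming a tiny hypothetical link graph and the commonly cited damping factor of 0.85:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank scores by power iteration.

    links maps each page to the list of pages it links to. Pages with no
    outgoing links spread their rank evenly over all pages.
    """
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page first gets the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: distribute its rank to every page.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(graph)
print(max(scores, key=scores.get))  # c
```

    Page `c` comes out on top because three pages link to it, while `d`, which nothing links to, only keeps the random-jump share.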

    Some (very small) search engines are built with PHP. You can make a search engine, but you need to invest a lot of time and have a powerful algorithm and high bandwidth.
    A Nigerian friend of mine earns money from a search engine he made. He built it with PHP, and his traffic comes from Facebook ads.
     
    Arshid_K_V, Jun 13, 2014
  12. NetStar

    NetStar Notable Member

    #12
    I'm sure he owns the only search engine in the world that pays you to use it... you only have to send back 25% of the check they send you.
     
    NetStar, Jun 17, 2014
  13. InnovaTonic

    InnovaTonic Member

    #13
    If we are talking about a simple one, just for testing how it would work, then first I would model it all out and make sure each part has its own separate layer (MVC or similar). The view of the search engine would be a separate task, then the backend where all of the data is stored (I would need to set up my tables so that searching is fast and efficient). Then I would start working on the algorithms, which could form an algorithm engine consisting of multiple algorithms.

    This way I could keep adding more and more algorithms, because each one would plug right into the algorithm engine, and that would take care of extensibility and maintenance.
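    One way to read the "algorithm engine" idea is a registry of independent scoring functions whose outputs are combined with weights, so a new algorithm can be plugged in without touching the rest of the code. A Python sketch under that assumption; the `AlgorithmEngine` class, the two scorers, and their weights are hypothetical placeholders, not a known design:

```python
from typing import Callable, List, Tuple

# A scorer takes (query, document) and returns a number; higher is better.
Scorer = Callable[[str, dict], float]


class AlgorithmEngine:
    """Combine pluggable scoring algorithms into one weighted ranking."""

    def __init__(self) -> None:
        self._scorers: List[Tuple[Scorer, float]] = []

    def register(self, scorer: Scorer, weight: float = 1.0) -> None:
        """Plug in a new algorithm with the given weight."""
        self._scorers.append((scorer, weight))

    def score(self, query: str, doc: dict) -> float:
        """Weighted sum of every registered scorer's output."""
        return sum(weight * scorer(query, doc) for scorer, weight in self._scorers)

    def rank(self, query: str, docs: List[dict]) -> List[dict]:
        """Return the documents sorted from best to worst combined score."""
        return sorted(docs, key=lambda d: self.score(query, d), reverse=True)


def title_match(query: str, doc: dict) -> float:
    """Hypothetical scorer: fraction of query words found in the title."""
    words = query.lower().split()
    title = doc["title"].lower().split()
    return sum(w in title for w in words) / len(words) if words else 0.0


def backlink_score(query: str, doc: dict) -> float:
    """Hypothetical scorer: more backlinks, higher score (capped at 1)."""
    return min(doc.get("backlinks", 0) / 100, 1.0)


engine = AlgorithmEngine()
engine.register(title_match, weight=2.0)
engine.register(backlink_score, weight=1.0)
docs = [
    {"title": "PHP search engine", "backlinks": 10},
    {"title": "Cooking tips", "backlinks": 90},
]
print([d["title"] for d in engine.rank("search engine", docs)])
```

    Adding another algorithm is then a single `register` call, which is the extensibility the post is after.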

    For the databases, I would do a lot of research into how they are best organized in the real world, since they would be the core of the project. If the databases are not set up correctly, I will not be able to search for anything easily.

    There are many other things I would do, but this is how I would start, and I think the start is the most important part of any project.
     
    InnovaTonic, Jun 28, 2014
  14. kaleelkr

    kaleelkr Active Member

    #14
    Writing the code for a search engine is simple; any programmer can develop a search function. But it's hard to build a database like Google's. That takes a lot of effort, and for a small team it is almost impossible. Showing good results is also hard: Google has many ways of improving its results, like backlinks, and there may be more hidden techniques inside Google.
     
    kaleelkr, Jun 28, 2014