How Do Search Engines Work?

Discussion in 'All Other Search Engines' started by Nokia999, Oct 24, 2005.

  1. #1
    Does anyone have anything to share about how search engines work?
    How do they crawl websites?
    How does a search engine learn about new web servers, and then about the new websites on those servers?
    Can we develop our own mini search engine?
     
    Nokia999, Oct 24, 2005 IP
  2. AntiSPY

    AntiSPY Peon

    #2
    Hi man... you have so many posts but you don't know how it works...
    Did you ever stop to think about it a little? :eek:

    > How it crawls website.
    It just gets onto the website and then crawls it :)

    > How SE get know about new webservers and after all new websites on that webserver.
    It learns about new ones from links sitting on other sites...

    > Can we develop our own mini Search Engine
    You can do it YOURSELF :) A working search engine only needs a special script (there are even some FREE web-spider scripts out there) and a web server. Plus a biiig amount of traffic of course, because the BASE your spider creates will be BIG!

    So as you see, it is not too difficult.
    Try searching Google with keywords like this or this...

    Also there are some ebooks which you can read...
    Well, just look around, friend ;)
    And you'll see the truth !

    ps. sorry for my English...
     
    AntiSPY, Oct 24, 2005 IP
  3. Blogmaster

    Blogmaster Blood Type Dating Affiliate Manager

    #3
    Search engines prefer good neighborhoods; the spiders follow links through them and find other good neighborhoods. It's really a cool process, something that inspires me. There are a few free search engine scripts out there, but none are really good.
     
    Blogmaster, Oct 24, 2005 IP
  4. SEO Jeff

    SEO Jeff Active Member

    #4
    The best search engines are either paid software like dtSearch or custom built. If you want to custom build a spider that inserts content into a SQL Server 2005 database for processing, you can use .NET's Internet classes. Very good if you ask me. I've built mini-spiders to do automated tasks like relevancy / link checking.
     
    SEO Jeff, Oct 24, 2005 IP
  5. frankm

    frankm Active Member

    #5
    SEs are made up of 3 things:

    1) spider (gets data from the net)
    2) database (stores the spidered data)
    3) algorithms (decide what is #1 for a certain keyword/phrase)

    In the early days (Excite/Lycos/AV) the spider made the difference: faster gathering of data, better results. Later on it was the database (how much can you actually store (AV/HotBot/Google)), and now it's just about algos. MSN, Yahoo and Google are all capable of storing huge amounts of data, spidering it a lot, etc.
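
    frankm's three parts can be mocked up in a few lines. Here's a toy sketch in Python (the page names and corpus are made up for illustration): the spider's output is faked as a fixed corpus, the "database" is a dict, and the "algorithm" is naive term-frequency scoring — nothing like a real engine's ranking, just the shape of it.

```python
# Toy version of the three parts: spider output (faked as a fixed corpus),
# database (a dict), and ranking algorithm (naive term-frequency scoring).
database = {
    "page1": "search engines crawl the web",
    "page2": "engines engines engines",
    "page3": "cooking recipes",
}

def rank(keyword):
    """Part 3: order the stored pages by how often the keyword appears."""
    scores = {url: text.split().count(keyword) for url, text in database.items()}
    return [url for url, s in sorted(scores.items(), key=lambda kv: -kv[1]) if s > 0]

print(rank("engines"))  # page2 mentions it most, so it comes first
```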
     
    frankm, Oct 24, 2005 IP
  6. neha_patelx9x

    neha_patelx9x Peon

    #6
    wow ... pretty interesting :)
     
    neha_patelx9x, Oct 24, 2005 IP
  7. SEO Jeff

    SEO Jeff Active Member

    #7
    Very good way of putting it Frank.

     
    SEO Jeff, Oct 24, 2005 IP
  8. Nokia999

    Nokia999 Guest

    #8
    This is a highly guarded secret.
    Nobody knows, and nobody will tell you :D.
    But if you are asking about scripts, then you will find a number of useless ones.
     
    Nokia999, Oct 27, 2005 IP
  9. relixx

    relixx Active Member

    #9
    It requests a page from a server, scans the code for whatever it's told to scan for, and when it finds another link, it requests that page from the server, etc, etc, etc.

    Large search engines with billions of pages in their indexes don't (can't!) rely on things like hard drives and databases (e.g., MySQL, PostgreSQL), as they are too slow. They store the majority of the info in memory (as RAM is faster than hard drives) and run the search queries in real time. That is why they can return results for millions of search queries simultaneously. Everything is in memory.
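
    That request-scan-follow loop is essentially a breadth-first traversal. Here's a minimal sketch in Python; to keep it self-contained, a dict stands in for the network, but `fetch` could just as well wrap a real HTTP request. Everything here (URLs, the fake web) is illustrative, not anyone's actual crawler — and real crawlers also respect robots.txt, throttle requests, resolve relative URLs, etc.

```python
from collections import deque
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed, fetch):
    """Request a page, scan it for links, queue each new link; repeat."""
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # dead link, nothing to scan
        order.append(url)
        parser = LinkParser()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order

# Tiny in-memory "web" standing in for real HTTP requests.
fake_web = {
    "a.html": '<a href="b.html">b</a> <a href="c.html">c</a>',
    "b.html": '<a href="c.html">c</a> <a href="dead.html">?</a>',
    "c.html": "no links here",
}
print(crawl("a.html", fake_web.get))  # pages in the order they were reached
```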
     
    relixx, Oct 27, 2005 IP
  10. WhatiFind

    WhatiFind offline

    #10
    WhatiFind, Oct 27, 2005 IP
  11. SEO Jeff

    SEO Jeff Active Member

    #11
    I developed a spider class in .NET that holds all the coding stuff: the first URL to crawl, HTML parsing, filtering, etc. Then I use that class in all sorts of things, like a Windows service that can run all the time, and in the search engine administration program that I also wrote. That's how I develop search engines.
     
    SEO Jeff, Oct 27, 2005 IP
  12. anjali_ny2005

    anjali_ny2005 Peon

    #12
    wow it's interesting
     
    anjali_ny2005, Oct 27, 2005 IP
  13. Nokia999

    Nokia999 Guest

    #13
    I remember my teacher telling me about how SEs work. In short, I can tell you search engines involve heavy use of data structures, especially linked lists.
     
    Nokia999, Nov 1, 2005 IP
  14. mika

    mika Active Member

    #14
    I thought about how to get new websites added that are actually unknown and don't have any links yet. You might just set up a DNS server, or get access to one, and grab all the new domains that are being registered. In the same way, you might keep your database cleaner by deleting the entries that are no longer available on the DNS server.
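
    That cleanup pass — checking whether a stored domain still resolves — is easy to sketch with the standard library. The domain list here is hypothetical (`.invalid` is a reserved TLD that is guaranteed never to resolve, which makes it handy for the dead-entry case):

```python
import socket

def resolves(domain):
    """Return True if the domain still resolves in DNS, else False."""
    try:
        socket.gethostbyname(domain)
        return True
    except socket.gaierror:
        return False

# Cleanup pass over a crawler's stored domain list: drop the dead entries.
domains = ["localhost", "no-such-host.invalid"]
alive = [d for d in domains if resolves(d)]
print(alive)  # ['localhost']
```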
     
    mika, Nov 4, 2005 IP
  15. mytechlab

    mytechlab Peon

    #15
    Your site is OK and simple... maybe you want to change the phpBB logo to your own.
     
    mytechlab, Nov 4, 2005 IP
  16. Mia

    Mia R.I.P. STEVE JOBS

    #16
    Mia, Nov 4, 2005 IP
  17. Tonystreet

    Tonystreet Peon

    #17
    PageRank is huge - you can find the PageRank of your web site at many web sites. I personally use pagerank.net. The higher your PageRank, the higher your site appears on engines such as Google.
     
    Tonystreet, Nov 6, 2005 IP
  18. Blogmaster

    Blogmaster Blood Type Dating Affiliate Manager

    #18
    Not so. It has something to do with it, but not a lot.
     
    Blogmaster, Nov 6, 2005 IP
  19. AntiSPY

    AntiSPY Peon

    #19
    Tonystreet
    High PageRank means a high speed of indexing...
    PR's influence on SERP position is not as strong as you say.
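
    For what it's worth, the PageRank being argued about here is itself just an algorithm: each page's score flows along its outgoing links, and the scores are found by repeated iteration. A toy sketch in Python on a made-up three-page graph — not Google's actual implementation, just the published idea in miniature:

```python
def pagerank(links, damping=0.85, iters=50):
    """Power iteration on a link graph: rank flows to the pages each page links to."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / len(pages) for p in pages}
        for p, outs in links.items():
            targets = outs if outs else pages  # dangling page: spread rank evenly
            for q in targets:
                new[q] += damping * rank[p] / len(targets)
        rank = new
    return rank

# Hypothetical three-page web: both a and c link to b.
ranks = pagerank({"a": ["b"], "b": ["c"], "c": ["b"]})
print(max(ranks, key=ranks.get))  # b, the most linked-to page
```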
     
    AntiSPY, Nov 9, 2005 IP
  20. runarb

    runarb Peon

    #20
    Not true for most large internet search engines.

    Most search engines use inverted indexes. For example, if you have these 4 documents:

    1: "i love you"
    2: "god is love"
    3: "love is blind"
    4: "blind justice"

    One can create an index of all the words in the documents and which documents each occurs in, like this:

    blind          3,4
    god            2
    i              1
    is             2,3
    justice        4
    love           1,2,3
    you            1

    To find which of those documents have the word "love" in them, you can now read the index and see that it is documents 1, 2 and 3. This is easy and fast.

    Reading the index from disk is fast because it only requires one disk seek.

    To scale up, one then uses many search nodes in parallel that each contain a portion of all the pages one has. To answer a query, one sends it to all the nodes. The nodes respond with their best pages. All the results are then merged together, and 10 or so pages are shown to the user.
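
    The four-document example above translates almost line for line into code. A quick sketch in Python; the intersection at the end shows how a multi-word query works by intersecting posting lists, a step the post doesn't spell out:

```python
from collections import defaultdict

docs = {
    1: "i love you",
    2: "god is love",
    3: "love is blind",
    4: "blind justice",
}

# Build the inverted index: word -> set of document ids containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)

print(sorted(index["love"]))                # documents containing "love"
print(sorted(index["love"] & index["is"]))  # documents containing both words
```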
     
    runarb, Nov 12, 2005 IP