How Do Search Engines Work?

Discussion in 'All Other Search Engines' started by Nokia999, Oct 24, 2005.

  1. #1
    Does anyone have anything to share about how search engines work?
    How do they crawl websites?
    How does a search engine learn about new web servers, and then about the new websites on those servers?
    Can we develop our own mini search engine?
     
    Nokia999, Oct 24, 2005 IP
  2. AntiSPY

    AntiSPY Peon

    #2
    Hi man... you have so many posts but you don't know how it works...
    Did you ever stop to think about it a little? :eek:

    > How it crawls website.
    It just gets onto the website and then crawls it :)

    > How SE get know about new webservers and after all new websites on that webserver.
    It learns about new ones from links sitting on other sites...

    > Can we develop our own mini Search Engine
    You can do it YOURSELF :) A working search engine only needs a special script (there are even some FREE web-spider scripts out there) and a web server. Plus a biiig amount of traffic of course, because the BASE your spider creates will be BIG!

    So as you see, it is not too difficult.
    Try searching Google with keywords like this or this...

    Also there are some ebooks which you can read...
    Well, just look around, friend ;)
    And you'll see the truth !

    ps. sorry for my English...
     
    AntiSPY, Oct 24, 2005 IP
  3. Blogmaster

    Blogmaster Blood Type Dating Affiliate Manager

    #3
    Search engines prefer good neighborhoods; the spiders follow links through them and find other good neighborhoods. It's really a cool process, something that inspires me. There are a few free search engine scripts out there, but none are really good.
     
    Blogmaster, Oct 24, 2005 IP
  4. SEO Jeff

    SEO Jeff Active Member

    #4
    The best search engines are either paid software like dtSearch or custom built. If you want to custom build a spider that inserts content into a SQL Server 2005 database for processing, you can use .NET's Internet classes. Very good if you ask me. I've built mini-spiders to do automated tasks like relevancy / link checking.
     
    SEO Jeff, Oct 24, 2005 IP
  5. frankm

    frankm Active Member

    #5
    SEs are made up of 3 things:

    1) spider (gets data from the net)
    2) database (stores the spidered data)
    3) algorithms (decide what is #1 for a certain keyword/phrase)

    In the early days (Excite/Lycos/AV) the spider made the difference: faster gathering of data, better results. Later on it was the database (how much can you actually store (AV/HotBot/Google)), and now it's just about algos. MSN, Yahoo and Google are all capable of storing huge amounts of data, spidering it a lot, etc.
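
    frankm's three parts can be mocked up in a few lines. Here's a toy sketch in Python (the page names and corpus are made up for illustration): the spider's output is faked as a fixed corpus, the "database" is a dict, and the "algorithm" is naive term-frequency scoring — nothing like a real engine's ranking, just the shape of it.

```python
# Toy version of the three parts: spider output (faked as a fixed corpus),
# database (a dict), and ranking algorithm (naive term-frequency scoring).
database = {
    "page1": "search engines crawl the web",
    "page2": "engines engines engines",
    "page3": "cooking recipes",
}

def rank(keyword):
    """Part 3: order the stored pages by how often the keyword appears."""
    scores = {url: text.split().count(keyword) for url, text in database.items()}
    return [url for url, s in sorted(scores.items(), key=lambda kv: -kv[1]) if s > 0]

print(rank("engines"))  # page2 mentions it most, so it comes first
```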
     
    frankm, Oct 24, 2005 IP
  6. neha_patelx9x

    neha_patelx9x Peon

    #6
    wow ... pretty interesting :)
     
    neha_patelx9x, Oct 24, 2005 IP
  7. SEO Jeff

    SEO Jeff Active Member

    #7
    Very good way of putting it Frank.

     
    SEO Jeff, Oct 24, 2005 IP
  8. Nokia999

    Nokia999 Guest

    #8
    This is a highly guarded secret.
    Nobody knows, and nobody will tell you :D.
    But if you are asking about scripts, then you will find a number of useless ones.
     
    Nokia999, Oct 27, 2005 IP
  9. relixx

    relixx Active Member

    #9
    It requests a page from a server, scans the code for whatever it's told to scan for, and when it finds another link, it requests that page from the server, etc, etc, etc.

    Large search engines with billions of pages in their indexes don't (can't!) rely on things like hard drives and databases (e.g., MySQL, PostgreSQL), as they are too slow. They store the majority of the info in memory (as RAM is faster than hard drives) and run the search queries in real time. That is why they can return results for millions of search queries simultaneously. Everything is in memory.
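
    That request-scan-follow loop is essentially a breadth-first traversal. Here's a minimal sketch in Python; to keep it self-contained, a dict stands in for the network, but `fetch` could just as well wrap a real HTTP request. Everything here (URLs, the fake web) is illustrative, not anyone's actual crawler — and real crawlers also respect robots.txt, throttle requests, resolve relative URLs, etc.

```python
from collections import deque
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed, fetch):
    """Request a page, scan it for links, queue each new link; repeat."""
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # dead link, nothing to scan
        order.append(url)
        parser = LinkParser()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order

# Tiny in-memory "web" standing in for real HTTP requests.
fake_web = {
    "a.html": '<a href="b.html">b</a> <a href="c.html">c</a>',
    "b.html": '<a href="c.html">c</a> <a href="dead.html">?</a>',
    "c.html": "no links here",
}
print(crawl("a.html", fake_web.get))  # pages in the order they were reached
```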
     
    relixx, Oct 27, 2005 IP
  10. WhatiFind

    WhatiFind offline

    #10
    WhatiFind, Oct 27, 2005 IP
  11. SEO Jeff

    SEO Jeff Active Member

    #11
    I developed a spider class in .NET that holds all the coding stuff: the first URL to crawl, HTML parsing, filtering, etc. Then I use that class in all sorts of things, like a Windows service that can run all the time, and in the search engine administration program that I also wrote. That's how I develop search engines.
     
    SEO Jeff, Oct 27, 2005 IP
  12. anjali_ny2005

    anjali_ny2005 Peon

    #12
    wow it's interesting
     
    anjali_ny2005, Oct 27, 2005 IP
  13. Nokia999

    Nokia999 Guest

    #13
    I remember my teacher telling me about how SEs work. In short, I can tell you search engines involve heavy use of data structures, especially linked lists.
     
    Nokia999, Nov 1, 2005 IP
  14. mika

    mika Active Member

    #14
    I thought about how to get new websites added that are actually unknown and don't have any links yet. You might just set up a DNS server, or get access to one, and grab all the new domains that are being registered. In the same way, you might keep your database cleaner by deleting the entries that are no longer available on the DNS server.
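
    That cleanup pass — checking whether a stored domain still resolves — is easy to sketch with the standard library. The domain list here is hypothetical (`.invalid` is a reserved TLD that is guaranteed never to resolve, which makes it handy for the dead-entry case):

```python
import socket

def resolves(domain):
    """Return True if the domain still resolves in DNS, else False."""
    try:
        socket.gethostbyname(domain)
        return True
    except socket.gaierror:
        return False

# Cleanup pass over a crawler's stored domain list: drop the dead entries.
domains = ["localhost", "no-such-host.invalid"]
alive = [d for d in domains if resolves(d)]
print(alive)  # ['localhost']
```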
     
    mika, Nov 4, 2005 IP
  15. mytechlab

    mytechlab Peon

    #15
    Your site is OK and simple... maybe you want to change the phpBB logo to your own.
     
    mytechlab, Nov 4, 2005 IP
  16. Mia

    Mia R.I.P. STEVE JOBS

    #16
    Mia, Nov 4, 2005 IP
  17. Tonystreet

    Tonystreet Peon

    #17
    PageRank is huge - you can find the PageRank of your web site at many web sites. I personally use pagerank.net. The higher your PageRank, the higher your site appears on engines such as Google.
     
    Tonystreet, Nov 6, 2005 IP
  18. Blogmaster

    Blogmaster Blood Type Dating Affiliate Manager

    #18
    Not so. It has something to do with it, but not a lot.
     
    Blogmaster, Nov 6, 2005 IP
  19. AntiSPY

    AntiSPY Peon

    #19
    Tonystreet
    High PageRank means a high speed of indexing...
    PR's influence on SERP position is not as strong as you say.
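
    For what it's worth, the PageRank being argued about here is itself just an algorithm: each page's score flows along its outgoing links, and the scores are found by repeated iteration. A toy sketch in Python on a made-up three-page graph — not Google's actual implementation, just the published idea in miniature:

```python
def pagerank(links, damping=0.85, iters=50):
    """Power iteration on a link graph: rank flows to the pages each page links to."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / len(pages) for p in pages}
        for p, outs in links.items():
            targets = outs if outs else pages  # dangling page: spread rank evenly
            for q in targets:
                new[q] += damping * rank[p] / len(targets)
        rank = new
    return rank

# Hypothetical three-page web: both a and c link to b.
ranks = pagerank({"a": ["b"], "b": ["c"], "c": ["b"]})
print(max(ranks, key=ranks.get))  # b, the most linked-to page
```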
     
    AntiSPY, Nov 9, 2005 IP
  20. runarb

    runarb Peon

    #20
    Not true for most large internet search engines.

    Most search engines use inverted indexes. For example, if you have these 4 documents:

    1: "i love you"
    2: "god is love"
    3: "love is blind"
    4: "blind justice"

    One can create an index of all the words in the documents and which documents each occurs in, like this:

    blind          3,4
    god            2
    i              1
    is             2,3
    justice        4
    love           1,2,3
    you            1

    To find which of those documents have the word "love" in them, you can now read the index and see that it is documents 1, 2 and 3. This is easy and fast.

    Reading the index from disk is fast because it only requires one disk seek.

    To scale up, one then uses many search nodes in parallel that each contain a portion of all the pages one has. To answer a query, one sends it to all the nodes. The nodes respond with their best pages. All the results are then merged together, and 10 or so pages are shown to the user.
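
    The four-document example above translates almost line for line into code. A quick sketch in Python; the intersection at the end shows how a multi-word query works by intersecting posting lists, a step the post doesn't spell out:

```python
from collections import defaultdict

docs = {
    1: "i love you",
    2: "god is love",
    3: "love is blind",
    4: "blind justice",
}

# Build the inverted index: word -> set of document ids containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)

print(sorted(index["love"]))                # documents containing "love"
print(sorted(index["love"] & index["is"]))  # documents containing both words
```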
     
    runarb, Nov 12, 2005 IP