Well, isn't using PageRank for ordering the URLs intelligent, smart, clever, or whatever you call it? Can you invent a smarter algorithm that would work well for a large-scale search engine? If it's so elementary, you might apply for a job at one of the leading search engines.

I have no programming experience with spidering applications, and of course I've never claimed that I have. I do have enough experience with algorithmically tough applications. I have been on projects with some of the best Bulgarian programmers, including national programming champions, and I have learned to respect capable programmers. That's one of the main themes of my postings: respect for the people at Google and the other engines. People are writing about one of the finest pieces of software on the planet as if it were something elementary to put together. People with zero programming experience (let alone algorithms experience) are claiming to be experts and to know what Google does. It's just ridiculous.

Let me finish with a passage from the original paper: "Also, because of the huge amount of data involved, unexpected things will happen. For example, our system tried to crawl an online game. This resulted in lots of garbage messages in the middle of their game! It turns out this was an easy problem to fix. But this problem had not come up until we had downloaded tens of millions of pages. Because of the immense variation in web pages and servers, it is virtually impossible to test a crawler without running it on large part of the Internet. Invariably, there are hundreds of obscure problems which may only occur on one page out of the whole web and cause the crawler to crash, or worse, cause unpredictable or incorrect behavior. Systems which access large parts of the Internet need to be designed to be very robust and carefully tested. Since large complex systems such as crawlers will invariably cause problems, there needs to be significant resources devoted to reading the email and solving these problems as they come up."
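To make the quoted point concrete, here is a minimal sketch (in Python, purely my own illustration, not anything from the paper) of the kind of defensive fetching a crawler has to do: every call can fail in strange ways on strange hosts, and none of those failures is allowed to stop the crawl.

import time
import urllib.error
import urllib.request

def fetch(url, retries=2, timeout=10):
    # Fetch one page, treating every failure as survivable. A crawler
    # touching millions of hosts will hit timeouts, malformed responses,
    # and servers that simply misbehave, so nothing here is allowed to
    # raise past this function.
    for attempt in range(retries + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
        except (urllib.error.URLError, TimeoutError, ValueError) as err:
            # Log and back off; one strange host must not kill the crawl.
            print("fetch failed for %s (%r), attempt %d" % (url, err, attempt + 1))
            time.sleep(2 ** attempt)
    return None  # give up on this URL and move on to the rest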
You missed my point completely. The spider did not just decide to use PageRank for URL ordering; its creators did. They are smart, yes, as I stated well above; the spider is not. I agree these are among the best-written pieces of software out there, and I never claimed they were simple. I merely said that the basics are not difficult to understand for any coder with some experience. But posting things that make the software look as intelligent as an AI just adds to the confusion and misconceptions.
The question in my mind is: does the spider decide where it wants to go next, or does it just take its list of sequential URLs to be crawled from the URL server?
My guess would be that the best way to handle it is to keep pulling URLs from the URL server and to feed new URLs back into the queue as it finds them.
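For what it's worth, here's a rough sketch of that idea in Python. The URLServer class, the scores, and the helpers fetch, extract_links, and estimate_score are all made up for illustration; the only point is the shape of the loop: the spider pulls a batch of the highest-priority URLs, crawls them, and sends any newly discovered links back to the server's queue. The priority stands in for a PageRank-style importance estimate; the real ordering is only hinted at in the paper, so treat this as a sketch of the idea, not the actual design.

import heapq
from urllib.parse import urljoin

class URLServer:
    # Toy URL server: hands out the highest-scoring unseen URLs first.
    def __init__(self):
        self._heap = []      # stores (-score, url) so heapq pops the highest score
        self._seen = set()

    def add(self, url, score=0.0):
        if url not in self._seen:
            self._seen.add(url)
            heapq.heappush(self._heap, (-score, url))

    def next_batch(self, n=100):
        batch = []
        while self._heap and len(batch) < n:
            _, url = heapq.heappop(self._heap)
            batch.append(url)
        return batch

def crawl(server, fetch, extract_links, estimate_score, max_pages=1000):
    # Spider loop: drag batches of URLs from the URL server, crawl them,
    # and send newly found links back with an estimated score.
    crawled = 0
    while crawled < max_pages:
        batch = server.next_batch()
        if not batch:
            break
        for url in batch:
            page = fetch(url)
            if page is None:
                continue
            crawled += 1
            for link in extract_links(url, page):
                server.add(urljoin(url, link), estimate_score(link))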
About duplicate content and Google: I wish Google would do something about the duplicate content in the search results. Do a Google search for "new york real estate law" and you will see that on the first 5 pages about half the sites are identical, just with different domain names. All the titles say "Real Estate 8". Do a search on "Real Estate 8" and Google returns 5,110 results. Somebody definitely found a way to exploit Google. They are all link directories that point to another link page, and so on. The sites are just traffic grabbers, yet Google is spitting out tons of them in the search results.
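I have no idea what Google actually does here, but just to show how the exact copies could be collapsed, here is a rough Python sketch that fingerprints the normalized text of each result and keeps only the first page per fingerprint. A real engine would need something fuzzier (shingling or similar) for near-duplicates; this only catches pages that are identical apart from the domain name.

import hashlib
import re

def content_fingerprint(html):
    # Strip tags, collapse whitespace, lowercase, then hash. Exact copies
    # served under different domain names all map to the same fingerprint.
    text = re.sub(r"<[^>]+>", " ", html)
    text = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

def dedupe(results):
    # results is a list of (url, html) pairs; keep the first URL seen
    # for each fingerprint when assembling the result page.
    seen = set()
    unique = []
    for url, html in results:
        fp = content_fingerprint(html)
        if fp not in seen:
            seen.add(fp)
            unique.append(url)
    return unique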
I have to agree and disagree with nohaber on a few things, though. "Expertise" can be a very relative and highly overused term. However, not everyone who got into the game early wants to wind up working for a major corporation; some would rather build something for themselves. I know a lot of people I consider experts. But the majority of those calling themselves that ... no, you're right, they are not.
When you call something old, you need to differentiate between outdated and distinguished. This is more the latter.