Digital Point Forums
#1
Jul 25th 2008, 4:32 pm
webtarded
Cool article about the scope of Google's web crawling. Over 1 trillion links a day

Not sure if it's been posted yet, but definitely worth reading.

http://news.cnet.com/8301-1023_3-9999814-93.html

Quote:
July 25, 2008 1:21 PM PDT
Google reveals scope of Web-crawling task
Posted by Stephen Shankland

It's a pity the National Security Agency can't talk about its computational challenges, because it's leaving a lot of the boasting rights to Google.

In a blog posting on Friday the company shared some detail about the challenges of one aspect of its search operation, the Web indexing and processing that must take place before the results are delivered to users. The short version: Google has no choice but to think big.

First comes surfing. "We start at a set of well-connected initial pages and follow each of their links to new pages. Then we follow the links on those new pages to even more pages and so on, until we have a huge list of links," said software engineers Jesse Alpert and Nissan Hajaj. "Even after removing...exact duplicates, we saw a trillion unique URLs, and the number of individual web pages out there is growing by several billion pages per day."
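
The "start at well-connected pages and keep following links" step quoted above is essentially a breadth-first traversal with duplicate removal. Here is a minimal sketch of that idea, not Google's actual crawler. `TOY_WEB`, `get_links`, `crawl` and `max_urls` are made-up names for illustration, and the "web" is just an in-memory dict so the example runs without any network access.

```python
from collections import deque

# Toy stand-in for the web: page URL -> links found on that page (made-up data).
TOY_WEB = {
    "a.example/": ["b.example/", "c.example/"],
    "b.example/": ["c.example/", "d.example/"],
    "c.example/": ["a.example/"],
    "d.example/": [],
}


def get_links(url):
    """Hypothetical fetch-and-parse step: return the links found on a page."""
    return TOY_WEB.get(url, [])


def crawl(seeds, get_links, max_urls=1000):
    """Breadth-first link discovery; returns unique URLs in discovery order."""
    seen = set()      # exact-duplicate removal
    order = []        # unique URLs, in the order they were first seen
    queue = deque()   # pages whose links still need to be followed
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            order.append(seed)
            queue.append(seed)
    while queue:
        url = queue.popleft()
        for link in get_links(url):
            if link in seen:
                continue
            if len(order) >= max_urls:
                return order  # stop once the URL budget is used up
            seen.add(link)
            order.append(link)
            queue.append(link)
    return order


urls = crawl(["a.example/"], get_links)
print(urls)
```

A real crawler would also need politeness delays, robots.txt handling and URL normalization, none of which the article describes, so they are left out here.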

Next comes analyzing the "link graph"--the mathematical representation of what links to what. That's a key foundation of Google's PageRank algorithm, which brought the company's search engine to prominence by assigning importance to those pages that other important pages point toward.
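
For anyone curious what "assigning importance to those pages that other important pages point toward" looks like in code, here is a rough sketch of the textbook power-iteration form of PageRank on a tiny link graph. It is not how Google computes it at scale. The graph, the function name `pagerank`, the damping factor of 0.85 and the iteration count are all assumptions chosen for illustration, and the sketch assumes every link target also appears as a key in the graph.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict of page -> list of outgoing links."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start with equal importance
    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, outlinks in graph.items():
            if outlinks:
                # A page splits its importance evenly among the pages it links to.
                share = damping * rank[node] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its importance over all pages.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Made-up link graph: "c" is linked to by three pages, "d" by none.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph "c" ends up on top because the most pages point to it, and "a" comes second because the important page "c" links to it.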

In the early days of Google, computing PageRank for the company's collection of a mere 26 million pages took a workstation "a couple hours," and the results would be used for some unspecified period of time. Today, Google surfs the Web continuously and recalculates the link graph "several times per day."

"This graph of one trillion URLs is similar to a map made up of one trillion intersections. So multiple times every day, we do the computational equivalent of fully exploring every intersection of every road in the United States. Except it'd be a map about 50,000 times as big as the U.S., with 50,000 times as many roads and intersections," the engineers said.
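
A quick back-of-the-envelope check on the figures quoted above, using only numbers from the article (one trillion URLs, the 50,000x comparison, and the early 26-million-page index). The variable names are my own.

```python
total_urls = 10**12            # "one trillion URLs"
scale_factor = 50_000          # "about 50,000 times as big as the U.S."
early_index_pages = 26_000_000  # "a mere 26 million pages"

# Number of U.S. road intersections the analogy implies.
implied_us_intersections = total_urls // scale_factor

# How much larger today's URL count is than the early index.
growth = total_urls / early_index_pages

print(implied_us_intersections)
print(round(growth))
```

So the analogy implies roughly 20 million intersections in the U.S. road map, and the URL count is tens of thousands of times larger than the early 26-million-page index.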

Google likes to talk about how users have choice and competition just one click away, and that's a fair point. But the blog post also makes it even clearer just how high barriers to entry are in the search market. That's one of the reasons Yahoo's BOSS (build your own search service) program is intriguing: it lets search start-ups take advantage of Yahoo's crawling, indexing, and search technology in exchange for advertising or revenue-sharing partnerships.
All times are GMT -8.