DP-ers, Reveal All You Know About Google

Discussion in 'Google' started by MathArt, Mar 6, 2008.

  1. #1
    Hi all,
    I'm very interested in knowing how Google works. Yes, I know about PageRank and so on, but I thought I'd bank on the collective wisdom of all DP forum members to learn more about Google.

    Here are some key concepts associated with Google:

    PageRank

    A citation-based ranking system. A page's PR is essentially a score standing in for the probability that a random surfer will land on it. The random surfer is assumed to pick among a page's outgoing links uniformly at random. The score depends on the links pointing to your site (and the PR of the pages they come from), with each linking page splitting its vote across all the links it makes. It makes use of Markov chains.
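
    To make the random-surfer idea concrete, here is a rough sketch of the textbook power-iteration version of PageRank on a toy link graph. The `pagerank` function, the `toy_web` graph, and the 0.85 damping factor are illustrative assumptions, not whatever Google actually runs in production.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power iteration on a small link graph.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> probability that the random surfer is there.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly across its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked to by everyone else.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
```

    In this toy graph, page "c" ends up with the highest score because it collects links from all other pages, while "d", which nobody links to, keeps only the random-jump share.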

    MapReduce and BigTable+GFS+Berkeley DB HA
    These are methods for parallel computing and data storage. A conventional RDBMS doesn't cope well with storing data at that scale. Cheapskates (and Yahoo!) will use Hadoop.
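
    For anyone who hasn't seen the MapReduce pattern, here is a minimal single-machine sketch of its three steps (map, shuffle, reduce) applied to word counting. The function names and the `docs` sample are made up for illustration; a real deployment distributes each phase across many machines, which this sketch does not attempt.

```python
from collections import defaultdict


def map_phase(text):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce step: collapse the values for one key into a single result."""
    return word, sum(counts)


def word_count(documents):
    pairs = []
    for text in documents.values():
        pairs.extend(map_phase(text))
    grouped = shuffle(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


# Hypothetical two-document corpus.
docs = {"page1": "google ranks pages", "page2": "pages link to pages"}
counts = word_count(docs)
```

    The point of the pattern is that the map and reduce steps have no shared state, so each one can be farmed out to a different machine.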

    MinHash

    A hashing technique for estimating how similar two sets are, which Google uses for clustering. Visibly used in Google News, but I suspect it's also used in clustering sites to assign PageRank.
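
    As a rough illustration of the MinHash idea, the sketch below estimates the Jaccard similarity of two word sets by comparing their minimum hash values under many seeded hash functions. The helper names, the SHA-1 based hashing, and the sample headlines are all assumptions for the example; nothing here is taken from Google's implementation.

```python
import hashlib


def seeded_hash(seed, item):
    """One member of a family of hash functions, selected by seed."""
    digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items, num_hashes=256):
    """For each hash function, keep the smallest hash value over the set."""
    return [min(seeded_hash(seed, item) for item in items)
            for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature slots estimates Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def exact_jaccard(set_a, set_b):
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical headlines: 4 shared words out of 8 distinct words overall.
story_a = set("google news clusters similar stories together".split())
story_b = set("google news groups similar articles together".split())

sig_a = minhash_signature(story_a)
sig_b = minhash_signature(story_b)
estimate = estimated_jaccard(sig_a, sig_b)
exact = exact_jaccard(story_a, story_b)
```

    The appeal for clustering is that the short signatures can be compared (or bucketed) instead of the full sets, and the estimate tightens as the number of hash functions grows.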

    What else do we know? TrustRank is, AFAIK, a myth. So is the sandbox. The so-called 'sandbox' is, I believe, a side effect of some MinHash variant plus some form of spam-fighting tool (probably a Markov chain). I'm also very curious to learn more about Google's matching system and about its current generation of search engine, TeraGoogle, and how much of it has been implemented.

    Some people say PR is just for decoration, but I beg to differ. I think PR is vital in SERP scoring. I want to know what other things are missing from the equation.

    Also, I am interested in how Google is fighting spam. What methods are they using? Hidden Markov models? Bayes? KNN? Neural Nets?

    Now, let's crack the Google code.
     
    MathArt, Mar 6, 2008
  2. astup1didiot

  3. MathArt

    #3
    Read it. Not enough information. Let's get more.
     
    MathArt, Mar 6, 2008
  4. The Stealthy One

    #4
    Should we talk about the fact that Google is not a 100% ethical company, or would you rather we leave that part out?
     
    The Stealthy One, Mar 6, 2008
  5. MathArt

    #5
    Er... How is it unethical?
     
    MathArt, Mar 6, 2008
  6. shylesson

    #6
    How about reading around the forum? I love posts like this because they basically say to me, "guys, I don't wanna read like all y'all did, so give me the CliffsNotes". There is a plethora of information on this forum. Read.
     
    shylesson, Mar 6, 2008
  7. MathArt

    #7
    I've gone through most of them. A lot of forum members talk about things like TrustRank (yes, I know Google patented it) and say that PageRank is just decoration.

    But I've stated my doubts about things like TrustRank (which is also a Markov chain method for telling whether sites are 'good' or 'bad') and the sandbox (which I personally think is the result of an amalgamated function that manages the rest of the PR and SERPs).

    I'm interested in the mechanics behind Google. I'd be grateful if someone can shed some light.
     
    MathArt, Mar 6, 2008
  8. Nystul

    #8
    I prefer to use SEOmoz's Page Strength.
     
    Nystul, Mar 6, 2008
  9. arunkumar2006

    #9
    Google is an advertising company. LOL, and no company in this world is 100% ethical. It leans toward informational sites because its major revenue comes from relevant information searches.
     
    arunkumar2006, Mar 6, 2008
  10. Australianfranchises

    #10
    What is this statement based on?
    Now you will say AdWords and AdSense,
    but those advertisements are targeted to the related search or content.
    G is a search engine and provides related ads, not other ads that we don't want to see.
     
    Australianfranchises, Mar 6, 2008
  11. sray

    #11
    sray, Mar 7, 2008
  12. MathArt

    #12
    MathArt, Mar 7, 2008