Back engineering the Google algorithm...

Discussion in 'Google' started by sparkimarki, Mar 1, 2006.

  1. #1
    Just for a moment, let’s imagine we’re the guys at Google and we have been tasked with coming up with a formula (a new robust Google search algorithm) that will decide which websites are the best in terms of any query entered into our search engine.

    How many factors would we need to consider?

    Size, age, frequency of updates, inbound links, outbound links - are there 10, 20, 50 or more than 100 differing factors?

    And then, when we have a list, what weighting do we apply to each?

    Finally, how do we ensure that no one can then successfully back engineer our algorithm – do we develop red herrings such as PageRank?

    Just a thought (or two) - I would be interested to gather a few opinions.

    If we set out to achieve what Google want to achieve, why couldn't we possibly recreate (or effectively back-engineer) the Google algo?
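
    To make the "list of factors plus weightings" idea concrete, here is a minimal sketch of a weighted-sum scorer. The factor names, the weights, and the sample pages are all invented for illustration; nothing here reflects what Google actually measures or how it combines signals.

```python
# Hypothetical factor names and weights, chosen only for illustration.
WEIGHTS = {
    "inbound_links": 0.5,
    "page_age": 0.2,
    "update_frequency": 0.2,
    "outbound_links": 0.1,
}


def score(factors, weights=WEIGHTS):
    """Return the weighted sum of factor values, each expected in [0, 1]."""
    return sum(weights[name] * factors.get(name, 0.0) for name in weights)


def rank(pages, weights=WEIGHTS):
    """Sort page names from highest to lowest score."""
    return sorted(pages, key=lambda name: score(pages[name], weights), reverse=True)


# Made-up pages with made-up, already-normalized factor values.
pages = {
    "old-and-linked": {"inbound_links": 0.9, "page_age": 0.8,
                       "update_frequency": 0.1, "outbound_links": 0.3},
    "new-and-fresh": {"inbound_links": 0.2, "page_age": 0.1,
                      "update_frequency": 0.9, "outbound_links": 0.5},
}
print(rank(pages))
```

    Changing the weights changes the ordering, which is really the whole question: the list of factors is the easy part, the weighting is the hard part.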
     
    sparkimarki, Mar 1, 2006 IP
  2. speakerwire

    speakerwire Peon

    Messages:
    61
    Likes Received:
    9
    Best Answers:
    0
    Trophy Points:
    0
    #2
    Are you looking to create your own ranking algorithm for your own search engine or are you looking to find the factors Google is using in their rankings so that you can better optimize your pages?
     
    speakerwire, Mar 1, 2006 IP
  3. Joobz

    Joobz Peon

    Messages:
    598
    Likes Received:
    9
    Best Answers:
    0
    Trophy Points:
    0
    #3
    I think the entire premise Google is built on is a conspiracy (theory)

    I wear tinfoil on my head whilst surfing the net just in case
     
    Joobz, Mar 1, 2006 IP
    Jat likes this.
  4. speakerwire

    speakerwire Peon

    Messages:
    61
    Likes Received:
    9
    Best Answers:
    0
    Trophy Points:
    0
    #4

    :D That's comedy! What kind of reception do you get? Are you plugged in for Wi-Fi?
     
    speakerwire, Mar 1, 2006 IP
  5. miles

    miles Well-Known Member

    Messages:
    154
    Likes Received:
    2
    Best Answers:
    0
    Trophy Points:
    103
    #5
    All I know is I would love to be a fly on the wall in the lab where this type of research actually does happen. The funniest part is you know these guys surf the same SEO forums/blogs we do and laugh at all our squirming ;)
     
    miles, Mar 1, 2006 IP
  6. TheNetCode

    TheNetCode Peon

    Messages:
    1,703
    Likes Received:
    48
    Best Answers:
    0
    Trophy Points:
    0
    #6
    LOL yeah, you never know, they may be listening. In that movie Signs it worked, right? :rolleyes:
     
    TheNetCode, Mar 1, 2006 IP
  7. speakerwire

    speakerwire Peon

    Messages:
    61
    Likes Received:
    9
    Best Answers:
    0
    Trophy Points:
    0
    #7
    After re-reading the original post, it got me thinking... and that is dangerous.

    Assuming that the purpose of such a list is NOT to necessarily reverse-engineer, which I don't think is possible anyway, but to better optimize your sites so that Google can index and rank them into the appropriate topics... here are a few things I can think of.

    Whether or not they are looking at or weighting all of these things remains a question but there are potential elements that they could look at. Feel free to add to it if I've missed any.

    On-Page Factors:

    1) Page Title:
    a) Keyword density
    b) Keyword position (is it the first word)?
    c) Keyword relevance: are the keywords relevant to the rest of the content on the page or the rest of the site?

    2) Meta Keywords:
    a) Keyword density?
    b) Related topics?
    c) Overstuffing of keywords?

    3) Meta Description:
    a) Keyword density?
    b) Related topics?
    c) Related to page and site content?
    d) Is the description a complete thought or sentence or is it a bunch of words much like the Meta Keywords?

    4) H1, H2, etc.: These may not be looked at as much anymore, but they probably still carry some weight. Where these elements are placed within the code may be important.
    a) Are they towards the top of the page or buried at the bottom?
    b) Is there a clear hierarchy of nested H tags that might signify the importance of content subjects?

    5) Comment Tags: I doubt they are looked at but it is a factor nonetheless.

    6) Summary, Title, and Alt attributes: summary attributes are normally found on tables, title attributes on links (<a href= ), and alt attributes on images. All are used to describe elements, but they have also been used to stuff keywords, and who knows whether Google sees this as a positive or a negative.

    7) Content/Text:
    a) Is the content of the page located within a main element (like a complete article) or is it broken up (like short descriptions found on web results or news clippings)?
    b) Is the content related to the keywords found in the Title, Meta and other tags?
    c) Is the content considered unique or is it duplicated from somewhere else?
    d) Is the content legible or is it gibberish (like just a random set of words)?

    8) Links: The tough part here is the big question of: "are all links created equal?" If search engines are attempting to place a value on one link over another, here are a few things they may be looking at...

    a) Are the links found within a Navigation or Footer element on a page?
    b) Are the links internal to the domain or do they leave the site?
    c) Are the links leaving the site to an ad network?
    d) Are the links leaving the site via a text link or image link?
    e) Are the links leaving the site to another related site?
    f) Are the links located within the content or do they stand alone?
    g) Do the links carry a rel="nofollow" attribute?
    h) Are the links using javascript?
    i) How old are the links?
    j) Are you linking to a "bad neighborhood"? You can control who you link to and linking to a site that Google has banned or thinks is low quality might affect their view of your site.

    9) Page Rank:
    a) What is the page rank of the specific page?
    b) How long has that page been indexed?
    c) What is the page rank of the site's home page? Not sure if this is a major factor but it could have a general "value" for the entire site that then passes to any sub-page located within that site. Meaning, would a page on CNN rank higher just because it is part of CNN? I don't know but it is a good question.

    10) Refresh:
    a) Is the content on the page updated?
    b) How often does it update and how much of it updates?
    c) Does it update TOO often (like every page load)?
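
    Several of the on-page items above come down to simple counting. As a rough sketch, keyword density and keyword position for a page title could be computed as below. The tokenizer, the single-word keyword restriction, and the sample title are my own simplifications, not anything Google has published.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_density(text, keyword):
    """Fraction of words in text that equal the (single-word) keyword."""
    words = tokenize(text)
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


def keyword_position(text, keyword):
    """Zero-based index of the keyword's first occurrence, or None if absent."""
    words = tokenize(text)
    key = keyword.lower()
    return words.index(key) if key in words else None


# Made-up title used only to exercise the two functions.
title = "Widgets: Cheap Blue Widgets and Widget Parts"
print(keyword_density(title, "widgets"))   # 2 of the 7 words match
print(keyword_position(title, "widgets"))  # 0, i.e. it is the first word
```

    Note that "widget" and "widgets" count as different words here; a real system would presumably handle plurals and stemming, which is one way a short list of factors turns into a long one.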


    Off-Page Factors:

    1) Domains:
    a) Is the optimized keyword located within the domain name?
    b) Does the domain have dashes in it? I'm sure they are smart enough to know that people stuff keywords into domains with dashes. The question is whether it has a positive or negative effect.
    c) Is the page being ranked on a subdomain ("A-Name") or on the root-level domain? (aname. domain.com) or (www. domain.com)

    2) Links: Many people believe that links are extremely important but there may be a lot of things they look at to determine the value of an inbound link.

    a) Does the inbound link come from a different Domain?
    b) Does the inbound link come from an image or text?
    c) Does the inbound link have the optimized keyword in the link text?
    d) Does the inbound link come from another site that is about your topic (or related topic)?
    e) Does the inbound link come from another page that is about your topic?
    f) Do the majority of inbound links go to the homepage or are they deep linked into sub-pages of your site?
    g) What is the Page Rank of the page that the link comes from?
    h) Does the link come from a site on a different IP address?
    i) Does the link come from a site on a different A, B, or C block?
    j) Does the link come from a site on an IP address registered by a different Hosting Provider?
    k) How old is the site you got the link from?
    l) How old is the page you got the link from?
    m) How old is the link itself?
    n) What is the keyword density of all inbound links; is it skewed to a single keyword? If it is skewed, is the link text the name of your domain which might be considered natural?
    o) Does the link come from a "bad neighborhood"? Most believe that because you can't control who links to you, you can't be penalized for "bad" links. That does make sense but again... you never know.
    p) Are your links ROS (run of site) links from another site?
    q) Are your links reciprocal links? (you link to them and they link to you)

    3) Registration:
    a) How long ago was the domain registered?
    b) For how many years IS the domain registered?
    c) Has the domain changed ownership, if so when and have there been any major changes to the site?
    d) What is the domain name server that the domain is currently using?

    4) Indexing:
    a) How many total pages have been indexed?
    b) What is the growth rate of the site from 1st, 2nd indexing etc.? Has it stopped growing? Does the site appear dead?
    c) How old is the site from first indexing (as opposed to registration date)?

    5) Directories: Is the site in DMOZ or another directory? I don't know if they look at this anymore, but who knows.

    6) Clicks: There has been debate as to whether the CTR in the search results itself can move you up the rankings. It may stand to reason that the more sites are clicked the more Google sees them as relevant and therefore should be ranked higher but there is NO PROOF of this that I know of.
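
    Two of the inbound-link questions above (different C blocks, and anchor text skewed to a single keyword) are easy to turn into numbers. This is only a sketch under my own assumptions: the link records are invented, the IPs are from the reserved documentation ranges, and "C block" is taken to mean the first three octets of an IPv4 address.

```python
from collections import Counter


def c_block(ip):
    """Return the first three octets of a dotted IPv4 address, e.g. '192.0.2'."""
    return ".".join(ip.split(".")[:3])


def distinct_c_blocks(links):
    """Count how many different C blocks the linking hosts sit on."""
    return len({c_block(link["ip"]) for link in links})


def top_anchor_share(links):
    """Share of inbound links that use the single most common anchor text."""
    if not links:
        return 0.0
    counts = Counter(link["anchor"].lower() for link in links)
    return counts.most_common(1)[0][1] / len(links)


# Made-up inbound links; the IPs come from documentation-only ranges.
links = [
    {"ip": "192.0.2.10", "anchor": "blue widgets"},
    {"ip": "192.0.2.77", "anchor": "Blue Widgets"},
    {"ip": "198.51.100.5", "anchor": "example.com"},
    {"ip": "203.0.113.9", "anchor": "blue widgets"},
]
print(distinct_c_blocks(links))  # 3
print(top_anchor_share(links))   # 0.75
```

    Four links from three C blocks, with 75% of them using the same anchor text. Whether a search engine would read that as natural or as skewed is exactly the part nobody outside knows.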




    I don't know if I've covered everything, but I did my best from what was on the top of my mind. As stated at the top, I'm assuming this is used to help people better optimize their sites for Google to rank them in the appropriate categories. It may also help people know what NOT to do. I guess we don't have any idea what they REALLY look at, but the factors above would probably cover a good number of the possible variables.
     
    speakerwire, Mar 1, 2006 IP
    Option6, sligowaths, Jat and 2 others like this.
  8. Caydel

    Caydel Peon

    Messages:
    835
    Likes Received:
    47
    Best Answers:
    0
    Trophy Points:
    0
    #8
    speakerwire - good job. I can't think of any that aren't on that list, and it does a great job of summarizing everything we know about Google.

    Great Post!
     
    Caydel, Mar 1, 2006 IP
  9. longcall911

    longcall911 Peon

    Messages:
    1,672
    Likes Received:
    87
    Best Answers:
    0
    Trophy Points:
    0
    #9
    We delay by 2 or 3 months any likely change in a page's doc score that might result from changes to the page that are within the web developer's control.

    In that way, we make it nearly impossible to tie any one effect to any one cause.
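
    As a toy illustration of that delay idea: keep a history of score changes and only expose a change once it is old enough. The 75-day delay, the function name, and the sample history are assumptions made up for this sketch.

```python
from datetime import date, timedelta

DELAY = timedelta(days=75)  # assumed stand-in for "2 or 3 months"


def visible_score(history, today, delay=DELAY):
    """Return the latest score whose change date is at least `delay` old.

    `history` is a list of (change_date, score) pairs sorted by date.
    Returns None if no change is old enough yet.
    """
    visible = None
    for changed_on, score in history:
        if today - changed_on >= delay:
            visible = score
    return visible


# Made-up history: the page was reworked on Feb 1, 2006.
history = [(date(2005, 10, 1), 0.40), (date(2006, 2, 1), 0.70)]
print(visible_score(history, date(2006, 3, 1)))  # 0.4, the rework is still hidden
print(visible_score(history, date(2006, 5, 1)))  # 0.7, the rework now shows
```

    Anyone watching the rankings on Mar 1 would see no effect from the Feb 1 change, and by May 1 they have probably changed five other things as well.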


    /*tom*/
     
    longcall911, Mar 1, 2006 IP
  10. lorien1973

    lorien1973 Notable Member

    Messages:
    12,206
    Likes Received:
    601
    Best Answers:
    0
    Trophy Points:
    260
    #10
    Did someone say tinfoil hat?!??!

     
    lorien1973, Mar 1, 2006 IP
    classifieds and Las Vegas Homes like this.
  11. dilipsam

    dilipsam Well-Known Member

    Messages:
    606
    Likes Received:
    28
    Best Answers:
    0
    Trophy Points:
    135
    #11
    Last year, my friend attended an interview at Google. He came back home disillusioned and shocked. Actually, Google was looking for people who know a little about SEO but are not SEOs. He was asked how many factors Google relies on when ranking a particular web page. Our man confidently answered, "Around 100 factors". They said the number was not even close. In fact, it was 600 factors.

    600 factors!!! Would you believe that?????

    Since last year (well, at least now and then) I have been beating my brains about these 600 factors. Maybe the design and the code contribute to them too.

    Regards,
    Dilip Samuel
     
    dilipsam, Mar 1, 2006 IP
  12. sparkimarki

    sparkimarki Peon

    Messages:
    105
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #12
    Fantastic post speakerwire - I counted about 29 - if it really is 600 as suggested by dilipsam, what could the others be, and how would we choose to weight them?

    And would it work?
     
    sparkimarki, Mar 1, 2006 IP
  13. Brad Callen

    Brad Callen Peon

    Messages:
    854
    Likes Received:
    50
    Best Answers:
    0
    Trophy Points:
    0
    #13
    It makes you wonder if they have a group of people just sitting in a room thinking of ways to rank a site.

    "ooh ooh I got number 601, let's rank a site on if it shows the weather which was updated 2 days ago just before midnight"

    lol

    But seriously, they must have some sort of team that just does that sort of thing.

    They could be called "Search Squad - Team Google".

    lmao

    Brad
     
    Brad Callen, Mar 1, 2006 IP
  14. FujitsuBoy

    FujitsuBoy Guest

    Messages:
    54
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    0
    #14
    :D Haha

    Excellent list speakerwire, I can think of only a couple more:

    • Number of times added to bookmarks (GToolbar)
    • Number of times removed from search results (Personal Homepage)
    • Has the URL been mentioned in Gmail messages
    • Number of people subscribed to RSS feed (Gmail)
    • Is it linked to from a well known "topic authority" (basically a directory)
    • Possibly language, foul language and spelling
     
    FujitsuBoy, Mar 2, 2006 IP
  15. speakerwire

    speakerwire Peon

    Messages:
    61
    Likes Received:
    9
    Best Answers:
    0
    Trophy Points:
    0
    #15
    I think many of the ones I listed were "high level" factors. Each one could be broken down into smaller, more defined variables but I didn't think people would want to read a 5 page post. Even still I'd only be able to come up with 150 at most. 600 is a lot but if I let my mind really wander, I guess I could get pretty out there. For example, not just the topic, density, alternate topics, content, and page rank of the site that is linking to you, but also the sites that link to the sites that link to you and age variables for each of those. I suppose you COULD get pretty nuts with it.
     
    speakerwire, Mar 2, 2006 IP
  16. rehash

    rehash Well-Known Member

    Messages:
    1,502
    Likes Received:
    30
    Best Answers:
    0
    Trophy Points:
    150
    #16
    Back-engineering usually refers to things that you run on your own computer, so you have access to the raw code, which you will eventually decode. But Google runs remotely :D

    And finally, when the algo is good enough you can even make it public, because it is so good that only useful, natural sites will rank well.

    And one more thing: most algos are recursive. Say site B links to site A. In order to find out about site A, you need to know site B's theme and backlinks, so you get to sites C and D and so on. All this requires a huge database and huge CPU power, which only Google has.
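
    The recursion described here is the idea behind PageRank as originally published: a page's score depends on the scores of the pages linking to it, which depend on their own backlinks, and so on. Below is a simplified power-iteration sketch on a four-page toy graph. The damping value of 0.85 comes from the original paper; the graph, the iteration count, and the handling of pages without outbound links are my own choices, and this is certainly not what Google runs in production.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the small "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's damped rank across the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Toy graph: B and C link to A, C also links to B, D links to C.
graph = {"B": ["A"], "C": ["A", "B"], "D": ["C"], "A": []}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # A
```

    On four pages this runs instantly. On the whole web, every iteration touches every link, which is where the huge database and CPU power come in.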
     
    rehash, Mar 2, 2006 IP
  17. dsm56

    dsm56 Active Member

    Messages:
    863
    Likes Received:
    27
    Best Answers:
    0
    Trophy Points:
    78
    #17
    I remember reading somewhere on Google's site about PR that when a search query is looked up, Google considers around 2 million variables about every site.

    So I doubt it would be easy to back engineer it :)
     
    dsm56, Mar 2, 2006 IP
  18. CharlieHorse

    CharlieHorse Peon

    Messages:
    16
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #18
    Not sure I saw:

    Spelling?
    Grammar?
    Page Layout (nav, color, etc)?
    How does the content relate to current events?
    Does the page have AdSense?
    Does it use AdWords?

    Great list speakerwire. A fantastic starting point. OK now who's gonna keep track of all the suggestions? :)
     
    CharlieHorse, Mar 2, 2006 IP
  19. corinaw

    corinaw Not Banned

    Messages:
    486
    Likes Received:
    69
    Best Answers:
    0
    Trophy Points:
    0
    #19
    I've been thinking lately in terms of "trust" in ranking a site.

    Privacy policy available
    Email address or feedback form available
    Physical address listed (not just a PO box)
    Phone number available
    Toll free phone number
     
    corinaw, Mar 2, 2006 IP
  20. Basoone

    Basoone Peon

    Messages:
    544
    Likes Received:
    18
    Best Answers:
    0
    Trophy Points:
    0
    #20
    Nice post. Here are some of my thoughts:

    Does hidden text exist?
    Does keyword stuffing exist?
    Cloaking?
     
    Basoone, Mar 2, 2006 IP