Speakerwire - I started to update your list (which is excellent) but made it look a bit of a mess! Please, if you have time, would you update your list with all the other ideas as the compiled list will really make excellent info for others. Thanks to all for some really great responses and any other ideas would be good to add to the list.
Sure, I'll create more defined categories. I have a lot more that I had thought of that are variations of the ones I listed.
This thread was originally discussed as a way to reverse-engineer Google's algorithms. As factors began to be discussed, it became apparent that it doesn't just pertain to Google but all search engines and should therefore be in the SEO forum. The original thread is here: http://forums.digitalpoint.com/showthread.php?t=62060

So, here is the current list that I have compiled. Whether or not all of these ARE factors, it is a good exercise to think of and list POSSIBLE factors. I'm sure each of these could be drilled down even more but, to keep things somewhat manageable, I tried to think of the major things that could affect the ranking algorithms. The information is too large for a single post so I've broken it up.

On-Page Factors:

1) Page Title:
1.1) Keyword Density – how often does the keyword show up?
1.2) Keyword Proximity – what position in the title does it show up?
1.3) Keyword Accuracy – is it the exact keyword/phrase or are there additional words included?
1.4) Keyword Relevance - is the keyword(s) relevant to the rest of the content on the page or the rest of the site?
1.5) Symbols – are other symbols, such as a vertical bar or comma, used in the title to emphasize or delineate?
1.6) Foreign – are foreign characters used?
1.7) Unique Per Page – is the title unique or is it the same on all pages of the site?
1.8) Unique Per Site – is the title, in whole or format, duplicated from another site (specifically another highly ranked site in the same category or topic)?
1.9) Keyword Stuffing – is the title informative and does it follow a natural site structure, or is it simply a bunch of stuffed keywords?

2) Meta Keywords:
2.1) Keyword Density – how often does the keyword show up?
2.2) Keyword Proximity – what position in the Meta Keywords does it show up?
2.3) Keyword Accuracy – is it the exact keyword/phrase or are there additional words included?
2.4) Keyword Relevance - is the keyword(s) relevant to the rest of the content on the page or the rest of the site?
2.5) Symbols – are other symbols, such as a vertical bar or comma, used in the Meta Keywords to emphasize or delineate?
2.6) Foreign – are foreign characters used?
2.7) Unique Per Page – is the Meta Keywords unique or is it the same on all pages of the site?
2.8) Unique Per Site – is the Meta Keywords duplicated from another site (specifically another highly ranked site in the same category or topic)?
2.9) Keyword Count – how many Meta Keywords are there? Over 5? Over 10? With too many keywords, does the page appear too generic and less relevant?

3) Meta Description:
3.1) Keyword Density – how often does the keyword show up?
3.2) Keyword Proximity – what position in the Meta Description does it show up?
3.3) Keyword Accuracy – is it the exact keyword/phrase or are there additional words included?
3.4) Keyword Relevance - is the keyword(s) relevant to the rest of the content on the page or the rest of the site?
3.5) Symbols – are other symbols, such as a vertical bar or comma, used in the Meta Description to emphasize or delineate?
3.6) Foreign – are foreign characters used?
3.7) Unique Per Page – is the Meta Description unique or is it the same on all pages of the site?
3.8) Unique Per Site – is the Meta Description duplicated from another site (specifically another highly ranked site in the same category or topic)?
3.9) Keyword Stuffing – is the Meta Description a complete thought or sentence or is it a bunch of stuffed keywords?

4) TAGS
4.1) H1-H6: May not be looked at as much anymore but it probably still has some weight. If you are counting variables, you could multiply this group by 6 (the standard number of H tags).
4.1.1) Proximity - Is the tag towards the top of the page or buried at the bottom?
4.1.2) Keyword Accuracy – is it the exact keyword/phrase or are there additional words included?
4.1.3) Word Count – How many words are in the H tag? If the purpose is to be a Heading then a reasonable number of words should be expected.
If there are 200 words in the H tag, then is it really a heading?
4.1.4) Content – is there content that follows the heading? If you have 4 headings in a row then are they really describing anything?
4.1.5) Relevance – does the heading accurately describe the subsequent content to follow?
4.1.6) Nesting - Is there a clear hierarchy structure of H tags? Does H2 come after H1 and so on?
4.1.7) Format – is the CSS reducing the size and weight of the H tags? If so, how important is that H tag as a heading?

4.2) Bold/Strong Tags:
4.2.1) Keyword – is the keyword bolded?
4.2.2) Keyword Accuracy – is it the exact keyword/phrase or are there additional words included?
4.2.3) Word Count – how many words are bolded? Like the H tags, if you have 200 words within the bold tags, does it really signify importance?
4.2.4) Legacy – do they signify a difference between Bold and Strong?
4.2.5) Proximity – where in the bolded set does the keyword appear?
4.2.6) Format – is the CSS changing the weight of the bolded tags, thereby making them un-bolded?

4.3) Italic Tags:
4.3.1) Keyword – is the keyword italicized?
4.3.2) Keyword Accuracy – is it the exact keyword/phrase or are there additional words included?
4.3.3) Word Count – how many words are italicized?
4.3.4) Proximity – where in the italicized set does the keyword appear?
4.3.5) Format – is the CSS shifting the font style back to normal?

4.4) Underline Tags:
4.4.1) Keyword – is the keyword underlined?
4.4.2) Keyword Accuracy – is it the exact keyword/phrase or are there additional words included?
4.4.3) Word Count – how many words are underlined?
4.4.4) Proximity – where in the underlined set does the keyword appear?
4.4.5) Format – is the CSS shifting the font style back to normal?

4.5) Font Tags:
4.5.1) Keyword – is the keyword within the font tag?
4.5.2) Keyword Accuracy – is it the exact keyword/phrase or are there additional words included?
4.5.3) Font Parameters – is the weight and size of the font changed rather than using other tags like H or Bold?
4.5.4) Font Color Type – is the font using a standard description (i.e. black) or a hex color (#000000)?
4.5.5) Font Color Relation – is the font color the same as the background color or is it within a given hex range to make it illegible?
4.5.6) Word Count – how many words are within the font tag?
4.5.7) Proximity – where in the font word set does the keyword appear?
4.5.8) Format – is the CSS shifting the font style back to normal?

4.6) List Tags:
4.6.1) Keyword – is the keyword within the list elements?
4.6.2) Keyword Accuracy – is it the exact keyword/phrase or are there additional words included?
4.6.3) Word Count – how many words are within the list elements?
4.6.4) List Count – how many list elements <li> are there?
4.6.5) List Type – is it an ordered or unordered list? If it is ordered does the order signify importance?
4.6.6) Proximity – where in the list set does the keyword appear?
4.6.7) Format – is the CSS editing or positioning the list in any way?

4.7) Paragraph Tags:
4.7.1) Paragraph Count – how many paragraphs are there? Does your content appear choppy because you have a paragraph every other sentence?
4.7.2) Keyword Accuracy – is the exact keyword/phrase found?
4.7.3) Implementation – are <p> tags used correctly, with an ending </p>, instead of the <br><br> combination?
4.7.4) Keyword – is the keyword within the P tag?
4.7.5) Word Count – how many words are within paragraphs?
4.7.6) Proximity – where in the paragraph set does the keyword appear?
4.7.7) Context – in what context does the keyword appear? Does the overall theme of the paragraph match that of the page and keyword? So if you have a site on gardening and a page on planting vegetables, it might tip off the search engine if suddenly there was a paragraph on motorcycles. Is it an ad? Was it paid for? What is it doing there?
4.7.8) Format – is the CSS editing the paragraph in any way?

4.8) Span Tags:
4.8.1) Keyword – is the keyword within the span tag?
4.8.2) Keyword Accuracy – is the exact keyword/phrase found?
4.8.3) Word Count – how many words are within the span tag?
4.8.4) Proximity – where in the span word set does the keyword appear?
4.8.5) Format – is the CSS editing or positioning the span tag in any way?
4.8.6) Background Color - if the background color of the contained text is changed (like to yellow to make it stand out) does that make it more important?

4.9) Div Tags:
4.9.1) Keyword – is the keyword within the div tag?
4.9.2) Keyword Accuracy – is the exact keyword/phrase found?
4.9.3) Word Count – how many words are within the div tag?
4.9.4) Proximity – where in the div word set does the keyword appear?
4.9.5) Format – is the CSS editing or positioning the div tag in any way?

4.10) Comment Tags: I doubt they are looked at but it is a factor nonetheless.
4.10.1) Keyword – is the keyword within the tag?
4.10.2) Keyword Accuracy – is the exact keyword/phrase found?
4.10.3) Word Count – how many words are within the tag?
4.10.4) Proximity – where in the word set does the keyword appear?
4.10.5) Format – is the CSS editing or positioning the tag in any way?

4.11) Summary, Title, and Alt attributes: Summary attributes are normally on tables, title attributes are normally on links, and alt attributes are usually on images. All are used to describe elements but they have also been used to stuff keywords, and who knows if this is seen as a positive or negative in the eyes of search engines.
4.11.1) Keyword – is the keyword within the attribute?
4.11.2) Keyword Accuracy – is the exact keyword/phrase found?
4.11.3) Word Count – how many words are within the attribute?
4.11.4) Proximity – where in the word set does the keyword appear?
4.11.5) Format – is the CSS editing or positioning the containing element in any way?

4.12) Other Tags: (pre, xmp, dir, acronym, abbr, blockquote, caption, title…..)
Some of these are deprecated or obsolete.
4.12.1) Importance/Clarification – most tags are used to signify importance of a word or group of words. Other times the tag is used to clarify what the containing text is such as the <title> or <acronym> tags. The search engines could look at any of these but for the sake of me not listing 50+ more tags, let’s just say that many of the same variables may apply including Keyword, Word Count, Accuracy, Proximity, and Format.

4.13) Nested Tags: an example of this is: <b><i><u>text</u></i></b>
4.13.1) Compounded Importance – if the keyword is nested within multiple tags does it make it any MORE important?
4.13.2) Over-emphasized – can you over emphasize something in the eyes of the search engines?

5) Content/Text:
5.1) Content Block - is the content of the page located within a main element (like a complete article) or is it broken up (like short descriptions found on web results or news clippings)?
5.2) Content Block Number – how many different content blocks are there? Does the use of multiple DIVs or Tables make the search engines treat each container as a separate block of content (even if it may not be displayed that way to the visitor)?
5.3) Content Length – how long is the content? Too short and it may not be considered valuable. Too long and you might want to break it up into separate pages.
5.4) Content Relation – is the content related to the keywords found in the Title, Meta and other tags?
5.5) Unique - is the content considered unique or is it duplicated from somewhere else? If it is duplicated is there a degree of infringement? Meaning, if it is a feed from the AP News Wire, it may not be as valuable as original content but it may not be valueless either.
5.6) Legibility - is the content legible or is it gibberish (like just a random set of words)?
5.7) Content Refresh:
5.7.1) Content Refresh Check - is the content on the page updated?
5.7.2) Content Refresh Frequency - how often does the content update?
5.7.3) Content Refresh Percentage - how much of the content updates?
5.7.4) Dynamic Content Check - does the content update TOO often (like every page load)?

6) Page Functionality/Attributes/Misc:
6.1) ActiveX – does the page use ActiveX controls?
6.2) Validation – does the page validate to W3C standards?
6.3) Java – does the page have any Java Applets?
6.4) Javascript – does the page use Javascript? What elements are affected?
6.5) Flash – does the page use Flash?
6.6) Misspellings – are there any misspellings or words not found in the unabridged dictionary? What is the percentage of misspellings?
6.7) Robots.txt – does the page have a robots.txt file?
6.8) Other Meta Data – does the page contain any additional meta data including geo information, author, or indexing parameters?

7) Site Functionality/Attributes/Misc:
7.1) Contact Info – does the site have any contact info which may include a support form or email address and a physical address?
7.2) Sitemap – does the site have a clear sitemap or a page that shows a hierarchy of the site?
7.3) SE Specific Sitemap – does the site have a Google Sitemap, Yahoo Sitemap, etc.?
7.4) RSS feed – does the site have an RSS file?
7.5) RSS Subscriptions – how many people have subscribed to the RSS feed? Google, Yahoo, or MSN could track this in their own network.

8) On Page Links: Are all links created equal? If search engines are attempting to place a value on one link over another, here are a few things they may be looking at.
8.1) Link Section - are the links found within a Navigation or Footer element on a page? Is less value given to those links?
8.2) Internal Links - are the links internal to the domain?
8.3) Internal Link Percentage – what percentage of the links on the page are internal links?
8.4) Internal Link Type – are the links to internal pages text links or image links? What is the percentage of each?
8.5) Internal Cross Linking – do the internal pages being linked to link back to the originating page?
8.6) External Links – are the links leaving the site?
8.7) External Link Percentage – what is the percentage of the links on the page that leave the site?
8.8) External Ad Links - are the links leaving the site to an ad network?
8.9) External Link Type - are the links leaving the site text links or image links? What is the percentage of each?
8.10) External Link Destination:
8.10.1) External Link Bad Destination - are you linking to a "bad neighborhood"? You can control who you link to and linking to a site that Google has banned or thinks is low quality might affect their view of your site.
8.10.2) External Link Good Destination - are you linking to an "Authority Site"? Is the Authority Site in your topic?
8.10.3) External Link Site Age – what is the age of the external site you are linking to?
8.10.4) External Link Page Age – what is the age of the external page you are linking to?
8.10.5) External Link Page PR – what is the Page Rank of the external page you are linking to?
8.10.6) External Link Relevance – is the external page you are linking to on the same topic as the page that is linking?
8.10.7) External Link Text Relevance – does the topic of the external page you are linking to match the link text used in the link?
8.10.8) External Link Hosting – is the external page you are linking to located on the same IP address? A, B, C block? Hosting Provider?
8.10.9) External Link Meta – are the Meta tags on the external page you are linking to similar to your page? Are they identical?
8.10.10) External Link Title – is the Title of the external page you are linking to similar to your page? Is it identical?
8.10.11) External Link Domain – does the domain name of the external site you are linking to use the same DNS server?
8.10.12) External Link Domain Age - was the domain you are linking to registered on the same day as your domain?
8.10.13) External Link Domain Life – how long is the domain you are linking to registered for?
8.11) External Related Link - are the links leaving the site to another related site?
8.12) Content Links - are the links located within the content or do they stand alone?
8.13) List Links – are the links located within a list tag? Is it an ordered list that might signify importance?
8.14) Emphasized Links – is there any emphasis applied to the links using tags such as bold, italic, underline, or heading?
8.15) Link Count – How many total links are there on the page?
8.16) Nofollow Links – do the links have any nofollow attributes?
8.17) Javascript – do the links use javascript?
8.18) Link Target – what is the target for the link? Does it open a new window?
8.19) Link Age – how old are the links? How old are they in relation to the age of the page?
8.20) Link Refresh – have the links been updated? How often do they update? Every page load?

9) Page Rank/Indexing:
9.1) Page Rank - what is the page rank of the specific page?
9.2) Page Rank Growth Rate - how fast has the page’s Page Rank increased?
9.3) Page Rank Contribution – what percentage of the page’s Page Rank came from internal pages or directly from external sources?
9.4) Home Page PR - what is the page rank of the site's home page? Not sure if this is a major factor but it could have a general "value" for the entire site that then passes to any sub-page located within that site. Meaning, would a page on CNN rank higher just because it is part of CNN?
9.5) Page Age - how long has the page been indexed?
9.6) Page Indexing History – how many times has the page been indexed?
9.7) Page Indexing Frequency – how often is the page indexed?
9.8) Page Indexing Speed – how fast was the page indexed from the point where the site was first found?
9.9) Page Discovery – how was the page discovered? Internal Sitemap? Internal link? External Link? RSS Feed? SE Submission? Blog Post?
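To make the recurring Keyword Density / Proximity / Accuracy items above a bit more concrete, here is a rough sketch of how someone might put numbers on them for a title or any other block of text. This is only one possible way to quantify them, not how any search engine is known to do it. The function names (tokenize, keyword_metrics), the word-splitting rule, and the sample title are all my own assumptions for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (symbols such as | are dropped)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_metrics(text, phrase):
    """Return illustrative density, proximity, and accuracy numbers for one phrase.

    Assumptions (not from any search engine documentation):
    - density  = words belonging to exact phrase matches / total words
    - proximity = zero-based word index of the first exact match
    - accuracy = whether the exact phrase appears as consecutive words
    """
    words = tokenize(text)
    target = tokenize(phrase)
    if not target:
        raise ValueError("phrase must contain at least one word")
    n = len(target)
    # Start indexes where the exact phrase occurs as consecutive words.
    hits = [i for i in range(len(words) - n + 1) if words[i:i + n] == target]
    density = (len(hits) * n / len(words)) if words else 0.0
    return {
        "density": density,
        "first_position": hits[0] if hits else None,
        "exact_match": bool(hits),
        "word_count": len(words),
    }


# Hypothetical title, made up for the example.
print(keyword_metrics("Cheap Garden Tools | Garden Tools Online", "garden tools"))
```

The same function could be pointed at a Meta Description, an H tag, or link text, which is why those items repeat under so many headings in the list.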
Off-Page Factors:

10) Domains:
10.1) Domain Keyword - is the optimized keyword located within the domain name?
10.2) Domain Keyword Proximity – if the keyword is present in the domain name, is it the first word? Where is it located?
10.3) Domain Keyword Accuracy – is the exact keyword/phrase found in the domain name?
10.4) Dashed Domain - does the domain have dashes in it? I'm sure they are smart enough to know that people stuff keywords into domains with dashes. The question is whether it has a positive or negative effect.
10.5) AName Domain - is the page being ranked on an "A-Name" or root level domain? (aname. domain.com) or (www. domain.com)

11) Inbound Links: Many people believe that links are extremely important but there are likely a lot of variables they look at to determine the value of an inbound link.
11.1) Inbound Link Domain - does the inbound link come from a different Domain?
11.2) Inbound Link Type - does the inbound link come from an image or text?
11.3) Inbound Link Section – did the inbound link come from a navigation or footer link?
11.4) Inbound Link Keyword - does the inbound link have the optimized keyword in the link text?
11.5) Inbound Link Accuracy – is the exact keyword/phrase contained in the link text? Any additional words?
11.6) Inbound Link Proximity – if the keyword/phrase is present in the link text, where is it located within the text?
11.7) Inbound Link Word Count – how many words are located in inbound link’s anchor text?
11.8) Inbound Link Site Relevance - does the inbound link come from another site that is about your topic (or related topic)?
11.9) Inbound Link Page Relevance - does the inbound link come from another page that is about your topic?
11.10) Inbound Link Age – how old is the link?
11.11) Inbound Link Page Age – how old is the page that you got the link from?
11.12) Inbound Link Page Index Frequency – how often is the page indexed where the inbound link came from?
11.13) Inbound Link Page Index Speed – how fast was the page where the inbound link came from indexed since the site was first found?
11.14) Inbound Link Page History – how many times has the page been indexed where the inbound link came from?
11.15) Inbound Link Page Discovery – how was the page that the inbound link came from discovered?
11.16) Inbound Link Page PR – what is the Page Rank of the page that the inbound link came from?
11.17) Inbound Link Page PR Growth – how fast has the page where the inbound link came from grown in Page Rank?
11.18) Inbound Link Text Relevance – does the topic of the page where the inbound link came from match the link text used in the link?
11.19) Inbound Link Page Meta – are the Meta tags on the page where the inbound link came from similar to your page? Are they identical?
11.20) Inbound Link Page Title – is the Title of the page where the inbound link came from similar to your page? Is it identical?
11.21) Inbound Link Site IP Address – is the site where the inbound link came from on a different IP address than your site?
11.22) Inbound Link Site IP Block – is the site where the inbound link came from on a different A, B, or C block?
11.23) Inbound Link Hosting Provider – is the site where the inbound link came from on a different hosting provider?
11.24) Inbound Link Domain Age - what is the age of the site where the inbound link came from?
11.25) Inbound Link Domain Age Match – was the site where the inbound link came from registered on the same day as your domain?
11.26) Inbound Link Domain Life – how long is the site where the inbound link came from registered for?
11.27) Inbound Link Deep Link Percentage – what percentage of inbound links go to the homepage and what percentage are deep links into sub-pages of your site?
11.28) Inbound Link Keyword Density - what is the keyword density of all inbound links; is it skewed to a single keyword? If it is skewed, is the link text the name of your domain which might be considered natural?
11.29) Inbound Link Bad Destination – is the inbound link coming from a "bad neighborhood"? Most think you cannot get penalized for people linking to you. That does make sense but you never know.
11.30) Inbound Link Good Destination – is the link coming from an "Authority Site"? Is the Authority Site in your topic?
11.31) Inbound ROS Links - are your links ROS (run of site) links from another site?
11.32) Inbound Reciprocal Links – is the inbound link a reciprocal link? (you link to them and they link to you)
11.33) Advanced Link Exchanges – are you getting a link from Site A, who got a link from Site B, who you linked to?
11.34) Inbound Link Site Popularity – how popular is the site that linked to you? In terms of traffic or clicks in the SERPs?
11.35) Inbound Link Site RSS subscriptions – how many RSS subscriptions does the site that linked to you have?
11.36) Inbound Link Site Dominance – how long has the site that linked to you been ranked in the Top 10 in your keyword? Top 50? Top 1000?
11.37) Inbound Link Site Advertising – does the site that linked to you have an AdWords/Overture/AdCenter account?
11.38) Inbound Link Site Ad Network – does the site that linked to you have an AdSense/YPN account?
11.39) Inbound Link Site Analytics – does the site that linked to you use Google Analytics?

12) Domain Registration:
12.1) Domain Age - how long ago was your domain registered?
12.2) Domain Life - for how many years IS your domain registered?
12.3) Domain History - has the domain changed ownership? If so, when, and have there been any major changes to the site?
12.4) Domain DNS - what is the domain name server (DNS) that your domain is currently using?

13) Indexing:
13.1) Site Indexing - how many total pages have been indexed on your site?
13.2) Site Growth Rate – what is the growth rate of your site from 1st, 2nd indexing etc.? Has it stopped growing? Does the site appear dead?
13.3) Site Index Age - how old is the site from first indexing (as opposed to registration date)?

14) Directories:
14.1) Site Directory - is the site in DMOZ or another directory? I don't know if they look at this anymore but who knows.

15) Site Popularity:
15.1) Site Search Engine CTR - there has been debate as to whether the CTR in the search results themselves can move you up the rankings. It may stand to reason that the more a site is clicked on in the SERPs, the more Google sees it as relevant and the higher it should therefore be ranked, but there is NO PROOF of this that I know of.
15.2) Site Traffic – what is the traffic of your site? I doubt Google looks at Alexa but other search engines might. And if you are using Google Analytics they could use that information.

16) Other Factors:
16.1) Search Engine Removals – number of times your listing has been removed from the search results or blocked by people.
16.2) Site Reports – the number of negative reports or inquiries your site has had.
16.3) Site Bookmarks – number of times your site has been bookmarked using any Toolbar Application.
16.4) Site Privacy Trust – does your site have a privacy policy?
16.5) Site Secure – does your site have a valid secure certificate?
16.6) Site Email – is the email server that the site uses associated with spam?
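For the IP block items (8.10.8 and 11.22), here is a small sketch of how two hosts could be compared. It assumes the common SEO shorthand where the "A", "B", and "C" blocks are the first one, two, and three octets of an IPv4 address. That reading, the function name shared_block, and the sample addresses (reserved documentation ranges) are my assumptions, not anything a search engine has confirmed.

```python
import ipaddress


def shared_block(ip_a, ip_b):
    """Report the largest leading-octet block two IPv4 addresses share.

    Returns "none", "A", "B", "C", or "same IP" depending on whether the
    first 0, 1, 2, 3, or all 4 octets match.
    """
    octets_a = ipaddress.IPv4Address(ip_a).packed  # 4 raw bytes
    octets_b = ipaddress.IPv4Address(ip_b).packed
    shared = 0
    for x, y in zip(octets_a, octets_b):
        if x != y:
            break
        shared += 1
    return ["none", "A", "B", "C", "same IP"][shared]


# Hypothetical addresses from the reserved documentation ranges.
print(shared_block("192.0.2.10", "192.0.2.200"))   # same first three octets
print(shared_block("192.0.2.10", "198.51.100.7"))  # nothing in common
```

Two sites that link to each other and come back as "C" or "same IP" would be the kind of pattern those items are asking about.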
It has become apparent that this doesn't just pertain to Google but all search engines and should therefore be in the SEO forum. I've posted the new list here: http://forums.digitalpoint.com/showthread.php?t=62899
I am confused as to why the Mods moved my post back to this thread. It does not belong in the Google Forum as it is not just about Google. These are search engine variables for ANY search engine.
Thanks everybody for all your comments - I think a great list has been produced which has given us a possible insight into how Google really works. I think it is more relevant to Google than to any other search engine, as we all know how difficult it is to be successful in Google without a lot of work. Any other suggestions would be welcomed so we can produce the most definitive list of how we think Google's ranking algorithms are made up. Thanks.
It is difficult to be successful in any search engine without a lot of work. With all due respect, when I created this list it was meant to be a compilation of search engine variables, not Google variables. These are factors that ANY search engine might look at and/or criteria someone would use if they were creating their own ranking algorithm. That is why we call it S.E.O. (Search Engine Optimization) not G.O. (Google Optimization). I suppose I am just concerned that this list, while extremely relevant and valuable to search engine optimization, may not be seen unless someone ventures into the Google Forum. Anyway, it was a good exercise and it gave me a headache for several hours just thinking about it.
In reality, how many people actively optimize for engines other than Google? So, maybe SEO is more accurately called GO. In looking at your list - it seems very comprehensive and a good start at reverse engineering. Are you looking at this as an academic exercise?
Hi Phynder, interesting point regarding Google (GO - that could catch on!). SEO is dead - long live GO!
Ok, so I was just counting up the variables listed. And obviously I didn't list everything. There are always going to be some we missed. But there are approx 210 On-Page variables listed. If you add in the 30 or so additional tags (pre, xmp, dir, acronym, abbr, blockquote, caption, title...) it comes to approx 360 On-Page variables total. There are approx 70 Off-Page variables listed. Depending on how far back they go (like looking not just at the sites that link to you but also the sites linking to the sites linking to you), that number could be multiplied by about 3, which would make it somewhere around 210 Off-Page variables total.
So the total algorithm variables listed are about: 280
The total algorithm variables accounted for (only based on this list) is: 570
That seems close to the 600 that someone quoted.
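The tally above can be checked in a few lines. One assumption is needed to make the numbers add up: the post does not say how many variables each of the 30 extra tags contributes, but 5 per tag (Keyword, Word Count, Accuracy, Proximity, and Format, as listed under 4.12.1) is the figure that takes 210 up to 360, so that is what this sketch uses.

```python
# Counts taken from the post above (all approximate).
on_page_listed = 210
off_page_listed = 70
extra_tags = 30
depth_multiplier = 3  # sites linking to you, plus the sites linking to those sites

# Assumption: each extra tag carries the 5 shared variables named in 4.12.1
# (Keyword, Word Count, Accuracy, Proximity, Format).
variables_per_tag = 5

on_page_total = on_page_listed + extra_tags * variables_per_tag
off_page_total = off_page_listed * depth_multiplier

total_listed = on_page_listed + off_page_listed
total_accounted = on_page_total + off_page_total

print("On-Page total:", on_page_total)
print("Off-Page total:", off_page_total)
print("Listed:", total_listed)
print("Accounted for:", total_accounted)
```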
That's just it, Google is the one everyone wants so people are more willing to ignore MSN and Yahoo in their quest to win favour with Google. Good list.
I was thinking of creating a tool that can tell (of course not 100%, I hope 50%) why a page is ranked better than another. But Google shows only 1000 results, so we can't analyze all the backlinks Google has in its index.
I've always thought it would be interesting to take a list of possible metrics like the ones speakerwire mentioned, somehow quantify each of these and get some known inputs and SERP results for say 10,000 sites. Put this data into a spreadsheet, punch it into a neural network and you should be able to make reasonably accurate predictions, and see the weight given to each variable. I'd say it would be an absolute pain in the arse to implement (especially quantifying things it's difficult to put a number to), but I think it would answer so many questions about weight given to different factors.
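A toy version of that idea might look like the sketch below. To be clear about what is swapped in: this uses a plain linear model fitted by gradient descent rather than a neural network, because the learned weights can then be read off directly as "weight given to each variable". Everything here is made up for illustration: the three feature columns, the "true" weights, and the data are all synthetic, and nothing in it reflects real SERP data or any real ranking algorithm.

```python
import random


def fit_linear_weights(rows, targets, lr=0.5, epochs=1000):
    """Fit score = sum(weight * feature) by batch gradient descent on squared error."""
    n, k = len(rows), len(rows[0])
    weights = [0.0] * k
    for _ in range(epochs):
        grad = [0.0] * k
        for x, y in zip(rows, targets):
            err = sum(w * xi for w, xi in zip(weights, x)) - y
            for j in range(k):
                grad[j] += err * x[j] / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


# SYNTHETIC data: three hypothetical metrics scaled to 0..1 (say title keyword
# density, inbound link score, page age) and invented "true" weights.
rng = random.Random(0)
TRUE_WEIGHTS = [0.5, 3.0, 1.0]
rows = [[rng.random() for _ in TRUE_WEIGHTS] for _ in range(200)]
targets = [
    sum(w * x for w, x in zip(TRUE_WEIGHTS, row)) + rng.gauss(0, 0.05)
    for row in rows
]

learned = fit_linear_weights(rows, targets)
print([round(w, 2) for w in learned])
```

On this made-up data the fitted weights land close to the invented ones, which is the "see the weight given to each variable" part. With real pages the hard part would still be what the post says: turning each factor into a number and collecting enough known rankings.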