I would have to say there is an argument for proximity of words near keywords... "Check out" (this site) versus "check out now" (buy now) is something I have been looking at a bit, to see if Google's semantic algos are working properly (no conclusions yet). If I were building my own, and somebody writing about politics suddenly had a link to a widget site in the text, it would get flagged. (We've seen these.) Same for plain text: if they suddenly went off-topic and started talking about widgets, the page would score for widgets and much less for politics. A simple mention of widgets would not.
I agree, and the PageRank documentation does support this theory. Six or seven years ago Google was storing the first 4,096 bytes of plain text on the page and using the proximity of words to find the results that best matched the query.
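Just to make the idea concrete, here is a toy sketch of that kind of check. The 4,096-byte cap comes from the post above; the actual scoring function is entirely my own guess, not anything Google has published:

```python
def proximity_score(text, term_a, term_b, cap=4096):
    """Score how close two query terms appear within the first `cap` bytes.

    Returns 0.0 if either term is missing; otherwise a value that grows
    as the minimum word distance between the terms shrinks.
    """
    # Truncate to the byte cap, then split into lowercase words.
    words = text.encode("utf-8")[:cap].decode("utf-8", errors="ignore").lower().split()
    pos_a = [i for i, w in enumerate(words) if w == term_a]
    pos_b = [i for i, w in enumerate(words) if w == term_b]
    if not pos_a or not pos_b:
        return 0.0
    min_dist = min(abs(a - b) for a in pos_a for b in pos_b)
    return 1.0 / (1 + min_dist)
```

So "check out" scores higher than "check ... out" with words in between, which is the proximity effect being discussed.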
I remember reading that now, and experimenting with this: http://forums.digitalpoint.com/showpost.php?p=181163&postcount=5 It worked well. Nowadays you can do similar things with CSS.
The Google algorithm is such a closely guarded secret that trying to figure it out completely would be like trying to find a pin in a needle stack. I sure hope Google treats their employees like gold, otherwise some disgruntled employee could spill the beans on their algorithm.
I think this is a really good exercise even if we can't find all of the factors. It gets us thinking in creative ways that can only benefit our SEO efforts. I don't think Google's algo takes the meta tags into account, especially the meta keywords tag. It uses the meta description sometimes, I think, but I'm not sure if it plays into the ranking. I would think the DMOZ description would be very useful for Google to find out what a site's about, since it is reviewed by editors who look at the website to make sure the description is accurate, and it is pretty much guaranteed not to have keyword stuffing or anything else that manipulates search engines - unless the editor is asleep on the job.
you guys have come up with some good ones that I wasn't even thinking about. Whether or not all of these are ACTUAL variables, I think making the list of POSSIBLE variables is a good exercise. I would update my post to include the new stuff but I can't edit it now. Maybe one of the MODS can either make this thread a sticky or make a new thread a sticky with the compiled list that they can add to. I don't know how it works, I just think this information shouldn't be lost on page 5 in a couple of weeks. Just my thoughts. Keep them coming!!
It is easy to understand what Google is trying to put into its algo: judge a webpage as wisely as a professional human would, work out which "keyword" it is useful for, and note that down. It sounds like rocket science, but it is really a simple idea.
Thanks speakerwire and everyone else for some great posts. I have started to arrange the items in Word so I can copy and repost one great complete list - but of course it won't be complete because we are still missing hundreds of factors! I appreciate the great answers and will post the list soon. Please add any more you can think of in the meantime and I'll add those in too. I think this will produce a great list of things to bear in mind for any web developer.
OK, I am just putting the list together - can somebody clarify whether the following suggested items are possible...

Physical address listed - can worldwide addresses be identified? Would it look for a zip/postal code? Would they build a list of valid codes and, if one was found along with the word 'address' or the word 'contact' on a page, give the site a point? I refer to points as I assume Google has to build a score for your site based on the 600 (assumed) factors.

Spelling - can anybody think of a well-known site of any type which ranks highly in Google but whose spelling you could not normally find in a dictionary?

Page colours - I am no expert on hex colour values, but if they go from 000000 to FFFFFF, is each one progressively darker than the next? Would they look for conflicting or hard-to-read colours? Again, can anyone suggest a site that might disprove this? Thanks.
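On the hex question: the numeric ordering of hex codes is not a brightness ordering, because the six digits are really three separate red/green/blue byte pairs. A quick sketch of computing perceived brightness instead (the luma weights here are the standard BT.601 ones; nothing suggests Google uses these exact numbers):

```python
def brightness(hex_colour):
    """Perceived brightness (0-255) of a hex colour like 'FFEE00'.

    Raw hex order is NOT brightness order: '0000FF' (blue) is
    numerically larger than '00FF00' (green) but far darker to the eye.
    """
    r = int(hex_colour[0:2], 16)
    g = int(hex_colour[2:4], 16)
    b = int(hex_colour[4:6], 16)
    # Standard luma weights: the eye is most sensitive to green.
    return 0.299 * r + 0.587 * g + 0.114 * b
```

So 000000 and FFFFFF are indeed the darkest and lightest extremes, but the values in between do not get progressively darker as the hex number grows.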
As far as colours go... I doubt Google looks at whether they are too contrasting or conflicting, mainly because who is to say what people like? Some people like smooth, flowing colour schemes, and others like high contrast in their colours to make things stand out. The only thing I could see Google taking into account would be if the background colour is the same as some text colour, so as to hide the text from visitors, which Google doesn't like. And I have seen this done on websites - not so much anymore, but in the past, most definitely.
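That hidden-text check is easy to imagine in code. This is purely a guess at the kind of comparison a crawler might run (a real engine would also have to resolve CSS, background images, and so on), comparing the text colour to the background colour channel by channel:

```python
def looks_hidden(text_hex, bg_hex, tolerance=16):
    """Flag text whose hex colour is (nearly) identical to the background.

    `tolerance` is the maximum summed per-channel difference that still
    counts as "hidden" -- an arbitrary threshold for this sketch.
    """
    def channels(h):
        # Split 'RRGGBB' into three integer channels.
        return [int(h[i:i + 2], 16) for i in (0, 2, 4)]

    diff = sum(abs(a - b) for a, b in zip(channels(text_hex), channels(bg_hex)))
    return diff <= tolerance
```

White-on-white trips it, black-on-white does not, and near-white text on a white background still gets caught.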
I think physical address, phone, email address, etc. should not be, and probably are not, a factor. Many companies have these behind robots.txt files, or noindex'd, to avoid junk mail, spam, and unwarranted cold sales calls. Many lists are built and sold from what can be found by a simple spider, and it would make sense that scraping the SERPs from Google for such info would also yield profitable data. If that does not work, you could get some hack in a third-world country to sit there all day and copy and paste for cents an hour. Try posting your email address on a decent-ranking site and see how the spam goes up. This increases time spent dealing with email, even if you delete it or use spam filters. But some always gets through, slowing down the service.
This might not be a secret too much longer. If the government gets its way and Google chooses to battle this in court, several of Google's algo secrets will come out during cross-examination. Google even says in their answer to the government's motion that this is a concern of theirs. I can see it now: Yahoo, MSN and SEOs worldwide sitting in the courtroom day in and day out, waiting for something to come out.
In the past I've had a site rank higher with a physical address listed on the contact page. I can see why this would be a factor - a site with a legitimate postal address can be more trustworthy than one without. I recently developed a web crawler that takes physical addresses on a site very seriously. It uses geodetic transformations and effectively "maps" sites around the country, then breaks them down by topic. It's still in beta at the moment but it seems to bring back some great results. I feel like I spend more time with my head in research papers than I do coding...

Many SEO contests use non-dictionary phrases (I would quote some, but I can't remember how to spell them). I could imagine Google penalising a site with more than X% non-dictionary words (i.e. spelling mistakes), but I haven't seen proof.

Close, it's actually the other way round. Yes, hidden text is now hunted down like the cheap trick it is (white text on a white background), and I believe the new GoogleBot has been developed to spot similar tricks with CSS and/or hidden divs.

I'll try to think of more possible variables today and see what I can come up with. Very interesting thread
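The X% non-dictionary idea above is speculation, but the measurement itself is simple to sketch. This just computes the ratio a threshold could be applied to, against whatever word set you supply:

```python
def non_dictionary_ratio(text, dictionary):
    """Fraction of words in `text` not found in `dictionary` (a set of
    lowercase words). Punctuation is stripped before lookup."""
    words = [w.strip(".,!?;:\"'()").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    unknown = sum(1 for w in words if w not in dictionary)
    return unknown / len(words)
```

A site could then be flagged when the ratio exceeds some cutoff - though as noted, there's no proof Google does anything like this.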
The first mistake I see here is the use of the word "algorithm". There are MANY different algorithmS and processes at work.
Interesting thread... perhaps there's a Google factor that flags "hot" discussions about Google factors for their attorneys to investigate further. It sure seems like they have people with spare time to keep up on this stuff. I'm curious about (d). Would this be analogous to a "bad neighborhood"? i.e. if your DNS provider also hosts many xxx-related domains, etc. Or ranking different DNS providers - does Verisign outrank GoDaddy? etc... Another comment I had about factors and weightings: I think it's been discussed before that different content/topic spaces could very well have different factors/thresholds, i.e., different weightings. So (simple example) even if you knew how H1 fared vs. 'alt' tags, that equation might be different for casino sites vs. religion sites. A few other things that certainly are factors include time-based and event-based relevancy. So "heart-shaped box" could mean one thing around Feb 14th, and a totally different thing around late March (American Heart Assoc. "Heart Walk" events). For news/relevancy items, Google has GoogleNews, a well-segmented source for classifying "current event" keywords and topics. LC
Robert Scoble from MSN started a search engine experiment with blogs using a word he made up called "brrreeeport", and his site is PR 6 and probably ranked highly for some keywords other than the obvious "scoble" and "brrreeeport". Were you wondering about spelling mistakes and how the search engines treat sites with them? If so, this might be a good one to watch. I doubt MSN penalizes for this, since Scoble is an MSN search guy and he's doing it all over his blog.
Do you mean he is misspelling words on purpose, or by accident? I would expect a dictionary of common misspellings to be used; I doubt that "brrreeeport" would be in it. http://itre.cis.upenn.edu/~myl/languagelog/archives/001533.html has many purposefully misspelled names, in a post blogging about a mural that had 11 misspellings. Which gives me another idea... using the wrong, correctly spelled, but similar-sounding/spelled word?
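A "dictionary of common misspellings" lookup could also be approximated with fuzzy matching. Here's a sketch using Python's difflib; the word list and the 0.8 cutoff are my own assumptions, and note that a made-up word like "brrreeeport" matches nothing, so it passes through unflagged - which is the point of the experiment:

```python
import difflib

# Hypothetical vocabulary; a real checker would load a full dictionary.
COMMON_WORDS = {"receive", "separate", "definitely", "government"}

def likely_misspelling(word, vocabulary=COMMON_WORDS):
    """Return the closest dictionary word if `word` looks like a
    misspelling of it, else None."""
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=0.8)
    return matches[0] if matches else None
```

The "wrong but correctly spelled, similar-sounding word" trick mentioned above would defeat this kind of check entirely, since the wrong word is itself in the dictionary.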
Essentially this is the most comprehensive SEO thread I've ever read. Will be keeping my mince pies on this one!
A few more I did not see listed above:
Validated HTML.
Following accessibility guidelines for those with disabilities, per w3.org (no spam).
NO broken links.
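On the broken-links one: the first step any checker needs is pulling the hrefs out of a page, so each can then be tested (e.g. with an HTTP HEAD request) for a non-200 status. A minimal sketch with Python's standard html.parser, with the network part left as a comment:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect every href from <a> tags for later status checking."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    collector = LinkCollector()
    collector.feed(html)
    # Each collected link would then be fetched (urllib, HEAD request)
    # and flagged if the response status is 4xx/5xx.
    return collector.links
```

Whether engines actually penalise broken links, or just fail to follow them, is another open question for the list.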