Someone just emailed me and spoke of the "related:" search in Google. I tried it and it does return a body of pages, but I'm not sure exactly how the relationship is determined. Does anybody know what this search is really returning in the SERP?
Compar, I already have a request in at Google asking about something very similar to this, since I couldn't find anything specific across multiple forums and news sites. I did find something of a similar nature in the Common Questions on the Alexa site, under the question "How are Related Links determined?":
Alexa Internet uses crawling, archiving, categorizing, and datamining techniques to build the Related Links lists for millions of Web URLs. One technique used is to analyze links on the crawled pages to find related sites. The day-to-day use of the Alexa service and Related Links by all Alexa users also helps build and refine the data. By looking at high-level trends within the millions of URL "paths" created by Alexa users, we can deduce relationships between Web sites. For example, if many users go directly from site A to site B, the two sites are likely to be related. Next, all the URLs are checked to make sure they are live links. This process removes links that would take you to pages that don't exist (404 errors), as well as any links to servers that aren't available to the general Internet population, such as servers that are no longer active or are behind firewalls. Finally, once all of the relationships are established and the links are checked, the top Related Links for each URL are automatically chosen by looking at the strength of the relationship between the sites. Alexa Internet recrawls the Web on a regular basis and rebuilds the data to pull in new sites and to refine the relationships between the existing sites. New sites with strong relationships to a site will automatically appear in the Related Links list for that site by displacing any sites with weaker relationships. Please note that since the relationships between sites are based on strength, Related Links lists are not necessarily balanced. Site A may appear in the list for Site B, but Site B may not be in the list for Site A. Generally, this happens when the number of sites with strong relationships is greater than ten, or when sites do not have similar enough content.
If/when I get a response from Google, I will gladly let you know.
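For anyone who wants to see the "users go directly from site A to site B" idea from the Alexa text in concrete terms, here is a minimal sketch in Python. It is not Alexa's actual code. The function name related_links, the top_n cutoff, and the sample paths and site names are all made up for illustration. It only counts direct hops, drops dead targets, and keeps the strongest few.

```python
from collections import Counter, defaultdict

def related_links(paths, live_sites, top_n=10):
    """Rank related sites by how often users hop directly from one site to the next.

    paths: list of browsing paths, each a list of site names (hypothetical data).
    live_sites: set of sites that passed a liveness check (assumed to be given).
    """
    strength = defaultdict(Counter)
    for path in paths:
        # Each consecutive pair in a path is one direct A -> B hop.
        for src, dst in zip(path, path[1:]):
            if src != dst:
                strength[src][dst] += 1

    related = {}
    for src, counts in strength.items():
        # Drop dead targets, then keep the strongest top_n (ties broken by name).
        live = [(dst, n) for dst, n in counts.items() if dst in live_sites]
        live.sort(key=lambda item: (-item[1], item[0]))
        related[src] = [dst for dst, _ in live[:top_n]]
    return related

paths = [
    ["a.example", "b.example"],
    ["a.example", "b.example", "c.example"],
    ["a.example", "c.example"],
    ["b.example", "c.example"],
    ["a.example", "dead.example"],
]
live_sites = {"a.example", "b.example", "c.example"}
related = related_links(paths, live_sites, top_n=1)
print(related)
```

With the cutoff set to 1, b.example shows up in the list for a.example, but a.example does not show up in the list for b.example. That is the "not necessarily balanced" effect the Alexa text describes.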
With Alexa, most related links are suggested by the webmasters (owners) of the sites through their own submissions. If you look through the Alexa program, it will ask you to submit related sites, and they will be added as related sites if the editors approve them. So some of this is by human intervention. As far as Google is concerned, my guess is that they are using algos, DMOZ, the Google Directory, and toolbar data to build theirs.
All I could find is that it's mainly based on linking, not so much content... But I'd like to know too. (Unrelated: When looking for info on the subject I found this link here: Google with weird picture :S . How did that picture get there? )
This may have been obvious to some of you, but here is what I just discovered. The results Google returns for the search 'related:www.somedomain.com' are the same as you get when you click on 'Similar pages' in a SERP. This all came up this morning when a webmaster pointed out to me that he had PR0 and no backlinks found according to Google, but a significant number of pages returned for the related: search.
Seems to me it's all determined based on linking: either a site that links to you, or a site that is linked to from the same page as you (some third page links to both them *and* you). Ultimately it's really not very accurate. The Alexa related sites are much more accurate because they're actually based on visitors' habits.
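To make the "links to both them and you from the same page" idea concrete, here is a rough Python sketch. It is only a guess at the general principle, not Google's method. The names cocitation_counts and related_to, the scoring (one point per shared linking page, one point for a direct link), and the example sites are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(outlinks):
    """Count, for each pair of sites, how many pages link to both of them."""
    counts = Counter()
    for page, targets in outlinks.items():
        for a, b in combinations(sorted(set(targets)), 2):
            counts[(a, b)] += 1
    return counts

def related_to(site, outlinks, top_n=10):
    """Rank sites co-linked with `site`, plus sites that link to it directly."""
    scores = Counter()
    for (a, b), n in cocitation_counts(outlinks).items():
        if a == site:
            scores[b] += n
        elif b == site:
            scores[a] += n
    # A page that links straight to the site also counts once (assumed weight).
    for page, targets in outlinks.items():
        if site in targets and page != site:
            scores[page] += 1
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]

# Hypothetical link graph: page -> sites it links to.
outlinks = {
    "hub.example": ["you.example", "rival.example", "other.example"],
    "blog.example": ["you.example", "rival.example"],
    "fan.example": ["you.example"],
}
print(related_to("you.example", outlinks))
```

Notice that nothing in this looks at page content, so a site can rank as "related" purely because somebody put the two links on the same page.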
Bob, Google uses the term "related" within their segment about similar pages in their how-to section. My question continues to be "exactly how are these similar/related pages determined?" and "are there any differences between clicking on similar vs. performing a related: or @ search?". It seems to me that knowing more about the details of these searches could prove insightful regarding how Google handles these and potentially where they are headed (?).
The related/similar search seems to be based on linking structure. I did a preschool page for my sister and linked to her from my algebra tutorial page. She links to my home page, algebra site, and my two genealogical web sites. When I do a related search for her site, the results return all of the web sites I have done. Most of these sites have no relation to the preschool site. Shannon
Bob, I just received the response from Google (for what it's worth)... According to them, the 'related:' search and the 'similar' link perform the same function. This is described on their site at http://www.google.com/help/operators.html. The Google Team also went on to state: Please note there is no operator corresponding to the '@' character. I'm not sure what else I may be able to get from them, but I hope this helps you in some way.
Interesting stuff. If you read one of my earlier posts you will see that I had already figured out that it was the same as the Similar pages link that one finds in the SERPs. But that still doesn't answer the question "what is the relationship or similarity?" I also had suspected that the "@" character was not a search operator. McDar was using it at one time to indicate total links known by Google. I wonder what you are actually getting when you include it in a search string?
There are several characters that Google does not index (!@#$%^*? among them), and using any of these characters as a prefix seems to return the same results as using the @ prefix. If you search for @yourURL, it will return a list of all indexed pages which contain the text after the @, which are the same results as you get when you search "yourURL" in quotes. If you search for just yourURL, Google will return the homepage of that site. Interestingly, these searches return pages where the only mentions of the URL are in links, so it is not only the visible content of the page that is being searched. My conclusion is that prefixing a non-indexed character somehow turns off the Google feature that returns the home page and results in the same search as if you put quotes around the URL. Since most instances of the full URL being found on a page are links, this often gives a good approximation of the true number of links you have in pages indexed by Google, and is why people often use it to see more than the link: search reports. There are those that feel that Google may report a link but not count it for ranking purposes, or that just because a link is found with this type of search it does not mean that Google counts it. I cannot say with any authority that this is or is not the case, but logic leads me to believe that Google counts all links it knows about, otherwise PR would be inaccurate.
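If you want to compare these searches side by side for your own site, a small Python helper like the one below will build the query strings discussed in this thread. The helper names (query_variants, search_link) are made up, and the search URL prefix is an assumption about the usual Google results address. It only builds links and does not check whether Google honors any of the operators.

```python
from urllib.parse import quote_plus

# Assumed base address for a Google search results page.
SEARCH_URL = "https://www.google.com/search?q="

def query_variants(url):
    """Return the query strings compared in this thread for one URL."""
    return {
        "related": f"related:{url}",   # same as the 'Similar pages' link
        "link": f"link:{url}",         # backlinks Google chooses to report
        "quoted": f'"{url}"',          # pages containing the URL text
        "at_prefix": f"@{url}",        # reported to behave like the quoted search
        "bare": url,                   # reported to return the site's home page
    }

def search_link(query):
    """Percent-encode a query string and attach it to the search URL."""
    return SEARCH_URL + quote_plus(query)

variants = query_variants("www.somedomain.com")
for name, query in variants.items():
    print(name, search_link(query))
```

Opening the quoted and at_prefix links next to each other is a quick way to check whether the two result sets still match.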
Good explanation, Mel. That indeed does appear to be what is happening when you use the @ before a URL. So it may not be a Google operator, but it does nonetheless have a function. Now, do you have an answer for the original question? What is the relationship of sites found with the "related" search or the "Similar Pages" link in the SERPs?
Here is an explanation, but it does not make it clear to me: http://www.googleguide.com/similar_pages.html#howSimilarWorks
Good find, Mel. However, I think that explanation is more speculative than authoritative. The search definitely has something to do with pages that are linked to you, but I just did a Similar Pages search from a site of mine with a total of 579 reported backlinks. The search returned 31 pages, and while there was a link connection between them and the related page, the relationships were neither thematic nor what I would call strong. In fact the 31 pages returned as similar had almost nothing to do with the main page, other than a tenuous link relationship.
Compar, Couldn't the "wildcard" here be the statement "and which additional links users click."? From the Google guide: Google automatically selected these sites by considering many factors including the popularity of the pages containing links to Google Guide, the positions, sizes, and proximities of other links to the Google Guide link, and which additional links users click. *Note: boldface emphasis placed by me to highlight statement segment.
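Just to illustrate what two of the factors in that Google Guide sentence could mean in practice (popularity of the linking page, and proximity of other links to your link), here is a speculative Python sketch. Everything in it is an assumption: the function name proximity_scores, the weight-divided-by-distance formula, and the sample pages. Link sizes and user clicks are not modeled at all.

```python
from collections import defaultdict

def proximity_scores(pages, target, popularity=None):
    """Score links that sit near `target` on pages that link to it.

    pages: page name -> ordered list of the links on that page (hypothetical).
    popularity: optional page name -> weight; pages not listed count as 1.0.
    """
    popularity = popularity or {}
    scores = defaultdict(float)
    for page, links in pages.items():
        if target not in links:
            continue
        pos = links.index(target)
        weight = popularity.get(page, 1.0)
        for i, other in enumerate(links):
            if other != target:
                # Closer neighbours of the target link get a larger share.
                scores[other] += weight / abs(i - pos)
    return dict(scores)

pages = {
    "p1": ["a", "guide", "b", "c"],
    "p2": ["c", "x", "guide"],
}
scores = proximity_scores(pages, "guide", popularity={"p1": 2.0})
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

In this toy data the links right next to the target on the more popular page come out on top, which is one way a list of "similar" pages could end up driven by page layout instead of theme.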
But that is the very part that I think is speculative. I don't think Google keeps records of link usage and page popularity. And if they did that sure wouldn't explain the 31 related pages I talked about in the post above.
It's mainly based on linking (co-citation). Google has a new patent describing the process. You can read it here: Techniques for finding related hyperlinked documents using link-based analysis. It describes how to get related pages. There are also algos to find related sites, and they are based on the very same principles.
nohaber, that may be the most informative posting anyone has made on this forum in some time. If Google can do this, or is doing this, think of the implications for various linking strategies. It could go a long way toward explaining some of the very severe drops some people are seeing in their aggressively promoted sites. It is amazing to me that nobody has commented on this patent and the information you have supplied.