Taher Haveliwala is another Stanford graduate who now works at Google. Has anyone read his paper on Topic-Sensitive PageRank from the International World Wide Web Conference (2002)?

Abstract: In the original PageRank algorithm for improving the ranking of search-query results, a single PageRank vector is computed, using the link structure of the Web, to capture the relative "importance" of Web pages, independent of any particular search query. To yield more accurate search results, we propose computing a set of PageRank vectors, biased using a set of representative topics, to capture more accurately the notion of importance with respect to a particular topic. By using these (precomputed) biased PageRank vectors to generate query-specific importance scores for pages at query time, we show that we can generate more accurate rankings than with a single, generic PageRank vector. For ordinary keyword search queries, we compute the topic-sensitive PageRank scores for pages satisfying the query using the topic of the query keywords. For searches done in context (e.g., when the search query is performed by highlighting words in a Web page), we compute the topic-sensitive PageRank scores using the topic of the context in which the query appeared.
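For anyone who finds the abstract easier to follow as code, here is a rough sketch of the idea on a toy graph. It is not the paper's implementation: the six-page graph, the two topics, the damping factor of 0.85 and the hand-written query weights are all assumptions made up for illustration. The only parts taken from the abstract are the two steps themselves: precompute one biased PageRank vector per topic, then blend those vectors at query time according to how strongly the query belongs to each topic.

```python
def biased_pagerank(links, teleport, damping=0.85, iterations=50):
    """Power iteration where the random jump lands only on `teleport` pages."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    # Jump distribution: uniform over the topic's seed pages, zero elsewhere.
    jump = {p: (1.0 / len(teleport) if p in teleport else 0.0) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) * jump[p] for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: hand its rank back to the topic's seed pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] * jump[p]
        rank = new_rank
    return rank


def topic_sensitive_score(page, topic_weights, topic_ranks):
    """Blend the precomputed vectors using the query's topic weights."""
    return sum(w * topic_ranks[t][page] for t, w in topic_weights.items())


# Hypothetical toy web: every page appears as a key, values are its outlinks.
links = {
    "sports_hub": ["team_a", "team_b"],
    "team_a": ["sports_hub"],
    "team_b": ["sports_hub", "news"],
    "news": ["sports_hub", "recipes_hub"],
    "recipes_hub": ["pie", "news"],
    "pie": ["recipes_hub"],
}

# Hypothetical seed pages per topic (the "representative topics").
topic_seeds = {"sports": {"sports_hub"}, "cooking": {"recipes_hub"}}

# Done once, offline: one biased vector per topic.
topic_ranks = {t: biased_pagerank(links, seeds) for t, seeds in topic_seeds.items()}

# Done per query: the weights would be estimated from the query keywords or
# their context; here they are simply written by hand.
sports_query = {"sports": 0.9, "cooking": 0.1}
cooking_query = {"sports": 0.1, "cooking": 0.9}

for name, weights in [("sports", sports_query), ("cooking", cooking_query)]:
    ordered = sorted(
        links,
        key=lambda p: topic_sensitive_score(p, weights, topic_ranks),
        reverse=True,
    )
    print(name, ordered)
```

The point of the split is that the expensive part (the power iteration) happens offline, while the query-time work is just a weighted sum over a handful of precomputed numbers.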
Dominic, this is why many (myself included) harp on about relevance in links. During the Florida update it became obvious that Google were playing with TSPR and also Local Rank. Bob (Compar) was heavily involved in the discussions, which covered all sorts of things including TSPR, Hilltop, Latent Semantic Indexing (LSI), semantics in general and lots more. Here is a good read on LSI: http://www.cs.utk.edu/~lsi/
I've done a bit of reading on those topics but hadn't read the TSPR document before. LSI and Local Rank are real winners in my book; not too sure about Hilltop. I'd like to do more reading on TSPR if you have anything bookmarked.
LocalRank was used before Florida. No one paid attention to it though. Topic Sensitive PageRank, imo, has a long way to go before becoming a part of Google, if it ever does.

I think Google modifies the PageRank of every page before putting it in the index. Every keyword-independent ranking factor can be nicely squeezed into this PageRank number (or better, call it DocRank). Thus you can do all sorts of modifications to PageRank before putting it in the index. You won't need to modify the search algorithm, just the data you feed to it.

Example: you start indexing a new type of document (let's say pdf) but you don't want this kind of document to rank on top unless no other pages are found. You simply cut its PageRank before putting it in the index. All documents of this type will rank low unless no other results are found.

To me, Google uses keyword-independent factors (such as a page's age) to modify PR before it is put in the index. And because we don't know the exact number that's written there (we only know the toolbar PR, which is unmodified by the other factors), we can't know the *real* PageRank or DocRank of a page.

Topic Sensitive PR is difficult to implement because it is difficult to judge the topic of a page. That's my opinion based on years of programming experience. Of course, a lot of "never-has-written-a-line-of-code experts" will disagree. Topic Rank could improve the relevancy of some queries and make others worse. It would be really difficult to make it work in the real world. Example: how would you determine the topic of a page that's in some exotic language? How would you weigh in the country-specific factors? Too many details must be taken care of... I don't believe that Topic Rank will ever be implemented.

The future of Google's ranking algo is this number that's written in the index: call it PR, Topic PR, Doc Rank, whatever.
The great thing about it is that you manipulate it in-house and, once written, it does not slow down the response time to queries. And the very good thing about it is that this number is not shown by any toolbar, and we can all sit and guess how Google modify PR.
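To make the pdf example concrete, here is a minimal sketch of the "DocRank" idea as described in this post. It is pure speculation in code form, not anything Google is known to do: the `Doc` record, the `TYPE_MULTIPLIER` table and the 0.1 penalty for pdf files are all invented for illustration. The query-independent score is adjusted once at index time, and the search function just sorts by the stored number without knowing how it was produced.

```python
from dataclasses import dataclass

# Hypothetical per-type multipliers; the 0.1 for pdf is an invented value.
TYPE_MULTIPLIER = {"html": 1.0, "pdf": 0.1}


@dataclass
class Doc:
    url: str
    doc_type: str
    pagerank: float  # raw link-based score
    text: str


def doc_rank(doc):
    """Fold a keyword-independent factor (document type) into one number."""
    return doc.pagerank * TYPE_MULTIPLIER.get(doc.doc_type, 1.0)


def build_index(docs):
    """Index time: store each page's words alongside its adjusted score."""
    return [(d.url, set(d.text.lower().split()), doc_rank(d)) for d in docs]


def search(index, keyword):
    """Query time: unchanged search code, it only reads the stored score."""
    hits = [(url, score) for url, words, score in index if keyword in words]
    return [url for url, _ in sorted(hits, key=lambda h: h[1], reverse=True)]


docs = [
    Doc("example.com/widgets.html", "html", 0.3, "cheap widgets for sale"),
    Doc("example.com/manual.pdf", "pdf", 0.9, "widgets user manual"),
]
index = build_index(docs)

print(search(index, "widgets"))  # html page first despite its lower raw score
print(search(index, "manual"))   # only the pdf matches, so it still shows up
```

The pdf starts with the higher raw score but is stored as 0.09 against the html page's 0.3, so it only comes out on top when nothing else matches the query.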
Well, there is no way of knowing if Google do modify PR (though since it's only a rented algo, I really doubt they can modify it and still call it PageRank), but one thing seems abundantly clear: it is a very minor ranking factor in Google, and thus any such changes would be overridden for the most part by things like anchor text links.