I was thinking of a way to determine whether a PageRank 5 is really 5.0001 or 5.9999, so what do you think of this: calculate the PageRank of the main page by spidering up to 3 levels deep, counting what PageRank those pages have and how many internal links each one carries, and then use the PageRank formula (PR = (1 - d) + d * SUM(PR(Ti)/C(Ti))) to estimate the decimal part of your PageRank from the PageRank propagated by the other pages on your site. An even more accurate way would be to also look up all the backlinks to the spidered pages, getting the PR of those backlinks and the number of links they have.
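To make the idea concrete, here is a minimal sketch of that kind of calculation, assuming you have already spidered the internal link graph into a simple page -> outgoing-links mapping. The d = 0.85 damping factor is the value from the original PageRank paper; everything else (the function name, the toy site) is just illustration:

```python
def estimate_internal_pr(links, d=0.85, iterations=50):
    """Estimate PageRank flowing around inside a site.

    links: dict mapping each page URL to the list of pages it links to
    (an assumed crawl result, not anything Google exposes).
    """
    pages = set(links) | {p for targets in links.values() for p in targets}
    pr = {p: 1.0 for p in pages}          # start every page at PR 1.0
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # PR(A) = (1 - d) + d * SUM(PR(T)/C(T)) over pages T linking to A
            incoming = sum(pr[src] / len(targets)
                           for src, targets in links.items()
                           if page in targets)
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr

# Example: a tiny 3-page site
site = {
    "/": ["/about", "/products"],
    "/about": ["/"],
    "/products": ["/"],
}
print(estimate_internal_pr(site))
```

The decimal part of the result for the main page is the kind of "5.0001 vs 5.9999" estimate described above, at least for the PR circulating within the site itself.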
Of course the most accurate way would be to steal one of the computers at Google that has the "actual" PageRank values for every page on the Internet.
I'm looking for my ski mask. I'll have a layover near the Googleplex and get back to you with that computer.
Actually, now that I think about it, stealing the data would not only be more accurate, it would be easier than trying to reverse engineer the PageRank algorithms.
If you are really seriously interested I think I can help you get very close. Rustybrick has a new tool that will give you the number of backlinks and their PR value for any page at http://www.rustybrick.com/link_analysis.php. I have done some testing with this and modified the chart from my Google PageRank Article so that you can input the numbers from Rustybrick's tool and it will calculate the PR points on my scale. You can also do "what ifs" by adding more hypothetical links. For instance you can ask "what if I had 2 more PR7 links?" and the spreadsheet will calculate the probable PR points and translate that into a PR rating. If anyone would like a copy of this new chart let me know and I'll make it available.
The original table was interesting and useful. In fact, last night I wanted to cite it and present some numbers from it in a reply post on another forum, where someone had asked a question the table could answer, only the dog ate my homework. Which is to say that that forum swallowed my post and wouldn't spit it back, and I had spent some time composing it and didn't want to start over. As a sidebar, we are perhaps too easily spoiled by how well, in a sheer mechanical sense, the Digitalpoint forum works. Several others I have tried or use have problems ranging from annoying to severe with, among other things, a) being unable to remember whether someone is logged in (or, at the outset, even registered), and b) links to individual categories that don't work or go to the wrong place.
Thank you. If you remember, the assumption in the chart was that each linking page had 50 links. I tested a reasonable number of results from Rustybrick's tool, and if I adjusted the assumption to 40 links per page I got a very nice 100% correlation with Rustybrick's results. In other words, if Rusty reported that a PR6 page had 22 PR4, 62 PR5 and 9 PR6 backlinks, when I plugged those into my spreadsheet it predicted that the page should be PR6. In fact those are the actual results for my InfoPool index page. If you play with the what ifs, it would require 2 PR8 links to push that page up to PR7. However, another actual PR6 page has 42 PR3, 294 PR4, 228 PR5 and 11 PR6 backlinks, and the what if for that page says a single PR8 link would drive it to PR7.
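For anyone who wants to experiment without the spreadsheet, here is a rough sketch of this style of "what if" arithmetic. To be clear, the constants below are illustrative guesses rather than the actual values behind my chart, so the output won't line up with my scale exactly: 40 outbound links per linking page as discussed above, d = 0.85 from the PageRank paper, and a made-up ratio of 5 between toolbar PR levels.

```python
import math

D = 0.85      # damping factor
LINKS = 40    # assumed outbound links on each linking page
BASE = 5.0    # assumed internal-score ratio between toolbar PR levels (a guess)

def pr_points(backlinks):
    """backlinks: dict of {toolbar PR of linking page: number of such links}."""
    return sum(count * D * (BASE ** pr) / LINKS
               for pr, count in backlinks.items())

def toolbar_pr(points):
    """Map accumulated points back onto a toolbar-style rating."""
    return math.log(points, BASE) if points > 0 else 0.0

# The InfoPool example: 22 PR4, 62 PR5 and 9 PR6 backlinks
current = pr_points({4: 22, 5: 62, 6: 9})
what_if = current + pr_points({8: 2})     # "what if I had 2 more PR8 links?"
print(round(toolbar_pr(current), 2), round(toolbar_pr(what_if), 2))
```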
The talk of reverse-engineering got me thinking... instead of reverse-engineering based on the original algorithm and a few sites' PageRank, why not do it in more of a black-box manner? Feed a new algorithm the inputs (number of IBLs, PR of the IBLs, unique IPs of the IBLs, etc.) and the desired output, the PR given by the toolbar, for thousands of sites. A neural net or genetic algorithm could handle this. It should start to learn how each input affects the output. Among other things, this might also give you some insight into how many IBLs from the same IP start to be too many. Re-reading this, I just realized I don't really care if the PR is 5.1 or 5.9. And also, I think I've read that toolbar PR is far from exact; it is more or less a quick estimate.
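Something like this, as a very rough sketch. The feature set, the sample rows and the choice of scikit-learn's MLPRegressor as the "neural net" are all my own placeholders, not anything Google has confirmed; you would feed it thousands of real observations instead of these few made-up ones:

```python
from sklearn.neural_network import MLPRegressor

# Features per site: [number of IBLs, average PR of IBLs, unique IPs of IBLs]
# These rows are hypothetical placeholders for real, gathered data.
X = [
    [120,  3.1,  95],
    [800,  4.4, 610],
    [35,   2.0,  30],
    [2400, 5.2, 1700],
]
# Toolbar PR reported for each of those sites (also placeholder values).
y = [3, 5, 2, 6]

model = MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000, random_state=0)
model.fit(X, y)

# Predict toolbar PR for a new site, and probe "how many IBLs from the same
# IP is too many" by varying the unique-IP count while holding IBLs fixed.
print(model.predict([[500, 4.0, 450]]))
print(model.predict([[500, 4.0, 50]]))   # same IBL count, far fewer unique IPs
```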
I'm just going to steal their computers, and then I will get back to you. It will just be easier. On a more serious note, I like the idea of engineering a new algo to match this, but I think it would be a tough task to pull off; you'd technically be creating Google.
I don't think you would be 'creating Google'. Calculating PR isn't the tough part for them. They are the ones with a cache of the whole Internet, the ability to update it constantly, and the ability to return backlinks for a given site. That's the hard part. You would just be using the results from the backlink calculation that Google gives you.
Well, if you were going to reconstruct a search engine, you would need to update everything daily the same as Google and be able to calculate all of the backlinks. The only thing you wouldn't necessarily have to do is store it. Everything else they do would still have to be done.
Thank you Compar for sharing your insights into the PageRank calculation algorithm. I have a question: Google gives importance to relevant backlinks, so I guess that factor also needs to be incorporated into the logic; the more relevant the backlink, the greater the factor. Any thoughts on that?
I haven't visited this forum for several months. Does everyone still think that PR is the most important element in getting a good Google SERP placement? P.S. I haven't thought so in a very long while.
The myth continues... some of us left the Cult of PR a while ago http://forums.digitalpoint.com/showt...14#post1136514 and http://forums.digitalpoint.com/showt...=155289&page=3 Oh, and a piece of mine on Slashdot: http://developers.slashdot.org/artic.../07/24/1358204 The value of TBPR dropped off after Florida/Hilltop back in 2004. I talked to many people writing my piece, including Danny Sullivan, Jill Whalen and more... it's a fact among most SEOs... Bye bye, TBPR.
Nice to see you around again, and yeah, I pay virtually no attention to PageRank apart from when I'm trying to get link exchanges (that is, I try to have my own site show some PR so I don't get turned down all the time before I start a link-building campaign).