Pr Update!?

Discussion in 'Google' started by joeychgo, Sep 12, 2004.

  1. Mel

    Mel Peon

    Messages:
    369
    Likes Received:
    14
    Best Answers:
    0
    Trophy Points:
    0
    #81
    Of course there has to be someplace where the spidered data is kept until the PR is calculated and the public links database is updated. When each page is spidered and reported to Google the information first goes into a Repository and then is parsed into the data which Google uses to do rankings for actual queries. The data from the onpage content (including the anchor text) is put into the word barrels with pertinent data encoded with the listing.

    The link data is put into a separate database which is used to calculate PR. I am of the opinion that the public database which is used to show (some of) the backlinks is not the same database which is updated continually as spidered data is parsed. What you are referring to as a backlinks update is IMO an update of the public database, but in the background Google is still recording thousands of links every second as data from the spiders comes in.


    What I am saying is that when I measure the actual directory bars with a screen caliper I get different figures than these.

    More or less we are saying the same thing.

    The PR equation is:

    PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

    which certainly does not produce a logarithmic result.
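
    As a rough illustration of the equation above, here is a minimal sketch that iterates it over a tiny, made-up link graph. The three-page graph, the function name `pagerank`, and the iteration count are assumptions for the example only; d = 0.85 is the damping value suggested in the original paper. Each page's score is a weighted sum of the scores of the pages linking to it, so the formula itself adds rather than takes logarithms.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over all pages T linking to A.

    links maps each page to the list of pages it links out to, so
    C(T) is len(links[T]). The graph passed in below is hypothetical.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    pr = {page: 1.0 for page in pages}  # arbitrary starting value
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Sum PR(T)/C(T) over every page T that links to this page.
            inbound = sum(pr[src] / len(targets)
                          for src, targets in links.items()
                          if page in targets)
            new_pr[page] = (1 - d) + d * inbound
        pr = new_pr
    return pr


# Hypothetical graph: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
for page in sorted(ranks):
    print(page, round(ranks[page], 3))
```

    In this toy graph C ends up above B because it receives a share of A's PR plus all of B's, while B receives only a share of A's.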
     
    Mel, Sep 19, 2004 IP
  2. PRBot.Com

    PRBot.Com Guest

    Messages:
    244
    Likes Received:
    40
    Best Answers:
    0
    Trophy Points:
    0
  3. bobmutch

    bobmutch Peon

    Messages:
    683
    Likes Received:
    62
    Best Answers:
    0
    Trophy Points:
    0
    #83
    Mel: I think we all know here there is one real or raw PR and that the directory displayed PR and the toolbar displayed PR are that real or raw PR displayed on a 7 unit 2/38, 11/29, 16/24, 22/18, 27/13, 32/8, 38/2 scale, and on a 10 unit 1-10 scale. There may be disagreement on the directory displayed PR scale, which I can understand as it is not completely clear.

    Well I don't think the width of the directory bars on a screen means anything. It is the scale that is in the source that matters.

    This is the equation to calculate the voted PR. I was referring, or meant to refer, to the toolbar displayed PR; it is that scale that is logarithmic. That is to say, the scale increases by multiplying rather than adding, so going from a PR4 to a PR8 is not merely twice as much. The toolbar scale is logarithmic, and I noted that Wakfer's numbers use a base of 5.5 and Sobek's numbers use a base of 6. They both show their proposed scales in their respective articles.
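
    To make the "multiplying rather than adding" point concrete, here is a small sketch of what a logarithmic toolbar scale would look like. It assumes, purely for illustration, that toolbar value n begins at a raw PR of base**n; Google has not published the real mapping, and the function names are made up. Only the two bases (5.5 and 6) come from the post above.

```python
def raw_threshold(toolbar_pr, base):
    """Raw PR at which a given toolbar value would start (assumed: base ** n)."""
    return base ** toolbar_pr


def toolbar_value(raw_pr, base, max_value=10):
    """Map a raw PR onto a 0..max_value toolbar scale under the same assumption."""
    value = 0
    # Climb one toolbar step at a time while the next threshold is still reached.
    while value < max_value and raw_threshold(value + 1, base) <= raw_pr:
        value += 1
    return value


for base in (5.5, 6):
    ratio = raw_threshold(8, base) / raw_threshold(4, base)
    print(f"base {base}: a PR8 needs {ratio:g} times the raw PR of a PR4")
```

    Under this assumption a PR8 page needs roughly a thousand times the raw PR of a PR4 page, not twice as much, and the same raw score can land on different toolbar values depending on which proposed base is right.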


    Well, if this is the case, how do you explain that sites start to receive the effects of their backlinks only after a backlink update? And by the backlink update I mean when a change in the displayed backlinks is made. If backlinks are updated and applied every 2 days as you believe, then why don't you see the effects of those backlinks right away?
     
    bobmutch, Sep 19, 2004 IP
  4. bobmutch

    bobmutch Peon

    Messages:
    683
    Likes Received:
    62
    Best Answers:
    0
    Trophy Points:
    0
    #84
    bobmutch, Sep 19, 2004 IP
  5. minstrel

    minstrel Illustrious Member

    Messages:
    15,082
    Likes Received:
    1,243
    Best Answers:
    0
    Trophy Points:
    480
    #85
    Obviously not, since the toolbar and directory graphical representations of PageRank are rarely and then only briefly up to date and since any number multiplied by a graph yields nothing at all. I'd also point out that nowhere, ever, have I suggested that -- another example of your favorite technique of inventing something that wasn't said to attack in hopes of obfuscating whatever the issue at hand actually is.

    As for the rest of it, yes, yes, yes, Mel. I've heard it all before... often. I'm sure everyone else older than about age 3 has too. We all know about the Stanford patent and the "original paper" and the damn "algo". What's your point? That, by the way, is rhetorical -- the answer is there isn't one.

    That one trick pony act is getting very tired.
     
    minstrel, Sep 19, 2004 IP
  6. Mel

    Mel Peon

    Messages:
    369
    Likes Received:
    14
    Best Answers:
    0
    Trophy Points:
    0
    #86
    My my my aren't you testy today.

    What is getting tiring, Minstrel, is your habit of answering logical questions with snotty and irrelevant replies. As you claim to be a psychologist, I would think that you would understand that you really have no idea what my favorites are and are not.

    Surely there is a thread somewhere on Forum etiquette that you could read?

    I suppose that the number displayed when you mouse over the toolbar greenbar is a graph too?

    If you have such a clear and complete understanding of PR then why did you question that there are three separate figures associated with PR, but that only one of them is used in the rankings?
     
    Mel, Sep 19, 2004 IP
  7. schlottke

    schlottke Peon

    Messages:
    2,185
    Likes Received:
    63
    Best Answers:
    0
    Trophy Points:
    0
    #87
    Minstrel,

    I think you may want to take advantage of the links in your signature..
     
    schlottke, Sep 19, 2004 IP
  8. Mel

    Mel Peon

    Messages:
    369
    Likes Received:
    14
    Best Answers:
    0
    Trophy Points:
    0
    #88
    We are saying the same thing, Bob; it's just that I do not see consensus on the width of the basic directory toolbar, which the source code puts at 40 pixels (for instance 38 for the PR gif and 2 for the negative gif), with others also totalling 40 pixels in width ({44+40 exception for Google only} 38+2, 32+8, 27+13, 24+16, 22+18, 16+24, 11+29, 5+35, etc.), or on the number of divisions. In fact, if you go to the Microsoft page in the Google directory today, there are no PR bars shown.
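
    The arithmetic on those gif widths is easy to check. The sketch below takes the width pairs quoted in the post (the Google-only exception is left out), confirms each totals 40 pixels, and prints the filled share of the bar. The names `BAR_WIDTHS`, `total_width` and `filled_fraction` are made up for the example.

```python
# (positive gif width, negative gif width) pairs quoted from the directory source.
BAR_WIDTHS = [(38, 2), (32, 8), (27, 13), (24, 16),
              (22, 18), (16, 24), (11, 29), (5, 35)]


def total_width(pair):
    """Combined pixel width of the positive and negative gifs."""
    return pair[0] + pair[1]


def filled_fraction(pair):
    """Share of the bar taken up by the positive (green) gif."""
    return pair[0] / total_width(pair)


for pair in BAR_WIDTHS:
    print(pair, total_width(pair), f"{filled_fraction(pair):.1%}")
```

    The pairs give eight distinct fill levels, which is part of why the number of divisions is hard to pin down from the source alone.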

    Yes totally agree with that except that there are several proposed scales none of which have been proven to be correct, so again we are guessing.

    First, because the private database is not used for PR calcs until it is time to calculate PR. At that point the private database is frozen, the PR is calculated, and the private database then becomes the public database from which the link: results are drawn. If this is not the case, what would you suggest that they do with all the spidered data between PR updates?
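
    The private/public arrangement being proposed here can be sketched as a store that records links continuously but only exposes a frozen copy. This illustrates the hypothesis in this post, not anything Google has documented; the class and method names are invented for the example.

```python
class LinkStore:
    """Hypothetical two-stage link store: a live private table plus a frozen public copy."""

    def __init__(self):
        self.private = {}  # target URL -> set of source URLs, updated as pages are parsed
        self.public = {}   # snapshot served to link: queries, replaced only on publish()

    def record_link(self, source, target):
        """Called whenever the indexer parses a page; touches the private table only."""
        self.private.setdefault(target, set()).add(source)

    def publish(self):
        """Freeze the private table and expose the copy (the 'backlink update')."""
        self.public = {t: frozenset(s) for t, s in self.private.items()}

    def public_backlinks(self, target):
        """What a link: query would see between updates."""
        return sorted(self.public.get(target, ()))


store = LinkStore()
store.record_link("a.example", "x.example")
print(store.public_backlinks("x.example"))  # nothing visible before the update
store.publish()
store.record_link("b.example", "x.example")
print(store.public_backlinks("x.example"))  # only the link recorded before publish()
```

    Links keep arriving in the private table the whole time, but the visible backlinks change only when the snapshot is replaced, which is the behaviour this model is meant to explain.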

    There are two different effects of backlinks: anchor text and PR. Anchor text IMO is updated when the page is parsed (unless the sandbox theories are correct, and they are put into some special location), but the PR effect is only seen when the PR is updated.
     
    Mel, Sep 19, 2004 IP
  9. bobmutch

    bobmutch Peon

    Messages:
    683
    Likes Received:
    62
    Best Answers:
    0
    Trophy Points:
    0
    #89
    Mel: Yes the charts showing the proposed real PR values as they relate to the toolbar display scale can't be proven as Google is not disclosing that information.

    The spidered data is all cached in Google's database. These cached pages of course have the links in them. You can see the cached pages below each of the search engine results. At the time of the BL update Google calculates the backlinks off its cache of all the pages. After the backlinks are calculated they are stored. This process is the backlink update. After the backlinks are calculated a PR calculation is run, and before June 22/2004 the next step would be to update the toolbar displayed PR.

    Google's order of results is automatically determined by more than 100 factors, one of them being backlinks and another being real PR (the PR weight toward rankings has been devalued big time in the last 2 years). This is the reason why right after a backlinks update a site that has increased its backlinks quite a bit will get better rankings.

    I know of no articles that support a public and private backlinks database. Nor do I see a logical reason for having it this way. Nor do I see any reason for Google to update the backlinks every 2 days, as you have noted, when they don't do anything with that update. Again, if they did do something with that 2 day update then we would get a small rankings change every 2 days instead of a larger one every time the backlink update is done.

    The idea of calculating 4 billion pages every 2 days for backlinks and then doing nothing with that calculation makes no sense to me at all.

    The sandbox theory is a filter not a different place that links are put.

    To sum up I hold that web pages are crawled and the Google page cache is updated, about once a month there is a backlink update based off the cached pages, this is followed by a real PR update. At this point if a directory or toolbar display update is scheduled the toolbar or directory values are calculated from the real PR, the directory PR is stored in the directory and the toolbar display PR is stored and accessed by the toolbar. The new backlink and real PR values are now used in Google's calculation for order of results in the search engine.

    The above process is the simplest and most logical way for backlinks and PR to be handled. If you know of papers that show something different I would like to read them.
     
    bobmutch, Sep 20, 2004 IP
  10. Mel

    Mel Peon

    Messages:
    369
    Likes Received:
    14
    Best Answers:
    0
    Trophy Points:
    0
    #90
    That's a nice idea, Bob, but that's just not how Google works. The cached pages in the repository are not used for searching or ranking; instead all the data is parsed and put into various databases, and the searches and rankings are done on those databases, not on the cached pages.

    You don't have to calculate backlinks you only have to record them.

    The original Google paper gives a pretty good description of how it's done:

    and if you look at the accompanying illustration (is there any way of posting a graphic here?) you can see that there is in fact a separate link database and a separate anchors database.

    In fact most search engines operate in a similar fashion, there is simply not enough time and resources to parse and sort every one of the billions of pages in the index for every search and still return an answer to a search query in more or less real time.
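
    The parse-once, query-later idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `repository` dictionary, the `AnchorExtractor` class and the `index_repository` function are made up, and Python's standard `html.parser` stands in for whatever the real indexer does. Each stored page is parsed once, and the results go into a links table and an anchor-text table, so a later query is a lookup rather than a re-parse.

```python
from html.parser import HTMLParser


class AnchorExtractor(HTMLParser):
    """Collects (href, anchor text) pairs from one HTML page."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a href=...> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


def index_repository(repository):
    """Parse every stored page once; return (links_db, anchors_db)."""
    links_db = {}    # target URL -> set of source URLs
    anchors_db = {}  # target URL -> list of anchor texts pointing at it
    for source_url, html in repository.items():
        parser = AnchorExtractor()
        parser.feed(html)
        parser.close()
        for target, text in parser.anchors:
            links_db.setdefault(target, set()).add(source_url)
            anchors_db.setdefault(target, []).append(text)
    return links_db, anchors_db


# Hypothetical two-page repository.
repository = {
    "http://a.example/": '<p><a href="http://x.example/">blue widgets</a></p>',
    "http://b.example/": '<a href="http://x.example/">widgets</a> and text',
}
links_db, anchors_db = index_repository(repository)
print(sorted(links_db["http://x.example/"]))
print(anchors_db["http://x.example/"])
```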
     
    Mel, Sep 20, 2004 IP
    SEbasic likes this.
  11. bobmutch

    bobmutch Peon

    Messages:
    683
    Likes Received:
    62
    Best Answers:
    0
    Trophy Points:
    0
    #91
    Mel:
    I don't see where I said that the cached pages were used for searching or ranking. In fact if you read my post again you will see the direct opposite. I do see where I noted that the monthly backlink update was "based off the cached pages" and that "after Backlinks are calculated they are stored." and that the PR calculation is based off the "stored" results of the backlinks.

    I still don't see support for both a private and a public version of the backlinks data, that is, of the anchors file in which the source and target of each link and the text of the link are stored.

    The indexer parses the web pages in the repository and stores the links in the anchors file. From this file the URL resolver generates the links db which is used for the PR calculations, and it is from this same links db that the displayed BL are "calculated." You say they are just recorded. Well, when I type in link:URL it gives me the links for that URL. I say they have to be calculated at some point. You say they are only recorded. Do you hold they are "only recorded" and then calculated on the fly? Or is the number of links calculated for each page prior to the request, which is what I am maintaining?

    You noted there was a public and private backlinks database -- "then the private database becomes the public database". My comment to you was I know of no "articles that support a public and private backlinks database." I still don't see support for a public and private backlinks database. I didn't say there was not a links and anchors db. I said I didn't see support of a public and private backlinks db.

    I think that is a given. Again, if you are implying that I hold this view, you are reading this into my posts. I have at no time, in any way or form, implied in an article or posted in a forum such a silly idea. To imply, if that is what you are doing, that I hold that kind of view is silly. A 3 year old might think that, but I think that we all are further along in our understanding of search engines than that.

    If you look over my last post to you, you will note that I said the direct opposite of this. "The new backlink and real PR values are now used in Google's calculation for order of results in the search engine."

    Could you address the issue in my last post that I asked you about concerning your view that backlinks updates are done every 2 days? Why would Google calculate the repository every 2 days for backlinks and then not use those backlinks? Again, those that get a good number of new backlinks see their rankings change after the monthly backlink calculation.

    Also, could you address the issue of the special location that I asked you about in my last post? You posted, "Anchor text IMO is updated when the page is parsed (unless the sandbox theories are correct, and they are put into some special location)". There is a sandbox, and I maintain it is a filter, not a special location where something is put.

    I am enjoying this series of posts Mel. It makes one think things through when you put your thoughts down in writing. Also when we commit ourselves to positions in a public forum we allow our peers to examine them and point out where we have misconceptions. I welcome this to all my views and I trust you do also.
     
    bobmutch, Sep 20, 2004 IP
  12. Mel

    Mel Peon

    Messages:
    369
    Likes Received:
    14
    Best Answers:
    0
    Trophy Points:
    0
    #92
    OK Bob, then let's go one step at a time.
    The pages as spidered are put into the repository and then the indexer parses them and puts the results into various databases.

    I assume you agree that the pages are parsed as soon as they come in and not stored for a month or so before being entered into the databases. That being the case, if there is only one links database, why do we only see the public backlinks updated once a month or so? The data is going somewhere every time a page is parsed (and that is pretty well a continuous process), and if there is only one links database why don't we see the links being updated every minute?

    When you query a database asking how many links it has to a particular page, and/or what the details are, there is no calculation required to answer that question; it simply provides you with the data that is in the database for that page as it is recorded in each field.

    see above

    Ok fine. Then these figures are used for all ranking calcs in the next month or so, and no changes are seen by the user but all during that month the search engine is busy recording the results of the spidering of pages and those results have to go somewhere. It makes no sense to parse the pages when they come in for some part of the data, and then at the end of the month parse all the pages again just to get the links data does it?

    Again there is no need to calculate backlinks you just parse the pages and record the backlinks in the links database, and when you want information on links you query the database and it provides the answers to you.

    OK, let's assume it's a filter that says such and such backlinks should not count in the rankings for the next three months. What is a filter? It is a program that selects certain links based on certain criteria. I am sure you understand that any search engine prefers to do as much preprocessing as possible, so I do not think it likely that every time a query is run, the entire links base is searched and the links selected if they conform to such and such criteria. Much easier to simply have the filter run over the links database every so often and remove to another database those links that it does not want shown, and then to replace them when they have reached their "maturity date".

    Assuming that all the data is stored in the filter itself is not likely, since the filter would soon grow to terabytes in size, and so the data on the links that are being blocked must be stored somewhere. A normal query will not see the links so they must not be in the links database, so where are they? I am postulating that they are in a sandbox database.
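
    The periodic filter pass postulated above could look something like the sketch below. Everything in it is hypothetical: the function name, the use of plain sets for the two databases, and the 90-day hold (a stand-in for the "three months" mentioned in the post). It only shows the mechanics of moving immature links out of the main links database and putting them back once they reach their maturity date.

```python
from datetime import date, timedelta


def run_sandbox_pass(links_db, sandbox_db, first_seen, today,
                     hold=timedelta(days=90)):
    """Batch pass: hold back young links, restore links that have matured.

    links_db and sandbox_db are sets of (source, target) pairs;
    first_seen maps each pair to the date it was first recorded.
    """
    # Move links younger than the hold period out of the main database.
    for link in list(links_db):
        if today - first_seen[link] < hold:
            links_db.discard(link)
            sandbox_db.add(link)
    # Put back links that have reached their maturity date.
    for link in list(sandbox_db):
        if today - first_seen[link] >= hold:
            sandbox_db.discard(link)
            links_db.add(link)


first_seen = {
    ("new.example", "x.example"): date(2004, 9, 1),
    ("old.example", "x.example"): date(2004, 5, 1),
}
links_db = set(first_seen)
sandbox_db = set()

run_sandbox_pass(links_db, sandbox_db, first_seen, today=date(2004, 9, 20))
print(sorted(links_db), sorted(sandbox_db))
```

    Because the pass runs offline, a query against the links database never has to evaluate the filter; it simply does not find the links that are being held back.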

     
    Mel, Sep 20, 2004 IP
  13. bobmutch

    bobmutch Peon

    Messages:
    683
    Likes Received:
    62
    Best Answers:
    0
    Trophy Points:
    0
    #93
    Mel: Let me read over a number of articles and see if I can sort this all out. I'll try to get back to you and pick this all up. I am getting busy with some work.
     
    bobmutch, Sep 20, 2004 IP