Page Count as weighting factor


john_loch
Oct 5th 2004, 8:17 pm
Hi Shawn,

How do you determine page count?

I know it sounds like a stupid question, but here's why I ask:

If you're using the 'site:' cmd on G, the numbers will fall considerably short of the mark, i.e.:

[site:digitalpoint.com] yields 91,600

while appending a query..

[site:digitalpoint.com digitalpoint] yields 170,000 (almost double the number from the typical site: cmd use).

I know that my site sees a significant improvement (a touch closer to reality) when using an arg.

How can these differences be addressed ? You may already have addressed this somehow - it's just the differences are site related, not a standard reduction/increase in numbers across the board, and I obviously want to ensure that what I bring to the network is properly assessed.

I'll bet I wrote all this for nothing, and the answer will be an all too obvious one liner :)

a389951l
Oct 5th 2004, 9:14 pm
http://forums.digitalpoint.com/showthread.php?t=2331&highlight=indexed+pages

Read Shawn's post #4.

john_loch
Oct 5th 2004, 9:23 pm
Thx a389951l, but the thread doesn't address my question. It simply states that there are variations between the API results and G web.

The same (above) can be applied to the API as well. It's primarily about the addition of the query arg.

What the other thread *did* remind me of was the weighting cap, which seems to be 22,500.

Just the same, I'd still like to know whether the above is considered/addressed.

Cheers

JL

digitalpoint
Oct 5th 2004, 9:51 pm
The variations between the API and www.google.com are relatively similar (percentage-wise) from site to site. So yeah.. even though the API returns lower results than www.google.com, it returns lower results for everyone, and the weight is just relative to everyone else on the network.
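
A small sketch of the point above, assuming the weight is simply each site's share of the network total (the thread does not spell out the actual formula). The site names, the counts, and the 0.8 under-reporting factor are all invented for illustration:

```python
def relative_weights(page_counts):
    """Return each site's share of the network total (shares sum to 1)."""
    total = sum(page_counts.values())
    return {site: count / total for site, count in page_counts.items()}


# Hypothetical counts as they might appear on www.google.com.
web_counts = {
    "site-a.example": 10_000,
    "site-b.example": 2_500,
    "site-c.example": 500,
}

# Assumption: the API under-reports every site by the same factor.
api_counts = {site: count * 0.8 for site, count in web_counts.items()}

web_shares = relative_weights(web_counts)
api_shares = relative_weights(api_counts)

# The two columns match: a uniform scale factor cancels out of every share.
for site in web_counts:
    print(site, round(web_shares[site], 4), round(api_shares[site], 4))
```

If the under-reporting were not the same percentage for every site, the shares would no longer match, which is the case the rest of the thread argues about.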

thebassman
Oct 6th 2004, 12:30 am
So you use the API results, then?

john_loch
Oct 6th 2004, 1:44 am
I think I understand what you're saying (the percentage differences are essentially the same for all sites via the api and Google web.)

But what I'm really asking is this:

If I use a keyword with the site: cmd (in original post) the number of pages found is far higher than without the keyword (regardless of whether this is done via the google search page or the API).

Given that we all have our own specific keywords that we know will return a higher count (as in my original post), I suppose I'm asking whether there is any plan to allow an appended keyword, set via the user account, for site: queries. I presume those queries are what you're using to check the number of indexed pages (you call it 'pages in URL' in the keyword tracker).

The percentages change quite significantly across sites, so unlike the differential between web and api, it can't really be generically applied.

Of course this may be entirely inconsequential and a waste of space to implement - but I see it as a way to get more accurate results from G (especially for those with low page counts, who would really notice it).

:)
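
To illustrate the concern in this post: if an appended keyword raises the count by a different percentage for each site, the relative shares move. This is only a sketch. It assumes weight is proportional to page count and ignores any weighting cap, and every site name and number in it is made up:

```python
def shares(counts):
    """Each site's fraction of the combined page count."""
    total = sum(counts.values())
    return {site: n / total for site, n in counts.items()}


# Hypothetical small sites with equal counts from a plain site: query.
plain = {"site-a.example": 400, "site-b.example": 400}

# Hypothetical counts once a keyword is appended: site-a doubles,
# site-b only gains 10%.
with_keyword = {"site-a.example": 800, "site-b.example": 440}

before = shares(plain)
after = shares(with_keyword)

print(before["site-a.example"])
print(round(after["site-a.example"], 3))
```

Under these invented numbers site-a's share rises from one half to roughly 0.645, so the change does not cancel out the way a uniform percentage would.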

T0PS3O
Oct 6th 2004, 2:12 am
The search query you use has got nothing to do with page count. It's just like any other search but limited to that site. That's not a good page count at all.

john_loch
Oct 6th 2004, 2:21 am
T0PS3O wrote: "It's just like any other search but limited to that site."

Sure. I'll agree with that. So what am I missing..

How do you determine the number of pages currently in G's index for a specific site ?

T0PS3O
Oct 6th 2004, 2:39 am
You can't.

G's count might or might not be accurate, no one knows. But to keep it fair this count can be used, since it seems like everybody is affected by its error by the same percentage.

Unless Shawn writes his own spider for this, the Network has to stick with what is publicly available...

john_loch
Oct 6th 2004, 3:04 am
I'll agree that the results are never accurate, but I'm yet to see a *better* way to get closer to the true count. I'm also yet to see G overstate the numbers. So, unless I'm mistaken, my original approach is more effective than the standard site: approach.

The objective here is to afford ppl the opportunity to get as close to the real numbers as possible. Not to disadvantage anyone. And if it's implemented at acct level then it's publicly available for all to use.

There's absolutely no need to deploy a crawler, nor is it implied. It's just a matter of minor coding, and an extra field in some db (well, theoretically anyway - I don't presume to know the back end).

Just because not everyone's going to use it doesn't mean it won't be leveraged by those who want to, nor does it give anyone an unfair advantage.

As I said in my previous posts, Shawn may be querying page count another way, and this entire discussion may be moot.

T0PS3O
Oct 6th 2004, 3:17 am
But your approach doesn't work because I can think of plenty of sites which don't have a certain keyword plastered on each page.

DP might have digitalpoint on each page, but I'm convinced that 95% of websites / domains will NOT have one single, measurable keyword on all, or even most, pages.

So it doesn't work that way IMO. I agree it might get closer to the real number but adding variables doesn't make it a transparent weighting system.

T0PS3O
Oct 6th 2004, 3:19 am
And remember... If everyone indeed is affected similarly then you don't need the real figure because the end result will be the same.

john_loch
Oct 6th 2004, 4:36 am
There are plenty of sites that don't appear in G too but does that mean we shouldn't use it ?

I think I've already pointed out that it can improve things - it's up to end users to determine whether they care for it.

*** Oh and one more thing. It's not necessarily dependent on keywords being present on each page.

Anyway, Shawn will chime in sooner or later :)

digitalpoint
Oct 6th 2004, 8:31 am
thebassman wrote: "So you use the API results, then?"

Yes.

john_loch wrote: "I suppose I'm asking is there any plan to allow the use of an appended keyword via the user account for site: queries"

So you are essentially asking for a way for people to individually have a higher weight if they take the time to do so? :) Sorry, not going to happen. The pages in URL returned are pretty accurate (as far as percentages go), so the only way to make something like that work is if EVERYONE entered a "keyword", and then we would be back to the same percentages.

john_loch wrote: "I'm yet to see a *better* way to get closer to the true count."

It wouldn't really matter what the absolute number is, because the weighting is determined relative to everyone else, not as an absolute number.

An easy way to do it without the user needing to enter anything would be to make a query with part of the domain in it (for example digitalpoint in my case). But again, it's not going to matter at all, because the weight is relative to everyone else. If everyone's weight goes up by pretty much the same percentage, it's pointless.
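
A rough sketch of building such a query, for illustration only. The function name `site_query` is hypothetical, and taking the first label of the host name as the keyword is an assumption about what "part of the domain" would mean in practice:

```python
def site_query(domain, keyword=None):
    """Build a 'site:' query string, optionally with an appended term."""
    host = domain.strip().lower()
    if host.startswith("www."):
        host = host[4:]
    if keyword is None:
        # Assumption: use the first label of the host name as the term,
        # e.g. "digitalpoint" for "digitalpoint.com".
        keyword = host.split(".")[0]
    return f"site:{host} {keyword}"


print(site_query("digitalpoint.com"))
```

For digitalpoint.com this produces the same query quoted in the first post, site:digitalpoint.com digitalpoint. Sending the query to Google is left out here.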

john_loch
Oct 6th 2004, 7:12 pm
Thx Shawn.

I'm not convinced that the effects would remain consistent across all sites (otherwise I wouldn't have bothered proposing it in the first place) - not that it matters.

At the end of the day, I'm sure there are other ways to improve the coop (stats etc) that would prove far more beneficial for all, and make better use of the time you have for it.

Thx for answering the post, it was getting a bit long winded :)

chachi
Oct 7th 2004, 9:39 am
So, we know PR plays a role in the BW of each site. Do you have any idea when the new PR figures will factor into the BW figures in the Co-op?

thebassman
Oct 7th 2004, 10:14 am
And is it the PR of just the main page, or the PR of all the pages within the site?

GuyFromChicago
Oct 7th 2004, 10:24 am
chachi wrote: "Do you have any idea when the new PR figures will factor into the BW figures in the Co-op?"

I would guess the next time the base weights are updated...but that's just my guess :)

Although I did see Shawn mention in another thread that the only time the PR is updated (using the api/keyword tool) is when backlinks are updated. If the api is being used here, maybe the PR change will be reflected in our base weights after the next backlink update?

chachi
Oct 7th 2004, 10:34 am
I was kinda looking for an answer from "The Master of the Universe". But, thanks for guessing. :)

digitalpoint
Oct 7th 2004, 10:48 am
Probably this afternoon.

chachi
Oct 7th 2004, 12:28 pm
Awesome. Thanks.