
How would you rank pages?

Discussion in 'All Other Search Engines' started by nfzgrld, Jan 30, 2005.

  1. #1
    It's obvious that G, Y, M and everyone else out there has some kind of formula for ranking pages and calculating relevance in a search. How would you do it? I'm getting ready to do a major rewrite of my search engine and want to do a better job of returning relevant results. Currently I'm just counting up the keywords and returning the page with the most matching keywords at the top, with the rest in order under it. I think that's a bit oversimplistic. I want to actually assign a rank to each entry in the index, much as Google does, then use that as part of a more comprehensive calculation to try to return the most relevant results possible. The question is: what criteria should the ranking be based on, and how should it be calculated?
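
    For reference, a minimal sketch of the keyword-counting approach described above, in Python (the names and data structures here are hypothetical, not the engine's actual code):

        # Minimal sketch of "count the matching keywords" ranking.
        # `index` maps each page URL to the list of words on that page;
        # both the structure and the names are hypothetical.
        def rank_by_keyword_count(index, query_terms):
            scores = {}
            for url, words in index.items():
                # Score is simply how many times the query terms appear.
                scores[url] = sum(words.count(t) for t in query_terms)
            # Highest count first, the rest in order under it.
            return sorted(scores, key=scores.get, reverse=True)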
     
    nfzgrld, Jan 30, 2005 IP
  2. pwaring

    pwaring Well-Known Member

    Messages:
    846
    Likes Received:
    25
    Best Answers:
    0
    Trophy Points:
    135
    #2
    Well, the first thing to realise is that however you rank sites, someone will find a way of cheating, just like when Google brought out PageRank everyone started setting up reciprocal links in order to get ranked higher (or at least attempt to).

    I don't think any one method is good by itself; you have to look at a combination of things: keywords (obviously you're going to need these to decide whether the page is relevant), context (quite difficult to do, but if someone searches for, say, Java programming, you probably don't want links about coffee or the island), authority (how many people think this page is "worthwhile", as shown by linking to it) and perhaps things like how often the site has been updated or how long it's been around.
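
    Read as a formula, that suggestion is a weighted sum over the individual signals. A sketch, where every weight and signal value is an arbitrary placeholder rather than a tuned number:

        # Hypothetical weighted combination of the signals listed above.
        # Each signal value is assumed to be normalised to [0, 1];
        # the weights are arbitrary placeholders, not tuned values.
        WEIGHTS = {
            "keywords":  0.4,  # do the query terms appear on the page?
            "context":   0.2,  # does the surrounding topic match the query?
            "authority": 0.3,  # how many pages link here?
            "freshness": 0.1,  # how recently/often is the page updated?
        }

        def combined_score(signals):
            # signals: dict mapping signal name -> value in [0, 1]
            return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)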
     
    pwaring, Jan 30, 2005 IP
  3. nfzgrld

    nfzgrld Peon

    Messages:
    524
    Likes Received:
    6
    Best Answers:
    0
    Trophy Points:
    0
    #3
    All of that stuff is relevant, but it seems to me there has to be a way to accurately address the question of what is most relevant. It's easy to count keywords, but then it's way too easy to cheat on that. Currently my spider has a feature that allows me to look at the keywords meta tag and count the number of times a particular word appears there. If it's more than "X" number of times, it deletes the page from the index for spamming. One of the first things I'm going to do in the rewrite is not just get rid of this feature; I'm not going to look at the keywords meta tag at all. It's not worth the trouble.
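
    That spam check might look something like this; the threshold is the unknown "X" from the post, so the value here is invented:

        import re
        from collections import Counter

        MAX_REPEATS = 3  # the "X" from the post; the real threshold is unknown

        def is_keyword_spam(meta_keywords):
            # True if any single word repeats more than MAX_REPEATS times
            # in the keywords meta tag, per the rule described above.
            words = re.findall(r"[a-z0-9]+", meta_keywords.lower())
            return any(n > MAX_REPEATS for n in Counter(words).values())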

    It's a tough nut to crack. I'm amazed that Google would even consider counting the number of links to a site in order to measure its value. The number of links to a site is almost always more a matter of marketing than of people linking to the site because they think it's good. It's the content that matters, not the number of times someone links to it. What I want to figure out is how to differentiate between "Blue Dress" and "Dress Blues." One is a blue dress, the other is a military uniform. Of course, depending on your gender and branch of service, your dress blues may well be a blue dress.

    It seems to me the only way to really do this is to find a way to detect the context in which a word or phrase is used. Of course that is only useful if you know the context the person performing the query is thinking in, and that's just not possible for us mortals.
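
    One cheap approximation is to score exact phrase matches above scattered keyword matches, so that "blue dress" and "dress blues" stop looking identical. A sketch, with an invented weighting:

        def phrase_aware_score(page_text, query):
            # Score word-order-preserving phrase hits above unordered
            # keyword hits; the 10x weighting is an arbitrary illustration.
            text = page_text.lower()
            words = text.split()
            terms = query.lower().split()
            keyword_hits = sum(words.count(t) for t in terms)  # unordered hits
            phrase_hits = text.count(query.lower())            # exact phrase hits
            return keyword_hits + 10 * phrase_hits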

    I guess what I have to do is find a way to gauge the quality of a page from a general standpoint, then the relevance of the page for a given query, and then find a calculation that will put the more relevant, high-quality pages at the top of the SERPs. The question is, how do you gauge quality?
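
    The split described here, a query-independent quality score combined with a per-query relevance score, could be combined as simply as this (alpha is an arbitrary tuning knob, not a known constant):

        def final_score(quality, relevance, alpha=0.3):
            # quality: query-independent score in [0, 1]
            # relevance: per-query score in [0, 1]
            # alpha is an arbitrary mixing constant for illustration.
            return alpha * quality + (1 - alpha) * relevance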

    Is the age of the page, or how often it is updated, a measure of quality? No, not necessarily. Some pages should be updated frequently, some should not. Some pages should never change. Does that make them less relevant or of lower quality? What if a page is a week old and is exactly what the person running the query is looking for? Should that page be penalized due to its youth? What possible reason could there be for that? Should the number of pages a site has be factored into the quality or relevance calculation? What about the amount of text on the page?

    I can't tell you how many times I've had to read through hundreds of lines of text on a page to find the one sentence with the information I wanted. Now, is that a quality or relevant page just because it has a lot of text on it? I think it would be higher quality if all it had was that one sentence. What's needed here is a smarter search engine. Unfortunately, I'm not sure I'm up to building one. Until we find a way to let the machine read the user's mind, this isn't going to get any easier. In either case, it's going to take me a while to even figure out the best way to go with this. I'll probably "finish" the damn thing a dozen times before I'm done with it.
     
    nfzgrld, Jan 30, 2005 IP
  4. Gutsy

    Gutsy Peon

    Messages:
    21
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #4
    This discussion is really informative, and I must say that I am learning a lot from this forum.

    Getting back to the topic in question, how do the search engines rank pages in terms of relevance? And how can one work towards ethically increasing a page's rank? I am still not clear on the methodology they currently use, and this bit of information might be very useful. Can anyone shed more light on that?

    Regarding the criteria for relevance, the context in which the info is being searched is unknown to the search engine, as rightly pointed out by nfzgrld. The web being such a large storehouse of information, it is an extremely difficult task to assign relevance to content in terms of what the end user wants.

    One idea to increase the chance of relevant, quality content reaching the end user could be to get more specific and ask him/her a couple more questions about the search. In this way, you (the search engine) would know more about the user and could relate better to the type of content he/she might be looking for.

    For example, there might be two people doing a search on Malaysia. One might want to visit the place on vacation, while the other is a student looking for a historical perspective on the area. If the search engine asks the person about the purpose of the search and their profession, and then relates the keywords to that purpose and/or profession, the results might be more appropriate in terms of relevance.
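
    That refinement step could be a second pass that re-ranks the already-matching results by a declared purpose. A toy sketch; the intent categories and term lists are invented for illustration:

        INTENT_TERMS = {
            "travel":  {"hotel", "flight", "vacation", "resort", "beach"},
            "history": {"colonial", "independence", "sultanate", "heritage"},
        }

        def rerank_by_intent(results, intent):
            # results: list of (url, page_text) pairs already matching the query
            terms = INTENT_TERMS.get(intent, set())
            def intent_hits(item):
                url, text = item
                return len(set(text.lower().split()) & terms)
            return sorted(results, key=intent_hits, reverse=True)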

    This is just a passing thought; I would like to know your views on the viability of such an option. :)
     
    Gutsy, Jan 31, 2005 IP
  5. Dominic

    Dominic Well-Known Member

    Messages:
    1,725
    Likes Received:
    121
    Best Answers:
    0
    Trophy Points:
    185
    #5
    The way Google does, but with greater weighting given to local rank and less focus on the age of an IBL (inbound link).

    PS - you need to index a lot more pages than you currently do... start from dmoz.
     
    Dominic, Jan 31, 2005 IP
  6. pwaring

    pwaring Well-Known Member

    Messages:
    846
    Likes Received:
    25
    Best Answers:
    0
    Trophy Points:
    135
    #6
    The other method I'd use to rank pages is some kind of dynamic scheme, a bit like what I think Google does for AdWords: if your result gets clicked on more often, it gets moved higher up the results list next time (or rather, you have some form of "click-through rank" that is updated). I don't know how well that would work, though; you'd have to give new sites a high rating to ensure that they didn't start at the bottom.
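
    A rough sketch of that "click-through rank", including the optimistic prior for new sites mentioned above (all constants are invented):

        def click_through_rank(clicks, impressions,
                               prior_clicks=10, prior_rate=0.5):
            # Smoothed click-through score: a page with no history scores
            # prior_rate (optimistically high) instead of starting at the
            # bottom, and real click data takes over as impressions grow.
            return (clicks + prior_clicks * prior_rate) / (impressions + prior_clicks)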
     
    pwaring, Feb 1, 2005 IP