The first problem is right here: DMOZ can hardly be considered an all-inclusive or particularly expert collection, and of course there are no relevancy rankings there. While neural networks are great gadgets, they have not yet been demonstrated on a collection of data as large as the one you are talking about, or with such far-reaching goals. The power needed to predict the number of runs a baseball team will score in a year is trivial compared to indexing and ranking the entire web. First of all, links are used in many more ways than as votes, but then you know that, I am sure. At this point in time we have search engines that use fairly simple principles, and while you may not be satisfied with the results (personally, I don't have the time to read everything written on a particular subject), the simple fact is that these search engines you seem to think so little of have increased the knowledge available to the average person thousands of times over in a very short space of time. They may not be perfect, but then I suspect one person's perfect and another's may be quite different.
http://infomesh.net/2001/swintro/ Seems to me like a bunch of unrealistic dreamers trying to teach computers to do things they will never be able to do... but it reminded me of your ideas for getting better searches out of Google. Ever hear of the Semantic Web project? Have an opinion? Thomas.
I think Google PR level should fluctuate with their current market price. Maybe with the number of shares a website owner has in his portfolio! Tony
It does not need to be "all-inclusive"--only large enough to be a diverse, representative database of human-evaluated and human-selected pages. Whether it is "particularly expert" is a matter on which you may find many opinions differing from your own. The lack of relevancy rankings is immaterial: all dmoz might do (and that is just one possible way of approaching the issue) is serve as a sample of sites found by reasonably expert editors to be significantly better than average for the categories in which they appear.

"Indexing and ranking the entire web" is not what a neural network would do: the indexing would be done as it is now. The neural network would serve, in the development stage, as a way to generate some filters that could then be applied to individual pages. But in any event, I know of no work suggesting that such networks are not scalable.

As the recent discussion on the LocalScore patent has indicated, it looks as if only now, with that patent, is Google (at least) moving away from rather straightforward use of PageRank. And there are deficiencies in the newer method as well, which are generically the same as with PageRank.

The crux is this: the test is not whether most of a search engine's top results are relevant pages--it is whether most of the relevant pages are in the search engine's top results. There is a universe of difference between those two tests, and the present engines fail the second. I only just finished a search for a fairly straightforward thing, public-domain web resources, and I was finding useful, highly relevant pages out to #500, which is as far as my patience could endure.

Information searched for is of two basic types: quick, obvious facts--what is the height of Mt. Everest?--which the SEs today handle quite well (as, in fact, they mostly did before PR), and what we may call scope knowledge. "Scope knowledge" is what we want when we want or need to learn much about a subject, and need to read most or all of the informed and informative essays and like works on the topic, so that we can learn both the subject itself and the spectrum of opinions current in and about it. At that, the SEs are not at all good.

That they are better--far, far better--than nothing is undeniable, and I never suggested otherwise. But there is a classic trap--in business, I call it the "I'm making money" trap--in which any nontrivial success is taken for maximum possible performance. "I'm making money in my business, so I must be doing everything right." Put that baldly, it sounds as dumb as it is, but you'd be surprised how many people lead most or all facets of their lives by that fallacy.
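To put those two tests side by side in concrete form, here is a minimal sketch in Python. The figures are invented purely to show the shape of the problem; they are not measurements of any real engine.

def precision_and_recall(top_results, relevant_pages):
    # Both arguments are sets of page identifiers.
    hits = top_results & relevant_pages
    precision = len(hits) / len(top_results)    # share of the top results that are relevant
    recall = len(hits) / len(relevant_pages)    # share of the relevant pages that made the top results
    return precision, recall

# Hypothetical search: 10 top results, 8 of them relevant, but 192 more
# relevant pages buried far down the listings.
top = {f"page{i}" for i in range(10)}
relevant = {f"page{i}" for i in range(8)} | {f"buried{i}" for i in range(192)}

p, r = precision_and_recall(top, relevant)
print(f"precision of the top 10: {p:.0%}")    # 80% -- passes the first test
print(f"recall of relevant pages: {r:.0%}")   # 4%  -- fails the second test

The point is that the first number can look excellent while the second is dismal, which is exactly the gap described above.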
I had not heard of it before, and only glanced over the first few paragraphs. I don't at all think they're unrealistic. It's more than anything else a categorization or data-organization scheme. It isn't going to happen overnight, nor do they seem to suggest it will. They are simply looking ahead to the future development of the net, and to ways to better organize and use what's on it. It's larger in scale, but scarcely more radical, than moving from HTML to XHTML. I don't think, though, that it is closely on point for this discussion.
Remember, when we are all speaking of Google and waxing eloquent about our theories, that Google is a computer. Don't give the "computer" too much credit. AJ
Owlcroft, OK, forget the semantic web. I am really interested in what you are saying, and so I will ask another question to better get at your "worldview." Could it be that the concerns you are raising apply primarily to people who are bad at using search engines? In other words, if you took the average guy and trained him for a week... in public school, say, when he was in the sixth grade... could it be that all the fancy filters and stuff you are clamoring for would be unnecessary?

I would start such a sixth-grade curriculum with http://www.searchlores.org and, if I had to get even more specific, http://www.searchlores.org/tips.htm

Your "problem query" was a case of what you called "scope knowledge," specifically searching for public-domain web resources. It seems to me that in this case Tip #5 from searchlores would apply:

"5) NARROW DOWN [ AND | & | + ] and ELIMINATE MERCILESSLY [ AND NOT | | | - ]
Narrow your searches by linking your search terms with AND or &, or simply use the plus sign [+]. The search engine will find only those pages that contain all of your search terms. Similarly, exclude pages that are not relevant to your search by preceding the search term with AND NOT or | or simply use the minus sign [-].
+"search engines" +hints +tips +techniques -tits -sex -"make money" (5200)
is better than the more simple
+"search engines" +hints +tips +techniques (7700)"

This is advice that, in my opinion, CANNOT BE IGNORED (a small sketch of what it amounts to follows this post). Too bad most people do ignore it, because they don't know how to use SEs optimally. But the solution, in my opinion, is just to teach people how to do this, kind of like kids learn how to use the card catalogue in public libraries when they reach a certain age. Anyway, they taught us this when I was in high school, but I am getting old and libraries in the USA are running out of money, so now who knows.

But what I am getting at is this: maybe your scheme is grandiose and unworkable, cool in theory but intractable in practice, and the solution is that people just need to get better at using search engines. How *you* could argue to get me to pay better attention to what you are saying is to come up with more examples of queries that SEs give bad results on, and then take a look at the "tip sheet" I posted. Can the queries be "fixed" by following the advice in the tip sheet? If yes, forget it. But if the queries can NOT be "fixed," or fixing them takes more brainpower than the average sixth-grader has, then I want to know about them, and that will convince me that SEs are failing, that the grandiose schemes to fix them are truly necessary and will be implemented by someone if not Google, and that this is the future of search.

So, in a nutshell, to me, this argument is about: is there information that is being hidden even from people who KNOW how to use SEs, i.e., who follow the advice in the tip sheet, because the SEs are "broken" in some important way? See what I'm saying? Thomas.
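Just to make Tip #5 concrete, here is a throwaway sketch in Python of what "narrow down and eliminate mercilessly" amounts to as a query string. The helper function is my own invention, purely illustrative; the terms come straight from the quoted example.

def narrow_query(terms, excluded=()):
    # Required terms get a leading +, excluded terms a leading -;
    # multi-word terms are quoted so the engine treats them as phrases.
    def fmt(sign, t):
        return f'{sign}"{t}"' if " " in t else f"{sign}{t}"
    parts = [fmt("+", t) for t in terms] + [fmt("-", t) for t in excluded]
    return " ".join(parts)

# The broad query from the tip, then the mercilessly narrowed one:
print(narrow_query(["search engines", "hints", "tips", "techniques"]))
print(narrow_query(["search engines", "hints", "tips", "techniques"],
                   excluded=["tits", "sex", "make money"]))
# +"search engines" +hints +tips +techniques
# +"search engines" +hints +tips +techniques -tits -sex -"make money"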
Owlcroft, Fravia, the creator of searchlores.org, is also aware of the problem that relevant results are sometimes buried. He created a php script he calls the "yoyo" wand to ameliorate this problem. http://www.searchlores.org/yoyo1.htm Just thought y'all might find this interesting... Thomas.
Searchlores also has something called the "synecdoche approach" to get at scope knowledge. "The 'Moundarren' field case Speaking on a messageboard about haiku (maybe the supreme achievement of zen culture) I advised a friend to check the books of a French editor: Moundarren. Shortly afterwards I did myself a search for Moundarren on the main search engines, and this revealed a cluster of sites -mostly French, but not only - dealing with zen, haiku, and more generally with Chinese and Japanese poetry. It is what I call an 'arrow' or a 'clean cut' for my target. Let's define as 'clean cut' any search query that allows you to 'cut' through commercial crap and get a 'useful' or 'promising' cluster of sites that provides REALLY some useful knowledge (or further pointers)." http://www.searchlores.org/synecdoc.htm
It seems to me that all of these problems stem from one thing... PR. Why doesn't Google just turn it off? It seems to be a big problem for them. I don't think it helps them in any way; it only promotes manipulation of the search engine. Why do they leave it on? I never understood that.
seveinid, I can remember back in the '90s when AltaVista was the big dog and there was no such thing as PR, or even Google for that matter. People were still trying to find ways to gain an advantage. It's business. Business is competitive, so people will compete. The only problem I have is when people cheat. If there were no PR, people would still be trying either to find a competitive edge or to cheat the system, whatever it is.

What's all this talk about "learning" how to use a search engine? That's fine for you and me, but most people don't want to have to think about it. They want to put the words they relate to the thing they're looking for in the search field, click the button, and find it. If most people find out they need to take a class to use Google properly, they'll just go somewhere else. Anyone who can look up a word in a dictionary should be able to go to a search engine and find what they're looking for.

Google is delivering a service to a customer base. The customers are the people doing the searches. Unless you're selling airplanes, or something similarly complex and potentially dangerous, your customer doesn't want to hear that he has to go take a class to learn how to use the product. That's just not going to happen. It's up to Google to deliver the service in a way that the people who want to use it can do so without needing a degree in boolean algebra.
But is it really possible? There are lots of things in life that you can't just 'leap into' as you describe, many of which are essential parts of life, just like a search engine. Example: driving a car. In an advanced technological civilisation, your citizens will inevitably need more education to survive. Transport a stone-age man into 20th-century USA and watch him sink. It's all very well to talk of providing a service to your customers, but sometimes you can only get so simple. Another example: you go to your bank and say "I want a mortgage" - sorry, although it's a simple concept, that's just not enough information. Same with search engines. Edit: Also, "degree in boolean algebra"? Hardly; it would take perhaps an hour, maybe two, to learn pretty much everything you could possibly need to know about how to use search engines to their full potential! Not really that great a sacrifice.
While the ideal search engine would be able to read your mind and deliver fishing information to some searchers who typed bass into the search bar and beer information to others, we are a long way from that ideal at this time. The search engines are working every day to improve their abilities in this area, but until then there is a direct correlation between input and output. If you are not satisfied with the results of your searches, learning how to search better will help you get better results. Most searchers are slowly learning that more specific searches generally produce better results, but this often goes along these lines: searching for tickets to a particular match by first using the term tickets, then trying football tickets after seeing all the movie-ticket pages, then trying US football tickets after seeing that many of the results apply to what the US calls soccer, then trying Packers versus Rams football tickets in order to get to sites that offer the tickets you want. It's up to the surfer: if he wants his free search to be better, a half hour spent learning how to search is time well spent.
Truly an interesting topic... however, I am still a newbie to this whole SEO thing and still catching up.
I must admit I have participated in buying links once upon a time... and it worked. Not now. I am following some live examples. These are examples of sites that I, well, yes, bought links from... expensive links. PROBLEM: I did not get spidered over a three-month term. Something was wrong, I say. I learned that G had gotten wise to this and silently created a list of PR-blocked sites that were known to be participating in link selling. I have now noticed that all the sites that were on the PR-block list have been removed from it. Sponsors that are now purchasing links on these same sites ARE receiving spidering and showing massive backlinks from these sites. The difference is that these massive backlinks are NOT helping SERPs. I think G is wise to what's really going on, for the most part. By that I mean the TRUE VALUE of a link.
Here is what I don't understand. I have not read this entire thread, but my opinion would not change. If I were to buy links (which I never have), that is a way of advertising. I pay to send out flyers. I pay to get things printed online (PRWeb), etc... is that not paying for links?
PR is at the heart of their business model and marketing plan. The toolbar (now the linking toolbar) is a real-time MIS device that delivers real-time information back to Google; if not for the PR gauge, no one would use the toolbar. Next, Google will develop a browser, and within that browser will be ads generated from the content of the URL/pages the surfer is on, with no money going to the publisher. The linking toolbar is creating a controversy at this time, with webmasters worried that Google will send traffic to the competition based on the content of their pages. Nothing new under the sun: if a business can get away with not paying someone, it will.
What a great discussion. A couple of points.

Yahoo seems to be able to use on-page content rather efficiently. Sites I run tend to have both a lot of "store" competition and a fair bit of porn spam when I look at the Google searches, but Yahoo seems to be able to avoid that.

Second, were I designing the algo, I would get the standard results and then run them through two sorts of filter. The first would be a spamming/page-swarm check. This would look for the standard spamming techniques and ruthlessly cull all the sillier methods, and I would do this every time there was an update. The second would be a "similars" checker. Essentially, this would look at a page relative to the other pages Google considers "similar." That filter would look for anomalies and flag pages that seemed odd.

Then, using some of the billion dollars Google made last quarter, I'd hire a bunch of online analysts who could be paid a nominal sum per page checked. Yup, humans. They'd get a page and the search string which generated it, and they would rank relevancy. Each page would go to, say, five checkers, and they would click 1-10 on relevance. That would feed back into the SERPs. Yes, at the outset, that human checking would be expensive; but Google is facing big competition on search and on AdWords. Having 100,000,000 pages human-checked would be a huge improvement and would give Google a competitive advantage. (A rough sketch of how these pieces might fit together follows this post.)
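Here is a loose sketch in Python of the pipeline described above: a spam/page-swarm cull, a "similars" anomaly flag, and a human relevance score folded back into the ranking. Every function name, threshold, and weight is hypothetical, chosen only to illustrate the idea; it is not a claim about how Google actually works.

from statistics import mean

def spam_cull(results, spam_checker):
    # Stage 1: ruthlessly drop pages that trip the standard spam checks.
    return [page for page in results if not spam_checker(page)]

def flag_anomalies(results, similars_score):
    # Stage 2: compare each page to its "similar" pages; low similarity
    # (below an arbitrary 0.5 here) gets flagged as an anomaly.
    return [(page, similars_score(page) < 0.5) for page in results]

def human_adjusted_rank(page_score, checker_votes, weight=0.3):
    # Stage 3: average the checkers' 1-10 relevance clicks and blend that
    # into the algorithmic score; the 0.3 weight is arbitrary.
    human = mean(checker_votes) / 10.0
    return (1 - weight) * page_score + weight * human

# Toy usage with stand-in data:
results = ["pageA", "pageB", "pageC"]
culled = spam_cull(results, spam_checker=lambda p: p == "pageB")
flagged = flag_anomalies(culled, similars_score=lambda p: 0.9 if p == "pageA" else 0.3)
print(flagged)                                      # [('pageA', False), ('pageC', True)]
print(human_adjusted_rank(0.72, [8, 7, 9, 6, 8]))   # 0.72 nudged toward the checkers' average

The only real point of the last step is that five checkers clicking 1-10 becomes a single number the ranking can consume, which is what "feed back into the SERPs" would require.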
Google and Yahoo are already sending some visitors to my site... I hope I get the hang of SEO as soon as possible. Thanks, all, for the useful tips.