Open Source Search Engines

Discussion in 'All Other Search Engines' started by anthonycea, Jun 17, 2004.

  1. #1
    Nutch is an open source code base that serves as a platform for new search engines to build on.

    Check out the following

    http://labs.yahoo.com/demo/nutch/

    http://www.objectssearch.com

    http://www.mozdex.com

    These are all new OPEN Source engines using the www.Nutch.org platform.
     
    anthonycea, Jun 17, 2004 IP
  2. Help Desk

    #2
    Wouldn't an "Open" source search engine be too easy to trick?
     
    Help Desk, Jun 17, 2004 IP
  3. xml

    #3
    Ha-ha yeah, no tricking necessary, you'd know the algo by looking at the source. Let's hope Google goes open source. :D
     
    xml, Jun 17, 2004 IP
  4. anthonycea

    #4
    Yahoo is looking at it for a reason, you will just have to study the issues.

    Again, I think that a company like IBM, with all of the supercomputing and database software technology will win out in the end.

    In the meantime, a lot of players will try a lot of new things.

    It is just a matter of time: search will become a commodity that can be purchased wholesale from IBM, and small companies will then add value and their own spin on that data to create an end-user experience with their own flavor.

    Just think we could have "MasteroftheUniversesearch.com" powered by IBM.

    Now you know a programmer with Shawn's skills could pull it off; I just wonder what flavor he would add to the search function?

    Or a programmer like Shawn or anyone else can download the www.nutch.org open source code for free and start their own engine.

    Who knows who will win the SearchEngineWars.whatever? In the end only those with the most resources win, and IBM has the most resources.
     
    anthonycea, Jun 17, 2004 IP
  5. digitalpoint

    #5
    Open source search engines (or even ones that are licensed with the algorithms) will never truly be competitive for relevant results, precisely because they ARE open source: SEO people can get into the guts of them and see exactly what they consider important for relevancy.

    So they are wide open to SEO. It would be the same as if Google publicly disclosed their ranking algorithms. Instantly you would see a lot more crap at the top of the results because people know exactly what the search engine deems important.
     
    digitalpoint, Jun 17, 2004 IP
  6. xml

    #6
    You think IBM could possibly pull a search engine out of their hat? That's an interesting thought. However, Microsoft have similar-scale resources and may pull it off. Then again, Yahoo! and Google have the power of their brands on their side.
     
    xml, Jun 17, 2004 IP
  7. anthonycea

    #7
    IBM was involved in search when Larry and Sergey were 10 years old; read the thread here at Digital Point on "IBMtheKingofSearch?" and look at the articles linked from there.

    Shawn, below is an interview with the programmer who is one of the creators of Nutch. You guys talk the same talk, two peas in a pod!

    http://blog.outer-court.com/archive/2004_05_28_index.html#108573025728740424
     
    anthonycea, Jun 17, 2004 IP
  8. erwig

    #8
    It's true that it might be easier to manipulate the rankings if you have the source code. This is what nutch.org has to say in their FAQ:

    Won't open source just make it easier for sites to manipulate rankings?
    Search engines work hard to construct ranking algorithms that are immune to manipulation. Search engine optimizers still manage to reverse-engineer the ranking algorithms used by search engines, and improve the ranking of their pages. For example, many sites use link farms to manipulate search engines' link-based ranking algorithms, and search engines retaliate by improving their link-based algorithms to neutralize the effect of link farms.

    With an open-source search engine, this will still happen, just out in the open. This is analogous to encryption and virus protection software. In the long term, making such algorithms open source makes them stronger, as more people can examine the source code to find flaws and suggest improvements. Thus we believe that an open source search engine has the potential to better resist manipulation of its rankings.

    ---- End of Quote ----

    I think the idea is that the ranking algorithm will be adapted so quickly that the SE may be able to win the cat-and-mouse game against SEOs, because so many programmers will be writing the code. The main idea behind a search engine is of course to eventually return the pages that a human would deem most relevant if he/she had read every single page on the net and personally answered the query.

    Please note that the users of the open source code can tweak the importance that the search engine puts on certain aspects and they don't have to make that public. So you're far from knowing what exactly will make your page rank highly. You already know that most SEs look at headlines, titles, keyword density, link popularity, etc. You just don't know the weight of each of these aspects.
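A minimal sketch of the point about weights, assuming a simple linear scoring model. The feature names, the example pages and both weight sets below are hypothetical and are not taken from Nutch; the idea is only that two operators can run the same public code and still rank the same pages differently because their weights stay private.

```python
from typing import Dict, List

# Hypothetical feature names, loosely based on the aspects listed in the post.
FEATURES = ("title_match", "headline_match", "keyword_density", "link_popularity")


def score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of a page's features; missing features count as 0."""
    return sum(weights[name] * features.get(name, 0.0) for name in FEATURES)


def rank(pages: Dict[str, Dict[str, float]], weights: Dict[str, float]) -> List[str]:
    """Return page URLs ordered from highest to lowest score."""
    return sorted(pages, key=lambda url: score(pages[url], weights), reverse=True)


# Two made-up pages: one heavy on on-page keywords, one heavy on inbound links.
pages = {
    "a.example": {"title_match": 1.0, "headline_match": 0.0,
                  "keyword_density": 0.9, "link_popularity": 0.1},
    "b.example": {"title_match": 0.0, "headline_match": 1.0,
                  "keyword_density": 0.2, "link_popularity": 0.8},
}

# Same public code, different private weights (both sets are invented).
operator_one = {"title_match": 0.4, "headline_match": 0.1,
                "keyword_density": 0.4, "link_popularity": 0.1}
operator_two = {"title_match": 0.1, "headline_match": 0.1,
                "keyword_density": 0.1, "link_popularity": 0.7}

print(rank(pages, operator_one))
print(rank(pages, operator_two))
```

With these invented numbers the two operators put the pages in opposite order, so reading the source alone would not tell an optimizer which page wins.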

    Christian
     
    erwig, Jun 17, 2004 IP
  9. Owlcroft

    #9
    As with the "Hilltop algo" thread, clearly the gold standard is an algo that cannot effectively be spammed. That may sound at first blush like an oxymoron, but not necessarily. In any event, the clear point of the OS advocates is that if we don't try, we'll never know. What this or that person or group may not be able to come up with, the entire programming world may.

    Clearly, a sufficiently large panel of intelligent humans could--if not with ideal speed--be an absolutely "unspammable" form of "algorithm"; so the question boils down to "Can we make an 'expert system' sufficiently close to human judgement?"

    Has anyone yet done any work with neural networks? That looks to me like the most promising avenue of approach right now.
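For readers who have not met the idea, here is a rough sketch of the simplest building block of a neural network: a single artificial neuron (a perceptron) trained on human relevance labels. The two features, the four labelled pages and the learning rate are all made up for illustration; nothing here describes what any search engine actually does.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (features, human label: 1 = relevant, 0 = spam)


def activation(weights: List[float], bias: float, features: Sequence[float]) -> float:
    """Weighted sum of the inputs plus a bias term."""
    return bias + sum(w * x for w, x in zip(weights, features))


def predict(weights: List[float], bias: float, features: Sequence[float]) -> int:
    """Fire (1) when the activation is positive, otherwise 0."""
    return 1 if activation(weights, bias, features) > 0 else 0


def train_perceptron(examples: List[Example], epochs: int = 20, lr: float = 0.1):
    """Classic perceptron rule: nudge the weights whenever a prediction is wrong."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            error = label - predict(weights, bias, features)
            bias += lr * error
            weights = [w + lr * error * x for w, x in zip(weights, features)]
    return weights, bias


# Hypothetical training data: (keyword_density, link_popularity) -> human judgement.
labelled_pages: List[Example] = [
    ((0.9, 0.1), 0),  # keyword-stuffed, few links: judged spam
    ((0.8, 0.2), 0),
    ((0.3, 0.7), 1),  # moderate keywords, well linked: judged relevant
    ((0.2, 0.9), 1),
]

weights, bias = train_perceptron(labelled_pages)
print(predict(weights, bias, (0.85, 0.15)))  # an unseen keyword-stuffed page
print(predict(weights, bias, (0.25, 0.80)))  # an unseen well-linked page
```

A real system would need far more features, far more labelled data and multiple layers; the sketch only shows the "learn from human judgements" loop in miniature.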
     
    Owlcroft, Jun 17, 2004 IP
  10. anthonycea

    #10
    Owlcroft, you speak of AI. If computer search really is converging with AI, that is still a long way off, and many of us will not be around when it happens.

    You are right on when you speak of the goals of the SE executives in their search for better search.

    Industry experts have been saying the same things you just mentioned.
     
    anthonycea, Jun 17, 2004 IP
  11. anthonycea

    #11
    Quote by Owlcroft above:

    "Has anyone yet done any work with neural networks?"


    Who are these folks, Owlcroft? Can you tell us what they do and give a link for them?

    Thank you
     
    anthonycea, Jun 17, 2004 IP
  12. mushroom

    #12
    Do you have a mouse? Use it: click on their name (view profile).
     
    mushroom, Jun 17, 2004 IP
  13. anthonycea

    #13
    MUSHROOM, when you find a live link for the company he mentioned, "Neural Networks", let us all know, would you?
     
    anthonycea, Jun 17, 2004 IP
  14. Owlcroft

    #14
    I have assumed--rightly? wrongly?--that everyone here knows what a neural network is and, at least in a broad-brush way, how such things work.

    This is scarcely science-fiction futurism: neural networks are doing useful work right now.

    (I don't believe anyone classes neural networks as "AI", which is a dubious concept. I personally think AI is, in the very long run, an achievable goal, but who am I to argue with Roger Penrose?)

    If none of the leading players in SE are now looking long and hard at neural networks, I am one surprised puppy.
     
    Owlcroft, Jun 17, 2004 IP
  15. anthonycea

    #15
    Wrongly, Owlcroft. Remember the old saying about ass-u-me; I KNOW you are aware of that.
     
    anthonycea, Jun 17, 2004 IP
  16. tphyahoo

    #16
    Where is all the processing power, and the data storage, going to come from?

    What popped into my head is a Kazaa/Skype-type thing that somehow interoperates with Nutch. Users could determine their seeds, algos, and SEO spam filters for the results that their "node" is responsible for.

    The problem is that we don't really have a p2p database, and I don't know whether authority-graph (aka PageRank) calculations can be performed effectively on a distributed network.
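To make the question concrete, here is a rough single-process sketch of PageRank by power iteration in which the pages are split across "nodes" (shards). Each shard computes rank contributions only for the pages it owns, and an exchange step then merges them. The tiny link graph, the shard function and the parameter values are assumptions for illustration; the sketch simulates the data flow and says nothing about whether it would be practical over a real p2p network.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List


def pagerank_sharded(
    links: Dict[str, List[str]],
    shard_of: Callable[[str], Hashable],
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """Power-iteration PageRank where each shard only reads the pages it owns."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    # Assign every page to the shard ("peer") responsible for it.
    shards = defaultdict(list)
    for node in nodes:
        shards[shard_of(node)].append(node)

    for _ in range(iterations):
        # Local step: each shard builds an outbox of contributions to other pages.
        outboxes = []
        for owned in shards.values():
            outbox = defaultdict(float)
            for page in owned:
                targets = links.get(page, [])
                if targets:
                    share = rank[page] / len(targets)
                    for target in targets:
                        outbox[target] += share
                else:
                    # Dangling page: spread its rank evenly over all pages.
                    for target in nodes:
                        outbox[target] += rank[page] / n
            outboxes.append(outbox)

        # Exchange step: in a real network this is where the messages would travel.
        incoming = defaultdict(float)
        for outbox in outboxes:
            for target, value in outbox.items():
                incoming[target] += value

        rank = {node: (1 - damping) / n + damping * incoming[node] for node in nodes}
    return rank


# Hypothetical three-page web: a and b both link to c, and c links back to a.
toy_links = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank_sharded(toy_links, shard_of=lambda url: sum(map(ord, url)) % 2)
print(sorted(ranks, key=ranks.get, reverse=True))
```

The result does not depend on how pages are assigned to shards, but every iteration needs a full exchange of contributions between shards, which is exactly the cost that would have to be tamed on a distributed network.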

    But if they could... it would be cool! SEO spam in a zero-info regime is such a frustrating, stupid (if lucrative) problem to waste brainpower on!

    thomas.

    ps write it in perl! :)
     
    tphyahoo, Oct 27, 2004 IP