What do you think about designing a search engine?

Discussion in 'All Other Search Engines' started by fesite, May 26, 2006.

  1. wheel

    wheel Peon

    Messages:
    477
    Likes Received:
    19
    Best Answers:
    0
    Trophy Points:
    0
    #41
    I've got a couple of search engines, one I just took over and one I'm about to launch that is niche. Here's my take:
    - If you're planning on a real search engine with millions or tens of millions of documents, you'll need Nutch/Lucene. Starting from scratch isn't really a good idea unless you're heavily backed. Anything other than Nutch likely won't do the job with that volume of documents. PHP/MySQL/ASP solutions really aren't up to the task of something that big. I've looked at all the solutions I could find and Nutch was the only viable one.
    - Getting PPC software is tough. Real tough. Here's what I'm using: http://www.smarterscripts.com . Don't be fooled by the ugly site; the software works reasonably well and, again, is the best of what I could find. Everything else is either hyper-expensive or isn't up to the job of letting advertisers create their own accounts, deposit funds, etc. (I mention the ugly site because I actually got told to go back twice; my initial thought was that the product was Mickey Mouse because of the site. Turns out it works fine and is dirt cheap, <$100.) The only problem with that script is that it pulls ads based on exact match, not broad match. I haven't worked through that yet.
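
    To make the exact-match versus broad-match difference concrete, here is a minimal sketch. It is not how the smarterscripts product works internally (I don't know that); the ad records, the function names, and the simplified definition of "broad match" (every word of the advertiser's keyword appears somewhere in the query, in any order) are all assumptions for illustration.

```python
def normalize(text):
    """Lowercase and collapse whitespace so comparisons are not case-sensitive."""
    return " ".join(text.lower().split())


def exact_match(query, ads):
    """Return ads whose keyword is identical to the whole query."""
    q = normalize(query)
    return [ad for ad in ads if normalize(ad["keyword"]) == q]


def broad_match(query, ads):
    """Return ads whose keyword words all occur in the query (assumed definition)."""
    query_words = set(normalize(query).split())
    return [
        ad for ad in ads
        if set(normalize(ad["keyword"]).split()) <= query_words
    ]


# Hypothetical ad inventory, purely for illustration.
ADS = [
    {"keyword": "car insurance", "title": "Ad A"},
    {"keyword": "insurance", "title": "Ad B"},
    {"keyword": "boat insurance", "title": "Ad C"},
]

if __name__ == "__main__":
    query = "cheap car insurance"
    print([ad["title"] for ad in exact_match(query, ADS)])
    print([ad["title"] for ad in broad_match(query, ADS)])
```

    With exact matching, a query like "cheap car insurance" pulls no ads at all unless an advertiser bid on that exact phrase, which is why inventory looks thin on long queries.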

    Whitelisting sites is hard work. One of my sites has a few hundred thousand or more whitelisted sites that I've mostly reviewed. My eyes are still burning. I had to do it because there wasn't any way to sort the sites I was targeting algorithmically. If you can do it algorithmically, all the better.

    If you're going to seed a search engine using only whitelisted sites, then what you really have to do is build a whitelist of whitelisting sites. That is, rather than reviewing individual sites, tag some sites as ones that link only to your niche: you crawl your whitelist plus the sites they link to, but no further. That way you can whitelist directories in your niche and get a lot of sites to seed the engine.
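
    A rough sketch of that "whitelist plus one hop" seeding, under some assumptions: `fetch_links` is a hypothetical callback you would supply (it takes a URL and returns the absolute URLs found on that page), and sites are identified by hostname only. This is not Nutch's API, just the idea in plain Python.

```python
from urllib.parse import urlparse


def host(url):
    """Return the lowercase hostname part of a URL, or '' if there is none."""
    return urlparse(url).netloc.lower()


def build_seed_list(hub_urls, fetch_links):
    """Collect the hubs plus every site they link to directly, and stop there.

    hub_urls    -- hand-reviewed pages (e.g. niche directories) that are trusted
    fetch_links -- hypothetical callback: url -> iterable of absolute URLs
    """
    seeds = []
    seen = set()
    for hub in hub_urls:
        # The hub itself, then its outbound links; linked pages are never expanded.
        for url in [hub, *fetch_links(hub)]:
            h = host(url)
            if h and h not in seen:
                seen.add(h)
                seeds.append(h)
    return seeds


if __name__ == "__main__":
    # Fake link graph standing in for real page fetches.
    graph = {
        "http://dir.example/": ["http://a.example/page", "http://b.example/"],
        "http://a.example/page": ["http://far.example/"],
    }
    print(build_seed_list(["http://dir.example/"], lambda u: graph.get(u, [])))
```

    Because the linked pages are never expanded, a site two hops away from a hub (far.example in the fake graph) stays out of the seed list.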
     
    wheel, Jun 5, 2006 IP
  2. wheel

    wheel Peon

    Messages:
    477
    Likes Received:
    19
    Best Answers:
    0
    Trophy Points:
    0
    #42
    Oh, and if you want a real handle on what's involved, here's the book you're going to want to read:
    "Mining the web" by Soumen Chakrabarti
     
    wheel, Jun 5, 2006 IP
  3. webmasterlabor.com

    webmasterlabor.com Peon

    Messages:
    2,889
    Likes Received:
    76
    Best Answers:
    0
    Trophy Points:
    0
    #43
    Thanks for the lead re the PPC script.
     
    webmasterlabor.com, Jun 5, 2006 IP
  4. webviz

    webviz Peon

    Messages:
    216
    Likes Received:
    4
    Best Answers:
    0
    Trophy Points:
    0
    #44
    It's extremely technical. You would have to be rich to start one.
     
    webviz, Jun 5, 2006 IP
  5. webmasterlabor.com

    webmasterlabor.com Peon

    Messages:
    2,889
    Likes Received:
    76
    Best Answers:
    0
    Trophy Points:
    0
    #45
    Or start small and ramp up.
     
    webmasterlabor.com, Jun 5, 2006 IP
  6. Emperor

    Emperor Guest

    Messages:
    4,821
    Likes Received:
    180
    Best Answers:
    0
    Trophy Points:
    0
    #46
    What skills do you have? There are many free tools available but you need some basic knowledge.
     
    Emperor, Jun 6, 2006 IP
  8. mahmood

    mahmood Guest

    Messages:
    1,228
    Likes Received:
    43
    Best Answers:
    0
    Trophy Points:
    0
    #48
    Why would somebody want to make a search engine in the first place? Nobody would use it unless you find a magic technique that gives extremely relevant results.
    Actually, there is no magic left. What else can somebody come up with? Other search engines have already considered age, backlinks, keyword density, relevancy of backlinks, duplication...
     
    mahmood, Jun 6, 2006 IP
  9. webmasterlabor.com

    webmasterlabor.com Peon

    Messages:
    2,889
    Likes Received:
    76
    Best Answers:
    0
    Trophy Points:
    0
    #49
    Two Words: CONTENT SPECIALIZATION
     
    webmasterlabor.com, Jun 6, 2006 IP
  10. mahmood

    mahmood Guest

    Messages:
    1,228
    Likes Received:
    43
    Best Answers:
    0
    Trophy Points:
    0
    #50
    You mean it's coursework or something?
     
    mahmood, Jun 7, 2006 IP
  11. wheel

    wheel Peon

    Messages:
    477
    Likes Received:
    19
    Best Answers:
    0
    Trophy Points:
    0
    #51
    He means that the sites you search need to be specific to something. If you want the 'everything' search, everyone uses Google. But say you wanted to search *only* the Library of Congress. If you had a search engine that only had information from the Library of Congress, you'd be better than Google for those searches, right?

    The province of Quebec, Canada is heavily French-speaking. While the rest of Canada uses Google, in Quebec I think it's third, because the French speakers there want a French search engine, and that's what they use, which Google ain't.

    I generally say you're best to get a niche. Content specialization is an example of that - find your niche by indexing specialized content.
     
    wheel, Jun 8, 2006 IP
  12. webmasterlabor.com

    webmasterlabor.com Peon

    Messages:
    2,889
    Likes Received:
    76
    Best Answers:
    0
    Trophy Points:
    0
    #52
    There are many publicly available pieces of information that people will pay money to get or visit a site frequently for. Find a high-value market, identify its information needs, find the data, and monetize your data collection.

    Example: search engine for service providers, search engine for niche content like comics, etc.
     
    webmasterlabor.com, Jun 8, 2006 IP
  13. nvidura

    nvidura Well-Known Member

    Messages:
    1,780
    Likes Received:
    14
    Best Answers:
    0
    Trophy Points:
    150
    #53
    How's the progress?
     
    nvidura, Jun 9, 2006 IP
  14. blue_angel

    blue_angel Well-Known Member

    Messages:
    1,174
    Likes Received:
    8
    Best Answers:
    0
    Trophy Points:
    130
    #54
    You can buy a ready-made script. Search the forum and you will find similar posts.
     
    blue_angel, Jun 17, 2006 IP
  15. webmasterlabor.com

    webmasterlabor.com Peon

    Messages:
    2,889
    Likes Received:
    76
    Best Answers:
    0
    Trophy Points:
    0
    #55
    The value lies not in the script but in the CONTENT
     
    webmasterlabor.com, Jun 17, 2006 IP
  16. old_expat

    old_expat Peon

    Messages:
    188
    Likes Received:
    9
    Best Answers:
    0
    Trophy Points:
    0
    #56
    Are you using one of the open source packages?
     
    old_expat, Jun 19, 2006 IP
  17. old_expat

    old_expat Peon

    Messages:
    188
    Likes Received:
    9
    Best Answers:
    0
    Trophy Points:
    0
    #57
    It would be a massive undertaking with no guarantee of recovering your funds,
    but after some recent searches on Google, which showed me directories, which pointed me to Made For AdSense sites, I would welcome such an engine.

    You might think of starting in a niche and expanding if it clicks.
     
    old_expat, Jun 19, 2006 IP
  18. Linknz

    Linknz Peon

    Messages:
    20
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #58
    Well, I have designed one; it took three years, working around twelve hours a day.
    If you wanted a meta search engine, it would be easy to just use a script. The development of the web spider alone took nine months.
     
    Linknz, Jun 19, 2006 IP
  19. Linknz

    Linknz Peon

    Messages:
    20
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #59
    I have just started to index the forum, and so far the web spider has found and indexed the following files:
    http://forums.digitalpoint.com/ (34 pages)

    -
    -faq.php
    -faq.php?faq=dp_faq
    -faq.php?faq=revenue_sharing
    -forumdisplay.php?f=11
    -forumdisplay.php?f=25
    -forumdisplay.php?f=26
    -forumdisplay.php?f=27
    -forumdisplay.php?f=35
    -forumdisplay.php?f=4
    -forumdisplay.php?f=43
    -forumdisplay.php?f=46
    -forumdisplay.php?f=47
    -forumdisplay.php?f=5
    -forumdisplay.php?f=6
    -forumdisplay.php?f=62
    -forumdisplay.php?f=65
    -forumdisplay.php?f=66
    -forumdisplay.php?f=68
    -forumdisplay.php?f=69
    -forumdisplay.php?f=7
    -forumdisplay.php?f=71
    -forumdisplay.php?f=72
    -forumdisplay.php?f=8
    -forumdisplay.php?f=82
    -forumdisplay.php?f=84
    -member.php?u=11373
    -member.php?u=17339
    -member.php?u=21655
    -member.php?u=23595
    -member.php?u=29255
    -member.php?u=7036
    -member.php?u=7399
    -register.php
    -search.php
    -search.php?searchid=1457568
     
    Linknz, Jun 19, 2006 IP
  20. wasted soul

    wasted soul Banned

    Messages:
    212
    Likes Received:
    2
    Best Answers:
    0
    Trophy Points:
    0
    #60
    It will need a lot of work. Anyway, good luck to you.
     
    wasted soul, Jun 22, 2006 IP