Search engine for forums, news, and blogs

Discussion in 'Websites' started by Nitin M, Sep 7, 2005.

  1. #1
    Looking for critical feedback on the layout (esp. of the results pages) and the overall concept of our new search engine: www.itch.com

    Why build it? Partly because search engines fascinate me and mainly because I feel like there's no good way to search forums and blogs which IMHO are the best source of knowledge sharing on the internet.

    It's still a little rough around the edges and we only have about 10% of the initial content in the index. But, it's enough to start getting some feedback ...

    Thanks in advance for taking a minute to check it out :)
     
    Nitin M, Sep 7, 2005 IP
  2. sharpweb

    sharpweb Guest

    #2
    When I searched for snow I didn't get relevant results back... This was the first result:

     
    sharpweb, Sep 7, 2005 IP
  3. Nitin M

    Nitin M White/Gray/Black Hat

    #3
    Yeah, that's a weird one. For some reason, with synonyms on, snow sucks. But if you turn synonyms off, the snow results are more relevant.
     
    Nitin M, Sep 7, 2005 IP
  4. digitalpoint

    digitalpoint Overlord of no one Staff

    #4
    Make sure you adhere to robots.txt (if you don't already). Nothing pisses me off more than when spiders don't.
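    (For anyone wiring up the same thing: Python's standard library already handles the robots.txt check. A minimal sketch follows; the "ItchBot" user-agent string is just a placeholder, not itch's real one.)

    # Minimal robots.txt check using only the standard library.
    # "ItchBot" is a placeholder user-agent, not the spider's real name.
    from urllib.parse import urlparse
    from urllib.robotparser import RobotFileParser

    USER_AGENT = "ItchBot"
    _robots_cache = {}  # host -> RobotFileParser, fetched once per host

    def allowed(url):
        """Return True if the host's robots.txt permits fetching this URL."""
        host = urlparse(url).netloc
        rp = _robots_cache.get(host)
        if rp is None:
            rp = RobotFileParser()
            rp.set_url("http://%s/robots.txt" % host)
            rp.read()  # download and parse robots.txt
            _robots_cache[host] = rp
        return rp.can_fetch(USER_AGENT, url)

    # Usage: skip any URL the site has disallowed.
    # if allowed("http://forums.example.com/showthread.php?t=123"):
    #     fetch_page(...)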
     
    digitalpoint, Sep 7, 2005 IP
  5. Nitin M

    Nitin M White/Gray/Black Hat

    #5
    Yeah. For the moment we have a note asking webmasters to contact us to opt out or opt in ... but working robots.txt support will be there in the very near future. Feel free to use that form or PM me if you want DP out.
     
    Nitin M, Sep 7, 2005 IP
  6. digitalpoint

    digitalpoint Overlord of no one Staff

    #6
    Nah, I don't want to opt out... but here's another suggestion for you: throttle the spider on a per-site basis, rather than sucking down as much as it can. Forums are typically very resource-intensive on the backend (mostly the database), so if you get a couple of unfriendly bots going at once, it can turn into a problem.
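    (The simplest version of that per-site politeness is just a minimum delay between any two requests to the same host. A rough sketch; the helper names and the 5-second figure are illustrative, not anything itch actually uses.)

    # Enforce a minimum delay between requests to the same host.
    # MIN_DELAY and the function names are made up for illustration.
    import time
    from urllib.parse import urlparse

    MIN_DELAY = 5.0    # seconds between hits on any one host
    _last_hit = {}     # host -> time of the last request to it

    def polite_fetch(url, fetch):
        """Call fetch(url) only after MIN_DELAY has passed for that host."""
        host = urlparse(url).netloc
        wait = MIN_DELAY - (time.time() - _last_hit.get(host, 0.0))
        if wait > 0:
            time.sleep(wait)
        _last_hit[host] = time.time()
        return fetch(url)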
     
    digitalpoint, Sep 7, 2005 IP
  7. Nitin M

    Nitin M White/Gray/Black Hat

    #7
    Thanks. Yeah, making sure the spider uses as few resources as possible was a big priority so we don't piss off forum owners.

    We've put a lot of time into the spider to make it as lightweight as possible. Not sure how closely you looked at the site, but we actually traverse the forum sites and intelligently grab only the updated threads. It's not a standard spider looking for links and following them... it knows what type of forum software the site is running and how to find only the updated threads to suck down.
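    (The core of that "only updated threads" idea is easy to sketch once a forum-specific parser has turned a listing page into (thread_id, last_post_time) pairs; the parsing itself is the vBulletin/phpBB-specific part mentioned below. Everything here is hypothetical, not itch's actual code.)

    # Given one parsed forum listing page, keep only threads with new posts.
    # "listing" and "last_crawled" are hypothetical structures: listing holds
    # (thread_id, last_post_time) pairs, with times as unix timestamps.
    def threads_to_fetch(listing, last_crawled):
        """Return the thread ids that have posts newer than the last visit."""
        updated = []
        for thread_id, last_post_time in listing:
            if last_post_time > last_crawled.get(thread_id, 0):
                updated.append(thread_id)
        return updated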

    It also monitors the activity of each forum over time, so it learns how often it needs to come back looking for new posts, etc.
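    (One way that kind of revisit scheduling can work is to scale the next visit off the rate of new posts seen on the last one. A sketch; the bounds and the target figure are made up, not itch's real numbers.)

    # Pick the next revisit delay from how busy the forum was last time.
    # All constants here are illustrative.
    def next_revisit_seconds(new_posts, elapsed_seconds,
                             min_wait=3600,            # at most hourly
                             max_wait=7 * 24 * 3600,   # at least weekly
                             target_posts_per_visit=50):
        """Aim to come back when roughly target_posts_per_visit new posts exist."""
        if new_posts == 0:
            return max_wait                    # dead quiet: back off hard
        posts_per_second = new_posts / float(elapsed_seconds)
        wait = target_posts_per_visit / posts_per_second
        return max(min_wait, min(max_wait, wait))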

    We also only ever allow a single spidering operation to access a forum at a time, so you will never have more than one concurrent process hitting a site. And for each forum the spidering operation has a built-in sleep that auto-throttles based on the response time of the site. The parsers are written only for vBulletin and phpBB right now, but we'll be adding the other big guys as we go.
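    (A stripped-down sketch of those two safeguards: one crawl per forum at a time, plus a pause tied to how fast the site responds. The 3x multiplier and the helper names are illustrative, not the real implementation.)

    # One crawl per forum at a time, with a pause that grows when the site slows down.
    import threading
    import time

    _forum_locks = {}                # forum_id -> Lock
    _locks_guard = threading.Lock()  # protects the dict above

    def crawl_forum(forum_id, urls, fetch, handle_page):
        with _locks_guard:
            lock = _forum_locks.setdefault(forum_id, threading.Lock())
        with lock:                              # never two spiders on one forum
            for url in urls:
                start = time.time()
                page = fetch(url)
                handle_page(page)
                response_time = time.time() - start
                time.sleep(3 * response_time)   # slow site -> longer pause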
     
    Nitin M, Sep 7, 2005 IP