DontBuyLists.com is scanning each page?

Discussion in 'All Other Search Engines' started by jonathon, Apr 9, 2009.

  1. #1
    Checking my stats yesterday, I found this: Agent: Mozilla/5.0 (compatible; DBLBot/1.0; +http://www.dontbuylists.com/). Does anyone know who they are and what they are scanning for?


     
    jonathon, Apr 9, 2009 IP
  2. cormack2009 (Peon; Messages: 177, Likes Received: 3, Best Answers: 1, Trophy Points: 0)
    #2
    On their webpage, they said:

    DontBuyLists is a company search engine and list creation tool.

    The DBLbot is crawling the web in search of company websites. Company websites are cached and are then searchable on our search engine.

    Because we structure the information found on websites using semantic technology, you can easily find companies, and create lists of companies for free.


    My suggestion: just ban them.
     
    cormack2009, Apr 9, 2009 IP
  3. jenslapinski (Peon; Messages: 1, Likes Received: 0, Best Answers: 0, Trophy Points: 0)
    #3
    Hi jonathon, Hi cormack2009,

    I am the CEO & Founder of aiHit, the company behind DBL, and very happy to answer your questions. DBL is indeed a company search engine and list creation tool. We are one of the few search engines that actually crawl the whole web (I believe there are around 50 search engines in the world doing this). Yes, we scan many web pages in each domain. We are looking for companies that have a web presence, and we then try to figure out what those companies do, what products, services, and solutions they offer, and so on. We then structure this information (think semantic search), so you can easily find companies in our search engine.

    If you go to our website http://www.dontbuylists.com/ and subscribe to our beta testing program by clicking on the green button, then I will give you access to the search engine at the next release, so you can see for yourself what we are up to.

    Re blocking DBL: we respect robots.txt. You can find our instructions on how to configure your robots.txt file so we no longer crawl your site here: http://www.dontbuylists.com/faq.htm
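
    For reference, a rule along these lines in your robots.txt should do it (the FAQ has the exact user-agent token; the sketch below assumes the crawler matches on the "DBLBot" token from its user-agent string):

        # Assumes the bot matches on the "DBLBot" token in its user-agent
        User-agent: DBLBot
        Disallow: /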

    Hope the above is useful.

    Kind regards,
    Jens
     
    jenslapinski, Apr 10, 2009 IP
  4. cormack2009 (Peon; Messages: 177, Likes Received: 3, Best Answers: 1, Trophy Points: 0)
    #4
    Hello jenslapinski,

    One question:
    If you find a site with, let's say, 49k pages, do you scan all of it?

    The problem, on the webmaster's side, is that these kinds of spiders eat a lot of bandwidth with no benefit for the webmaster. I'm not talking about your spider specifically, just spiders in general.
    In those cases robots.txt does not work, because we would need to know the name of each spider, and that is not possible.
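
    (About the best robots.txt alone can do is a default-deny whitelist like the sketch below; the rules are just an illustration, and a badly behaved spider ignores the file anyway.)

        # Illustrative default-deny: only bots listed by name may crawl.
        # A misbehaving spider simply ignores robots.txt altogether.
        User-agent: Googlebot
        Disallow:

        User-agent: *
        Disallow: /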

    In my case, after a bad experience a week ago with a spider unknown to me that ate 4.5 GB of bandwidth on my site, I developed my own code that doesn't let anybody (except Google) visit more than x pages in 10 minutes on one of my sites. The rough idea is sketched below.
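
    A minimal sketch of that kind of throttle (the names, page budget, and window here are my own stand-ins, and a real setup should verify Googlebot by reverse DNS instead of trusting the user-agent string):

        import time
        from collections import defaultdict, deque

        WINDOW_SECONDS = 600        # 10-minute window (assumed)
        MAX_PAGES_PER_WINDOW = 100  # page budget per client (assumed)

        # Timestamps of recent requests, keyed by client IP
        _recent_hits = defaultdict(deque)

        def allow_request(ip, user_agent):
            """Return True to serve the request, False to throttle it."""
            # A user-agent check alone is spoofable; verify Googlebot
            # with a reverse-DNS lookup in production.
            if "Googlebot" in user_agent:
                return True

            now = time.time()
            hits = _recent_hits[ip]

            # Drop timestamps that have aged out of the window
            while hits and now - hits[0] > WINDOW_SECONDS:
                hits.popleft()

            if len(hits) >= MAX_PAGES_PER_WINDOW:
                return False  # over budget: answer with HTTP 429 or 503

            hits.append(now)
            return True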
     
    cormack2009, Apr 11, 2009 IP
  5. upa playa (Peon; Messages: 30, Likes Received: 0, Best Answers: 0, Trophy Points: 0)
    #5
    I'll have to look into this.
     
    upa playa, Apr 11, 2009 IP
  6. articleterritory (Peon; Messages: 543, Likes Received: 6, Best Answers: 0, Trophy Points: 0)
    #6
    What makes this search engine different from, let's say, Google?
     
    articleterritory, Apr 11, 2009 IP