Checking my stats yesterday I found this: Agent: Mozilla/5.0 (compatible; DBLBot/1.0; +http://www.dontbuylists.com/). Anyone know who they are and what they are scanning for?
On their webpage, they say: "DontBuyLists is a company search engine and list creation tool. The DBLbot is crawling the web in search of company websites. Company websites are cached and are then searchable on our search engine. Because we structure the information found on websites using semantic technology, you can easily find companies, and create lists of companies for free." My suggestion: just ban them.
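If you want to ban them at the server level rather than relying on robots.txt, an Apache .htaccess rule along these lines is one way (a sketch assuming mod_rewrite is enabled; the "DBLBot" pattern is taken from the agent string above, adjust for your own setup):

    RewriteEngine On
    # Return 403 Forbidden to any request whose User-Agent contains "DBLBot"
    RewriteCond %{HTTP_USER_AGENT} DBLBot [NC]
    RewriteRule .* - [F,L]

Other servers have their own equivalents; the point is simply to match on the user agent and refuse the request.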
Hi jonathon, Hi cormack2009,

I am the CEO & Founder of aiHit, the company behind DBL. Very happy to answer your questions.

DBL is indeed a company search engine and list creation tool. We are one of the few search engines that are actually crawling the whole web (I think there are some 50 search engines doing this in the world). Yes, we scan many web pages in each domain. We are looking for companies that have a web presence and then try to figure out what companies do, what products, services, and solutions they offer, etc. We then structure this information (think semantic search). You can easily find companies in our search engine. If you go to our website http://www.dontbuylists.com/ and subscribe to our beta testing program by clicking on the green button, I will give you access to the search engine at the next release, so you can see for yourself what we are up to.

Re blocking DBL: we respect robots.txt. You can find our instructions on how to configure your robots.txt file so we no longer crawl your site here: http://www.dontbuylists.com/faq.htm

Hope the above is useful.

Kind regards, Jens
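For anyone who does want to opt out, a robots.txt entry usually takes a form like this (a sketch only; the exact User-agent token the crawler matches is whatever the FAQ above specifies, "DBLBot" here is just taken from the agent string in the first post):

    User-agent: DBLBot
    Disallow: /

This only helps, of course, if the crawler actually honours robots.txt, which Jens says DBL does.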
Hello jenslapinski, One question: if you find a site with, let's say, 49k pages, do you scan it all? The problem, on the webmaster side, is that this kind of spider takes a lot of bandwidth with no benefit for the webmaster. Personally, I'm not talking about your spider, but in general. In those cases robots.txt does not work, because we would need to know the name of each spider, and that is not possible. In my case, after a bad experience a week ago with a spider unknown to me that consumed 4.5 GB on my site, I developed my own code that doesn't let anybody (except Google) visit more than X pages in 10 minutes on any one of my sites.
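That code isn't shown here, but the general idea looks something like the following Python sketch (the limits, names, and whitelist are illustrative assumptions, not the poster's actual implementation): keep a sliding window of request timestamps per client IP and refuse anything over the limit unless the user agent is on a whitelist.

    import time
    from collections import defaultdict, deque

    MAX_REQUESTS = 60            # pages allowed per window (assumed value; the post only says "x")
    WINDOW = 600                 # 10 minutes, as in the post
    WHITELIST = ("Googlebot",)   # crawlers exempt from the limit, e.g. Google

    _hits = defaultdict(deque)   # client IP -> timestamps of its recent requests

    def allow_request(ip, user_agent):
        """Return True if the request should be served, False if it should be throttled."""
        if any(bot.lower() in user_agent.lower() for bot in WHITELIST):
            return True
        now = time.time()
        q = _hits[ip]
        # Drop timestamps that have fallen outside the 10-minute window.
        while q and now - q[0] > WINDOW:
            q.popleft()
        if len(q) >= MAX_REQUESTS:
            return False         # over the limit: respond with e.g. HTTP 429 or 403
        q.append(now)
        return True

    # Example: one IP hammering the site gets cut off after MAX_REQUESTS pages.
    if __name__ == "__main__":
        for _ in range(65):
            ok = allow_request("203.0.113.7", "SomeBot/1.0")
        print("last request allowed?", ok)   # False once the limit is hit

In a real setup this check would sit in front of the page handler, with the hit counts kept somewhere persistent rather than in process memory.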