Anyone seen this spider?


NewComputer
Jul 7th 2004, 6:55 am
I had a spider visit today called "Larbin". I have never heard of it. Anyone know?

l0cke
Jul 7th 2004, 7:08 am
A quick Google search reveals that Larbin is an open source web crawler (http://larbin.sourceforge.net/index-eng.html). From the project page:
Larbin is a web crawler (also called (web) robot, spider, scooter...). It is intended to fetch a large number of web pages to fill the database of a search engine. With a network fast enough, Larbin should be able to fetch more than 100 million pages on a standard PC.

Larbin is (just) a web crawler, NOT an indexer. You have to write some code yourself in order to save pages or index them in a database.

Larbin was initially developed for the XYLEME project in the VERSO team at INRIA. The goal of Larbin was to go and fetch XML pages on the web to fill the database of an XML-oriented search engine. Thanks to its origins, Larbin is very generalistic (and easy to customize).

NewComputer
Jul 7th 2004, 7:32 am
Hmm, so this could have come from anyone. I will have a look at the IP.

megri
Jul 15th 2004, 11:37 pm
What is the way to stop this spider with robots.txt?

Touchdown
Jul 20th 2004, 11:11 am
Why do you care if that Spider visits or not?

NewComputer
Jul 20th 2004, 1:47 pm
A little thing called bandwidth... among other reasons.

sarahk
Jul 20th 2004, 1:57 pm
I wouldn't rely on robots.txt for anything other than legitimate search engine robots, where you want to control what appears in the search results. A crawler is free to ignore the file.
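
For anyone who still wants to try robots.txt, here is a minimal sketch of what a rule for this bot could look like and how a well-behaved crawler would read it. The rule set, the example.com URL, and the helper name `can_fetch` are assumptions for illustration. The check uses Python's standard `urllib.robotparser`, and it only shows what a compliant crawler would decide, not whether Larbin actually complies.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out any agent whose name contains "larbin",
# leave everything open for all other robots.
ROBOTS_TXT = """\
User-agent: larbin
Disallow: /

User-agent: *
Disallow:
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a compliant crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("larbin_2.6.3 snishant@ipolicynet.com", "Googlebot"):
        print(agent, can_fetch(agent, "http://www.example.com/"))
```

The file only expresses a request. Nothing in it stops a crawler that chooses not to read it.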

My site has some info on Larbin (http://botspotter.net/bs-53.html) too.

You may be better off blocking the bot by name in your .htaccess. This post might help: http://www.mod-rewrite.com/forum/showthread.php?p=48
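
Blocking by bot name comes down to a case-insensitive match on the User-Agent header, followed by a refusal. The sketch below shows that logic in Python rather than in Apache configuration; it is not the rule from the linked post, and the names `BLOCKED_AGENTS` and `status_for` are made up for illustration. In .htaccess the same idea is usually written as a mod_rewrite condition on `%{HTTP_USER_AGENT}`.

```python
import re

# Hypothetical block list: any User-Agent containing "larbin", in any case.
BLOCKED_AGENTS = re.compile(r"larbin", re.IGNORECASE)


def status_for(user_agent: str) -> int:
    """Return 403 (Forbidden) for a blocked agent, 200 otherwise."""
    if BLOCKED_AGENTS.search(user_agent or ""):
        return 403
    return 200


if __name__ == "__main__":
    print(status_for("larbin_2.6.3 snishant@ipolicynet.com"))
    print(status_for("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))
```

Since Larbin is open source, anyone running it can change the agent string, so a name match only catches copies that keep the default.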

Sarah

schlottke
Jul 20th 2004, 6:33 pm
Thanks for the link sarah, adding to my favs.

hulkster
Jul 21st 2004, 8:27 am
FYI FWIW: "Larbin" spidered 81 pages on the www.komar.org website last week, all from the same IP address, 202.9.158.10. An example Apache log entry is shown below. A reverse lookup of this IP address times out for me, but a traceroute seems to indicate it was from Singapore.

alek

202.9.158.10 - - [15/Jul/2004:07:46:27 -0600] "GET / HTTP/1.0" 200 7510 "-" "larbin_2.6.3 snishant@ipolicynet.com"
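
To get a per-IP count like the one above out of an access log, something along these lines may work. It is a rough sketch that assumes the log uses the same layout as the entry shown (Apache's combined format); the pattern and the helper names `parse_line` and `hits_by_ip` are illustrative, not taken from any tool mentioned in this thread.

```python
import re
from collections import Counter

# Assumed layout: ip ident user [time] "request" status size "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

SAMPLE = (
    '202.9.158.10 - - [15/Jul/2004:07:46:27 -0600] "GET / HTTP/1.0" '
    '200 7510 "-" "larbin_2.6.3 snishant@ipolicynet.com"'
)


def parse_line(line):
    """Split one log line into named fields, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def hits_by_ip(lines, agent_substring="larbin"):
    """Count requests per IP for entries whose user agent contains agent_substring."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and agent_substring.lower() in entry["agent"].lower():
            counts[entry["ip"]] += 1
    return counts


if __name__ == "__main__":
    print(parse_line(SAMPLE)["agent"])
    print(hits_by_ip([SAMPLE]))
```

To run it against a real log, pass an open file object to `hits_by_ip` in place of the sample list.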