Anyone seen this spider?

NewComputer Well-Known Member

#1

I had a spider visit today called "Larbin". I have never heard of it. Anyone know?

NewComputer, Jul 7, 2004 IP

l0cke Active Member

#2

A quick Google search reveals that Larbin is an open source web crawler - http://larbin.sourceforge.net/index-eng.html

Larbin is a web crawler (also called (web) robot, spider, scooter...). It is intended to fetch a large number of web pages to fill the database of a search engine. With a network fast enough, Larbin should be able to fetch more than 100 million pages on a standard PC.

Larbin is (just) a web crawler, NOT an indexer. You have to write some code yourself in order to save pages or index them in a database.

Larbin was initially developed for the XYLEME project in the VERSO team at INRIA. The goal of Larbin was to go and fetch XML pages on the web to fill the database of an XML-oriented search engine. Thanks to its origins, Larbin is very general (and easy to customize).

l0cke, Jul 7, 2004 IP

NewComputer Well-Known Member

#3

Hmm, so this could have come from anyone. I will have a look at the IP.

NewComputer, Jul 7, 2004 IP

megri Active Member

#4

What is the way to stop this spider? Would robots.txt do it?

megri, Jul 15, 2004 IP
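
On the robots.txt question: a minimal sketch of what such a rule could look like, checked with Python's standard `urllib.robotparser`. The user-agent token `larbin` is an assumption taken from how the crawler names itself, and `is_allowed` and `example.com` are hypothetical names used only for illustration. robots.txt is purely advisory, so this only shows what a crawler that honours the file would conclude.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt contents: shut out "larbin", leave everyone else alone.
ROBOTS_TXT = """\
User-agent: larbin
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed("larbin_2.6.3", "http://example.com/"))
    print(is_allowed("Googlebot", "http://example.com/page.html"))
```

The parser matches the `User-agent` token as a case-insensitive substring of the crawler's name, which is why a versioned name such as `larbin_2.6.3` still falls under the `larbin` group.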

Touchdown Peon

#5

Why do you care if that Spider visits or not?

Touchdown, Jul 20, 2004 IP

NewComputer Well-Known Member

#6

A little thing called bandwidth... among other reasons.

NewComputer, Jul 20, 2004 IP

sarahk iTamer Staff

#7

I wouldn't rely on robots.txt for anything other than legitimate search engine robots where you want to control the results shown in the search engine results.

My site has some info on Larbin too.

You may be better off blocking the bot by name in your .htaccess. This post might help: http://www.mod-rewrite.com/forum/showthread.php?p=48

Sarah

sarahk, Jul 20, 2004 IP
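
The linked post is not reproduced in this thread, so the following is not its .htaccess rule. It is only a rough Python sketch of the same idea, refusing requests whose User-Agent contains a blocked name, written as a WSGI wrapper. The names `BLOCKED_AGENT_SUBSTRINGS`, `is_blocked` and `block_bots` are hypothetical, and `larbin` is assumed to be the substring worth matching.

```python
# Assumed list of substrings to refuse; extend as needed.
BLOCKED_AGENT_SUBSTRINGS = ("larbin",)


def is_blocked(user_agent: str) -> bool:
    """Case-insensitive substring match against the blocked names."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_AGENT_SUBSTRINGS)


def block_bots(app):
    """Wrap a WSGI app so that blocked user agents get a 403 response."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Anything else is passed through to the wrapped application.
        return app(environ, start_response)
    return middleware
```

A bot can send any User-Agent it likes, so a check like this only stops crawlers that identify themselves honestly.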

schlottke Peon

#8

Thanks for the link sarah, adding to my favs.

schlottke, Jul 20, 2004 IP

hulkster Peon

#9

FYI, FWIW: "Larbin" spidered 81 pages on the www.komar.org website last week with the same IP address of 202.9.158.10 - an example Apache log entry is shown below. A reverse lookup of this IP address times out for me, but a traceroute seems to indicate it was from Singapore.

alek

202.9.158.10 - - [15/Jul/2004:07:46:27 -0600] "GET / HTTP/1.0" 200 7510 "-" "larbin_2.6.3 snishant@ipolicynet.com"

hulkster, Jul 21, 2004 IP
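
For anyone wanting to repeat the count above on their own logs, here is a minimal sketch that assumes the log uses the same combined format as the entry quoted above. `LOG_PATTERN`, `parse_line` and `hits_by_ip` are hypothetical names, and the regular expression is a simplification that does not handle escaped quotes inside fields.

```python
import re
from collections import Counter

# Simplified pattern for Apache "combined" log lines.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# The log entry quoted in the thread.
SAMPLE = (
    '202.9.158.10 - - [15/Jul/2004:07:46:27 -0600] "GET / HTTP/1.0" '
    '200 7510 "-" "larbin_2.6.3 snishant@ipolicynet.com"'
)


def parse_line(line: str):
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def hits_by_ip(lines, agent_substring: str = "larbin") -> Counter:
    """Count requests per IP whose User-Agent contains agent_substring."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and agent_substring in entry["agent"].lower():
            counts[entry["ip"]] += 1
    return counts


if __name__ == "__main__":
    print(parse_line(SAMPLE))
    print(hits_by_ip([SAMPLE]))
```

To run it over a real file, pass an open file object to `hits_by_ip`, since it only needs an iterable of lines.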