Recognizing Search Engine Bots

Discussion in 'Traffic Analysis' started by ymgem, Dec 18, 2005.

  1. ymgem
    Hi,

    I am very, very new at site creation and have only just uploaded my very first brand new site, so please excuse me if my questions seem a bit naive.

    How can I know, from my webstats, which bot has read my site?

    The obvious ones, like Google or MSN, are labeled, but what are these:

    1. BLA
    2. ia_archiver-web.archive.org
    3. MetaTagRobot

    As I said, my site is very, very new, so what other ones should I expect over the coming weeks?

    And the million-dollar question, if anybody knows the answer: how long after the bots appear in my stats should I expect to receive visitors from search engines?
     
    ymgem, Dec 18, 2005
  2. sarahk

    1. BLA: no idea.
    2. ia_archiver-web.archive.org: that's from the Wayback Machine at www.archive.org.
    3. MetaTagRobot: no idea, but I guess it's something that parses your meta tags.

    Rule #1: don't worry about the important bots visiting. Submit once, get backlinks, and don't stress.

    Rule #2: don't expect to be able to identify every bot that visits. There are literally thousands, and it's just not worth the stress. Between the referral spammers, the spoofers (which pretend to be Googlebot when they're not), the people verifying their backlinks, and the subscription-only search engines, you'll be exhausted just trying to keep up.
     
    sarahk, Dec 18, 2005
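On the spoofers point: a User-Agent string proves nothing, since anyone can send "Googlebot". The usual countermeasure for the big engines is a forward-confirmed reverse DNS check: look up the hostname for the visiting IP, check that it sits under the crawler's own domain, then resolve that hostname and confirm it maps back to the same IP. Below is a minimal sketch of that idea, assuming Python's standard `socket` module and IPv4 addresses. The helper name `is_verified_crawler` is hypothetical, the Googlebot domain suffixes are an assumption to check against Google's current documentation, and the resolver functions are parameters only so the check can be tested without network access.

```python
import socket
from typing import Callable, Iterable

# Assumed suffixes for Googlebot hostnames; verify against Google's own docs,
# and use each search engine's documented domains for its crawler.
GOOGLEBOT_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_crawler(
    ip: str,
    allowed_suffixes: Iterable[str] = GOOGLEBOT_SUFFIXES,
    reverse_lookup: Callable[[str], tuple] = socket.gethostbyaddr,
    forward_lookup: Callable[[str], tuple] = socket.gethostbyname_ex,
) -> bool:
    """Hypothetical helper: forward-confirmed reverse DNS check (IPv4 only)."""
    try:
        # Step 1: reverse lookup, IP -> (hostname, aliases, addresses).
        hostname, _aliases, _addrs = reverse_lookup(ip)
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False

    # Step 2: the hostname must sit under one of the crawler's domains.
    if not hostname.lower().rstrip(".").endswith(tuple(allowed_suffixes)):
        return False

    try:
        # Step 3: forward lookup, hostname -> IPv4 addresses.
        _name, _aliases, forward_ips = forward_lookup(hostname)
    except OSError:
        return False

    # Step 4: the name must resolve back to the IP that visited.
    return ip in forward_ips
```

With the default resolvers each call makes real DNS queries, so it is better suited to checking a handful of suspicious IPs from your logs than to running on every request.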
  3. ahearn

    MetaTagRobot is from this site. I don't know if the crawls are automatic or manually initiated, and I know little else about it.

    Here are some bots that visit one of my sites:
    Googlebot
    MSNBot
    Inktomi Slurp
    WISENutbot
    LinkWalker
    Unknown robot (identified by hit on 'robots.txt')
    Unknown robot (identified by 'crawl')
    AskJeeves
    Walhello appie
    Alexa (IA Archiver)
    Lycos
     
    ahearn, Dec 19, 2005
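Ahearn's list mixes bots recognized by name with two catch-all buckets, labels of the kind log analyzers such as AWStats produce. Roughly, the approach is: match the User-Agent against a table of known bot names, then treat any visitor that fetched robots.txt, or whose agent string contains "crawl", as an unnamed robot. Here is a minimal Python sketch of that idea, assuming a web server log in the common "combined" format. The `classify` and `label_for` helpers and the short `KNOWN_BOTS` table are hypothetical and far from complete; real analyzers use much longer lists and extra rules.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Assumes "combined" log format:
# host ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] "\S+ (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Illustrative, deliberately incomplete table: lowercase UA substring -> label.
KNOWN_BOTS = {
    "googlebot": "Googlebot",
    "msnbot": "MSNBot",
    "slurp": "Inktomi Slurp",
    "ia_archiver": "Alexa (IA Archiver)",
    "ask jeeves": "AskJeeves",
}


def label_for(agent: str, fetched_robots: bool) -> Optional[str]:
    """Return a robot label for one hit, or None if it looks like a person."""
    ua = agent.lower()
    for needle, name in KNOWN_BOTS.items():
        if needle in ua:
            return name
    if fetched_robots:
        return "Unknown robot (identified by hit on 'robots.txt')"
    if "crawl" in ua:
        return "Unknown robot (identified by 'crawl')"
    return None


def classify(lines: Iterable[str]) -> Counter:
    """Count hits per robot label across a log; non-robot hits are not counted."""
    hits = []
    robots_hosts = set()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that don't fit the assumed format
        hits.append(m)
        if m["path"] == "/robots.txt":
            robots_hosts.add(m["host"])

    # Second pass: a host that fetched robots.txt is flagged for all its hits.
    counts = Counter()
    for m in hits:
        label = label_for(m["agent"], m["host"] in robots_hosts)
        if label:
            counts[label] += 1
    return counts
```

The two-pass design mirrors the "identified by hit on robots.txt" idea: one request for robots.txt marks every other request from that host as robotic, not just the robots.txt hit itself.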
  4. joaquin

    Yes, use the meta bot thing. I think it's pretty easy to spot minor bots, though.
     
    joaquin, Dec 21, 2005
  5. minstrel

    Definitely. You'll drive yourself nuts worrying about them all and they'll just keep shifting each time you block one variation anyway...
     
    minstrel, Dec 21, 2005
  6. MattBeard

    I get:

    cache-xxx-yyyy.proxy.aol.com

    and:

    nnn-nnn-nnn-nnn.gen.twtelecom.net

    Both call by a lot and do very little. I guess the first is a caching proxy at AOL (maybe it does search crawling too), but the second one stumps me. I think it just reads one thing from the root: either the root directory or the robots.txt file.

    Any ideas?
     
    MattBeard, Dec 31, 2005
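For hosts like these, it helps to look at what they actually request before deciding whether to care. Here is a minimal Python sketch, assuming the server logs hostnames (reverse DNS lookups enabled) in the common/combined log format, as the hostnames quoted above suggest. The `root_only_hosts` helper and its `min_hits` threshold are hypothetical. It groups requests by host and returns the hosts that call repeatedly but only ever fetch the root page or robots.txt.

```python
import re
from collections import defaultdict

# Assumes common/combined log format with a hostname (not an IP) in field 1.
REQ_RE = re.compile(r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] "\S+ (?P<path>\S+)')

# Paths that count as "just reading one thing from the root".
ROOT_ONLY = {"/", "/robots.txt"}


def root_only_hosts(lines, min_hits=2):
    """Return {host: hit_count} for hosts that only ever requested / or /robots.txt."""
    paths = defaultdict(set)
    hits = defaultdict(int)
    for line in lines:
        m = REQ_RE.match(line)
        if not m:
            continue  # skip lines that don't fit the assumed format
        paths[m["host"]].add(m["path"])
        hits[m["host"]] += 1
    return {
        host: n
        for host, n in hits.items()
        if n >= min_hits and paths[host] <= ROOT_ONLY  # subset check
    }
```

Showing up in this list doesn't make a host a search engine: caching proxies and content-filtering services can produce exactly the same pattern.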
  7. sarahk

    sarahk, Dec 31, 2005
  8. MattBeard

    OK, now I just need to decide if I care about WebSense.

    I should have thought to google it, but I always assumed it was some sort of search engine crawler.
     
    MattBeard, Dec 31, 2005