Hi, I am very, very new at site creation and have only just uploaded my very first site, so please excuse me if my questions seem a bit naive. How can I tell, from my webstats, which bots have read my site? The obvious ones, like Google or MSN, are named, but what are:
1. BLA
2. ia_archiver-web.archive.org
3. MetaTagRobot
As I said, my site is very, very new, so which other ones should I expect over the coming weeks? And the million dollar question, should anybody know the answer: how long after the bots appear in my stats should I expect to receive visitors from search engines?
1. No idea.
2. From the Wayback Machine at www.archive.org.
3. No idea, but something parsing based on your meta tags, I guess.
Rule #1: don't worry about the important bots visiting. Submit once, get backlinks, don't stress.
Rule #2: don't expect to be able to identify every bot that visits. There are literally thousands and it's just not worth the stress. Between the referral spammers, the spoofers (which pretend to be Googlebot when they're not), the people verifying their backlinks, and the subscription-only search engines, you'll be exhausted just trying to keep up.
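If you ever do want to check whether a "Googlebot" hit is a spoofer, the usual approach is a reverse DNS lookup on the visiting IP followed by a forward lookup on the resulting hostname. Here is a minimal sketch of that idea in Python. The function name `is_real_googlebot` and its two lookup parameters are my own invention for illustration (they let you try it without touching the network), and the default lookups only handle IPv4.

```python
import socket

def is_real_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip reverse-resolves to a Google hostname that resolves back to ip.

    reverse_lookup and forward_lookup are hypothetical hooks for offline testing;
    by default real DNS queries are made (IPv4 only).
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        host = reverse_lookup(ip)
        # A spoofer can fake the user agent but not the reverse DNS of its IP.
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        # The forward lookup guards against someone faking their own reverse DNS.
        return ip in forward_lookup(host)
    except OSError:
        # Covers failed lookups (socket.herror and socket.gaierror).
        return False
```

In keeping with Rule #2, this is only worth running on the odd hit that looks suspicious, not on every line of your logs.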
MetaTagRobot is from this site. I don't know if the crawls are automatic or if they are manually initiated, and know little else about it. Here are some bots that visit one of my sites:
Googlebot
MSNBot
Inktomi Slurp
WISENutbot
LinkWalker
Unknown robot (identified by hit on 'robots.txt')
Unknown robot (identified by 'crawl')
AskJeeves
Walhello appie
Alexa (IA Archiver)
Lycos
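The two "Unknown robot" entries hint at how a stats package labels bots: it matches known names in the user agent first, then falls back to "asked for robots.txt" or "has 'crawl' in its name". A rough sketch of that logic in Python follows. The `classify_hit` function and the short `KNOWN_BOTS` table are made up for illustration and are not taken from any real stats package, which would ship a far longer list.

```python
# Illustrative substrings only; the labels are the ones from the list above.
KNOWN_BOTS = {
    "googlebot": "Googlebot",
    "msnbot": "MSNBot",
    "slurp": "Inktomi Slurp",
    "ia_archiver": "Alexa (IA Archiver)",
}

def classify_hit(user_agent, path):
    """Return a bot label for one log hit, or None if it looks like a normal visitor."""
    ua = user_agent.lower()
    # 1. Known bots, matched by a substring of the user agent.
    for needle, label in KNOWN_BOTS.items():
        if needle in ua:
            return label
    # 2. Ordinary browsers rarely request robots.txt, so treat that as a robot.
    if path == "/robots.txt":
        return "Unknown robot (identified by hit on 'robots.txt')"
    # 3. Last resort: the user agent names itself a crawler.
    if "crawl" in ua:
        return "Unknown robot (identified by 'crawl')"
    return None
```

For example, `classify_hit("ia_archiver-web.archive.org", "/")` comes back as "Alexa (IA Archiver)", while an unrecognised agent that only fetches `/robots.txt` lands in the first "Unknown robot" bucket.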
Definitely. You'll drive yourself nuts worrying about them all and they'll just keep shifting each time you block one variation anyway...
I get cache-xxx-yyyy.proxy.aol.com and nnn-nnn-nnn-nnn.gen.twtelecom.net. Both call by a lot and do very little. I guess that the first is a caching proxy at AOL (maybe it does search crawling too), but the second one stumps me. I think it just reads one thing from the root, either the root directory or the robots.txt file. Any ideas?
OK, now I just need to decide if I care about WebSense. I should have thought to google it, but I always assumed it was some sort of search engine crawler.