Hello all,

My site is a phpBB forum with some articles as well. It has been up for about two weeks now and works fine for our users, but I'm wondering if something is set up wrong for search engines to index it. Do they simply take longer than this to visit a site? One of the reasons I'm worried is that the bandwidth used by all the spiders is so low. Anyway, here are my robot/spider stats for March from AWStats:

Unknown robot (identified by hit on 'robots.txt'): 0+14 hits, 18.35 KB
Inktomi Slurp: 8+4 hits, 39.95 KB

I know Inktomi Slurp is Yahoo, but why are all the rest unknown? Could something be wrong with my robots.txt? I copied much of it from some forum; I've attached it if someone wouldn't mind giving me some pointers. I'm obviously pretty new at this, so any help is really appreciated. Thanks a ton.
Yes, there is something wrong with your robots.txt file:

Code:
# These robots either waste resources, harvest emails, or
# do some other "bad" thing, but at least they obey the
# robots.txt file. They're not allowed here.
User-agent: almaden
User-agent: ASPSeek
User-agent: baiduspider
User-agent: dumbBot
User-agent: Generic
User-agent: grub-client
User-agent: MSIECrawler
User-agent: NexaBot
User-agent: NPBot
User-agent: OWR_Crawler
User-agent: psbot
User-agent: rabaz
User-agent: RPT-HTTPClient
User-agent: ScoutAbout
User-agent: semanticdiscovery
User-agent: TurnitinBot
User-agent: Wget
Disallow: /

# All other robots will be allowed to spider the domain
# but are requested not to spider the images, and
# document directories
User-agent: *
Disallow: /images/
#
# Disallow the following directories to optimize page rank.
#
Disallow: /home/admin/
Disallow: /home/db/
Disallow: /home/images/
Disallow: /home/includes/
Disallow: /home/language/
Disallow: /home/templates/
Disallow: /home/common.php
Disallow: /home/config.php
Disallow: /home/faq.php
Disallow: /home/groupcp.php
Disallow: /home/login.php
Disallow: /home/modcp.php
Disallow: /home/posting.php
Disallow: /home/privmsg.php
Disallow: /home/profile.php
Disallow: /home/search.php
Disallow: /home/viewonline.php

The first part is syntactically incorrect, and you've excluded most of your forum pages by locking out the /templates/ folder. Change it to:

Code:
User-agent: *
Disallow: /images/
Disallow: /home/admin/
Disallow: /home/db/
Disallow: /home/images/
Disallow: /home/includes/
Disallow: /home/language/
Disallow: /home/common.php
Disallow: /home/config.php
Disallow: /home/faq.php
Disallow: /home/groupcp.php
Disallow: /home/login.php
Disallow: /home/modcp.php
Disallow: /home/posting.php
Disallow: /home/privmsg.php
Disallow: /home/profile.php
Disallow: /home/search.php
Disallow: /home/viewonline.php
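For background: in robots.txt, a record is one or more User-agent lines followed by that record's own Disallow lines, and records are separated from each other by a blank line. A crawler obeys only the record that matches its name and falls back to the User-agent: * record if nothing else matches. Here's a minimal sketch of the two-record layout (the bot name is just a placeholder, not a real crawler):

Code:
# Record 1: one specific bot, banned from everything.
User-agent: ExampleBadBot
Disallow: /

# The blank line above ends record 1. Record 2 applies to
# every crawler that didn't match an earlier record.
User-agent: *
Disallow: /images/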
Alright, I've changed that part of the file, and I'll let you all know if it helps. So the first part: is it just a complete mess? Is it even useful? I'm trying to learn this on my own as well, but I figured a list of bad robots would be a fine thing to include. Actually, were you saying to change the whole file to just your code, or only the second part? I think for now I'll change the whole file, and maybe later I'll read up on those bad bots and what they do and deal with them myself... In any event, thanks for the tip so far.
It's not really useful, and a couple of the bots it tries to block are legitimate, albeit lesser-known, spiders. Just get rid of it entirely.
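If you do catch a specific bot misbehaving in your logs later, give it its own record instead of restoring that whole list. A sketch, with a made-up bot name standing in for whatever actually shows up in your logs:

Code:
# Hypothetical offender spotted in the access logs
User-agent: SomeBadBot
Disallow: /

# Blank line above, then the normal rules for everyone else
User-agent: *
Disallow: /images/

Keep in mind that only polite crawlers honor this; truly abusive bots ignore robots.txt entirely, so there's little point letting the list grow long again.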