Search Engines Missing Site Completely?

Discussion in 'Google' started by perdog05, Mar 17, 2005.

  1. #1
    Hello all,

    My site is a phpBB forum with some articles as well. It has been up for about 2 weeks now and works fine for our users, but I'm wondering if something is set up wrong for search engines to index it. Do they simply take longer than this to visit the site? One of the reasons I'm worried is that the bandwidth is so low for all the spiders. Anyway, here are my robot/spider stats for March from AWStats.

    Unknown robot (identified by hit on 'robots.txt') 0+14 18.35 KB
    Inktomi Slurp 8+4 39.95 KB

    I know the one is Yahoo, but why are all the rest unknown? Could something be wrong with my robots.txt? I copied much of it from some forum; I've attached it, if someone wouldn't mind giving me some pointers.

    I'm obviously pretty new at this, so any help is really appreciated. Thanks a ton.
     

    perdog05, Mar 17, 2005 IP
  2. minstrel

    minstrel Illustrious Member

    Messages:
    15,082
    Likes Received:
    1,243
    Best Answers:
    0
    Trophy Points:
    480
    #2
    Yes, there is something wrong with your robots.txt file:

    # These robots either waste resources, harvest emails, or
    # do some other "bad" thing, but at least they obey the
    # robots.txt file.  They're not allowed here.
    
    User-agent: almaden
    User-agent: ASPSeek
    User-agent: baiduspider
    User-agent: dumbBot
    User-agent: Generic
    User-agent: grub-client
    User-agent: MSIECrawler
    User-agent: NexaBot
    User-agent: NPBot
    User-agent: OWR_Crawler
    User-agent: psbot
    User-agent: rabaz
    User-agent: RPT-HTTPClient
    User-agent: ScoutAbout
    User-agent: semanticdiscovery
    User-agent: TurnitinBot
    User-agent: Wget
    Disallow: /
    
    # All other robots will be allowed to spider the domain
    # but are requested not to spider the images, and
    # document directories
    
    User-agent: *
    Disallow: /images/
    
    #
    # Disallow the following directories to optimize page rank.
    #
    
    Disallow: /home/admin/
    Disallow: /home/db/
    Disallow: /home/images/
    Disallow: /home/includes/
    Disallow: /home/language/
    Disallow: /home/templates/
    Disallow: /home/common.php
    Disallow: /home/config.php
    Disallow: /home/faq.php
    Disallow: /home/groupcp.php
    Disallow: /home/login.php
    Disallow: /home/modcp.php
    Disallow: /home/posting.php
    Disallow: /home/privmsg.php
    Disallow: /home/profile.php
    Disallow: /home/search.php
    Disallow: /home/viewonline.php
    Code (markup):
    The first part is syntactically incorrect, and you've excluded most of your forum pages by locking out the /templates/ folder. Change it to:

    User-agent: *
    Disallow: /images/
    Disallow: /home/admin/
    Disallow: /home/db/
    Disallow: /home/images/
    Disallow: /home/includes/
    Disallow: /home/language/
    Disallow: /home/common.php
    Disallow: /home/config.php
    Disallow: /home/faq.php
    Disallow: /home/groupcp.php
    Disallow: /home/login.php
    Disallow: /home/modcp.php
    Disallow: /home/posting.php
    Disallow: /home/privmsg.php
    Disallow: /home/profile.php
    Disallow: /home/search.php
    Disallow: /home/viewonline.php
    Code (markup):
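One way to see how a parser might read the two files is to feed both to Python's standard-library `urllib.robotparser`. This is only a sketch: the two strings are abbreviated copies of the files above, the `Googlebot` agent name and the `/home/viewtopic.php` path are illustrative assumptions rather than anything taken from the site, and real crawlers may handle blank lines differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated copy of the original file: a blank line and a comment block
# separate "User-agent: *" from the /home/ rules.
ORIGINAL = """\
User-agent: Wget
Disallow: /

User-agent: *
Disallow: /images/

#
# Disallow the following directories to optimize page rank.
#

Disallow: /home/admin/
Disallow: /home/templates/
"""

# Abbreviated copy of the suggested replacement: one record, no blank lines.
REVISED = """\
User-agent: *
Disallow: /images/
Disallow: /home/admin/
"""


def allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if this parser would let `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # "Googlebot" and "/home/viewtopic.php" are hypothetical test inputs.
    for name, text in (("original", ORIGINAL), ("revised", REVISED)):
        for path in ("/home/admin/", "/home/templates/", "/home/viewtopic.php"):
            print(name, path, allowed(text, "Googlebot", path))
```

With this particular parser, a blank line ends a record, so the `/home/` rules that follow the comment block in the original excerpt are not attached to any `User-agent` and are ignored. In the revised excerpt they belong to the `User-agent: *` record and take effect.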
     
    minstrel, Mar 17, 2005 IP
  3. perdog05

    perdog05 Peon

    Messages:
    28
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #3
    All right, I've changed that part of the code and I'll let you all know if it helps.

    So the first part, is it just a complete mess? Is it even useful? I'm trying to learn this on my own as well, but I figured a nice list of some bad robots would be a fine thing to include.

    Actually, were you saying to change the whole file to just your code, or just change the second part?

    I think for now I'll change the whole file, and maybe I should see the bad bots and what they do and then remove them myself...

    In any event, thanks for that tip so far...
     
    perdog05, Mar 17, 2005 IP
  4. minstrel

    #4
    It's not really useful. And a couple of the bots it's trying to block are legitimate, albeit lesser, spiders.

    Just get rid of it entirely.
     
    minstrel, Mar 17, 2005 IP
  5. perdog05

    #5
    Should teach me not to just copy and paste without understanding it... OK thanks again.
     
    perdog05, Mar 17, 2005 IP