Why do I need a robots.txt, and what should I put in it?

Discussion in 'robots.txt' started by Melissa2007, Feb 23, 2019.

  1. #1
    I'm looking into some SEO issues for my home business site, and I keep seeing that Google expects to find a robots.txt file. Is that for SEO?

    But then I read that lots of robots just ignore it anyway, and of course we WANT Google to keep the site indexed, right?
    So why exactly do I need a robots.txt, and what should I put in it?
     
    Melissa2007, Feb 23, 2019
  2. qwikad.com

    #2
    You're right that many bots will ignore your robots.txt.

    If you want Google to crawl your entire website, put this in your robots.txt:

    
    # Applies to every crawler; an empty Disallow value blocks nothing
    User-agent: *
    Disallow:
    
    The truth is, your website will get crawled even without a robots.txt. It's mostly used to stop Google (and other bots) from crawling certain parts of a website.
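For illustration, Python's standard `urllib.robotparser` module can stand in for a rule-following crawler (it is not a model of how Googlebot itself is implemented) and confirm that this two-line file blocks nothing: an empty `Disallow:` value allows every path. The `allowed` helper and the page paths below are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The "crawl everything" file from the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow:
"""

def allowed(robots_txt: str, path: str, agent: str = "Googlebot") -> bool:
    """Return True if a crawler that obeys robots.txt may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

# Hypothetical pages on the site; none of them is blocked by this file.
for path in ["/", "/index.html", "/test/draft.html"]:
    print(path, allowed(ROBOTS_TXT, path))
```

Every path prints `True`, which is the same result you would get with no robots.txt at all; the file only starts doing something once a `Disallow` line has a value.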


     
    qwikad.com, Feb 23, 2019
  3. Melissa2007

    #3
    I was going without one entirely, but I keep reading that Google looks for one for SEO purposes. So I made this one anyway, to keep my test directory from being indexed:

    User-agent: *
    Disallow: /test/
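Using the same standard-library parser as a stand-in for a rule-following crawler, this sketch checks which hypothetical paths the file above blocks. A `Disallow` value matches any path that starts with it, so `/test/` covers everything inside that directory but not a page like `/testimonials.html`. Strictly speaking, `Disallow` asks compliant bots not to crawl those URLs; Google's documentation notes that a blocked URL can still show up in search results if other pages link to it.

```python
from urllib.robotparser import RobotFileParser

# The file from the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /test/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical paths; a rule applies to every path that begins with its value.
for path in ["/index.html", "/test/", "/test/new-layout.html", "/testimonials.html"]:
    print(f"{path:24} crawlable={parser.can_fetch('*', path)}")
```

With this file, only the two paths under `/test/` come back as not crawlable.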
     
    Melissa2007, Feb 23, 2019
  4. mmerlinn

    #4
    If NOTHING in your test directory is EVER linked from the outside world, NO robot can find it or index it, in which case you do not need a robots.txt file for it.

    However, be aware that ROGUE bots do NOT honor a robots.txt file; they index the disallowed pages anyway. Having a robots.txt entry for a test directory tells rogue bots where that directory is located, AND THEY WILL INDEX IT IF THEY FIND YOUR robots.txt file.

    I know this because it has happened to me. I had to move everything, then create a fake page with fake information for the rogue bots to index, just to repair the damage.
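The point about rogue bots follows from the fact that robots.txt is a public file: anyone, human or bot, can request /robots.txt from your domain and read it. The minimal sketch below, with a hypothetical `disallowed_paths` helper that works on the file's text rather than fetching it, pulls out every Disallow path. That list is essentially what a bot that ignores the rules gets for free.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Collect every non-empty Disallow value, whatever user-agent it is under."""
    paths = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()        # drop comments and whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths

# The test-directory file from earlier in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /test/
"""

print(disallowed_paths(ROBOTS_TXT))  # ['/test/']
```

A rule-following crawler uses this list to stay away; a rogue one can just as easily use it as a to-do list.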

    Your best bet is to remove your test directory from the robots.txt file before a rogue bot finds it and indexes it. Then make sure that you NEVER link to your test pages from anywhere public. You can also put noindex and nofollow in a robots META tag on the test pages, which will stop all well-behaved bots from indexing them. Finally, you can password-protect your test directory, which keeps EVERYONE out except those who know the password.
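Both the noindex/nofollow and the password suggestions can be enforced by the web server itself. The sketch below uses Python's built-in `http.server` purely for illustration (a hosted site would more likely use its own server configuration, such as Apache's .htaccess). Requests under the hypothetical `/test/` prefix must carry HTTP Basic credentials, and allowed responses there also send an `X-Robots-Tag: noindex, nofollow` header, the HTTP-header equivalent of the robots META tag. The username and password are placeholders. Two caveats: Basic credentials are only base64-encoded, so use HTTPS in practice; and a crawler can only see a noindex instruction on a page it is allowed to fetch, so Google advises against also disallowing that URL in robots.txt.

```python
import base64
import hmac
from http.server import BaseHTTPRequestHandler, HTTPServer

PROTECTED_PREFIX = "/test/"                     # hypothetical test area
USERNAME, PASSWORD = "melissa", "change-me"     # placeholder credentials
EXPECTED_AUTH = "Basic " + base64.b64encode(f"{USERNAME}:{PASSWORD}".encode()).decode()


class TestAreaHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        protected = self.path.startswith(PROTECTED_PREFIX)
        if protected and not self._authorized():
            # Ask the browser for a username/password; bots without them stop here.
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="test area"')
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        body = b"<html><body>page</body></html>"
        self.send_response(200)
        if protected:
            # Header form of <meta name="robots" content="noindex, nofollow">.
            self.send_header("X-Robots-Tag", "noindex, nofollow")
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def _authorized(self) -> bool:
        supplied = self.headers.get("Authorization", "")
        # compare_digest compares in constant time, so timing does not leak the secret.
        return hmac.compare_digest(supplied.encode(), EXPECTED_AUTH.encode())


def serve(port: int = 8000) -> None:
    """Run the sketch locally until interrupted (Ctrl+C)."""
    HTTPServer(("127.0.0.1", port), TestAreaHandler).serve_forever()
```

Calling `serve()` and opening a `/test/` page in a browser should bring up a login prompt, while other pages load normally without the noindex header.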
     
    Last edited: Feb 24, 2019
    mmerlinn, Feb 24, 2019
  5. Melissa2007

    #5
    If I want to disallow several directories in the main domain root, is this the way to do it?

    User-agent: *
    Disallow: /test/ /JPG/ /PHOT/ /VID/
     
    Melissa2007, Feb 24, 2019
  6. Melissa2007

    #6
    I'm not sure what you mean by "linked". The test directory is a subdirectory of my main domain, and the main domain is indexed.

    I'm not familiar with these terms yet. Rogue bots? Are they looking for things to steal? My test directory is just a place to try out what I intend to put on the indexed domain later. As far as I know, there's nothing in it that needs securing.

    But why would they index it at all? There's no secret or sensitive information in it.

    As for password protection, I made a directory like that years ago, eventually forgot the password (and even what was in it), and it still lingers in my ISP account. I don't even know whether they can remove it. I've asked, to no avail.
     
    Melissa2007, Feb 24, 2019
  7. qwikad.com

    #7
    No, the correct way is to give each directory its own Disallow line:

    User-agent: *
    Disallow: /test/
    Disallow: /JPG/
    Disallow: /PHOT/
    Disallow: /VID/
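To see why the one-line version from post #5 fails, Python's `urllib.robotparser` can again stand in for a rule-following crawler. It reads everything after `Disallow:` as a single path that happens to contain spaces, so none of the four directories ends up blocked, while the one-rule-per-line file blocks each of them. Other crawlers may handle the malformed line differently, which is one more reason to stick to one path per line. The file names below are hypothetical.

```python
from urllib.robotparser import RobotFileParser

ONE_LINE = """\
User-agent: *
Disallow: /test/ /JPG/ /PHOT/ /VID/
"""

ONE_PER_LINE = """\
User-agent: *
Disallow: /test/
Disallow: /JPG/
Disallow: /PHOT/
Disallow: /VID/
"""

def blocked(robots_txt: str, path: str) -> bool:
    """True if a crawler obeying `robots_txt` would skip `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("*", path)

# Hypothetical files, one in each directory.
for path in ["/test/a.html", "/JPG/cat.jpg", "/PHOT/beach.png", "/VID/clip.mp4"]:
    print(path, "one line:", blocked(ONE_LINE, path),
          "| one per line:", blocked(ONE_PER_LINE, path))
```

Under this parser the one-line file blocks nothing, while the corrected file blocks all four files.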
     
    qwikad.com, Feb 24, 2019
  8. Melissa2007

    #8
    Thanks!
     
    Melissa2007, Feb 24, 2019
  9. mmerlinn

    #9
    Is there a link on ANY PUBLIC WEBPAGE ANYWHERE ON THE INTERNET that will open up your test pages? If so, you have outside links.
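One way to check this for pages you control is to scan their HTML for links that point into the test area. This sketch uses Python's standard `html.parser`; the `/test/` prefix, the `links_into` helper, the example.com base URL, and the sample HTML are all hypothetical. It only covers pages you feed it, not links other people may have posted elsewhere on the web.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_into(html: str, page_url: str, prefix: str = "/test/") -> list[str]:
    """Return absolute same-host URLs linked from `html` whose path starts with `prefix`."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    site = urlparse(page_url).netloc
    hits = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)      # resolve relative links
        parts = urlparse(absolute)
        if parts.netloc == site and parts.path.startswith(prefix):
            hits.append(absolute)
    return hits


SAMPLE_PAGE = """
<p>Welcome! See our <a href="/products.html">products</a>
or the <a href="test/new-layout.html">new layout preview</a>.</p>
"""

print(links_into(SAMPLE_PAGE, "https://www.example.com/index.html"))
# ['https://www.example.com/test/new-layout.html']
```

Any URL this returns is an outside link in the sense of the post: a bot that crawls the public page can follow it straight into the test area.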

    A rogue bot is a bot that DOES NOT FOLLOW THE RULES, which means it WILL index pages where you specifically DISALLOW indexing. Yahoo comes to mind here: Yahoo's bots indexed my private pages EVEN THOUGH I DISALLOWED THEM, and unfortunately my private information has spread across the web for anyone to use or abuse. Indexing bots generally are not looking to steal anything, as they exist for only one purpose: to index pages.

    The problem is that once your private pages are indexed, the WHOLE world can access them, and they can NEVER be unindexed. Should you have any private information on those pages (now or in the future), anyone can see it or even steal it.

    Bots do not need a reason to index. They just index EVERYTHING that they find. If you do not want something indexed, you MUST prevent them from indexing it. Disallowing and noindexing are not enough.

    If bots can never find your pages, you will never need to password-protect them to keep the bots out.
     
    mmerlinn, Feb 26, 2019