Robots.txt question

Discussion in 'robots.txt' started by Moneyfolk, Jan 12, 2006.

  1. #1
    I created a basic robots.txt file. Do I also need to include

    <meta name="ROBOTS" content="ALL"> in the head of my HTML pages, or is having robots.txt in my root directory enough?

    Thanks in advance.
     
    Moneyfolk, Jan 12, 2006 IP
  2. Smyrl

    Smyrl Tomato Republic Staff

    Messages:
    13,740
    Likes Received:
    1,702
    Best Answers:
    78
    Trophy Points:
    510
    #2
    The robots.txt file is sufficient. Actually, if you had neither the robots.txt file nor the meta tag mentioned above, the entire site would still be available for indexing. I use robots.txt mainly to disallow certain folders for compliant robots. Rogue robots will not honor robots.txt.

    Shannon
     
    Smyrl, Jan 12, 2006 IP
  3. dcristo

    dcristo Illustrious Member

    Messages:
    19,776
    Likes Received:
    1,199
    Best Answers:
    7
    Trophy Points:
    470
    Articles:
    7
    #3
    It's only necessary to have the title, meta keywords, and meta description tags; the rest are not required.
     
    dcristo, Jan 12, 2006 IP
  4. Moneyfolk

    Moneyfolk Peon

    Messages:
    420
    Likes Received:
    11
    Best Answers:
    0
    Trophy Points:
    0
    #4
    Thank you, both. I hope having the file will be conducive to the big 3 indexing more of my pages. I understand that MSN likes robots.txt files.
     
    Moneyfolk, Jan 12, 2006 IP
  5. dcristo

    dcristo Illustrious Member

    #5
    Getting well indexed in the SE's is just a matter of getting more links to your site. The robots.txt is typically used to tell the SE's NOT to index parts of your site.
     
    dcristo, Jan 12, 2006 IP
  6. mdvaldosta

    mdvaldosta Peon

    Messages:
    4,079
    Likes Received:
    362
    Best Answers:
    0
    Trophy Points:
    0
    #6
    Actually, it's only NECESSARY to have the title tag. The meta description is optional (the SE's will skim your page and pull a description for you), but it's highly recommended that you write your own. The keywords tag is usually a waste of time, but I still use it anyway for good form.

    The robots.txt is especially important for MSN, even if you upload a blank one. It also matters for AWStats, because hits on that file are one of the ways it recognizes bot traffic.
     
    mdvaldosta, Jan 12, 2006 IP
  7. dcristo

    dcristo Illustrious Member

    #7
    To clarify, when I said it was necessary, I really meant that including them is advised.

    As for robots.txt and MSN, I haven't had any problems with any of my sites on MSN when leaving the file out.
     
    dcristo, Jan 12, 2006 IP
  8. seo_expert

    seo_expert Well-Known Member

    Messages:
    475
    Likes Received:
    12
    Best Answers:
    0
    Trophy Points:
    123
    #8
    I'd like to add one thing here.

    Be careful when exchanging links: some webmasters use robots.txt to disallow Google's spiders from crawling their link pages, in which case your link will be of no use.
     
    seo_expert, Jan 12, 2006 IP
  9. Moneyfolk

    Moneyfolk Peon

    #9
    So should you check the robots.txt file of websites that you want to link to?
     
    Moneyfolk, Jan 13, 2006 IP
  10. northpointaiki

    northpointaiki Guest

    Messages:
    6,876
    Likes Received:
    187
    Best Answers:
    0
    Trophy Points:
    0
    #10
    Hadn't thought of that - it's a good point. Unlike "nofollow," which can be detected, how would you know what's in their robots.txt?
     
    northpointaiki, Jan 13, 2006 IP
  11. BILZ

    BILZ Peon

    Messages:
    1,515
    Likes Received:
    62
    Best Answers:
    0
    Trophy Points:
    0
    #11
    Just look at the file: theirdomain.com/robots.txt

    Without checking the source, how do you detect a nofollow?
     
    BILZ, Jan 13, 2006 IP
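
On the nofollow question above: short of reading the page source by eye, the check can be scripted. The following is a minimal sketch using only Python's standard html.parser module. The names NofollowFinder, find_nofollow and SAMPLE_PAGE are made up for illustration, and the sample page is invented, not taken from any site in this thread.

```python
from html.parser import HTMLParser


class NofollowFinder(HTMLParser):
    """Collects links marked rel="nofollow" and any page-wide robots meta tag."""

    def __init__(self):
        super().__init__()
        self.nofollow_links = []  # href values of <a rel="... nofollow ...">
        self.meta_robots = None   # content of <meta name="robots">, if present

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if tag == "a" and "nofollow" in attr.get("rel", "").lower().split():
            self.nofollow_links.append(attr.get("href", ""))
        elif tag == "meta" and attr.get("name", "").lower() == "robots":
            self.meta_robots = attr.get("content", "").lower()


def find_nofollow(html_text):
    """Return (list of nofollow hrefs, robots meta content or None)."""
    finder = NofollowFinder()
    finder.feed(html_text)
    return finder.nofollow_links, finder.meta_robots


# Hypothetical page used only to exercise the sketch.
SAMPLE_PAGE = """
<html><head><meta name="ROBOTS" content="INDEX, NOFOLLOW"></head>
<body>
  <a href="http://example.com/a" rel="nofollow">A</a>
  <a href="http://example.com/b">B</a>
</body></html>
"""

links, meta = find_nofollow(SAMPLE_PAGE)
print(links)
print(meta)
```

This only looks at the HTML it is given. A page can also be blocked in robots.txt, which this sketch does not check.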
  12. maldives

    maldives Prominent Member

    Messages:
    7,187
    Likes Received:
    902
    Best Answers:
    0
    Trophy Points:
    310
    #12
    Excellent! In most cases I use robots.txt more to disallow certain folders for robots. MSNBot complies with the standards for robots.txt.
     
    maldives, Jan 13, 2006 IP
  13. Moneyfolk

    Moneyfolk Peon

    #13
    Moneyfolk, Jan 13, 2006 IP
  14. northpointaiki

    northpointaiki Guest

    #14
    Yeah, just saw this today: append /robots.txt to their domain and then look for a Disallow on the directory or page your link is on:

    User-agent: Googlebot
    Disallow: /TheDirectoryorpageyou'reon/

    Thanks.
     
    northpointaiki, Jan 13, 2006 IP
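
The manual check described above can also be sketched in code. This is a minimal example built on Python's standard urllib.robotparser module. The helper name is_blocked, the rule lines and the example.com URLs are assumptions for illustration. The rules are passed in as text so the sketch runs without network access. For a live site you would fetch theirdomain.com/robots.txt first.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_lines, user_agent, url):
    """Return True if the given robots.txt lines disallow url for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return not parser.can_fetch(user_agent, url)


# Hypothetical robots.txt of a link partner that hides its links page.
PARTNER_RULES = [
    "User-agent: Googlebot",
    "Disallow: /links/",
]

LINK_PAGE = "http://example.com/links/partners.html"
HOME_PAGE = "http://example.com/index.html"

print(is_blocked(PARTNER_RULES, "Googlebot", LINK_PAGE))  # links page is disallowed
print(is_blocked(PARTNER_RULES, "Googlebot", HOME_PAGE))  # rest of the site is not
print(is_blocked([], "Googlebot", LINK_PAGE))             # no rules: everything allowed
```

The last call mirrors the point made earlier in the thread: with no rules at all, a compliant robot treats the whole site as open.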
  15. Jean-Luc

    Jean-Luc Peon

    Messages:
    601
    Likes Received:
    30
    Best Answers:
    0
    Trophy Points:
    0
    #15
    Why would it be more important for MSN than for Google or Yahoo? If robots.txt is not present, they will all understand that they are permitted to visit all pages.

    When the file is not present, AWStats sees the requests for the non-existing robots.txt file. These requests allow AWStats to recognize these bots.

    Jean-Luc
     
    Jean-Luc, Jan 13, 2006 IP
  16. ServerUnion

    ServerUnion Peon

    Messages:
    3,611
    Likes Received:
    296
    Best Answers:
    0
    Trophy Points:
    0
    #16
    This is not the way to identify bots; all it will do is fill up your 404 error section. The SE's don't download the robots.txt on every visit, as this would be a waste of resources, which would nullify your idea.
     
    ServerUnion, Jan 13, 2006 IP
  17. Jean-Luc

    Jean-Luc Peon

    #17
    I agree that it is not the best way to identify bots and it should certainly not be the only way to do it, but it is used by AWStats and other stats software to discover new bots. AWStats reports them as:

    Unknown robot (identified by hit on 'robots.txt')

    Jean-Luc
     
    Jean-Luc, Jan 13, 2006 IP
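
To make the idea under discussion concrete, here is a rough sketch of spotting visitors by their requests for robots.txt in an access log. It is not AWStats's actual implementation. It assumes the common Apache "combined" log format, and the names LOG_LINE, robots_txt_requesters and SAMPLE_LOG, along with the log lines themselves, are invented for illustration.

```python
import re
from collections import Counter

# Matches the request, status, referrer and user-agent parts of a
# combined-format log line (an assumption about the log layout).
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def robots_txt_requesters(log_lines):
    """Count robots.txt requests per (user agent, HTTP status) pair."""
    agents = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and match.group("path").split("?")[0] == "/robots.txt":
            agents[(match.group("agent"), match.group("status"))] += 1
    return agents


# Hypothetical log lines: one bot gets a 404, one gets a 200, one is a browser.
SAMPLE_LOG = [
    '1.2.3.4 - - [13/Jan/2006:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 404 208 "-" "ExampleBot/1.0"',
    '5.6.7.8 - - [13/Jan/2006:10:05:00 +0000] "GET /robots.txt HTTP/1.1" 200 25 "-" "OtherBot/2.0"',
    '9.9.9.9 - - [13/Jan/2006:10:06:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

for (agent, status), hits in robots_txt_requesters(SAMPLE_LOG).items():
    print(agent, status, hits)
```

In this sketch the request is logged whether the file exists (200) or not (404), so either case would identify the visitor. A missing file just adds 404 entries on top.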
  18. ServerUnion

    ServerUnion Peon

    #18
    No, that simply means that the bot does not have an official listing as a verified source. It still knows it is a bot; it just doesn't know the name. There could be many reasons for this; the robots.txt file has nothing to do with it.

    I get these on sites where I have the file and on ones where I do not. Can you provide documentation on this theory? I would be interested to read more about it.
     
    ServerUnion, Jan 13, 2006 IP
  19. Jean-Luc

    Jean-Luc Peon

    #19
    A line, titled "Unknown robot (identified by hit on 'robots.txt')", appears in the AWStats list of "Robots/Spiders visitors". That seems pretty clear to me.

    Jean-Luc
     
    Jean-Luc, Jan 13, 2006 IP
  20. ServerUnion

    ServerUnion Peon

    #20
    As opposed to having hundreds of 404 errors on the file?

    This may just be because the stats programs aren't going to waste overhead by listing every little bot that stops by. They most likely just list the larger sources.
     
    ServerUnion, Jan 13, 2006 IP