Robots.txt vs <meta name="robots" content="noindex,nofollow">

Discussion in 'robots.txt' started by gravy834, Mar 2, 2009.

  1. #1
    Hi,

    If I disallow specific pages in my robots.txt file, do I still need to include <meta name="robots" content="noindex,nofollow"> on each of those pages?

    Thanks
     
    gravy834, Mar 2, 2009 IP
  2. wp-themes

    wp-themes Banned

    Messages:
    230
    Likes Received:
    9
    Best Answers:
    0
    Trophy Points:
    0
    #2
    Not really required, as the robots.txt rules for allowing/disallowing indexing are the most important ones...

    However, make sure you use it carefully; otherwise you might get important pages or folders deindexed ;)
     
    wp-themes, Mar 2, 2009 IP
  3. Lpe04

    Lpe04 Peon

    Messages:
    579
    Likes Received:
    15
    Best Answers:
    0
    Trophy Points:
    0
    #3
    You can, but it's probably not necessary (though it will definitely ensure that the page doesn't get indexed).

    You can also try it if there is a page that you want removed from an index.
     
    Lpe04, Mar 3, 2009 IP
  4. manish.chauhan

    manish.chauhan Well-Known Member

    Messages:
    1,682
    Likes Received:
    35
    Best Answers:
    0
    Trophy Points:
    110
    #4
    No need to add it when you have already disallowed the page in robots.txt. :)
     
    manish.chauhan, Mar 4, 2009 IP
  5. shailendra

    shailendra Peon

    Messages:
    1,225
    Likes Received:
    18
    Best Answers:
    0
    Trophy Points:
    0
    #5
    It's better to use the robots.txt file to block the pages from getting crawled. Moreover, you should always try to keep the code as lean as possible to prevent code bloat.
     
    shailendra, Mar 6, 2009 IP
  6. linkmonkey

    linkmonkey Peon

    Messages:
    68
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #6
    No need if it's already in robots.txt
     
    linkmonkey, Mar 10, 2009 IP
  7. meri0098

    meri0098 Peon

    Messages:
    36
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #7
    We are developing a portal, and our development team has created three or four sub-folders on the same server for backup and testing purposes. Google is treating these folders as sub-sites and indexing all of them.

    Today I disallowed all of these folders (sub-sites) with a robots.txt file.

    In it I used the following code:

    User-agent: *
    Disallow: /

    User-agent: Googlebot
    Noindex: /

    This way, I think search engine crawlers will not index these sub-folders.
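
    One way to see which URLs a set of robots.txt rules blocks is to run them through a parser before uploading. The following is a minimal sketch using Python's standard urllib.robotparser; the folder names /backup/ and /testing/ and the host example.com are hypothetical stand-ins for the sub-folders described above, and a real crawler's parser may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block two sub-folders for every crawler.
RULES = """\
User-agent: *
Disallow: /backup/
Disallow: /testing/
"""


def is_allowed(rules: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://example.com/backup/index.html",
        "http://example.com/testing/page.html",
        "http://example.com/index.html",
    ):
        print(url, is_allowed(RULES, "Googlebot", url))
```

    Note that this only answers "may this URL be crawled?". It says nothing about whether a URL can still appear in an index.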

    We are also using the meta tag <meta name="robots" content="index, follow" />
    on the site. I can't change it in the sub-folders to disallow indexing, because the developers make all their changes in these folders and may upload them to the live site.

    My question is: I have disallowed the sub-folders in the robots.txt file, but the meta tag <meta name="robots" content="index, follow" /> tells crawlers to index and follow the content.

    Should I remove the index/follow meta tags from all of these pages?
    One directive says to follow and the other disallows it. I am totally confused about what to do.
     
    meri0098, Jan 15, 2011 IP
  8. tenners

    tenners Peon

    Messages:
    30
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #8
    @ gravy834 - Both the robots.txt file and the <meta name="robots"> tag are used to control the indexing and caching of your website's pages. If you have already stated NOT to index a page in the robots.txt file, it is not necessary to do so on the page with the meta tag. However, keep in mind that not all spiders are created equal...meaning, they don't all use or follow your robots.txt directives, so in my humble opinion it is still good to use the <meta name="robots"> tag even though you have explicitly stated not to index a page in your robots.txt file. Consider also the scenario in which a spider gets to your page via a link that someone else pointed at it directly...will the spider index that content? (Who knows for sure.) Besides, it's not so much code that you should be too concerned about its "weight" on the page.

    @ meri0098, if you consider what I've said above, the set-up you have seems like it could potentially cause a problem for you. I would find a way to have your directives in sync.
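
    To check what a page's robots meta tag actually says (for example, to compare it against your robots.txt rules), you could extract the directives with a small script. This is only a sketch using Python's standard html.parser; the names RobotsMetaParser and robots_directives are made up for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    print(robots_directives('<meta name="robots" content="noindex,nofollow">'))
    print(robots_directives('<meta name="robots" content="index, follow" />'))
```

    A page whose directives include "index" while its folder is disallowed in robots.txt is the kind of mismatch worth flagging.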
     
    tenners, Jan 16, 2011 IP
  9. Backlinkshub

    Backlinkshub Peon

    Messages:
    35
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #9
    First of all, we must put "robots.txt" in the top-level directory of our web server.

    Second:

    When a robot looks for the "/robots.txt" file for any URL, it takes the path component from the URL (everything from the first single slash) and puts "/robots.txt" in its place.

    For example, for "http://www.ABC.com/designs/index.html", it will remove the "/designs/index.html", replace it with "/robots.txt", and end up with "http://www.ABC.com/robots.txt".
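
    The path replacement described above can be sketched in a few lines of Python with the standard urllib.parse module. The function name robots_txt_url is made up for this example; it keeps the scheme and host and swaps the rest of the URL for "/robots.txt".

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves `page_url`."""
    parts = urlsplit(page_url)
    # Keep scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_txt_url("http://www.ABC.com/designs/index.html"))
    # prints http://www.ABC.com/robots.txt
```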

    So I think there is no need to specify the robots meta tag again on every page, because whenever a spider comes to any page of our website, it first goes directly to "robots.txt" and only then goes to the particular page it requested.
     
    Backlinkshub, Jan 17, 2011 IP
  10. calvin4u

    calvin4u Peon

    Messages:
    40
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #10
    Thank you.
    Use the robots.txt file and the robots meta tag for all inner pages of the site...
     
    calvin4u, Jan 20, 2011 IP
  11. hilhilginger

    hilhilginger Well-Known Member

    Messages:
    322
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    103
    #11
    Well, thanks for the info. I heard that Bing takes site info from DMOZ and not from robots.txt. So if my site is listed in DMOZ, then there is no point in using robots.txt.
     
    hilhilginger, Jan 20, 2011 IP
  12. brad.smith4321

    brad.smith4321 Peon

    Messages:
    249
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #12
    Use the robots.txt file to block the pages from getting crawled. Furthermore, you should always try to keep the code as lean as possible to prevent code bloat.
     
    brad.smith4321, Feb 1, 2011 IP
  13. fsdnetwork

    fsdnetwork Peon

    Messages:
    20
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #13
    You should remove the meta tag info (it's redundant). These pages and folders are currently blocked by robots.txt, so no robots.txt-compliant crawler can crawl them.
     
    fsdnetwork, Feb 9, 2011 IP