What should the robots.txt file be for these indexed URLs?

Discussion in 'robots.txt' started by geniusoptimizer, Dec 17, 2012.

  1. #1
    Hi. I have a matrimony website to work on. Its URL is www.sanjogse.com. My problem is that Google has indexed more than 1,500 pages of my website that have the same or no content. For example:
    http://www.sanjogse.com/?m=browseby&a=city.profiles&geoId=1759
    http://www.sanjogse.com/?m=browseby&a=country.profiles&geoId=138
    http://www.sanjogse.com/?m=browseby&a=religion.profiles&rlgnId=1
    http://www.sanjogse.com/?m=browseby&a=caste.profiles&cstId=17


    These are examples of indexed URLs that have the same or no content. I want to confirm whether the indexing of these URLs can be harmful for my rankings or not. If yes, what should I write in robots.txt, and how should I write it?
     
    Last edited: Dec 17, 2012
    geniusoptimizer, Dec 17, 2012 IP
  2. ryan_uk (Illustrious Member)
    #2
    If multiple URLs have the same content (but displayed differently, say, depending on how the content is sorted), then you should use rel="canonical". This tells the search engine which page is the correct one, so it indexes just that one.

    Read this post I previously made for a simple example:
    http://forums.digitalpoint.com/showthread.php?t=2531110&p=17925307#post17925307
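
    A minimal sketch of such a canonical link element, placed in the <head> of each duplicate listing page; the href below is just one of the OP's example URLs standing in for whichever URL should be treated as the preferred version:

    HTML:
    <link rel="canonical" href="http://www.sanjogse.com/?m=browseby&a=city.profiles&geoId=1759" />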

    For pages you don't want indexed, but you do want search engine bots to follow the links (for example country categories) you could use the robots meta tag:

    HTML:
    <meta name="robots" content="noindex, follow" />
    This would just help ensure you have the most relevant pages indexed and that organic visits land on them instead of on empty pages.

    You don't need robots.txt unless you want to restrict crawling completely, although from what you've written it seems like you just need what I've mentioned above.
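
    If you did decide to block those browseby query-string URLs with robots.txt anyway, a rough sketch (assuming all of them start with ?m=browseby, as in the examples above) could be:

    User-agent: *
    Disallow: /?m=browseby

    Keep in mind this only blocks crawling, not indexing: URLs Google already knows about can stay in the index, which is why the noindex meta tag above is usually the better fit here.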

    Good luck.
     
    ryan_uk, Dec 22, 2012 IP
  3. agitetech (Peon)
    #3
    In the robots.txt file you have to write the name of the crawler robot and the allow and disallow commands.
    For example, if I want to give robots full access, I write:
    User-agent: *
    Disallow:
     
    agitetech, Dec 27, 2012 IP
  4. ryan_uk (Illustrious Member)
    #4
    1) There isn't an allow command.
    2) The above is unnecessary if you want to allow access, as it's allowed by default.
    3) What you wrote in no way relates to the OP's question.
     
    ryan_uk, Dec 31, 2012 IP
  5. icool89 (Peon)
    #5
    Hmm

    Is robots.txt as effective for search engines as everyone says it is?
     
    icool89, Jan 2, 2013 IP