Hi. I'm working on a matrimony website, www.sanjogse.com. My problem is that Google has indexed more than 1,500 pages of my site that have duplicate content or no content at all. For example:
http://www.sanjogse.com/?m=browseby&a=city.profiles&geoId=1759
http://www.sanjogse.com/?m=browseby&a=country.profiles&geoId=138
http://www.sanjogse.com/?m=browseby&a=religion.profiles&rlgnId=1
http://www.sanjogse.com/?m=browseby&a=caste.profiles&cstId=17
These are examples of indexed URLs that have the same content or no content. Can having these URLs indexed harm my rankings? If so, what should I write in robots.txt, and how should I write it?
If multiple URLs have the same content (for example, the same listing displayed differently depending on how it is sorted), you should use rel="canonical". This tells the search engine which page is the preferred one, so it indexes just that page. Read this post I previously made for a simple example: http://forums.digitalpoint.com/showthread.php?t=2531110&p=17925307#post17925307
For pages you don't want indexed, but whose links you do want search engine bots to follow (for example, country categories), you could use the robots meta tag:
<meta name="robots" content="noindex, follow" />
This helps ensure that your most relevant pages are the ones indexed, so organic visits land on them instead of on empty pages. You don't need robots.txt unless you want to block crawling completely, and from what you've written it seems you just need what I've mentioned above. Good luck.
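A minimal sketch of the advice above, assuming the listing pages are generated server side (shown in Python, although the site may use another language). The function names `canonical_url` and `head_tags`, and the presentation-only parameters `sort`, `order` and `perPage`, are hypothetical; replace them with whatever actually varies on your pages. It emits a canonical link that drops those parameters, and adds the noindex, follow meta tag only when a listing has no profiles.

```python
from __future__ import annotations

from html import escape
from urllib.parse import urlencode

# Hypothetical presentation-only parameters: they change how a listing is
# shown (ordering, page size) but not which profiles it contains.
PRESENTATION_PARAMS = {"sort", "order", "perPage"}


def canonical_url(base: str, params: dict[str, str]) -> str:
    """Drop presentation-only parameters so every variant of the same
    listing points at one canonical URL."""
    kept = {k: v for k, v in params.items() if k not in PRESENTATION_PARAMS}
    return f"{base}?{urlencode(kept)}" if kept else base


def head_tags(base: str, params: dict[str, str], profile_count: int) -> list[str]:
    """Return the <head> tags for a hypothetical browse-by listing page."""
    # escape() turns "&" into "&amp;", as required inside an HTML attribute.
    tags = [f'<link rel="canonical" href="{escape(canonical_url(base, params))}" />']
    if profile_count == 0:
        # Empty listing: keep it out of the index, but let bots follow its links.
        tags.append('<meta name="robots" content="noindex, follow" />')
    return tags


if __name__ == "__main__":
    params = {"m": "browseby", "a": "city.profiles", "geoId": "1759", "sort": "newest"}
    for tag in head_tags("http://www.sanjogse.com/", params, profile_count=0):
        print(tag)
```

Keeping the canonical tag on every listing, and the noindex tag only on empty ones, means sorted variants consolidate onto one URL while non-empty listings stay indexable.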
In the robots.txt file you write the name of the crawler (the User-agent line) followed by the Allow and Disallow rules. For example, if I want to give all robots full access, I write:
User-agent: *
Disallow:
1) The original robots.txt standard has no Allow directive (Google and Bing support it as an extension, but it isn't needed here).
2) The above is unnecessary if you want to allow access, as access is allowed by default.
3) What you wrote doesn't relate to the OP's question.
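If you do decide to block the listings with robots.txt, Python's standard `urllib.robotparser` can check a rule set against the example URLs before you upload it. This is a sketch under assumptions: the `Disallow: /?m=browseby` rule and the helper `parser_for` are illustrative, not taken from the thread, and the rule is a plain prefix because wildcard support differs between parsers. It also shows the point above that an empty `Disallow:` blocks nothing. Keep in mind that a Disallow rule stops crawling but does not by itself remove URLs that are already indexed, and a crawler that is blocked cannot see a `noindex` meta tag on those pages.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set: stop all crawlers from fetching the "browse by"
# listings. Rules are prefixes of path + query string, so this one line
# matches every URL that begins with /?m=browseby.
BLOCK_BROWSE = """\
User-agent: *
Disallow: /?m=browseby
"""

# The example from the thread: an empty Disallow value blocks nothing,
# which is the same as having no robots.txt at all.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

URLS = [
    "http://www.sanjogse.com/",
    "http://www.sanjogse.com/?m=browseby&a=city.profiles&geoId=1759",
    "http://www.sanjogse.com/?m=browseby&a=religion.profiles&rlgnId=1",
]


def parser_for(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text instead of fetching it over HTTP."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


if __name__ == "__main__":
    for name, text in (("BLOCK_BROWSE", BLOCK_BROWSE), ("ALLOW_ALL", ALLOW_ALL)):
        rp = parser_for(text)
        for url in URLS:
            # "Googlebot" falls under the "User-agent: *" group here.
            print(name, rp.can_fetch("Googlebot", url), url)
```

With `BLOCK_BROWSE` only the homepage is fetchable; with `ALLOW_ALL` every URL is.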