What should the robots.txt file be for these indexed URLs?

geniusoptimizer Greenhorn

#1

Hi. I have a matrimony website to work on. Its URL is www.sanjogse.com. My problem is that Google has indexed more than 1500 pages of my website that have the same or no content. For example:
http://www.sanjogse.com/?m=browseby&a=city.profiles&geoId=1759
http://www.sanjogse.com/?m=browseby&a=country.profiles&geoId=138
http://www.sanjogse.com/?m=browseby&a=religion.profiles&rlgnId=1
http://www.sanjogse.com/?m=browseby&a=caste.profiles&cstId=17

These are examples of indexed URLs that have the same or no content. I want to confirm whether the indexing of these URLs can harm my rankings. If yes, what should I write in robots.txt, and how should I write it?

Last edited: Dec 17, 2012

geniusoptimizer, Dec 17, 2012

ryan_uk Illustrious Member

#2

If multiple URLs have the same content (but, say, displayed differently depending on how the content is sorted), then you should use rel="canonical". This tells the search engine which URL is the preferred one, so it indexes just that.

Read this post I previously made for a simple example:
http://forums.digitalpoint.com/showthread.php?t=2531110&p=17925307#post17925307
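
As a rough illustration of the rel="canonical" idea, here is a minimal Python sketch that builds the tag for a page by dropping parameters that only change how the content is displayed. The parameter names `sort` and `order` and the helper `canonical_link_tag` are assumptions made up for this example; the URLs in the original question do not show any sort parameter, so adapt the set to whatever the site really uses.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: query parameters that change presentation only, not content.
PRESENTATION_PARAMS = {"sort", "order"}


def canonical_link_tag(url: str) -> str:
    """Return a <link rel="canonical"> tag pointing at the URL without presentation parameters."""
    parts = urlsplit(url)
    # Keep every query parameter except the presentation-only ones, preserving order.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in PRESENTATION_PARAMS
    ]
    canonical = urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
    # Escape "&" and quotes so the URL is safe inside an HTML attribute.
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}" />'


# The "sort=age" parameter here is invented for the example.
print(canonical_link_tag(
    "http://www.sanjogse.com/?m=browseby&a=city.profiles&geoId=1759&sort=age"
))
```

The resulting tag would go in the `<head>` of every variant of the page, so that all variants point at the same preferred URL.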

For pages you don't want indexed, but whose links you still want search engine bots to follow (for example, country categories), you could use the robots meta tag:
<meta name="robots" content="noindex, follow" />
This helps ensure that only your most relevant pages are indexed, so organic visits land on them instead of on empty pages.

You don't need robots.txt unless you want to block crawling completely. From what you've written, it seems you just need what I've mentioned above.

Good luck.

ryan_uk, Dec 22, 2012

agitetech Peon

#3

In the robots.txt file you have to write the name of the crawler robot and the allow and disallow commands.
For example, if I want to give full access to robots, then I write:
User-agent: *
Disallow:

agitetech, Dec 27, 2012

ryan_uk Illustrious Member

#4

agitetech said:

In the robots.txt file you have to write the name of the crawler robot and the allow and disallow commands.
For example, if I want to give full access to robots, then I write:
User-agent: *
Disallow:

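1) There isn't an allow command in the original robots.txt standard (some crawlers, Googlebot included, do recognize Allow as an extension).
2) The above is unnecessary if you want to allow access, as it's allowed by default.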
3) What you wrote in no way relates to the OP's question.
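
Point 2 can be checked with Python's standard-library robots.txt parser. This is only a sketch: the helper name `allows` is made up for the example, and `urllib.robotparser` is Python's own implementation of the rules, so it shows how one parser reads them rather than how any particular search engine behaves.

```python
from urllib.robotparser import RobotFileParser


def allows(robots_lines, url, agent="*"):
    """Return True if the given robots.txt lines let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules from memory, no network access
    return parser.can_fetch(agent, url)


# One of the URLs from the original question.
URL = "http://www.sanjogse.com/?m=browseby&a=city.profiles&geoId=1759"

# The "full access" file quoted above: an empty Disallow blocks nothing.
allow_all_rules = ["User-agent: *", "Disallow:"]

# No rules at all, which is what a crawler sees with an empty robots.txt.
no_rules = []

print(allows(allow_all_rules, URL))
print(allows(no_rules, URL))
```

Both calls should report that the URL may be fetched, which is why the two-line file adds nothing over having no rules at all.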

ryan_uk, Dec 31, 2012

icool89 Peon

#5

Hmm.

Is robots.txt really as effective for search engines as everyone says?

icool89, Jan 2, 2013
