Problem with robots.txt file

Discussion in 'Search Engine Optimization' started by vasvigupt, Aug 26, 2013.

  1. #1
    Hi, I have created a robots.txt file for one of my websites and added rules to block some pages. One of the rules I added, to block the user page, is:
    User-agent: *
    Disallow: /user/
    
    Code (markup):
    This rule should block the user page as well, but Google has crawled it anyway. Can anyone tell me why Google has indexed this page and what is wrong with this code? The URL of the file is http://cp.wainhouse.com/robots.txt.
     
    vasvigupt, Aug 26, 2013 IP
  2. patco

    patco Well-Known Member

    #2
    Are you sure Google didn't crawl the URL before you created this robots.txt file? If it had already visited and indexed the page, the page would still be visible in search results... ;)
     
    patco, Aug 26, 2013 IP
  3. vasvigupt

    vasvigupt Member

    #3
    Yes, I am sure. I had previously blocked all the URLs while the website was under maintenance, and nothing was showing on Google. Then I changed the file, and after 10 days these URLs were listed. Do you have any idea why that page is listed? Have I done this the correct way?
     
    vasvigupt, Aug 26, 2013 IP
  4. heliumc

    heliumc Greenhorn

    #4
    Maybe Google crawled your page before you updated the robots.txt file. Wait for Google's next crawl and see what happens. Alternatively, you can use Google Webmaster Tools to remove the page from Google: under Google Index there is an option to remove URLs. Add the page there and it will be delisted from Google.
     
    heliumc, Aug 26, 2013 IP
  5. sailvanetwork

    sailvanetwork Greenhorn

    #5
    Try this: Disallow: /user (remove the trailing slash).

    Also, crawling is not the same thing as indexing.
     
    sailvanetwork, Aug 26, 2013 IP
  6. #6
    Actually, there is nothing wrong with Google's crawl and index mechanism.
    It seems your robots.txt is what causes the 'unexpected result'. If you look carefully, your robots.txt only blocks '/user/' and not '/user'. Please understand that '/user/' and '/user' are two different things.
    To solve the problem, follow the advice already given in the comment above mine.
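
The difference between the two rules can be checked locally. Below is a minimal sketch using Python's standard-library `urllib.robotparser`. This is not Google's own parser, but for plain paths like these it applies the same prefix matching. The helper name `allowed` and the example path `/user/profile` are assumptions made for illustration; only the two rule sets come from this thread.

```python
from urllib.robotparser import RobotFileParser

# The rule as originally posted in this thread.
ORIGINAL_RULES = """\
User-agent: *
Disallow: /user/
"""

# The rule with the trailing slash removed, as suggested above.
FIXED_RULES = """\
User-agent: *
Disallow: /user
"""


def allowed(rules: str, path: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # "/user/profile" is a hypothetical path, used only for illustration.
    for path in ("/user", "/user/", "/user/profile"):
        print(
            f"{path:15} original: {allowed(ORIGINAL_RULES, path)!s:6}"
            f" fixed: {allowed(FIXED_RULES, path)}"
        )
```

Under the original rule, `/user` is reported as allowed while `/user/` and `/user/profile` are blocked. Under the shorter rule all three are blocked. Because the match is a simple prefix test, `Disallow: /user` would also block any other path that starts with `/user`, such as `/username`.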

    Solution:
    1. Add one more line:
      Disallow: /user
      HTML:
    2. Add rel="noindex" to the pages you do not want indexed.
      If you do this, those pages will not appear in search engines at all. If you use only the first method, the search engine may still list the URL with this message:
      A description for this result is not available because of this site's robots.txt – learn more.
      HTML:
    I prefer the second method to the first, as it is more efficient and all followable links can still be crawled (more authority, while the pages themselves stay out of the index). However, you should apply both methods to get the best possible result.
    For the pages that are already indexed, you can request their removal in Webmaster Tools.
     
    Last edited: Aug 27, 2013
    Arick unirow, Aug 26, 2013 IP
  7. sailvanetwork

    sailvanetwork Greenhorn

    #7
    It should be <meta name="robots" content="NOINDEX" />

    More info:
    Meta tags describe the page itself; rel describes the page's relationship to other pages.
     
    sailvanetwork, Aug 26, 2013 IP
  8. Arick unirow

    Arick unirow Acclaimed Member

    #8
    Once again, you are correct about meta tags and the use of 'rel' (the relation between links).
    For the OP: you can use the meta tag posted by @sailvanetwork above, or this tag:
    <meta name="robots" content="noindex,follow"/>
    HTML:
    Using the meta tag to noindex a page while keeping the page crawlable and its links followable is recommended.
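
To confirm that a page actually carries the tag, something like the following sketch could be used. It relies only on Python's standard-library `html.parser`. The class name `RobotsMetaParser`, the helper `robots_directives`, and the sample markup are assumptions made for illustration, not part of any tool mentioned in this thread.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing tags
        # such as <meta ... /> are routed here as well.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return the lowercased robots meta directives present in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    # Hypothetical page markup, used only for illustration.
    sample = '<html><head><meta name="robots" content="noindex,follow"/></head></html>'
    found = robots_directives(sample)
    print("noindex" in found, "follow" in found)
```

The comparison is case-insensitive, so both `content="NOINDEX"` and `content="noindex,follow"` are reported as containing `noindex`.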
     
    Arick unirow, Aug 26, 2013 IP
  9. vasvigupt

    vasvigupt Member

    #9
    Thanks, sailvanetwork and Arick. I have now corrected the error. I hope it works.
     
    vasvigupt, Aug 27, 2013 IP