Problem with robots.txt

Discussion in 'robots.txt' started by Coldfaun, Jul 12, 2010.

  1. #1
    My robots.txt looks like this:

    User-agent: *
    Allow: /

    Disallow: /page.html
    Disallow: /page2.html

    Basically, I want to remove these 2 pages from Google's index.
    However, I've seen lots of pages removed from the Google index lately, so is my robots.txt wrong?
     
    Coldfaun, Jul 12, 2010 IP
  2. DoDo Me

    DoDo Me Peon

    #2
    User-agent: *
    Disallow: /page.html
    Disallow: /page2.html
    Allow: /

    is better.

    By the way, paths in robots.txt are case sensitive.

    Check mine: http://referer.us/robots.txt
     
    DoDo Me, Jul 12, 2010 IP
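
    Whether the order matters depends on how a crawler resolves a conflict between Allow and Disallow. Google documents that the most specific (longest) matching rule wins, so both orderings should behave the same there. Some older parsers read the rules top to bottom and stop at the first match, and for those a leading "Allow: /" hides every Disallow after it. A minimal Python sketch of the two strategies, assuming plain prefix matching with no wildcards; the names first_match and longest_match are made up for this illustration:

```python
# Each rule is (is_allowed, path_prefix), in the order it appears in robots.txt.
ALLOW_FIRST = [(True, "/"), (False, "/page.html"), (False, "/page2.html")]
DISALLOW_FIRST = [(False, "/page.html"), (False, "/page2.html"), (True, "/")]


def first_match(rules, path):
    """The first rule whose prefix matches decides (top-to-bottom parsers)."""
    for allowed, prefix in rules:
        if path.startswith(prefix):
            return allowed
    return True  # no rule matched: crawling is allowed by default


def longest_match(rules, path):
    """The longest matching prefix decides; on a tie, Allow wins."""
    matches = [(len(prefix), allowed) for allowed, prefix in rules
               if path.startswith(prefix)]
    if not matches:
        return True  # no rule matched: crawling is allowed by default
    # Tuples compare by length first, then True > False breaks ties for Allow.
    return max(matches)[1]


if __name__ == "__main__":
    for name, rules in [("Allow first", ALLOW_FIRST),
                        ("Disallow first", DISALLOW_FIRST)]:
        print(name,
              "| first-match allows /page.html:", first_match(rules, "/page.html"),
              "| longest-match allows /page.html:", longest_match(rules, "/page.html"))
```

    Under first-match, only the Disallow-first ordering blocks /page.html; under longest-match, both orderings block it. Putting the Disallow lines first is therefore the safer choice when the crawler's behaviour is unknown.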
  3. Coldfaun

    Coldfaun Active Member

    #3
    Do I have to add a trailing slash after .html, as in Disallow: /page2.html/?
     
    Coldfaun, Jul 12, 2010 IP
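
    On the trailing slash: robots.txt rules are path prefixes, so "Disallow: /page2.html/" would only match URLs that start with "/page2.html/", not the page itself. A quick check with Python's standard urllib.robotparser, offered as a sketch; the helper name is_blocked is made up here, and real crawlers may differ in details such as wildcard support:

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_lines, path, agent="*"):
    """Return True if the given robots.txt lines disallow `path` for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse from memory, no network access
    return not parser.can_fetch(agent, path)


without_slash = ["User-agent: *", "Disallow: /page2.html"]
with_slash = ["User-agent: *", "Disallow: /page2.html/"]

# The rule without the slash is a prefix of the page's path.
print("no slash blocks /page2.html:", is_blocked(without_slash, "/page2.html"))
# The rule with the slash is longer than the page's path, so it cannot match.
print("slash blocks /page2.html:", is_blocked(with_slash, "/page2.html"))
```

    So for a single file the rule should be written without the trailing slash.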
  4. maltadude

    maltadude Peon

    #4
    The Disallow directive in the robots.txt file is not the best way of removing pages from Google's index. You would simply be instructing Google not to crawl those pages again, not to remove them. They might be de-indexed over a long period of time, but it's definitely not the best solution.

    I would simply add <meta name="robots" content="noindex, nofollow"> in the <head> section of that page, unless you want to manually remove them using Google Webmaster Tools. ONLY once they are out of the index, I would add the disallow command in the robots.txt.

    Hope this helps :)
     
    maltadude, Jul 12, 2010 IP
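
    The reason for that order is that a crawler which obeys Disallow never downloads the page, so it never gets to read a noindex tag on it. A small sketch of a sanity check for this, assuming the pages to deindex are known by path; the function name hidden_noindex_pages is invented for the example, and the robots.txt lines are sample data:

```python
from urllib.robotparser import RobotFileParser


def hidden_noindex_pages(robots_lines, noindex_paths, agent="*"):
    """Return the noindex pages that robots.txt stops the crawler from reading."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse from memory, no network access
    return [p for p in noindex_paths if not parser.can_fetch(agent, p)]


# Sample data: only page.html has been disallowed so far.
robots_lines = ["User-agent: *", "Disallow: /page.html"]
# Pages that carry <meta name="robots" content="noindex, nofollow">.
noindex_paths = ["/page.html", "/page2.html"]

for path in hidden_noindex_pages(robots_lines, noindex_paths):
    print(f"{path}: blocked by robots.txt, so its noindex tag cannot be seen")
```

    Any page this reports would need its Disallow line lifted until the page has actually dropped out of the index.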
  5. AirForce1

    AirForce1 Peon

    #5
    Personally, I would use <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"> inside the <head> section of those 2 pages.

    Have a nice day,
     
    AirForce1, Jul 12, 2010 IP
  6. snow

    snow Member

    #6
    Basically, I want to know what robots.txt is.
     
    snow, Jul 12, 2010 IP
  7. makeit easy

    makeit easy Active Member

    #7
    <META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW"> is better.


    As for the robots.txt:

    This is not necessary:
    User-agent: *
    Allow: /

    The correct syntax for your robots.txt:

    User-agent: *
    Disallow: /page.html
    Disallow: /page2.html
     
    Last edited: Jul 12, 2010
    makeit easy, Jul 12, 2010 IP
  8. ledb2b

    ledb2b Guest

    #8
    If you want pages removed from the Google index, you can try Google Webmaster Tools.
     
    ledb2b, Jul 12, 2010 IP
  9. Groovystar

    Groovystar Peon

    #9
    The best way is to change the name of the page. Upload an identical copy of the page you want to deindex under a new name, set robots.txt to disallow that page, then update the links to it elsewhere on your site. Then delete the original.
     
    Groovystar, Jul 12, 2010 IP
  10. Coldfaun

    Coldfaun Active Member

    #10
    Is there any difference if I use NoIndex or NOINDEX?
    I know it is not case sensitive, but I'm not sure whether mixed case like NoInDeX works, or only all-uppercase NOINDEX and all-lowercase noindex.
     
    Coldfaun, Jul 13, 2010 IP
  11. makeit easy

    makeit easy Active Member

    #11
    It matters for HTML versus XHTML validation, since XHTML requires lowercase tag and attribute names. I don't know whether some search engines are case sensitive. I always write all of my code in lowercase; I think lowercase should be accepted by all search engines.
     
    makeit easy, Jul 13, 2010 IP
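
    For what it's worth, Google's documentation describes both the name and content values of the robots meta tag as case-insensitive, so NOINDEX, noindex and NoInDeX should be treated alike there; that statement does not cover other engines. A sketch of how a crawler might normalise the tag, using Python's html.parser; the class name RobotsMetaParser and the helper robots_directives are made up for illustration:

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> tags, ignoring case."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names;
        # attribute values keep their original case, so lowercase them here.
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            for part in (attr.get("content") or "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the set of lowercased robots directives found in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


print(robots_directives('<META NAME="ROBOTS" CONTENT="NoInDeX, NOFOLLOW">'))
```

    Writing the tag in lowercase, as suggested above, remains the safest choice because it does not depend on any engine doing this kind of normalisation.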
  12. rentacampervan

    rentacampervan Greenhorn

    #12
    You can put this meta tag in each of the two pages, page.html and page2.html: <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">. Or you can use robots.txt and write:
    # /robots.txt for your website name

    User-agent: *
    Allow: /
    Disallow: /page.html
    Disallow: /page2.html

    Sitemap: your website name/sitemap.xml

    Good luck.
     
    rentacampervan, Jul 13, 2010 IP
  13. COLO

    COLO Peon

    #13
    Changing the robots.txt file affects crawlers that hit your site and navigate and index its content.
    But if a crawler arrives from an external referrer (a deep link from an off-domain page to your page.html or page2.html), the page can still get indexed.

    You'll want to also add the <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"> to the page source on those pages...
     
    COLO, Jul 14, 2010 IP
  14. kooner001

    kooner001 Peon

    #14
    Go to Google Webmaster Tools and remove and deindex them, or simply delete them. You could also make a small change to the slug and apply the noindex tag mentioned above; that will help you a lot. If you don't want them indexed in Google, that means you don't want people to see them, so deleting them will not create any problem.
     
    kooner001, Jul 17, 2010 IP
  15. Groovystar

    Groovystar Peon

    #15
    I thought robots.txt used "disallow" not "noindex"?
     
    Groovystar, Jul 17, 2010 IP