Robots.txt

Discussion in 'Search Engine Optimization' started by aljosabre, Mar 29, 2009.

  1. #1
    I was wondering: what would a 'normal' robots.txt look like?
    Like this?
    User-Agent: *
    Allow: /
    Sitemap: http://****.net/sitemap.xml.gz
    Or like this (which is what I have right now)?
    User-agent: *
    Disallow:
    Sitemap: http://****.net/sitemap.xml.gz
    Could this be the reason my site isn't 'backlinked', i.e. why it doesn't show any backlinks (although I have them) in a Google link: search?
     
    aljosabre, Mar 29, 2009
  2. Camay123

    #2
    The second one is the most common.
     
    Camay123, Mar 29, 2009
  3. MrPJH

    #3
    Sorry, please accept my apologies, but I can't stop myself from asking a question:

    What is robots.txt used for, and what will happen if I save a robots.txt file on my website containing the code above?
     
    MrPJH, Mar 29, 2009
  4. Canonical

    #4
    Friendly spiders such as Googlebot and Slurp (from Yahoo!) read your robots.txt file before crawling your site to determine which files/folders you do NOT want crawled. Unfortunately, bad bots can ignore your robots.txt file and crawl anything they like.

    By default they consider your entire site crawlable unless you tell them otherwise in your robots.txt.
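    For illustration, here is a minimal sketch of the check a well-behaved crawler performs, using Python's standard `urllib.robotparser`. The rules, the example.net URLs and the `polite_queue` helper are hypothetical. Any URL not matched by a Disallow rule falls back to allowed, and the check runs on the crawler's side, so nothing forces a bad bot to perform it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: only two folders are named; everything else is
# left to the default, which is "allowed".
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
"""

def polite_queue(robots_lines, agent, urls):
    """Return only the URLs that robots.txt permits `agent` to crawl.

    The server does not enforce any of this: the crawler chooses to
    run the check, which is why a bad bot can simply skip it.
    """
    rp = RobotFileParser()
    rp.parse(robots_lines)  # parse rules from already-fetched text
    return [u for u in urls if rp.can_fetch(agent, u)]

queue = [
    "http://example.net/index.html",
    "http://example.net/cgi-bin/search",
    "http://example.net/tmp/cache.html",
]
print(polite_queue(ROBOTS_TXT.splitlines(), "Googlebot", queue))
# ['http://example.net/index.html']
```

    With an empty robots.txt (no rules at all), the same helper keeps every URL in the queue.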

    All Disallow paths are relative to the root of your site. You cannot disallow subdomains or particular protocols via robots.txt, only files or folders relative to the root; each subdomain and protocol is covered by its own robots.txt file.
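    To make the "root of your web" point concrete, here is a small sketch of where a crawler would look for the rules that govern a given page. The helper name `robots_url_for` and the example.net URLs are hypothetical; it simply keeps the scheme and host (including any port) and replaces the path with /robots.txt, which is why http://, https:// and a subdomain each get their own file.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL whose rules apply to page_url.

    Only the scheme and host (with any port) are kept; the path is
    replaced by /robots.txt at the root.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

for url in [
    "http://example.net/blog/post.html",
    "https://example.net/blog/post.html",
    "http://forum.example.net/thread/1",
]:
    print(url, "->", robots_url_for(url))
# http://example.net/blog/post.html -> http://example.net/robots.txt
# https://example.net/blog/post.html -> https://example.net/robots.txt
# http://forum.example.net/thread/1 -> http://forum.example.net/robots.txt
```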
     
    Canonical, Mar 29, 2009
  5. woz2

    #5
    There's a good explanation of robots.txt at:

    http://www.google.com/support/webmasters/bin/answer.py?answer=40360&hl=en

    Basically, robots.txt is a request from the webmaster asking robots (such as Googlebot) not to crawl certain files or folders on the site.
     
    woz2, Mar 29, 2009
  6. jitendraag

    #6
    Allow is not part of the original robots.txt standard. Use Disallow with an empty value instead.
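    As a quick check, here is a sketch that feeds both variants from the first post to Python's `urllib.robotparser` (`site_maps()` needs Python 3.8+). The thread's domain is elided, so example.net is only a placeholder. This particular parser understands both forms and treats them the same; a crawler that only implements the original standard would ignore the Allow line, which in this case still leaves the whole site crawlable.

```python
from urllib.robotparser import RobotFileParser

# The two variants from the first post; example.net is a placeholder
# for the domain that was elided in the thread.
ALLOW_VARIANT = """\
User-Agent: *
Allow: /
Sitemap: http://example.net/sitemap.xml.gz
"""

DISALLOW_VARIANT = """\
User-agent: *
Disallow:
Sitemap: http://example.net/sitemap.xml.gz
"""

def load(text):
    """Build a parser from robots.txt text without any network access."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp

for name, text in [("Allow: /", ALLOW_VARIANT), ("Disallow:", DISALLOW_VARIANT)]:
    rp = load(text)
    # Both should allow every path and report the same sitemap URL.
    print(name, rp.can_fetch("Googlebot", "http://example.net/any/page.html"), rp.site_maps())
# Allow: / True ['http://example.net/sitemap.xml.gz']
# Disallow: True ['http://example.net/sitemap.xml.gz']
```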
     
    jitendraag, Mar 29, 2009
  7. mp3jammer.com

    #7
    So robots.txt is the same as a sitemap, right?
     
    mp3jammer.com, Mar 29, 2009
  8. kutekutta

    #8
    No, they are different.

    A sitemap lists the pages on your site so that bots can easily find and crawl them.

    If you want to keep bots away from some sensitive pages, you can use robots.txt.
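    To show the difference, here is a sketch of what a minimal sitemap contains, following the sitemaps.org protocol (a `<urlset>` of `<url>` entries, each with a `<loc>`). The `build_sitemap` helper and the example.net URLs are hypothetical, and the sketch writes plain XML, whereas the sitemap in the first post is gzip-compressed (sitemap.xml.gz). A sitemap only lists pages for discovery; blocking pages is robots.txt's job, and the `Sitemap:` line in robots.txt is just a pointer to this file.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(page_urls):
    """Return a minimal sitemap document with one <loc> per page URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in page_urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page  # escaped (e.g. & -> &amp;) on output
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

print(build_sitemap(["http://example.net/", "http://example.net/about.html"]))
```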
     
    kutekutta, Mar 29, 2009