Help needed - robots.txt and sitemap

Discussion in 'robots.txt' started by jack_sparrow, Aug 3, 2007.

  1. #1
    Hi,

    Can someone advise me on a robots.txt file? My site has both an XML and an HTML sitemap, which work fine with Google, Yahoo and MSN. I do not have a robots.txt file, but some search engines repeatedly request it.

    I need help writing a simple robots.txt file that directs all robots to the XML or HTML sitemap.

    Thanks in advance.

    Jack
     
    jack_sparrow, Aug 3, 2007 IP
  2. trichnosis

    trichnosis Prominent Member

    Messages:
    13,785
    Likes Received:
    333
    Best Answers:
    0
    Trophy Points:
    300
    #2
    Please visit robotstxt.org to learn more about robots.txt.
     
    trichnosis, Aug 6, 2007 IP
  3. adone

    adone Peon

    Messages:
    190
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #3
    Hi Jack,

    I think you need to create a robots.txt file for your site. Every search engine looks for robots.txt first among your files.

    Both the XML and the HTML sitemap are important, but from a search engine's point of view you should have the XML sitemap, because search engines crawl it easily and will get your new pages indexed from it.

    Bye
     
    adone, Oct 15, 2007 IP
  4. Ladadadada

    Ladadadada Peon

    Messages:
    382
    Likes Received:
    36
    Best Answers:
    0
    Trophy Points:
    0
    #4
    There are now two purposes for a robots.txt file. The first (and main) one is to tell robots which parts of your site they should NOT view.

    The second purpose is a more recent addition to the robots.txt standard and is to let robots know where your sitemap file is. If the robots are finding your sitemap file already, then there isn't much need to add its location to your robots.txt file, but it won't hurt.
     
    Ladadadada, Oct 18, 2007 IP
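
The two purposes described in post #4 can be sketched with Python's standard-library robots.txt parser. This is only an illustration under stated assumptions, not something posted in the thread: the example.com domain, the /private/ path and the ExampleBot user agent are placeholders, and `site_maps()` needs Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one exclusion rule plus a Sitemap line.
# The domain and paths are placeholders, not taken from the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Sitemap: http://www.example.com/sitemap.xml
"""


def load_rules(text):
    """Parse robots.txt text in memory, without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Purpose 1: tell robots which parts of the site they should not fetch.
private_ok = rules.can_fetch("ExampleBot", "http://www.example.com/private/report.html")
public_ok = rules.can_fetch("ExampleBot", "http://www.example.com/index.html")

# Purpose 2: tell robots where the sitemap file lives.
sitemaps = rules.site_maps()

print("private page allowed:", private_ok)
print("public page allowed:", public_ok)
print("sitemaps:", sitemaps)
```

For Jack's case, a `User-agent: *` record with an empty `Disallow:` plus a `Sitemap:` line pointing at the XML file would probably be enough. The Sitemap line takes a full URL, not a relative path.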
  5. visionfez

    visionfez Peon

    Messages:
    84
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #5
    visionfez, Oct 21, 2007 IP
  6. Mr_Kumar

    Mr_Kumar Notable Member

    Messages:
    2,561
    Likes Received:
    374
    Best Answers:
    1
    Trophy Points:
    265
    Articles:
    4
    #6
    Forget the robots.txt file. It is not important.

    Learn more about sitemaps, especially if your site has thousands of pages. Make more than one sitemap if needed. :)

    I guess I am a bit late to reply here.
     
    Mr_Kumar, Nov 13, 2007 IP
  7. Kuldeep1952

    Kuldeep1952 Active Member

    Messages:
    290
    Likes Received:
    18
    Best Answers:
    0
    Trophy Points:
    60
    #7
    It is always a good practice to have a robots.txt file. If you have nothing
    to enter in it, you can create a blank file. It will prevent the redundant
    404 errors. Another file which you should have on the server to reduce
    404 errors is favicon.ico.
     
    Kuldeep1952, Nov 15, 2007 IP
  8. reza_24

    reza_24 Member

    Messages:
    98
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    41
    #8
    reza_24, Nov 15, 2007 IP
  9. Ibrahim Al Mohanna

    Ibrahim Al Mohanna Peon

    Messages:
    101
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    0
    #9
    I did not understand anything. Could you explain what I should type in it?
     
    Ibrahim Al Mohanna, Nov 16, 2007 IP
  10. Michael2007

    Michael2007 Guest

    Messages:
    15
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #10
    If you need every page to be indexed, you can put the following in the robots.txt file:
    User-agent: *
    Disallow:
     
    Michael2007, Nov 26, 2007 IP
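
As a rough check of the allow-all file above, the sketch below compares three variants using Python's standard-library parser: an empty file (as suggested in post #7), the empty `Disallow:` record, and `Disallow: /`, which differs by a single character and blocks everything. The URL and the ExampleBot name are placeholders. Real crawlers may behave differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, url, agent="ExampleBot"):
    """Return True if Python's parser would let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Placeholder URL, used only for illustration.
PAGE = "http://www.example.com/any/page.html"

EMPTY_FILE = ""                              # blank robots.txt
ALLOW_ALL = "User-agent: *\nDisallow:\n"     # empty Disallow: nothing is excluded
BLOCK_ALL = "User-agent: *\nDisallow: /\n"   # a lone slash excludes the whole site

results = {
    "empty file": can_crawl(EMPTY_FILE, PAGE),
    "Disallow: (empty)": can_crawl(ALLOW_ALL, PAGE),
    "Disallow: /": can_crawl(BLOCK_ALL, PAGE),
}

for name, allowed in results.items():
    print(f"{name:20} -> crawl allowed: {allowed}")
```

The first two variants should both permit crawling, so the explicit record mainly documents intent and avoids the 404. The third is the one to watch out for when editing by hand.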
  11. janwei

    janwei Banned

    Messages:
    161
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #11
    He's right. robotstxt.org is really good.
     
    janwei, Dec 7, 2007 IP
  12. prlinker

    prlinker Peon

    Messages:
    18
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #12
    robots.txt has no link with the sitemap file.
    Your sitemap should be sitemap.xml for Google.

    For Yahoo it's a text file.
     
    prlinker, Dec 24, 2007 IP
  13. pssolanki86

    pssolanki86 Well-Known Member

    Messages:
    905
    Likes Received:
    11
    Best Answers:
    0
    Trophy Points:
    135
    #13
    Create a simple robots.txt file and a sitemap on your website.

    If you want help, I can do it for you.
     
    pssolanki86, Dec 26, 2007 IP
  14. agrawat

    agrawat Banned

    Messages:
    491
    Likes Received:
    7
    Best Answers:
    0
    Trophy Points:
    0
    #14
    I think sitemap.xml is more acceptable to, and preferred by, most search engines. robots.txt mainly protects your site from bad bots that consume your bandwidth, but if bandwidth is not an issue for your website then you do not need robots.txt.
     
    agrawat, Dec 26, 2007 IP
  15. shimon333

    shimon333 Guest

    Messages:
    53
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #15
    In robots.txt you tell the search engine not to go to parts of your site, but we want Google to see all of our site, so I don't put robots.txt anywhere.
     
    shimon333, Jan 7, 2008 IP
  16. SwapsRulez

    SwapsRulez Peon

    Messages:
    32
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #16
    Just create the robots.txt file in the root directory of your web space and put the following code in that text file to allow all robots to crawl your site:

    User-agent: *
    Disallow:
     
    SwapsRulez, Jan 12, 2008 IP
  17. chriszz

    chriszz Peon

    Messages:
    233
    Likes Received:
    7
    Best Answers:
    0
    Trophy Points:
    0
    #17
    I'm not sure what robots.txt is used for.
     
    chriszz, Jan 19, 2008 IP
  18. thetafferboy83

    thetafferboy83 Active Member

    Messages:
    312
    Likes Received:
    72
    Best Answers:
    0
    Trophy Points:
    70
    #18
    It is used to exclude pages from bots, such as search engine crawlers, for instance if you want a specific page not to be shown in the search engines.

    You can normally get answers to simple questions like this by Googling.
     
    thetafferboy83, Jan 22, 2008 IP
  19. catanich

    catanich Peon

    Messages:
    1,921
    Likes Received:
    40
    Best Answers:
    0
    Trophy Points:
    0
    #19
    Jack, you do not need a robots.txt file. We use it to tell the SEs NOT to index a directory or file. It is also used to tell some SEs where to find the Site Map file.

    This is mine:

    # Robots.txt file created by 1/20/08
    # For domain: http://www.catanich.com
    #
    # All other robots will spider the domain
    User-agent: *
    Disallow: /_common/
    Disallow: /_private/
    Disallow: /_ScriptLibrary/
    Disallow: /_*/
    Sitemap: http://www.catanich.com/sitemap.xml.gz

    It also should be noted that a blank line in the robots.txt file will create an error.
     
    catanich, Feb 2, 2008 IP
  20. Ladadadada

    Ladadadada Peon

    Messages:
    382
    Likes Received:
    36
    Best Answers:
    0
    Trophy Points:
    0
    #20
    Does it? I have never heard anything about a blank line causing an error, but if it does, it certainly could explain some of the strange behaviour that some crawlers exhibit.

    Presumably, when it causes an error the crawler will ignore the rest of the file below the blank line. I guess some crawlers may even throw the whole file out if they get an error.
     
    Ladadadada, Feb 10, 2008 IP
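
The blank-line question from posts #19 and #20 can at least be probed against one concrete parser. In the original robots.txt format, blank lines separate records, so a blank line inside a record may cut it short. The sketch below feeds three variants to Python's standard-library parser. It says nothing about how Google, Yahoo or MSN treat the same files. The directory names are borrowed from post #19, and the example.com domain and ExampleBot agent are placeholders.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, path, agent="ExampleBot"):
    """Return True if Python's parser would let `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, "http://www.example.com" + path)


# Three variants of the same record; only the blank line moves.
VARIANTS = {
    "no blank line": (
        "User-agent: *\n"
        "Disallow: /_common/\n"
        "Disallow: /_private/\n"
    ),
    "blank after User-agent": (
        "User-agent: *\n"
        "\n"
        "Disallow: /_common/\n"
        "Disallow: /_private/\n"
    ),
    "blank between rules": (
        "User-agent: *\n"
        "Disallow: /_common/\n"
        "\n"
        "Disallow: /_private/\n"
    ),
}

# For each variant, record whether each directory is still crawlable.
report = {
    name: (is_allowed(text, "/_common/a.html"), is_allowed(text, "/_private/b.html"))
    for name, text in VARIANTS.items()
}

for name, (common_ok, private_ok) in report.items():
    print(f"{name:24} /_common/ allowed: {common_ok}  /_private/ allowed: {private_ok}")
```

With this parser the blank line raises no error. It simply ends the record, so any Disallow lines below it are silently dropped. That is close to the guess in post #20 that everything after the blank line gets ignored.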