Robots.txt file

Discussion in 'robots.txt' started by iluvm, Nov 30, 2005.

  1. #1
    Hello,
    Is a robots.txt file important, or is it an outdated idea that search engines may ignore?
    Would it increase site visibility to the search engines, or would it make no difference?
     
    iluvm, Nov 30, 2005 IP
  2. mdvaldosta

    mdvaldosta Peon

    Messages:
    4,079
    Likes Received:
    362
    Best Answers:
    0
    Trophy Points:
    0
    #2
    You should have one. Some bots won't crawl without a robots.txt, and it's worth having anyway because stats programs (like AWStats) identify some search bots by their hits on the robots file. Just upload a blank one; it's good practice.
     
    mdvaldosta, Nov 30, 2005 IP
  3. RobZZ

    RobZZ Peon

    Messages:
    2
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #3
    Let me tell you that I can verify that the robots.txt file is extremely important.

    Basically, it's the first thing that most search engine bots are going to look for. If they don't find it, then they will leave and come back later to crawl the site... maybe.

    I recently made the mistake of failing to upload the robots.txt file to one of my larger sites when I did a site redesign. Subsequently, our web hosting company had an issue in the configuration of the web server, where unknown pages were not returning the 404 status header.

    So... the bot looks for robots.txt and it's not there. It ONLY assumes that it can continue crawling if it gets the 404 status header; if not, it will come back later. In my case, the robot kept coming back to look for it, never found it, never got a 404 either, and never crawled my site.

    I first dropped from MSN and had no idea why. Then I dropped from Google and it wasn't until I started looking at this more carefully that I discovered and fixed the 404 issue. I noticed that Googlebot and MSNbot had not been crawling the site. After the issues were resolved, I saw both bots in my logs and within a few days, I was re-indexed.
     
    RobZZ, Nov 30, 2005 IP
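
The behaviour described in post #3 can be sketched as a small decision function. This is a simplified model of that description only, not how any particular search engine is documented to work: the names `CrawlDecision` and `decide_from_robots_status` are made up for illustration, and real crawlers differ in how they treat status codes other than 200 and 404.

```python
from enum import Enum


class CrawlDecision(Enum):
    FOLLOW_RULES = "robots.txt found, obey its rules"
    CRAWL_EVERYTHING = "no robots.txt, nothing is restricted"
    RETRY_LATER = "unclear answer, come back later"


def decide_from_robots_status(status_code: int) -> CrawlDecision:
    """Hypothetical helper: map the HTTP status of GET /robots.txt to a decision."""
    if status_code == 200:
        # The file exists, so the bot reads it and follows the rules.
        return CrawlDecision.FOLLOW_RULES
    if status_code == 404:
        # A clean "not found" tells the bot there are no restrictions.
        return CrawlDecision.CRAWL_EVERYTHING
    # Assumption: any other answer (for example a server error) is treated
    # as "not sure", so the bot leaves without crawling and tries again later.
    return CrawlDecision.RETRY_LATER


for status in (200, 404, 500):
    print(status, decide_from_robots_status(status).name)
```

Under this model, a server that never answers a missing robots.txt with a proper 404 keeps the bot stuck in the "come back later" branch, which matches the symptoms in the post.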
  4. ServerUnion

    ServerUnion Peon

    Messages:
    3,611
    Likes Received:
    296
    Best Answers:
    0
    Trophy Points:
    0
    #4
    I agree with the above. If you don't want to limit access, at least upload a blank one.

    Also, without the blank file you will see a lot of 404 errors in the logs when the bots are looking for your robots.txt file.

    Good luck.
     
    ServerUnion, Nov 30, 2005 IP
  5. iluvm

    iluvm Peon

    Messages:
    165
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    0
    #5
    Thanks for the info.
    I've had 766 404 hits this month alone!
    So if there was nothing I did not want indexed, I would just place the following code in the robots file?
    Is there anything else you put in there, e.g. to block email collection bots?
     
    iluvm, Nov 30, 2005 IP
  6. ServerUnion

    ServerUnion Peon

    Messages:
    3,611
    Likes Received:
    296
    Best Answers:
    0
    Trophy Points:
    0
    #6
    Your double negatives make it hard to answer your question, but...

    If you want to give full access, place an empty file.
     
    ServerUnion, Nov 30, 2005 IP
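
One way to check the "empty file means full access" advice locally is Python's standard `urllib.robotparser`. This is only a sketch of how one parser reads the file, not a statement about every crawler; the `parse_robots` helper and the example.com URL are placeholders invented for the example. It compares an empty file with the explicit allow-everything form (`User-agent: *` followed by an empty `Disallow:`).

```python
from urllib.robotparser import RobotFileParser


def parse_robots(text: str) -> RobotFileParser:
    """Hypothetical helper: parse robots.txt content from a string (no network)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# An empty robots.txt, as suggested in the thread.
empty_file = parse_robots("")

# The explicit way of saying "everyone may crawl everything".
explicit_allow_all = parse_robots("User-agent: *\nDisallow:\n")

url = "http://www.example.com/any/page.html"  # placeholder URL
empty_allows = empty_file.can_fetch("Googlebot", url)
explicit_allows = explicit_allow_all.can_fetch("Googlebot", url)

print("empty file allows crawl:   ", empty_allows)
print("explicit rule allows crawl:", explicit_allows)
```

With this parser, both variants should report the page as crawlable, so an empty file and the two-line version are equivalent ways to give full access.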
  7. malekov

    malekov Peon

    Messages:
    54
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    0
    #7
    Like people say, it's better to have a well-configured one, but that doesn't mean robots won't index your site without that file.
     
    malekov, Nov 30, 2005 IP
  8. wrmineo

    wrmineo Peon

    Messages:
    3,087
    Likes Received:
    379
    Best Answers:
    0
    Trophy Points:
    0
    #8
    Great points on the 404 issue!

    Also, if you have folders that you use for your purposes only, or a staging ground before you launch and then move up to the root directory, you're likely to get even more 404 errors. I recently experienced this firsthand.

    Besides excluding folders you don't want accessed, you can also exclude some known "spam" bots or those not important to you.
     
    wrmineo, Nov 30, 2005 IP
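
As a rough illustration of post #8, the sketch below puts both ideas in one robots.txt: a private folder hidden from everyone, and one unwanted bot shut out completely. The folder `/staging/` and the bot name `BadBot` are invented for the example, and the rules are checked with Python's standard `urllib.robotparser`. Keep in mind that only bots that choose to obey robots.txt are affected.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "BadBot" and "/staging/" are made-up names.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /staging/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

site = "http://www.example.com"  # placeholder domain

# A normal bot may read public pages but not the staging folder.
googlebot_home = parser.can_fetch("Googlebot", site + "/index.html")
googlebot_staging = parser.can_fetch("Googlebot", site + "/staging/draft.html")

# The unwanted bot is refused everywhere.
badbot_home = parser.can_fetch("BadBot", site + "/index.html")

print("Googlebot /index.html:         ", googlebot_home)
print("Googlebot /staging/draft.html: ", googlebot_staging)
print("BadBot    /index.html:         ", badbot_home)
```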
  9. tb1234

    tb1234 Peon

    Messages:
    49
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #9
    Is anyone else having a problem with bloody ghost bots crawling the site and eating up bandwidth? I've got some entries in my log like:

    Unknown robot (identified by 'spider'): 4070 Pages Crawled - 106 MB Crawled

    Unknown robot (identified by 'crawl'): 1806 Pages Crawled - 50 MB Crawled

    Does anyone have any idea about that?
     
    tb1234, Jun 7, 2006 IP
  10. onlywin

    onlywin Greenhorn

    Messages:
    97
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    18
    #10
    The simplest solution is an empty robots.txt.
     
    onlywin, May 21, 2009 IP
  11. jokarl

    jokarl Peon

    Messages:
    52
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #11
    A robots.txt is good if you want to limit the bots and what they index. I always disallow my images so the bots don't waste time indexing all of them. Another good use for a robots file is to specify where your sitemap is located.

    If you don't want to limit the bots, you don't really need a robots.txt.
     
    jokarl, May 23, 2009 IP
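
The two uses from post #11 (keeping bots out of an images folder and pointing them at a sitemap) might look like the robots.txt below. The `/images/` path and the sitemap URL are assumptions made up for the example. Reading the sitemap entries back uses `site_maps()` from Python's standard `urllib.robotparser`, which needs Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the folder name and sitemap URL are placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/

Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Images are off limits, ordinary pages are not.
images_allowed = parser.can_fetch("Googlebot", "http://www.example.com/images/logo.png")
pages_allowed = parser.can_fetch("Googlebot", "http://www.example.com/about.html")

# Sitemap lines listed in the file (requires Python 3.8+).
sitemaps = parser.site_maps()

print("images crawlable:", images_allowed)
print("pages crawlable: ", pages_allowed)
print("sitemaps:        ", sitemaps)
```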
  12. compwizards

    compwizards Peon

    Messages:
    12
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #12
    You should still have a robots.txt file. It helps the bots spider your site without having to work harder. Some bots will also just leave your site if they do not see a robots.txt file. That's the first thing they look for.
     
    compwizards, May 24, 2009 IP
  13. luizeba

    luizeba Active Member

    Messages:
    265
    Likes Received:
    2
    Best Answers:
    0
    Trophy Points:
    53
    #13
    Sure, you need to have a robots.txt file!
    It'll really help the spiders index your site!
     
    luizeba, May 24, 2009 IP
  14. linkdealer

    linkdealer Active Member

    Messages:
    138
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    90
    #14
    The robots.txt file is the first file that a crawler visits. It gives crawlers an idea of which pages should be indexed and which should be ignored.
     
    linkdealer, Jun 8, 2009 IP
  15. S.K.L.Technologies

    S.K.L.Technologies Banned

    Messages:
    56
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #15
    Can anyone tell me the importance of the robots.txt file?
     
    S.K.L.Technologies, Jun 14, 2009 IP
  16. elladrone

    elladrone Peon

    Messages:
    116
    Likes Received:
    2
    Best Answers:
    0
    Trophy Points:
    0
    #16
    The jury is still out. My take, based on some experiments I did: it tells search engines (mostly Google) what YOU want or don't want included in their index, but it does not really block them from accessing the folders. Use .htaccess if you want to protect folders, and mask your plugins in WP.
     
    elladrone, Dec 28, 2009 IP
  17. rndm

    rndm Greenhorn

    Messages:
    51
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    16
    #17
    Interesting tidbit... take a look at WebmasterWorld's robots.txt file. It's pretty funny.
     
    rndm, Aug 21, 2011 IP
  18. risteard

    risteard Peon

    Messages:
    9
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #18
    It is not an outdated factor for SEO purposes.
    It is still quite an interesting way to tell search engine robots which files to crawl and which not to.
     
    risteard, Aug 24, 2011 IP
  19. smartsolution

    smartsolution Peon

    Messages:
    11
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #19
    I think it is essential.
     
    smartsolution, Aug 27, 2011 IP
  20. mrhistorian

    mrhistorian Peon

    Messages:
    15
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #20
    If you are asking about Google only: yes, it's important, but not crucial.
     
    mrhistorian, Sep 12, 2011 IP