Wow is ROBOTS.txt important?

Discussion in 'robots.txt' started by skionxb, Jul 26, 2006.

  1. #1
    I am curious whether it's necessary to have a robots.txt in your root directory. Let's say I don't want to disallow anything; I don't have any hidden places :)

    Currently I have this in my robots.txt:
    User-agent: *
    Basically I am telling the crawlers to crawl my whole site. Isn't that considered a waste of time? Does it help to have that in the root directory? One of my colleagues says it's not important at all, because there are lots of chances to make an error, and then you could lose all your indexed pages from the SERPs. In my opinion, when SE spiders come to any site, the first place they go is robots.txt.

    Another thing: we all know that spiders in the future will be able to crawl external CSS and JavaScript files. Would it be correct to add these lines to prohibit crawling of those files?

    Disallow: /scripts.js
    Disallow: /styles.css

    So is having User-agent: * in your robots.txt or <meta name="robots" content="index,follow"> just a waste of time?

    What do you think?
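
One way to check what these rules actually do is Python's standard-library urllib.robotparser. The snippet below is only a sketch: the bot name "ExampleBot" and the path /index.html are made up for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt):
    """Parse robots.txt text from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# The file exactly as quoted in the post: a user-agent line with no rules.
USER_AGENT_ONLY = "User-agent: *\n"

# The conventional "allow everything" form: an empty Disallow value.
ALLOW_ALL = "User-agent: *\nDisallow:\n"

# The two Disallow lines proposed in the post.
BLOCK_ASSETS = "User-agent: *\nDisallow: /scripts.js\nDisallow: /styles.css\n"

user_agent_only = build_parser(USER_AGENT_ONLY)
allow_all = build_parser(ALLOW_ALL)
block_assets = build_parser(BLOCK_ASSETS)

# "ExampleBot" is a hypothetical crawler name.
print(user_agent_only.can_fetch("ExampleBot", "/index.html"))
print(allow_all.can_fetch("ExampleBot", "/index.html"))
print(block_assets.can_fetch("ExampleBot", "/scripts.js"))
print(block_assets.can_fetch("ExampleBot", "/index.html"))
```

With this parser, both allow-everything variants permit every path, so they behave the same as having no rules at all; only the Disallow lines change anything.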
     
    skionxb, Jul 26, 2006 IP
  2. atiqi36

    #2

    I agree with you. I don't even have a robots.txt file on any of my websites, but they do well in the SERPs and are indexed well.
     
    atiqi36, Jul 26, 2006 IP
  3. explorer

    #3
    You're absolutely right, sites can do very well in the SERPs without a robots.txt.

    Having a robots.txt - even a very basic one - does stop your error logs being filled with messages like this:

    [error] [client 72.30.252.152] File does not exist: /home2/you/public_html/robots.txt

    (This is an error log from Yahoo's Inktomi Bot looking for a robots.txt and not finding it.)
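
To see how often this happens on your own server, you could count these entries per client. This is a rough sketch that assumes every log line has the same shape as the one quoted above; the second IP address and the favicon.ico line are invented sample data.

```python
import re
from collections import Counter

# Matches error-log lines shaped like the one quoted in the post.
MISSING_ROBOTS = re.compile(
    r"\[client (?P<ip>[\d.]+)\] File does not exist: \S*/robots\.txt$"
)


def count_robots_misses(lines):
    """Count 'robots.txt not found' errors per client IP."""
    hits = Counter()
    for line in lines:
        match = MISSING_ROBOTS.search(line.strip())
        if match:
            hits[match.group("ip")] += 1
    return hits


sample_log = [
    "[error] [client 72.30.252.152] File does not exist: /home2/you/public_html/robots.txt",
    "[error] [client 72.30.252.152] File does not exist: /home2/you/public_html/robots.txt",
    # Hypothetical unrelated entry, which should be ignored.
    "[error] [client 10.0.0.7] File does not exist: /home2/you/public_html/favicon.ico",
]

misses = count_robots_misses(sample_log)
print(misses)
```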
     
    explorer, Sep 17, 2006 IP
  4. bnts

    #4
    I don't think robots.txt is absolutely necessary, but it's a nice add-on if you have it. It's really simple to put one up. :)
     
    bnts, Sep 21, 2006 IP
  5. rockinaway

    #5
    It is good for blocking bad bots, though, which can cause problems.

    One question: on a forum, for example, where there is an admin files folder, would you block that from the spiders?
     
    rockinaway, Sep 22, 2006 IP
  6. bnts

    #6

    Well, I think I block the user/bin/ directory. I don't know much about this Unix thing. A guy told me to block that, which is why I did. :(
     
    bnts, Sep 22, 2006 IP
  7. p.mukherjee

    #7
    I think robots.txt is not necessary, but it's a nice add-on if you have it.
     
    p.mukherjee, Sep 24, 2006 IP
  8. TheSyndicate

    #8
    Well, if you have an admin area, then it's important to have one so they do not crawl the admin.
     
    TheSyndicate, Sep 24, 2006 IP
  9. 3l3ctr1c

    #9
    "Well, if you have an admin area, then it's important to have one so they do not crawl the admin."

    Yeah, and if the bad-minded (hackers) peek into the robots.txt file (which most do),
    they can see what you didn't want them to see...

    So password-protecting the directory is better than relying on robots.txt, although using both together can give you better security.
     
    3l3ctr1c, Sep 29, 2006 IP
  10. TheSyndicate

    #10
    Yes, sure, a hacker can see where the admin is, but Google will not search it.
     
    TheSyndicate, Sep 29, 2006 IP
  11. Ibn Juferi

    #11
    Well, I use robots.txt, even a basic one, to stop those annoying log entries from appearing. It seems that the spiders will always look for the robots.txt file first, and if they don't find it, an error ensues. And I have a lot of spiders sniffing at my sites.

    - MENJ
     
    Ibn Juferi, Oct 1, 2006 IP
  12. TheSyndicate

    #12
    And are robots.txt and the robots meta tag (like index,follow) the same thing? Should you do both? :confused:
     
    TheSyndicate, Oct 1, 2006 IP
  13. master06

    #13
    I'm using robots.txt. It's very important for blocking some bad bots.
     
    master06, Oct 3, 2006 IP
  14. Binko

    #14
    A missing robots.txt will not prevent your pages from being indexed. The only thing you will notice with a missing robots.txt is a 404 error in the log files.

    If you have a robots.txt you can however block crawlers from spidering directories marked as "disallow" in your robots.txt file.

    In other words, a robots.txt file is used to "disallow" crawlers from indexing your pages but is not needed to "allow".

    I use robots.txt to prevent the crawlers from indexing private directories and member directories. It can also be done in the HTML page itself with a META robots tag.
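
For the META robots route, the directive lives in each page's head, so a crawler has to fetch and parse the page to find it. Below is a minimal sketch of that lookup using Python's standard html.parser; the sample page and the helper name robots_directives are invented for illustration, and real crawlers may handle malformed markup differently.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list such as "noindex,nofollow".
        for token in attributes.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of robots directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical members-only page.
page = (
    "<html><head>"
    '<meta name="robots" content="noindex, nofollow">'
    "</head><body>Members only</body></html>"
)
print(robots_directives(page))
```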
     
    Binko, Oct 3, 2006 IP
  15. Franck S

    #15
    I'm glad I found this thread. I was wondering if robots.txt is important.

    If I understood correctly, I shouldn't bother with it?
     
    Franck S, Nov 18, 2006 IP
  16. TheSyndicate

    #16
    So this:

    "I use robots.txt to prevent the crawlers from indexing private directories and member directories. It can also be done in the HTML page itself with a META robots tag."

    means you don't need to use a text file?
     
    TheSyndicate, Nov 18, 2006 IP
  17. phree_radical

    #17
    One or the other is okay. It's just an option AFAIK. However, if you use robots.txt then you can use some features such as wildcards (for some of the SE's bots, Google included). If you use the meta tag, you can dynamically generate pages without having to edit robots.txt.

    Also, I'm not sure, but... I think that a bot actually has to download the file before it can get at the meta tag.

    If you don't need to block bots at all, I figure having a robots.txt just wastes bandwidth?

    Robots.txt is USEFUL for disallowing/de-indexing pages such as login or useless pages that suck the PR out of your site.
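
As a rough illustration of the wildcard feature mentioned above, here is a sketch that treats * as "any run of characters" and a trailing $ as "end of URL", which is how the engines that support wildcards document them. The rules and paths are invented, and the sketch ignores Allow lines and rule precedence, so it is not a full matcher.

```python
import re


def wildcard_rule_to_regex(rule):
    """Translate a Disallow pattern with * and a trailing $ into a regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with ".*" where a * appeared.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path, disallow_rules):
    """Return True if any Disallow pattern matches the start of the path."""
    return any(wildcard_rule_to_regex(rule).match(path) for rule in disallow_rules)


# Hypothetical rules: block every URL ending in .php, and any directory
# whose name starts with "private".
rules = ["/*.php$", "/private*/"]

print(is_blocked("/login.php", rules))
print(is_blocked("/login.php?x=1", rules))
print(is_blocked("/private-files/a.html", rules))
print(is_blocked("/index.html", rules))
```

Without the trailing $, a pattern is only a prefix match, which is why /private*/ also covers everything underneath the matching directories.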
     
    phree_radical, Nov 19, 2006 IP
  18. weknowtheworld

    #18
    Robots.txt: I think if all it does is help search engines block something on my site, it's useless for me...

    I want search engines to understand and spider each and every corner of my site.
     
    weknowtheworld, Nov 24, 2006 IP
  19. Robert Allen

    #19
    What is the point? The more it indexes, the more traffic you will get. I have only used robots.txt once, and I removed it after a month.

    Robots.txt isn't important, nor do I use it.

    Rob
     
    Robert Allen, Nov 24, 2006 IP
  20. Scolls

    #20
    It's just a way of regulating spider access to your site. It can actually save you bandwidth! For example, you might want to block certain bots, like some known email harvesters that are actually well-behaved, and honour robots.txt.
    In this case, it's a huge saving on bandwidth to have them only download your little robots.txt file rather than crawl your entire site.
    In the case of running an SE, you might not want your result pages to be crawled, so you could exclude these should people link to them.
    Perhaps you might also want to exclude images folders, etc.

    It's useful, but not mandatory. If you're not worried about bandwidth, nor what spider visits where, then you may omit it. But if you work out a good one for your site, you can save a considerable amount of bandwidth!
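
A per-bot setup like the one described could look roughly like the sketch below, checked with Python's standard urllib.robotparser. The bot names "EmailHarvesterBot" and "ExampleBot" and the /search/ and /images/ paths are made up, and the rules only help with bots that actually honour robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut one bot out completely and keep every
# other crawler away from search result pages and the images folder.
ROBOTS_TXT = """\
User-agent: EmailHarvesterBot
Disallow: /

User-agent: *
Disallow: /search/
Disallow: /images/
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Return True if the named crawler may fetch the given path."""
    return rules.can_fetch(agent, path)


print(allowed("EmailHarvesterBot", "/index.html"))
print(allowed("ExampleBot", "/index.html"))
print(allowed("ExampleBot", "/search/results?q=robots"))
print(allowed("ExampleBot", "/images/logo.png"))
```

A well-behaved blocked bot then downloads only this small file on each visit instead of crawling the whole site, which is where the bandwidth saving comes from.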
     
    Scolls, Dec 5, 2006 IP