Does DMOZ have a bot?

Discussion in 'ODP / DMOZ' started by QiSoftware, Apr 23, 2006.

  1. #1
    Does DMOZ have a bot? Recently, I disallowed most bots from crawling my site, with the exception of one. My personal blog (Q's Wire...) is listed with DMOZ, but I have been unable to get anything else on my site listed.

    DMOZ appears to list only the main URL of the part of a site it wants, and it does not appear to need a bot to crawl other areas of the site. Is that true? I have been trying to get my business site listed, so I am wondering whether my ban on bots could affect my chances of getting listed.
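A quick way to see what a "block everything except one bot" robots.txt actually does is to feed it to Python's standard `urllib.robotparser`. This is a minimal sketch: the robots.txt below is a hypothetical stand-in for the setup described above, with "Googlebot" assumed as the single allowed bot, and `allowed` is an illustrative helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is blocked except one.
# "Googlebot" is an assumed placeholder for the one allowed bot.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def allowed(agent: str, url: str = "http://example.com/") -> bool:
    """Return True if `agent` may fetch `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


for agent in ("Googlebot", "Robozilla", "Xenu Link Sleuth"):
    print(f"{agent}: {'allowed' if allowed(agent) else 'blocked'}")
```

Under these assumptions every agent other than the one named exception is refused, which includes link checkers as well as crawlers. Keep in mind that robots.txt is advisory: it only describes what a well-behaved bot will do.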

    Q...
     
    QiSoftware, Apr 23, 2006
  2. minstrel

    minstrel Illustrious Member

    #2
    Only the human kind. :)
     
    minstrel, Apr 23, 2006
  3. wrmineo

    wrmineo Peon

    #3
    Actually, you may see a DMOZ "bot" in your stats on very rare occasions, and as Minstrel pointed out, it's an editor looking at your site after a submission or doing a checkup.

    However, there are some "spoof" DMOZ bots out there that the ODP has not been very diligent about stopping.
     
    wrmineo, Apr 23, 2006
  4. brizzie

    brizzie Peon

    #4
    Sites are reviewed manually, and bots have no effect on that. Various automated tools check whether a link is dead; if it is, the site is moved back to the pool of unreviewed sites for an editor to check manually again.
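The dead-link pass described above can be modelled roughly as follows. This is a hypothetical sketch, not the ODP's actual tooling: the `Listing` type, the "outside 2xx/3xx means dead" rule, and the sample URLs are all assumptions, and status codes are supplied by hand rather than fetched.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    url: str
    status: int  # HTTP status the checker received; supplied by hand here, not fetched


def is_dead(status: int) -> bool:
    """Assumed rule: anything outside the 2xx/3xx range counts as a failed check."""
    return not 200 <= status < 400


def check_listings(listings):
    """Split listings into (still listed, moved back to unreviewed)."""
    listed, unreviewed = [], []
    for item in listings:
        target = unreviewed if is_dead(item.status) else listed
        target.append(item.url)
    return listed, unreviewed


sample = [
    Listing("http://example.com/", 200),
    Listing("http://example.org/old-page", 404),
    Listing("http://example.net/", 301),
]
print(check_listings(sample))
```

The failed listing is not deleted here, only set aside, which mirrors the point above that an editor checks it manually again.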
     
    brizzie, Apr 23, 2006
  5. QiSoftware

    QiSoftware Well-Known Member

    #5
    Thanks for the information.

    Q...
     
    QiSoftware, Apr 23, 2006
  6. compostannie

    compostannie Peon

    #6
    We have some terrific editing tools that greatly increase our efficiency while editing. I don't know if those are considered bots, but I often feel like a bot while using them. Maybe that's what you see? :)
     
    compostannie, Apr 23, 2006
  7. tonyinabox

    tonyinabox Peon

    #7
    And I saw a DMOZ/ODP bot today! :) Is it the fake one?

    Host: 81.169.154.94
    Http Code: 200 Date: May 14 08:49:50 Http Version: HTTP/1.1 Size in Bytes: 0
    Referer: -
    Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; ODP entries test; http://tuezilla.de/test-odp-entries-agent.html)
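To answer "is it the fake one?" from a log entry like the one above, a first step is to pull out the Agent field and any info URL the bot advertises, then read that page. A minimal sketch, assuming the field layout shown in this log excerpt (the key names come from it; `parse_entry`, `claims_odp` and `info_url` are hypothetical helpers):

```python
import re
from typing import Optional

# Field names as they appear in the log excerpt above.
KEYS = ["Host", "Http Code", "Date", "Http Version", "Size in Bytes", "Referer", "Agent"]
FIELD = re.compile(r"(" + "|".join(re.escape(k) for k in KEYS) + r"): ")

LOG_ENTRY = """Host: 81.169.154.94
Http Code: 200 Date: May 14 08:49:50 Http Version: HTTP/1.1 Size in Bytes: 0
Referer: -
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; ODP entries test; http://tuezilla.de/test-odp-entries-agent.html)"""


def parse_entry(text: str) -> dict:
    """Split 'Key: value' pairs; re.split with a group yields ['', k1, v1, k2, v2, ...]."""
    parts = FIELD.split(text)
    return {k: v.strip() for k, v in zip(parts[1::2], parts[2::2])}


def claims_odp(agent: str) -> bool:
    # Heuristic only: anyone can put "ODP" in a User-Agent, so this cannot
    # distinguish a real editor's tool from a spoof.
    return "ODP" in agent


def info_url(agent: str) -> Optional[str]:
    """Return the first URL embedded in the agent string, if any."""
    match = re.search(r"https?://[^\s;)]+", agent)
    return match.group(0) if match else None


entry = parse_entry(LOG_ENTRY)
print(entry["Host"], claims_odp(entry["Agent"]), info_url(entry["Agent"]))
```

The info URL is only a claim made by whoever sent the request, so it tells you who says they are visiting, not who actually is.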
     
    tonyinabox, May 14, 2006
  8. macdesign

    macdesign Peon

    #8
    As the link says, it's an ODP editor's personal quality-control bot. It appears to be used only for the areas of the directory where he edits.

    It's quite likely that individual ODP editors have tools/bots that help them review and check sites. I'm developing one to help sift out spam from the unreviewed submissions in the categories where I edit.
     
    macdesign, May 14, 2006
  9. Genie

    Genie Peon

    #9
    Pages listed in the ODP will be visited occasionally by Robozilla, the project's link checker. It's best not to disallow Robozilla: doing so could result in the link being marked dead and removed from the directory automatically. Not being technically minded, I don't know whether that will always happen, but I know it happened to someone who reported at RZ that his/her listing had vanished. There are also other link checkers in operation, but I'm not sure how they appear in logs.
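Why blocking a link checker can backfire is easiest to see from the checker's side. The sketch below is a hypothetical model of a checker that obeys robots.txt, not Robozilla's real logic: the `check_result` helper and its "unverified" outcome are assumptions, and how a directory treats "unverified" (as the post above suggests, possibly like "dead") is a policy question this code does not answer.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser


def check_result(robots_txt: str, agent: str, url: str, status: Optional[int]) -> str:
    """Hypothetical outcome of one link check by a checker that obeys robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(agent, url):
        # The checker never requests the page, so it cannot confirm the page is alive.
        return "unverified"
    if status is not None and 200 <= status < 400:
        return "alive"
    return "dead"


BLOCKS_ROBOZILLA = "User-agent: Robozilla\nDisallow: /\n"
print(check_result(BLOCKS_ROBOZILLA, "Robozilla", "http://example.com/", 200))  # unverified
print(check_result("User-agent: *\nDisallow:\n", "Robozilla", "http://example.com/", 200))  # alive
```

Even though the page itself would have answered 200, the polite checker never finds that out once robots.txt shuts it out.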
     
    Genie, May 14, 2006
  10. macdesign

    macdesign Peon

    #10
    Great question, and since I did not know, I did a quick run on Tulip Chain and then checked my logs.

    See Tulip Chain
     
    macdesign, May 14, 2006
  11. tonyinabox

    tonyinabox Peon

    #11
    OMG, your bot's agent string contains "Java/1.5.0_04"

    A LOT of people block bots that identify themselves with "Java/", because some of those bots eat bandwidth without honoring robots.txt.
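A substring rule like "block anything containing Java/" is blunt: it catches every Java-based tool, including a legitimate link checker such as the one just described. Here is a minimal sketch of that server-side decision with an explicit allowlist. The `is_blocked` helper and both lists are hypothetical; real servers would express this in their own config (for example .htaccess), not in Python.

```python
def is_blocked(user_agent: str,
               blocked: tuple = ("Java/",),
               allowed: frozenset = frozenset()) -> bool:
    """Return True if this hypothetical filter would refuse the request.

    `blocked` holds substrings to refuse; `allowed` holds exact User-Agent
    strings you have checked in your own logs. An allowlist match wins.
    """
    if user_agent in allowed:
        return False
    return any(token in user_agent for token in blocked)


print(is_blocked("Java/1.5.0_04"))                                   # True
print(is_blocked("Java/1.5.0_04", allowed=frozenset({"Java/1.5.0_04"})))  # False
print(is_blocked("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"))  # False
```

Allowlisting by User-Agent is weak, since a generic string like "Java/1.5.0_04" is shared by every program on that Java version and is trivial to copy.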

    I think I have to check something and allow Tulip Chain. LOL
     
    tonyinabox, May 14, 2006
  12. minstrel

    minstrel Illustrious Member

    #12
    Not directly related to DMOZ but related to this discussion:

    I think in general mass blocking of bots in robots.txt is a bad idea because the ones you have to worry about are not likely to pay any attention to robots.txt anyway, except to get a list of which directories you're blocking. In most cases, the best robots.txt file is one that says simply:

    User-agent: *
    Disallow: 
    One other item I often see in suggested "bots to block" is Xenu. This is a popular dead link checker used by many webmasters for directories, resource sites, and link partners. I use it myself. If Xenu is blocked, it will return either an error or "forbidden". Now, if it's the latter and it's not a busy day, I may take the time to go and check manually with a browser or I may not; if it's a busy day, I'll probably just delete the link from my site. Thus, the effect of blocking Xenu may well be that you'll lose a lot of existing backlinks to your site.
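You can confirm that the two-line "allow everything" robots.txt recommended above really lets every well-behaved bot in, link checkers included, with the standard `urllib.robotparser`. In this minimal sketch the agent names are illustrative stand-ins (not necessarily the exact strings those tools send) and `everyone_allowed` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# The "allow everything" robots.txt recommended above.
ALLOW_ALL = "User-agent: *\nDisallow:\n"


def everyone_allowed(robots_txt: str, agents, paths) -> bool:
    """True if every agent may fetch every path under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return all(
        parser.can_fetch(agent, "http://example.com" + path)
        for agent in agents
        for path in paths
    )


AGENTS = ("Robozilla", "Xenu Link Sleuth", "Googlebot")
PATHS = ("/", "/blog/", "/private/page.html")
print(everyone_allowed(ALLOW_ALL, AGENTS, PATHS))                        # True
print(everyone_allowed("User-agent: *\nDisallow: /\n", AGENTS, PATHS))  # False
```

An empty `Disallow:` means "nothing is disallowed", which is why the first file admits everyone while `Disallow: /` shuts everyone out.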

    Moral of the story: Unless you know what you're blocking, don't block it.
     
    minstrel, May 14, 2006
  13. ishfish

    ishfish Peon

    #13
    That doesn't really make much sense...
     
    ishfish, May 14, 2006
  14. macdesign

    macdesign Peon

    #14
    What does not make sense?

    To quote

    # URLs that are listed in the subset of the Open Directory that I am working on at the time
    # when a listed URL redirects to another URL, also that URL is requested.

    i.e. it's not a full-fledged bot that crawls all of DMOZ.
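The quoted description (request the URLs in a chosen subset, plus any URL a listed URL redirects to) can be modelled as below. This is a hypothetical sketch, not the tuezilla tool's code: `urls_requested`, the `max_hops` limit and the redirect table are assumptions, and redirects are supplied as a dict rather than discovered over the network.

```python
def urls_requested(listed, redirects, max_hops=5):
    """List every URL a subset checker would request.

    `listed` is the subset of directory URLs being checked; `redirects` maps a URL
    to the URL it redirects to. `max_hops` stops redirect loops.
    """
    requested = []
    for url in listed:
        current = url
        requested.append(current)
        hops = 0
        while current in redirects and hops < max_hops:
            current = redirects[current]
            requested.append(current)
            hops += 1
    return requested


subset = ["http://example.com/", "http://example.org/"]
redirects = {"http://example.org/": "http://www.example.org/"}
print(urls_requested(subset, redirects))
```

Only URLs in the subset and their redirect targets are ever touched, which is why such a bot shows up on one site and never on another.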
     
    macdesign, May 14, 2006
  15. ishfish

    ishfish Peon

    #15
    That particular editor can edit every site in ODP. That's why "It appears that it only is used for areas of the directory where he edits" doesn't make sense.
     
    ishfish, May 14, 2006
  16. tonyinabox

    tonyinabox Peon

    #16
    My website is not in his range to edit. It's very far from it; it's not German.
     
    tonyinabox, May 14, 2006
  17. orlady

    orlady Peon

    #17
    As ishfish said, the editor who runs that bot has access to the entire directory: English, German, Russian, Chinese, French, Esperanto, Italian, Hindi, etc.

    Just this once, I agree with what minstrel says. It generally does not make sense to use a robots.txt file to block robots, unless, of course, you don't want people to be able to find your site.
     
    orlady, May 14, 2006
  18. tonyinabox

    tonyinabox Peon

    #18
    Yeah, I don't use a robots.txt file to block any bot, but I do block some files and directories.

    And most techie people don't use robots.txt to block bots; they use .htaccess instead. LOL, it seems the bad bots never respected robots.txt anyway.
     
    tonyinabox, May 15, 2006