Does DMOZ have a bot? Recently, I disallowed most bots from crawling my site, with the exception of one. My personal blog -- Q's Wire... is listed with DMOZ, but I have been unable to get anything else on my site listed. DMOZ appears to list only the main URL of the part of a site it wants, and does not appear to need a bot to crawl other areas of the site. Is this a true statement? Since I have been trying to get my business site listed, I am wondering whether my ban on bots could affect my chances of getting listed? Q...
Actually, you're likely to see a DMOZ "bot" in your stats only on a very rare occasion; and as Minstrel pointed out, it's an editor looking at your site after a submission or doing a checkup. However, there are some "spoof" DMOZ bots out there that the ODP has not been very diligent about putting a stop to.
Sites are reviewed manually, and bots have no effect on that. Various automated tools check whether a link is dead; if it is, the site is moved back to the pool of unreviewed sites for an editor to check manually again.
We have some terrific editing tools we can use that greatly increase our efficiency while editing. I don't know if those are considered bots, but I often feel like a bot while using them. Maybe that's what you see?
AND I SAW A DMOZ/ODP BOT TODAY! Is it the fake one?

Host: 81.169.154.94
Http Code: 200
Date: May 14 08:49:50
Http Version: HTTP/1.1
Size in Bytes: 0
Referer: -
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; ODP entries test; http://tuezilla.de/test-odp-entries-agent.html)
As the link says, it's an ODP editor's personal quality-control bot. It appears that it is only used for the areas of the directory where he edits. It's quite likely that individual ODP editors have tools/bots that aid them in reviewing and checking sites. I'm developing one to aid in sifting spam out of the unreviewed queue in the categories where I edit.
Pages listed in the ODP will be visited occasionally by Robozilla, the project's link checker. It's best not to disallow Robozilla: doing so could result in the link being marked dead and removed from the directory automatically. Not being technically minded, I don't know if that will always happen, but I do know it happened to someone who reported at RZ that his/her listing had vanished. There are also other link checkers in operation, but I'm not sure how they appear in logs.
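For the technically curious, a dead-link checker along the lines described above can be sketched in a few lines of Python. This is purely a hypothetical illustration, not the actual Robozilla code; the function names and the "anything under 400 counts as alive" rule are my own assumptions:

```python
# Hypothetical sketch of a dead-link checker like the ones discussed
# here. Names and rules are illustrative assumptions, not ODP code.
import urllib.error
import urllib.request


def classify_status(code):
    """Treat 2xx/3xx responses as alive, 4xx/5xx as dead candidates."""
    return "alive" if 200 <= code < 400 else "dead"


def check_url(url, timeout=10):
    """Fetch a URL and report whether the link looks dead."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as e:
        # Server answered, but with an error status (404, 500, ...).
        return classify_status(e.code)
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout: looks dead.
        return "dead"
```

A real checker would presumably retry before flagging a site, since a single timeout doesn't prove a link is dead.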
Great question, and since I did not know, I did a quick run on Tulip Chain and then checked my logs. See Tulip Chain
OMG, your bot identifies as "Java/1.5.0_04"! A LOT of people block bots identified with "Java/", because some bots eat bandwidth without ever hitting robots.txt. I think I have to check something and allow Tulip Chain. LOL
Not directly related to DMOZ but related to this discussion: I think mass blocking of bots in robots.txt is generally a bad idea, because the ones you have to worry about are not likely to pay any attention to robots.txt anyway, except perhaps to harvest a list of which directories you're blocking. In most cases, the best robots.txt file is one that simply says:

User-agent: *
Disallow:

One other item I often see in suggested "bots to block" lists is Xenu. This is a popular dead-link checker used by many webmasters for directories, resource sites, and link partners; I use it myself. If Xenu is blocked, it will return either an error or "forbidden". If it's the latter and it's not a busy day, I may take the time to go and check manually with a browser, or I may not; if it's a busy day, I'll probably just delete the link from my site. Thus, the effect of blocking Xenu may well be that you lose a lot of existing backlinks to your site. Moral of the story: unless you know what you're blocking, don't block it.
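For contrast, if you really did need to keep well-behaved crawlers out of one area while staying open everywhere else, a narrower robots.txt might look like this (the bot name and path here are made-up examples, not a recommendation):

```
# Block one specific misbehaving bot entirely (example name)
User-agent: BadBot
Disallow: /

# Everyone else: stay out of one private directory only
User-agent: *
Disallow: /private/
```

Of course, as noted above, this only works on bots polite enough to read robots.txt in the first place.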
What does not make sense? To quote:

- URLs that are listed in the subset of the Open Directory that I am working on at the time
- when a listed URL redirects to another URL, that URL is also requested

i.e., it's not a full-fledged bot that crawls all of DMOZ.
That particular editor can edit every site in ODP. That's why "It appears that it only is used for areas of the directory where he edits" doesn't make sense.
As ishfish said, the editor who runs that bot has access to the entire directory: English, German, Russian, Chinese, French, Esperanto, Italian, Hindi, etc. Just this once, I agree with what minstrel says. It generally does not make sense to use a robots.txt file to block robots -- unless, of course, you don't want people to be able to find your site.
Yeah, I don't use a robots.txt file to block any bots, but I do block some files and directories. Most techie people don't use robots.txt to block; they use .htaccess instead. LOL. It seems bad bots never respected robots.txt anyway.
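The .htaccess approach mentioned above can be sketched roughly like this for Apache (the bot names are invented examples; adjust to whatever actually shows up in your logs). Unlike robots.txt, this is enforced by the server, so the bot can't simply ignore it:

```apache
# Illustrative .htaccess sketch -- bot names are examples only.
# Flag requests whose User-Agent matches a known bad bot.
BrowserMatchNoCase "EvilScraper" bad_bot
BrowserMatchNoCase "BandwidthHog" bad_bot

# Allow everyone except requests carrying the bad_bot flag.
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```

The trade-off is that user-agent strings are trivially faked, so this mostly stops the lazy bandwidth-eaters, not a determined scraper.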