I think his site costs much more than that even now. Google still doesn't rule the world, not even the internet. These guys can afford to be out of the index.
From the link DA posted: "I think anyone who has been around for a while realizes that robots.txt is usually ignored, especially by 'rogue bots' and rippers."
There's still the strange choice of using a robots.txt file -- think about it: what he has done is ban all the good bots that pay attention to robots.txt and hand the forum over to the bad bots that don't. If he doesn't want ANY bots, he's done it the wrong way. If he doesn't want any BAD bots, he's done it the wrong way. If he just wants to ban Google, MSN Search, Yahoo!, Ask Jeeves, and the others that obey robots.txt, then he's done it the right way. That interview doesn't change my mind about that -- it doesn't address the question at all. Edit: Damn that fast-typing Crazy_Rob!
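For reference, the blanket ban being discussed is just two lines of robots.txt (the exact file WMW served may have differed):

[code]
# Blocks every crawler that honors robots.txt -- i.e. exactly the good bots.
# Rogue bots simply never fetch or obey this file.
User-agent: *
Disallow: /
[/code]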
I'm confused. According to the article the problem is too many spiders, but he's also trying to make the site more spiderable.
Here is what I do. I have a hidden link to a PHP file on my page. This PHP file is, of course, forbidden in my robots.txt. That file records all the IPs to a text file, which is checked every 5 minutes by a cron job that blocks those IPs in iptables. It gets a little trickier when people use random proxies, but it can still be done... I currently have over 1 million WAV files and several hundred thousand GIFs on a server, so trust me, I have experience with this.
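Here's a minimal sketch of a trap along the lines Shoemoney describes -- the file names, log path, and exact cron setup are my own assumptions, not his actual code. The hidden link points at a script that bots only reach by ignoring robots.txt:

[code]
<?php
// trap.php -- target of a hidden link on the page, and disallowed in
// robots.txt, so only bots that ignore robots.txt ever request it.
// Append the offender's IP to a log for the cron job to pick up.
file_put_contents('/var/log/badbots.txt',
    $_SERVER['REMOTE_ADDR'] . "\n", FILE_APPEND | LOCK_EX);
[/code]

Then a companion script run from cron (as root, since it calls iptables), e.g. */5 * * * * php /usr/local/bin/block_bots.php:

[code]
<?php
// block_bots.php -- reads the trap log every 5 minutes and drops each IP.
$log = '/var/log/badbots.txt';
if (!file_exists($log)) exit;
$ips = array_unique(array_filter(array_map('trim', file($log))));
foreach ($ips as $ip) {
    // Sanity-check the IP before handing it to the shell.
    if (preg_match('/^\d{1,3}(\.\d{1,3}){3}$/', $ip)) {
        exec('iptables -A INPUT -s ' . escapeshellarg($ip) . ' -j DROP');
    }
}
// Truncate the log so blocked IPs aren't re-added on the next run.
file_put_contents('/var/log/badbots.txt', '');
[/code]

As he says, rotating proxies will still slip through, but this catches the lazy rippers.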
Am I asking the impossible if you could make a tutorial on this, Shoemoney? Your method sure sounds good. If you don't want to, or don't have the time to do a tut on this, do you know of a link or something that can help those of us who want to block out these types of bots and crawlers?
I just checked Alexa, and his traffic is down tremendously -- by almost one million per day, if I'm reading the thing right -- and the trend looks like the NASDAQ did during the 2000 stock market crash. Sure, any forum can survive without search engine traffic. They might just survive a lot smaller, which appears to be the goal here.
WebmasterWorld has started to allow bots to spider again. Their robots.txt file is cloaked, however, so you will only see it if you spoof your user agent. It was a fun month traffic-wise for digitalpoint.com, though. Just goes to show what a difference one position in the search results can make (there are thousands of terms where digitalpoint.com is 2nd to webmasterworld.com).
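You can check the cloaking yourself by requesting robots.txt with a spoofed user agent. A quick sketch (the Googlebot string is the standard one, the rest is my own example; if they also verify crawler IPs, a UA spoof alone won't fool them):

[code]
<?php
// Fetch robots.txt while claiming to be Googlebot; a cloaked server
// returns different content depending on the User-Agent header.
$ctx = stream_context_create(array('http' => array(
    'header' => "User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)\r\n",
)));
echo file_get_contents('http://www.webmasterworld.com/robots.txt', false, $ctx);
[/code]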
And it looks like DP is officially the most popular webmaster site on the internet, since WMW keeps crashing... I posted a link over there to their Alexa stats page showing their numbers cut in half, and within two minutes... the post was deleted!
Hehe... nope. Their traffic will be back to normal within a week or two. Perfect example though... [search=google]adsense forum[/search]
I noticed they are already showing over 280,000 pages indexed in Google. However, most are just bare URL listings without a description. I think they had over two million internal pages indexed in Google before they put up the blanket robots.txt ban. You may be right about it only taking a couple of weeks to get reindexed, but Google, Yahoo and MSN are going to eat up a bunch of bandwidth over the next couple of weeks just getting many of the old archived pages back into the index.
Their IBLs are very old and have grown steadily over the years; it will take a little longer to pass them by.
How did they get the PR back so fast?! PR 7, and they're all back... http://www.google.com/search?num=10...:en-US:official&q=site:www.webmasterworld.com Results 1 - 100 of 145,000 English pages. Bah!! They cheated!!!
I noticed that all my Flash files in my swf folder were showing up in a site: check, so I added a Disallow rule to my robots.txt. That was over a week ago, and absolutely nothing has changed... so I agree with Minstrel: they seem to be getting special treatment from Google.
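For anyone wanting to do the same, the rule is just this (the folder name is obviously mine):

[code]
User-agent: *
Disallow: /swf/
[/code]

Note that search engines only drop already-indexed URLs gradually after a new Disallow, so some lag is expected either way.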