I've seen lots of odd behavior from Google's URL Removal Console. It's deleted pages that should not have been deleted -- and it's failed to delete many pages it was specifically asked to delete. This looks like a bug to me, but I could be misreading the robots.txt spec. I chose to delete pages based upon a robots.txt file. The software misread the robots.txt file somehow and deleted the entire web site. Here's the code: User-agent: Alexibot User-agent: Aqua_Products User-agent: BackDoorBot User-agent: BackDoorBot/1.0 User-agent: Black.Hole User-agent: BlackWidow User-agent: BlowFish User-agent: BlowFish/1.0 User-agent: Bookmark search tool User-agent: Bot mailto:craftbot@yahoo.com User-agent: BotALot User-agent: BotRightHere User-agent: BuiltBotTough User-agent: Bullseye User-agent: Bullseye/1.0 User-agent: BunnySlippers User-agent: Cegbfeieh User-agent: CheeseBot User-agent: CherryPicker User-agent: CherryPickerElite/1.0 User-agent: CherryPickerSE/1.0 User-agent: ChinaClaw User-agent: Copernic User-agent: CopyRightCheck User-agent: Crescent User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0 User-agent: Custo User-agent: DISCo User-agent: DISCo Pump 3.0 User-agent: DISCo Pump 3.2 User-agent: DISCoFinder User-agent: DittoSpyder User-agent: Download Demon User-agent: Download Demon/3.2.0.8 User-agent: Download Demon/3.5.0.11 User-agent: EirGrabber User-agent: EmailCollector User-agent: EmailSiphon User-agent: EmailWolf User-agent: EroCrawler User-agent: Express WebPictures User-agent: Express WebPictures (www.express-soft.com) User-agent: ExtractorPro User-agent: EyeNetIE User-agent: FairAd Client User-agent: Flaming AttackBot User-agent: FlashGet User-agent: FlashGet WebWasher 3.2 User-agent: Foobot User-agent: FrontPage User-agent: FrontPage [NC,OR] User-agent: Gaisbot User-agent: GetRight User-agent: GetRight/2.11 User-agent: GetRight/3.1 User-agent: GetRight/3.2 User-agent: GetRight/3.3 User-agent: GetRight/3.3.3 User-agent: GetRight/3.3.4 User-agent: 
GetRight/4.0.0 User-agent: GetRight/4.1.0 User-agent: GetRight/4.1.1 User-agent: GetRight/4.1.2 User-agent: GetRight/4.2 User-agent: GetRight/4.2b (Portuguxeas) User-agent: GetRight/4.2c User-agent: GetRight/4.3 User-agent: GetRight/4.5 User-agent: GetRight/4.5a User-agent: GetRight/4.5b User-agent: GetRight/4.5b1 User-agent: GetRight/4.5b2 User-agent: GetRight/4.5b3 User-agent: GetRight/4.5b6 User-agent: GetRight/4.5b7 User-agent: GetRight/4.5c User-agent: GetRight/4.5d User-agent: GetRight/4.5e User-agent: GetRight/5.0beta1 User-agent: GetRight/5.0beta2 User-agent: GetWeb! User-agent: Go!Zilla User-agent: Go!Zilla (www.gozilla.com) User-agent: Go!Zilla 3.3 (www.gozilla.com) User-agent: Go!Zilla 3.5 (www.gozilla.com) User-agent: Go-Ahead-Got-It User-agent: Googlebot-Image User-agent: GrabNet User-agent: Grafula User-agent: HMView User-agent: HTTrack User-agent: HTTrack 3.0 User-agent: HTTrack [NC,OR] User-agent: Harvest User-agent: Harvest/1.5 User-agent: Image Stripper User-agent: Image Sucker User-agent: Indy Library User-agent: Indy Library [NC,OR] User-agent: InfoNaviRobot User-agent: InterGET User-agent: Internet Ninja User-agent: Internet Ninja 4.0 User-agent: Internet Ninja 5.0 User-agent: Internet Ninja 6.0 User-agent: Iron33/1.0.2 User-agent: JOC Web Spider User-agent: JennyBot User-agent: JetCar User-agent: Kenjin Spider User-agent: Kenjin.Spider User-agent: Keyword Density/0.9 User-agent: Keyword.Density User-agent: LNSpiderguy User-agent: LeechFTP User-agent: LexiBot User-agent: LinkScan/8.1a Unix User-agent: LinkScan/8.1a.Unix User-agent: LinkWalker User-agent: LinkextractorPro User-agent: MIDown tool User-agent: MIIxpc User-agent: MIIxpc/4.2 User-agent: MSIECrawler User-agent: Mass Downloader User-agent: Mass Downloader/2.2 User-agent: Mata Hari User-agent: Mata.Hari User-agent: Microsoft URL Control User-agent: Microsoft URL Control - 5.01.4511 User-agent: Microsoft URL Control - 6.00.8169 User-agent: Microsoft.URL User-agent: Mister PiX User-agent: 
Mister PiX version.dll User-agent: Mister Pix II 2.01 User-agent: Mister Pix II 2.02a User-agent: Mister.PiX User-agent: NICErsPRO User-agent: NPBot User-agent: NPbot User-agent: Navroad User-agent: NearSite User-agent: Net Vampire User-agent: Net Vampire/3.0 User-agent: NetAnts User-agent: NetAnts/1.10 User-agent: NetAnts/1.23 User-agent: NetAnts/1.24 User-agent: NetAnts/1.25 User-agent: NetMechanic User-agent: NetSpider User-agent: NetZIP User-agent: NetZip Downloader 1.0 Win32(Nov 12 1998) User-agent: NetZip-Downloader/1.0.62 (Win32; Dec 7 1998) User-agent: NetZippy+(http://www.innerprise.net/usp-spider.asp) User-agent: Octopus User-agent: Offline Explorer User-agent: Offline Explorer/1.2 User-agent: Offline Explorer/1.4 User-agent: Offline Explorer/1.6 User-agent: Offline Explorer/1.7 User-agent: Offline Explorer/1.9 User-agent: Offline Explorer/2.0 User-agent: Offline Explorer/2.1 User-agent: Offline Explorer/2.3 User-agent: Offline Explorer/2.4 User-agent: Offline Explorer/2.5 User-agent: Offline Navigator User-agent: Offline.Explorer User-agent: Openbot User-agent: Openfind User-agent: Openfind data gatherer User-agent: Oracle Ultra Search User-agent: PageGrabber User-agent: Papa Foto User-agent: PerMan User-agent: ProPowerBot/2.14 User-agent: ProWebWalker User-agent: Python-urllib User-agent: QueryN Metasearch User-agent: QueryN.Metasearch User-agent: RMA User-agent: Radiation Retriever 1.1 User-agent: ReGet User-agent: RealDownload User-agent: RealDownload/4.0.0.40 User-agent: RealDownload/4.0.0.41 User-agent: RealDownload/4.0.0.42 User-agent: RepoMonkey User-agent: RepoMonkey Bait & Tackle/v1.01 User-agent: SiteSnagger User-agent: SlySearch User-agent: SmartDownload User-agent: SmartDownload/1.2.76 (Win32; Apr 1 1999) User-agent: SmartDownload/1.2.77 (Win32; Aug 17 1999) User-agent: SmartDownload/1.2.77 (Win32; Feb 1 2000) User-agent: SmartDownload/1.2.77 (Win32; Jun 19 2001) User-agent: SpankBot User-agent: Sqworm/2.9.85-BETA (beta_release; 20011115-775; 
i686-pc-linux User-agent: SuperBot User-agent: SuperBot/3.0 (Win32) User-agent: SuperBot/3.1 (Win32) User-agent: SuperHTTP User-agent: SuperHTTP/1.0 User-agent: Surfbot User-agent: Szukacz/1.4 User-agent: Teleport User-agent: Teleport Pro User-agent: Teleport Pro/1.29 User-agent: Teleport Pro/1.29.1590 User-agent: Teleport Pro/1.29.1634 User-agent: Teleport Pro/1.29.1718 User-agent: Teleport Pro/1.29.1820 User-agent: Teleport Pro/1.29.1847 User-agent: TeleportPro User-agent: Telesoft User-agent: The Intraformant User-agent: The.Intraformant User-agent: TheNomad User-agent: TightTwatBot User-agent: Titan User-agent: True_Robot User-agent: True_Robot/1.0 User-agent: TurnitinBot User-agent: TurnitinBot/1.5 User-agent: URL Control User-agent: URL_Spider_Pro User-agent: URLy Warning User-agent: URLy.Warning User-agent: VCI User-agent: VCI WebViewer VCI WebViewer Win32 User-agent: VoidEYE User-agent: WWW-Collector-E User-agent: WWWOFFLE User-agent: Web Image Collector User-agent: Web Sucker User-agent: Web.Image.Collector User-agent: WebAuto User-agent: WebAuto/3.40 (Win98; I) User-agent: WebBandit User-agent: WebBandit/3.50 User-agent: WebCapture 2.0 User-agent: WebCopier User-agent: WebCopier v.2.2 User-agent: WebCopier v2.5 User-agent: WebCopier v2.6 User-agent: WebCopier v2.7a User-agent: WebCopier v2.8 User-agent: WebCopier v3.0 User-agent: WebCopier v3.0.1 User-agent: WebCopier v3.2 User-agent: WebCopier v3.2a User-agent: WebEMailExtrac.* User-agent: WebEnhancer User-agent: WebFetch User-agent: WebGo IS User-agent: WebLeacher User-agent: WebReaper User-agent: WebReaper [info@webreaper.net] User-agent: WebReaper [webreaper@otway.com] User-agent: WebReaper v9.1 - www.otway.com/webreaper User-agent: WebReaper v9.7 - www.webreaper.net User-agent: WebReaper v9.8 - www.webreaper.net User-agent: WebReaper vWebReaper v7.3 - www,otway.com/webreaper User-agent: WebSauger User-agent: WebSauger 1.20b User-agent: WebSauger 1.20j User-agent: WebSauger 1.20k User-agent: 
WebStripper User-agent: WebStripper/2.03 User-agent: WebStripper/2.10 User-agent: WebStripper/2.12 User-agent: WebStripper/2.13 User-agent: WebStripper/2.15 User-agent: WebStripper/2.16 User-agent: WebStripper/2.19 User-agent: WebWhacker User-agent: WebZIP User-agent: WebZIP/2.75 (http://www.spidersoft.com) User-agent: WebZIP/3.65 (http://www.spidersoft.com) User-agent: WebZIP/3.80 (http://www.spidersoft.com) User-agent: WebZIP/4.0 (http://www.spidersoft.com) User-agent: WebZIP/4.1 (http://www.spidersoft.com) User-agent: WebZIP/4.21 User-agent: WebZIP/4.21 (http://www.spidersoft.com) User-agent: WebZIP/5.0 User-agent: WebZIP/5.0 (http://www.spidersoft.com) User-agent: WebZIP/5.0 PR1 (http://www.spidersoft.com) User-agent: WebZip User-agent: WebZip/4.0 User-agent: WebmasterWorldForumBot User-agent: Website Quester User-agent: Website Quester - www.asona.org User-agent: Website Quester - www.esalesbiz.com/extra/ User-agent: Website eXtractor User-agent: Website eXtractor (http://www.asona.org) User-agent: Website.Quester User-agent: Webster Pro User-agent: Webster.Pro User-agent: Wget User-agent: Wget/1.5.2 User-agent: Wget/1.5.3 User-agent: Wget/1.6 User-agent: Wget/1.7 User-agent: Wget/1.8 User-agent: Wget/1.8.1 User-agent: Wget/1.8.1+cvs User-agent: Wget/1.8.2 User-agent: Wget/1.9-beta User-agent: Widow User-agent: Xaldon WebSpider User-agent: Xaldon WebSpider 2.5.b3 User-agent: Xenu's User-agent: Xenu's Link Sleuth 1.1c User-agent: Zeus User-agent: Zeus 11389 Webster Pro V2.9 Win32 User-agent: Zeus 11652 Webster Pro V2.9 Win32 User-agent: Zeus 18018 Webster Pro V2.9 Win32 User-agent: Zeus 26378 Webster Pro V2.9 Win32 User-agent: Zeus 30747 Webster Pro V2.9 Win32 User-agent: Zeus 32297 Webster Pro V2.9 Win32 User-agent: Zeus 39206 Webster Pro V2.9 Win32 User-agent: Zeus 41641 Webster Pro V2.9 Win32 User-agent: Zeus 44238 Webster Pro V2.9 Win32 User-agent: Zeus 51070 Webster Pro V2.9 Win32 User-agent: Zeus 51674 Webster Pro V2.9 Win32 User-agent: Zeus 51837 Webster 
Pro V2.9 Win32 User-agent: Zeus 63567 Webster Pro V2.9 Win32 User-agent: Zeus 6694 Webster Pro V2.9 Win32 User-agent: Zeus 71129 Webster Pro V2.9 Win32 User-agent: Zeus 82016 Webster Pro V2.9 Win32 User-agent: Zeus 82900 Webster Pro V2.9 Win32 User-agent: Zeus 84842 Webster Pro V2.9 Win32 User-agent: Zeus 90872 Webster Pro V2.9 Win32 User-agent: Zeus 94934 Webster Pro V2.9 Win32 User-agent: Zeus 95245 Webster Pro V2.9 Win32 User-agent: Zeus 95351 Webster Pro V2.9 Win32 User-agent: Zeus 97371 Webster Pro V2.9 Win32 User-agent: Zeus Link Scout User-agent: asterias User-agent: b2w/0.1 User-agent: cosmos User-agent: eCatch User-agent: eCatch/3.0 User-agent: hloader User-agent: httplib User-agent: humanlinks User-agent: larbin User-agent: larbin (samualt9@bigfoot.com) User-agent: larbin samualt9@bigfoot.com User-agent: larbin_2.6.2 (kabura@sushi.com) User-agent: larbin_2.6.2 (larbin2.6.2@unspecified.mail) User-agent: larbin_2.6.2 (listonATccDOTgatechDOTedu) User-agent: larbin_2.6.2 (vitalbox1@hotmail.com) User-agent: larbin_2.6.2 kabura@sushi.com User-agent: larbin_2.6.2 larbin2.6.2@unspecified.mail User-agent: larbin_2.6.2 larbin@correa.org User-agent: larbin_2.6.2 listonATccDOTgatechDOTedu User-agent: larbin_2.6.2 vitalbox1@hotmail.com User-agent: libWeb/clsHTTP User-agent: lwp-trivial User-agent: lwp-trivial/1.34 User-agent: moget User-agent: moget/2.1 User-agent: pavuk User-agent: pcBrowser User-agent: psbot User-agent: searchpreview User-agent: spanner User-agent: suzuran User-agent: tAkeOut User-agent: toCrawl/UrlDispatcher User-agent: turingos User-agent: webfetch/2.1.0 User-agent: wget
Disallow: /

User-agent: *
Disallow: /newsgroup/
Disallow: /bookstore/
Disallow: /translate/

What appears to have happened is that the stupid software read the instructions for the other robots and took that as instructions for itself. OK, that's stupid, but all programs and all programmers are stupid. That's just the way that life is.
No reason to get upset about that. OK, I'm out a bunch of revenue for at least six months due to this bug. No big deal: other Google bugs in the past have cost me well into the six-digit range, and this one will only cost me a few thousand dollars. What really chaps my hide is that there is no way to report this bug to Google. That's irresponsible corporate arrogance that passes far into the realm of complete stupidity. If you can't accept criticism, you can't improve. Google's complaint department appears to be open only to the drooling sycophants who hang on Matt Cutts's every word, feeding his ego while hoping for scraps of insider information. It's a sad way to run a major multinational corporation. Google needs some adult supervision.
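For anyone who wants to check how that robots.txt ought to be read: under the usual grouping rule, consecutive User-agent lines share the Disallow lines that follow them, so the "Disallow: /" applies only to the robots named in that group, and Googlebot should fall through to the "User-agent: *" group. Here is a minimal sketch using Python's standard urllib.robotparser. The file is shortened to four of the listed agents, and the www.example.com host, the sample page paths, and the helper name `allowed` are placeholders of mine. It shows how one standard parser reads the file, not how the URL Removal Console parsed it.

```python
from urllib.robotparser import RobotFileParser

# Shortened stand-in for the robots.txt in the post: a few of the named
# robots share one "Disallow: /", and all other robots get the "*" group.
ROBOTS_TXT = """\
User-agent: Alexibot
User-agent: Googlebot-Image
User-agent: WebZIP
User-agent: wget
Disallow: /

User-agent: *
Disallow: /newsgroup/
Disallow: /bookstore/
Disallow: /translate/
"""


def allowed(agent, path):
    """Return True if `agent` may fetch `path` under ROBOTS_TXT.

    The host is a placeholder; only the path is compared against the rules.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, "http://www.example.com" + path)


if __name__ == "__main__":
    for agent, path in [
        ("Googlebot", "/index.html"),
        ("Googlebot", "/bookstore/item.html"),
        ("Googlebot-Image", "/index.html"),
        ("wget", "/index.html"),
    ]:
        print(agent, path, allowed(agent, path))
```

With this parser, Googlebot is blocked only from the three directories in the "*" group, while the named robots are blocked from everything. A tool that removed the whole site for Googlebot would be applying the first group's rule to a robot that is not listed in it.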
Sad to hear that, Will, and I totally agree with your views regarding Google or, for that matter, any big corporate body. 100% true.
Here's another bug I have found in the URL removal console. Looking at my processed requests, I see these two requests:

2006-10-17 15:44:12 GMT : removal of image http://www.example.com/example. complete
2006-10-17 15:44:12 GMT : removal of image http://www.example.com/example.shtm complete

Well, one of these requests was processed incorrectly and deleted http://www.example.com/shtml -- which matches neither string. The obvious conclusion is that the URL removal console matches "string*" and that "example." matched "example.shtml". That would be reasonable, but it's not true. If you want to delete a subdirectory, such as /store/, you have to delete every file in that subdirectory manually. Deleting "/store/" does not work.

From reading the minimal documentation, it really appears that Google has very little idea how their own tools work. Perhaps this tool was written by a contractor who left the organization years ago. That would not be so bad, except that:

There is no mechanism to preview deletion requests before they are submitted for processing.
There is no mechanism to cancel deletion requests before they are processed.
There is no mechanism for undoing deletion requests after they are processed -- you have to wait a minimum of six months for Google to clear your pages out of the URL Removal Console purgatory.

OK, that's bad -- but this is worse. If you have personal friends at Google, you can bypass this entire process and have your pages reincluded automatically. That's what happened when WebMasterWorld deleted their forum using the Google URL Removal Console. The rest of us, however, have to wait. We can't even get in the line to bribe Matt Cutts.
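To make the "string*" hypothesis concrete, here is a small sketch of what plain prefix matching would predict. Everything in it is illustrative: the function name `prefix_match` and the list of site URLs are made up for the example, and nothing here is known about how the console actually matches requests.

```python
# Hypothetical pages on the site, used only to illustrate the hypothesis.
SITE_URLS = [
    "http://www.example.com/example.shtml",
    "http://www.example.com/store/index.html",
    "http://www.example.com/store/item1.html",
    "http://www.example.com/about.html",
]


def prefix_match(request, urls):
    """Return every URL that starts with the requested string ("string*")."""
    return [url for url in urls if url.startswith(request)]


if __name__ == "__main__":
    # Under prefix matching, "example." would catch "example.shtml" ...
    print(prefix_match("http://www.example.com/example.", SITE_URLS))
    # ... but a "/store/" request would also catch every page under /store/.
    print(prefix_match("http://www.example.com/store/", SITE_URLS))
```

If the console really used prefix matching, a single "/store/" request would remove both store pages. Since that request does nothing, simple prefix matching does not explain both observations.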
At the bottom of the Google URL Removal Console page, it says: The contact page says: Google Inc. 1600 Amphitheatre Parkway Mountain View, CA 94043 phone: (650) 253-0000 fax: (650) 253-0001 The operator at that number refused to transfer my call and refused to identify herself in any way (operator number, name, etc...). She did take my name and number. Bets on whether I get a call back? It is completely moronic to publish a piece of software without any method for users to report bugs in that software.
This is a serious bug! (The first one.) Any website that has a similar robots.txt setup can have its pages removed. From what I understand of the tool, anyone can make the request to have a page removed.
w00 h00! The functionality of Google's URL Removal Console has been merged into Google's Webmaster Tools and it looks like the code has been completely rewritten. See the announcement here: http://googlewebmastercentral.blogspot.com/2007/04/requesting-removal-of-content-from-our.html
I second the motion. I have seen people like them here in DP, praising every word from Matt whatt? No offence to them, but they are so arrogant that they don't use their own heads or stand up for their own opinions. Hope they won't read this.