(Potential) bug in Google's URL removal console

Discussion in 'Google' started by Will.Spencer, Nov 14, 2006.

  1. #1
    I've seen lots of odd behavior from Google's URL Removal Console.

    It's deleted pages that should not have been deleted -- and it's failed to delete many pages it was specifically asked to delete.

    This looks like a bug to me, but I could be misreading the robots.txt spec.

    I chose to delete pages based upon a robots.txt file.

    The software misread the robots.txt file somehow and deleted the entire web site.

    Here's the code:
    
    User-agent: Alexibot
    User-agent: Aqua_Products
    User-agent: BackDoorBot
    User-agent: BackDoorBot/1.0
    User-agent: Black.Hole
    User-agent: BlackWidow
    User-agent: BlowFish
    User-agent: BlowFish/1.0
    User-agent: Bookmark search tool
    User-agent: Bot mailto:craftbot@yahoo.com
    User-agent: BotALot
    User-agent: BotRightHere
    User-agent: BuiltBotTough
    User-agent: Bullseye
    User-agent: Bullseye/1.0
    User-agent: BunnySlippers
    User-agent: Cegbfeieh
    User-agent: CheeseBot
    User-agent: CherryPicker
    User-agent: CherryPickerElite/1.0
    User-agent: CherryPickerSE/1.0
    User-agent: ChinaClaw
    User-agent: Copernic
    User-agent: CopyRightCheck
    User-agent: Crescent
    User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
    User-agent: Custo
    User-agent: DISCo
    User-agent: DISCo Pump 3.0
    User-agent: DISCo Pump 3.2
    User-agent: DISCoFinder
    User-agent: DittoSpyder
    User-agent: Download Demon
    User-agent: Download Demon/3.2.0.8
    User-agent: Download Demon/3.5.0.11
    User-agent: EirGrabber
    User-agent: EmailCollector
    User-agent: EmailSiphon
    User-agent: EmailWolf
    User-agent: EroCrawler
    User-agent: Express WebPictures
    User-agent: Express WebPictures (www.express-soft.com)
    User-agent: ExtractorPro
    User-agent: EyeNetIE
    User-agent: FairAd Client
    User-agent: Flaming AttackBot
    User-agent: FlashGet
    User-agent: FlashGet WebWasher 3.2
    User-agent: Foobot
    User-agent: FrontPage
    User-agent: FrontPage [NC,OR]
    User-agent: Gaisbot
    User-agent: GetRight
    User-agent: GetRight/2.11
    User-agent: GetRight/3.1
    User-agent: GetRight/3.2
    User-agent: GetRight/3.3
    User-agent: GetRight/3.3.3
    User-agent: GetRight/3.3.4
    User-agent: GetRight/4.0.0
    User-agent: GetRight/4.1.0
    User-agent: GetRight/4.1.1
    User-agent: GetRight/4.1.2
    User-agent: GetRight/4.2
    User-agent: GetRight/4.2b (Portuguxeas)
    User-agent: GetRight/4.2c
    User-agent: GetRight/4.3
    User-agent: GetRight/4.5
    User-agent: GetRight/4.5a
    User-agent: GetRight/4.5b
    User-agent: GetRight/4.5b1
    User-agent: GetRight/4.5b2
    User-agent: GetRight/4.5b3
    User-agent: GetRight/4.5b6
    User-agent: GetRight/4.5b7
    User-agent: GetRight/4.5c
    User-agent: GetRight/4.5d
    User-agent: GetRight/4.5e
    User-agent: GetRight/5.0beta1
    User-agent: GetRight/5.0beta2
    User-agent: GetWeb!
    User-agent: Go!Zilla
    User-agent: Go!Zilla (www.gozilla.com)
    User-agent: Go!Zilla 3.3 (www.gozilla.com)
    User-agent: Go!Zilla 3.5 (www.gozilla.com)
    User-agent: Go-Ahead-Got-It
    User-agent: Googlebot-Image
    User-agent: GrabNet
    User-agent: Grafula
    User-agent: HMView
    User-agent: HTTrack
    User-agent: HTTrack 3.0
    User-agent: HTTrack [NC,OR]
    User-agent: Harvest
    User-agent: Harvest/1.5
    User-agent: Image Stripper
    User-agent: Image Sucker
    User-agent: Indy Library
    User-agent: Indy Library [NC,OR]
    User-agent: InfoNaviRobot
    User-agent: InterGET
    User-agent: Internet Ninja
    User-agent: Internet Ninja 4.0
    User-agent: Internet Ninja 5.0
    User-agent: Internet Ninja 6.0
    User-agent: Iron33/1.0.2
    User-agent: JOC Web Spider
    User-agent: JennyBot
    User-agent: JetCar
    User-agent: Kenjin Spider
    User-agent: Kenjin.Spider
    User-agent: Keyword Density/0.9
    User-agent: Keyword.Density
    User-agent: LNSpiderguy
    User-agent: LeechFTP
    User-agent: LexiBot
    User-agent: LinkScan/8.1a Unix
    User-agent: LinkScan/8.1a.Unix
    User-agent: LinkWalker
    User-agent: LinkextractorPro
    User-agent: MIDown tool
    User-agent: MIIxpc
    User-agent: MIIxpc/4.2
    User-agent: MSIECrawler
    User-agent: Mass Downloader
    User-agent: Mass Downloader/2.2
    User-agent: Mata Hari
    User-agent: Mata.Hari
    User-agent: Microsoft URL Control
    User-agent: Microsoft URL Control - 5.01.4511
    User-agent: Microsoft URL Control - 6.00.8169
    User-agent: Microsoft.URL
    User-agent: Mister PiX
    User-agent: Mister PiX version.dll
    User-agent: Mister Pix II 2.01
    User-agent: Mister Pix II 2.02a
    User-agent: Mister.PiX
    User-agent: NICErsPRO
    User-agent: NPBot
    User-agent: NPbot
    User-agent: Navroad
    User-agent: NearSite
    User-agent: Net Vampire
    User-agent: Net Vampire/3.0
    User-agent: NetAnts
    User-agent: NetAnts/1.10
    User-agent: NetAnts/1.23
    User-agent: NetAnts/1.24
    User-agent: NetAnts/1.25
    User-agent: NetMechanic
    User-agent: NetSpider
    User-agent: NetZIP
    User-agent: NetZip Downloader 1.0 Win32(Nov 12 1998)
    User-agent: NetZip-Downloader/1.0.62 (Win32; Dec 7 1998)
    User-agent: NetZippy+(http://www.innerprise.net/usp-spider.asp)
    User-agent: Octopus
    User-agent: Offline Explorer
    User-agent: Offline Explorer/1.2
    User-agent: Offline Explorer/1.4
    User-agent: Offline Explorer/1.6
    User-agent: Offline Explorer/1.7
    User-agent: Offline Explorer/1.9
    User-agent: Offline Explorer/2.0
    User-agent: Offline Explorer/2.1
    User-agent: Offline Explorer/2.3
    User-agent: Offline Explorer/2.4
    User-agent: Offline Explorer/2.5
    User-agent: Offline Navigator
    User-agent: Offline.Explorer
    User-agent: Openbot
    User-agent: Openfind
    User-agent: Openfind data gatherer
    User-agent: Oracle Ultra Search
    User-agent: PageGrabber
    User-agent: Papa Foto
    User-agent: PerMan
    User-agent: ProPowerBot/2.14
    User-agent: ProWebWalker
    User-agent: Python-urllib
    User-agent: QueryN Metasearch
    User-agent: QueryN.Metasearch
    User-agent: RMA
    User-agent: Radiation Retriever 1.1
    User-agent: ReGet
    User-agent: RealDownload
    User-agent: RealDownload/4.0.0.40
    User-agent: RealDownload/4.0.0.41
    User-agent: RealDownload/4.0.0.42
    User-agent: RepoMonkey
    User-agent: RepoMonkey Bait & Tackle/v1.01
    User-agent: SiteSnagger
    User-agent: SlySearch
    User-agent: SmartDownload
    User-agent: SmartDownload/1.2.76 (Win32; Apr 1 1999)
    User-agent: SmartDownload/1.2.77 (Win32; Aug 17 1999)
    User-agent: SmartDownload/1.2.77 (Win32; Feb 1 2000)
    User-agent: SmartDownload/1.2.77 (Win32; Jun 19 2001)
    User-agent: SpankBot
    User-agent: Sqworm/2.9.85-BETA (beta_release; 20011115-775; i686-pc-linux
    User-agent: SuperBot
    User-agent: SuperBot/3.0 (Win32)
    User-agent: SuperBot/3.1 (Win32)
    User-agent: SuperHTTP
    User-agent: SuperHTTP/1.0
    User-agent: Surfbot
    User-agent: Szukacz/1.4
    User-agent: Teleport
    User-agent: Teleport Pro
    User-agent: Teleport Pro/1.29
    User-agent: Teleport Pro/1.29.1590
    User-agent: Teleport Pro/1.29.1634
    User-agent: Teleport Pro/1.29.1718
    User-agent: Teleport Pro/1.29.1820
    User-agent: Teleport Pro/1.29.1847
    User-agent: TeleportPro
    User-agent: Telesoft
    User-agent: The Intraformant
    User-agent: The.Intraformant
    User-agent: TheNomad
    User-agent: TightTwatBot
    User-agent: Titan
    User-agent: True_Robot
    User-agent: True_Robot/1.0
    User-agent: TurnitinBot
    User-agent: TurnitinBot/1.5
    User-agent: URL Control
    User-agent: URL_Spider_Pro
    User-agent: URLy Warning
    User-agent: URLy.Warning
    User-agent: VCI
    User-agent: VCI WebViewer VCI WebViewer Win32
    User-agent: VoidEYE
    User-agent: WWW-Collector-E
    User-agent: WWWOFFLE
    User-agent: Web Image Collector
    User-agent: Web Sucker
    User-agent: Web.Image.Collector
    User-agent: WebAuto
    User-agent: WebAuto/3.40 (Win98; I)
    User-agent: WebBandit
    User-agent: WebBandit/3.50
    User-agent: WebCapture 2.0
    User-agent: WebCopier
    User-agent: WebCopier v.2.2
    User-agent: WebCopier v2.5
    User-agent: WebCopier v2.6
    User-agent: WebCopier v2.7a
    User-agent: WebCopier v2.8
    User-agent: WebCopier v3.0
    User-agent: WebCopier v3.0.1
    User-agent: WebCopier v3.2
    User-agent: WebCopier v3.2a
    User-agent: WebEMailExtrac.*
    User-agent: WebEnhancer
    User-agent: WebFetch
    User-agent: WebGo IS
    User-agent: WebLeacher
    User-agent: WebReaper
    User-agent: WebReaper [info@webreaper.net]
    User-agent: WebReaper [webreaper@otway.com]
    User-agent: WebReaper v9.1 - www.otway.com/webreaper
    User-agent: WebReaper v9.7 - www.webreaper.net
    User-agent: WebReaper v9.8 - www.webreaper.net
    User-agent: WebReaper vWebReaper v7.3 - www,otway.com/webreaper
    User-agent: WebSauger
    User-agent: WebSauger 1.20b
    User-agent: WebSauger 1.20j
    User-agent: WebSauger 1.20k
    User-agent: WebStripper
    User-agent: WebStripper/2.03
    User-agent: WebStripper/2.10
    User-agent: WebStripper/2.12
    User-agent: WebStripper/2.13
    User-agent: WebStripper/2.15
    User-agent: WebStripper/2.16
    User-agent: WebStripper/2.19
    User-agent: WebWhacker
    User-agent: WebZIP
    User-agent: WebZIP/2.75 (http://www.spidersoft.com)
    User-agent: WebZIP/3.65 (http://www.spidersoft.com)
    User-agent: WebZIP/3.80 (http://www.spidersoft.com)
    User-agent: WebZIP/4.0 (http://www.spidersoft.com)
    User-agent: WebZIP/4.1 (http://www.spidersoft.com)
    User-agent: WebZIP/4.21
    User-agent: WebZIP/4.21 (http://www.spidersoft.com)
    User-agent: WebZIP/5.0
    User-agent: WebZIP/5.0 (http://www.spidersoft.com)
    User-agent: WebZIP/5.0 PR1 (http://www.spidersoft.com)
    User-agent: WebZip
    User-agent: WebZip/4.0
    User-agent: WebmasterWorldForumBot
    User-agent: Website Quester
    User-agent: Website Quester - www.asona.org
    User-agent: Website Quester - www.esalesbiz.com/extra/
    User-agent: Website eXtractor
    User-agent: Website eXtractor (http://www.asona.org)
    User-agent: Website.Quester
    User-agent: Webster Pro
    User-agent: Webster.Pro
    User-agent: Wget
    User-agent: Wget/1.5.2
    User-agent: Wget/1.5.3
    User-agent: Wget/1.6
    User-agent: Wget/1.7
    User-agent: Wget/1.8
    User-agent: Wget/1.8.1
    User-agent: Wget/1.8.1+cvs
    User-agent: Wget/1.8.2
    User-agent: Wget/1.9-beta
    User-agent: Widow
    User-agent: Xaldon WebSpider
    User-agent: Xaldon WebSpider 2.5.b3
    User-agent: Xenu's
    User-agent: Xenu's Link Sleuth 1.1c
    User-agent: Zeus
    User-agent: Zeus 11389 Webster Pro V2.9 Win32
    User-agent: Zeus 11652 Webster Pro V2.9 Win32
    User-agent: Zeus 18018 Webster Pro V2.9 Win32
    User-agent: Zeus 26378 Webster Pro V2.9 Win32
    User-agent: Zeus 30747 Webster Pro V2.9 Win32
    User-agent: Zeus 32297 Webster Pro V2.9 Win32
    User-agent: Zeus 39206 Webster Pro V2.9 Win32
    User-agent: Zeus 41641 Webster Pro V2.9 Win32
    User-agent: Zeus 44238 Webster Pro V2.9 Win32
    User-agent: Zeus 51070 Webster Pro V2.9 Win32
    User-agent: Zeus 51674 Webster Pro V2.9 Win32
    User-agent: Zeus 51837 Webster Pro V2.9 Win32
    User-agent: Zeus 63567 Webster Pro V2.9 Win32
    User-agent: Zeus 6694 Webster Pro V2.9 Win32
    User-agent: Zeus 71129 Webster Pro V2.9 Win32
    User-agent: Zeus 82016 Webster Pro V2.9 Win32
    User-agent: Zeus 82900 Webster Pro V2.9 Win32
    User-agent: Zeus 84842 Webster Pro V2.9 Win32
    User-agent: Zeus 90872 Webster Pro V2.9 Win32
    User-agent: Zeus 94934 Webster Pro V2.9 Win32
    User-agent: Zeus 95245 Webster Pro V2.9 Win32
    User-agent: Zeus 95351 Webster Pro V2.9 Win32
    User-agent: Zeus 97371 Webster Pro V2.9 Win32
    User-agent: Zeus Link Scout
    User-agent: asterias
    User-agent: b2w/0.1
    User-agent: cosmos
    User-agent: eCatch
    User-agent: eCatch/3.0
    User-agent: hloader
    User-agent: httplib
    User-agent: humanlinks
    User-agent: larbin
    User-agent: larbin (samualt9@bigfoot.com)
    User-agent: larbin samualt9@bigfoot.com
    User-agent: larbin_2.6.2 (kabura@sushi.com)
    User-agent: larbin_2.6.2 (larbin2.6.2@unspecified.mail)
    User-agent: larbin_2.6.2 (listonATccDOTgatechDOTedu)
    User-agent: larbin_2.6.2 (vitalbox1@hotmail.com)
    User-agent: larbin_2.6.2 kabura@sushi.com
    User-agent: larbin_2.6.2 larbin2.6.2@unspecified.mail
    User-agent: larbin_2.6.2 larbin@correa.org
    User-agent: larbin_2.6.2 listonATccDOTgatechDOTedu
    User-agent: larbin_2.6.2 vitalbox1@hotmail.com
    User-agent: libWeb/clsHTTP
    User-agent: lwp-trivial
    User-agent: lwp-trivial/1.34
    User-agent: moget
    User-agent: moget/2.1
    User-agent: pavuk
    User-agent: pcBrowser
    User-agent: psbot
    User-agent: searchpreview
    User-agent: spanner
    User-agent: suzuran
    User-agent: tAkeOut
    User-agent: toCrawl/UrlDispatcher
    User-agent: turingos
    User-agent: webfetch/2.1.0
    User-agent: wget
    Disallow: /
    
    User-agent: *
    Disallow: /newsgroup/
    Disallow: /bookstore/
    Disallow: /translate/
    
    What appears to have happened is that the stupid software read the instructions meant for the other robots and took them as instructions for itself.
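
For comparison, here is a minimal sketch of how one widely used parser, Python's standard `urllib.robotparser`, reads a shortened excerpt of the file above. It is not Google's code and says nothing about what the URL Removal Console does internally. The host `www.example.com`, the page paths, and the plain `Googlebot` agent name are placeholders chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# Shortened excerpt of the robots.txt posted above: one group that names
# several robots and bans them from everything, then a catch-all group.
ROBOTS_TXT = """\
User-agent: BlackWidow
User-agent: Googlebot-Image
User-agent: Wget
Disallow: /

User-agent: *
Disallow: /newsgroup/
Disallow: /bookstore/
Disallow: /translate/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder URLs, not the real site.
home = "http://www.example.com/index.html"
newsgroup = "http://www.example.com/newsgroup/index.html"

results = {
    "Wget home": parser.can_fetch("Wget", home),
    "Googlebot home": parser.can_fetch("Googlebot", home),
    "Googlebot newsgroup": parser.can_fetch("Googlebot", newsgroup),
    "Googlebot-Image home": parser.can_fetch("Googlebot-Image", home),
}
for label, allowed in results.items():
    print(f"{label}: {'allowed' if allowed else 'blocked'}")
```

With this parser, the `Disallow: /` rule applies only to the robots named in the first group. A robot that is not named there, such as plain Googlebot, falls through to the `User-agent: *` group and is blocked only from the three listed directories.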

    OK, that's stupid, but all programs and all programmers are stupid. That's just the way that life is. No reason to get upset about that.

    OK, I'm out a bunch of revenue for at least six months due to this bug. No big deal; other Google bugs in the past have cost me well into six digits, and this one will only cost me a few thousand dollars.

    What really chaps my hide is that there is no way to report this bug to Google. That's irresponsible corporate arrogance that passes far into the realm of complete stupidity.

    If you can't accept criticism, you can't improve. Google's complaint department appears to be open only to the drooling sycophants who hang on Matt Cutts' every word, feeding his ego while hoping for scraps of insider information.

    It's a sad way to run a major multinational corporation. Google needs some adult supervision.
     
    Will.Spencer, Nov 14, 2006
  2. max pain

    max pain Notable Member

    Messages:
    2,179
    Likes Received:
    521
    Best Answers:
    0
    Trophy Points:
    260
    #2
    Sad to hear that, Will, and I totally agree with your views regarding Google or, for that matter, any big corporate body.

    100% True
     
    max pain, Nov 14, 2006
  3. Will.Spencer

    Will.Spencer NetBuilder

    Messages:
    14,789
    Likes Received:
    1,040
    Best Answers:
    0
    Trophy Points:
    375
    #3
    Here's another bug I have found in the URL removal console.

    Looking at my processed requests, I see these two requests:

    
    2006-10-17 15:44:12 GMT :
    removal of image http://www.example.com/example.
    complete
    2006-10-17 15:44:12 GMT :
    removal of image http://www.example.com/example.shtm
    complete
    
    Well, one of these requests was processed incorrectly and deleted http://www.example.com/example.shtml -- which matches neither string.

    The obvious conclusion is that the URL removal console matches "string*" and that "example." matched "example.shtml."

    That would be reasonable, but it's not true. If you want to delete a subdirectory, such as /store/, you have to delete every file in that subdirectory manually. Deleting "/store/" does not work.
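
A minimal sketch of the two matching rules being compared here, using the example URLs from the log above. The function names `exact_match` and `prefix_match` and the `/store/item.html` path are hypothetical, made up for illustration; nothing here is taken from Google's tool.

```python
def exact_match(request: str, url: str) -> bool:
    """A request removes only the URL that is identical to it."""
    return request == url


def prefix_match(request: str, url: str) -> bool:
    """A request removes every URL that starts with it ("string*")."""
    return url.startswith(request)


# The two requests shown in the processed-requests log.
requests = [
    "http://www.example.com/example.",
    "http://www.example.com/example.shtm",
]
# The page that actually disappeared.
deleted = "http://www.example.com/example.shtml"

exact_hits = [r for r in requests if exact_match(r, deleted)]
prefix_hits = [r for r in requests if prefix_match(r, deleted)]

# Under the same prefix rule, a request for a directory would also
# cover a file inside it (hypothetical path).
store_hit = prefix_match(
    "http://www.example.com/store/",
    "http://www.example.com/store/item.html",
)

print("exact:", exact_hits)
print("prefix:", prefix_hits)
print("store covered by prefix rule:", store_hit)
```

Exact matching explains none of the deletion, while prefix matching explains it for both requests. The same prefix rule would also make a "/store/" request cover everything under that directory, which is not what the console does, so neither rule accounts for all of the observed behavior.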

    From reading the minimal documentation, it really appears that Google has very little idea how their own tools work. Perhaps this tool was written by a contractor who left the organization years ago.

    That would not be so bad, except that:

    • There is no mechanism to preview deletion requests before they are submitted for processing.
    • There is no mechanism to cancel deletion requests before they are processed.
    • There is no mechanism for undoing deletion requests after they are processed -- you have to wait a minimum of six months for Google to clear your pages out of the URL Removal Console purgatory.

    OK, that's bad -- but this is worse. If you have personal friends at Google, you can bypass this entire process and have your pages reincluded automatically. That's what happened when WebmasterWorld deleted their forum using the Google URL Removal Console.

    The rest of us, however, have to wait. We can't even get in the line to bribe Matt Cutts.
     
    Will.Spencer, Nov 14, 2006
  4. Will.Spencer

    #4
    At the bottom of the Google URL Removal Console page, it says:
    The contact page says:
    Google Inc.
    1600 Amphitheatre Parkway
    Mountain View, CA 94043
    phone: (650) 253-0000
    fax: (650) 253-0001

    The operator at that number refused to transfer my call and refused to identify herself in any way (operator number, name, etc.).

    She did take my name and number. Bets on whether I get a call back? :rolleyes:

    It is completely moronic to publish a piece of software without any method for users to report bugs in that software.
     
    Will.Spencer, Nov 15, 2006
  5. freespace

    freespace Well-Known Member

    Messages:
    718
    Likes Received:
    15
    Best Answers:
    1
    Trophy Points:
    140
    #5
    This is a serious bug! (the first one)

    Any website with a similar robots.txt setup could have its pages removed.

    From what I understand of the tool, anyone can make the request to have a page removed.
     
    freespace, Nov 15, 2006
  6. Will.Spencer

    #6
    Will.Spencer, Apr 20, 2007
  7. godmode

    godmode Well-Known Member

    Messages:
    4,453
    Likes Received:
    156
    Best Answers:
    0
    Trophy Points:
    190
    #7
    Will, your hard work paid off :)
     
    godmode, Apr 20, 2007
  8. trichnosis

    trichnosis Prominent Member

    Messages:
    13,785
    Likes Received:
    333
    Best Answers:
    0
    Trophy Points:
    300
    #8
    Google Webmaster Tools has been very nice with this perfect URL removal tool. I loved its functionality ;)
     
    trichnosis, Apr 20, 2007
  9. brealmz

    brealmz Well-Known Member

    Messages:
    335
    Likes Received:
    24
    Best Answers:
    3
    Trophy Points:
    138
    #9


    I second the motion. I have seen people like them here on DP, praising every word from Matt what's-his-name. No offence to them, but they are so arrogant that they don't use their own heads or stand up for their own opinions. Hope they won't read this :D.
     
    brealmz, Apr 20, 2007