IBM and Opera (among others) are pushing for this so-called multimodal browser, which is already a W3C standard. I'm not sure exactly what the standard covers, but I'd like to see the user's voice streamed to the server as it is spoken, rather than submitted only once the utterance is finished. If voice processing happens on the server side, streaming lets the server start working on the audio before the sentence is even complete. If voice processing happens in the browser itself, it's kind of pointless anyway, because that limits what can be done: many browsers run on slow fixed-point processors (cell phones, PDAs).
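To make the streaming idea concrete, here's a minimal sketch of the client-side split: instead of uploading one big blob when the utterance ends, the audio is cut into small fixed-duration chunks that could be sent as they're captured (e.g. over HTTP chunked transfer encoding or a persistent socket). All names here (`chunk_audio`, `CHUNK_MS`, etc.) are hypothetical, not part of any standard.

```python
CHUNK_MS = 100           # send roughly 100 ms of audio at a time
SAMPLE_RATE = 8000       # 8 kHz telephone-quality audio
BYTES_PER_SAMPLE = 2     # 16-bit PCM
CHUNK_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_MS // 1000

def chunk_audio(pcm: bytes):
    """Yield fixed-size chunks so the server can start decoding early,
    instead of waiting for the whole utterance."""
    for offset in range(0, len(pcm), CHUNK_BYTES):
        yield pcm[offset:offset + CHUNK_BYTES]

# A real client would push each chunk to the server as soon as the
# microphone produces it; here we just demonstrate the split on a
# stand-in buffer of 3 seconds of silence.
utterance = bytes(SAMPLE_RATE * BYTES_PER_SAMPLE * 3)
chunks = list(chunk_audio(utterance))
```

The point is that each 100 ms chunk reaches the recognizer while the user is still talking, so server-side decoding overlaps with speaking instead of starting after it.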