Yesterday Microsoft announced what it calls a historic achievement: a technology that recognises the words in a conversation as well as a person does. A human transcribing a conversation typically gets about 5.9 percent of the words wrong, and Microsoft's new system now matches that error rate.
The new figure improves on the word error rate (WER) of 6.3 percent that the Microsoft research team reported last month. According to Microsoft, the milestone will have broad implications for consumer and business products that speech recognition can significantly augment, including consumer entertainment devices like the Xbox, accessibility tools such as instant speech-to-text transcription, and personal digital assistants such as Cortana.
Although the technology is on par with human transcribers, it still needs further development before it is robust enough for real-world scenarios, such as use on the streets by security agencies. Microsoft also needs to make it work with several users at the same time.
The machine does not recognise every word perfectly either, though in practice neither do humans. Still, people retain an edge, and closing that gap is another area Microsoft is working on. The computer mishears words, for example "have" as "is" or "a" as "the", at about the same rate you would expect from a person listening to the same conversation.
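Word error rate, the metric behind the 5.9 and 6.3 percent figures, counts the substituted, deleted and inserted words needed to turn a system's transcript into a correct reference transcript, then divides by the number of words in the reference. A mishearing such as "have" for "is" counts as one substitution. The sketch below is a minimal illustration only, assuming simple whitespace tokenisation and no text normalisation; it is not Microsoft's scoring tool, and the function name and example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in the reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = fewest edits turning the first i reference words
    # into the first j hypothesis words (word-level edit distance)
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    reference = "we have a meeting with the team today"
    hypothesis = "we is a meeting with a team today"
    # Two substitutions ("have" -> "is", "the" -> "a") out of 8 reference words
    print(f"WER = {word_error_rate(reference, hypothesis):.3f}")  # prints "WER = 0.250"
```

At a 5.9 percent WER, roughly six words in every hundred of the reference transcript would need to be substituted, deleted or inserted.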