When we moved to Microsoft Phone System I was psyched: my phone would follow me around (who wouldn’t want that?), it would work on any device I carry, it would notify me in my inbox of missed calls, with the caller’s details if I have them in my contact book, and it would transcribe my voicemail so I wouldn’t have to listen to it anymore.
Actually, my voicemail greeting says not to leave a voicemail and to just send me an email, because I never check my voicemail anyway. I thought Microsoft Phone System’s Azure VoiceMail transcription would help with that.
After a couple of months of usage, it turns out that whatever engine Microsoft uses to transcribe audio to text is utter rubbish:
- it sometimes misses the whole transcription, even when the audio is clear enough
- it usually doesn’t transcribe the start of the VoiceMail, so you’d miss the name of the caller
- it doesn’t understand Microsoft jargon
- it transcribes words that don’t even exist, at least not in my dictionary
It sucks so much that I actually have to listen to the voicemail to get an idea of what Azure VM was trying to transcribe. In its defence, the transcription does seem to get better the deeper into the voicemail it goes.
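So I took the audio file and ran it through a few other online transcription services I found by duckduckgo-ing around. They all did better! Here is the Levenshtein distance between my own transcription (the original) and each service’s output, where lower is better: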
- Original vs Azure VM
  - Levenshtein: 393
- Original vs Sonix
  - Levenshtein: 224
- Original vs Temi
  - Levenshtein: 120
I am not familiar with the other providers, but overall the transcription quality from all of them is so much more accurate.
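For anyone unfamiliar with the metric: the Levenshtein distance between two texts is the minimum number of single-character insertions, deletions, and substitutions needed to turn one into the other, so a smaller number means a transcription closer to the original. Below is a minimal Python sketch of the standard dynamic-programming approach. It is only an illustration: the function name `levenshtein` is mine, it compares character by character (an assumption, since word-level comparison is also common), and it is not necessarily the tool that produced the figures above.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] = distance between the empty prefix of a and b[:j]
    previous = list(range(len(b) + 1))

    for i, char_a in enumerate(a, start=1):
        # current[0] = distance between a[:i] and the empty string
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current

    return previous[-1]


# Classic textbook example: three edits separate these two words.
print(levenshtein("kitten", "sitting"))  # 3
```

To compare transcriptions this way, you would pass the reference text as one argument and a service’s output as the other. Normalising case and punctuation first is a reasonable choice, but it changes the resulting numbers.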
I believe Microsoft is missing out on a lot. I created a support ticket with Microsoft. It remained in the queue for 4 months, after which they finally brushed me off with the promise of a new engine coming in late 2018. I am still waiting for it as of the date of this post.
In the age of AI and contextually rich information, Azure VoiceMail could really use grammar and spelling engines on top of a better transcription engine.