The second method I used for measuring accuracy was to check text similarity.

Models: Google has a few different models for different use cases: phone call, video, command, and default. For my testing I used the video model because it seemed to be the most accurate of the bunch, even though it's a little more expensive than their default model. Amazon has a default model (which I used) and a niche medical model.

Punctuation: Although for Google this feature is only available in beta, all 3 APIs can automatically add punctuation to transcribed text. As one can imagine, this is a daunting task, because punctuation is sometimes subjective/ambiguous, and even humans listening to the same audio can punctuate it slightly differently.

Multichannel recognition & Speaker Diarization: This is the ability of ASR to distinguish between different sources of audio (e.g. a Zoom conference call) or, in the case of speaker diarization, to determine which speaker is saying what when multiple speakers share one recording. All 3 services offer this feature, which in turn allows them to generate time-stamped transcripts separated by speaker/channel. This can be extremely helpful when transcribing audio with sensitive data, such as certain customer service conversations or recordings in the medical field. In addition, Amazon also has the option to filter out personally identifiable information (PII).
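The text-similarity check mentioned above can be done in several ways; the post doesn't specify the exact metric, so the sketch below is just one plausible approach using Python's standard-library difflib, normalizing away case and punctuation so that formatting differences don't count against the ASR engine:

```python
from difflib import SequenceMatcher

def transcript_similarity(reference: str, hypothesis: str) -> float:
    """Return a 0..1 similarity score between a ground-truth
    transcript and an ASR output, ignoring case and punctuation."""
    def normalize(text: str) -> str:
        # Lowercase and drop punctuation so only the words matter.
        return "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return SequenceMatcher(None, normalize(reference), normalize(hypothesis)).ratio()

print(transcript_similarity("Hello, world!", "hello world"))  # 1.0
```

A ratio of 1.0 means the transcripts match exactly after normalization; lower values indicate insertions, deletions, or substitutions by the ASR engine.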
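The speaker-diarization feature described above effectively returns time-stamped words tagged with a speaker label; producing the per-speaker transcript is then a simple grouping step. The data shape below is a hypothetical, simplified stand-in for what these services actually return (their real responses are richer JSON with timestamps and confidence scores):

```python
from itertools import groupby

# Hypothetical simplified diarization output: (speaker_label, word)
# pairs in time order, as returned by a speaker-tagged transcript.
words = [
    ("spk_0", "hi"), ("spk_0", "there"),
    ("spk_1", "hello"),
    ("spk_0", "how"), ("spk_0", "are"), ("spk_0", "you"),
]

def to_turns(tagged_words):
    """Collapse consecutive words from the same speaker into turns."""
    return [(speaker, " ".join(word for _, word in group))
            for speaker, group in groupby(tagged_words, key=lambda t: t[0])]

for speaker, text in to_turns(words):
    print(f"{speaker}: {text}")
```

Running this prints one line per speaker turn ("spk_0: hi there", "spk_1: hello", "spk_0: how are you"), which is the speaker-separated transcript format the post describes.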