This release provides new, improved language packs for all 31 of Speechmatics' commercially available languages. Each language now contains a standard model and an enhanced model. The standard model is the default and offers the same or slightly improved accuracy compared with previous releases. The enhanced model is more accurate for all languages but must be explicitly requested in the configuration. It also requires more compute resources and specific hardware to run; see the quick start guide for our recommendations on running the enhanced model.
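The enhanced model is only used when the configuration asks for it. The sketch below builds a StartRecognition message that opts in. It assumes the model is selected through an `operating_point` field in `transcription_config` (with the standard model used when the field is absent) and uses a placeholder `audio_format`; confirm the exact field names and audio settings for this release in the quick start guide.

```python
import json

def start_recognition_message(language: str, enhanced: bool = False) -> str:
    """Build a StartRecognition message as a JSON string.

    Assumption: the model is chosen with an "operating_point" field in
    transcription_config; leaving it out falls back to the standard model.
    """
    transcription_config = {"language": language}
    if enhanced:
        # The enhanced model is never used unless it is requested explicitly.
        transcription_config["operating_point"] = "enhanced"
    message = {
        "message": "StartRecognition",
        # Placeholder audio settings (assumed); match them to your real audio.
        "audio_format": {
            "type": "raw",
            "encoding": "pcm_s16le",
            "sample_rate": 16000,
        },
        "transcription_config": transcription_config,
    }
    return json.dumps(message)

print(start_recognition_message("en", enhanced=True))
```

Keeping the opt-in behind a single flag mirrors the release's default: clients that never set it keep getting the standard model.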
Processors that support Advanced Vector Extensions 2 (AVX2) are now required to run the container, so that it can take advantage of the latest performance optimisations.
When using the enhanced model, we also recommend hardware that supports the AVX512_VNNI CPU flag for optimal processing performance. For more information, see the quick start guide.
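Before deploying, it can help to confirm that a host exposes the required CPU features. The following is a minimal sketch for x86 Linux hosts only, assuming feature flags are read from `/proc/cpuinfo`, where they appear as lowercase `avx2` and `avx512_vnni`; the function names are hypothetical helpers, not part of the container.

```python
from pathlib import Path

def cpu_flags(cpuinfo_text: str) -> set:
    """Collect CPU feature flags from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        # x86 Linux lists features on lines such as "flags : fpu sse avx2 ..."
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags

def check_host(cpuinfo_text: str) -> dict:
    flags = cpu_flags(cpuinfo_text)
    return {
        "avx2": "avx2" in flags,                # required to run the container
        "avx512_vnni": "avx512_vnni" in flags,  # recommended for the enhanced model
    }

if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(check_host(cpuinfo.read_text()))
```

A missing `avx2` flag means the container cannot run on that host; a missing `avx512_vnni` flag only means the enhanced model may run more slowly.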
The following are known issues in this release:
Issue ID | Summary | Detailed Description and Possible Workarounds |
---|---|---|
REQ-10634 | Entering "-" as an item in the additional vocab configuration causes the container to fail | Do not enter a "-" on its own in the Custom Dictionary, either as an additional vocab item or in the sounds_like property. Hyphens are still supported when entered as part of words or phrases |
REQ-13240 | The Chinese (cmn) container occasionally crashes when using certain additional vocabulary | Do not use whitespace characters in the sounds_like property of additional vocabulary entries |
REQ-16256 | Swapping audio between 8kHz and 16kHz causes a memory leak | Repeatedly swapping between 8kHz and 16kHz audio files can gradually increase memory usage over very long periods, eventually causing the container to crash. If memory usage becomes excessive in this scenario, we recommend restarting the container |
REQ-17771 | Wide-space Unicode characters in the Custom Dictionary cause jobs to fail | This is now fixed, and wide-space characters should now be accepted |
REQ-20261 | The Japanese language pack may output fewer punctuation marks in certain scenarios | In some cases, users may see fewer punctuation marks in Japanese transcripts. Adjusting the punctuation sensitivity setting may improve the output |
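The workarounds for REQ-10634 and REQ-13240 can be enforced on the client before a configuration is sent. The sketch below assumes additional vocabulary entries are either plain strings or objects with `content` and an optional `sounds_like` list; `vocab_problems` is a hypothetical helper, not part of any Speechmatics library.

```python
def vocab_problems(additional_vocab: list, language: str) -> list:
    """Report additional vocab entries that hit known issues REQ-10634 or REQ-13240."""
    problems = []
    for i, entry in enumerate(additional_vocab):
        if isinstance(entry, str):
            # Assumed shorthand: a bare string is treated as {"content": entry}.
            entry = {"content": entry}
        content = entry.get("content", "")
        sounds_like = entry.get("sounds_like", [])
        # REQ-10634: a lone "-" fails; hyphens inside words or phrases are fine.
        if content.strip() == "-" or any(s.strip() == "-" for s in sounds_like):
            problems.append(f"entry {i}: bare '-' is not allowed")
        # REQ-13240: whitespace in sounds_like can crash the cmn container.
        if language == "cmn" and any(any(ch.isspace() for ch in s) for s in sounds_like):
            problems.append(f"entry {i}: whitespace in sounds_like")
    return problems
```

Rejecting these entries up front avoids a container failure mid-session, which is harder to recover from than a validation error.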
The following issues have been resolved since the previous release:
Issue ID | Summary | Resolution Description |
---|---|---|
REQ-11135 | Unwanted hesitations in transcripts | For the English language pack, Speechmatics now tags hesitation words (such as 'umm') with the metadata tag "disfluency". Users can use this tag in post-processing to filter or analyse such words. This change does not make disfluencies more or less likely to be recognised in transcript output |
REQ-11136 | Transcripts were written directly to the Real-time Virtual Appliance and Container logs | For security reasons, transcripts are no longer written to the logs or persisted to disk, even temporarily |
REQ-14795 | Configuration information in the StartRecognitionMessage was not written to the logs | Transcription configuration information is now logged as part of the StartRecognitionMessage. Individual custom dictionary entries are redacted |
REQ-15515 | Internal buffer limit of 500 AddAudio messages/10 seconds of audio | The container now has an internal buffer. If you send audio faster than real time and exceed 500 AddAudio messages or 10 seconds of audio, you will not receive an audioAdded response until there is capacity again. Make sure your client connection is resilient so that audio is not dropped |
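The new "disfluency" tag (REQ-11135) lets clients drop hesitations in post-processing. This minimal sketch assumes each transcript result carries an `alternatives` list whose entries have `content` and an optional `tags` list, and that the top alternative decides whether a word is kept; check the output format documentation for the exact structure.

```python
def drop_disfluencies(results: list) -> list:
    """Return transcript results without words tagged as disfluencies.

    Assumes each result holds "alternatives", each with an optional "tags" list.
    """
    kept = []
    for result in results:
        alternatives = result.get("alternatives", [])
        tags = alternatives[0].get("tags", []) if alternatives else []
        if "disfluency" not in tags:
            kept.append(result)
    return kept

def to_text(results: list) -> str:
    # Join the top alternative of each result; punctuation spacing is simplified.
    return " ".join(r["alternatives"][0]["content"] for r in results if r.get("alternatives"))

sample = [
    {"type": "word", "alternatives": [{"content": "umm", "tags": ["disfluency"]}]},
    {"type": "word", "alternatives": [{"content": "hello"}]},
    {"type": "word", "alternatives": [{"content": "world"}]},
]
print(to_text(drop_disfluencies(sample)))
```

Filtering on the tag rather than on a fixed word list keeps the decision with the language pack, which already knows which tokens it treats as hesitations.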
These are the General Availability (GA) release notes for the Real-time ASR container images. The following languages are supported:
Container images are labelled using the following scheme, where language codes adhere to the ISO 639 standard:
rt-asr-transcriber-<language>:<version>
For example,
rt-asr-transcriber-en:1.4.0
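Deployment scripts that pull several language images can build the names from this scheme. The sketch below is a hypothetical helper; it assumes language codes are 2 or 3 lowercase letters (such as "en" or "cmn") and leaves out the registry host, which these notes do not specify.

```python
import re

def image_name(language: str, version: str) -> str:
    """Build an image name following rt-asr-transcriber-<language>:<version>."""
    # Assumption: language codes are lowercase ISO 639 codes of 2-3 letters.
    if not re.fullmatch(r"[a-z]{2,3}", language):
        raise ValueError(f"unexpected language code: {language!r}")
    return f"rt-asr-transcriber-{language}:{version}"

print(image_name("en", "1.4.0"))
```

Prefix the result with your registry path when pulling from the Speechmatics Docker registry.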
Docker 17.06.0 or later is required.
Pull the container image from the Speechmatics Docker registry.