Batch Container Release Notes

This is the documentation for a previous version of our product.

High Level Summary

This release provides new, improved language packs for all 31 of Speechmatics' commercially available languages, and each language pack now contains a standard and an enhanced model. The standard model is the default and offers the same or slightly improved accuracy compared with the previous release. The enhanced model is more accurate for all languages but must be explicitly requested in the configuration. It also needs more compute resources and specific hardware to run; see the Quick Start Guide for our recommendations on running the enhanced model.
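As a minimal sketch of requesting the enhanced model, the Python below builds a batch job configuration as a dictionary and prints it as JSON. The key names `type`, `transcription_config`, `language` and `operating_point`, and the value `"enhanced"`, are assumptions based on the Speechmatics job configuration format rather than something stated in these notes; the API Guide for version 8.2.0 is the authority on the exact schema.

```python
import json


def build_config(language: str, enhanced: bool = False) -> dict:
    """Return a batch job config; the standard model is used unless enhanced=True.

    ASSUMPTION: the key names and the "enhanced" value follow the Speechmatics
    job config format as we understand it; check the API Guide before use.
    """
    transcription_config = {"language": language}
    if enhanced:
        # The enhanced model must be requested explicitly; omitting the key
        # leaves the default (standard) model in place.
        transcription_config["operating_point"] = "enhanced"
    return {"type": "transcription", "transcription_config": transcription_config}


if __name__ == "__main__":
    print(json.dumps(build_config("en", enhanced=True), indent=2))
```

Leaving `enhanced` at its default keeps the standard model, so existing configurations need no change.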

Important Notices

Processors that support Advanced Vector Extensions 2 (AVX2) are now required to run the container in all scenarios, so that it can take advantage of the latest performance optimisations.

When using the enhanced model, we also recommend hardware that supports the AVX512_VNNI flag for optimal processing performance. The enhanced model has higher compute requirements and runs more slowly than the standard model. For more information, see the Quick Start Guide.
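One way to check a Linux host before deploying is to look for the `avx2` and `avx512_vnni` flags that the kernel reports in `/proc/cpuinfo`. The sketch below is an illustration only: it assumes an x86 Linux host whose kernel exposes these flag names, and the helpers `cpu_flags` and `check_host` are hypothetical.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect the CPU feature flags listed in Linux /proc/cpuinfo text."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        # x86 kernels list features on lines such as "flags : fpu vme ... avx2 ..."
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def check_host(cpuinfo_text: str) -> dict[str, bool]:
    """Report the two flags mentioned in the release notes."""
    flags = cpu_flags(cpuinfo_text)
    return {
        "avx2": "avx2" in flags,  # required in all scenarios
        "avx512_vnni": "avx512_vnni" in flags,  # recommended for the enhanced model
    }


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(check_host(cpuinfo.read_text()))
    else:
        print("/proc/cpuinfo not found; this check assumes a Linux host")
```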

What's New

8.2.0

  • New, improved language packs for all 31 languages. By default, each language pack contains both a standard and an enhanced model. The standard model is used with no change required from the user. To use the enhanced model, refer to the API Guide for details
  • General improvements in the recognition of pop culture terms in the English language pack
  • Removal of foreign characters from English and German language packs
  • Profanity tagging in Italian and Spanish
  • The Chinese Mandarin language pack now supports output in Traditional as well as Simplified Chinese. See the API Guide for how to select between them
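For the Mandarin change, the sketch below shows one way a job configuration could select the character set. It assumes an `output_locale` key with the values `cmn-Hans` (Simplified) and `cmn-Hant` (Traditional); these names are assumptions to be confirmed against the API Guide, and `mandarin_config` is a hypothetical helper.

```python
# ASSUMPTION: "output_locale" and the locale values below reflect our
# understanding of the Speechmatics config; confirm them in the API Guide.
OUTPUT_LOCALES = {"simplified": "cmn-Hans", "traditional": "cmn-Hant"}


def mandarin_config(script: str = "simplified") -> dict:
    """Build a hypothetical Mandarin job config for the requested character set."""
    if script not in OUTPUT_LOCALES:
        raise ValueError(f"unknown script {script!r}; expected one of {sorted(OUTPUT_LOCALES)}")
    return {
        "type": "transcription",
        "transcription_config": {
            "language": "cmn",
            "output_locale": OUTPUT_LOCALES[script],
        },
    }


if __name__ == "__main__":
    print(mandarin_config("traditional"))
```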

Known Limitations

Issue ID | Summary | Detailed Description and Possible Workarounds
REQ-1409 | Proteus HCL with <unk> causes an out-of-memory error | A custom dictionary list that contains the word '<unk>' causes the worker to crash.
REQ-10160 | Advanced punctuation for Spanish (es) does not contain inverted marks | Inverted marks [ ¿ ¡ ] are not currently available for Spanish advanced punctuation.
REQ-10627 | Double full stop when an acronym ends a sentence | If an acronym is at the end of a sentence, a double full stop is output, for example: "team G.B.."
REQ-10634 | Entering "-" as an item in the additional vocab configuration causes the container to fail | Do not enter a "-" on its own in the Custom Dictionary, either as an additional vocab item or in the sounds_like property. Hyphens are still supported as part of words or phrases.
REQ-20261 | The Japanese language pack may output fewer punctuation marks in certain scenarios | In some cases, Japanese transcripts may contain fewer punctuation marks than expected. Please report any occurrences of this issue.
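Until REQ-1409 and REQ-10634 are fixed, a check on the custom dictionary before submitting a job can catch both triggers. The sketch below assumes `additional_vocab` entries are either plain strings or objects with a `content` string and an optional `sounds_like` list; that shape and the helper name `vocab_problems` are assumptions, not taken from these notes. It flags a lone "-" and any term containing `<unk>`, and leaves hyphenated words such as "e-mail" alone.

```python
def vocab_problems(additional_vocab: list) -> list[str]:
    """List entries that would hit known issues REQ-1409 or REQ-10634.

    ASSUMPTION: each entry is a plain string or a dict with a "content" string
    and an optional "sounds_like" list of strings.
    """
    problems = []
    for index, entry in enumerate(additional_vocab):
        if isinstance(entry, str):
            entry = {"content": entry}
        terms = [entry.get("content", "")] + list(entry.get("sounds_like", []))
        for term in terms:
            if term.strip() == "-":
                # REQ-10634: a hyphen on its own makes the container fail.
                problems.append(f"entry {index}: lone '-' (REQ-10634)")
            elif "<unk>" in term:
                # REQ-1409: '<unk>' in the dictionary crashes the worker.
                problems.append(f"entry {index}: contains '<unk>' (REQ-1409)")
    return problems


if __name__ == "__main__":
    vocab = ["CEO", {"content": "gnocchi", "sounds_like": ["nyohki"]}, "-"]
    print(vocab_problems(vocab))
```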

Supported Platforms

Docker (17.06.0+) running on Ubuntu, Debian, Fedora or CentOS.

Installation

Pull the Batch Container Docker image from the Speechmatics Docker repository.

Pre-requisites

You must have a login (URL, username and password) for the Speechmatics Docker repository and a running Docker environment (version 17.06.0 or above).
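To confirm the Docker prerequisite, the daemon version can be read with `docker version --format '{{.Server.Version}}'` and compared against 17.06.0. The parsing helpers below are a hypothetical sketch; they assume version strings of the form `17.06.0-ce` or `24.0.7` and compare them numerically as (major, minor, patch) tuples.

```python
import re
import shutil
import subprocess

MINIMUM_DOCKER = (17, 6, 0)  # 17.06.0, from the supported platforms list


def parse_docker_version(text: str) -> tuple[int, int, int]:
    """Turn strings such as '17.06.0-ce' or '24.0.7' into (major, minor, patch)."""
    match = re.match(r"\s*(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognised Docker version: {text!r}")
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def meets_minimum(text: str) -> bool:
    return parse_docker_version(text) >= MINIMUM_DOCKER


if __name__ == "__main__":
    if shutil.which("docker") is None:
        print("docker CLI not found on PATH")
    else:
        result = subprocess.run(
            ["docker", "version", "--format", "{{.Server.Version}}"],
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            print("could not query the Docker daemon:", result.stderr.strip())
        else:
            version = result.stdout.strip()
            print(version, "meets" if meets_minimum(version) else "is below", "17.06.0")
```

Tuple comparison also copes with Docker's switch from 1.x numbering to year-based numbering, because (1, 13, 1) sorts below (17, 6, 0).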

Related Documentation

  • Speechmatics Batch Container Quick Start Guide version 8.2.0
  • Speechmatics Batch Container API Guide version 8.2.0

Supported Languages

Below is the complete list of languages supported by Speechmatics:

  • English (en)
  • German (de)
  • Spanish (es)
  • French (fr)
  • Portuguese (pt)
  • Japanese (ja)
  • Korean (ko)
  • Dutch (nl)
  • Italian (it)
  • Swedish (sv)
  • Danish (da)
  • Polish (pl)
  • Catalan (ca)
  • Hindi (hi)
  • Russian (ru)
  • Mandarin (cmn)
  • Norwegian (no)
  • Arabic (ar)
  • Bulgarian (bg)
  • Czech (cs)
  • Greek (el)
  • Finnish (fi)
  • Hungarian (hu)
  • Croatian (hr)
  • Lithuanian (lt)
  • Latvian (lv)
  • Romanian (ro)
  • Slovak (sk)
  • Slovenian (sl)
  • Turkish (tr)
  • Malay (ms)

Container images are labelled using the following scheme, where language codes adhere to the ISO 639 standard:

batch-asr-transcriber-<language>:<version>

For example,

batch-asr-transcriber-en:8.2.0
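The naming scheme can be expressed as a small helper that also validates the language code against the supported languages list. This is a sketch: `image_name` and `pull_command` are hypothetical names, and `registry` stands in for the repository URL supplied with your login, which these notes do not give.

```python
from typing import Optional

# Language codes from the supported languages list (31 in total).
SUPPORTED_LANGUAGES = frozenset({
    "en", "de", "es", "fr", "pt", "ja", "ko", "nl", "it", "sv", "da",
    "pl", "ca", "hi", "ru", "cmn", "no", "ar", "bg", "cs", "el", "fi",
    "hu", "hr", "lt", "lv", "ro", "sk", "sl", "tr", "ms",
})


def image_name(language: str, version: str = "8.2.0", registry: Optional[str] = None) -> str:
    """Build batch-asr-transcriber-<language>:<version>, optionally prefixed by a registry."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    name = f"batch-asr-transcriber-{language}:{version}"
    # HYPOTHETICAL: `registry` is a placeholder for your repository URL.
    return f"{registry.rstrip('/')}/{name}" if registry else name


def pull_command(language: str, version: str = "8.2.0", registry: Optional[str] = None) -> list[str]:
    """Return the argument list for `docker pull`, ready for subprocess.run."""
    return ["docker", "pull", image_name(language, version, registry)]


if __name__ == "__main__":
    print(image_name("en"))
```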