Release Notes
This is the documentation for a previous version of our product.

Batch Container

High-Level Summary

This release contains instructions on how to run the container with a parallelisation parameter to boost performance on multi-CPU platforms. It also contains improvements to the telephony models for English and Spanish, and a number of bug fixes.

The following languages now support advanced punctuation: English (en), German (de), Spanish (es), French (fr), Dutch (nl), Danish (da), Turkish (tr), Malay (ms).

It is recommended that customers on previous releases upgrade to this version.

Important Notices

The legacy V1 API and related output formats will be discontinued in a future release as we align the product with the V2 API used by the Speechmatics SaaS: https://asr.api.speechmatics.com/v2/docs. We recommend that customers familiarise themselves with the configuration object used to specify job configurations, as any new features will only be supported through this mechanism. Future notices will announce the end of life of the V1 API and provide detailed instructions on migrating to the V2 API.
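
As a rough illustration of the configuration object mentioned above, the following Python sketch builds a minimal job configuration and serialises it to JSON. The field names (`type`, `transcription_config`, `language`) reflect the V2 schema as we understand it and should be treated as assumptions; check the API Guide for this release before relying on them. `build_job_config` is a hypothetical helper, not part of any Speechmatics library.

```python
import json


def build_job_config(language: str) -> str:
    """Return a minimal V2-style job configuration as a JSON string.

    Assumption: the field names mirror the V2 configuration object;
    verify them against the API Guide for your container version.
    """
    config = {
        "type": "transcription",
        "transcription_config": {"language": language},
    }
    return json.dumps(config, indent=2)


# Example: a configuration requesting English transcription.
print(build_job_config("en"))
```

Keeping the configuration in a single object like this makes it easier to add further options later without changing how the job is submitted.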

What's New

  • SubRip (srt) subtitle format
  • The ability to transcribe in parallel across multiple CPUs
  • Improvements to the telephony models for English and Spanish
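
For readers unfamiliar with SubRip, the sketch below shows the general shape of an `.srt` cue: a sequence number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, the caption text, and a terminating blank line. It illustrates the file format only and makes no claim about how the container generates its output; `srt_timestamp` and `srt_cue` are hypothetical helper names.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """Build one SRT cue block, terminated by a blank line."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n\n"


# Example: a single cue running from 0.5 s to 2.25 s.
print(srt_cue(1, 0.5, 2.25, "Hello and welcome."), end="")
```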

Issues Fixed

The following issues have been addressed since the previous release:

| Issue ID | Summary | Resolution Description |
|---|---|---|
| REQ-10377, REQ-9569 | Unfriendly error when empty audio file is submitted | It is now possible to submit zero-length audio files. |
| REQ-11728 | Files less than 1 second skip decoding | Files shorter than 0.3 s are treated as zero length. Files of 0.3 s or longer, including those under 1.0 s, are now transcribed (assuming speech is detected in them). |
| REQ-10095 | StopIteration exception in post-processing can cause a job to fail | The exception is now handled properly. |
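
The length rule described for REQ-11728 can be summarised in a few lines. This is a client-side illustration of the documented 0.3 s threshold, not the container's implementation; `ZERO_LENGTH_THRESHOLD_S` and `expected_handling` are hypothetical names introduced for this sketch.

```python
ZERO_LENGTH_THRESHOLD_S = 0.3  # threshold taken from the REQ-11728 description


def expected_handling(duration_s: float) -> str:
    """Return how a file of the given duration is expected to be treated.

    Files strictly shorter than the threshold count as zero length;
    everything else is passed on for transcription (speech permitting).
    """
    if duration_s < 0:
        raise ValueError("duration must be non-negative")
    if duration_s < ZERO_LENGTH_THRESHOLD_S:
        return "zero-length"
    return "transcribed"


# Example: check a few durations on either side of the threshold.
for duration in (0.0, 0.2, 0.3, 0.7, 5.0):
    print(duration, expected_handling(duration))
```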

Known Limitations

| Issue ID | Summary | Detailed Description and Possible Workarounds |
|---|---|---|
| REQ-1409 | Proteus HCL with <unk> causes out of memory error | A custom dictionary list that contains the word '<unk>' causes the worker to crash. |
| REQ-10160 | Advanced punctuation for Spanish (es) does not contain inverted marks | Inverted marks [ ¿ ¡ ] are not currently available for Spanish advanced punctuation. |
| REQ-10627 | Double full stops when acronym is at the end of the sentence | If there is an acronym at the end of the sentence, a double full stop is output, for example: "team G.B.." |
| REQ-11135 | A previous release (6.1.0) introduced unwanted hesitations in transcripts | Due to changes in the way that training data is now ingested to improve the accuracy of spontaneous speech for English (en), there is a greater likelihood that hesitations will be included in the output transcripts. We plan to support a hesitation filtering capability in a future release for customers who do not want to see hesitations in transcripts. |
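
Until REQ-10627 is fixed, one possible client-side workaround is to collapse a doubled full stop in the transcript text after it has been retrieved. The sketch below is an assumption about how a consumer might post-process the output, not a Speechmatics feature; `collapse_double_full_stops` is a hypothetical helper. It deliberately leaves runs of three or more dots alone so that ellipses are preserved.

```python
import re

# Matches exactly two consecutive full stops that are not part of a longer run.
_DOUBLE_STOP = re.compile(r"(?<!\.)\.\.(?!\.)")


def collapse_double_full_stops(text: str) -> str:
    """Replace each isolated '..' with a single '.', leaving '...' untouched."""
    return _DOUBLE_STOP.sub(".", text)


# Example: the acronym case quoted in the known limitation.
print(collapse_double_full_stops("We support team G.B.."))
```

Because this runs on plain text, it should be applied only to the final transcript string and not to structured output where a doubled full stop could be meaningful.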

Supported Platforms

Docker (17.06.0+) running on Ubuntu, Debian, Fedora or CentOS.

Upgrade Path

None.

Installation

Pull the Batch Container Docker image from the Speechmatics Docker repository.

Pre-requisites

You need a login (URL, username and password) for the Speechmatics Docker repository and a running Docker environment (version 17.06.0 or above).

Related Documentation

  • Speechmatics Batch Container Quick Start Guide version 6.2.0
  • Speechmatics Batch Container API Guide version 6.2.0

For a complete list of languages that are supported by the Speechmatics Container, including those which have custom dictionary support, please go to the Speechmatics website: https://www.speechmatics.com/language-support/