Batch Container Release Notes

Important Notices

The legacy V1 API and its related output formats are no longer supported. V1 API examples have been removed from all Batch Container documentation. We recommend using the V2 API and the config.json object, both of which are documented in the Speech API document for 7.0.0.

What's New

7.0.3

  • Internal bug fixes

7.0.2

  • Internal bug fixes

7.0.1

  • We have changed how some words submitted using a Custom Dictionary are recognised, for all languages. The change affects words that contain a splitting character (e.g. COVID-19, catch-22) and should make transcription of such words more accurate.

7.0.0

  • The V2 API is now the only supported method to transcribe files. The V1 API is no longer supported, and will be fully deprecated in an imminent release
  • Updated English and Spanish language packs
  • Support for the SubRip (SRT) subtitle format as an output. Customers may also modify how the SRT output is presented
  • The working directory is no longer /work, but is instead /home/smuser/work
  • The ability to pull files for transcription from an object store hosted by a cloud provider
  • The ability to send notifications or callbacks to a customer-specified endpoint
  • Users may also provide their own metadata information within a separate JSON file for better tracking and monitoring
  • Users can cache one or more Custom Dictionaries in a shared cache location of their choosing. This reduces processing overhead when transcribing files with a custom dictionary that has already been cached. Users are responsible for managing their own cache; how to do so is described in more detail in the Speech API Guide
  • Users can run the container as a named user (i.e. not as root)
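
Several of the 7.0.0 features listed above are configured through the V2 config.json object. As a rough sketch only, the snippet below assembles such an object in Python. The field names (fetch_data, notification_config, output_config with srt_overrides), the example values and all URLs are assumptions made for illustration and are not taken from these notes; the Speech API document for 7.0.0 remains the authority on the actual schema.

```python
import json


def build_config(language, audio_url, callback_url):
    """Return a V2-style job configuration as a plain dict.

    Every field name below is an assumption for illustration;
    check it against the Speech API document before use.
    """
    return {
        "type": "transcription",
        "transcription_config": {"language": language},
        # Assumed field: pull the audio from an object store URL.
        "fetch_data": {"url": audio_url},
        # Assumed field: send a callback to a customer-specified endpoint.
        "notification_config": [
            {"url": callback_url, "contents": ["transcript"]}
        ],
        # Assumed field: adjust how the SRT output is presented.
        "output_config": {
            "srt_overrides": {"max_line_length": 37, "max_lines": 2}
        },
    }


if __name__ == "__main__":
    config = build_config(
        "en",
        "https://example.com/audio.wav",      # hypothetical audio location
        "https://example.com/callback",       # hypothetical callback endpoint
    )
    # Print the JSON that would be saved as config.json.
    print(json.dumps(config, indent=2))
```

Building the object in code rather than by hand makes it easier to keep one template and vary only the language and URLs per job.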

Issues Fixed

The following issues have been addressed since the previous release:

| Issue ID | Summary | Resolution Description |
| --- | --- | --- |
| REQ-15418 | Custom dictionary with splitting characters gets incorrect pronunciation | When a Custom Dictionary contains words with splitting characters where a number follows a word (for example covid-19), the correct pronunciations are now created. Splitting characters include ["-", "_", "/", "<", ">", ":", " "]. This applies to all languages. For v7.0.1 only |
| REQ-13442 | Some unicode characters would cause transcription to fail | This has now been resolved |
| REQ-13990 | The batch container will not run as a non-root user on Docker | This is now supported. Guidance on how to do this is in the Quick Start Guide |
| REQ-14062 | Occasionally a file in Spanish would not be fully transcribed | This has been resolved with the latest release of Spanish |
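
Because the REQ-15418 change alters how Custom Dictionary words with splitting characters are recognised, it may be useful to find which of your own entries are affected before upgrading to 7.0.1. The following is a minimal sketch, assuming the word list is available as a plain Python list of strings; the function names are hypothetical, and the character set is copied from the resolution description above.

```python
# Splitting characters as listed for REQ-15418.
SPLITTING_CHARACTERS = ("-", "_", "/", "<", ">", ":", " ")


def has_splitting_character(word):
    """Return True if the word contains any splitting character."""
    return any(ch in word for ch in SPLITTING_CHARACTERS)


def words_affected(vocab):
    """Return the entries whose recognition may change in 7.0.1."""
    return [word for word in vocab if has_splitting_character(word)]


if __name__ == "__main__":
    vocab = ["COVID-19", "catch-22", "Speechmatics"]
    print(words_affected(vocab))
```

The entries returned are good candidates for a quick before-and-after transcription comparison.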

Known Limitations

| Issue ID | Summary | Detailed Description and Possible Workarounds |
| --- | --- | --- |
| REQ-1409 | Proteus HCL with <unk> causes out of memory error | A custom dictionary list that contains the word '<unk>' causes the worker to crash. |
| REQ-10160 | Advanced punctuation for Spanish (es) does not contain inverted marks. | Inverted marks [ ¿ ¡ ] are not currently available for Spanish advanced punctuation. |
| REQ-10627 | Double full stops when acronym is at the end of the sentence | If there is an acronym at the end of the sentence, then a double full stop will be output, for example: "team G.B.." |
| REQ-11135 | A previous release (6.1.0) introduced unwanted hesitations in transcripts. | Due to changes in the way that training data is now ingested to improve the accuracy of spontaneous speech for English (en), there is a greater likelihood that hesitations will be included in the output transcripts. We plan to support a hesitation filtering capability in a future release for customers that do not want to see hesitations in transcripts. |
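
Until REQ-10627 is resolved, one possible workaround is to tidy the transcript text after it has been produced. The sketch below is not part of the product: it is a hypothetical post-processing step that assumes acronyms are written as upper-case letters each followed by a full stop (as in "G.B."), and it removes one trailing full stop after such an acronym while leaving ellipses alone.

```python
import re

# Two or more "letter + full stop" pairs, followed by one extra full stop
# that is not itself followed by another full stop (so "..." is untouched).
_DOUBLE_STOP = re.compile(r"((?:[A-Z]\.){2,})\.(?!\.)")


def collapse_double_full_stop(text):
    """Remove the extra full stop after an acronym that ends a sentence."""
    return _DOUBLE_STOP.sub(r"\1", text)


if __name__ == "__main__":
    print(collapse_double_full_stop("He plays for team G.B.."))
```

The pattern is deliberately narrow, so text without a dotted upper-case acronym directly before the double full stop is returned unchanged.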

Supported Platforms

Docker (17.06.0+) running on Ubuntu, Debian, Fedora or CentOS.

Installation

Pull the Batch Container Docker image from the Speechmatics Docker repository.

Pre-requisites

You have a login (URL, username and password) for the Speechmatics Docker repository, and have a Docker environment (version 17.06.0 or above) running.
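
The Docker version requirement can be checked programmatically before installation. The following is a small sketch, assuming the version string has already been obtained (for example from the output of `docker version`); the function names are hypothetical, and only the leading major.minor.patch numbers are compared, so suffixes such as "-ce" are ignored.

```python
import re

# Minimum Docker version stated in the pre-requisites: 17.06.0.
MINIMUM_DOCKER = (17, 6, 0)


def parse_docker_version(text):
    """Extract (major, minor, patch) from a string such as '17.06.0-ce'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"Unrecognised Docker version: {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(text, minimum=MINIMUM_DOCKER):
    """Return True if the given version is at least the minimum."""
    return parse_docker_version(text) >= minimum


if __name__ == "__main__":
    for version in ("17.03.2-ce", "17.06.0-ce", "20.10.7"):
        print(version, meets_minimum(version))
```

Comparing integer tuples avoids the pitfalls of comparing version strings lexically, where "9.0.0" would incorrectly sort after "17.06.0".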

Related Documentation

  • Speechmatics Batch Container Quick Start Guide version 7.0.0
  • Speechmatics Batch Container API Guide version 7.0.0

For a complete list of languages that are supported by the Speechmatics Container, including those which have custom dictionary support, please go to the Speechmatics website: https://www.speechmatics.com/language-support/