This is the documentation for a previous version of our product.

Batch Container

Important Notices

The legacy V1 API and its related output formats are no longer supported. V1 API examples have been removed from all batch container documentation. We recommend using the V2 API and the config.json object documented in the Speech API Guide. How to use the V2 API is described in the Speech API Guide for 7.0.0.
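As an illustration of the V2 workflow, here is a minimal sketch that builds a config.json object in Python. The field names (`type`, `transcription_config`, `language`, `additional_vocab`, `content`) are assumptions that should be checked against the Speech API Guide for 7.0.0. `build_config` is a hypothetical helper and not part of the product.

```python
import json


def build_config(language, vocab=None):
    """Hypothetical helper: build a V2-style config.json object.

    Field names are assumptions; verify them in the Speech API Guide.
    """
    config = {
        "type": "transcription",
        "transcription_config": {"language": language},
    }
    if vocab:
        # Assumed custom dictionary field: one entry per word or phrase.
        config["transcription_config"]["additional_vocab"] = [
            {"content": word} for word in vocab
        ]
    return config


# Print the object so it can be redirected into config.json.
print(json.dumps(build_config("en", vocab=["COVID-19", "catch-22"]), indent=2))
```

Redirecting the printed output to a file (for example `python build_config.py > config.json`) produces a file you can pass to the container as described in the Speech API Guide.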

What's New

7.0.3

Internal bug fixes

7.0.2

Internal bug fixes

7.0.1

We have changed how some words submitted using a Custom Dictionary are recognised for all languages. This change will affect words that use a splitting character (e.g. COVID-19, catch-22). This change should provide more accurate transcription of such words.

7.0.0

The V2 API is now the only supported method for transcribing files. The V1 API is no longer supported and will be fully deprecated in an upcoming release.
Updated English and Spanish language packs
Support for the SubRip (SRT) subtitle format. Customers can also modify how SRT output is presented
The working directory is no longer /work, but is instead /home/smuser/work
The ability to pull files for transcription from an object store hosted by a cloud provider
The ability to send notifications or callbacks to a customer-specified endpoint
Users may also provide their own metadata information within a separate JSON file for better tracking and monitoring
Users can cache one or more Custom Dictionaries in a shared cache location that they specify. This reduces processing overhead when transcribing files that use a custom dictionary that has already been cached. Users are responsible for managing their own cache. How to do so is described in more detail in the Speech API Guide
Users can run the container as a named user (i.e. not as root)

Issues Fixed

The following issues have been addressed since the previous release:

Issue ID	Summary	Resolution Description
REQ-15418	Custom dictionary words with splitting characters get incorrect pronunciations	When a Custom Dictionary contains words with splitting characters where a number follows a word (for example covid-19), the correct pronunciations are now created. Splitting characters include ["-", "_", "/", "<", ">", ":", " "]. This applies to all languages (v7.0.1 only).
REQ-13442	Some Unicode characters would cause transcription to fail	This has now been resolved.
REQ-13990	The batch container will not run as a non-root user on Docker	This is now supported. Guidance on how to do this is in the Quick Start Guide
REQ-14062	Occasionally a file in Spanish would not be fully transcribed	This has been resolved with the latest release of the Spanish language pack.
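To find entries in an existing custom dictionary that may be affected by the 7.0.1 change, the sketch below flags words that contain any of the splitting characters listed for REQ-15418 and shows how they break into parts. This only mirrors the published character list. It is not the container's own tokenisation logic, and `has_splitting_character` and `split_entry` are hypothetical helper names.

```python
import re

# Splitting characters as listed for REQ-15418 (the last entry is a space).
SPLITTING_CHARACTERS = ["-", "_", "/", "<", ">", ":", " "]

# Character class matching any single splitting character.
_SPLIT_RE = re.compile("[" + re.escape("".join(SPLITTING_CHARACTERS)) + "]")


def has_splitting_character(word):
    """Return True if the dictionary entry contains a splitting character."""
    return any(ch in word for ch in SPLITTING_CHARACTERS)


def split_entry(word):
    """Split an entry on splitting characters, dropping empty pieces."""
    return [part for part in _SPLIT_RE.split(word) if part]
```

For example, `[w for w in words if has_splitting_character(w)]` lists the entries worth re-checking after upgrading. `split_entry("COVID-19")` shows the two parts, `COVID` and `19`, that such an entry breaks into.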

Known Limitations

Issue ID	Summary	Detailed Description and Possible Workarounds
REQ-1409	Proteus HCL with `<unk>` causes out of memory error	A custom dictionary list that contains the word `<unk>` causes the worker to crash.
REQ-10160	Advanced punctuation for Spanish (es) does not contain inverted marks.	Inverted marks [ ¿ ¡ ] are not currently available for Spanish advanced punctuation.
REQ-10627	Double full stops when acronym is at the end of the sentence	If there is an acronym at the end of the sentence, then a double full stop will be output, for example: "team G.B.."
REQ-11135	A previous release (6.1.0) introduced unwanted hesitations in transcripts.	Due to changes in the way training data is now ingested to improve the accuracy of spontaneous speech for English (en), hesitations are more likely to be included in output transcripts. We plan to support hesitation filtering in a future release for customers who do not want hesitations in their transcripts.
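Until fixes are available, REQ-10627 and REQ-11135 can be worked around by post-processing the transcript on the client side. The sketch below removes the extra full stop after an acronym such as "G.B.." and leaves a genuine ellipsis untouched. It also filters hesitation tokens from a list of transcript words. The `HESITATIONS` set is a hypothetical example: check which hesitation words actually appear in your transcripts. Neither function is a feature of the container.

```python
import re

# REQ-10627: an acronym (one or more "X." pairs) followed by an extra full stop,
# where the extra stop is followed by whitespace or the end of the text.
_ACRONYM_DOUBLE_STOP = re.compile(r"((?:[A-Z]\.)+)\.(?=\s|$)")

# REQ-11135: hypothetical hesitation tokens; adjust to match your transcripts.
HESITATIONS = {"um", "uh", "er", "erm", "hmm"}


def fix_acronym_double_stop(text):
    """Turn 'team G.B..' into 'team G.B.'; '...' is not matched."""
    return _ACRONYM_DOUBLE_STOP.sub(r"\1", text)


def drop_hesitations(words, hesitations=HESITATIONS):
    """Remove words whose lowercase form is in the hesitation set."""
    return [w for w in words if w.lower() not in hesitations]
```

The regular expression only fires when the extra stop is followed by whitespace or the end of the text, so an acronym followed by an ellipsis ("G.B...") is left as-is.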

Supported Platforms

Docker (17.06.0+) running on Ubuntu, Debian, Fedora or CentOS.

Installation

Pull the Batch Container Docker image from the Speechmatics Docker repository.

Pre-requisites

You have a login (URL, username and password) for the Speechmatics Docker repository, and have a Docker environment (version 17.06.0 or above) running.
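To confirm the Docker prerequisite before pulling the image, you can use a small check like the sketch below. It queries the daemon with `docker version --format '{{.Server.Version}}'` and compares the result with 17.06.0. The helper names are hypothetical. The version parsing assumes strings of the form `17.06.0-ce` or `24.0.5`.

```python
import re
import subprocess

MINIMUM_DOCKER = (17, 6, 0)  # 17.06.0, as stated in the prerequisites


def parse_version(text):
    """Parse '17.06.0-ce' -> (17, 6, 0) and '24.0.5' -> (24, 0, 5)."""
    match = re.match(r"\s*(\d+)\.(\d+)(?:\.(\d+))?", text)
    if not match:
        raise ValueError(f"unrecognised Docker version: {text!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def meets_minimum(version_text, minimum=MINIMUM_DOCKER):
    """Return True if the version is at least the required minimum."""
    return parse_version(version_text) >= minimum


def installed_docker_version():
    """Ask the running Docker daemon for its server version string."""
    result = subprocess.run(
        ["docker", "version", "--format", "{{.Server.Version}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

On a host with a running daemon, `meets_minimum(installed_docker_version())` returns `True` when the installed Docker is recent enough.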