This is the documentation for a previous version of our product.

Batch Container Quick Start Guide

This guide will walk you through the steps needed to deploy the Speechmatics Batch Container ready for transcription.

Check system requirements
Pull the Docker Image
Run the Container

After these steps, the Docker Image can be used to create containers that will transcribe audio files. More information about using the Speechmatics container transcription service is detailed in the Speechmatics Container API guide.

System requirements

Speechmatics containerized deployments are built on the Docker platform. In order to operate the containers, the following requirements will need to be met.

Host requirements

A separate Docker image is required for each language you want to transcribe. A single image can be used to create and run multiple containers concurrently. Each running container requires the following resources:

1 vCPU
2-5GB RAM
100MB hard disk space
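As a rough capacity check, the per-container figures above can be turned into an upper bound on the number of concurrent containers a host can carry. The sketch below is only arithmetic: the function name `max_containers` is hypothetical, it assumes the worst-case 5GB of RAM per container, and it ignores whatever the host operating system itself needs.

```shell
# Hypothetical helper: upper bound on concurrent containers for a host,
# assuming 1 vCPU and (worst case) 5GB RAM per running container.
# Usage: max_containers <host_vcpus> <host_ram_gb>
max_containers() {
  by_cpu=$1
  by_ram=$(( $2 / 5 ))   # integer division rounds down
  if [ "$by_cpu" -lt "$by_ram" ]; then
    echo "$by_cpu"
  else
    echo "$by_ram"
  fi
}
```

For example, `max_containers 8 32` compares 8 vCPUs against 32GB of RAM and reports the smaller of the two limits.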

The host machine requires a processor with the following minimum specification: Intel® Xeon® CPU E5-2630 v4 (Broadwell) 2.20GHz (or equivalent). This is important because these chipsets (and later ones) support Advanced Vector Extensions (AVX). The machine learning algorithms used by Speechmatics ASR require the performance optimizations that AVX provides. You should also ensure that your hypervisor has AVX enabled.
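One way to confirm AVX support on a Linux host (or inside a virtual machine, to verify that the hypervisor exposes it) is to look for the `avx` flag in `/proc/cpuinfo`. This is a minimal sketch that assumes a Linux host; the function name `has_avx` is hypothetical.

```shell
# Report whether the CPU flags in a cpuinfo-style file include AVX.
# Defaults to /proc/cpuinfo, which exists on Linux hosts only.
# -w matches "avx" as a whole word, so "avx2" alone does not count.
has_avx() {
  grep -qw 'avx' "${1:-/proc/cpuinfo}"
}

# Example:
#   if has_avx; then echo "AVX available"; else echo "AVX not found"; fi
```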

Note: Each language pack required is distributed as a separate Docker image. Only the language packs required need to be installed on the Docker host.

Architecture

Each container:

Provides the ability to transcribe recorded speech in a predefined language. The container will receive input from most audio and video formats, and will provide the following output:
- Transcript word
- Word confidence
- Timing information
- Speaker change and labelling information
Takes one input file and outputs the resulting transcript
Can run in a mode that parallelises processing across multiple processor cores
Supports input files up to 2 hours in length or 4GB in size
Treats all data as transitory: once a container completes its transcription, it removes all record of the operation and no data is persisted

In addition, multiple instances of the container can be run on the same Docker host. This enables scaling for a single language or for multiple languages as required.
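Running several instances side by side can be sketched as one background `docker run` per audio file. The function name `transcribe_many` is hypothetical. The sketch assumes `LICENSE_KEY` is already exported in the calling shell, streams each file into its container on STDIN, and writes each transcript next to its input file.

```shell
# Hypothetical helper: transcribe several files at once, one container each.
# Expects LICENSE_KEY to be exported in the calling shell; `-e LICENSE_KEY`
# with no value copies it into the container.
# Usage: transcribe_many <image> <audio_file>...
transcribe_many() {
  image=$1; shift
  for f in "$@"; do
    # Each file is streamed in on STDIN; the transcript is saved next to it.
    docker run -i -e LICENSE_KEY "$image" < "$f" > "$f.json" &
  done
  wait   # block until every background container has exited
}
```

The sketch starts every container at once and does not check exit codes, so it only suits a host with enough vCPUs and RAM for all of the files passed in.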

Accessing the Image

The Speechmatics Docker image is obtained from the Speechmatics Docker repository (jfrog.io). If you do not have a Speechmatics software repository account or have lost your details, please contact Speechmatics support at support@speechmatics.com.

The latest information about the containers can be found in the solutions section of the support portal. If a support account is not available or the Containers section is not visible in the support portal, please contact Speechmatics support at support@speechmatics.com for help.

Prior to pulling any Docker images, the following must be known:

Speechmatics Docker URL – provided by the Speechmatics team
Language Code – the ISO language code (for example fr for French)
LICENSE_KEY – which is required to start a container
TAG – which is used to identify the image version
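The repository host, language code and tag combine into a single image reference. The pattern in this sketch, `<host>/transcriber-<language>:<tag>`, is inferred from the example commands in this guide rather than from a formal specification, and the function name `image_ref` is hypothetical.

```shell
# Hypothetical helper: assemble an image reference from the values above.
# The host is given without the https:// scheme.
# Usage: image_ref <docker_host> <language_code> <tag>
image_ref() {
  printf '%s/transcriber-%s:%s\n' "$1" "$2" "$3"
}

# Example:
#   docker pull "$(image_ref speechmatics-docker-example.jfrog.io fr 6.3.0)"
```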

Getting the Image

After gaining access to the relevant details for the Speechmatics software repository, follow the steps below to login and pull the Docker images that are required.

Software Repository Login

Ensure the Speechmatics Docker URL and your software repository username and password are available. Docker must be installed on the machine you log in from. For example:

docker login https://speechmatics-docker-example.jfrog.io

You will be prompted for username and password. If successful, you will see the response:

Login Succeeded

If unsuccessful, please verify your credentials and URL. If problems persist, please contact Speechmatics support.
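For automated environments where nobody can answer the prompt, `docker login` also accepts the password on standard input. The wrapper below is a sketch: the function name `registry_login` and the idea of keeping the password in a file are assumptions, while `--username` and `--password-stdin` are standard `docker login` options.

```shell
# Log in without an interactive prompt, reading the password from a file
# so it does not appear in shell history or the process list.
# Usage: registry_login <registry_url> <username> <password_file>
registry_login() {
  docker login --username "$2" --password-stdin "$1" < "$3"
}
```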

Pulling the Image

To pull the Docker image to the local environment follow the instructions below. Each supported language pack comes as a different Docker image, so the process will need to be repeated for each language pack required.

Example: pulling Global English (en) with the 6.3.0 TAG:

docker pull speechmatics-docker-example.jfrog.io/transcriber-en:6.3.0

Example: pulling the Spanish (es) model with the 6.3.0 TAG:

docker pull speechmatics-docker-example.jfrog.io/transcriber-es:6.3.0

The image will start to download. This could take a while depending on your connection speed.

Note: Speechmatics require all customers to cache a copy of the Docker image(s) within their own environment. Please do not pull directly from the Speechmatics software repository for each deployment.
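One way to cache an image is to re-tag it and push it to a registry you operate. This is a sketch only: the function name `mirror_image` is hypothetical, and it assumes you are already logged in to both the Speechmatics repository and your own registry.

```shell
# Hypothetical helper: copy an image from the Speechmatics repository into
# a registry you control. Stops at the first step that fails.
# Usage: mirror_image <source_image> <target_image>
mirror_image() {
  docker pull "$1" &&
  docker tag "$1" "$2" &&
  docker push "$2"
}
```

Deployments can then pull from the target image name instead of the Speechmatics repository.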

Licensing

The Docker images we provide have a configured expiry date and must be used in conjunction with the license key that has been issued to you. The Docker images and license key are specific to your organisation, and should not be shared with any third parties. License keys must be provided at runtime through a LICENSE_KEY environment variable, like this:

-e LICENSE_KEY=f787b0051e2768b1f619d75faab97f23ee9b7931890c05f97e9f550702
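To avoid typing the key on every command line, it can be loaded from a file into the shell environment. The helper name `load_license_key` and the file location are hypothetical. The sketch relies on standard Docker behaviour: `-e LICENSE_KEY` with no value passes the variable through from the calling environment.

```shell
# Hypothetical helper: read the key from a file, strip whitespace, and
# export it so later commands can use `-e LICENSE_KEY` without a value.
# Usage: load_license_key <key_file>
load_license_key() {
  LICENSE_KEY=$(tr -d '[:space:]' < "$1") || return 1
  [ -n "$LICENSE_KEY" ] || { echo "license key file is empty: $1" >&2; return 1; }
  export LICENSE_KEY
}

# Example (the path is an assumption):
#   load_license_key ~/.speechmatics/license.key
#   docker run -i -e LICENSE_KEY <image> < audio.wav
```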

Using the Container

Once the Docker image has been pulled into a local environment, it can be started using the docker run command. More details about operating and managing the container are available in the Docker API documentation.

There are two different methods for passing an audio file into a container:

STDIN: Streams the audio file into the container through the standard command line entry point
File Location: Pulls audio file from a file location

The examples below demonstrate these two modes of operating the containers.

Example 1: passing a file using the cat command to the Spanish (es) container

cat ~/sm_audio.wav | docker run -i \
  -e LICENSE_KEY=f787b0051e2768b1f619d75faab97f23ee9b7931890c05f97e9f550702 \
  speechmatics-docker-example.jfrog.io/transcriber-es:6.3.0

Example 2: pulling an audio file from a volume-mapped directory into the container

docker run -i -v ~/sm_audio.wav:/input.audio \
  -e LICENSE_KEY=f787b0051e2768b1f619d75faab97f23ee9b7931890c05f97e9f550702 \
  speechmatics-docker-example.jfrog.io/transcriber-es:6.3.0

NOTE: the audio file must be mapped into the container with :/input.audio
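When the audio path is relative, it helps to make it absolute before handing it to `-v`, because Docker may treat a relative host path as a volume name or reject it, depending on the Docker version. The helper name `audio_mount` is hypothetical; it only builds the mount argument and does not run anything.

```shell
# Hypothetical helper: build the -v argument for an audio file, making the
# host path absolute and mapping it to /input.audio in the container.
# Usage: audio_mount <path_to_audio>
audio_mount() {
  dir=$(cd "$(dirname "$1")" && pwd) || return 1
  printf '%s/%s:/input.audio\n' "$dir" "$(basename "$1")"
}

# Example:
#   docker run -i -v "$(audio_mount recordings/call.wav)" -e LICENSE_KEY <image>
```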

The Docker run options used are:

Name	Description
`--env, -e`	Set environment variables
`--interactive, -i`	Keep STDIN open even if not attached
`--volume, -v`	Bind mount a volume

See Docker docs for a full list of the available options.

Both methods produce the same transcript. The transcription is written to STDOUT in JSON format. Here's an example:

{
  "format": "1.0",
  "license": "docker-example build (Mon Nov 12 12:00:00 2019): 122 days remaining",
  "speakers": [
    {
      "duration": "2.28",
      "name": "UU",
      "time": "0.57"
    }
  ],
  "words": [
    {
      "confidence": "1.00",
      "duration": "0.51",
      "name": "This",
      "time": "0.57"
    },
    {
      "confidence": "1.00",
      "duration": "0.21",
      "name": "is",
      "time": "1.17"
    },
    {
      "confidence": "1.00",
      "duration": "0.09",
      "name": "a",
      "time": "1.38"
    },
    {
      "confidence": "1.00",
      "duration": "0.54",
      "name": "quick",
      "time": "1.47"
    },
    {
      "confidence": "1.00",
      "duration": "0.60",
      "name": "test",
      "time": "2.22"
    },
    {
      "confidence": "1.00",
      "duration": "0.03",
      "name": ".",
      "time": "2.82"
    }
  ]
}
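To turn this JSON into plain text, the `name` fields of the `words` array can be joined together. The sketch below assumes the `jq` command-line tool is installed and that the transcript has been saved to a file; the function name `transcript_text` is hypothetical. Punctuation entries are joined like any other word, so a space appears before each full stop.

```shell
# Join the "name" field of every entry in "words" into one line of text.
# Assumes jq is installed; reads the transcript JSON from the given file.
# Usage: transcript_text <transcript.json>
transcript_text() {
  jq -r '[.words[].name] | join(" ")' "$1"
}
```

Note that numeric fields such as `time` and `confidence` are JSON strings in the example above, so they need converting (for instance with jq's `tonumber`) before any arithmetic.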

Determining success

The container's exit code indicates whether the transcription was successful. There are two possibilities:

Exit Code == 0 : The transcription succeeded; the output contains JSON defining the transcript (as in the example above)
Exit Code != 0 : The transcription failed; the output contains a stack trace and other useful information. Include this output in any communication with Speechmatics support to help them understand and resolve the problem
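A calling script can branch on the exit code to keep transcripts and failure output apart. This is a minimal sketch: the function name `run_transcription` and the `.failed` file suffix are hypothetical. It captures STDOUT only, so anything the container writes to STDERR still appears on the terminal.

```shell
# Hypothetical wrapper: run a transcription command, keep the transcript on
# success, and keep the captured output for a support ticket on failure.
# Usage: run_transcription <output_json> <command> [args...]
run_transcription() {
  out=$1; shift
  if "$@" > "$out.tmp"; then
    mv "$out.tmp" "$out"
  else
    status=$?   # exit code of the failed command
    mv "$out.tmp" "$out.failed"
    echo "transcription failed with exit code $status; see $out.failed" >&2
    return "$status"
  fi
}

# Example:
#   run_transcription out.json docker run -i -e LICENSE_KEY <image> < audio.wav
```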

Troubleshooting

Enabling Logging

If you are seeing problems then we recommend that you enable logging and open a support ticket with Speechmatics support: support@speechmatics.com.

To enable logging, set two environment variables:

SM_JOB_ID - a job id, for example: 1
SM_LOG_DIR - the directory inside the container where the logs are written, for example: /logs

The following example shows how to do this, using the -stderr=true argument to dump the logs to stderr:

docker run --rm -e SM_JOB_ID=123 -e SM_LOG_DIR=/logs \
  -v ~/sm_audio.wav:/input.audio \
  -e LICENSE_KEY=f787b0051e2768b1f619d75faab97f23ee9b7931890c05f97e9f550702 \
  speechmatics-docker-example.jfrog.io/transcriber-en:6.3.0 \
  -stderr=true

When raising a support ticket it is normally easier to write the log output to a specific file. You can do this by creating a volume mount so that the logs remain accessible after the container has finished. Before running the container you need to create a directory for the log file and ensure it has the correct permissions. In this example we use a local logs directory to store the log output for a job with ID 124:

mkdir -p logs/124
sudo chown -R nobody:nogroup logs/
sudo chmod -R a+rwx logs/

docker run --rm -v "${PWD}/logs:/logs" -e SM_JOB_ID=124 -e SM_LOG_DIR=/logs \
  -v ~/sm_audio.wav:/input.audio \
  -e LICENSE_KEY=f787b0051e2768b1f619d75faab97f23ee9b7931890c05f97e9f550702 \
  speechmatics-docker-example.jfrog.io/transcriber-en:6.3.0

tail logs/124/sigurd.log

Common Problems

Common problems to look out for are when there is no recognisable audio in the input file, or it is in a format that is unsupported. You should also ensure, when using the config object, that the JSON is correctly formatted.

Building an Image

Using STDIN to pass files in and obtain the transcription may not be sufficient for all use cases. It is possible to build a new Docker Image that uses the Speechmatics Image as a layer. This allows greater flexibility and provides a mechanism to fit into custom workflows. To include the Speechmatics Docker Image inside another image, reference the pulled Docker image in the Dockerfile for the new application.

Requirements for a custom image

To ensure the Speechmatics Docker image works as expected inside the custom image, please consider the following:

Any audio that needs to be transcribed must be copied to a file called "/input.audio" inside the running container
To initiate transcription, call the application pipeline. The pipeline will start the transcription service and use /input.audio as the audio source
Once the pipeline finishes transcribing, ensure you move the transcription data outside the container
Shut down the container after each transcription of an audio file

Dockerfile

To add a Speechmatics Docker image into a custom one, the Dockerfile must be modified to include the full image name of the locally available image.

Example: Adding Global English (en) with tag 6.3.0 to the Dockerfile

FROM speechmatics-docker-example.jfrog.io/transcriber-en:6.3.0
ADD download_audio.sh /usr/local/bin/download_audio.sh
RUN chmod +x /usr/local/bin/download_audio.sh
CMD ["/usr/local/bin/download_audio.sh"]

Once the above image is built, and a container instantiated from it, a script called download_audio.sh will be executed (this could do something like pulling a file from a webserver and copying it to /input.audio before starting the pipeline application). This is a very basic Dockerfile to demonstrate a way of orchestrating the Speechmatics Docker Image.
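As an illustration of what such a script might contain, the sketch below downloads a file and then starts the pipeline. Everything specific in it is an assumption: the function name `fetch_and_transcribe`, the `AUDIO_URL` variable, the availability of `curl` inside the image, and `pipeline` being callable by that name on the PATH. The optional second argument exists only so the function can be exercised outside a container.

```shell
# Sketch of a download_audio.sh entry point. AUDIO_URL is a hypothetical
# variable supplied with `docker run -e AUDIO_URL=...`; this assumes curl is
# available in the image and that `pipeline` is on the PATH.
# Usage: fetch_and_transcribe <url> [destination]
fetch_and_transcribe() {
  url=$1
  dest=${2:-/input.audio}
  # --fail makes curl return non-zero on HTTP errors instead of saving them.
  curl --fail --silent --show-error --location --output "$dest" "$url" || return 1
  pipeline   # starts transcription, which uses /input.audio as its source
}

# In the real script the last line would be:
#   fetch_and_transcribe "$AUDIO_URL"
```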

NOTE: For support purposes, it is assumed the Docker Image provided by Speechmatics is unmodified. If you experience issues, Speechmatics support will require you to replicate the issues with the unmodified Docker image, e.g. speechmatics-docker-example.jfrog.io/transcriber-en:6.3.0