Batch Container Quick Start Guide

This guide will walk you through the steps needed to deploy the Speechmatics Batch Container ready for transcription.

Check system requirements
Pull the Docker Image
Run the Container

After these steps, the Docker Image can be used to create containers that will transcribe audio files. More information about using the Speechmatics container transcription service is detailed in the Speechmatics Container API guide.

System requirements

Speechmatics containerized deployments are built on the Docker platform. In order to operate the containers, the following requirements will need to be met.

System requirements

A separate Docker image is required for each language you want to transcribe. A single image can be used to create and run multiple containers concurrently. Each running container requires the following resources:

1 vCPU
2-5GB RAM
100MB hard disk space

If you are using the enhanced model, it is recommended to use the upper limit of the RAM recommendations.

Please note: the parallel processing functionality of the Batch Container is memory intensive and requires more resources. When using parallel processing, we recommend provisioning N x the RAM requirement, where N is the number of cores intended to be used for parallel processing. For example, if 2 cores are used for parallel processing, the RAM requirement would be up to 10GB.
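
As a rough illustration of the sizing rule above, the following shell sketch multiplies the number of parallel cores by the per-container RAM ceiling. The function name parallel_ram_gb and the choice of the 5GB upper limit as the multiplier are assumptions made for this example; they are not part of the Speechmatics tooling.

```shell
# Hypothetical helper: estimate the RAM (in GB) to provision for one container
# that uses N cores for parallel processing.
# Assumption: the 5GB upper limit of the 2-5GB range is budgeted per core.
RAM_PER_CORE_GB=5

parallel_ram_gb() {
  cores="$1"
  echo $(( cores * RAM_PER_CORE_GB ))
}

# Two cores, as in the example above: prints 10
parallel_ram_gb 2
```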

Host recommended specs

The host machine requires a processor with the following microarchitecture specification to run at expected performance:

If using the standard model offering, at least the Broadwell class is required
If using the enhanced model offering, at least the Cascade Lake class is required
It is also recommended if using the enhanced model that the hardware supports the AVX512_VNNI flag, as this will greatly improve transcription processing speed
- Examples of this among popular hosting providers include the Microsoft Azure DSV-4 class, and the Amazon M5n EC2 server class
- Disabling hyperthreading when running the enhanced model can also improve transcription speed. How to do so when running on Amazon Web Services is shown here, and for Microsoft Azure please see here

AVX flags

Advanced Vector Extensions (AVX) are necessary to allow Speechmatics to carry out transcription.

For the enhanced model, it is recommended to use the AVX512_VNNI flag, which will substantially improve transcription processing speed.
For the standard model, it is necessary to use at least a processor that supports Advanced Vector Extensions 2 (AVX2).
- You should also ensure your hypervisor is enabled to use AVX2.
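
One way to check these flags on a Linux x86 host (or inside a virtual machine, to confirm the hypervisor exposes them) is to look at the flags line of /proc/cpuinfo. The sketch below is illustrative only: has_cpu_flag is a hypothetical helper name, and it assumes the kernel reports flags in lowercase on a line beginning with "flags".

```shell
# Hypothetical helper: succeed if the named CPU flag appears in a cpuinfo-style file.
# $1 = flag name in lowercase, $2 = file to read (defaults to /proc/cpuinfo)
has_cpu_flag() {
  # -m1 keeps only the first "flags" line; -w matches the flag as a whole word,
  # so "avx" does not match "avx2".
  grep -m1 '^flags' "${2:-/proc/cpuinfo}" 2>/dev/null | grep -qw -- "$1"
}

for flag in avx2 avx512_vnni; do
  if has_cpu_flag "$flag"; then
    echo "$flag: reported by this machine"
  else
    echo "$flag: not reported by this machine"
  fi
done
```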

Architecture

Each container:

Processes one input file and outputs a resulting transcript in a predefined language in a number of supported outputs
- The output can be altered by means of a configuration object passed with the file
- These outputs and relevant metadata are described in more detail in the Speech API guide
Is licensed for languages and speech features which vary depending upon each individual contract
- Speech features are described in the Speech API guide
Requires either a license file or license token before transcription starts.
Can run in a mode that parallelises processing across multiple cores
Supports input files up to 2 hours in length or 4GB in size
Treats all data as transitory. Once a container completes its transcription it removes all record of the operation.

Supported Languages

The following languages are supported:

Language	Language Code
Arabic	(ar)
Bulgarian	(bg)
Cantonese	(yue)
Catalan	(ca)
Croatian	(hr)
Czech	(cs)
Danish	(da)
Dutch	(nl)
English	(en)
Finnish	(fi)
French	(fr)
German	(de)
Greek	(el)
Hindi	(hi)
Hungarian	(hu)
Italian	(it)
Indonesian	(id)
Japanese	(ja)
Korean	(ko)
Latvian	(lv)
Lithuanian	(lt)
Malay	(ms)
Mandarin	(cmn)
Norwegian	(no)
Polish	(pl)
Portuguese	(pt)
Romanian	(ro)
Russian	(ru)
Slovakian	(sk)
Slovenian	(sl)
Spanish	(es)
Swedish	(sv)
Turkish	(tr)

Please also note that any languages outside this list are not explicitly supported. Only one language can be processed within each request. Each language above has a language code that must be provided for any transcription request. Most are two-letter ISO 639-1 codes; Cantonese (yue) and Mandarin (cmn) use three-letter codes.

Supported File Formats

Only the following file formats are supported:

aac
amr
flac
m4a
mov
mp3
mp4
mpeg
ogg
wav
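
As a quick pre-flight check before sending a file to a container, the extension can be compared against the list above. This is only a sketch: has_supported_extension is a hypothetical helper, and it assumes the file extension reflects the real format, which the container itself ultimately decides.

```shell
# Hypothetical helper: succeed if the file name ends in one of the extensions listed above.
has_supported_extension() {
  # Names without a dot have no extension to check.
  case "$1" in
    *.*) ;;
    *) return 1 ;;
  esac
  # Take the text after the last dot and lowercase it.
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    aac|amr|flac|m4a|mov|mp3|mp4|mpeg|ogg|wav) return 0 ;;
    *) return 1 ;;
  esac
}

has_supported_extension "interview.WAV" && echo "interview.WAV has a supported extension"
```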

In addition, multiple instances of the container can be run on the same Docker host. This enables scaling of a single language or multiple languages as required.

Accessing the Image

The Speechmatics Docker image is obtained from the Speechmatics Docker repository (jfrog.io). If you do not have a Speechmatics software repository account or have lost your details, please contact Speechmatics support at support@speechmatics.com.

The latest information about the containers can be found in the solutions section of the support portal. If a support account is not available or the Containers section is not visible in the support portal, please contact Speechmatics support at support@speechmatics.com for help.

Prior to pulling any Docker images, the following must be known:

Speechmatics Docker URL – provided by the Speechmatics Support team
Language Code – the ISO language code (for example fr for French)
LICENSE_TOKEN - The value of the signed claims token which is used to validate the license file. This is required to run the Container. Speechmatics Support will provide this within the license file generated for each customer
TAG – which is used to identify the image version

Getting the Image

After gaining access to the relevant details for the Speechmatics software repository, follow the steps below to login and pull the Batch Container image(s) required.

Software Repository Login

Ensure the Speechmatics Docker URL and software repository username and password are available. The endpoint being used will require Docker to be installed. For example:

docker login https://speechmatics-docker-public.jfrog.io

You will be prompted for username and password. If successful, you will see the response:

Login Succeeded

If unsuccessful, please verify your credentials and URL. If problems persist, please contact Speechmatics support.

Pulling the Image

To pull the Batch Container image to the local environment follow the instructions below. Each supported language pack comes as a different Docker image, so the process will need to be repeated for each language pack required.

Example: pulling Global English (en) with the 9.1.0 TAG:

docker pull speechmatics-docker-public.jfrog.io/batch-asr-transcriber-en:9.1.0

Example: pulling the Spanish (es) model with the 9.1.0 TAG:

docker pull speechmatics-docker-public.jfrog.io/batch-asr-transcriber-es:9.1.0

The image will start to download. This could take a while depending on your connection speed.

Note: Speechmatics require all customers to cache a copy of the Docker image(s) within their own environment. Please do not pull directly from the Speechmatics software repository for each deployment.
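
A minimal sketch of caching a pulled image in your own registry is shown below. The registry host registry.example.internal and the helper mirror_ref are placeholders invented for this example; docker tag and docker push are standard Docker commands, printed here rather than executed so they can be reviewed first.

```shell
# Hypothetical helper: rewrite an image reference so it points at an internal registry.
# $1 = internal registry host (placeholder), $2 = source image reference
mirror_ref() {
  # ${2#*/} drops everything up to and including the first "/", i.e. the source registry host.
  echo "$1/${2#*/}"
}

SRC="speechmatics-docker-public.jfrog.io/batch-asr-transcriber-en:9.1.0"
DST=$(mirror_ref "registry.example.internal" "$SRC")

# Print the commands that would copy the image into the internal registry.
echo "docker tag $SRC $DST"
echo "docker push $DST"
```

Later deployments would then pull from the internal reference instead of the Speechmatics software repository.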

As of Feb 2021, all Speechmatics containers are built using Docker Buildkit. This should not impact your internal management of the Speechmatics Container. If you use JFrog to host the Speechmatics container there may be some UI issues (see here), but these are cosmetic and should not impact your ability to pull and run the container. If your internal registry uses Nexus and self-signed certificates, please make sure you are on Nexus version 3.15 or above, or you may encounter errors.

Licensing

You should have received a confidential license file from Speechmatics containing a token to use to license your container. The contents of the file received should look similar to this:

{
    "contractid": 1,
    "creationdate": "2020-03-24 17:43:35",
    "customer": "Speechmatics",
    "id": "c18a4eb990b143agadeb384cbj7b04c3",
    "is_trial": true,
    "metadata": {
        "key_pair_id": 1,
        "request": {
            "customer": "Speechmatics",
            "features": [
                "MAPBA",
                "LANY"
            ],
            "isTrial": true,
            "notValidAfter": "2021-01-01",
            "validFrom": "2020-01-01"
        }
    },
    "signedclaimstoken": "example"
}

The validFrom and notValidAfter keys in the license file specify the start and end dates for the validity of your license. The license is valid from 00:00 UTC on the start date to 00:00 UTC on the expiry date. After the expiry date, the container will continue to run but will not transcribe audio. You should apply for a new license before this happens.
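
The validity window can be checked ahead of time with a small script. The sketch below is an assumption-laden illustration: license_valid_on is a hypothetical helper, the dates are passed in by hand in YYYY-MM-DD form, and the expiry date itself is treated as outside the window because validity ends at 00:00 UTC on that date.

```shell
# Hypothetical helper: succeed if a date falls inside the license validity window.
# $1 = date to check, $2 = validFrom, $3 = notValidAfter (all YYYY-MM-DD, UTC)
license_valid_on() {
  # Strip the dashes so the dates can be compared as plain integers (YYYYMMDD).
  day=$(printf '%s' "$1" | tr -d '-')
  start=$(printf '%s' "$2" | tr -d '-')
  end=$(printf '%s' "$3" | tr -d '-')
  [ "$day" -ge "$start" ] && [ "$day" -lt "$end" ]
}

today=$(date -u +%Y-%m-%d)
if license_valid_on "$today" "2020-01-01" "2021-01-01"; then
  echo "license is valid on $today"
else
  echo "license is not valid on $today"
fi
```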

Licensing does not require an internet connection.

There are two ways to apply the license to the container.

As a volume-mapped file

The license file should be mapped to the path /license.json within the container. For example:

docker run ... -v /my_license.json:/license.json:ro batch-asr-transcriber-en:9.1.0

As an environment variable

Setting an environment variable named LICENSE_TOKEN is also a valid way to license the container. The contents of this variable should be set to the value of the signedclaimstoken from within the license file.

For example, copy the signedclaimstoken from the license file (without the quotation marks) and set the environment variable as below. The token shown is truncated and is not a full example:

docker run ... -e LICENSE_TOKEN=eyJhbGciOiJ... batch-asr-transcriber-en:9.1.0

There should be no reason to do this, but if both a volume-mapped file and an environment variable are provided simultaneously then the volume-mapped file will be ignored.
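
Rather than copying the token by hand, it can be read from the license file in a script. This is a sketch under stated assumptions: license_token is a hypothetical helper, and the sed pattern assumes the signedclaimstoken key and its value sit on a single line, as in the sample file above. A JSON-aware tool such as jq would be more robust where it is available.

```shell
# Hypothetical helper: print the signedclaimstoken value from a license file.
# $1 = path to the license file
license_token() {
  sed -n 's/.*"signedclaimstoken"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Usage (the path is illustrative and the command is not executed here):
#   LICENSE_TOKEN=$(license_token /my_license.json)
#   docker run ... -e LICENSE_TOKEN="$LICENSE_TOKEN" batch-asr-transcriber-en:9.1.0
```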

Using the Container

Once the Docker image has been pulled into a local environment, it can be started using the Docker run command. More details about operating and managing the container are available in the Docker API documentation.

There are two different methods for passing an audio file into a container:

STDIN: Streams the audio file into the container through the standard command line entry point
File Location: Pulls audio file from a file location

The examples below demonstrate these modes of operating the containers.

Example 1: passing a file using the cat command to the Spanish (es) container

cat ~/$AUDIO_FILE | docker run -i \
  -e LICENSE_TOKEN=eyJhbGciOiJ... \
  batch-asr-transcriber-es:9.1.0

Example 2: pulling an audio file from a mapped directory into the container

docker run -i -v ~/$AUDIO_FILE:/input.audio \
  -e LICENSE_TOKEN=eyJhbGciOiJ... \
  batch-asr-transcriber-es:9.1.0

NOTE: the audio file must be mapped into the container with :/input.audio

The Docker run options used are:

Name	Description
`--env, -e`	Set environment variables
`--interactive , -i`	Keep STDIN open even if not attached
`--volume , -v`	Bind mount a volume

See Docker docs for a full list of the available options.

Both methods produce the same transcript. STDOUT is used to provide the transcription in JSON format. Here's an example:

{
  "format": "2.7",
  "metadata": {
    "created_at": "2020-06-30T15:43:50.871Z",
    "type": "transcription",
    "transcription_config": {
      "language": "en",
      "diarization": "none",
      "additional_vocab": [
        {
          "content": "Met Office"
        },
        {
          "content": "Fitzroy"
        },
        {
          "content": "Forties"
        }
      ]
    }
  },
  "results": [
    {
      "alternatives": [
        {
          "confidence": 1.0,
          "content": "Are",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 3.61,
      "start_time": 3.49,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1.0,
          "content": "on",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 3.73,
      "start_time": 3.61,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1.0,
          "content": "the",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 3.79,
      "start_time": 3.73,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1.0,
          "content": "rise",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 4.27,
      "start_time": 3.79,
      "type": "word"
    }
  ]
}

Intermediate files

The intermediate files created during the transcription are stored in /home/smuser/work. This is the case whether running the container as a root or non-root user.

Determining success

The exit code of the container will determine if the transcription was successful. There are two exit code possibilities:

Exit Code == 0 : The transcription was a success; the output will contain JSON defining the transcript (see the example above)
Exit Code != 0 : the output will contain a stack trace and other useful information. This output should be used in any communication with Speechmatics support to aid understanding and resolution of any problems that may occur
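
The exit code check can be scripted around any docker run invocation. The wrapper below is a minimal sketch: run_transcription is a hypothetical name, and it assumes the transcript arrives on STDOUT while diagnostics arrive on STDERR. If the container writes its stack trace to STDOUT instead, that text will end up in the transcript file.

```shell
# Hypothetical wrapper: run a command, keep STDOUT as the transcript and STDERR as a log.
# $1 = transcript path; the remaining arguments are the command to run.
run_transcription() {
  out="$1"
  shift
  if "$@" > "$out" 2> "$out.log"; then
    echo "success: transcript written to $out"
  else
    status=$?
    echo "failed with exit code $status: see $out.log" >&2
    return "$status"
  fi
}

# Usage (not executed here; the token variable is a placeholder):
#   run_transcription transcript.json docker run -v ~/$AUDIO_FILE:/input.audio \
#     -e LICENSE_TOKEN=$TOKEN_VALUE batch-asr-transcriber-en:9.1.0
```

Keeping the log file next to the transcript makes it easy to attach to a support ticket when the exit code is non-zero.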

Troubleshooting

Enabling Logging

If you are seeing problems then we recommend that you enable logging and open a support ticket with Speechmatics support: support@speechmatics.com.

The following example shows how to enable logging, using the -stderr argument to output the logs to stderr:

 docker run --rm -e SM_JOB_ID=123 -e SM_LOG_DIR=/logs \
-v ~/$AUDIO_FILE:/input.audio \
-e LICENSE_TOKEN=f787b0051e2768b1f619d75faab97f23ee9b7931890c05f97e9f550702 \
batch-asr-transcriber-en:9.1.0 \
-stderr

To store the output of logs, add two environment variables:

SM_JOB_ID: - a job id, for example: 1
SM_LOG_DIR: - the directory inside the container where to write the logs, for example: /logs

When raising a support ticket it is normally easier to write the log output to a specific file. You can do this by creating a volume mount from which the logs will be accessible after the container has finished. Before running the container you need to create a directory for the log file and ensure it has the correct permissions. In this example we use a local logs directory to store the output of the log for a job with ID 124:

mkdir -p logs/124
sudo chown -R nobody:nogroup logs/
sudo chmod -R a+rwx logs/

then

docker run --rm -v ${PWD}/logs:/logs -e SM_JOB_ID=124 -e SM_LOG_DIR=/logs \
-v ~/sm_audio.wav:/input.audio \
-e LICENSE_TOKEN=f787b0051e2768b1f619d75faab97f23ee9b7931890c05f97e9f550702 \
 batch-asr-transcriber-en:9.1.0
tail logs/124/sigurd.log

Common Problems

There are occasions where the transcription container will fail to transcribe the media file provided and will exit with a non-zero exit code. Speechmatics strongly advise enabling logging (see the instructions above). The logs will show the reasons for the failed job, which is especially useful when multiple errors can cause the same error code. Below are some errors, with suggestions on how they can be resolved.

Error Code	Error	Resolution
1	“err: signal: illegal instruction”	This means that the models couldn’t be loaded within the container. Please ensure that the host that’s running the Docker engine has an AVX compatible CPU. The following can also be done inside the container to check that AVX is listed in the CPU flags. $ docker run -it --entrypoint /bin/bash batch-asr-transcriber-en:9.1.0 $ cat /proc/cpuinfo | grep flags
1	“Unable to set up logging”	This can occur when a directory is volume mapped into the containers and a log file cannot be created into that directory. Example command to map in a tmp directory inside the container to /xxx path: $ docker run --rm -e SM_LOG_DIR=/xxx -e SM_JOB_ID=1 -v $PWD/tmp:/xxx batch-asr-transcriber-en:9.1.0
1	“/input.audio is not valid”	If volume mapping the file into the container, ensure that a valid audio file is being mapped in.
1	“failed to get sample rate”	The audio file that was passed for recognition did not have a readable sample rate. Check that the audio file is valid and that a sample rate can be read. The following ffmpeg command can be used to check whether there is a valid sample rate: $ ffmpeg -i /home/user/example.wav
1	“exit status 1”	If the container is memory (RAM) starved it can quit during the transcription process. Verify that the minimum resource (CPU and RAM) requirements are being assigned to a transcription container. The inspect command in Docker can be useful to identify whether a lack of memory shut down the container. Look out for the “OOMKilled” value. Here is an example: $ docker inspect --format='{{json .State}}' $containerID
1	"License Error: illegal base64 data at input byte $NUMBER"	The license token value has been truncated or otherwise altered from the initial value generated. Please ensure that you have copied the token value correctly and that the license file is not corrupt
1	"ERROR sentryserver could not load license: stat /license.json: no such file or directory"	The license file or license token has not been passed when attempting to run the container. Please ensure that the license file or license token value is passed as documented
2	--parallel/-parallel: invalid check_parallel value: '0'	If using the parallel option to speed up the processing time on files more than 5 minutes in length, the --parallel switch must be given an integer of at least 1. The example below shows a valid command: $ docker run -i -v /home/user/config.json:/config.json -v /home/user/example.wav:/input.audio -e LICENSE_TOKEN=$TOKEN_VALUE batch-asr-transcriber-en:9.1.0 --parallel 2

If you continue to face issues, please contact Speechmatics support at support@speechmatics.com.

Modifying the Image

Building an Image

Using STDIN to pass files in and obtain the transcription may not be sufficient for all use cases. If your workflow requires it, you can build a new Docker image that uses the Speechmatics image as a layer. To include the Speechmatics Docker image inside another image, reference the pulled Docker image in the Dockerfile for the new application.

Requirements for a custom image

To ensure the Speechmatics Docker image works as expected inside the custom image, please consider the following:

Any audio that needs to be transcribed must be copied to a file called /input.audio inside the running container
To initiate transcription, call the application pipeline. The pipeline will start the transcription service and use /input.audio as the audio source.
When running pipeline, the working directory must be set to /opt/orchestrator, using either the Dockerfile WORKDIR directive, the cd command or similar means.
Once pipeline finishes transcribing, ensure you move the transcription data outside the container

Dockerfile

To add a Speechmatics Docker image into a custom one, the Dockerfile must be modified to include the full image name of the locally available image.

Example: Adding Global English (en) with tag 9.1.0 to the Dockerfile

FROM batch-asr-transcriber-en:9.1.0
ADD download_audio.sh /usr/local/bin/download_audio.sh
RUN chmod +x /usr/local/bin/download_audio.sh
CMD ["/usr/local/bin/download_audio.sh"]

Once the above image is built, and a container instantiated from it, a script called download_audio.sh will be executed (this could do something like pulling a file from a webserver and copying it to /input.audio before starting the pipeline application). This is a very basic Dockerfile to demonstrate a way of orchestrating the Speechmatics Docker Image.

NOTE: For support purposes, it is assumed the Docker Image provided by Speechmatics has been unmodified. If you experience issues, Speechmatics support will require you to replicate the issues with the unmodified Docker image e.g. batch-asr-transcriber-en:9.1.0

Additional Security Features

This section documents additional measures you can take to run the Batch Container where there are restrictive requirements on data storage or user access.

Custom Mapping Temporary Directories to run the Batch Container

Users may wish to run the Batch Container in an environment where they cannot or do not want to write anything to disk, and instead use temporary storage like tmpfs or ramfs to ensure regulatory compliance. The Batch Container supports mounting temporary directories for the storage of all intermediate files created during transcription, as well as mounting the directories where input, output and job configuration files are placed. Files can also be retrieved locally by using the fetch_url functionality in the configuration object.

Speechmatics also supports the --job-config argument to specify the location of the configuration object. The value must be the path inside the container at which the config file can be found. If the config file also needs to sit in a temporary directory (e.g. tmp), that directory cannot be a tmpfs mount; it must be a volume mapped from a host directory in which the configuration object can be found.

An example is below, where the intermediate files and configuration object are in temporary storage. Please note that the --job-config argument must come after the image name.

docker run --rm -i \
--read-only --tmpfs /home/smuser \
-v <path/to/dir/in/host/containing/config.json>:/tmp \
-e LICENSE_TOKEN=$TOKEN_VALUE \
batch-asr-transcriber-en:9.1.0 \
--job-config /tmp/config.json

This example sets up a tmpfs for intermediate files created by transcription, which means that all such files are written to transient storage, and not to disk. The configuration object is mounted in a retrievable folder in tmp.

An alternative is to use tmp as tmpfs and then mount an additional read-only volume at a path inside the container in which the config can be found:

docker run --rm -i \
--read-only --tmpfs /home/smuser --tmpfs /tmp \
-v <path/to/dir/in/host/containing/config.json>:/configs_dir:ro \
-e LICENSE_TOKEN=$TOKEN_VALUE \
batch-asr-transcriber-en:9.1.0 \
--job-config /configs_dir/config.json

If the Container is run using Kubernetes, users can use emptyDir volumes to mount tmpfs in the needed directories (/home/smuser and /tmp). Configuration files can also be stored in an emptyDir volume if any of the containers in the pod is able to put them there. This could be achieved by using an initContainer or the sidecar pattern to fetch the configuration from its original location and store it in the emptyDir volume. The transcriber should then be called with the --job-config argument pointing to the path in the emptyDir volume in which the config was stored.

Users can also pull files from temporary locations using the fetch_url functionality. Below is a configuration example:

{
  "type": "transcription",
  "transcription_config": {
    "language": "en"
  },
  "fetch_data": {
    "url": "file:///tmp/$FILENAME.wav"
  }
}

Running a batch container as a non-root user

There are some use cases where you may not be able to run the batch container as a root user. This may be because you are working in a hosting environment that mandates the use of a named user rather than root.

You must start the container with the command docker run --user $USERNUMBER:$GROUPID. The user number and group ID are non-zero numerical values from 1 up to 65535. A valid example would be:

docker run --user 1000:3000
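
A deployment script can validate the user number and group ID against the range stated above before starting the container. The helper valid_id and the variables USER_ID and GROUP_ID below are hypothetical names used only for this sketch; the docker command is printed rather than run.

```shell
# Hypothetical helper: succeed if the value is a whole number from 1 to 65535.
valid_id() {
  # Reject empty values and anything containing a non-digit character.
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

USER_ID=1000
GROUP_ID=3000
if valid_id "$USER_ID" && valid_id "$GROUP_ID"; then
  echo "docker run --user $USER_ID:$GROUP_ID ..."
else
  echo "user number and group ID must be between 1 and 65535" >&2
fi
```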

Getting Transcription Output as a non-root user

If you take transcription via the default STDOUT, then this will not change as a non-root user. An example is below:

docker run -u 1020:4000 \
  -v /Users/$USER/work/pipeline/mydev/config.json:/config.json \
  -v /Users/$USER/work/pipeline/mydev/input.audio:/input.audio \
  ${IMAGE_NAME}

If you want to map the output to a specific directory, you must volume map a directory to which a non-root user would have access.

Running a Batch Container as a non-root user on Kubernetes

Please note: the examples below do not constitute an explicit recommendation to run as a non-root user; they are merely a guideline on how to do so with Kubernetes where this is an unavoidable requirement.

If you require named users to be deployed on Kubernetes Pods, you must set the following Security Config. The user and group must correspond to the user and group you use when starting the container

securityContext:
  runAsUser: {non-zero numerical value between 1 and 65535}
  runAsGroup: {non-zero numerical value between 1 and 65535}

There is more information on how to configure security settings on Kubernetes pods here

Some Kubernetes deployments may mandate the use of PodSecurity Admissions Controllers. These provide stricter security requirements. More information on them can be found here. If your deployment does require this set up, here is an example configuration that would allow you to carry out transcription as a non-root user.

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: 'docker/default,runtime/default'
    apparmor.security.beta.kubernetes.io/allowedProfileNames: 'runtime/default'
    seccomp.security.alpha.kubernetes.io/defaultProfileName:  'runtime/default'
    apparmor.security.beta.kubernetes.io/defaultProfileName:  'runtime/default'
spec:
  privileged: false
  # Required to prevent escalations to root.
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
    - ALL
  # Allow core volume types.
  volumes:
    - 'configMap'
    - 'emptyDir'
    - 'projected'
    - 'secret'
    - 'downwardAPI'
    # Assume that persistentVolumes set up by the cluster admin are safe to use.
    - 'persistentVolumeClaim'
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    # Require the container to run without root privileges.
    rule: 'MustRunAsNonRoot'
  seLinux:
    # This policy assumes the nodes are using AppArmor rather than SELinux.
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'MustRunAs'
    ranges:
      # Forbid adding the root group.
      - min: 1
        max: 65535
  fsGroup:
    rule: 'MustRunAs'
    ranges:
      # Forbid adding the root group.
      - min: 1
        max: 65535
  readOnlyRootFilesystem: false