This guide will walk you through the steps needed to deploy the Speechmatics Batch Container ready for transcription.
After these steps, the Docker Image can be used to create containers that will transcribe audio files. More information about using the Speechmatics container transcription service is detailed in the Speechmatics Container API guide.
Speechmatics containerized deployments are built on the Docker platform. In order to operate the containers, the following requirements will need to be met.
A separate Docker image is required for each language you want to transcribe. A single image can be used to create and run multiple containers concurrently. Each running container will require the following resources:
The host machine requires a processor with the following minimum specification: Intel® Xeon® CPU E5-2630 v4 (Sandy Bridge) 2.20GHz (or equivalent). This is important because these chipsets (and later ones) support Advanced Vector Extensions (AVX). The machine learning algorithms used by Speechmatics ASR require the performance optimizations that AVX provides. If the host is a virtual machine, also ensure that the hypervisor has AVX enabled.
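On a Linux host, AVX support can be checked before any image is pulled by looking for the avx flag in /proc/cpuinfo. The following is a minimal sketch; the function name has_avx and its optional file argument are illustrative and not part of any Speechmatics tooling.

```shell
#!/bin/sh
# has_avx [CPUINFO_FILE]
# Succeeds (exit status 0) if the CPU flags in the file include "avx".
# Defaults to /proc/cpuinfo, which only exists on Linux.
has_avx() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # -w matches "avx" as a whole word, so "avx2" on its own does not count.
    grep -E '^flags' "$cpuinfo" | grep -q -w 'avx'
}

# Example use on the Docker host:
#   has_avx && echo "AVX available" || echo "AVX missing"
```

The same check can be run inside a virtual machine to confirm that the hypervisor exposes AVX to the guest.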
Note: Each language pack required is distributed as a separate Docker image. Only the language packs required need to be installed on the Docker host.
Each container:
In addition, multiple instances of the container can be run on the same Docker host. This enables scaling of a single language or multiple languages as required.
The Speechmatics Docker image is obtained from the Speechmatics Docker repository (jfrog.io). If you do not have a Speechmatics software repository account or have lost your details, please contact Speechmatics support at support@speechmatics.com.
The latest information about the containers can be found in the solutions section of the support portal. If a support account is not available or the Containers section is not visible in the support portal, please contact Speechmatics support at support@speechmatics.com for help.
Prior to pulling any Docker images, the following must be known:
- The language code of each required language pack (for example, fr for French)
- LICENSE_KEY - which is required to start a container
- TAG - which is used to identify the image version
After gaining access to the relevant details for the Speechmatics software repository, follow the steps below to log in and pull the Docker images that are required.
Ensure the Speechmatics Docker URL and the software repository username and password are available. Docker must be installed on the machine used to log in. For example:
docker login https://speechmatics-docker-example.jfrog.io
You will be prompted for username and password. If successful, you will see the response:
Login Succeeded
If unsuccessful, please verify your credentials and URL. If problems persist, please contact Speechmatics support.
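For automated environments where nobody can answer the prompt, docker login can read the password from STDIN instead. This is a sketch only: the function name registry_login and the variables SM_REPO_USER and SM_REPO_PASSWORD are hypothetical; --username and --password-stdin are standard docker login options.

```shell
#!/bin/sh
# registry_login URL USERNAME
# Reads the password from STDIN so that it does not appear on the
# command line or in the shell history.
registry_login() {
    docker login --username "$2" --password-stdin "$1"
}

# Example use (SM_REPO_USER and SM_REPO_PASSWORD are placeholder names):
#   printf '%s' "$SM_REPO_PASSWORD" | \
#       registry_login https://speechmatics-docker-example.jfrog.io "$SM_REPO_USER"
```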
To pull the Docker image to the local environment follow the instructions below. Each supported language pack comes as a different Docker image, so the process will need to be repeated for each language pack required.
Example: pulling Global English (en) with the 7.0.0 TAG:
docker pull speechmatics-docker-example.jfrog.io/transcriber-en:7.0.0
Example: pulling the Spanish (es) model with the 7.0.0 TAG:
docker pull speechmatics-docker-example.jfrog.io/transcriber-es:7.0.0
The image will start to download. This could take a while depending on your connection speed.
Note: Speechmatics require all customers to cache a copy of the Docker image(s) within their own environment. Please do not pull directly from the Speechmatics software repository for each deployment.
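One way to satisfy the caching requirement is to pull each language image once and push a copy to a registry inside your own environment. The sketch below assumes such a registry exists; the address registry.internal.example:5000 and the function names are placeholders, and only the image naming pattern comes from the examples above.

```shell
#!/bin/sh
SM_REGISTRY="speechmatics-docker-example.jfrog.io"   # from the examples above
LOCAL_REGISTRY="registry.internal.example:5000"      # hypothetical internal registry

# image_ref REGISTRY LANG TAG
# Prints REGISTRY/transcriber-LANG:TAG
image_ref() {
    printf '%s/transcriber-%s:%s\n' "$1" "$2" "$3"
}

# cache_images TAG LANG...
# Pulls each language image once, then tags and pushes a local copy.
cache_images() {
    tag="$1"; shift
    for lang in "$@"; do
        src=$(image_ref "$SM_REGISTRY" "$lang" "$tag")
        dst=$(image_ref "$LOCAL_REGISTRY" "$lang" "$tag")
        docker pull "$src" || return 1
        docker tag "$src" "$dst" || return 1
        docker push "$dst" || return 1
    done
}

# Example use: cache_images 7.0.0 en es
```

Deployments can then pull from the local registry rather than from the Speechmatics software repository.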
The Docker images we provide have a configured expiry date and must be used in conjunction with the license key that has been issued to you. The Docker images and license key are specific to your organisation, and should not be shared with any third parties. License keys must be provided at runtime through the LICENSE_KEY environment variable, like this:
-e LICENSE_KEY=f787b0051e2768b1f619d75faab97f23ee9b7931890c05f97e9f550702
Once the Docker image has been pulled into a local environment, it can be started using the docker run command. More details about operating and managing the container are available in the Docker API documentation.
There are two different methods for passing an audio file into a container:
- Piping the file into the container over STDIN
- Mapping the file into the container as a volume
The examples below demonstrate these modes of operating the containers.
Example 1: passing a file using the cat command to the Spanish (es) container
cat ~/sm_audio.wav | docker run -i \
-e LICENSE_KEY=f787b0051e2768b1f619d75faab97f23ee9b7931890c05f97e9f550702 \
speechmatics-docker-example.jfrog.io/transcriber-es:7.0.0
Example 2: pulling an audio file from a volume-mapped directory into the container
docker run -i -v ~/sm_audio.wav:/input.audio \
-e LICENSE_KEY=f787b0051e2768b1f619d75faab97f23ee9b7931890c05f97e9f550702 \
speechmatics-docker-example.jfrog.io/transcriber-es:7.0.0
NOTE: the audio file must be mapped into the container with :/input.audio
The docker run options used are:
Name | Description |
---|---|
--env, -e | Set environment variables |
--interactive, -i | Keep STDIN open even if not attached |
--volume, -v | Bind mount a volume |
See Docker docs for a full list of the available options.
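The volume-mapping method can be wrapped in a small helper that saves the transcript to a file. This is a sketch under stated assumptions: the function name transcribe_file is hypothetical, and it relies on the standard Docker behaviour that -e LICENSE_KEY with no value forwards the variable from the calling shell, which keeps the key off the command line.

```shell
#!/bin/sh
# transcribe_file IMAGE AUDIO_FILE OUTPUT_JSON
# Maps AUDIO_FILE to /input.audio and writes the JSON transcript from
# STDOUT to OUTPUT_JSON. Returns the container's exit code.
transcribe_file() {
    image="$1"; audio="$2"; out="$3"
    # Bind mounts need an absolute host path.
    dir=$(cd "$(dirname "$audio")" && pwd) || return 1
    audio_abs="$dir/$(basename "$audio")"
    # LICENSE_KEY must already be exported in the calling shell.
    docker run --rm \
        -v "$audio_abs:/input.audio" \
        -e LICENSE_KEY \
        "$image" > "$out"
}

# Example use:
#   export LICENSE_KEY=...   # the key issued by Speechmatics
#   transcribe_file speechmatics-docker-example.jfrog.io/transcriber-es:7.0.0 \
#       ~/sm_audio.wav sm_audio.json
```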
Both methods produce the same transcription output. The transcription is written to STDOUT in JSON format. Here's an example:
{
  "format": "2.4",
  "license": "productsteam build (Thu May 14 14:33:09 2020): 953 days remaining",
  "metadata": {
    "created_at": "2020-06-30T15:43:50.871Z",
    "type": "transcription",
    "transcription_config": {
      "language": "en",
      "diarization": "none",
      "additional_vocab": [
        {
          "content": "Met Office"
        },
        {
          "content": "Fitzroy"
        },
        {
          "content": "Forties"
        }
      ]
    }
  },
  "results": [
    {
      "alternatives": [
        {
          "confidence": 1.0,
          "content": "Are",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 3.61,
      "start_time": 3.49,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1.0,
          "content": "on",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 3.73,
      "start_time": 3.61,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1.0,
          "content": "the",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 3.79,
      "start_time": 3.73,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1.0,
          "content": "rise",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 4.27,
      "start_time": 3.79,
      "type": "word"
    }
  ]
}
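To turn this JSON into plain text, the content of each result can be joined together. The sketch below assumes that jq is installed on the host (it is not part of the Speechmatics image) and that the first entry in alternatives is the one to use; the function name transcript_text is illustrative.

```shell
#!/bin/sh
# transcript_text JSON_FILE
# Prints the first alternative of every result, separated by spaces.
transcript_text() {
    jq -r '[.results[] | .alternatives[0].content] | join(" ")' "$1"
}

# Example use, after saving the container's STDOUT to transcript.json:
#   transcript_text transcript.json
```

Joining with a single space is a simplification; results of other types may need different spacing.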
The working directory is now home/smuser/work rather than work. This is the case whether the container runs as a root or non-root user.
The exit code of the container will determine if the transcription was successful. There are two exit code possibilities:
- 0: the transcription was successful
- Any non-zero value: the transcription failed
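A calling script can branch on the container's exit code. The helper below is a minimal sketch with a hypothetical name, run_and_report; it runs whatever command it is given, such as a docker run invocation, and reports the result on stderr so that the transcript on STDOUT stays clean.

```shell
#!/bin/sh
# run_and_report COMMAND [ARGS...]
# Runs the command, prints a status message to stderr, and returns the
# command's exit code unchanged.
run_and_report() {
    "$@"
    status=$?
    if [ "$status" -eq 0 ]; then
        echo "transcription succeeded" >&2
    else
        echo "transcription failed with exit code $status" >&2
    fi
    return "$status"
}

# Example use:
#   run_and_report docker run --rm -v ~/sm_audio.wav:/input.audio \
#       -e LICENSE_KEY speechmatics-docker-example.jfrog.io/transcriber-en:7.0.0 \
#       > transcript.json
```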
If you are seeing problems then we recommend that you enable logging and open a support ticket with Speechmatics support: support@speechmatics.com.
To enable logging, add two environment variables:
- SM_JOB_ID - a job id, for example: 1
- SM_LOG_DIR - the directory inside the container where the logs are written, for example: /logs
The following example shows how to do this, using the -stderr=true argument to dump the logs to stderr:
docker run --rm -e SM_JOB_ID=123 -e SM_LOG_DIR=/logs \
-v ~/sm_audio.wav:/input.audio \
-e LICENSE_KEY=f787b0051e2768b1f619d75faab97f23ee9b7931890c05f97e9f550702 \
speechmatics-docker-example.jfrog.io/transcriber-en:7.0.0 \
-stderr=true
When raising a support ticket it is normally easier to write the log output to a specific file. You can do this by creating a volume mount where the logs will be accessible from after the container has finished. Before running the container you need to create a directory for the log file and ensure it has the correct permissions. In this example we use a local logs directory to store the output of the log for a job with ID 124:
mkdir -p logs/124
sudo chown -R nobody:nogroup logs/
sudo chmod -R a+rwx logs/
docker run --rm -v ${PWD}/logs:/logs -e SM_JOB_ID=124 -e SM_LOG_DIR=/logs \
-v ~/sm_audio.wav:/input.audio \
-e LICENSE_KEY=f787b0051e2768b1f619d75faab97f23ee9b7931890c05f97e9f550702 \
speechmatics-docker-example.jfrog.io/transcriber-en:7.0.0
tail logs/124/sigurd.log
There are occasions where the transcription container will fail to transcribe the media file provided and will exit with a code other than 0 (success). Speechmatics strongly advise enabling logging (see the instructions above). The logs will show the reason for the failed job, which is especially useful because several different errors can produce the same exit code. Below are some errors with suggestions on how they can be resolved.
Error Code | Error | Resolution |
---|---|---|
1 | “err: signal: illegal instruction” | This means that the models couldn’t be loaded within the container. Please ensure that the host that’s running the Docker engine has an AVX compatible CPU. The following can also be done inside the container to check that AVX is listed in the CPU flags. $ docker run -it --entrypoint /bin/bash speechmatics-docker-example.jfrog.io/transcriber-en:7.0.0 $ cat /proc/cpuinfo \| grep flags |
1 | “Unable to set up logging” | This can occur when a directory is volume mapped into the container and a log file cannot be created in that directory. Example command to map a tmp directory into the container at the /xxx path: $ docker run --rm -e SM_LOG_DIR=/xxx -e SM_JOB_ID=1 -v $PWD/tmp:/xxx speechmatics-docker-example.jfrog.io/transcriber-en:7.0.0 |
1 | “err: licensing failed” | This generally occurs if either no license key or the wrong key is supplied. Use the license key provided by Speechmatics. Example command: $ docker run -i -v /home/user/config.json:/config.json -v /home/user/example.wav:/input.audio -e LICENSE_KEY=f787b0051e2768b1f619d75faab97f23ee9b7931890c05f97e9f550702 speechmatics-docker-example.jfrog.io/transcriber-en:7.0.0 --stderr |
1 | “/input.audio is not valid” | If volume mapping the file into the container, ensure that a valid audio file is being mapped in. |
1 | “failed to get sample rate” | No sample rate could be read from the audio file that was passed for recognition. Check that the audio file is valid and that a sample rate can be read. The following ffmpeg command can be used to check whether there is a valid sample rate: $ ffmpeg -i /home/user/example.wav |
1 | “exit status 1” | If the container is memory (RAM) starved it can quit during the transcription process. Verify that the minimum resource (CPU and RAM) requirements are assigned to the transcription container. The docker inspect command can help identify whether a lack of memory shut down the container. Look out for the “OOMKilled” value. Here is an example: $ docker inspect --format='{{json .State}}' $containerID |
2 | “The value of --parallel must be >= 1, but 0 was supplied” OR “invalid value "--stderr" for flag --parallel: parse error” | If using the parallel option to speed up processing of files more than 5 minutes in length, the --parallel switch must be given an integer value of at least 1. The example below shows a valid command: $ docker run -i -v /home/user/config.json:/config.json -v /home/user/example.wav:/input.audio -e LICENSE_KEY=f787b0051e2768b1f619d75faab97f23ee9b7931890c05f97e9f550702 speechmatics-docker-example.jfrog.io/transcriber-en:7.0.0 --parallel 2 |
If you continue to face issues, please contact Speechmatics support at support@speechmatics.com.
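The OOMKilled check from the table can be scripted. The sketch below uses a hypothetical function name, was_oom_killed; it assumes the container was started without --rm, since a removed container can no longer be inspected.

```shell
#!/bin/sh
# was_oom_killed CONTAINER
# Succeeds if Docker recorded that the container was stopped for
# running out of memory.
was_oom_killed() {
    [ "$(docker inspect --format '{{.State.OOMKilled}}' "$1")" = "true" ]
}

# Example use:
#   if was_oom_killed "$containerID"; then
#       echo "container ran out of memory; assign more RAM" >&2
#   fi
```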
Using STDIN to pass files in and obtain the transcription may not be sufficient for all use cases. It is possible to build a new Docker image that uses the Speechmatics image as a layer. This allows greater flexibility and provides a mechanism to fit into custom workflows. To include the Speechmatics Docker image inside another image, reference the pulled Docker image in the Dockerfile for the new application.
To ensure the Speechmatics Docker image works as expected inside the custom image, please consider the following:
- To start transcription, run the pipeline application. The pipeline will start the transcription service and use /input.audio as the audio source
To add a Speechmatics Docker image into a custom one, the Dockerfile must be modified to include the full image name of the locally available image.
Example: Adding Global English (en) with tag 7.0.0 to the Dockerfile
FROM speechmatics-docker-example.jfrog.io/transcriber-en:7.0.0
ADD download_audio.sh /usr/local/bin/download_audio.sh
RUN chmod +x /usr/local/bin/download_audio.sh
CMD ["/usr/local/bin/download_audio.sh"]
Once the above image is built, and a container instantiated from it, a script called download_audio.sh will be executed (this could do something like pulling a file from a webserver and copying it to /input.audio before starting the pipeline application). This is a very basic Dockerfile to demonstrate a way of orchestrating the Speechmatics Docker image.
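As an illustration only, download_audio.sh might look something like the sketch below. It is not a Speechmatics-supplied script: the AUDIO_URL variable and the function name are hypothetical, it assumes curl is available in the custom image (it may need to be installed), and it assumes the pipeline application can be started by name with no arguments.

```shell
#!/bin/sh
# Hypothetical sketch of download_audio.sh.

# fetch_and_transcribe URL [DEST]
# Downloads the audio to DEST (default /input.audio), then starts the
# pipeline, which writes the transcript to STDOUT.
fetch_and_transcribe() {
    url="$1"
    dest="${2:-/input.audio}"
    # -f fails on HTTP errors, -sS is quiet but still prints errors,
    # -L follows redirects.
    curl -fsSL -o "$dest" "$url" || return 1
    pipeline
}

# In the real script the last line would be something like:
#   fetch_and_transcribe "$AUDIO_URL"
```

The URL could then be supplied when the container is started, for example with -e AUDIO_URL=... alongside the LICENSE_KEY variable.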
NOTE: For support purposes, it is assumed the Docker Image provided by Speechmatics has been unmodified. If you experience issues, Speechmatics support will require you to replicate the issues with the unmodified Docker image e.g. speechmatics-docker-example.jfrog.io/transcriber-en:7.0.0
There are some use cases where you may not be able to run the batch container as a root user. This may be because you are working in a hosting environment that mandates the use of a named user rather than root.
You must start the container with the command docker run --user $USERNUMBER:$GROUPID. The user number and group ID are numerical values from 1 up to 65535. A valid example would be:
docker run --user 1000:3000
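The range rule above can be enforced before the container is started. This is a minimal sketch; valid_id and run_as are hypothetical helper names, and any additional docker run options (volumes, environment variables) are passed through unchanged.

```shell
#!/bin/sh
# valid_id VALUE
# Succeeds if VALUE is a whole number from 1 to 65535.
valid_id() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

# run_as USER_ID GROUP_ID IMAGE [DOCKER_RUN_OPTIONS...]
# Starts the image as the given non-root user and group.
run_as() {
    uid="$1"; gid="$2"; image="$3"; shift 3
    if ! valid_id "$uid" || ! valid_id "$gid"; then
        echo "user and group IDs must be between 1 and 65535" >&2
        return 2
    fi
    docker run --rm --user "$uid:$gid" "$@" "$image"
}

# Example use:
#   run_as 1000 3000 speechmatics-docker-example.jfrog.io/transcriber-en:7.0.0 \
#       -v ~/sm_audio.wav:/input.audio -e LICENSE_KEY
```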
Getting Transcription Output as a non-root user
If you take the transcription from the default STDOUT, nothing changes when running as a non-root user. An example is below:
docker run -u 1020:4000 \
-v /Users/$USER/work/pipeline/mydev/config.json:/config.json \
-v /Users/$USER/work/pipeline/mydev/input.audio:/input.audio \
${IMAGE_NAME}
If you want to map the output to a specific directory, you must volume map a directory to which a non-root user would have access.
Please note: the examples below do not constitute an explicit recommendation to run as a non-root user. They are merely a guideline on how to do so with Kubernetes, and only where this is an unavoidable requirement.
If you require named users to be deployed on Kubernetes Pods, you must set the following security configuration. The user and group must correspond to the user and group you use when starting the container.
securityContext:
  runAsUser: {numerical value between 1 and 65535}
  runAsGroup: {numerical value between 1 and 65535}
More information on how to configure security settings on Kubernetes pods is available in the Kubernetes documentation.
Some Kubernetes deployments may mandate the use of Pod Security admission controllers. These provide stricter security requirements, and more information on them can be found in the Kubernetes documentation. If your deployment does require this setup, here is an example configuration that would allow you to carry out transcription as a non-root user.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: 'docker/default,runtime/default'
    apparmor.security.beta.kubernetes.io/allowedProfileNames: 'runtime/default'
    seccomp.security.alpha.kubernetes.io/defaultProfileName: 'runtime/default'
    apparmor.security.beta.kubernetes.io/defaultProfileName: 'runtime/default'
spec:
  privileged: false
  # Required to prevent escalations to root.
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
    - ALL
  # Allow core volume types.
  volumes:
    - 'configMap'
    - 'emptyDir'
    - 'projected'
    - 'secret'
    - 'downwardAPI'
    # Assume that persistentVolumes set up by the cluster admin are safe to use.
    - 'persistentVolumeClaim'
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    # Require the container to run without root privileges.
    rule: 'MustRunAsNonRoot'
  seLinux:
    # This policy assumes the nodes are using AppArmor rather than SELinux.
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'MustRunAs'
    ranges:
      # Forbid adding the root group.
      - min: 1
        max: 65535
  fsGroup:
    rule: 'MustRunAs'
    ranges:
      # Forbid adding the root group.
      - min: 1
        max: 65535
  readOnlyRootFilesystem: false