This is the documentation for a previous version of our product.

Batch Virtual Appliance

Deprecation Note

Speechmatics now supports the V2 Speech API in the Batch Virtual Appliance, and this is now the recommended way to interact with the appliance. Except where explicitly noted in this document, the API is almost identical to the V2 API in the Speechmatics Cloud Offering, and it has a wider range of features than the V1 API. The V2 API fully supports the configuration object, which the V1 API supports only partially. We recommend that customers migrate to the V2 API as soon as possible. The V1 API is now deprecated and will be removed by February 2022. No new features will be added to the V1 API.

This release supports both the V2 and V1 APIs. Customers can use either version to submit files and retrieve transcripts, but a job must be retrieved with the same API version that was used to submit it. For example, if you submit a file using the V1 API, you must retrieve it using the V1 API.

New functionality that is exclusive to the V2 API is explicitly highlighted in this documentation.

Overview

The Speechmatics Batch Virtual Appliance exposes a REST Speech API to enable communication between a client application and the appliance over an HTTP or HTTPS connection. This provides the ability to convert a media file into a text transcript, providing words, speaker, and timing information.

The Batch Virtual Appliance currently supports two API formats: V1 and V2. Both are described in this document.

Terms

For the purposes of this guide the following terms are used.

| Term | Description |
| --- | --- |
| Client | An application connecting to the Batch Virtual Appliance using the Transcription API. The client provides audio containing speech and processes the transcripts received as a result. |
| Management API | The REST API that allows administrators to manage the virtual appliance over port 8080 (or 443 for secure access). To access the documentation, use http://${APPLIANCE_HOST}:8080/docs/ or https://${APPLIANCE_HOST}/docs/, where ${APPLIANCE_HOST} is the IP address or hostname of your appliance. |
| Speech V2 API | The REST API that allows users of the appliance to submit ASR jobs over port 8082 (or 443 for secure access). The endpoints are https://${APPLIANCE_HOST}/v2/jobs/ and http://${APPLIANCE_HOST}:8082/v2/jobs/. |
| Speech V1 API | The REST API that allows users of the appliance to submit ASR jobs over port 8082 (or 443 for secure access). The endpoints are https://${APPLIANCE_HOST}/v1/user/1/jobs/ and http://${APPLIANCE_HOST}:8082/v1/user/1/jobs/. V1-specific features are still described in this document, but the V1 API is frozen: no new features will be added, and the API itself will soon be removed. |
| Batch Virtual Appliance | The appliance (VM) that provides ASR transcription capability. |
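
The ports and paths listed above can be combined into a small helper. The following is a minimal sketch, not part of the product: the function name appliance_url and its arguments are hypothetical, and it only assembles the URLs given in the table.

```shell
# Hypothetical helper: print the endpoint URL for one of the appliance APIs.
# Usage: appliance_url <host> <api> [scheme]
#   api    : management | v2 | v1
#   scheme : http (default) | https
appliance_url() {
    host=$1
    api=$2
    scheme=${3:-http}
    case "$api" in
        management) port=8080; url_path=/docs/ ;;
        v2)         port=8082; url_path=/v2/jobs/ ;;
        v1)         port=8082; url_path=/v1/user/1/jobs/ ;;
        *) echo "unknown api: $api" >&2; return 1 ;;
    esac
    if [ "$scheme" = "https" ]; then
        # Secure access uses port 443, the HTTPS default, so no port is written.
        printf 'https://%s%s\n' "$host" "$url_path"
    else
        printf 'http://%s:%s%s\n' "$host" "$port" "$url_path"
    fi
}
```

For example, appliance_url "$APPLIANCE_HOST" v2 https prints the secure V2 jobs endpoint for your appliance.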

Getting Started

To use the REST Speech API you need access to a Batch Virtual Appliance. See the Speechmatics Virtual Appliance Installation and Admin Guide for instructions on how to install, configure, and license the appliance.

You do not need user credentials (such as an authorization token) to use the Speech API with the Batch Virtual Appliance.

Audio Formats

A variety of input audio formats are supported. There is no need to specify the audio format when a file is submitted for transcription: the Batch Virtual Appliance automatically detects the format and uses the correct decoder. The following audio formats are currently supported:

  • aac
  • amr
  • flac
  • m4a
  • mp3
  • mp4
  • mpeg
  • ogg
  • wav

Note: the native formats are 16 kHz or 8 kHz (PCM32 LE) WAV. For the best results and performance, we recommend that you submit files in one of these formats.

Accessing the API

V2 API

The V2 API is the primary way for customers to submit media and retrieve transcripts on the Batch Virtual Appliance.

  • HTTP and HTTPS are supported. We recommend using HTTPS wherever possible. The Installation Guide describes how to set up the SSL configuration.
  • HTTP connections use port 8082 only
  • All V2 features are supported using this API version
  • The enhanced model can only be requested using this API version
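
As an illustration of a V2 submission, the sketch below posts a media file to the jobs endpoint with curl. It makes assumptions that are not stated in this section: the multipart field names data_file and config, and the configuration fields type, transcription_config, and language, are taken from the Speechmatics V2 API as used in the Cloud Offering, and the function names are hypothetical. Check them against the API documentation served by your appliance before relying on them.

```shell
# Build a minimal V2 configuration object for a language code.
# ASSUMPTION: the field names follow the Speechmatics V2 API.
v2_config() {
    printf '{"type":"transcription","transcription_config":{"language":"%s"}}' "$1"
}

# Submit a media file over HTTP on port 8082 and print the JSON response.
# Usage: submit_v2_job <media-file> <language-code>
# ASSUMPTION: the multipart fields are named data_file and config.
submit_v2_job() {
    curl -s -X POST "http://${APPLIANCE_HOST}:8082/v2/jobs/" \
        -F "data_file=@$1" \
        -F "config=$(v2_config "$2")"
}
```

With APPLIANCE_HOST set, a call such as submit_v2_job example.wav en | jq . would display the response. No authorization header is included, because the appliance does not require user credentials.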

V1 API

The V1 API has been frozen. It can still be used with no loss of functionality, but no new features will be added, and it will be discontinued from February 2022.

  • HTTP and HTTPS are supported. We recommend using HTTPS wherever possible. The Installation Guide describes how to set up the SSL configuration.
  • HTTP connections use port 8082 only
  • The V1 API supports a subset of V2 features via the configuration object. V2 features that are not supported in the V1 API include:
    • URL fetching
    • Notifications via the configuration object
    • Tracking metadata via the configuration object

File Size Limits

The maximum supported file size is 4 GB, and the maximum supported duration is 2 hours. Anything larger or longer must be split into smaller chunks before it can be transcribed successfully.
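
A client can check the size limit locally before uploading. The sketch below is one way to do that, with hypothetical function names. It assumes that 4GB means 4 × 1024³ bytes, which this document does not specify, and it does not check the 2-hour duration limit, which would need a media inspection tool.

```shell
# ASSUMPTION: "4GB" is interpreted as 4 * 1024^3 bytes.
MAX_BYTES=4294967296

# Succeed (exit status 0) if a byte count is within the limit.
size_within_limit() {
    [ "$1" -le "$MAX_BYTES" ]
}

# Succeed if the named file is small enough to submit.
file_within_limit() {
    # wc -c prints the byte count; tr strips any padding some systems add.
    size_within_limit "$(wc -c < "$1" | tr -d ' ')"
}
```

For example, file_within_limit example.wav || echo "example.wav must be split before submission" warns about an oversized file.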

Transcription Formats

In the V2 API, three output formats are available: json-v2 (the default), txt, and srt. The current version of the json-v2 output is 2.6. If the output format is set to txt, the transcript is returned as plain text rather than JSON. If the output format is set to srt, the transcript is returned in the SubRip subtitle format.

In the V1 API, four output formats for transcription are available: json (the default), json-v2, txt, and srt. If you want JSON output it is recommended to use json-v2.
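
To show how a client might select one of the V2 output formats, the sketch below builds a transcript URL and rejects any format other than the three listed above. The /transcript path and the format query parameter are assumptions based on the Speechmatics V2 API and are not confirmed by this section. The function name and the default host are hypothetical.

```shell
# Placeholder host used only if APPLIANCE_HOST is not already set.
: "${APPLIANCE_HOST:=appliance.example.com}"

# Print the URL for retrieving a V2 transcript in a given format.
# Usage: transcript_url <job-id> [format]
# ASSUMPTION: transcripts are served at /v2/jobs/<id>/transcript?format=<format>.
transcript_url() {
    job_id=$1
    format=${2:-json-v2}
    case "$format" in
        json-v2|txt|srt) ;;
        *) echo "unsupported format: $format" >&2; return 1 ;;
    esac
    printf 'http://%s:8082/v2/jobs/%s/transcript?format=%s\n' \
        "$APPLIANCE_HOST" "$job_id" "$format"
}
```

A call such as curl -s "$(transcript_url "$JOB_ID" srt)" would then request the subtitles for a finished job, where JOB_ID holds the identifier returned at submission.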

Troubleshooting

If you have problems making a call, ensure that you are using exactly the same URI format as shown in this document. For instance, omitting the trailing '/' character from a URI causes a 302 redirect response to be sent. If your client does not follow redirects, the request may fail.
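
One way to avoid the redirect is to normalize URIs before making a request. The following sketch uses a hypothetical helper that appends the trailing '/' when it is missing. It is deliberately naive and assumes the URI has no query string.

```shell
# Print the URI with exactly one trailing '/' appended if it was missing.
# ASSUMPTION: the URI carries no query string or fragment.
with_trailing_slash() {
    case "$1" in
        */) printf '%s\n' "$1" ;;
        *)  printf '%s/\n' "$1" ;;
    esac
}
```

Alternatively, curl follows redirects when given the -L option. Note that by default curl changes a POST into a GET when it follows a 302 response, so correcting the URI is the safer choice for job submission.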

Tools

Code samples in this guide expect you to use curl for making HTTP requests to the Management API, and the jq tool to parse and display JSON responses.

The easiest way to access the APIs and online help is via the following URL on the appliance:

http://${APPLIANCE_HOST}:8080/help/

This page allows you to access the documentation from the browser as well as providing links to the APIs.

Windows

On a Windows PC, download and install the tools from the following links:

https://curl.haxx.se/download.html
https://stedolan.github.io/jq/download/

Linux

Use the relevant package manager for your flavor of Linux, which will either be:

$ apt install curl jq

or

$ yum install curl jq

Mac OS X

On the Mac, the easiest way to install these utilities is using Homebrew:

$ brew install curl jq

Language Pack Codes

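| Language | ISO Code |
| --- | --- |
| Global English | en |
| German | de |
| Spanish | es |
| French | fr |
| Italian | it |
| Dutch | nl |
| Portuguese | pt |
| Japanese | ja |
| Korean | ko |
| Danish | da |
| Polish | pl |
| Catalan | ca |
| Hindi | hi |
| Russian | ru |
| Swedish | sv |
| Bulgarian | bg |
| Slovenian | sl |
| Czech | cs |
| Greek | el |
| Finnish | fi |
| Hungarian | hu |
| Croatian | hr |
| Lithuanian | lt |
| Latvian | lv |
| Romanian | ro |
| Slovakian | sk |
| Mandarin | cmn |
| Norwegian | no |
| Arabic | ar |
| Turkish | tr |
| Malay | ms |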
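
A client can validate a language code against the list above before submitting a job. The sketch below uses a hypothetical function name and only checks membership in the table. Whether a given language is actually available on your appliance may also depend on how it has been installed and licensed, which this check does not cover.

```shell
# Succeed (exit status 0) if the code appears in the Language Pack Codes table.
is_supported_language() {
    case "$1" in
        en|de|es|fr|it|nl|pt|ja|ko|da|pl|ca|hi|ru|sv|bg|sl|cs|el|fi|hu|hr|lt|lv|ro|sk|cmn|no|ar|tr|ms)
            return 0 ;;
        *)
            return 1 ;;
    esac
}
```

For example, is_supported_language "$LANG_CODE" || echo "unknown language code: $LANG_CODE" >&2 reports a code that is not in the table, where LANG_CODE is a variable of your own.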