
Batch Virtual Appliance

Overview 3.6.0

The Speechmatics Batch Virtual Appliance exposes a REST Speech API that lets a client application communicate with the appliance over an HTTP or HTTPS connection. The API converts a media file into a text transcript that includes words, speaker, and timing information.

Terms

For the purposes of this guide the following terms are used.

Term | Description
Client | An application connecting to the Batch Virtual Appliance using the Transcription API. The client will provide audio containing speech, and process the transcripts received as a result.
Management API | The REST API that allows administrators to manage the virtual appliance over port 8080 (or 443 for secure access). To access the documentation you can use the following endpoints: http://${APPLIANCE_HOST}:8080/docs/ or https://${APPLIANCE_HOST}/8080/docs/, where ${APPLIANCE_HOST} is the IP address or hostname of your appliance.
Speech API | The REST API that allows users of the appliance to submit ASR jobs over port 8082 (or 443 for secure access). The endpoints http://${APPLIANCE_HOST}:8082/v1.0/ or https://${APPLIANCE_HOST}/v1.0/ can be used. This is the API that is described in this document.
Batch Virtual Appliance | The appliance (VM) that provides ASR transcription capability.

Getting Started

To use the REST Speech API you need access to a Batch Virtual Appliance. See the Speechmatics Virtual Appliance Installation and Admin Guide for instructions on installing and configuring the appliance.

You do not need user credentials (such as an authorization token) to use the Speech API with the Batch Virtual Appliance. Otherwise the Speech API is very similar to the Speechmatics API V2 used to transcribe speech to text on the Speechmatics Speech as a Service (SaaS) platform (available from https://asr.api.speechmatics.com/v2/).

Audio Formats

A variety of input audio formats are supported. You do not need to specify the audio format when you submit a file for transcription; the Batch Virtual Appliance detects the format automatically and handles it using the correct decoder. The following formats have been tested: WAV, MP3, M4A.

Note: the native formats are 16 kHz or 8 kHz (PCM32 LE) WAV; for the best results and performance we recommend that you submit files in that format.

API Versions

The legacy V1 API that the Batch Virtual Appliance currently supports will be discontinued in the near future. We will align the product with the same V2 API used by the Speechmatics SaaS: https://asr.api.speechmatics.com/v2/docs. We recommend that customers familiarise themselves with the configuration object used to specify job configurations, and prepare to integrate with the V2 API. The configuration object in V2 is a superset of the configuration object in V1, so you can start to adopt it now to minimise the integration changes required later.

Endpoint for Speech API

The same endpoints are used for both legacy V1 and V2 APIs: http://${APPLIANCE_HOST}:8082/v1.0/, or https://${APPLIANCE_HOST}/v1.0/.

There are three things to note here:

  • The scheme used is http or https (https is recommended for production deployments); if you want to use https then consult the SSL Configuration section of the Install and Admin Guide.
  • The port used is 8082 (http), or 443 (https). As 443 is a default port, you do not need to specify it.
  • The endpoint /v1.0 is used, even when using the newer V2 features.
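The three points above can be captured in a small shell helper. This is only a sketch: the function name speech_api_base is hypothetical and not part of the product, and it assumes the default ports described above (8082 for http, 443 for https).

```shell
# Build the Speech API base URL from a host and a scheme.
# speech_api_base is a hypothetical helper name, not part of the appliance.
# $1 = appliance IP address or hostname, $2 = scheme (http or https, default http)
speech_api_base() {
    host="$1"
    scheme="${2:-http}"
    case "$scheme" in
        # Plain HTTP uses port 8082.
        http)  printf 'http://%s:8082/v1.0/\n' "$host" ;;
        # HTTPS uses the default port 443, so no port is written.
        https) printf 'https://%s/v1.0/\n' "$host" ;;
        *)     echo "unsupported scheme: $scheme" >&2; return 1 ;;
    esac
}
```

For example, speech_api_base "$APPLIANCE_HOST" https prints the https form of the endpoint, which can then be used as the prefix for curl requests.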

We recommend that you use the config object to access the appliance; the older V1 API will be retired in future. For full details of the new V2 API, see the Speechmatics ASR REST API section.

Transcription Formats

Four output formats for transcription are available: json (the default), json-v2, txt, and srt. The default format (json) is the base output format, similar to the Speechmatics V1 SaaS. The json-v2 format is a richer format that fully supports new features such as channel diarization, custom dictionary and advanced punctuation. The current version of this output format is 2.4. If the output format is set to txt, the transcript is returned as plain text rather than JSON. If the output format is set to srt, the transcript is returned in the SubRip subtitle format instead.
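A client script may want to reject an unknown format name before calling the appliance. The following is a minimal sketch; the function name is_supported_format is hypothetical, and it checks only the format names listed above, not how the format is passed to the API.

```shell
# Return success (0) if the argument is one of the documented output formats.
# is_supported_format is a hypothetical helper name.
is_supported_format() {
    case "$1" in
        json|json-v2|txt|srt) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, is_supported_format "$FORMAT" || echo "unknown format: $FORMAT" >&2 reports a mistyped format name before any request is made.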

Authentication

No authentication is required in order to call the API.

Troubleshooting

If you have problems making a call, ensure that you are using exactly the same URI format as shown in this document. For instance, not including the trailing '/' character on the URIs will cause a 302 redirect response to be sent – if your client does not handle redirects then this may cause problems.
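One way to avoid the redirect is to normalise the URI before making the request. The sketch below uses a hypothetical helper name, ensure_trailing_slash, and assumes the URI has no query string.

```shell
# Append a trailing '/' to a URI unless it already ends with one.
# ensure_trailing_slash is a hypothetical helper name.
# Assumption: the URI carries no query string.
ensure_trailing_slash() {
    case "$1" in
        */) printf '%s\n' "$1" ;;
        *)  printf '%s/\n' "$1" ;;
    esac
}
```

Alternatively, curl follows redirects when given the -L (--location) option. Be aware that when following a 302 response curl switches a POST request to GET by default, so normalising the URI is the safer choice when submitting data.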

Tools

Code samples in this guide expect you to use curl for making HTTP requests to the Management API, and the jq tool to parse and display JSON responses.

The easiest way to access the APIs and online help is via the following URL on the appliance:

http://${APPLIANCE_HOST}:8080/help/

This page allows you to access the documentation from the browser as well as providing links to the APIs.

Windows

On a Windows PC you can use these download and installation links to get these tools:

https://curl.haxx.se/download.html
https://stedolan.github.io/jq/download/

Linux

Use the package manager for your Linux distribution, which will be either:

$ apt install curl jq

or

$ yum install curl jq

Mac OS X

On the Mac, the easiest way to install these utilities is using Homebrew:

$ brew install curl jq
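After installing, you may want to confirm that both tools are on your PATH. The following is a minimal sketch using the standard command -v shell builtin; the function name missing_tools is hypothetical.

```shell
# Print the name of each tool in the argument list that is not found on PATH.
# missing_tools is a hypothetical helper name.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

Running missing_tools curl jq prints nothing when both tools are installed, and otherwise lists the ones that still need installing.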

Language Pack Codes

Language | ISO Code
Global English | en
German | de
Spanish | es
French | fr
Italian | it
Dutch | nl
Portuguese | pt
Japanese | ja
Korean | ko
Danish | da
Polish | pl
Catalan | ca
Hindi | hi
Russian | ru
Swedish | sv
Bulgarian | bg
Slovenian | sl
Czech | cs
Greek | el
Finnish | fi
Hungarian | hu
Croatian | hr
Lithuanian | lt
Latvian | lv
Romanian | ro
Slovakian | sk
Mandarin | cmn
Norwegian | no
Arabic | ar
Turkish | tr
Malay | ms
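A client can check a language code against this list before submitting a job. This is a sketch only: the function name is_supported_language is hypothetical, and the codes are copied from the table above, so they do not reflect which language packs are actually installed on a given appliance.

```shell
# Return success (0) if the argument is one of the language pack codes listed above.
# is_supported_language is a hypothetical helper name.
is_supported_language() {
    case "$1" in
        en|de|es|fr|it|nl|pt|ja|ko|da|pl|ca|hi|ru|sv|bg|sl|cs|el|fi|hu|hr|lt|lv|ro|sk|cmn|no|ar|tr|ms)
            return 0 ;;
        *)
            return 1 ;;
    esac
}
```

For example, is_supported_language cmn succeeds, while is_supported_language zh fails because Mandarin is listed under the code cmn.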