
Speechmatics ASR REST API

The Speechmatics Automatic Speech Recognition (ASR) REST API is used to submit ASR jobs, check job status, retrieve results, and retrieve usage information.

Contact information:

In case of system issues, requests, or unavailability, please contact support@speechmatics.com.

Jobs API

Version: 2.8.0

/jobs

POST

Summary:

Create a new job.

Parameters

| Name | Located in | Description | Required | Schema |
| --- | --- | --- | --- | --- |
| Authorization | header | Customer API token | Yes | string |
| config | formData | JSON containing a JobConfig model indicating the type and parameters for the recognition job. | Yes | string |
| data_file | formData | The data file to be processed. Alternatively, the data file can be fetched from a URL specified in the JobConfig. | No | file |

Responses

| Code | Description | Schema |
| --- | --- | --- |
| 201 | OK | CreateJobResponse |
| 400 | Bad request | ErrorResponse |
| 401 | Unauthorized | ErrorResponse |
| 403 | Forbidden | ErrorResponse |
| 429 | Rate Limited | |
| 500 | Internal Server Error | ErrorResponse |
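
As a sketch of how a job might be created with Python's requests library, assuming the Batch SaaS base URL https://asr.api.speechmatics.com/v2 and a Bearer token scheme for the Authorization header (adjust both for your deployment):

```python
import json
import requests

API_URL = "https://asr.api.speechmatics.com/v2"   # assumed Batch SaaS base URL
API_TOKEN = "YOUR_API_TOKEN"                      # customer API token

# The "config" form field carries a JobConfig model serialized as JSON.
job_config = {
    "type": "transcription",
    "transcription_config": {"language": "en"},
}

with open("example.wav", "rb") as audio:
    response = requests.post(
        f"{API_URL}/jobs",
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        data={"config": json.dumps(job_config)},
        files={"data_file": audio},
    )

response.raise_for_status()
print(response.json()["id"])   # CreateJobResponse contains the new job id
```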

GET

Summary:

List all jobs.

Parameters

| Name | Located in | Description | Required | Schema |
| --- | --- | --- | --- | --- |
| Authorization | header | Customer API token | Yes | string |

Responses

| Code | Description | Schema |
| --- | --- | --- |
| 200 | OK | RetrieveJobsResponse |
| 401 | Unauthorized | ErrorResponse |
| 429 | Rate Limited | |
| 500 | Internal Server Error | ErrorResponse |
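
A sketch of listing jobs under the same base URL and Bearer token assumptions as the job creation example above:

```python
import requests

API_URL = "https://asr.api.speechmatics.com/v2"   # assumed base URL
headers = {"Authorization": "Bearer YOUR_API_TOKEN"}

# RetrieveJobsResponse contains a "jobs" array of JobDetails objects.
response = requests.get(f"{API_URL}/jobs", headers=headers)
response.raise_for_status()
for job in response.json()["jobs"]:
    print(job["id"], job["status"], job["data_name"])
```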

/jobs/{jobid}

GET

Summary:

Get job details for a specific job, including progress and any error reports.

Parameters

| Name | Located in | Description | Required | Schema |
| --- | --- | --- | --- | --- |
| Authorization | header | Customer API token | Yes | string |
| jobid | path | ID of the job. | Yes | string |

Responses

| Code | Description | Schema |
| --- | --- | --- |
| 200 | OK | RetrieveJobResponse |
| 401 | Unauthorized | ErrorResponse |
| 404 | Not found | ErrorResponse |
| 410 | Job Expired | ErrorResponse |
| 429 | Rate Limited | |
| 500 | Internal Server Error | ErrorResponse |
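
A sketch of polling a job until it leaves the running state, under the same base URL and token assumptions:

```python
import time
import requests

API_URL = "https://asr.api.speechmatics.com/v2"   # assumed base URL
headers = {"Authorization": "Bearer YOUR_API_TOKEN"}
job_id = "YOUR_JOB_ID"

# Poll the job until its status is no longer "running".
while True:
    response = requests.get(f"{API_URL}/jobs/{job_id}", headers=headers)
    response.raise_for_status()
    job = response.json()["job"]   # RetrieveJobResponse wraps a JobDetails object
    if job["status"] != "running":
        break
    time.sleep(10)

print(job["status"])
```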

DELETE

Summary:

Delete a job and remove all associated resources.

Parameters

| Name | Located in | Description | Required | Schema |
| --- | --- | --- | --- | --- |
| Authorization | header | Customer API token | Yes | string |
| jobid | path | ID of the job to delete. | Yes | string |
| force | query | Use true to force delete a job that may still be running. Default is false. | No | string |

Responses

| Code | Description | Schema |
| --- | --- | --- |
| 200 | The job that was deleted. | DeleteJobResponse |
| 401 | Unauthorized | ErrorResponse |
| 404 | Not found | ErrorResponse |
| 410 | Job Expired | ErrorResponse |
| 423 | Resource Locked | ErrorResponse |
| 429 | Rate Limited | |
| 500 | Internal Server Error | ErrorResponse |
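
A sketch of force-deleting a job, under the same assumptions; omit the force parameter for jobs that have already completed:

```python
import requests

API_URL = "https://asr.api.speechmatics.com/v2"   # assumed base URL
headers = {"Authorization": "Bearer YOUR_API_TOKEN"}
job_id = "YOUR_JOB_ID"

# force=true deletes a job that may still be running.
response = requests.delete(
    f"{API_URL}/jobs/{job_id}",
    headers=headers,
    params={"force": "true"},
)
response.raise_for_status()
print(response.json()["job"]["status"])   # DeleteJobResponse wraps the deleted JobDetails
```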

/jobs/{jobid}/data

GET

Summary:

Get the data file used as input to a job.

Parameters

| Name | Located in | Description | Required | Schema |
| --- | --- | --- | --- | --- |
| Authorization | header | Customer API token | Yes | string |
| jobid | path | ID of the job. | Yes | string |

Responses

| Code | Description | Schema |
| --- | --- | --- |
| 200 | OK | file |
| 401 | Unauthorized | ErrorResponse |
| 404 | Not found | ErrorResponse |
| 410 | Gone | ErrorResponse |
| 429 | Rate Limited | |
| 500 | Internal Server Error | ErrorResponse |
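
A sketch of streaming the submitted data file back to disk, under the same base URL and token assumptions:

```python
import requests

API_URL = "https://asr.api.speechmatics.com/v2"   # assumed base URL
headers = {"Authorization": "Bearer YOUR_API_TOKEN"}
job_id = "YOUR_JOB_ID"

# Stream the original data file back to a local file.
with requests.get(f"{API_URL}/jobs/{job_id}/data", headers=headers, stream=True) as response:
    response.raise_for_status()
    with open("input_data_file", "wb") as out:
        for chunk in response.iter_content(chunk_size=8192):
            out.write(chunk)
```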

/jobs/{jobid}/transcript

GET

Summary:

Get the transcript for a transcription job.

Parameters

| Name | Located in | Description | Required | Schema |
| --- | --- | --- | --- | --- |
| Authorization | header | Customer API token | Yes | string |
| jobid | path | ID of the job. | Yes | string |
| format | query | The transcription format (by default the json-v2 format is returned). | No | string |

Responses

| Code | Description | Schema |
| --- | --- | --- |
| 200 | OK | RetrieveTranscriptResponse |
| 401 | Unauthorized | ErrorResponse |
| 404 | Not found | ErrorResponse |
| 410 | Job Expired | ErrorResponse |
| 429 | Rate Limited | |
| 500 | Internal Server Error | ErrorResponse |
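
A sketch of fetching the transcript, first in the default json-v2 format and then as plain text (assuming "txt" is an accepted value of the format parameter for your deployment), under the same base URL and token assumptions:

```python
import requests

API_URL = "https://asr.api.speechmatics.com/v2"   # assumed base URL
headers = {"Authorization": "Bearer YOUR_API_TOKEN"}
job_id = "YOUR_JOB_ID"

# Default format is json-v2, a RetrieveTranscriptResponse body.
response = requests.get(f"{API_URL}/jobs/{job_id}/transcript", headers=headers)
response.raise_for_status()
transcript = response.json()
print(transcript["metadata"]["created_at"])

# Plain-text output (assumes "txt" is a supported format value).
text = requests.get(
    f"{API_URL}/jobs/{job_id}/transcript",
    headers=headers,
    params={"format": "txt"},
)
text.raise_for_status()
print(text.text)
```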

Usage API

/usage

GET

Summary:

Get usage information for an account.

Parameters

| Name | Located in | Description | Required | Schema |
| --- | --- | --- | --- | --- |
| Authorization | header | Customer API token | Yes | string |
| since | query | Start date for usage information in ISO 8601 format | No | string |
| until | query | End date for usage information in ISO 8601 format | No | string |

Responses

| Code | Description | Schema |
| --- | --- | --- |
| 200 | OK | UsageResponse |
| 401 | Unauthorized | ErrorResponse |
| 429 | Rate Limited | |
| 500 | Internal Server Error | ErrorResponse |
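
A sketch of retrieving a month of usage, under the same base URL and token assumptions; since and until are optional ISO 8601 dates, and the dates shown are placeholders:

```python
import requests

API_URL = "https://asr.api.speechmatics.com/v2"   # assumed base URL
headers = {"Authorization": "Bearer YOUR_API_TOKEN"}

response = requests.get(
    f"{API_URL}/usage",
    headers=headers,
    params={"since": "2023-01-01", "until": "2023-01-31"},
)
response.raise_for_status()
# "summary" can be null, so fall back to an empty list.
for row in response.json()["summary"] or []:
    print(row["type"], row["count"], row["duration_hrs"])
```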

Object Models

ErrorResponse

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| code | integer | The HTTP status code. | Yes |
| error | string | The error message. | Yes |
| detail | string | The details of the error. | No |

TrackingData

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| title | string | The title of the job. | No |
| reference | string | External system reference. | No |
| tags | [string] | | No |
| details | object | Customer-defined JSON structure. | No |

DataFetchConfig

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| url | string | | Yes |
| auth_headers | [string] | A list of additional headers to be added to the input fetch request when using http or https. This is intended to support authentication or authorization, for example by supplying an OAuth2 bearer token. | No |

TranscriptionConfig

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| language | string | Language pack to process the audio input, normally specified as an ISO language code. | Yes |
| domain | string | Request a specialized language pack optimized for a particular domain, e.g. "finance". Domain is only supported for selected languages. | No |
| output_locale | string | Language locale to be used when generating the transcription output, normally specified as an ISO language code. | No |
| additional_vocab | [object] | List of custom words or phrases that should be recognized. Alternative pronunciations can be specified to aid recognition. | No |
| punctuation_overrides | PunctuationConfig | Configuration for punctuation settings. permitted_marks defines the punctuation marks which the client is prepared to accept in transcription output, or the special value 'all' (the default); unsupported marks are ignored. This value is used to guide the transcription process. sensitivity ranges between zero and one; higher values produce more punctuation. The default is 0.5. | No |
| diarization | string | Specify whether speaker or channel labels are added to the transcript. The default is none. none: no speaker or channel labels are added. speaker: speaker attribution is performed based on acoustic matching; all input channels are mixed into a single stream for processing. channel: multiple input channels are processed individually and collated into a single transcript. speaker_change: the output indicates when the speaker in the audio changes; no speaker attribution is performed. This is a faster method than speaker, and the reported speaker changes may not agree with speaker. channel_and_speaker_change: both channel and speaker_change are switched on; a speaker change is indicated if more than one speaker is recorded in one channel. | No |
| speaker_diarization_config | SpeakerDiarizationConfig | Configuration for speaker diarization. Includes speaker_sensitivity: range between 0 and 1. A higher sensitivity increases the likelihood of more unique speakers being returned. For example, if you see fewer speakers returned than expected, try increasing the sensitivity value; if too many speakers are returned, try reducing it. The default is 0.5. | No |
| speaker_change_sensitivity | float | Used for the speaker change feature. Range between 0 and 1. Controls how responsive the system is to potential speaker changes; a high value indicates high sensitivity. Defaults to 0.4. | No |
| channel_diarization_labels | [string] | Transcript labels to use when collating separate input channels. | No |
| operating_point | string | Specify whether to use a standard or enhanced model for transcription. By default the model used is standard. | No |
| enable_entities | boolean | Specify whether to enable entity types within JSON output, as well as additional spoken_form and written_form metadata. By default false. | No |
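
As an illustration, a transcription_config using several of the fields above might look like the following Python dictionary (serialized to JSON when submitted); the field names come from the table, but all values shown are examples rather than defaults.

```python
# Illustrative transcription_config; values are examples only.
transcription_config = {
    "language": "en",                                    # ISO language code (required)
    "output_locale": "en-GB",                            # example locale
    "operating_point": "enhanced",                       # "standard" is the default
    "diarization": "speaker",
    "speaker_diarization_config": {"speaker_sensitivity": 0.6},
    "punctuation_overrides": {"sensitivity": 0.4},
    "enable_entities": True,
}
```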

PunctuationConfig

Additional configuration for the Advanced Punctuation feature.

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| permitted_marks | string | Defines the punctuation marks which the client is prepared to accept in transcription output, or the special value 'all' (the default). Unsupported marks are ignored. This value is used to guide the transcription process. | No |
| sensitivity | float | Ranges between zero and one. Higher values will produce more punctuation. The default is 0.5. | No |

SpeakerDiarizationConfig

Additional configuration for the Speaker Diarization feature.

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| speaker_sensitivity | float | Used for the speaker diarization feature. Range between 0 and 1. A higher sensitivity increases the likelihood of more unique speakers being returned. For example, if you see fewer speakers returned than expected, try increasing the sensitivity value; if too many speakers are returned, try reducing it. The default is 0.5. | No |

NotificationConfig

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| url | string | The URL to which a notification message will be sent upon completion of the job. The job id and status are added as query parameters, and any combination of the job inputs and outputs can be included by listing them in contents. If contents is empty, the body of the request will be empty. If only one item is listed, it will be sent as the body of the request, with Content-Type set to an appropriate value such as application/octet-stream or application/json. If multiple items are listed, they will be sent as named file attachments using the multipart content type. If contents is not specified, the transcript item will be sent as a file attachment named data_file, for backwards compatibility. If the job was rejected or failed during processing, that will be indicated by the status, and any output items that are not available as a result will be omitted; the body formatting rules will still be followed as if all items were available. The user-agent header is set to Speechmatics-API/2.0, or Speechmatics API V2 in older API versions. | Yes |
| contents | [string] | Specifies a list of items to be attached to the notification message. When multiple items are requested, they are included as named file attachments. | No |
| method | string | The method to be used with http and https URLs. The default is post. | No |
| auth_headers | [string] | A list of additional headers to be added to the notification request when using http or https. This is intended to support authentication or authorization, for example by supplying an OAuth2 bearer token. | No |

OutputConfig

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| srt_overrides | object | Parameters that override default values for SRT conversion. max_line_length: sets the maximum count of characters per subtitle line, including white space. max_lines: sets the maximum count of lines in a subtitle section. | No |
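
For example, an output_config that tightens the SRT line limits could look like the sketch below; the keys come from the table above and the values are illustrative.

```python
# Illustrative output_config; srt_overrides keys are taken from the table above.
output_config = {
    "srt_overrides": {
        "max_line_length": 37,   # max characters per subtitle line, including white space
        "max_lines": 2,          # max lines per subtitle section
    }
}
```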

JobConfig

JSON object that contains various groups of job configuration parameters. Based on the value of type, a type-specific object such as transcription_config is required to be present to specify all configuration settings or parameters needed to process the job inputs as expected.

If the results of the job are to be forwarded on completion, notification_config can be provided with a list of callbacks to be made; no assumptions should be made about the order in which they will occur.

Customer specific job details or metadata can be supplied in tracking, and this information will be available where possible in the job results and in callbacks.

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| type | string | | Yes |
| fetch_data | DataFetchConfig | | No |
| transcription_config | TranscriptionConfig | | No |
| notification_config | NotificationConfig | | No |
| tracking | TrackingData | | No |
| output_config | OutputConfig | | No |
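
Putting the pieces together, a complete JobConfig might look like the sketch below. The fetch URL, callback URL, and tracking values are placeholders; notification_config is written as a list of callbacks, following the description above.

```python
import json

# Illustrative JobConfig combining several of the optional sub-objects.
job_config = {
    "type": "transcription",
    "fetch_data": {"url": "https://example.com/recordings/call.mp3"},
    "transcription_config": {
        "language": "en",
        "diarization": "channel",
        "channel_diarization_labels": ["Agent", "Caller"],
    },
    "notification_config": [
        {"url": "https://example.com/callback", "contents": ["transcript"]}
    ],
    "tracking": {
        "title": "Support call",
        "reference": "CRM-1234",
        "tags": ["support"],
        "details": {"queue": "tier-1"},
    },
}

# This JSON string is what goes in the "config" form field of POST /jobs.
print(json.dumps(job_config, indent=2))
```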

CreateJobResponse

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| id | string | The unique ID assigned to the job. Keep a record of this for later retrieval of your completed job. | Yes |

JobDetails

Document describing a job. JobConfig will be present in the JobDetails returned for a GET jobs/ request in the Batch SaaS and in the Batch Appliance, but it will not be present in the JobDetails items returned in RetrieveJobsResponse for the Batch Appliance.

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| created_at | dateTime | The UTC date time the job was created. | Yes |
| data_name | string | Name of the data file submitted for the job. | Yes |
| duration | integer | The file duration (in seconds). May be missing for fetch URL jobs. | No |
| errors | array | Errors encountered only when either fetching audio or sending notifications to a customer-specified endpoint. | No |
| id | string | The unique id assigned to the job. | Yes |
| status | string | The status of the job. running: the job is actively running. done: the job completed successfully. rejected: the job was accepted at first, but later could not be processed by the transcriber. deleted: the user deleted the job. expired: the system deleted the job, usually because the job was in the done state for a very long time. | Yes |
| config | JobConfig | | No |

RetrieveJobsResponse

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| jobs | [JobDetails] | | Yes |

RetrieveJobResponse

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| job | JobDetails | | Yes |

DeleteJobResponse

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| job | JobDetails | | Yes |

JobInfo

Summary information about an ASR job, to support identification and tracking.

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| created_at | dateTime | The UTC date time the job was created. | Yes |
| data_name | string | Name of the data file submitted for the job. | Yes |
| duration | integer | The data file audio duration (in seconds). | Yes |
| id | string | The unique id assigned to the job. | Yes |
| tracking | TrackingData | | No |

RecognitionMetadata

Summary information about the output from an ASR job, comprising the job type and configuration parameters used when generating the output.

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| created_at | dateTime | The UTC date time the transcription output was created. | Yes |
| type | string | | Yes |
| transcription_config | TranscriptionConfig | | No |
| output_config | OutputConfig | | No |
| language_pack_info | LanguagePackInfo | | No |

LanguagePackInfo

Properties of the language pack.

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| language_description | string | Full descriptive name of the language, e.g. 'Japanese'. | No |
| word_delimiter | string | The character used to separate words. | Yes |
| writing_direction | string | The direction the words in the language are written and read in. One of left-to-right or right-to-left. | No |
| itn | boolean | Whether or not ITN (inverse text normalization) is available for the language pack. | No |
| adapted | boolean | Whether or not language model adaptation has been applied to the language pack. | No |

RecognitionDisplay

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| direction | string | | Yes |

RecognitionAlternative

List of possible job output item values, ordered by likelihood.

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| content | string | | Yes |
| confidence | float | | Yes |
| language | string | | Yes |
| display | RecognitionDisplay | | No |
| speaker | string | | No |
| tags | [string] | | No |

RecognitionResult

An ASR job output item. The primary item types are word and punctuation. Other item types may be present, for example to provide semantic information of different forms.

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| channel | string | | No |
| start_time | float | | Yes |
| end_time | float | | Yes |
| entity_class | string | If an entity has been recognised, what type of entity it is. Displayed even if enable_entities is false. | Yes |
| spoken_form | array | For entity results only, the spoken_form is the transcript of the words directly spoken. Only valid if enable_entities is true. | No |
| written_form | array | For entity results only, the written_form is a standardized form of the spoken words. Only valid if enable_entities is true. | No |
| is_eos | boolean | Whether the punctuation mark is an end-of-sentence character. Only applies to punctuation marks. | No |
| type | string | New types of items may appear without being requested; unrecognized item types can be ignored. Current types are word, punctuation, speaker_change, and entity. | Yes |
| attaches_to | string | If type is punctuation, details the attachment direction of the punctuation mark. This information can be used to produce a well-formed text representation by placing the word_delimiter from LanguagePackInfo on the correct side of the punctuation mark. One of previous, next, both or none. | No |
| alternatives | [RecognitionAlternative] | | No |

RetrieveTranscriptResponse

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| format | string | Speechmatics JSON transcript format version number. | Yes |
| job | JobInfo | | Yes |
| metadata | RecognitionMetadata | | Yes |
| results | [RecognitionResult] | | Yes |
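
A heavily trimmed, illustrative example of the json-v2 response shape implied by the model tables above; every value is invented for the example.

```python
# Illustrative RetrieveTranscriptResponse (all values invented).
transcript = {
    "format": "2.9",
    "job": {
        "created_at": "2023-05-04T12:00:00Z",
        "data_name": "example.wav",
        "duration": 15,
        "id": "a1b2c3d4e5",
    },
    "metadata": {
        "created_at": "2023-05-04T12:03:00Z",
        "type": "transcription",
        "transcription_config": {"language": "en"},
    },
    "results": [
        {
            "type": "word",
            "start_time": 0.35,
            "end_time": 0.81,
            "alternatives": [{"content": "Hello", "confidence": 0.99, "language": "en"}],
        },
        {
            "type": "punctuation",
            "start_time": 0.81,
            "end_time": 0.81,
            "is_eos": True,
            "attaches_to": "previous",
            "alternatives": [{"content": ".", "confidence": 1.0, "language": "en"}],
        },
    ],
}
```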

UsageResponse

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| since | string | Start date for usage in ISO 8601 date format. | Yes |
| until | string | End date for usage in ISO 8601 date format. | Yes |
| summary | [UsageSummaryResult] | Total usage over all languages and operating points for each combination of SaaS mode and job type. Can be null. | Yes |
| details | [UsageDetailsResult] | Usage for each combination of SaaS mode, job type, language and operating point. Can be null. | Yes |

UsageSummaryResult

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| mode | string | SaaS mode: always 'batch'. | Yes |
| type | string | Job type: one of transcription or alignment. | Yes |
| count | integer | Total number of jobs with this type and mode. | Yes |
| duration_hrs | float | Total audio file duration in hours for jobs with this type and mode. | Yes |

UsageDetailsResult

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| mode | string | SaaS mode: always 'batch'. | Yes |
| type | string | Job type: one of transcription or alignment. | Yes |
| language | string | Job language code. | Yes |
| operating_point | string | Operating point for transcription. One of standard or enhanced. | Yes |
| count | integer | Total number of jobs with this type, mode, language and operating_point. | Yes |
| duration_hrs | float | Total audio file duration in hours for jobs with this type, mode, language and operating_point. | Yes |