Speechmatics ASR REST API

The Speechmatics Automatic Speech Recognition (ASR) REST API is used to submit ASR jobs, check job status, retrieve results, and retrieve usage information.

Jobs API
Usage API
Object Models

Contact information:

In case of system issues, requests, or unavailability, please contact support@speechmatics.com

Jobs API

Summary:

Create a new job.

Parameters

Name	Located in	Description	Required	Schema
Authorization	header	Customer API token	Yes	string
config	formData	JSON containing a `JobConfig` model indicating the type and parameters for the recognition job.	Yes	string
data_file	formData	The data file to be processed. Alternatively the data file can be fetched from a url specified in `JobConfig`.	No	file

Responses

Code	Description	Schema
201	Created	CreateJobResponse
400	Bad request	ErrorResponse
401	Unauthorized	ErrorResponse
403	Forbidden	ErrorResponse
429	Rate Limited
500	Internal Server Error	ErrorResponse
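
As a minimal sketch of how the job-creation parameters above fit together, the following Python builds the `config` form field from a small `JobConfig`. The helper names, the language code `en`, and the bearer-token format of the `Authorization` header are assumptions made for illustration; this reference only states that the header carries the customer API token.

```python
import json


def build_create_job_fields(language, fetch_url=None):
    """Build the `config` form field for a job-creation request."""
    config = {
        "type": "transcription",
        "transcription_config": {"language": language},
    }
    if fetch_url is not None:
        # DataFetchConfig: the service fetches the audio itself,
        # so no `data_file` part has to be uploaded.
        config["fetch_data"] = {"url": fetch_url}
    # `config` is a string parameter, so the JobConfig is serialized to JSON.
    return {"config": json.dumps(config)}


def build_auth_header(api_token):
    # Assumption: the token is sent as a bearer token.
    return {"Authorization": f"Bearer {api_token}"}
```

When `fetch_url` is omitted, the caller would attach the audio as the `data_file` form part alongside `config`.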

Summary:

List all jobs.

Parameters

Name	Located in	Description	Required	Schema
Authorization	header	Customer API token	Yes	string

Responses

Code	Description	Schema
200	OK	RetrieveJobsResponse
401	Unauthorized	ErrorResponse
429	Rate Limited
500	Internal Server Error	ErrorResponse

Summary:

Get job details for a specific job, including progress and any error reports.

Parameters

Name	Located in	Description	Required	Schema
Authorization	header	Customer API token	Yes	string
jobid	path	ID of the job.	Yes	string

Responses

Code	Description	Schema
200	OK	RetrieveJobResponse
401	Unauthorized	ErrorResponse
404	Not found	ErrorResponse
410	Job Expired	ErrorResponse
429	Rate Limited
500	Internal Server Error	ErrorResponse

Summary:

Delete a job and remove all associated resources.

Parameters

Name	Located in	Description	Required	Schema
Authorization	header	Customer API token	Yes	string
jobid	path	ID of the job to delete.	Yes	string
force	query	Use `true` to force delete a job that may still be running. Default is `false`.	No	string

Responses

Code	Description	Schema
200	The job that was deleted.	DeleteJobResponse
401	Unauthorized	ErrorResponse
404	Not found	ErrorResponse
410	Job Expired	ErrorResponse
423	Resource Locked	ErrorResponse
429	Rate Limited
500	Internal Server Error	ErrorResponse
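
The `force` query parameter can be illustrated with a small URL builder. This is only a sketch: `BASE_URL` is a hypothetical placeholder, and the `/jobs/{jobid}` path layout is an assumption, since this reference does not list the endpoint paths.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://asr.example.com/v2"  # hypothetical, not from this reference


def delete_job_url(jobid, force=False):
    """Return the URL for a delete request, adding `force=true` on demand."""
    # Assumed path layout: <base>/jobs/<jobid>
    url = f"{BASE_URL}/jobs/{quote(jobid, safe='')}"
    if force:
        # `force` is a string query parameter; the default `false` is omitted.
        url += "?" + urlencode({"force": "true"})
    return url
```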

Summary:

Get the data file used as input to a job.

Parameters

Name	Located in	Description	Required	Schema
Authorization	header	Customer API token	Yes	string
jobid	path	ID of the job.	Yes	string

Responses

Code	Description	Schema
200	OK	file
401	Unauthorized	ErrorResponse
404	Not found	ErrorResponse
410	Gone	ErrorResponse
429	Rate Limited
500	Internal Server Error	ErrorResponse

Summary:

Get the transcript for a transcription job.

Parameters

Name	Located in	Description	Required	Schema
Authorization	header	Customer API token	Yes	string
jobid	path	ID of the job.	Yes	string
format	query	The transcription format (by default the `json-v2` format is returned).	No	string

Responses

Code	Description	Schema
200	OK	RetrieveTranscriptResponse
401	Unauthorized	ErrorResponse
404	Not found	ErrorResponse
410	Job Expired	ErrorResponse
429	Rate Limited
500	Internal Server Error	ErrorResponse

Usage API

Summary:

Retrieve usage information.

Parameters

Name	Located in	Description	Required	Schema
Authorization	header	Customer API token	Yes	string
since	query	Start date for usage information in ISO 8601 format	No	string
until	query	End date for usage information in ISO 8601 format	No	string

Responses

Code	Description	Schema
200	OK	UsageResponse
401	Unauthorized	ErrorResponse
429	Rate Limited
500	Internal Server Error	ErrorResponse
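
Both `since` and `until` are optional ISO 8601 dates. A minimal sketch of building the query string from Python `date` objects follows; the function name is hypothetical.

```python
from datetime import date
from urllib.parse import urlencode


def usage_query(since=None, until=None):
    """Build the query string for a usage request from optional dates."""
    params = {}
    if since is not None:
        params["since"] = since.isoformat()  # ISO 8601, e.g. 2021-01-01
    if until is not None:
        params["until"] = until.isoformat()
    return urlencode(params)
```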

Object Models

ErrorResponse

Name	Type	Description	Required
code	integer	The HTTP status code.	Yes
error	string	The error message.	Yes
detail	string	The details of the error.	No
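
One way a client might represent this model is sketched below. The class mirrors the three fields in the table; the parsing helper and the sample payload in the usage note are illustrative assumptions, not captured API output.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorResponse:
    code: int                     # the HTTP status code
    error: str                    # the error message
    detail: Optional[str] = None  # optional details of the error


def parse_error(body: str) -> ErrorResponse:
    """Parse a JSON error body into an ErrorResponse."""
    data = json.loads(body)
    return ErrorResponse(
        code=data["code"],
        error=data["error"],
        detail=data.get("detail"),  # `detail` is not required
    )
```

For example, `parse_error('{"code": 404, "error": "Not found"}')` yields an object whose `detail` is `None`.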

TrackingData

Name	Type	Description	Required
title	string	The title of the job.	No
reference	string	External system reference.	No
tags	[string]		No
details	object	Customer-defined JSON structure.	No

DataFetchConfig

Name	Type	Description	Required
url	string		Yes
auth_headers	[string]	A list of additional headers to be added to the input fetch request when using http or https. This is intended to support authentication or authorization, for example by supplying an OAuth2 bearer token.	No

TranscriptionConfig

Name	Type	Description	Required
language	string	Language pack to process the audio input, normally specified as an ISO language code	Yes
domain	string	Request a specialized language pack optimized for a particular domain, e.g. "finance". Domain is only supported for selected languages.	No
output_locale	string	Language locale to be used when generating the transcription output, normally specified as an ISO language code	No
additional_vocab	[object]	List of custom words or phrases that should be recognized. Alternative pronunciations can be specified to aid recognition.	No
punctuation_overrides	PunctuationConfig	Configuration for punctuation settings. `permitted_marks` defines the punctuation marks which the client is prepared to accept in transcription output, or the special value 'all' (the default). Unsupported marks are ignored. This value is used to guide the transcription process. `sensitivity` ranges between zero and one. Higher values will produce more punctuation. The default is 0.5.	No
diarization	string	Specify whether speaker or channel labels are added to the transcript. The default is `none`. `none`: no speaker or channel labels are added. `speaker`: speaker attribution is performed based on acoustic matching; all input channels are mixed into a single stream for processing. `channel`: multiple input channels are processed individually and collated into a single transcript. `speaker_change`: the output indicates when the speaker in the audio changes. No speaker attribution is performed. This is a faster method than `speaker`. The reported speaker changes may not agree with `speaker`. `channel_and_speaker_change`: both `channel` and `speaker_change` are switched on. A speaker change is indicated if more than one speaker is recorded in one channel.	No
speaker_diarization_config	SpeakerDiarizationConfig	Configuration for speaker diarization. Includes `speaker_sensitivity`: Range between 0 and 1. A higher sensitivity will increase the likelihood of more unique speakers returning. For example, if you see fewer speakers returned than expected, you can try increasing the sensitivity value or if too many speakers are returned try reducing this value. The default is 0.5.	No
speaker_change_sensitivity	float	Used for the `speaker change` feature. Range between 0 and 1. Controls how responsive the system is to potential speaker changes. Higher values indicate higher sensitivity. Defaults to 0.4.	No
channel_diarization_labels	[string]	Transcript labels to use when collating separate input channels.	No
operating_point	string	Specify whether to use a `standard` or `enhanced` model for transcription. By default the model used is `standard`.	No
enable_entities	boolean	Specify whether to enable `entity` types within JSON output, as well as additional `spoken_form` and `written_form` metadata. The default is `false`.	No
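
The allowed values and ranges in this table can be checked on the client before a job is submitted. The following is a sketch under that assumption; the function name and the choice to raise `ValueError` are illustrative, and the service itself may validate differently.

```python
DIARIZATION_MODES = {
    "none", "speaker", "channel", "speaker_change", "channel_and_speaker_change",
}
OPERATING_POINTS = {"standard", "enhanced"}


def make_transcription_config(language, diarization="none",
                              operating_point="standard",
                              speaker_change_sensitivity=None):
    """Build a TranscriptionConfig dict, rejecting values outside the table."""
    if diarization not in DIARIZATION_MODES:
        raise ValueError(f"unknown diarization mode: {diarization!r}")
    if operating_point not in OPERATING_POINTS:
        raise ValueError(f"unknown operating point: {operating_point!r}")
    config = {
        "language": language,  # the only required field
        "diarization": diarization,
        "operating_point": operating_point,
    }
    if speaker_change_sensitivity is not None:
        # Documented range is 0 to 1.
        if not 0 <= speaker_change_sensitivity <= 1:
            raise ValueError("speaker_change_sensitivity must be between 0 and 1")
        config["speaker_change_sensitivity"] = speaker_change_sensitivity
    return config
```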

PunctuationConfig

Additional configuration for the Advanced Punctuation feature.

Name	Type	Description	Required
permitted_marks	string	Defines the punctuation marks which the client is prepared to accept in transcription output, or the special value 'all' (the default). Unsupported marks are ignored. This value is used to guide the transcription process.	No
sensitivity	float	Ranges between zero and one. Higher values will produce more punctuation. The default is 0.5.	No

SpeakerDiarizationConfig

Additional configuration for the Speaker Diarization feature.

Name	Type	Description	Required
speaker_sensitivity	float	Used for `speaker diarization` feature. Range between 0 and 1. A higher sensitivity will increase the likelihood of more unique speakers returning. For example, if you see fewer speakers returned than expected, you can try increasing the sensitivity value, or if too many speakers are returned try reducing this value. The default is 0.5.	No

NotificationConfig

Name	Type	Description	Required
url	string	The url to which a notification message will be sent upon completion of the job. The job `id` and `status` are added as query parameters, and any combination of the job inputs and outputs can be included by listing them in `contents`. If `contents` is empty, the body of the request will be empty. If only one item is listed, it will be sent as the body of the request with `Content-Type` set to an appropriate value such as `application/octet-stream` or `application/json`. If multiple items are listed they will be sent as named file attachments using the multipart content type. If `contents` is not specified, the `transcript` item will be sent as a file attachment named `data_file`, for backwards compatibility. If the job was rejected or failed during processing, that will be indicated by the status, and any output items that are not available as a result will be omitted. The body formatting rules will still be followed as if all items were available. The user-agent header is set to `Speechmatics-API/2.0`, or `Speechmatics API V2` in older API versions.	Yes
contents	[string]	Specifies a list of items to be attached to the notification message. When multiple items are requested, they are included as named file attachments.	No
method	string	The method to be used with http and https urls. The default is post.	No
auth_headers	[string]	A list of additional headers to be added to the notification request when using http or https. This is intended to support authentication or authorization, for example by supplying an OAuth2 bearer token.	No
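
The rules for how `contents` shapes the notification body can be summarized in a small decision function. This is a sketch of the rules as described for `url` above; the function name and the returned labels are hypothetical and chosen only for readability.

```python
def notification_body_kind(contents):
    """Describe the notification body for a given `contents` value.

    `contents` is None when the field is not specified at all.
    """
    if contents is None:
        # Backwards-compatible default: the transcript is attached
        # as a file named `data_file`.
        return "transcript attached as data_file"
    if len(contents) == 0:
        return "empty body"
    if len(contents) == 1:
        # The single item becomes the request body itself.
        return "single item as body"
    # Several items are sent as named attachments in a multipart request.
    return "multipart attachments"
```

A receiver could use the same distinction to decide whether to read the request body directly or to parse it as multipart.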

OutputConfig

Name	Type	Description	Required
srt_overrides	object	Parameters that override the default values for SRT conversion. `max_line_length`: the maximum number of characters per subtitle line, including white space. `max_lines`: the maximum number of lines in a subtitle section.	No

JobConfig

JSON object that contains various groups of job configuration parameters. Based on the value of `type`, a type-specific object such as `transcription_config` must be present to specify all configuration settings or parameters needed to process the job inputs as expected.

If the results of the job are to be forwarded on completion, notification_config can be provided with a list of callbacks to be made; no assumptions should be made about the order in which they will occur.

Customer specific job details or metadata can be supplied in tracking, and this information will be available where possible in the job results and in callbacks.

Name	Type	Required
type	string	Yes
fetch_data	DataFetchConfig	No
transcription_config	TranscriptionConfig	No
notification_config	NotificationConfig	No
tracking	TrackingData	No
output_config	OutputConfig	No
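
To illustrate the rule that `type` determines which type-specific object must be present, here is a minimal client-side check together with an example configuration. The validator, the mapping, and all example values (title, tags, SRT limits) are assumptions for illustration; only the pairing of a transcription job with `transcription_config` comes from this reference.

```python
# Only the pairing documented in this reference is listed here.
TYPE_SPECIFIC_KEY = {"transcription": "transcription_config"}


def validate_job_config(config):
    """Return a list of problems found in a JobConfig dict (empty if none)."""
    problems = []
    job_type = config.get("type")
    if job_type is None:
        problems.append("missing required field: type")
    else:
        key = TYPE_SPECIFIC_KEY.get(job_type)
        if key is not None and key not in config:
            problems.append(f"type {job_type!r} requires {key}")
    return problems


example_config = {
    "type": "transcription",
    "transcription_config": {"language": "en"},
    "tracking": {"title": "Weekly call", "tags": ["demo"]},
    "output_config": {"srt_overrides": {"max_line_length": 37, "max_lines": 2}},
}
```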

CreateJobResponse

Name	Type	Description	Required
id	string	The unique ID assigned to the job. Keep a record of this for later retrieval of your completed job.	Yes

JobDetails

Document describing a job. JobConfig is present in the JobDetails returned for a GET jobs/ request in both the Batch SaaS and the Batch Appliance, but it is not present in the JobDetails returned as an item in RetrieveJobsResponse for the Batch Appliance.

Name	Type	Description	Required
created_at	dateTime	The UTC date time the job was created.	Yes
data_name	string	Name of the data file submitted for job.	Yes
duration	integer	The file duration (in seconds). May be missing for fetch URL jobs.	No
errors	array	Errors encountered when fetching audio or when sending notifications to a customer-specified endpoint.	No
id	string	The unique id assigned to the job.	Yes
status	string	The status of the job. `running`: the job is actively running. `done`: the job completed successfully. `rejected`: the job was accepted at first, but later could not be processed by the transcriber. `deleted`: the user deleted the job. `expired`: the system deleted the job, usually because the job was in the `done` state for a very long time.	Yes
config	JobConfig		No
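
Because `running` is the only non-final status listed, a client that polls for completion can stop as soon as any other status is seen. The loop below is a sketch: `fetch_status` is a hypothetical callable supplied by the caller that returns the current `status` string, and the polling interval and limit are arbitrary.

```python
import time


def wait_for_job(fetch_status, interval=5.0, max_polls=100, sleep=time.sleep):
    """Poll until the job leaves the `running` state and return its status."""
    for _ in range(max_polls):
        status = fetch_status()
        if status != "running":
            # done, rejected, deleted and expired are all final.
            return status
        sleep(interval)
    raise TimeoutError("job still running after the maximum number of polls")
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real delays.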

RetrieveJobsResponse

Name	Type	Description	Required
jobs	[JobDetails]		Yes

RetrieveJobResponse

Name	Type	Description	Required
job	JobDetails		Yes

DeleteJobResponse

Name	Type	Description	Required
job	JobDetails		Yes

JobInfo

Summary information about an ASR job, to support identification and tracking.

Name	Type	Description	Required
created_at	dateTime	The UTC date time the job was created.	Yes
data_name	string	Name of data file submitted for job.	Yes
duration	integer	The data file audio duration (in seconds).	Yes
id	string	The unique id assigned to the job.	Yes
tracking	TrackingData		No

RecognitionMetadata

Summary information about the output from an ASR job, comprising the job type and configuration parameters used when generating the output.

Name	Type	Description	Required
created_at	dateTime	The UTC date time the transcription output was created.	Yes
type	string		Yes
transcription_config	TranscriptionConfig		No
output_config	OutputConfig		No
language_pack_info	LanguagePackInfo		No

LanguagePackInfo

Properties of the language pack

Name	Type	Description	Required
language_description	string	Full descriptive name of the language, e.g. 'Japanese'	No
word_delimiter	string	The character used to separate words	Yes
writing_direction	string	The direction the words in the language are written and read in. One of `left-to-right` or `right-to-left`	No
itn	boolean	Whether or not ITN (inverse text normalization) is available for the language pack	No
adapted	boolean	Whether or not language model adaptation has been applied to the language pack	No

RecognitionDisplay

Name	Type	Description	Required
direction	string		Yes

RecognitionAlternative

List of possible job output item values, ordered by likelihood.

Name	Type	Required
content	string	Yes
confidence	float	Yes
language	string	Yes
display	RecognitionDisplay	No
speaker	string	No
tags	[string]	No

RecognitionResult

An ASR job output item. The primary item types are word and punctuation. Other item types may be present, for example to provide semantic information of different forms.

Name	Type	Description	Required
channel	string		No
start_time	float		Yes
end_time	float		Yes
entity_class	string	If an entity has been recognised, what type of entity it is. Displayed even if `enable_entities` is false	Yes
spoken_form	array	For `entity` results only, the spoken_form is the transcript of the words directly spoken. Only valid if `enable_entities` is `true`	No
written_form	array	For `entity` results only, the written_form is a standardized form of the spoken words. Only valid if `enable_entities` is `true`	No
is_eos	boolean	Whether the punctuation mark is an end of sentence character. Only applies to punctuation marks.	No
type	string	New types of items may appear without being requested; unrecognized item types can be ignored. Current types are `word`, `punctuation`, `speaker_change`, and `entity`	Yes
attaches_to	string	If `type` is `punctuation`, details the attachment direction of the punctuation mark. This information can be used to produce a well-formed text representation by placing the `word_delimiter` from LanguagePackInfo on the correct side of the punctuation mark. One of `previous`, `next`, `both` or `none`	No
alternatives	[RecognitionAlternative]		No
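
The `attaches_to` field exists so that clients can place the `word_delimiter` correctly around punctuation. A minimal sketch of that rendering follows. It takes the first (most likely) alternative of each item, skips items without alternatives, and treats a punctuation mark with no `attaches_to` as attaching to the previous item; those three choices, the function name, and the default delimiter of a single space are assumptions.

```python
def render_text(results, word_delimiter=" "):
    """Join RecognitionResult items into text, honoring `attaches_to`."""
    parts = []
    glue_next = True  # no delimiter before the very first item
    for item in results:
        alternatives = item.get("alternatives") or []
        if not alternatives:
            continue  # nothing to print for this item
        content = alternatives[0]["content"]
        if item.get("type") == "punctuation":
            attaches = item.get("attaches_to", "previous")  # assumed fallback
        else:
            attaches = "none"
        glue_prev = attaches in ("previous", "both")
        if not (glue_next or glue_prev):
            parts.append(word_delimiter)
        parts.append(content)
        # `next` and `both` suppress the delimiter before the following item.
        glue_next = attaches in ("next", "both")
    return "".join(parts)
```

In practice the delimiter would be read from `word_delimiter` in LanguagePackInfo rather than passed as a default.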

RetrieveTranscriptResponse

Name	Type	Description	Required
format	string	Speechmatics JSON transcript format version number.	Yes
job	JobInfo		Yes
metadata	RecognitionMetadata		Yes
results	[RecognitionResult]		Yes

UsageResponse

Name	Type	Description	Required
since	string	Start date for usage in ISO 8601 date format.	Yes
until	string	End date for usage in ISO 8601 date format.	Yes
summary	[UsageSummaryResult]	Total usage over all languages and operating points for each combination of SaaS mode and job type. Can be `null`.	Yes
details	[UsageDetailsResult]	Usage for each combination of SaaS mode, job type, language and operating point. Can be `null`.	Yes

UsageSummaryResult

Name	Type	Description	Required
mode	string	SaaS mode: always 'batch'	Yes
type	string	Job type: one of `transcription` or `alignment`	Yes
count	integer	Total number of jobs with this `type` and `mode`	Yes
duration_hrs	float	Total audio file duration in hours for jobs with this `type` and `mode`	Yes

UsageDetailsResult

Name	Type	Description	Required
mode	string	SaaS mode: always 'batch'	Yes
type	string	Job type: one of `transcription` or `alignment`	Yes
language	string	Job language code	Yes
operating_point	string	Operating point for transcription. One of `standard` or `enhanced`.	Yes
count	integer	Total number of jobs with this `type`, `mode`, `language` and `operating_point`	Yes
duration_hrs	float	Total audio file duration in hours for jobs with this `type`, `mode`, `language` and `operating_point`	Yes
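
Since `details` reports one row per combination of mode, type, language, and operating point, per-language totals have to be summed on the client. A sketch, assuming the UsageResponse has already been decoded from JSON into a Python dict (the function name is hypothetical):

```python
from collections import defaultdict


def hours_by_language(usage):
    """Sum `duration_hrs` per language over the `details` rows."""
    totals = defaultdict(float)
    # `details` can be null, which decodes to None.
    for row in usage.get("details") or []:
        totals[row["language"]] += row["duration_hrs"]
    return dict(totals)
```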