Alignment lets you submit an audio file and a text file and get back speech timing information, so you can determine exactly when each word was spoken in the supplied audio file.
If you do not have access to use the alignment feature, and you would like to, please speak to your Account Manager.
The following documentation will show you how to request alignment, and how to retrieve an aligned file.
The following audio formats are supported:
Alignment follows Speechmatics' Batch SaaS policy. All files are stored for seven days, after which they are deleted. You can delete files earlier by explicitly requesting it, as documented below.
The input text file must be a UTF-8 encoded plain text file. If it contains bytes that are not valid UTF-8, the job is rejected.
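Because a badly encoded text file causes the job to be rejected, it can be worth checking the encoding locally before submitting. The following is a minimal sketch of such a pre-flight check; the function name `check_utf8` is hypothetical and not part of any Speechmatics tooling.

```python
from typing import Tuple


def check_utf8(raw: bytes) -> Tuple[bool, str]:
    """Return (ok, message) describing whether the bytes decode as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that could not be decoded.
        return False, f"invalid UTF-8 at byte offset {err.start}"
    return True, "ok"
```

A typical use would be to read the transcript with `open(path, "rb").read()` and pass the bytes to `check_utf8` before creating the job.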
During the alignment process, Speechmatics tries to extract words from the text.
Any string of characters separated by whitespace (space, tab, newline, etc.) is treated as a word.
Any markup in the text file that uses SGML-like tags with angle brackets is treated as a comment.
For example, text within comment delimiters (<!-- and -->) or angle brackets (< and >) is ignored.
Therefore, given this text:
Hello <markup> world <!-- comment > comment --> how are you?
The following words will be aligned with the provided audio file:
Hello world how are you?
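The extraction rules above can be approximated locally, for example to preview which words will be aligned. This is only a sketch of the rules as described here, not Speechmatics' actual implementation; the function name `extract_words` is hypothetical.

```python
import re


def extract_words(text: str) -> list:
    """Approximate the word extraction described above."""
    # Remove <!-- ... --> comments first, so a '>' inside a comment
    # does not end the comment early.
    without_comments = re.sub(r"<!--.*?-->", " ", text, flags=re.DOTALL)
    # Remove any remaining <...> tags.
    without_tags = re.sub(r"<[^>]*>", " ", without_comments)
    # Every whitespace-separated string that is left counts as a word.
    return without_tags.split()
```

Applied to the example text above, this yields the five words Hello, world, how, are and you?.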
The timing information (referred to as alignment files) is available in two formats:
word_start_and_end: This is the default format:
<time=0.12>Hello<time=0.23> <markup> <time=0.34>world<time=0.45> <!-- comment > comment -->
<time=0.56>how<time=0.67> <time=0.78>are<time=0.89> <time=0.90>you?<time=1.00>
one_per_line: This must be specified when you request the transcript via HTTP request:
[00:00:00.1] Hello <markup> world <!-- comment > comment --> how are you?
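To work with the default format programmatically, the timing tags can be pulled out with a regular expression. The sketch below is inferred from the single example shown here, so treat it as an assumption rather than a format specification: it assumes each word sits directly between two `<time=...>` tags and that the values are seconds from the start of the audio. The function name `parse_word_timings` is hypothetical.

```python
import re

# A start tag, the word itself (no whitespace, no '<'), then an end tag.
WORD_TIMING = re.compile(r"<time=([0-9.]+)>([^<\s]+)<time=([0-9.]+)>")


def parse_word_timings(aligned: str) -> list:
    """Return (word, start, end) tuples from word_start_and_end output."""
    return [
        (word, float(start), float(end))
        for start, word, end in WORD_TIMING.findall(aligned)
    ]
```

Markup and comments from the input text carry no timing tags, so they are skipped by the pattern.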
Alignment uses the Speechmatics Batch SaaS platform which supports the following endpoints for production use:
Region | Environment | Endpoints |
---|---|---|
EU | EU1 | eu.asr.api.speechmatics.com, eu1.asr.api.speechmatics.com, asr.api.speechmatics.com |
EU | EU2 | eu2.asr.api.speechmatics.com |
US | US1 | us.asr.api.speechmatics.com, us1.asr.api.speechmatics.com |
US | US2 | us2.asr.api.speechmatics.com |
Authorization Tokens are replicated between all environments in the same region. Therefore, you can use any environment in a region that you are entitled to access.
All production environments are active and highly available. Multiple environments can be used to balance requests or provide a failover in the event of disruption to one environment.
Note that jobs are created in the environment corresponding to the endpoint used. You must use the same endpoint for all requests relating to a specific job.
If you attempt to use an endpoint for a region you are not contracted to use, that request will be unsuccessful. If you want to use a different region, please contact sales@speechmatics.com.
If you wish to receive an aligned transcript via notification, you must allow access to (allowlist) the relevant IPs below to ensure successful delivery:
Environment | IP Addresses |
---|---|
EU1 | 40.74.41.91, 52.236.157.154, 40.74.37.0, 20.73.209.153, 20.73.142.44 |
EU2 | 20.105.89.153, 20.105.89.173, 20.105.89.184, 20.105.89.98, 20.105.88.228 |
US1 | 52.149.21.32, 52.149.21.10, 52.137.102.83, 40.64.107.92, 40.64.107.99 |
US2 | 52.146.58.224, 52.146.58.159, 52.146.59.242, 52.146.59.213, 52.146.58.64 |
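If your notification receiver applies the allowlist in application code rather than at a firewall, the check could look like the sketch below. The addresses are copied from the table above; the names `EGRESS_IPS` and `is_speechmatics_egress` are hypothetical.

```python
import ipaddress

# Egress IPs per environment, copied from the table above.
EGRESS_IPS = {
    "EU1": ["40.74.41.91", "52.236.157.154", "40.74.37.0", "20.73.209.153", "20.73.142.44"],
    "EU2": ["20.105.89.153", "20.105.89.173", "20.105.89.184", "20.105.89.98", "20.105.88.228"],
    "US1": ["52.149.21.32", "52.149.21.10", "52.137.102.83", "40.64.107.92", "40.64.107.99"],
    "US2": ["52.146.58.224", "52.146.58.159", "52.146.59.242", "52.146.59.213", "52.146.58.64"],
}


def is_speechmatics_egress(remote_addr: str, environment: str) -> bool:
    """Check whether remote_addr is a listed egress IP for the environment."""
    allowed = {ipaddress.ip_address(ip) for ip in EGRESS_IPS[environment]}
    return ipaddress.ip_address(remote_addr) in allowed
```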
Creating an alignment job follows a similar process to creating a transcription job.
An HTTP POST request must be made to the /v2/jobs endpoint with the following form fields:
config: The job config for alignment
data_file: The media file containing the speech. Can be passed in via config if the file is stored in an online location
text_file: The text file containing the transcript. Can be passed in via config if the file is stored in an online location
If you do not provide all of the above, the job will be rejected.
The job config must state that the job type is alignment and the language of the audio and text.
{
  "type": "alignment",
  "alignment_config": {
    "language": "en"
  }
}
The corresponding curl request looks like so:
curl https://asr.api.speechmatics.com/v2/jobs/ \
-X POST \
-H "Authorization: Bearer <TOKEN>" \
-F data_file=@/tmp/file.mp3 \
-F text_file=@/tmp/speech.txt \
-F config='{"type": "alignment", "alignment_config": { "language": "en" }}'
A successful request returns an HTTP 201 response containing a unique alphanumeric Job ID, returned as id in the response body.
HTTP/2 201
date: Mon, 11 Oct 2021 16:45:44 GMT
content-type: application/vnd.speechmatics.v2+json
content-length: 20
strict-transport-security: max-age=15724800; includeSubDomains
request-id: 802b2603d62d23b5bb113836ec0a8d21
{"id":"r0btay8pxr"}
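The same job submission can be sketched in Python using only the standard library. The example below builds the multipart POST request with the three form fields described above without sending it, and reads the Job ID from a response body. The helper names (`_form_part`, `build_alignment_request`, `job_id_from_response`) are hypothetical, and the multipart encoding is a simplified assumption about what the endpoint accepts, mirroring what curl's -F option sends.

```python
import json
import uuid
import urllib.request


def _form_part(boundary, name, payload, filename=None):
    """Encode one multipart/form-data part as bytes."""
    disposition = f'form-data; name="{name}"'
    if filename is not None:
        disposition += f'; filename="{filename}"'
    header = f"--{boundary}\r\nContent-Disposition: {disposition}\r\n\r\n"
    return header.encode("utf-8") + payload + b"\r\n"


def build_alignment_request(base_url, token, audio, text, language="en"):
    """Build (but do not send) the job creation request.

    audio and text are (filename, bytes) pairs.
    """
    config = {"type": "alignment", "alignment_config": {"language": language}}
    boundary = uuid.uuid4().hex
    body = b"".join([
        _form_part(boundary, "config", json.dumps(config).encode("utf-8")),
        _form_part(boundary, "data_file", audio[1], filename=audio[0]),
        _form_part(boundary, "text_file", text[1], filename=text[0]),
        f"--{boundary}--\r\n".encode("utf-8"),  # closing boundary
    ])
    return urllib.request.Request(
        f"{base_url}/v2/jobs/",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def job_id_from_response(body):
    """Extract the Job ID from a response body such as {"id":"r0btay8pxr"}."""
    return json.loads(body)["id"]
```

To submit the job, pass the request to `urllib.request.urlopen` and feed the response body to `job_id_from_response`.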
You can retrieve the status of an alignment job by making an HTTP GET request that includes the Job ID in the request endpoint.
An example is below:
curl https://asr.api.speechmatics.com/v2/jobs/<JOB_ID> \
-X GET \
-H "Authorization: Bearer <TOKEN>"
An example response is below:
{
  "job": {
    "config": {
      "alignment_config": {
        "language": "en"
      },
      "type": "alignment"
    },
    "created_at": "2021-09-24T10:51:13.641Z",
    "data_name": "63f662ce-4b82-4471-b0e0-380abb83f666.m4a",
    "duration": 281,
    "id": "g0sjrmiqng",
    "status": "done"
  }
}
The status will be one of the following:
Done: The file is ready to be retrieved
Running: The file is still being processed and not yet ready
Rejected: The job has failed
If you have submitted multiple jobs, you can retrieve a list of the 100 most recent jobs submitted in the past 7 days by making a GET request without a Job ID. If a job has been deleted it will not be included in the list.
An example is below:
curl https://asr.api.speechmatics.com/v2/jobs/ \
-X GET \
-H "Authorization: Bearer <TOKEN>"
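Since a job moves from running to done or rejected, a client usually polls the job status until it settles. The following is a minimal polling sketch. It assumes the status values appear in lowercase in the JSON response, as "done" does in the example response; the function names, the polling interval and the poll limit are all hypothetical choices.

```python
import json
import time
import urllib.request


def fetch_job(base_url, token, job_id):
    """GET the job and return the "job" object from the response."""
    request = urllib.request.Request(
        f"{base_url}/v2/jobs/{job_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["job"]


def wait_for_job(get_job, interval=5.0, max_polls=120, sleep=time.sleep):
    """Call get_job() until the job is done; raise if rejected or too slow."""
    for _ in range(max_polls):
        job = get_job()
        status = job["status"].lower()
        if status == "done":
            return job
        if status == "rejected":
            raise RuntimeError(f"job {job.get('id')} was rejected")
        sleep(interval)  # still running: wait before asking again
    raise TimeoutError("job did not finish within the polling budget")
```

For example, `wait_for_job(lambda: fetch_job(base_url, token, job_id))` polls the same endpoint that was used to create the job. Passing the fetch function in as an argument keeps the loop easy to test without network access.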
An aligned file can be retrieved from the /v2/jobs/<JOB_ID>/alignment endpoint.
By default, the 'Word Start and End' alignment format is returned.
This can be overridden with the tags query parameter in the HTTP GET request, as illustrated below:
curl "https://asr.api.speechmatics.com/v2/jobs/<JOB_ID>/alignment?tags=one_per_line" \
-X GET \
-H "Authorization: Bearer <TOKEN>"
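When building the retrieval URL in code, encoding the query string explicitly avoids quoting mistakes. The sketch below uses a hypothetical helper, `alignment_url`. Passing tags=word_start_and_end explicitly is an assumption based on the format names; only tags=one_per_line is shown in this documentation.

```python
from urllib.parse import quote, urlencode

# Format names as listed in this documentation.
ALIGNMENT_FORMATS = ("word_start_and_end", "one_per_line")


def alignment_url(base_url, job_id, tags=None):
    """Build the URL for retrieving the aligned file of a job."""
    url = f"{base_url}/v2/jobs/{quote(job_id, safe='')}/alignment"
    if tags is None:
        return url  # the server's default format is returned
    if tags not in ALIGNMENT_FORMATS:
        raise ValueError(f"unknown alignment format: {tags}")
    return f"{url}?{urlencode({'tags': tags})}"
```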
Use the following endpoints to retrieve the input files used for an alignment job:
/v2/jobs/<JOB_ID>/text: to get the text file submitted
/v2/jobs/<JOB_ID>/data: to get the audio file submitted
If you want to delete a submitted job, you can do so by sending an HTTP DELETE request specifying the Job ID. All files, including aligned files, will be deleted from the Speechmatics SaaS.
curl https://asr.api.speechmatics.com/v2/jobs/<JOB_ID> \
-X DELETE \
-H "Authorization: Bearer <TOKEN>"
The response will show a status of deleted, as shown below:
{
  "job": {
    "config": {
      "alignment_config": {
        "language": "en"
      },
      "type": "alignment"
    },
    "created_at": "2021-09-24T10:51:13.641Z",
    "data_name": "63f662ce-4b82-4471-b0e0-380abb83f666.m4a",
    "duration": 281,
    "id": "g0sjrmiqng",
    "status": "deleted"
  }
}
Speechmatics supports retrieving files from an online location. If you store your digital media and transcripts in cloud storage (for example AWS S3 or Azure Blob Storage) you can also submit a job by providing the URL of the audio file or transcript.
To retrieve files from an online location, you must specify the location for the media and/or transcript in the configuration of your request. You can locally upload a media file and retrieve a text file from an online location (or vice versa):
{
  "type": "alignment",
  "fetch_data": {"url": "$MY_AUDIO_URL"},
  "fetch_text": {"url": "$MY_TRANSCRIPT"},
  "alignment_config": {"language": "en"}
}
Do not combine fetch_data with a locally uploaded data_file, or fetch_text with a locally uploaded text_file, in the same request, as this will cause the job to fail.
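One way to avoid mixing an upload and a URL for the same input is to build the config through a small helper that insists on exactly one source per input. The sketch below shows that idea; `build_alignment_config` and its parameters are hypothetical names, and the check reflects one reading of the rule above.

```python
def build_alignment_config(language, audio_url=None, text_url=None,
                           upload_audio=False, upload_text=False):
    """Build the job config, requiring exactly one source for each input."""
    # Each input must come from a URL or an upload, never both or neither.
    if (audio_url is not None) == upload_audio:
        raise ValueError("provide the audio as an upload or a URL, not both or neither")
    if (text_url is not None) == upload_text:
        raise ValueError("provide the text as an upload or a URL, not both or neither")

    config = {"type": "alignment", "alignment_config": {"language": language}}
    if audio_url is not None:
        config["fetch_data"] = {"url": audio_url}
    if text_url is not None:
        config["fetch_text"] = {"url": text_url}
    return config
```

With `audio_url` set and `upload_text=True`, the resulting config contains only fetch_data, and the transcript is then sent as the text_file form field.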
Alignment jobs can also be used with callback notifications by including the notification_config section in the job config when submitting the job. Please ensure you have allowlisted Speechmatics' egress IPs to allow notifications.
{
  "type": "alignment",
  "alignment_config": {
    "language": "en"
  },
  "notification_config": [
    {
      "contents": [
        "alignment"
      ],
      "url": "https://lorem.ipsum/"
    },
    {
      "contents": [
        "alignment.one_per_line", "text"
      ],
      "method": "post",
      "url": "https://dolor.sit.amet/"
    }
  ]
}
The following outputs are supported:
alignment, alignment.one_per_line, alignment.word_start_and_end: the aligned transcript
text: the non-aligned transcript submitted as part of the job request
data: the media file submitted as part of the job request
jobinfo: the summary information about the job, to support identification and tracking
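A notification_config section can be assembled in code and checked against the supported outputs before the job is submitted. The sketch below is illustrative only: the helper names `notification` and `with_notifications` are hypothetical, and the validation simply mirrors the list of outputs above.

```python
# Output names as listed in this documentation.
SUPPORTED_CONTENTS = {
    "alignment",
    "alignment.one_per_line",
    "alignment.word_start_and_end",
    "text",
    "data",
    "jobinfo",
}


def notification(url, contents, method=None):
    """Build one notification_config entry, rejecting unknown outputs."""
    unknown = [item for item in contents if item not in SUPPORTED_CONTENTS]
    if unknown:
        raise ValueError(f"unsupported notification contents: {unknown}")
    entry = {"url": url, "contents": list(contents)}
    if method is not None:
        entry["method"] = method
    return entry


def with_notifications(config, notifications):
    """Return a copy of the job config with notification_config added."""
    return {**config, "notification_config": list(notifications)}
```

For example, calling `with_notifications` with the basic alignment config and two `notification(...)` entries reproduces the notification config shown above.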