Entities are commonly recognisable classes of information that appear in languages, for example numbers and dates. Formatting these entities is commonly referred to as Inverse Text Normalisation (ITN). Speechmatics outputs entities in a predictable, consistent written form, which reduces the post-processing work required and makes the transcript more readable.
The language pack uses these formatted entities by default in the transcription for all outputs (JSON, text and srt). Additional metadata about these entities can be requested via the API, including the spoken words without formatting and the entity class that was used to format them.
Entities are supported in the following languages:
Speechmatics now includes an enable_entities parameter. This can be requested via the API. By default this is false.
Changing enable_entities to true will enable a richer set of metadata in the JSON output only. Customers can choose between the default written form, spoken form, or a mixture, for their own workflows.
The changes are as follows:
- type: entity is available in the JSON output, in addition to word and punctuation. For example: "1.99" would have a type of entity and a corresponding entity_class of decimal
- entity will contain the formatted text in the content section, like other words and punctuation
- content can include spaces, non-breaking spaces, and symbols (e.g. $/£/%)
- entity_class has been introduced. This provides more detail about how the entity has been formatted. A full list of entity classes is provided below.
- spoken_form - Each individual word within the entity, written out in words as it was spoken. Each individual word has its own start time, end time, and confidence score. For example: "one", "million", "dollars"
- written_form - The same output as within entity content, with a type of word instead. If there are spaces in the content it will be split into individual words. For example: "$1", "million"

The following example configuration file requests entities:
{
  "type": "transcription",
  "transcription_config": {
    "language": "en",
    "enable_entities": true
  }
}
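
As a minimal sketch, the same configuration can be built programmatically and serialised to JSON. The helper name build_transcription_config is hypothetical and is not part of any Speechmatics library; only the keys shown in the example above are taken from this page, and how the resulting JSON is submitted to the API is left out.

```python
import json


def build_transcription_config(language: str = "en", enable_entities: bool = True) -> dict:
    """Return a job config shaped like the example configuration above.

    Hypothetical helper for illustration only.
    """
    return {
        "type": "transcription",
        "transcription_config": {
            "language": language,
            "enable_entities": enable_entities,
        },
    }


# Serialise the config so it can be saved to a file or sent with a request.
config_json = json.dumps(build_transcription_config(), indent=2)
print(config_json)
```

Leaving enable_entities as an explicit argument keeps the default behaviour (false) one call away, for example build_transcription_config(enable_entities=False).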
The following entity_classes can be returned. Entity classes indicate how the numerals are formatted. In some cases, the choice of class can be contextual and the class may not be what was expected (for example "2001" may be a "cardinal" instead of "date"). The number of entity_classes may grow or shrink in the future.
N.B. The existing behaviour for English, where numbers from zero to 10 are output as words (except where they are output as a decimal, money or percentage), is unchanged.
Entity Class | Formatting Behaviour | Spoken Word Form Example | Written Form Example |
---|---|---|---|
alphanum | A series of three or more alphanumerics, where an alphanumeric is a digit less than 10, a character or symbol | triple seven five four | 77754 |
cardinal | Any number greater than ten is converted to numerals. Numbers ten or below remain as words. Includes negative numbers | nineteen | 19 |
credit card | A long series of spoken digits, each less than 10, is converted to numerals. Support for common credit cards | one one one one two two two two three three three three four four four four | 1111222233334444 |
date | Day, month and year, or a year on its own. Any words spoken in the date are maintained (including "the" and "of") | fifteenth of January twenty twenty two | 15th of January 2022 |
decimal | A series of numbers divided by a separator | eighteen point one two | 18.12 |
fraction | Small fractions are kept as words ("half"), complex fractions are converted to numbers separated by "/" | three sixteenths | 3/16 |
money | Currency words are converted to symbols before or after the number (depending on the language) | twenty dollars | $20 |
ordinal | Ordinals greater than 10 are output as numbers | forty second | 42nd |
percentage | Numbers with a per cent have the per cent converted to a % symbol | duecento percento | 200% |
span | A range expressed as "x to y" where x and y correspond to another entity class | one hundred to two hundred million pounds | 100 to £200 million |
time | Times are converted to numbers | eleven forty a m | 11:40 a.m. |
word | Entities that do not match a specific class | hundreds | hundreds |
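
Because the set of entity classes may change over time, code that consumes them is probably safer if it does not hard-code the list. The sketch below groups the entities found in a results list by whatever entity_class value each one carries. The function name entities_by_class and the sample data are made up for illustration; the sample only mirrors the field names used on this page (type, entity_class, alternatives, content).

```python
from collections import defaultdict


def entities_by_class(results):
    """Group the formatted content of each entity by its entity_class.

    Items that are not entities (words, punctuation) are skipped.
    Classes are taken from the data, so unfamiliar ones are kept rather than dropped.
    """
    grouped = defaultdict(list)
    for item in results:
        if item.get("type") != "entity":
            continue
        entity_class = item.get("entity_class", "unknown")
        grouped[entity_class].append(item["alternatives"][0]["content"])
    return dict(grouped)


# Illustrative data only, using written forms from the table above.
sample_results = [
    {"type": "word", "alternatives": [{"content": "about"}]},
    {"type": "entity", "entity_class": "money", "alternatives": [{"content": "$20"}]},
    {"type": "entity", "entity_class": "cardinal", "alternatives": [{"content": "19"}]},
    {"type": "entity", "entity_class": "money", "alternatives": [{"content": "$1 million"}]},
]

print(entities_by_class(sample_results))
```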
Each language has a specific style applied to it for thousands, decimals and where the symbol is positioned for money or percentages.
Here is an example of a transcript requested with enable_entities set to true:
- An entity that is "17th of January 2022", including spaces
- An entity_class of date
- The spoken_form is split into the following individual words: "seventeenth", "of", "January", "twenty", "twenty", "two". Each word has its own start and end time
- The written_form is split into the following individual words: "17th", "of", "January", "2022". Each word has its own start and end time

Note:
- The speaker parameter is added per word within the entity, spoken and written form
- The channel parameter is only added on the results parent within the entity and not included in spoken and written form

"results": [
{
"alternatives": [
{
"confidence": 0.99,
"content": "17th of January 2022",
"language": "en",
"speaker": "UU"
}
],
"end_time": 3.14,
"entity_class": "date",
"spoken_form": [
{
"alternatives": [
{
"confidence": 1.0,
"content": "seventeenth",
"language": "en",
"speaker": "UU"
}
],
"end_time": 1.41,
"start_time": 0.72,
"type": "word"
},
{
"alternatives": [
{
"confidence": 1.0,
"content": "of",
"language": "en",
"speaker": "UU"
}
],
"end_time": 1.53,
"start_time": 1.41,
"type": "word"
},
{
"alternatives": [
{
"confidence": 1.0,
"content": "January",
"language": "en",
"speaker": "UU"
}
],
"end_time": 2.04,
"start_time": 1.53,
"type": "word"
},
{
"alternatives": [
{
"confidence": 1.0,
"content": "twenty",
"language": "en",
"speaker": "UU"
}
],
"end_time": 2.46,
"start_time": 2.04,
"type": "word"
},
{
"alternatives": [
{
"confidence": 1.0,
"content": "twenty",
"language": "en",
"speaker": "UU"
}
],
"end_time": 2.79,
"start_time": 2.46,
"type": "word"
},
{
"alternatives": [
{
"confidence": 0.97,
"content": "two",
"language": "en",
"speaker": "UU"
}
],
"end_time": 3.14,
"start_time": 2.79,
"type": "word"
}
],
"start_time": 0.72,
"type": "entity",
"written_form": [
{
"alternatives": [
{
"confidence": 0.99,
"content": "17th",
"language": "en",
"speaker": "UU"
}
],
"end_time": 1.33,
"start_time": 0.72,
"type": "word"
},
{
"alternatives": [
{
"confidence": 0.99,
"content": "of",
"language": "en",
"speaker": "UU"
}
],
"end_time": 1.93,
"start_time": 1.33,
"type": "word"
},
{
"alternatives": [
{
"confidence": 0.99,
"content": "January",
"language": "en",
"speaker": "UU"
}
],
"end_time": 2.54,
"start_time": 1.93,
"type": "word"
},
{
"alternatives": [
{
"confidence": 0.99,
"content": "2022",
"language": "en",
"speaker": "UU"
}
],
"end_time": 3.14,
"start_time": 2.54,
"type": "word"
}
]
}
]
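
To illustrate how a consumer might choose between the written form, the spoken form, or a mixture, here is a minimal sketch that renders a results list as text. It assumes each item has the shape shown in the example above and always takes the first entry in alternatives. The function render, the helper word, and the leading "On" in the sample are hypothetical and exist only for this illustration.

```python
def render(results, spoken_classes=frozenset()):
    """Join results into text.

    Entities whose entity_class is in spoken_classes are rendered from
    spoken_form; everything else uses the formatted content.
    """
    text = ""
    for item in results:
        if item.get("type") == "entity" and item.get("entity_class") in spoken_classes:
            piece = " ".join(w["alternatives"][0]["content"] for w in item["spoken_form"])
        else:
            piece = item["alternatives"][0]["content"]
        # Attach punctuation directly to the preceding token.
        if item.get("type") == "punctuation" or not text:
            text += piece
        else:
            text += " " + piece
    return text


def word(content):
    """Build a pared-down word item (timings and confidence omitted)."""
    return {"type": "word", "alternatives": [{"content": content}]}


sample = [
    word("On"),
    {
        "type": "entity",
        "entity_class": "date",
        "alternatives": [{"content": "17th of January 2022"}],
        "spoken_form": [word(w) for w in ["seventeenth", "of", "January", "twenty", "twenty", "two"]],
        "written_form": [word(w) for w in ["17th", "of", "January", "2022"]],
    },
]

print(render(sample))                           # On 17th of January 2022
print(render(sample, spoken_classes={"date"}))  # On seventeenth of January twenty twenty two
```

Passing a set of class names gives the mixture case: for instance, dates could be kept as spoken while money and percentages stay in written form.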
If enable_entities is set to false, the output is as below:
"results": [
{
"alternatives": [
{
"confidence": 0.99,
"content": "17th",
"language": "en",
"speaker": "UU"
}
],
"end_time": 1.33,
"start_time": 0.72,
"type": "word"
},
{
"alternatives": [
{
"confidence": 0.99,
"content": "of",
"language": "en",
"speaker": "UU"
}
],
"end_time": 1.93,
"start_time": 1.33,
"type": "word"
},
{
"alternatives": [
{
"confidence": 0.99,
"content": "January",
"language": "en",
"speaker": "UU"
}
],
"end_time": 2.54,
"start_time": 1.93,
"type": "word"
},
{
"alternatives": [
{
"confidence": 0.99,
"content": "2022",
"language": "en",
"speaker": "UU"
}
],
"end_time": 3.14,
"start_time": 2.54,
"type": "word"
}
]
}
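
The two outputs above suggest that an entity's written_form carries the same word-level items that are returned when entities are disabled. Under that assumption, a consumer that wants one flat list of words could expand each entity as sketched below. The function flatten_entities is hypothetical, and copying channel from the parent onto each child is a design choice made here (because channel only appears on the parent), not documented API behaviour.

```python
def flatten_entities(results, form="written_form"):
    """Replace each entity with the word items from its written_form or spoken_form."""
    flat = []
    for item in results:
        if item.get("type") != "entity":
            flat.append(item)
            continue
        for child in item.get(form, []):
            child = dict(child)  # shallow copy so the input is not mutated
            if "channel" in item:
                # channel is only present on the parent, so carry it down (assumption)
                child["channel"] = item["channel"]
            flat.append(child)
    return flat


# Pared-down copy of the date entity from the example above (confidence, language and speaker omitted).
entity = {
    "type": "entity",
    "entity_class": "date",
    "start_time": 0.72,
    "end_time": 3.14,
    "alternatives": [{"content": "17th of January 2022"}],
    "written_form": [
        {"type": "word", "start_time": 0.72, "end_time": 1.33, "alternatives": [{"content": "17th"}]},
        {"type": "word", "start_time": 1.33, "end_time": 1.93, "alternatives": [{"content": "of"}]},
        {"type": "word", "start_time": 1.93, "end_time": 2.54, "alternatives": [{"content": "January"}]},
        {"type": "word", "start_time": 2.54, "end_time": 3.14, "alternatives": [{"content": "2022"}]},
    ],
}

words = flatten_entities([entity])
print([w["alternatives"][0]["content"] for w in words])  # ['17th', 'of', 'January', '2022']
```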