This is the documentation for a previous version of our product.

Formatting Common Entities

Overview

Entities are commonly recognisable classes of information that appear in languages, for example numbers and dates. Formatting these entities is commonly referred to as Inverse Text Normalisation (ITN). Speechmatics outputs entities in a predictable, consistent written form, which makes the transcript more readable and reduces the post-processing work required.

The language pack uses these formatted entities by default in the transcription for all outputs (JSON, text and srt). Additional metadata about these entities can be requested via the API, including the spoken words without formatting and the entity class that was used to format them.

Supported Languages

Entities are supported in the following languages:

Cantonese
Chinese Mandarin (Simplified and Traditional)
English
French
German
Hindi
Italian
Japanese
Portuguese
Russian
Spanish

Using the enable_entities parameter

Speechmatics now includes an enable_entities parameter, which can be set via the API. It defaults to false.

Setting enable_entities to true adds a richer set of metadata, in the JSON output only. Customers can choose between the default written form, the spoken form, or a mixture of both for their own workflows.

The changes are as follows:

A new type, entity, appears in the JSON output in addition to word and punctuation. For example, "1.99" would have a type of entity and a corresponding entity_class of decimal.
The entity contains the formatted text in the content section, like other words and punctuation.
- The content can include spaces, non-breaking spaces, and symbols (e.g. $/£/%)
A new output element, entity_class, has been introduced. This provides more detail about how the entity has been formatted. A full list of entity classes is provided below.
The start and end times of the entity span all the words that make up that entity.
The entity also contains two forms in which the content is output:
- spoken_form - Each individual word within the entity, written out in words as it was spoken. Each individual word has its own start time, end time, and confidence score. For example: "one", "million", "dollars"
- written_form - The same output as the entity content, but as items with a type of word. If the content contains spaces, it is split into individual words. For example: "$1", "million"
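The relationship between an entity and its two forms can be sketched in Python. This is a minimal illustration, not part of any Speechmatics client library: the helper names item_text and flatten are hypothetical, and sample_results is hand-written to match the description above rather than taken from real API output.

```python
from typing import Any, Dict, List


def item_text(item: Dict[str, Any]) -> str:
    """Return the content of the first alternative of a result item."""
    return item["alternatives"][0]["content"]


def flatten(results: List[Dict[str, Any]], form: str = "written_form") -> List[str]:
    """Expand entity items into their spoken_form or written_form words.

    Items of any other type (word, punctuation) are passed through unchanged.
    """
    if form not in ("written_form", "spoken_form"):
        raise ValueError("form must be 'written_form' or 'spoken_form'")
    tokens: List[str] = []
    for item in results:
        if item.get("type") == "entity" and form in item:
            tokens.extend(item_text(word) for word in item[form])
        else:
            tokens.append(item_text(item))
    return tokens


# Hand-written sample shaped like the description above (not real API output).
sample_results = [
    {"type": "word", "alternatives": [{"content": "about"}]},
    {
        "type": "entity",
        "entity_class": "money",
        "alternatives": [{"content": "$1 million"}],
        "spoken_form": [
            {"type": "word", "alternatives": [{"content": "one"}]},
            {"type": "word", "alternatives": [{"content": "million"}]},
            {"type": "word", "alternatives": [{"content": "dollars"}]},
        ],
        "written_form": [
            {"type": "word", "alternatives": [{"content": "$1"}]},
            {"type": "word", "alternatives": [{"content": "million"}]},
        ],
    },
]

print(flatten(sample_results, "spoken_form"))
print(flatten(sample_results, "written_form"))
```

Choosing the form per call is one way to support the "mixture" workflow: a caller could, for instance, keep written forms for display and spoken forms for alignment.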

Configuration example

The following example configuration requests entities:

{
  "type": "transcription",
  "transcription_config": {
    "language": "en",
    "enable_entities": true
  }
}
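If the configuration is generated in code rather than written by hand, it could be built along these lines. This is a sketch only: build_config is a hypothetical helper, and the field names are copied from the example above. How the resulting JSON is submitted depends on the deployment and is not shown.

```python
import json


def build_config(language: str, enable_entities: bool = False) -> str:
    """Serialise a transcription config as JSON.

    enable_entities defaults to False, matching the documented default.
    """
    config = {
        "type": "transcription",
        "transcription_config": {
            "language": language,
            "enable_entities": enable_entities,
        },
    }
    return json.dumps(config, indent=2)


print(build_config("en", enable_entities=True))
```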

Different entity classes

The following entity_classes can be returned. Entity classes indicate how the numerals are formatted. In some cases, the choice of class can be contextual and the class may not be what was expected (for example "2001" may be a "cardinal" instead of "date"). The number of entity_classes may grow or shrink in the future.

N.B. The existing behaviour for English is unchanged: numbers from zero to 10 are output as words, except where they are output as a decimal, money or percentage.

Entity Class	Formatting Behaviour	Spoken Word Form Example	Written Form Example
alphanum	A series of three or more alphanumerics, where an alphanumeric is a digit less than 10, a character or symbol	triple seven five four	77754
cardinal	Numbers greater than ten are converted to numerals. Numbers ten or below remain as words. Includes negative numbers	nineteen	19
credit card	A long series of spoken single digits is converted to numerals. Supports common credit cards	one one one one two two two two three three three three four four four four	1111222233334444
date	Day, month and year, or a year on its own. Any words spoken in the date are maintained (including "the" and "of")	fifteenth of January twenty twenty two	15th of January 2022
decimal	A series of numbers divided by a separator	eighteen point one two	18.12
fraction	Small fractions are kept as words ("half"), complex fractions are converted to numbers separated by "/"	three sixteenths	3/16
money	Currency words are converted to symbols before or after the number (depending on the language)	twenty dollars	$20
ordinal	Ordinals greater than 10 are output as numbers	forty second	42nd
percentage	Numbers with a per cent have the per cent converted to a % symbol	duecento percento	200%
span	A range expressed as "x to y" where x and y correspond to another entity class	one hundred to two hundred million pounds	100 to £200 million
time	Times are converted to numbers	eleven forty a m	11:40 a.m.
word	Entities that do not match a specific class	hundreds	hundreds
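Because the set of entity classes may change, and the class chosen can depend on context, consuming code is probably safest when it tolerates classes it does not recognise. The sketch below is one possible approach. The function names are hypothetical, the class names are copied from the table as written, and the sample data (including some_future_class) is invented for illustration.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List

# Class names as they appear in the table above; the set may change over time.
KNOWN_CLASSES = {
    "alphanum", "cardinal", "credit card", "date", "decimal", "fraction",
    "money", "ordinal", "percentage", "span", "time", "word",
}


def count_entity_classes(results: Iterable[Dict[str, Any]]) -> Counter:
    """Count entity items per entity_class, ignoring words and punctuation."""
    counts: Counter = Counter()
    for item in results:
        if item.get("type") == "entity":
            counts[item.get("entity_class", "unknown")] += 1
    return counts


def unrecognised_classes(counts: Counter) -> List[str]:
    """List classes that are not in KNOWN_CLASSES, instead of failing on them."""
    return sorted(name for name in counts if name not in KNOWN_CLASSES)


# Invented sample data; "some_future_class" stands in for a class added later.
sample_results = [
    {"type": "entity", "entity_class": "date", "alternatives": [{"content": "15th of January 2022"}]},
    {"type": "word", "alternatives": [{"content": "cost"}]},
    {"type": "entity", "entity_class": "money", "alternatives": [{"content": "$20"}]},
    {"type": "entity", "entity_class": "date", "alternatives": [{"content": "2001"}]},
    {"type": "entity", "entity_class": "some_future_class", "alternatives": [{"content": "x"}]},
]

counts = count_entity_classes(sample_results)
print(counts)
print(unrecognised_classes(counts))
```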

Output locale styling

Each language has a specific style applied to it for thousands, decimals and where the symbol is positioned for money or percentages.

For example:

English contains commas as separators for numbers above 9999 (example: "20,000"), the money symbol at the start (example: "$10") and full stops for decimals (example: "10.5")
German contains full stops as separators for numbers above 9999 (example: "20.000"), the money symbol comes after with a non-breaking space (example: "10 $") and commas for decimals (example: "10,5")
French contains non-breaking spaces as separators for numbers above 9999 (example: "20 000"), the money symbol comes after with a non-breaking space (example: "10 $") and commas for decimals (example: "10,5")
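The three styles above can be expressed as a small lookup table. The Python sketch below only illustrates the rules as stated here; it is not how Speechmatics implements formatting. LOCALE_STYLES, format_number and format_money are hypothetical names, and the input is assumed to be a non-negative numeric string that uses "." as its decimal point.

```python
NBSP = "\u00a0"  # non-breaking space

# Thousands separator, decimal separator and money symbol position per language.
LOCALE_STYLES = {
    "en": {"thousands": ",", "decimal": ".", "symbol_first": True},
    "de": {"thousands": ".", "decimal": ",", "symbol_first": False},
    "fr": {"thousands": NBSP, "decimal": ",", "symbol_first": False},
}


def format_number(value: str, language: str) -> str:
    """Format a non-negative numeric string such as "20000" or "10.5"."""
    style = LOCALE_STYLES[language]
    integer, _, fraction = value.partition(".")
    if len(integer) > 4:  # separators apply only to numbers above 9999
        groups = []
        while integer:
            groups.insert(0, integer[-3:])
            integer = integer[:-3]
        integer = style["thousands"].join(groups)
    return integer + (style["decimal"] + fraction if fraction else "")


def format_money(value: str, symbol: str, language: str) -> str:
    """Place the currency symbol before or after the number, per language."""
    number = format_number(value, language)
    if LOCALE_STYLES[language]["symbol_first"]:
        return symbol + number
    return number + NBSP + symbol


for lang in ("en", "de", "fr"):
    print(lang, format_number("20000", lang), format_number("10.5", lang), format_money("10", "$", lang))
```

Note that the German and French money forms contain a non-breaking space (U+00A0), so comparing them against strings typed with an ordinary space will fail.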

Example output

Here is an example of a transcript requested with enable_entities set to true:

An entity that is "17th of January 2022", including spaces
- The start and end times span the entire entity
- An entity_class of date
- The spoken_form is split into the following individual words: "seventeenth", "of", "January", "twenty", "twenty", "two". Each word has its own start and end time
- The written_form is split into the following individual words: "17th", "of", "January", "2022". Each word has its own start and end time

Note:

By default, and when speaker diarization is enabled, a speaker parameter is added to each word within the entity, in both the spoken and written forms
When channel diarization is enabled, a channel parameter is added only to the parent result of the entity and is not included in the spoken and written forms

   "results": [
    {
      "alternatives": [
        {
          "confidence": 0.99,
          "content": "17th of January 2022",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 3.14,
      "entity_class": "date",
      "spoken_form": [
        {
          "alternatives": [
            {
              "confidence": 1.0,
              "content": "seventeenth",
              "language": "en",
              "speaker": "UU"
            }
          ],
          "end_time": 1.41,
          "start_time": 0.72,
          "type": "word"
        },
        {
          "alternatives": [
            {
              "confidence": 1.0,
              "content": "of",
              "language": "en",
              "speaker": "UU"
            }
          ],
          "end_time": 1.53,
          "start_time": 1.41,
          "type": "word"
        },
        {
          "alternatives": [
            {
              "confidence": 1.0,
              "content": "January",
              "language": "en",
              "speaker": "UU"
            }
          ],
          "end_time": 2.04,
          "start_time": 1.53,
          "type": "word"
        },
        {
          "alternatives": [
            {
              "confidence": 1.0,
              "content": "twenty",
              "language": "en",
              "speaker": "UU"
            }
          ],
          "end_time": 2.46,
          "start_time": 2.04,
          "type": "word"
        },
        {
          "alternatives": [
            {
              "confidence": 1.0,
              "content": "twenty",
              "language": "en",
              "speaker": "UU"
            }
          ],
          "end_time": 2.79,
          "start_time": 2.46,
          "type": "word"
        },
        {
          "alternatives": [
            {
              "confidence": 0.97,
              "content": "two",
              "language": "en",
              "speaker": "UU"
            }
          ],
          "end_time": 3.14,
          "start_time": 2.79,
          "type": "word"
        }
      ],
      "start_time": 0.72,
      "type": "entity",
      "written_form": [
        {
          "alternatives": [
            {
              "confidence": 0.99,
              "content": "17th",
              "language": "en",
              "speaker": "UU"
            }
          ],
          "end_time": 1.33,
          "start_time": 0.72,
          "type": "word"
        },
        {
          "alternatives": [
            {
              "confidence": 0.99,
              "content": "of",
              "language": "en",
              "speaker": "UU"
            }
          ],
          "end_time": 1.93,
          "start_time": 1.33,
          "type": "word"
        },
        {
          "alternatives": [
            {
              "confidence": 0.99,
              "content": "January",
              "language": "en",
              "speaker": "UU"
            }
          ],
          "end_time": 2.54,
          "start_time": 1.93,
          "type": "word"
        },
        {
          "alternatives": [
            {
              "confidence": 0.99,
              "content": "2022",
              "language": "en",
              "speaker": "UU"
            }
          ],
          "end_time": 3.14,
          "start_time": 2.54,
          "type": "word"
        }
      ]
    }
  ]
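One property of the example above is that the entity's start and end times span the words in both forms. A consumer that wants to verify this could use something like the following Python sketch. The helpers spans_all_words and word are hypothetical, and the sample reuses only the timings from the example, with the confidence, language and speaker fields trimmed for brevity.

```python
import math
from typing import Any, Dict


def spans_all_words(entity: Dict[str, Any], form: str) -> bool:
    """Check that the entity's times match the first start and last end of a form."""
    words = entity[form]
    first_start = min(w["start_time"] for w in words)
    last_end = max(w["end_time"] for w in words)
    return math.isclose(entity["start_time"], first_start) and math.isclose(
        entity["end_time"], last_end
    )


def word(content: str, start: float, end: float) -> Dict[str, Any]:
    """Build a trimmed word item (confidence, language and speaker omitted)."""
    return {
        "type": "word",
        "start_time": start,
        "end_time": end,
        "alternatives": [{"content": content}],
    }


# Timings copied from the example output above.
entity = {
    "type": "entity",
    "entity_class": "date",
    "start_time": 0.72,
    "end_time": 3.14,
    "alternatives": [{"content": "17th of January 2022"}],
    "spoken_form": [
        word("seventeenth", 0.72, 1.41),
        word("of", 1.41, 1.53),
        word("January", 1.53, 2.04),
        word("twenty", 2.04, 2.46),
        word("twenty", 2.46, 2.79),
        word("two", 2.79, 3.14),
    ],
    "written_form": [
        word("17th", 0.72, 1.33),
        word("of", 1.33, 1.93),
        word("January", 1.93, 2.54),
        word("2022", 2.54, 3.14),
    ],
}

print(spans_all_words(entity, "spoken_form"))
print(spans_all_words(entity, "written_form"))
```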

If enable_entities is set to false, the output is as below:

  "results": [
    {
      "alternatives": [
        {
          "confidence": 0.99,
          "content": "17th",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 1.33,
      "start_time": 0.72,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 0.99,
          "content": "of",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 1.93,
      "start_time": 1.33,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 0.99,
          "content": "January",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 2.54,
      "start_time": 1.93,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 0.99,
          "content": "2022",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 3.14,
      "start_time": 2.54,
      "type": "word"
    }
  ]
}
