
TwiML™ Voice: <Transcription>


Warning

Real-Time Transcriptions, including the <Transcription> TwiML noun and API, use artificial intelligence or machine learning technologies. By enabling or using any of the features or functionalities within Programmable Voice that are identified as using artificial intelligence or machine learning technology, you acknowledge and agree that your use of these features or functionalities is subject to the terms of the Predictive and Generative AI/ML Features Addendum.

Real-Time Transcriptions is not PCI compliant or a HIPAA Eligible Service and should not be used in Voice Intelligence workflows that are subject to HIPAA or PCI.

Real-Time Transcription is currently available as a Public Beta product and information contained in this document is subject to change. This means that some of the features are not yet implemented and others may be changed before the product is declared as Generally Available. Public Beta products are not covered by a Twilio Service Level Agreement.

The <Transcription> TwiML noun allows you to transcribe live calls in near real-time. It is used in conjunction with <Start>. When Twilio executes the <Start><Transcription> instruction during a call, it forks the raw audio stream to a speech-to-text transcription engine that can provide streaming responses almost instantly.

This page covers <Transcription>'s supported attributes and provides sample code.


Other Transcriptions at Twilio

The <Transcription> TwiML noun belongs to Twilio's Real-Time Transcriptions product. Do not confuse it with Recording Transcriptions.

During the Public Beta, Twilio does not store Real-Time Transcriptions. If you need to keep transcripts, capture them from the events Twilio sends to your statusCallbackUrl and store them yourself.

Below is a basic example of <Start><Transcription>:

```xml
<Start>
  <Transcription statusCallbackUrl="https://example.com/your-callback-url"/>
</Start>
```
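If you generate TwiML by hand instead of with the Twilio helper library, you must XML-escape attribute values. For example, a hints value can contain characters such as `&`. The following is a minimal sketch of such a builder. `buildStartTranscription` and `escapeXmlAttr` are hypothetical helpers, not part of any Twilio library.

```javascript
// Hypothetical helper: escape characters that cannot appear verbatim
// inside a double-quoted XML attribute value.
function escapeXmlAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: build a <Response><Start><Transcription/></Start></Response>
// document from a plain object of <Transcription> attributes. Attributes set to
// undefined are skipped; booleans are written as "true" or "false".
function buildStartTranscription(attributes) {
  const attrText = Object.entries(attributes)
    .filter(([, value]) => value !== undefined)
    .map(([key, value]) => ` ${key}="${escapeXmlAttr(value)}"`)
    .join('');
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<Response>',
    '  <Start>',
    `    <Transcription${attrText} />`,
    '  </Start>',
    '</Response>',
  ].join('\n');
}

console.log(buildStartTranscription({
  statusCallbackUrl: 'https://example.com/your-callback-url',
  hints: 'AT&T, ACME Corp',
  partialResults: true,
}));
```

The helper library examples on this page already handle escaping, so a builder like this is only worth considering when the library is not available.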

Noun attributes


The table below lists <Transcription>'s supported attributes, which modify the <Transcription> behavior. All attributes are optional.

| Attribute name | Allowed values | Default value |
|---|---|---|
| name | Unique name for the Real-Time Transcription | none |
| statusCallbackUrl | Valid relative or absolute URL | none |
| languageCode | A BCP-47 standard code (e.g. "en-US") | en-US |
| track | inbound_track, outbound_track, both_tracks | both_tracks |
| inboundTrackLabel | An alphanumeric label to associate with the inbound track being transcribed | none |
| outboundTrackLabel | An alphanumeric label to associate with the outbound track being transcribed | none |
| transcriptionEngine | Name of the speech-to-text transcription provider. Valid values: google | google |
| speechModel | (Google only) Any Google speech model value | telephony |
| profanityFilter | (Google only) true or false | true |
| partialResults | (Google only) true or false | false |
| hints | (Google only) Comma-separated list of expected phrases or keywords for recognition | none |
| enableAutomaticPunctuation | (Google only) true or false | true |

name


The user-specified name of this Real-Time Transcription. This name can be used to stop the Real-Time Transcription.

```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', name: 'Contact center transcription'});

console.log(response.toString());
```

Output

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" name="Contact center transcription" />
  </Start>
</Response>
```

statusCallbackUrl

The statusCallbackUrl attribute is the relative or absolute URL of an endpoint. Twilio sends Real-Time Transcription status updates and the call's transcript data to this URL.

Twilio sends a POST request to this URL whenever one of the following occurs:

  • A Real-Time Transcription session starts. This is called the transcription-started event.
  • Utterances (partial or final) of transcribed audio are available. This is called the transcription-content event.
  • A Real-Time Transcription session stops. This is called the transcription-stopped event. This event occurs when a Real-Time Transcription session is stopped via API or TwiML, or when the call ends.
  • An error occurs. This is called the transcription-error event.
```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url'});

console.log(response.toString());
```

Output

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url"/>
  </Start>
</Response>
```
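A single statusCallbackUrl endpoint receives all four event types, so a handler usually branches on the TranscriptionEvent property. The sketch below is one way to do this. It assumes the request body arrives form-encoded (application/x-www-form-urlencoded), as Twilio webhooks typically are, and it uses only Node's built-in URLSearchParams. `parseCallbackBody` and `routeTranscriptionEvent` are hypothetical names; connect them to whatever HTTP framework you use.

```javascript
// Hypothetical helper: turn a form-encoded callback body into a plain object.
// Every value arrives as a string, so convert SequenceId, Final, etc. yourself.
function parseCallbackBody(rawBody) {
  return Object.fromEntries(new URLSearchParams(rawBody));
}

// Hypothetical dispatcher: call the matching handler for each documented event type.
function routeTranscriptionEvent(params, handlers) {
  switch (params.TranscriptionEvent) {
    case 'transcription-started':
      return handlers.onStarted(params);
    case 'transcription-content':
      return handlers.onContent(params);
    case 'transcription-stopped':
      return handlers.onStopped(params);
    case 'transcription-error':
      return handlers.onError(params);
    default:
      // Ignore event types this sketch does not know about.
      return undefined;
  }
}

// Usage: parse a body and log which event arrived.
const body = 'TranscriptionEvent=transcription-started&CallSid=CAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX&SequenceId=1';
const params = parseCallbackBody(body);
routeTranscriptionEvent(params, {
  onStarted: (p) => console.log('started', p.CallSid),
  onContent: (p) => console.log('content', p.SequenceId),
  onStopped: (p) => console.log('stopped', p.CallSid),
  onError: (p) => console.log('error', p.TranscriptionErrorCode),
});
```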

The transcription-started event


When a Real-Time Transcription is started and a session is created, Twilio sends an HTTP POST request to your statusCallbackUrl for the transcription-started event. This event provides initial details about the transcription session.

These HTTP requests contain the properties listed below.

| Property | Description | Example |
|---|---|---|
| AccountSid | Twilio Account SID | AC11b76cdc7d217e72a72be6422d46a7ca |
| CallSid | Twilio Call SID | CA57af2620f427810cb4e430371e8d6e0f |
| TranscriptionSid | Unique identifier for this Real-Time Transcription session | GT20dfa03c8cf8aa8d0c4aeccde5558b66 |
| Timestamp | Time of the event, as a UTC ISO 8601 timestamp | 2023-10-19T22:33:22.611Z |
| SequenceId | Integer sequence number of the event | 1 |
| TranscriptionEvent | The event type | transcription-started |
| ProviderConfiguration | JSON string containing the transcription provider configuration | {\"profanityFilter\":\"true\",\"speechModel\":\"telephony\",\"enableAutomaticPunctuation\":\"true\",\"hints\":\"Alice Johnson, Bob Martin, ACME Corp, XYZ Enterprises, product demo, sales inquiry, customer feedback\"} |
| TranscriptionEngine | The name of the transcription engine | google |
| Name | Friendly name of the Real-Time Transcription session | session1 |
| Track | The track being transcribed: inbound_track, outbound_track, or both_tracks | inbound_track |
| InboundTrackLabel | Label associated with the inbound track | customer |
| OutboundTrackLabel | Label associated with the outbound track | agent |
| PartialResults | Whether partial results are enabled (true or false) | true |
| LanguageCode | The language code for the transcription | en-US |

Example of a transcription-started event payload:

```json
{
  "TranscriptionSid": "GT8fbf72a043b98407a3ce68331cd0030a",
  "Timestamp": "2024-06-25T18:45:12.135751Z",
  "AccountSid": "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "ProviderConfiguration": "{\"profanityFilter\":\"true\",\"speechModel\":\"telephony\",\"enableAutomaticPunctuation\":\"true\",\"hints\":\"Alice Johnson, Bob Martin, ACME Corp, XYZ Enterprises, product demo, sales inquiry, customer feedback\"}",
  "Name": "Chris Transcription",
  "OutboundTrackLabel": "agent",
  "LanguageCode": "en-US",
  "PartialResults": "false",
  "InboundTrackLabel": "customer",
  "TranscriptionEvent": "transcription-started",
  "CallSid": "CAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "TranscriptionEngine": "google",
  "Track": "both_tracks",
  "SequenceId": "1"
}
```
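In this payload, ProviderConfiguration is itself a JSON string, and flags such as PartialResults are the strings "true" or "false". A handler will usually normalize these values before using them. The sketch below shows one way to do that. `parseStartedEvent` is a hypothetical helper, and the shape of the object it returns is an illustrative choice.

```javascript
// Hypothetical helper: normalize a transcription-started payload.
function parseStartedEvent(params) {
  let providerConfiguration = {};
  if (params.ProviderConfiguration) {
    try {
      // ProviderConfiguration is a JSON string nested inside the payload.
      providerConfiguration = JSON.parse(params.ProviderConfiguration);
    } catch {
      // Fall back to an empty object if the string is not valid JSON.
      providerConfiguration = {};
    }
  }
  return {
    transcriptionSid: params.TranscriptionSid,
    callSid: params.CallSid,
    name: params.Name,
    track: params.Track,
    languageCode: params.LanguageCode,
    // Flags arrive as strings, not booleans.
    partialResults: params.PartialResults === 'true',
    // Map each track value to its label (undefined if no label was set).
    labels: {
      inbound_track: params.InboundTrackLabel,
      outbound_track: params.OutboundTrackLabel,
    },
    providerConfiguration,
  };
}

const started = parseStartedEvent({
  TranscriptionSid: 'GT8fbf72a043b98407a3ce68331cd0030a',
  ProviderConfiguration: '{"profanityFilter":"true","speechModel":"telephony","enableAutomaticPunctuation":"true"}',
  PartialResults: 'false',
  InboundTrackLabel: 'customer',
  OutboundTrackLabel: 'agent',
  Track: 'both_tracks',
});
console.log(started.providerConfiguration.speechModel, started.labels);
```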

The transcription-content event


When an individual utterance (partial or final) of audio is transcribed, Twilio sends an HTTP POST request to your statusCallbackUrl for the transcription-content event. This event provides TranscriptionData results for the transcribed audio.


Stability and Confidence

Stability and Confidence depend on partialResults. If partialResults is true, the event payload includes the Stability property and TranscriptionData does not include confidence. If partialResults is false, the reverse is true. Refer to Google's documentation (examples) for more details on each of these properties.

These HTTP requests contain the properties listed below.

| Property | Description | Example |
|---|---|---|
| AccountSid | Twilio Account SID | AC11b76cdc7d217e72a72be6422d46a7ca |
| CallSid | Twilio Call SID | CA57af2620f427810cb4e430371e8d6e0f |
| TranscriptionSid | Unique identifier for this Real-Time Transcription session | GT20dfa03c8cf8aa8d0c4aeccde5558b66 |
| Timestamp | Time of the event, as a UTC ISO 8601 timestamp | 2023-10-19T22:33:22.611Z |
| SequenceId | Integer sequence number of the event | 2 |
| TranscriptionEvent | The event type | transcription-content |
| LanguageCode | A BCP-47 standard language code (e.g. "en-US") | en-US |
| Track | The track being transcribed: inbound_track or outbound_track | inbound_track |
| TranscriptionData | JSON string containing the transcription content. TranscriptionData.confidence is a decimal number. | {\"transcript\":\"to be or not to be\",\"confidence\":0.96823084} |
| Stability | String estimating the likelihood that Google will not change its guess for this partial result transcript. Only provided when partialResults is true. | A value between 0.0 (unstable) and 1.0 (stable), e.g. 0.8 |
| Final | Boolean value: true if this event contains a final utterance, false if it contains a partial (interim) utterance | false |

Example of a transcription-content event payload when partialResults is false:

```json
{
  "LanguageCode": "en-US",
  "TranscriptionSid": "GT8fbf72a043b98407a3ce68331cd0030a",
  "TranscriptionEvent": "transcription-content",
  "CallSid": "CAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "TranscriptionData": "{\"transcript\":\"Hello, this is Sam from Horizon Financial Services. Just letting you know this call may be recorded for quality purposes. How can I assist you today?\",\"confidence\":0.9956335}",
  "Timestamp": "2024-06-25T18:45:21.454203Z",
  "Final": "true",
  "AccountSid": "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "Track": "outbound_track",
  "SequenceId": "2"
}
```

Example of a transcription-content event payload when partialResults is true:

```json
{
  "LanguageCode": "en-US",
  "TranscriptionSid": "GT6ebb54a123f0c86b70605a4925836f69",
  "Stability": "0.9",
  "TranscriptionEvent": "transcription-content",
  "CallSid": "CAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "TranscriptionData": "{\"transcript\":\"Hello, this is Sam from Horizon Financial Services. Just letting you know this call may be recorded for\"}",
  "Timestamp": "2024-06-25T16:30:21.600697Z",
  "Final": "false",
  "AccountSid": "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "Track": "outbound_track",
  "SequenceId": "70"
}
```
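As the two payloads show, the transcript text and confidence are inside the TranscriptionData JSON string, while Stability and Final are top-level string properties. The sketch below extracts these fields from either kind of event. `parseContentEvent` is a hypothetical helper, and it assumes the lowercase transcript and confidence keys shown in the example payloads.

```javascript
// Hypothetical helper: extract the useful fields from a transcription-content payload.
function parseContentEvent(params) {
  // TranscriptionData is a JSON string nested inside the payload.
  const data = JSON.parse(params.TranscriptionData || '{}');
  return {
    sequenceId: Number(params.SequenceId),
    track: params.Track,
    isFinal: params.Final === 'true',
    transcript: data.transcript || '',
    // Present in the example payload for partialResults=false.
    confidence: typeof data.confidence === 'number' ? data.confidence : undefined,
    // Present as a top-level string only when partialResults=true.
    stability: params.Stability !== undefined ? Number(params.Stability) : undefined,
  };
}

const finalResult = parseContentEvent({
  SequenceId: '2',
  Track: 'outbound_track',
  Final: 'true',
  TranscriptionData: '{"transcript":"Hello, this is Sam from Horizon Financial Services. Just letting you know this call may be recorded for quality purposes. How can I assist you today?","confidence":0.9956335}',
});
const partialResult = parseContentEvent({
  SequenceId: '70',
  Track: 'outbound_track',
  Final: 'false',
  Stability: '0.9',
  TranscriptionData: '{"transcript":"Hello, this is Sam from Horizon Financial Services. Just letting you know this call may be recorded for"}',
});
console.log(finalResult.confidence, partialResult.stability);
```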

The transcription-stopped event


When a Real-Time Transcription session is stopped or ends, Twilio sends an HTTP POST request to your statusCallbackUrl for the transcription-stopped event. This event provides final details about the transcription session.

These HTTP requests contain the properties listed below.

| Property | Description | Example |
|---|---|---|
| AccountSid | Twilio Account SID | AC11b76cdc7d217e72a72be6422d46a7ca |
| CallSid | Twilio Call SID | CA57af2620f427810cb4e430371e8d6e0f |
| TranscriptionSid | Unique identifier for this Real-Time Transcription session | GT20dfa03c8cf8aa8d0c4aeccde5558b66 |
| Timestamp | Time of the event, in UTC ISO 8601 format | 2023-10-19T22:33:22.611Z |
| SequenceId | Integer sequence number of the event | 3 |
| TranscriptionEvent | The event type | transcription-stopped |

An example of the transcription-stopped event payload:

```json
{
  "TranscriptionSid": "GT8fbf72a043b98407a3ce68331cd0030a",
  "TranscriptionEvent": "transcription-stopped",
  "CallSid": "CAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "Timestamp": "2024-06-25T18:45:23.839266Z",
  "AccountSid": "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "SequenceId": "3"
}
```
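Twilio does not store Real-Time Transcriptions during the Public Beta. One option is to buffer final utterances per TranscriptionSid and persist them when the transcription-stopped event arrives. The in-memory `TranscriptStore` class below is a hypothetical minimal sketch. It sorts utterances by SequenceId in case requests are handled out of order, and it labels each line with the track labels from the transcription-started event. A production system would write to durable storage instead of process memory.

```javascript
// Hypothetical in-memory store: collects final utterances per transcription
// session and returns the assembled transcript when the session stops.
class TranscriptStore {
  constructor() {
    this.sessions = new Map(); // TranscriptionSid -> { labels, utterances }
  }

  // transcription-started: remember the track labels for this session.
  start(params) {
    this.sessions.set(params.TranscriptionSid, {
      labels: {
        inbound_track: params.InboundTrackLabel || 'inbound',
        outbound_track: params.OutboundTrackLabel || 'outbound',
      },
      utterances: [],
    });
  }

  // transcription-content: keep only final results; partial ones are superseded later.
  add(params) {
    const session = this.sessions.get(params.TranscriptionSid);
    if (!session || params.Final !== 'true') return;
    const data = JSON.parse(params.TranscriptionData);
    session.utterances.push({
      sequenceId: Number(params.SequenceId),
      speaker: session.labels[params.Track] || params.Track,
      text: data.transcript,
    });
  }

  // transcription-stopped: return "speaker: text" lines in SequenceId order and drop the session.
  stop(params) {
    const session = this.sessions.get(params.TranscriptionSid);
    if (!session) return '';
    this.sessions.delete(params.TranscriptionSid);
    return session.utterances
      .sort((a, b) => a.sequenceId - b.sequenceId)
      .map((u) => `${u.speaker}: ${u.text}`)
      .join('\n');
  }
}

const store = new TranscriptStore();
const sid = 'GT8fbf72a043b98407a3ce68331cd0030a';
store.start({ TranscriptionSid: sid, InboundTrackLabel: 'customer', OutboundTrackLabel: 'agent' });
store.add({ TranscriptionSid: sid, SequenceId: '3', Final: 'true', Track: 'inbound_track',
  TranscriptionData: '{"transcript":"to be or not to be","confidence":0.96823084}' });
store.add({ TranscriptionSid: sid, SequenceId: '2', Final: 'true', Track: 'outbound_track',
  TranscriptionData: '{"transcript":"Hello, this is Sam from Horizon Financial Services.","confidence":0.9956335}' });
console.log(store.stop({ TranscriptionSid: sid }));
```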

The transcription-error event


When an error occurs during a Real-Time Transcription session, Twilio sends an HTTP POST request to your statusCallbackUrl for the transcription-error event.


Error Documentation

Real-Time Transcription errors are documented in the Error and Warning Dictionary under error codes 32650 through 32655. Errors are also viewable in the Twilio Console.

These HTTP requests contain the properties listed below.

| Property | Description | Example |
|---|---|---|
| AccountSid | Twilio Account SID | ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX |
| CallSid | Twilio Call SID | CAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX |
| TranscriptionSid | Unique identifier for this Real-Time Transcription session | GT20dfa03c8cf8aa8d0c4aeccde5558b66 |
| Timestamp | Time of the event, as a UTC ISO 8601 timestamp | 2023-10-19T22:33:22.611Z |
| SequenceId | Integer sequence number of the event | 3 |
| TranscriptionEvent | The event type | transcription-error |
| TranscriptionErrorCode | Error code | 32655 |
| TranscriptionError | Error description | Provider Unavailable |

Example of a transcription-error event payload:

```json
{
  "TranscriptionSid": "GT20dfa03c8cf8aa8d0c4aeccde5558b66",
  "Timestamp": "2023-10-19T22:33:22.611Z",
  "AccountSid": "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "SequenceId": "3",
  "TranscriptionEvent": "transcription-error",
  "CallSid": "CAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "TranscriptionErrorCode": "32655",
  "TranscriptionError": "Provider Unavailable"
}
```
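An error handler typically converts TranscriptionErrorCode to a number and records enough context to find the call later. The sketch below does that and also flags codes outside the documented 32650 to 32655 range. `describeTranscriptionError` is a hypothetical helper, and the range check reflects only the range stated on this page.

```javascript
// Documented Real-Time Transcription error code range (see the Error and Warning Dictionary).
const RTT_ERROR_MIN = 32650;
const RTT_ERROR_MAX = 32655;

// Hypothetical helper: summarize a transcription-error payload for logging or alerting.
function describeTranscriptionError(params) {
  const code = Number(params.TranscriptionErrorCode);
  return {
    callSid: params.CallSid,
    transcriptionSid: params.TranscriptionSid,
    code,
    message: params.TranscriptionError || 'Unknown error',
    // Flag codes outside the documented range so they can be investigated separately.
    inDocumentedRange: Number.isInteger(code) && code >= RTT_ERROR_MIN && code <= RTT_ERROR_MAX,
  };
}

const summary = describeTranscriptionError({
  TranscriptionSid: 'GT20dfa03c8cf8aa8d0c4aeccde5558b66',
  CallSid: 'CAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
  TranscriptionErrorCode: '32655',
  TranscriptionError: 'Provider Unavailable',
});
console.log(`[${summary.code}] ${summary.message} on call ${summary.callSid}`);
```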

languageCode

The languageCode attribute specifies the language in which the transcription should be performed. It accepts a BCP-47 standard language code, such as en-US for American English. Setting it correctly lets the transcription engine recognize and process the spoken language.

The following example sets languageCode to es-MX so that a call in Mexican Spanish is transcribed in that language.

```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', languageCode: 'es-MX'});

console.log(response.toString());
```

Output

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" languageCode="es-MX" />
  </Start>
</Response>
```

track

The track attribute specifies which audio track to transcribe: inbound_track, outbound_track, or both_tracks. Use it to transcribe one side of the call or both. Which party each track carries depends on the call direction; see the call direction mapping table in the Track labels section.

The following TwiML example demonstrates how to specify the track attribute for a transcription.

```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', track: 'inbound_track'});

console.log(response.toString());
```

Output

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" track="inbound_track" />
  </Start>
</Response>
```

inboundTrackLabel

The inboundTrackLabel attribute associates an alphanumeric label with the inbound track being transcribed. Labels make it clear who is speaking in the transcription results, which is especially helpful in multi-party conversations and call center scenarios.

Refer to the Track labels section below to understand the importance of using labels.

```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', inboundTrackLabel: 'agent', outboundTrackLabel: 'customer'});

console.log(response.toString());
```

Output

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" inboundTrackLabel="agent" outboundTrackLabel="customer" />
  </Start>
</Response>
```

Example 1: Inbound Call


In an inbound call scenario, the customer initiates the call and the agent or business person receives it. The inbound track carries the caller's audio (the FROM number), so here the inbound audio track (customer's speech) is labeled for clarity in the transcription results.

```xml
<Response>
  <Start>
    <Transcription track="inbound_track" inboundTrackLabel="customer" />
  </Start>
</Response>
```

In this example, the inbound audio track is labeled as "customer". This is useful for scenarios like customer support calls, where distinguishing the customer's speech from the agent's responses is crucial for understanding the interaction.

Example 2: Outbound Call


In an outbound call scenario, the call is initiated by the agent or business person and received by the customer. Here, the inbound audio track (customer's speech) is labeled for clarity in the transcription results.

```xml
<Response>
  <Start>
    <Transcription track="inbound_track" inboundTrackLabel="customer" />
  </Start>
</Response>
```

In this example, the inbound audio track is labeled as "customer". This is useful for scenarios like sales calls, where distinguishing the customer's speech in the transcription can help in analyzing customer feedback and engagement.

outboundTrackLabel

The outboundTrackLabel attribute associates an alphanumeric label with the outbound track being transcribed. Labels make it clear who is speaking in the transcription results, which is especially helpful in multi-party conversations and call center scenarios.

Refer to the Track labels section below to understand the importance of using labels.

```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', inboundTrackLabel: 'agent', outboundTrackLabel: 'customer'});

console.log(response.toString());
```

Output

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" inboundTrackLabel="agent" outboundTrackLabel="customer" />
  </Start>
</Response>
```

Example 1: Inbound Call

In an inbound call scenario, the customer initiates the call and the agent or business person receives it. The outbound track carries the audio sent to the caller (the TO side), so here the outbound audio track (agent's speech) is labeled for clarity in the transcription results.

```xml
<Response>
  <Start>
    <Transcription track="outbound_track" outboundTrackLabel="agent" />
  </Start>
</Response>
```

In this example, the outbound audio track is labeled as "agent". This is useful for scenarios like customer support calls, where distinguishing the agent's responses from the customer's speech is crucial for understanding the interaction.

Example 2: Outbound Call

In an outbound call scenario, the call is initiated by the agent or business person and received by the customer. Here, the outbound audio track (agent's speech) is labeled for clarity in the transcription results.

```xml
<Response>
  <Start>
    <Transcription track="outbound_track" outboundTrackLabel="agent" />
  </Start>
</Response>
```

In this example, the outbound audio track is labeled as "agent". This is useful for scenarios like sales calls, where distinguishing the agent's speech in the transcription can help in analyzing the effectiveness of the sales pitch.

transcriptionEngine

The transcriptionEngine attribute specifies which speech-to-text transcription provider to use. Different engines can offer different features and optimizations. Currently, google is the only valid value.

```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', transcriptionEngine: 'google'});

console.log(response.toString());
```

Output

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" transcriptionEngine="google" />
  </Start>
</Response>
```

speechModel

The speechModel attribute specifies which speech model to use for the transcription.

In Google terminology, this maps to the Transcription Model. Different speech models are optimized for different use cases, such as phone calls or video, and some enhanced models offer higher accuracy.

Refer to the Google documentation to understand each speech model's specific capabilities and configurations.

The telephony speech model is optimized for phone call audio and can provide better accuracy for this type of audio.

The long speech model is optimized for long-form audio, such as lectures or extended conversations, and can provide better accuracy for lengthy audio.

```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', speechModel: 'telephony', transcriptionEngine: 'google'});

console.log(response.toString());
```

Output

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" speechModel="telephony" transcriptionEngine="google" />
  </Start>
</Response>
```

profanityFilter

The profanityFilter attribute enables or disables filtering of profane words in the transcription. It maps directly to profanityFilter in Google's RecognitionFeatures object. When enabled, the transcription engine attempts to mask or omit any detected profanities in the transcription results.

The profanity filter is enabled by default. The example below disables it by setting profanityFilter to false, so detected profanities are not masked or omitted in the transcription output.

```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', profanityFilter: false, transcriptionEngine: 'google'});

console.log(response.toString());
```

Output

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" profanityFilter="false" transcriptionEngine="google" />
  </Start>
</Response>
```

partialResults

The partialResults attribute enables or disables the delivery of interim transcription results. When enabled, the transcription engine sends partial (interim) results as the transcription progresses, providing more immediate feedback before the final result is available. In Google terminology, these partial results correspond to a StreamingRecognitionResult with is_final set to false.

The example below demonstrates how to enable partial results for the transcription.

```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', partialResults: true, transcriptionEngine: 'google'});

console.log(response.toString());
```

Output

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" partialResults="true" transcriptionEngine="google" />
  </Start>
</Response>
```
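With partial results enabled, each track produces a series of interim results, and each series is eventually superseded by a result whose Final property is "true". A common pattern for live captions is to show the finalized text followed by the latest partial guess. The `CaptionBuffer` class below is a hypothetical sketch of that pattern. It assumes events for a track are processed in the order they are delivered.

```javascript
// Hypothetical live-caption buffer: per track, keep the committed (final) text and
// the most recent partial result, which the next partial or final result replaces.
class CaptionBuffer {
  constructor() {
    this.tracks = new Map(); // track -> { committed: string[], pending: string }
  }

  // Apply one transcription-content event and return the caption for its track.
  update(params) {
    const track = params.Track;
    if (!this.tracks.has(track)) this.tracks.set(track, { committed: [], pending: '' });
    const state = this.tracks.get(track);
    const { transcript = '' } = JSON.parse(params.TranscriptionData || '{}');
    if (params.Final === 'true') {
      // A final result replaces the pending partial and becomes permanent.
      state.committed.push(transcript);
      state.pending = '';
    } else {
      // A newer partial result replaces the previous one.
      state.pending = transcript;
    }
    return this.render(track);
  }

  // Committed text followed by the current partial guess, if any.
  render(track) {
    const state = this.tracks.get(track);
    if (!state) return '';
    return [...state.committed, state.pending].filter(Boolean).join(' ');
  }
}

const captions = new CaptionBuffer();
captions.update({ Track: 'outbound_track', Final: 'false', TranscriptionData: '{"transcript":"Hello, this is"}' });
captions.update({ Track: 'outbound_track', Final: 'false', TranscriptionData: '{"transcript":"Hello, this is Sam"}' });
console.log(captions.update({ Track: 'outbound_track', Final: 'true', TranscriptionData: '{"transcript":"Hello, this is Sam.","confidence":0.99}' }));
```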

hints

The hints attribute contains a list of words or phrases that the transcription provider can expect to encounter during a Real-Time Transcription. Using the hints attribute can improve the transcription provider's recognition of words or phrases you expect from your callers.

You may provide up to 500 words or phrases in this list, separating each entry with a comma. Your hints may be up to 100 characters each, and you should separate each word in a phrase with a space, e.g.:

```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', hints: 'Alice Johnson, Bob Martin, ACME Corp, XYZ Enterprises, product demo, sales inquiry, customer feedback'});

console.log(response.toString());
```

Output

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" hints="Alice Johnson, Bob Martin, ACME Corp, XYZ Enterprises, product demo, sales inquiry, customer feedback" />
  </Start>
</Response>
```

The hints attribute also supports Google's class tokens to improve recognition. You can pass a class token directly in the hints attribute, as shown in the example below.

```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', hints: '$OOV_CLASS_ALPHANUMERIC_SEQUENCE'});

console.log(response.toString());
```

Output

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" hints="$OOV_CLASS_ALPHANUMERIC_SEQUENCE" />
  </Start>
</Response>
```
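To keep a hints value within the limits described above (up to 500 comma-separated entries of up to 100 characters each, with single spaces between words), you can build the value from a list and validate it first. `buildHints` is a hypothetical helper. It measures length in JavaScript string characters, and it rejects entries that contain a comma because a comma would split the entry in two. The returned string can be passed as the hints option to start.transcription().

```javascript
// Limits stated on this page for the hints attribute.
const MAX_HINTS = 500;
const MAX_HINT_LENGTH = 100;

// Hypothetical helper: validate a list of phrases and join them into a hints value.
function buildHints(phrases) {
  const cleaned = phrases
    .map((phrase) => phrase.trim().replace(/\s+/g, ' ')) // single spaces between words
    .filter((phrase) => phrase.length > 0);
  if (cleaned.length > MAX_HINTS) {
    throw new RangeError(`hints accepts at most ${MAX_HINTS} entries, got ${cleaned.length}`);
  }
  for (const phrase of cleaned) {
    if (phrase.length > MAX_HINT_LENGTH) {
      throw new RangeError(`hint exceeds ${MAX_HINT_LENGTH} characters: "${phrase.slice(0, 20)}..."`);
    }
    if (phrase.includes(',')) {
      // A comma would split this phrase into two separate entries.
      throw new RangeError(`hint must not contain a comma: "${phrase}"`);
    }
  }
  return cleaned.join(', ');
}

// prints "Alice Johnson, Bob Martin, ACME Corp"
console.log(buildHints(['Alice Johnson', 'Bob  Martin', ' ACME Corp ']));
```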

enableAutomaticPunctuation


Maps to Automatic Punctuation in Google terminology. The enableAutomaticPunctuation attribute enables or disables automatic punctuation in the transcription. When enabled, the transcription engine automatically inserts punctuation marks such as periods, commas, and question marks, improving the readability of the transcribed text.

```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', enableAutomaticPunctuation: true, transcriptionEngine: 'google'});

console.log(response.toString());
```

Output

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" enableAutomaticPunctuation="true" transcriptionEngine="google" />
  </Start>
</Response>
```

Supported language and model combinations


Twilio's transcription service supports a variety of languages and models. The examples below are specific to Google Speech-to-Text. Depending on the language, attributes such as speechModel, profanityFilter, and enableAutomaticPunctuation may have different levels of support. For the most up-to-date and comprehensive information, refer to the Google Speech-to-Text Supported Languages documentation.


Warning

These examples are accurate as of June 2024 and are subject to change. Always refer to the Google Speech-to-Text Supported Languages page for the most current information.

Example 1: Chinese (Simplified, China) with Chirp Model


This example demonstrates how to configure transcription for Chinese (Simplified, China) using the Chirp Model with support for automatic punctuation.

```xml
<Response>
  <Start>
    <Transcription
      transcriptionEngine="google"
      languageCode="cmn-Hans-CN"
      speechModel="chirp"
      enableAutomaticPunctuation="true" />
  </Start>
</Response>
```

This configuration does not support the profanityFilter attribute, the hints attribute, or other advanced features.

Example 2: Spanish (Spain) with Telephony Model


This example demonstrates how to configure transcription for Spanish (Spain) using the telephony model with automatic punctuation and the profanity filter enabled.

```xml
<Response>
  <Start>
    <Transcription
      transcriptionEngine="google"
      languageCode="es-ES"
      speechModel="telephony"
      profanityFilter="true"
      enableAutomaticPunctuation="true" />
  </Start>
</Response>
```

In this example, the telephony model supports automatic punctuation and profanity filter, but not model adaptation (e.g., hints).

Example 3: Hindi (India) with Short Model


This example demonstrates how to configure transcription for Hindi (India) using the short model with support for specific attributes.

```xml
<Response>
  <Start>
    <Transcription
      transcriptionEngine="google"
      languageCode="hi-IN"
      speechModel="short"
      enableAutomaticPunctuation="true"
      profanityFilter="true"
      hints="संपर्क, सेवा, समर्थन, ग्राहक"
      modelAdaptation="true" />
  </Start>
</Response>
```

In this example, the short model supports automatic punctuation, profanity filter, model adaptation, and hints.

Example 4: French (Canada) with Long Model


This example demonstrates how to configure transcription for French (Canada) using the long model with support for specific attributes.

```xml
<Response>
  <Start>
    <Transcription
      transcriptionEngine="google"
      languageCode="fr-CA"
      speechModel="long"
      hints="service à la clientèle, rendez-vous, commande" />
  </Start>
</Response>
```

In this example, the long model supports model adaptation through hints, but does not support automatic punctuation, profanity filter, or spoken punctuation.


Track labels

If you specify inboundTrackLabel or outboundTrackLabel, use the call direction mapping table below as a guide.

| Track | Call direction | Call Resource mapping | Track label |
|---|---|---|---|
| inbound_track | Outbound | TO # | Label for "who is being called" in an outbound call from Twilio (e.g., inboundTrackLabel="customer"). |
| outbound_track | Outbound | FROM # | Label for "who is calling" in an outbound call from Twilio (e.g., outboundTrackLabel="agent"). |
| inbound_track | Inbound | FROM # | Label for "who is calling" in an inbound call to Twilio (e.g., inboundTrackLabel="customer"). |
| outbound_track | Inbound | TO # | Label for "who is being called" in an inbound call to Twilio (e.g., outboundTrackLabel="agent"). |

Note: A call that has an "outbound" direction is a call that is outbound from Twilio, i.e., from Twilio to a customer.
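You can also apply the mapping table's Call Resource column in code, for example to choose label values from your own records about the From and To numbers. The function below is a hypothetical sketch. It assumes a Call resource object with direction, from, and to fields. It also assumes Twilio's direction values inbound, outbound-api, and outbound-dial, and treats any value starting with "outbound" as outbound from Twilio.

```javascript
// Hypothetical helper implementing the Call Resource column of the mapping table:
// for each track, return the Call resource number (From or To) it corresponds to.
function partyForTrack(call, track) {
  const outbound = call.direction.startsWith('outbound');
  if (track === 'inbound_track') {
    // Per the table: TO # on outbound calls, FROM # on inbound calls.
    return outbound ? call.to : call.from;
  }
  if (track === 'outbound_track') {
    // Per the table: FROM # on outbound calls, TO # on inbound calls.
    return outbound ? call.from : call.to;
  }
  throw new Error(`Unknown track: ${track}`);
}

// Hypothetical inbound call: a customer dials a Twilio number.
const call = { direction: 'inbound', from: '+15550100001', to: '+15550100002' };
console.log(partyForTrack(call, 'inbound_track'), partyForTrack(call, 'outbound_track'));
```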


Stop a Real-Time Transcription


If you provided a name attribute when starting a Real-Time Transcription session, you can stop a Real-Time Transcription using TwiML or via API.

Given a Real-Time Transcription that was started with the following TwiML instructions:

```xml
<Response>
  <Start>
    <Transcription name="Contact center transcription" />
  </Start>
</Response>
```

You can stop the Real-Time Transcription with the following TwiML instructions:

```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const stop = response.stop();
stop.transcription({name: 'Contact center transcription'});

console.log(response.toString());
```

Output

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Stop>
    <Transcription name="Contact center transcription" />
  </Stop>
</Response>
```

If a name was not provided, you can stop an in-progress Real-Time Transcription via API using the SID of the Transcription. See the RealtimeTranscription resource API reference page for more information.
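The function below sketches the API route by building, but not sending, the HTTP request. It assumes that a RealtimeTranscription resource is updated with a POST to /2010-04-01/Accounts/{AccountSid}/Calls/{CallSid}/Transcriptions/{Sid}.json. It also assumes the request carries Status=stopped with HTTP Basic authentication. Confirm the exact path and parameters on the RealtimeTranscription API reference page before relying on them. `buildStopTranscriptionRequest` is a hypothetical name.

```javascript
// Hypothetical helper: describe the REST request that stops a Real-Time Transcription.
// ASSUMPTION: endpoint path and the Status=stopped parameter; verify against the API reference.
function buildStopTranscriptionRequest({ accountSid, authToken, callSid, transcriptionSid }) {
  const url =
    `https://api.twilio.com/2010-04-01/Accounts/${encodeURIComponent(accountSid)}` +
    `/Calls/${encodeURIComponent(callSid)}/Transcriptions/${encodeURIComponent(transcriptionSid)}.json`;
  // HTTP Basic authentication with the Account SID and Auth Token.
  const credentials = Buffer.from(`${accountSid}:${authToken}`).toString('base64');
  return {
    url,
    method: 'POST',
    headers: {
      Authorization: `Basic ${credentials}`,
      'Content-Type': 'application/x-www-form-urlencoded',
    },
    body: new URLSearchParams({ Status: 'stopped' }).toString(),
  };
}

// Usage sketch (Node 18+ provides a global fetch):
// const { url, ...init } = buildStopTranscriptionRequest({ accountSid, authToken, callSid, transcriptionSid });
// const response = await fetch(url, init);
```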



AI Nutrition Facts

Real-Time Transcriptions, including the <Transcription> TwiML noun and API, uses third-party artificial intelligence and machine learning technologies.

Twilio's AI Nutrition Facts provide an overview of the AI feature you're using, so you can better understand how the AI is working with your data. The AI qualities of Real-Time Transcriptions are outlined in the following Speech to Text Transcriptions - Programmable Voice Nutrition Facts label. For more information and the glossary for the AI Nutrition Facts label, refer to Twilio's AI Nutrition Facts page.

AI Nutrition Facts

Speech to Text Transcriptions - Programmable Voice and Voice Intelligence

Description
Generate speech to text voice transcriptions (real-time and post-call) in Programmable Voice and Voice Intelligence.
Privacy Ladder Level
N/A
Feature is Optional
Yes
Model Type
Generative and Predictive - Automatic Speech Recognition
Base Model
Google Speech-to-Text, Amazon Transcribe

Trust Ingredients

Base Model Trained with Customer Data
No

Voice Intelligence and Programmable Voice only use the default Base Model provided by the Model Vendor. The Base Model is not trained using customer data.

Customer Data is Shared with Model Vendor
No

Voice Intelligence and Programmable Voice only use the default Base Model provided by the Model Vendor. The Base Model is not trained using customer data.

Training Data Anonymized
N/A

Base Model is not trained using any customer data.

Data Deletion
Yes

Transcriptions are deleted by the customer using the Voice Intelligence API or when a customer account is deprovisioned.

Human in the Loop
Yes

The customer views output in the Voice Intelligence API or Transcript Viewer.

Data Retention
Until the customer deletes

Compliance

Logging & Auditing
Yes

The customer can listen to the input (recording) and view the output (transcript).

Guardrails
Yes

The customer can listen to the input (recording) and view the output (transcript).

Input/Output Consistency
Yes

The customer is responsible for human review.

Other Resources
https://www.twilio.com/docs/voice/intelligence


Copyright © 2024 Twilio Inc.