Picking a voice

Learn to configure text-to-speech (TTS) voices for Twilio Conversation Relay by setting the ttsProvider and voice attributes in your TwiML. To do this, pick a provider, select a voice ID from the available inventory, and specify it within your <ConversationRelay> configuration. The same steps apply whether you are building AI agents or AI/ML transcription workflows.

See Related reference documentation to learn more about the <ConversationRelay> and TwiML elements used in this guide.

Info

If your workflow is subject to PCI requirements, note that not all TTS and transcription providers are guaranteed to be PCI compliant. See Twilio's Responsibility Matrix for further information.

SSML support by provider

Support for Speech Synthesis Markup Language (SSML) tags varies by TTS provider in Conversation Relay:

Google and Amazon Polly: These engines support a wide range of SSML tags. To review a complete list of supported tags, see the SSML documentation.
ElevenLabs: Supports only the <phoneme> tag, and only for the en-US language. See the ElevenLabs documentation for details.

Google and Amazon Polly voices

For voices from Google or Amazon (including generative options), refer to our Twilio TTS Voices documentation. Each provider offers a variety of languages and styles, enabling you to tailor your application's voice experience to your specific needs.

How to use Google and Amazon Polly voices

Browse the available voices in the Available voices and languages table. Test them using the Twilio Console to find the one that best fits your application's requirements.
Copy the voice ID from the table (for example, en-US-Wavenet-D).
Configure the <ConversationRelay> noun in TwiML: Set ttsProvider to Google or Amazon and use the copied voice ID in the voice attribute.
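As a rough illustration of the last step, the sketch below generates the TwiML with Python's standard xml.etree.ElementTree module. The function name build_relay_twiml is hypothetical, the WebSocket URL is a placeholder, and the voice ID is the example from the previous step. Substitute your own values.

```python
import xml.etree.ElementTree as ET


def build_relay_twiml(url: str, tts_provider: str, voice: str) -> str:
    """Return TwiML that connects a call to Conversation Relay.

    Hypothetical helper for illustration; it only sets the attributes
    discussed in this guide (url, ttsProvider, voice).
    """
    response = ET.Element("Response")  # TwiML root element
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(
        connect,
        "ConversationRelay",
        {"url": url, "ttsProvider": tts_provider, "voice": voice},
    )
    return ET.tostring(response, encoding="unicode")


# Placeholder URL and the example Google voice ID from the steps above.
twiml = build_relay_twiml(
    "wss://example.com/websocket", "Google", "en-US-Wavenet-D"
)
print(twiml)
```

Your application would return the resulting string as the TwiML response for the call. For an Amazon Polly voice, pass "Amazon" as the provider along with a voice ID copied from the table.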

ElevenLabs voices

ElevenLabs uses the Flash 2.5 model by default for text-to-speech. Use the interface below to search and filter through a wide selection of ElevenLabs voices by language, accent, age, and more. Each voice entry includes a voice ID that you can copy and paste into your <ConversationRelay> configuration.

How to use ElevenLabs voices

Search or filter: Pick a voice using the tool below that matches your requirements.
Copy the voice ID: From the search results, copy the voice ID (for example, NYC9WEgkq1u4jiqBseQ9).
Configure the <ConversationRelay> noun: In your TwiML, set ttsProvider="ElevenLabs" and use the copied voice ID in the voice attribute.
Pick an audio model (optional): The voices from ElevenLabs use the Flash 2.5 model by default. Other models are available and could improve the quality or performance of your application depending on your use case. You can use a different model by appending a hyphen to the voice ID followed by the model ID. The supported model IDs include flash_v2, turbo_v2_5, turbo_v2 and the default, flash_v2_5. Some models only work with a specific set of languages. You can learn about the strengths and the supported languages of each model on the ElevenLabs website.
Customize your ElevenLabs voice (optional): You can adjust the speed and other characteristics of your chosen ElevenLabs voice. To do that, add a hyphen to the end of the voice attribute followed by an underscore-separated string with values for speed, stability, and similarity, in that order. The speed must be a value between 0.7 and 1.2, and the stability and similarity values can range from 0.0 to 1.0.

For example, a voice attribute of XrExE9yKIg1WjnnlVkGX-1.2_0.6_0.8 will set the speed to 1.2, the stability to 0.6, and the similarity to 0.8. See the ElevenLabs documentation to learn more about how these settings affect your application's voice.

Example:

<Connect>
  <ConversationRelay url="wss://example.com/websocket" ttsProvider="ElevenLabs" voice="NYC9WEgkq1u4jiqBseQ9-turbo_v2_5-0.8_0.8_0.6" ... />
</Connect>
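The voice attribute format described in the optional model and customization steps can be sketched as a small Python helper. This is an illustration rather than part of any Twilio library: the function name elevenlabs_voice is hypothetical, the model list is copied from this guide, and treating the stated ranges as inclusive is an assumption.

```python
from typing import Optional, Tuple

# Model IDs listed in this guide; flash_v2_5 is the default.
SUPPORTED_MODELS = {"flash_v2", "flash_v2_5", "turbo_v2", "turbo_v2_5"}


def elevenlabs_voice(
    voice_id: str,
    model: Optional[str] = None,
    settings: Optional[Tuple[float, float, float]] = None,
) -> str:
    """Compose a voice attribute value: voiceId[-model][-speed_stability_similarity].

    Hypothetical helper; range bounds are assumed to be inclusive.
    """
    parts = [voice_id]
    if model is not None:
        if model not in SUPPORTED_MODELS:
            raise ValueError(f"unsupported model ID: {model}")
        parts.append(model)
    if settings is not None:
        speed, stability, similarity = settings
        if not 0.7 <= speed <= 1.2:
            raise ValueError("speed must be between 0.7 and 1.2")
        for name, value in (("stability", stability), ("similarity", similarity)):
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0.0 and 1.0")
        # Settings are joined with underscores, in this fixed order.
        parts.append(f"{speed}_{stability}_{similarity}")
    return "-".join(parts)


# Reproduces the two voice attribute values shown in this guide.
print(elevenlabs_voice("XrExE9yKIg1WjnnlVkGX", settings=(1.2, 0.6, 0.8)))
print(elevenlabs_voice("NYC9WEgkq1u4jiqBseQ9", "turbo_v2_5", (0.8, 0.8, 0.6)))
```

Validating the ranges before building the string makes an out-of-range value fail in your own code instead of surfacing later during a call.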

Default voice settings

If you don't explicitly specify the voice attribute in your <ConversationRelay> configuration, Conversation Relay automatically applies a default voice based on the language setting (as defined by the language or ttsLanguage attribute) and the selected TTS provider (default is ElevenLabs). Below is the complete list of default voice settings:

Language	Voice ID	TTS provider	Speech model	Transcription provider
bg-BG	AB9XsbSA4eLG12t2myjN	ElevenLabs	long	Google
cs-CZ	uYFJyGaibp4N2VwYQshk	ElevenLabs	long	Google
da-DK	ygiXC2Oa1BiHksD3WkJZ	ElevenLabs	long	Google
de-DE	FTNCalFNG5bRnkkaP5Ug	ElevenLabs	telephony	Google
en-AU	9Ft9sm9dzvprPILZmLJl	ElevenLabs	telephony	Google
en-GB	Fahco4VZzobUeiPqni1S	ElevenLabs	telephony	Google
en-IN	mCQMfsqGDT6IDkEKR20a	ElevenLabs	long	Google
en-US	UgBBYS2sOqTuMpoF3BR0	ElevenLabs	telephony	Google
es-ES	6xftrpatV0jGmFHxDjUv	ElevenLabs	telephony	Google
es-US	CaJslL1xziwefCeTNzHv	ElevenLabs	telephony	Google
fi-FI	6xPz2opT0y5qtoRh1U1Y	ElevenLabs	long	Google
fr-CA	IPgYtHTNLjC7Bq7IPHrm	ElevenLabs	telephony	Google
fr-FR	a5n9pJUnAhX4fn7lx3uo	ElevenLabs	telephony	Google
hi-IN	IvLWq57RKibBrqZGpQrC	ElevenLabs	long	Google
hu-HU	TumdjBNWanlT3ysvclWh	ElevenLabs	long	Google
id-ID	1k39YpzqXZn52BgyLyGO	ElevenLabs	long	Google
it-IT	uScy1bXtKz8vPzfdFsFw	ElevenLabs	telephony	Google
ja-JP	3JDquces8E8bkmvbh6Bc	ElevenLabs	telephony	Google
kn-IN	kn-IN-Standard-A	Google	long	Google
ko-KR	uyVNoMrnUku1dZyVEXwD	ElevenLabs	telephony	Google
ml-IN	ml-IN-Standard-A	Google	long	Google
mr-IN	mr-IN-Standard-A	Google	long	Google
nl-BE	s7Z6uboUuE4Nd8Q2nye6	ElevenLabs	telephony	Google
nl-NL	UNBIyLbtFB9k7FKW8wJv	ElevenLabs	telephony	Google
pl-PL	W0sqKm1Sfw1EzlCH14FQ	ElevenLabs	long	Google
pt-BR	CstacWqMhJQlnfLPxRG4	ElevenLabs	telephony	Google
pt-PT	TsZfI8Nbn2Xd7ArC76n9	ElevenLabs	telephony	Google
ro-RO	OlBp4oyr3FBAGEAtJOnU	ElevenLabs	long	Google
ru-RU	AB9XsbSA4eLG12t2myjN	ElevenLabs	long	Google
sv-SE	4xkUqaR9MYOJHoaC1Nak	ElevenLabs	long	Google
ta-IN	ZhJ5LanYnCmLKQUXvsV7	ElevenLabs	long	Google
te-IN	te-IN-Standard-A	Google	long	Google
th-TH	th-TH-Standard-A	Google	long	Google
tr-TR	IuRRIAcbQK5AQk1XevPj	ElevenLabs	long	Google
uk-UA	nCqaTnIbLdME87OuQaZY	ElevenLabs	long	Google
vi-VN	foH7s9fX31wFFH2yqrFa	ElevenLabs	long	Google

Twilio's internal configuration defines these default settings and updates them periodically. Refer to the Twilio TTS Voices documentation for a complete and current list of supported languages, default voices, and detailed settings.
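To rely on a default voice, leave out the voice attribute and set only the language. The following is a minimal sketch using Python's standard library. The function name relay_twiml_with_defaults is hypothetical and the WebSocket URL is a placeholder.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def relay_twiml_with_defaults(
    url: str, language: str, voice: Optional[str] = None
) -> str:
    """Build Conversation Relay TwiML, omitting voice unless one is given.

    Hypothetical helper: when voice is None, no voice attribute is emitted,
    so Conversation Relay picks the default for the language and provider.
    """
    attrs = {"url": url, "language": language}
    if voice is not None:
        attrs["voice"] = voice
    response = ET.Element("Response")  # TwiML root element
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "ConversationRelay", attrs)
    return ET.tostring(response, encoding="unicode")


# No voice and no ttsProvider: both fall back to their defaults.
default_twiml = relay_twiml_with_defaults("wss://example.com/websocket", "es-ES")
print(default_twiml)
```

Based on the table above, a call handled with this TwiML would be expected to use the ElevenLabs voice 6xftrpatV0jGmFHxDjUv for es-ES, although the defaults can change over time.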

Use cases for picking a voice with Conversation Relay

This guide teaches the basics required for the following use cases:

Create AI agents with Twilio Conversation Relay

You can use this guide to select natural-sounding, low-latency voices that enhance the realism of your autonomous AI agents. By choosing specific providers like ElevenLabs, you can tailor the personality and tone of your AI agent to better suit your brand's customer service experience.

To learn more advanced features that you can use with AI agents, see AI agents.

Create transcriptions for AI or ML with Twilio Conversation Relay

You can use this guide to define the voice output for applications that simultaneously capture and transcribe audio for downstream machine learning analysis. Configuring the right TTS engine ensures that your application provides high-quality audio interactions while your system processes real-time data.

To learn more advanced features that you can use with AI or ML transcription, see Voice AI/ML transcription.

Result

After following this guide, you can configure and customize the text-to-speech voice for your Twilio Conversation Relay application. To confirm that it works, make a test call to your application and verify that the relay uses your selected provider and voice ID with the speed, stability, and similarity settings you defined.

Next steps

Explore the following guide to build on what you've learned:

Getting and sending WebSocket messages: Understand how to handle real-time data exchange between Twilio and your application via WebSockets.