Picking a voice
Learn to configure text-to-speech (TTS) voices for Twilio Conversation Relay by setting the ttsProvider and voice attributes in your TwiML. To do this, pick a provider, select a voice ID from the available inventory, and specify it within your <ConversationRelay> configuration. This guide applies when you build AI agents or AI/ML transcription applications.
See Related reference documentation to learn more about the <ConversationRelay> and TwiML elements used in this guide.
Info
If your workflow is subject to PCI requirements, note that not all TTS and transcription providers are guaranteed to be PCI compliant. See Twilio's Responsibility Matrix for further information.
Support for Speech Synthesis Markup Language (SSML) tags varies by TTS provider in Conversation Relay:
- Google and Amazon Polly: These engines support a wide range of SSML tags. To review a complete list of supported tags, see the SSML documentation.
- ElevenLabs: Only supports the <phoneme> tag, and only for the en-US language. See the ElevenLabs documentation for details.
For voices from Google or Amazon (including generative options), refer to our Twilio TTS Voices documentation. Each provider offers a variety of languages and styles, enabling you to tailor your application's voice experience to your specific needs.
- Browse the available voices in the Available voices and languages table. Test them using the Twilio Console to find the one that best fits your application's requirements.
- Copy the voice ID from the table (for example, en-US-Wavenet-D).
- Configure the <ConversationRelay> noun in TwiML: Set ttsProvider to Google or Amazon and use the copied voice ID in the voice attribute.
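The configuration step above can be sketched in Python. This is a minimal illustration that uses only the standard library to generate the TwiML, not an official Twilio helper: the build_twiml function name is hypothetical, the WebSocket URL is a placeholder, and the enclosing <Response> element is the standard TwiML root.

```python
import xml.etree.ElementTree as ET


def build_twiml(url: str, tts_provider: str, voice: str) -> str:
    """Return a TwiML document that connects the call to Conversation Relay."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(
        connect,
        "ConversationRelay",
        {"url": url, "ttsProvider": tts_provider, "voice": voice},
    )
    return ET.tostring(response, encoding="unicode")


# Placeholder WebSocket URL; the voice ID is the example from the steps above.
twiml = build_twiml("wss://example.com/websocket", "Google", "en-US-Wavenet-D")
print(twiml)
```

To use an Amazon voice instead, pass "Amazon" as the provider along with a voice ID copied from the Amazon section of the voices table.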
ElevenLabs uses the Flash 2.5 model by default for text-to-speech. Use the interface below to search and filter through a wide selection of ElevenLabs voices by language, accent, age, and more. Each voice entry includes a voice ID that you can copy and paste into your <ConversationRelay> configuration.
- Search or filter: Pick a voice using the tool below that matches your requirements.
- Copy the voice ID: From the search results, copy the voice ID (for example, NYC9WEgkq1u4jiqBseQ9).
- Configure the <ConversationRelay> noun: In your TwiML, set ttsProvider="ElevenLabs" and use the copied voice ID in the voice attribute.
- Pick an audio model (optional): The voices from ElevenLabs use the Flash 2.5 model by default. Other models are available and could improve the quality or performance of your application depending on your use case. You can use a different model by appending a hyphen to the voice ID followed by the model ID. The supported model IDs include flash_v2, turbo_v2_5, turbo_v2, and the default, flash_v2_5. Some models only work with a specific set of languages. You can learn about the strengths and the supported languages of each model on the ElevenLabs website.
- Customize your ElevenLabs voice (recommended): You can adjust the speed and other characteristics of your chosen ElevenLabs voice. To do that, add a hyphen to the end of the voice attribute followed by an underscore-separated string with values for speed, stability, and similarity respectively. The speed should be a value between 0.7 and 1.2, and the stability and similarity values can range from 0.0 to 1.0. For example, a voice attribute of XrExE9yKIg1WjnnlVkGX-1.2_0.6_0.8 will set the speed to 1.2, the stability to 0.6, and the similarity to 0.8. See the ElevenLabs documentation to learn more about how these settings affect your application's voice.
Example:
    <Connect>
      <ConversationRelay url="wss://example.com/websocket" ttsProvider="ElevenLabs" voice="NYC9WEgkq1u4jiqBseQ9-turbo_v2_5-0.8_0.8_0.6" ... />
    </Connect>
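The following Python sketch shows one way to assemble an ElevenLabs voice attribute value from the pieces described above. The function name elevenlabs_voice and its parameters are hypothetical, and the sketch assumes that speed, stability, and similarity are always supplied together. The ranges and model IDs come from this guide.

```python
from typing import Optional

# Model IDs listed in this guide; flash_v2_5 is the default.
SUPPORTED_MODELS = {"flash_v2", "flash_v2_5", "turbo_v2", "turbo_v2_5"}


def elevenlabs_voice(
    voice_id: str,
    model: Optional[str] = None,
    speed: Optional[float] = None,
    stability: Optional[float] = None,
    similarity: Optional[float] = None,
) -> str:
    """Build a voice attribute value: voiceId[-model][-speed_stability_similarity]."""
    parts = [voice_id]
    if model is not None:
        if model not in SUPPORTED_MODELS:
            raise ValueError(f"unsupported model ID: {model}")
        parts.append(model)
    settings = (speed, stability, similarity)
    if any(value is not None for value in settings):
        # Assumption: the three settings are provided as a complete set.
        if any(value is None for value in settings):
            raise ValueError("set speed, stability, and similarity together")
        if not 0.7 <= speed <= 1.2:
            raise ValueError("speed must be between 0.7 and 1.2")
        for name, value in (("stability", stability), ("similarity", similarity)):
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0.0 and 1.0")
        parts.append(f"{speed}_{stability}_{similarity}")
    return "-".join(parts)


voice = elevenlabs_voice(
    "NYC9WEgkq1u4jiqBseQ9", model="turbo_v2_5",
    speed=0.8, stability=0.8, similarity=0.6,
)
print(voice)
```

Validating the ranges before building the string is a design choice of this sketch; it surfaces an out-of-range value in your own code rather than at call time.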
If you don't explicitly specify the voice attribute in your <ConversationRelay> configuration, Conversation Relay automatically applies a default voice based on the language setting (as defined by the language or ttsLanguage attribute) and the selected TTS provider (default is ElevenLabs). Below is the complete list of default voice settings:
| Language | Voice ID | TTS provider | Speech model | Transcription provider |
|---|---|---|---|---|
| bg-BG | AB9XsbSA4eLG12t2myjN | ElevenLabs | long | |
| cs-CZ | uYFJyGaibp4N2VwYQshk | ElevenLabs | long | |
| da-DK | ygiXC2Oa1BiHksD3WkJZ | ElevenLabs | long | |
| de-DE | FTNCalFNG5bRnkkaP5Ug | ElevenLabs | telephony | |
| en-AU | 9Ft9sm9dzvprPILZmLJl | ElevenLabs | telephony | |
| en-GB | Fahco4VZzobUeiPqni1S | ElevenLabs | telephony | |
| en-IN | mCQMfsqGDT6IDkEKR20a | ElevenLabs | long | |
| en-US | UgBBYS2sOqTuMpoF3BR0 | ElevenLabs | telephony | |
| es-ES | 6xftrpatV0jGmFHxDjUv | ElevenLabs | telephony | |
| es-US | CaJslL1xziwefCeTNzHv | ElevenLabs | telephony | |
| fi-FI | 6xPz2opT0y5qtoRh1U1Y | ElevenLabs | long | |
| fr-CA | IPgYtHTNLjC7Bq7IPHrm | ElevenLabs | telephony | |
| fr-FR | a5n9pJUnAhX4fn7lx3uo | ElevenLabs | telephony | |
| hi-IN | IvLWq57RKibBrqZGpQrC | ElevenLabs | long | |
| hu-HU | TumdjBNWanlT3ysvclWh | ElevenLabs | long | |
| id-ID | 1k39YpzqXZn52BgyLyGO | ElevenLabs | long | |
| it-IT | uScy1bXtKz8vPzfdFsFw | ElevenLabs | telephony | |
| ja-JP | 3JDquces8E8bkmvbh6Bc | ElevenLabs | telephony | |
| kn-IN | kn-IN-Standard-A | Google | long | |
| ko-KR | uyVNoMrnUku1dZyVEXwD | ElevenLabs | telephony | |
| ml-IN | ml-IN-Standard-A | Google | long | |
| mr-IN | mr-IN-Standard-A | Google | long | |
| nl-BE | s7Z6uboUuE4Nd8Q2nye6 | ElevenLabs | telephony | |
| nl-NL | UNBIyLbtFB9k7FKW8wJv | ElevenLabs | telephony | |
| pl-PL | W0sqKm1Sfw1EzlCH14FQ | ElevenLabs | long | |
| pt-BR | CstacWqMhJQlnfLPxRG4 | ElevenLabs | telephony | |
| pt-PT | TsZfI8Nbn2Xd7ArC76n9 | ElevenLabs | telephony | |
| ro-RO | OlBp4oyr3FBAGEAtJOnU | ElevenLabs | long | |
| ru-RU | AB9XsbSA4eLG12t2myjN | ElevenLabs | long | |
| sv-SE | 4xkUqaR9MYOJHoaC1Nak | ElevenLabs | long | |
| ta-IN | ZhJ5LanYnCmLKQUXvsV7 | ElevenLabs | long | |
| te-IN | te-IN-Standard-A | Google | long | |
| th-TH | th-TH-Standard-A | Google | long | |
| tr-TR | IuRRIAcbQK5AQk1XevPj | ElevenLabs | long | |
| uk-UA | nCqaTnIbLdME87OuQaZY | ElevenLabs | long | |
| vi-VN | foH7s9fX31wFFH2yqrFa | ElevenLabs | long | |
Our internal configuration defines these default settings and updates them periodically. Refer to the Twilio TTS Voices documentation for a complete and current list of supported languages, default voices, and detailed settings.
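To rely on a default voice, leave the voice attribute out of your TwiML. The Python sketch below illustrates this with the standard library only: the relay_twiml function name is hypothetical and the WebSocket URL is a placeholder. With language set to de-DE and no voice attribute, Conversation Relay would apply the de-DE default listed in the table above.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def relay_twiml(url: str, language: str, voice: Optional[str] = None) -> str:
    """Build TwiML for Conversation Relay, omitting voice when none is given."""
    attrs = {"url": url, "language": language}
    if voice is not None:
        # Only an explicit voice overrides the default for the language.
        attrs["voice"] = voice
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "ConversationRelay", attrs)
    return ET.tostring(response, encoding="unicode")


# No voice attribute: the default voice for de-DE applies.
default_twiml = relay_twiml("wss://example.com/websocket", "de-DE")
print(default_twiml)
```

Because the defaults are updated periodically, pin an explicit voice ID if your application depends on a specific voice.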
This guide teaches the basics required for the following use cases:
You can use this guide to select natural-sounding, low-latency voices that enhance the realism of your autonomous AI agents. By choosing specific providers like ElevenLabs, you can tailor the personality and tone of your AI agent to better suit your brand's customer service experience.
To learn more advanced features that you can use with AI agents, see AI agents.
You can use this guide to define the voice output for applications that simultaneously capture and transcribe audio for downstream machine learning analysis. Configuring the right TTS engine ensures that your application provides high-quality audio interactions while your system processes real-time data.
To learn more advanced features that you can use with AI or ML transcription, see Voice AI/ML transcription.
After following this guide, you can successfully configure and customize the text-to-speech voice for your Twilio Conversation Relay application. You can confirm it worked by making a test call to your application and verifying that the relay uses your selected provider and voice ID with the specific speed, stability, and similarity settings you defined.
Explore the following guide to build on what you've learned:
- Getting and sending WebSocket messages: Understand how to handle real-time data exchange between Twilio and your application via WebSockets.