Onboarding
Learn to integrate Twilio Conversation Relay by configuring a WebSocket server (wss://) and setting up a TwiML App in the Twilio Console to handle speech recognition and synthesis. You can use this guide to build AI agents and capture AI/ML transcriptions.
See Related reference documentation to learn more about the <ConversationRelay> elements used in this guide.
Complete these prerequisite steps before starting the integration process:
Ensure you have a Twilio account set up and log into the Twilio Console.
Set up a WebSocket server to handle communication with Conversation Relay.
You can set up a WebSocket server using various frameworks and programming languages, such as Node.js with Fastify, Python with FastAPI, or others that support WebSocket protocols.
Note: Ensure your WebSocket server is accessible via a wss:// URL.
Conversation Relay includes an X-Twilio-Signature header in the initial WebSocket handshake request. This signature allows you to validate that requests are genuinely coming from Twilio, following the same verification mechanism used for standard Twilio webhooks.
To validate this signature:
- Extract the X-Twilio-Signature header from the incoming WebSocket connection request.
- Use your Twilio auth token and the request URL to validate this signature.
- Only accept connections with valid signatures to prevent spoofed requests.
For detailed implementation guidance, refer to Twilio's webhook security documentation.
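The validation steps above can be sketched with the Python standard library. This is a minimal sketch, not Twilio's official helper: it assumes the standard webhook scheme (Base64 of an HMAC-SHA1 over the request URL, keyed with the auth token) and assumes the handshake carries no form parameters. The function names compute_signature and is_valid_signature are hypothetical, and the exact URL string Twilio signs for a WebSocket handshake should be confirmed against the webhook security documentation.

```python
import base64
import hashlib
import hmac
from typing import Dict, Optional


def compute_signature(auth_token: str, url: str,
                      params: Optional[Dict[str, str]] = None) -> str:
    """Return Base64(HMAC-SHA1(auth_token, url + sorted form parameters))."""
    payload = url
    form = params or {}
    # Form parameters, if any, are appended as key + value in key order.
    # A WebSocket handshake is a GET request, so this is assumed to be empty.
    for key in sorted(form):
        payload += key + form[key]
    digest = hmac.new(auth_token.encode("utf-8"),
                      payload.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_signature(auth_token: str, url: str,
                       header_value: Optional[str]) -> bool:
    """Accept the connection only if the header matches our own computation."""
    if not header_value:
        return False
    expected = compute_signature(auth_token, url)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected.encode("utf-8"),
                               header_value.encode("utf-8"))
```

In a real server you would call is_valid_signature during the handshake, before accepting the connection, and close the socket when it returns False.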
Once you've completed the prerequisites, follow these steps to integrate your Twilio application with Conversation Relay.
- Log in to your Twilio Console.
- Navigate to the Voice section, select General under Settings, and turn on the Predictive and Generative AI/ML Features Addendum, which is required to use Conversation Relay.
- Navigate to the Voice section and select TwiML Apps under Manage.
- Create a new TwiML App or select an existing one.
- In the TwiML App settings, configure the Voice URL to point to your application's endpoint (for example, https://yourserver.com/voice) where Twilio can retrieve the TwiML containing the <ConversationRelay> instructions.
- Save your changes.
Warning
Ensure your WebSocket server is properly configured to handle incoming messages from Conversation Relay. If not, you may encounter connection errors.
After configuring your TwiML App, update your application to handle Conversation Relay interactions:
- Use the <Connect><ConversationRelay> TwiML noun to route calls to the Conversation Relay service.
- Specify the WebSocket URL and any additional attributes needed for your use case.
Example TwiML:
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect action="https://myhttpserver.com/connect_action">
    <ConversationRelay url="wss://mywebsocketserver.com/websocket" welcomeGreeting="Hi! Ask me anything!" />
  </Connect>
</Response>
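If your Voice URL endpoint builds this TwiML dynamically, one way to do it is with Python's standard XML library. The following is a sketch under assumptions: the function name build_conversation_relay_twiml is hypothetical, and the wss:// check is an illustrative safeguard rather than something Twilio's libraries do for you. Only the elements and attributes shown in the example TwiML are used.

```python
import xml.etree.ElementTree as ET


def build_conversation_relay_twiml(websocket_url: str,
                                   welcome_greeting: str,
                                   action_url: str) -> str:
    """Build the <Response><Connect><ConversationRelay/></Connect></Response> document."""
    if not websocket_url.startswith("wss://"):
        # The guide requires the WebSocket server to be reachable over wss://.
        raise ValueError("websocket_url must start with wss://")

    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect", {"action": action_url})
    ET.SubElement(connect, "ConversationRelay", {
        "url": websocket_url,
        "welcomeGreeting": welcome_greeting,
    })
    # ElementTree escapes attribute values, so greetings with quotes or
    # ampersands remain valid XML.
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>' + body
```

Your HTTP handler would return this string with an XML content type in response to Twilio's request to the Voice URL.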
Once you've integrated Conversation Relay, you can enhance your application with advanced voice capabilities. Consider the following features:
Use Conversation Relay's real-time speech recognition to process caller input and respond dynamically.
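As a rough illustration of processing caller input, the sketch below turns one incoming WebSocket message into one outgoing reply. It is deliberately independent of any WebSocket framework. The message shapes are assumptions to check against the "Getting and sending WebSocket messages" guide: an incoming message with type "prompt" and the transcript in voicePrompt, and an outgoing message with type "text", the reply in token, and a last flag. The function name handle_message and the echo reply are hypothetical placeholders for your own logic.

```python
import json
from typing import Optional


def handle_message(raw: str) -> Optional[str]:
    """Return a JSON reply for a caller prompt, or None for other messages."""
    message = json.loads(raw)
    kind = message.get("type")

    if kind == "prompt":
        # Assumed field: the recognized caller speech as text.
        heard = message.get("voicePrompt", "")
        reply = {
            "type": "text",
            "token": f"You said: {heard}",  # replace with your own response logic
            "last": True,                   # assumed flag: this is the final chunk
        }
        return json.dumps(reply)

    # Other message types (for example, session setup) need no spoken reply here.
    return None
```

Inside your server's receive loop, you would send the returned string back over the same WebSocket whenever it is not None.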
Use Conversation Relay to convert text responses into natural-sounding speech, providing a seamless conversational experience.
For optimal speech quality, apply proper text normalization techniques (such as writing out numbers as words or spelling out abbreviations). See the Conversation Relay documentation for detailed text normalization best practices.
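To make the normalization idea concrete, here is a small sketch that expands a few abbreviations and writes out whole numbers from 0 to 999 as words before text is sent for synthesis. It is an illustration only, not Twilio's recommended implementation: the abbreviation table, the supported number range, and the function names are all assumptions, and decimals or longer numbers are intentionally left untouched.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

# Hypothetical table; extend it with the abbreviations your prompts use.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street", "approx.": "approximately"}

# A run of 1 to 3 digits that is not part of a longer number, word, or decimal.
SMALL_INTEGER = re.compile(r"(?<![\w.,])\d{1,3}(?!\w|[.,]\d)")


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0 to 999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def normalize_for_speech(text: str) -> str:
    """Expand known abbreviations, then spell out small whole numbers."""
    for abbreviation, expansion in ABBREVIATIONS.items():
        text = text.replace(abbreviation, expansion)
    return SMALL_INTEGER.sub(lambda m: number_to_words(int(m.group())), text)
```

For example, normalize_for_speech("Dr. Lee lives at 42 Main St.") returns "Doctor Lee lives at forty-two Main Street".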
Pass custom parameters to Conversation Relay to tailor interactions based on specific use cases.
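One possible shape for custom parameters is sketched below, under two assumptions that you should verify in the <ConversationRelay> reference documentation: that parameters are declared as <Parameter name="..." value="..."/> children of <ConversationRelay>, and that they arrive on your server in a customParameters object inside the initial message of type "setup". The function names and the sample parameter names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET
from typing import Dict


def add_custom_parameters(relay: ET.Element, parameters: Dict[str, object]) -> None:
    """Append one assumed <Parameter> child per entry to a <ConversationRelay> element."""
    for name, value in parameters.items():
        ET.SubElement(relay, "Parameter", {"name": name, "value": str(value)})


def read_custom_parameters(setup_message: str) -> Dict[str, str]:
    """Extract the assumed customParameters object from a setup message."""
    message = json.loads(setup_message)
    if message.get("type") != "setup":
        return {}
    # A missing or null field is treated as "no parameters".
    return dict(message.get("customParameters") or {})


# Usage: attach parameters on the TwiML side.
relay = ET.Element("ConversationRelay", {"url": "wss://mywebsocketserver.com/websocket"})
add_custom_parameters(relay, {"customerTier": "gold", "locale": "en-US"})
```

On the server side, read_custom_parameters lets you branch on values such as customerTier as soon as the session starts.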
This guide teaches the basics required for the following use cases:
You can use this guide to connect real-time speech recognition and text-to-speech synthesis to conversational AI platforms. This enables you to build intelligent, human-like voice agents that understand caller intent and reply dynamically with natural-sounding speech. To learn more advanced features that you can use with AI agents, see Voice AI agents.
You can use this guide to stream live audio bi-directionally over WebSockets to your application server. This allows you to capture raw voice input for immediate speech-to-text processing, sentiment analysis, and downstream machine learning tasks. To learn more advanced features that you can use with AI or ML transcription, see Voice AI/ML transcription.
After following this guide, your Twilio application is integrated with the Conversation Relay service over a secure WebSocket connection. To verify your setup, place a call to the phone number configured for your TwiML App, then check your WebSocket server logs to confirm that the handshake arrives with a valid X-Twilio-Signature header and that real-time audio messaging begins.
Explore the following guides to build on what you've learned:
- Getting and sending WebSocket messages: Understand the specific JSON message schemas exchanged during a live conversation.
- Picking a voice: Customize the text-to-speech engine and language options for your conversational experiences.
- Integrate OpenAI with Twilio Voice Using Conversation Relay
- Building Voice Bots with Twilio's Conversation Relay
- Conversation Relay Application and Architecture for Voice AI Applications Built on AWS