
Media Streams - WebSocket Messages



Support for Twilio Regions

You can use Media Streams in the Ireland (IE1) and Australia (AU1) Regions. To set up Media Streams with these regions, follow the guides for non-US outbound and inbound calls. The default region remains US1.

Learn to handle real-time audio by processing the JSON messages that Twilio and your server exchange over a WebSocket connection. After the connection is established, listen for media messages to receive call audio and, on bidirectional Streams, send media messages to play audio back into the call.

You can use this guide to build inbound contact centers, outbound contact centers, and AI/ML transcription services.

See the related reference documentation to learn more about the <Stream> TwiML noun and the Streams subresource used in this guide.


WebSocket messages from Twilio


Twilio sends the following message types to your WebSocket server during a Stream:

Connected message


Twilio sends the connected event once a WebSocket connection is established. This is the first message your WebSocket server receives, and it describes the protocol to expect in subsequent messages.

| Property | Description |
| --- | --- |
| event | Describes the type of WebSocket message. In this case, connected. |
| protocol | Defines the protocol for the WebSocket connection's lifetime. |
| version | Semantic version of the protocol. |

An example connected message is shown below.

{
  "event": "connected",
  "protocol": "Call",
  "version": "1.0.0"
}
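Every message Twilio sends carries an event field, so a common way to structure a server is to parse each WebSocket text frame as JSON and route on that field. The following Python sketch is framework-agnostic: `dispatch`, `HANDLERS`, and `on_connected` are hypothetical names, not part of any Twilio library, and the protocol check assumes you only expect the "Call" protocol shown in the example.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical handler type: receives one parsed message as a dict.
Handler = Callable[[Dict[str, Any]], None]


def on_connected(msg: Dict[str, Any]) -> None:
    # The connected message is the first frame; it only describes the protocol.
    if msg.get("protocol") != "Call":
        raise ValueError(f"unexpected protocol: {msg.get('protocol')!r}")
    print(f"connected: protocol={msg['protocol']} version={msg.get('version')}")


# Register more handlers (start, media, stop, dtmf, mark) as your server needs them.
HANDLERS: Dict[str, Handler] = {
    "connected": on_connected,
}


def dispatch(raw: str) -> str:
    """Parse one WebSocket text frame and route it by its "event" field.

    Returns the event name so callers can log or test it.
    """
    msg = json.loads(raw)
    event = msg.get("event", "")
    handler = HANDLERS.get(event)
    if handler is not None:
        handler(msg)
    # Events without a registered handler are ignored rather than treated as errors.
    return event
```

In a real server you would call `dispatch` from your WebSocket library's receive loop for every text frame.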

Start message

The start message contains metadata about the Stream and is sent immediately after the connected message. Twilio sends it only once, at the start of the Stream.

| Property | Description |
| --- | --- |
| event | Describes the type of WebSocket message. In this case, start. |
| sequenceNumber | Number used to keep track of message sending order. The first message has a value of 1 and then is incremented for each subsequent message. |
| start | An object containing Stream metadata. |
| start.streamSid | The unique identifier of the Stream. |
| start.accountSid | The SID of the Account that created the Stream. |
| start.callSid | The SID of the Call that started the Stream. |
| start.tracks | An array of strings that indicate which media flows are expected in subsequent messages. Values include inbound and outbound. |
| start.customParameters | An object containing the custom parameters that were set when defining the Stream. |
| start.mediaFormat | An object containing the format of the payload in the media messages. |
| start.mediaFormat.encoding | The encoding of the data in the upcoming payload. Value is always audio/x-mulaw. |
| start.mediaFormat.sampleRate | The sample rate in hertz of the upcoming audio data. Value is always 8000. |
| start.mediaFormat.channels | The number of channels in the input audio data. Value is always 1. |
| streamSid | The unique identifier of the Stream. |

The start.customParameters object is populated with the values you provided when starting the stream. See the <Stream> TwiML doc or the Stream resource API reference doc for more info.

An example start message is shown below.

{
  "event": "start",
  "sequenceNumber": "1",
  "start": {
    "accountSid": "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    "streamSid": "MZXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    "callSid": "CAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    "tracks": [ "inbound" ],
    "mediaFormat": {
      "encoding": "audio/x-mulaw",
      "sampleRate": 8000,
      "channels": 1
    },
    "customParameters": {
      "FirstName": "Jane",
      "LastName": "Doe",
      "RemoteParty": "Bob"
    }
  },
  "streamSid": "MZXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
}
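Because the start message arrives only once, it is a natural place to capture the identifiers and custom parameters that later processing depends on. This Python sketch uses a hypothetical `StreamStart` dataclass and `parse_start` function to copy the fields most servers need. It rejects any mediaFormat other than the fixed values documented in the table, so that an unexpected format fails loudly instead of producing garbled audio.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class StreamStart:
    """Hypothetical subset of the start message that a server typically keeps."""
    stream_sid: str
    call_sid: str
    account_sid: str
    tracks: List[str]
    custom_parameters: Dict[str, str] = field(default_factory=dict)


def parse_start(raw: str) -> StreamStart:
    msg = json.loads(raw)
    if msg.get("event") != "start":
        raise ValueError("not a start message")
    start = msg["start"]
    fmt = start["mediaFormat"]
    # The documented values are fixed; treat anything else as an error.
    if (fmt["encoding"], fmt["sampleRate"], fmt["channels"]) != ("audio/x-mulaw", 8000, 1):
        raise ValueError(f"unexpected media format: {fmt}")
    return StreamStart(
        stream_sid=start["streamSid"],
        call_sid=start["callSid"],
        account_sid=start["accountSid"],
        tracks=list(start["tracks"]),
        custom_parameters=dict(start.get("customParameters", {})),
    )
```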

Media message

The media message contains the raw audio data.

| Property | Description |
| --- | --- |
| event | Describes the type of WebSocket message. In this case, media. |
| sequenceNumber | Number used to keep track of message sending order. The first message has a value of 1 and then is incremented for each subsequent message. |
| media | An object containing media metadata and payload. |
| media.track | One of inbound or outbound. |
| media.chunk | The chunk for the message. The first message begins with 1, and the value increments with each subsequent message. |
| media.timestamp | Presentation timestamp, in milliseconds, from the start of the Stream. |
| media.payload | Raw audio, encoded in base64. |
| streamSid | The unique identifier of the Stream. |

An example outbound media message is shown below. The payload value is abbreviated.

{
  "event": "media",
  "sequenceNumber": "3",
  "media": {
    "track": "outbound",
    "chunk": "1",
    "timestamp": "5",
    "payload": "no+JhoaJjpz..."
  },
  "streamSid": "MZXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
}

An example inbound media message is shown below. The payload value is abbreviated.

{
  "event": "media",
  "sequenceNumber": "4",
  "media": {
    "track": "inbound",
    "chunk": "2",
    "timestamp": "5",
    "payload": "no+JhoaJjpzS..."
  },
  "streamSid": "MZXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
}
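Each media.payload decodes to 8-bit G.711 mu-law samples at 8000 Hz, one byte per sample, so a 160-byte payload holds 20 ms of audio. Many speech tools expect 16-bit linear PCM instead. Python's `audioop.ulaw2lin` used to handle this conversion but was removed in Python 3.13, so this sketch implements the standard G.711 mu-law expansion directly. The function names are hypothetical.

```python
import base64
from typing import List

MULAW_BIAS = 0x84  # 132, the bias used by G.711 mu-law


def mulaw_byte_to_pcm16(u: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~u & 0xFF                 # mu-law bytes are stored bit-inverted
    sign = u & 0x80               # top bit: sign
    exponent = (u >> 4) & 0x07    # next 3 bits: segment
    mantissa = u & 0x0F           # low 4 bits: step within the segment
    magnitude = (((mantissa << 3) + MULAW_BIAS) << exponent) - MULAW_BIAS
    return -magnitude if sign else magnitude


def decode_media_payload(payload_b64: str) -> List[int]:
    """Turn a media.payload string into PCM16 samples (8000 Hz, mono)."""
    return [mulaw_byte_to_pcm16(b) for b in base64.b64decode(payload_b64)]


def payload_duration_ms(payload_b64: str) -> float:
    # One mu-law byte per sample at 8000 samples per second.
    return len(base64.b64decode(payload_b64)) * 1000 / 8000
```

Decoding happens per byte, so chunks can be processed independently as they arrive.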

Stop message

Twilio sends a stop message when the Stream has stopped or the call has ended.

For unidirectional Streams, a Stream can be stopped with the <Stop> TwiML instruction or by updating the Stream resource's status to stopped.

For bidirectional Streams, the only way to stop a Stream is to end the call.

| Property | Description |
| --- | --- |
| event | Describes the type of WebSocket message. In this case, stop. |
| sequenceNumber | Number used to keep track of message sending order. The first message has a value of 1 and then is incremented for each subsequent message. |
| stop | An object containing Stream metadata. |
| stop.accountSid | The SID of the Account that created the Stream. |
| stop.callSid | The SID of the Call that started the Stream. |
| streamSid | The unique identifier of the Stream. |

An example stop message is shown below.

{
  "event": "stop",
  "sequenceNumber": "5",
  "stop": {
    "accountSid": "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    "callSid": "CAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
  },
  "streamSid": "MZXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
}
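A stop message marks the end of a Stream, so it is a natural point to flush per-Stream state. The hypothetical `StreamSessions` class in this Python sketch buffers decoded inbound audio keyed by streamSid and returns the collected bytes when the stop message arrives. It ignores outbound audio by choice and keeps everything in memory, which is only reasonable for short calls.

```python
import base64
import json
from typing import Dict, Optional


class StreamSessions:
    """Hypothetical per-Stream buffer: collects inbound audio, finalizes on stop."""

    def __init__(self) -> None:
        self._audio: Dict[str, bytearray] = {}

    def handle(self, raw: str) -> Optional[bytes]:
        msg = json.loads(raw)
        event = msg.get("event")
        sid = msg.get("streamSid")
        if event == "start":
            self._audio[sid] = bytearray()
        elif event == "media" and msg["media"]["track"] == "inbound":
            self._audio.setdefault(sid, bytearray()).extend(
                base64.b64decode(msg["media"]["payload"])
            )
        elif event == "stop":
            # The Stream is over: release its buffer and hand back the raw mu-law bytes.
            return bytes(self._audio.pop(sid, bytearray()))
        return None
```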

DTMF message

The dtmf message is currently only supported in bidirectional Streams.

A dtmf message is sent when someone presses a touch-tone number key in the inbound stream, typically in response to a prompt in the outbound stream.

| Property | Description |
| --- | --- |
| event | Describes the type of WebSocket message. In this case, dtmf. |
| streamSid | The unique identifier of the Stream. |
| sequenceNumber | Number used to keep track of message sending order. The first message has a value of 1 and then is incremented for each subsequent message. |
| dtmf.track | The track on which the DTMF key was pressed. Value is always inbound_track. |
| dtmf.digit | The number-key tone detected. |

An example dtmf message is shown below. The dtmf.digit value is 1, indicating that someone pressed the 1 key on their handset.

{
  "event": "dtmf",
  "streamSid": "MZXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "sequenceNumber": "5",
  "dtmf": {
    "track": "inbound_track",
    "digit": "1"
  }
}
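dtmf messages arrive one key press at a time, so collecting a multi-digit entry such as an account number is up to your server. This hypothetical `DigitCollector` buffers digits until a terminator key is pressed. It assumes that the # key is reported as a dtmf.digit value of "#", which you should confirm against real traffic before relying on it.

```python
import json
from typing import List, Optional


class DigitCollector:
    """Hypothetical helper: buffers dtmf digits until a terminator key arrives."""

    def __init__(self, terminator: str = "#") -> None:
        self.terminator = terminator  # assumed digit value for the # key
        self._digits: List[str] = []

    def handle(self, raw: str) -> Optional[str]:
        """Return the completed entry when the terminator is pressed, else None."""
        msg = json.loads(raw)
        if msg.get("event") != "dtmf":
            return None
        digit = msg["dtmf"]["digit"]
        if digit == self.terminator:
            entry, self._digits = "".join(self._digits), []
            return entry
        self._digits.append(digit)
        return None
```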

Mark message

Twilio sends the mark event only during bidirectional Streams.

When your server sends a media message, it should then send a mark message to Twilio. When that media message's playback is complete, Twilio sends a mark message to your server using the same mark.name as the one your server sent. Your application can use this information to keep track of which media has played on the Call.

If your server sends a clear message, Twilio empties the audio buffer and sends back mark messages matching any remaining mark messages from your server. Your application can use this information to keep track of which media messages have been cleared and will not be played.

| Property | Description |
| --- | --- |
| event | Describes the type of WebSocket message. In this case, mark. |
| streamSid | The unique identifier of the Stream. |
| sequenceNumber | Number used to keep track of message sending order. The first message has a value of 1 and then is incremented for each subsequent message. |
| mark | An object containing the mark metadata. |
| mark.name | The custom name your server specified in the mark message it sent. Twilio echoes it back unchanged. |

An example mark message is shown below.

{
  "event": "mark",
  "sequenceNumber": "4",
  "streamSid": "MZXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "mark": {
    "name": "my label"
  }
}
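The mark round trip described in this section lets a server tell played audio from cleared audio. The hypothetical `MarkTracker` in this Python sketch records the marks you send, notes which ones were still pending when you sent a clear message, and files each echoed mark as played or cleared. It assumes mark names are unique, and it treats every mark pending at the time of a clear as cleared, which can misclassify a mark whose audio finished just before Twilio processed the clear.

```python
from typing import List, Set


class MarkTracker:
    """Hypothetical bookkeeping for marks sent to, and echoed back by, Twilio."""

    def __init__(self) -> None:
        self.pending: Set[str] = set()   # sent, not yet echoed back
        self.clearing: Set[str] = set()  # pending at the moment a clear was sent
        self.played: List[str] = []
        self.cleared: List[str] = []

    def sent_mark(self, name: str) -> None:
        self.pending.add(name)

    def sent_clear(self) -> None:
        # Everything still pending is expected to come back without having played.
        self.clearing |= self.pending

    def received_mark(self, name: str) -> None:
        self.pending.discard(name)
        if name in self.clearing:
            self.clearing.discard(name)
            self.cleared.append(name)
        else:
            self.played.append(name)
```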

Send WebSocket messages to Twilio


If you initiated a Stream using <Connect><Stream>, your Stream is bidirectional. This means you can send WebSocket messages back to Twilio, allowing you to pipe audio back into the Call and control the flow of the Stream.

The messages that your WebSocket server can send back to Twilio are media, mark, and clear.

Media message

To send media back to Twilio, you must provide a properly formatted media message.

The payload must be audio/x-mulaw audio with a sample rate of 8000 Hz, encoded in base64. The audio in a single message can be of any size.

The media messages are buffered and played in the order received. If you need to interrupt the buffered audio, send a clear message.


Warning

The media.payload should not contain audio file header bytes, such as a WAV header. Including header bytes causes the media to be streamed incorrectly.

| Property | Description |
| --- | --- |
| event | Describes the type of WebSocket message. In this case, media. |
| streamSid | The SID of the Stream that should play the audio. |
| media | An object containing the media payload. |
| media.payload | Raw mulaw/8000 audio encoded in base64. |

Below is an example media message that your WebSocket server sends back to Twilio. The media.payload is abbreviated.

{
  "event": "media",
  "streamSid": "MZXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "media": {
    "payload": "a3242sa..."
  }
}
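A media message sent to Twilio needs only event, streamSid, and media.payload. This Python sketch wraps raw mulaw/8000 bytes in that shape and, as a guard against the header problem described in the warning, rejects data that starts with a WAV "RIFF" header. The function name is hypothetical, the check does not detect other container formats, and the sketch assumes your audio has already been converted to mulaw/8000.

```python
import base64
import json


def build_media_message(stream_sid: str, mulaw_audio: bytes) -> str:
    """Wrap raw mulaw/8000 bytes in a media message for a bidirectional Stream."""
    # WAV files begin with b"RIFF"; header bytes must never be sent as audio.
    if mulaw_audio[:4] == b"RIFF":
        raise ValueError("strip the WAV header and send only raw mulaw samples")
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(mulaw_audio).decode("ascii")},
    })
```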

Mark message

Send a mark message after a media message to be notified when the audio you sent has finished playing. Twilio sends back a mark message with a matching name when that audio ends (or when there is no audio buffered).

Your application also receives an incoming mark event message if the buffer was cleared using the clear event message.

| Property | Description |
| --- | --- |
| event | Describes the type of WebSocket message. In this case, mark. |
| streamSid | The SID of the Stream that should receive the mark. |
| mark | An object containing the mark metadata. |
| mark.name | A custom name that helps you recognize the mark message Twilio sends back. |

Below is an example mark message that your WebSocket server sends to Twilio.

{
  "event": "mark",
  "streamSid": "MZXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "mark": {
    "name": "my label"
  }
}
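Because Twilio plays buffered media in order and echoes a mark once the audio before it has played, a straightforward pattern is to send each clip followed immediately by its mark. In this Python sketch, `play_with_mark` and its `clip-N` naming scheme are hypothetical, and `send` stands in for whatever synchronous send function your WebSocket library provides (an async library would need `await`).

```python
import base64
import itertools
import json
from typing import Callable

_counter = itertools.count(1)  # source of unique, hypothetical mark names


def play_with_mark(send: Callable[[str], None], stream_sid: str, mulaw_audio: bytes) -> str:
    """Send audio followed by a mark; return the mark name to wait for."""
    name = f"clip-{next(_counter)}"
    send(json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(mulaw_audio).decode("ascii")},
    }))
    # Sent after the media, so Twilio echoes it once that audio has played.
    send(json.dumps({"event": "mark", "streamSid": stream_sid, "mark": {"name": name}}))
    return name
```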

Clear message

Send a clear message to interrupt audio that you have already sent in one or more media messages. This empties all buffered audio and causes Twilio to send any pending mark messages back to your WebSocket server.

| Property | Description |
| --- | --- |
| event | Describes the type of WebSocket message. In this case, clear. |
| streamSid | The SID of the Stream in which you want to interrupt the audio. |

Below is an example clear message that your WebSocket server sends to Twilio.

1
{
2
"event": "clear",
3
"streamSid": "MZXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
4
}
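A common use for clear is barge-in: stopping a prompt as soon as the caller responds. This Python sketch uses a dtmf message as the trigger because both dtmf and clear are bidirectional-Stream features. The function names and the choice of trigger are illustrative assumptions, and `send` is a stand-in for your WebSocket library's send function.

```python
import json
from typing import Callable, Optional


def clear_message(stream_sid: str) -> str:
    return json.dumps({"event": "clear", "streamSid": stream_sid})


def barge_in_on_dtmf(raw: str, send: Callable[[str], None]) -> Optional[str]:
    """If the caller pressed a key, drop any buffered playback immediately.

    Returns the digit that triggered the clear, or None for other messages.
    """
    msg = json.loads(raw)
    if msg.get("event") != "dtmf":
        return None
    send(clear_message(msg["streamSid"]))
    return msg["dtmf"]["digit"]
```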

Use cases for Media Streams with Twilio Programmable Voice


This guide teaches the basics required for the following use cases:

Create an inbound contact center with Twilio Programmable Voice


You can use this guide to stream audio from inbound calls to your server in real-time. This allows you to integrate with external tools for agent assistance or live call monitoring.

To learn more advanced features that you can use with inbound contact centers, see Voice inbound contact center.

Create an outbound call center with Twilio Programmable Voice


You can use this guide to stream audio from outbound calls to your server, enabling real-time analysis of agent-customer interactions for quality assurance or automated coaching.

To learn more advanced features that you can use with outbound call centers, see Voice outbound contact center.

Create transcriptions for AI or ML with Twilio Programmable Voice


You can use this guide to capture raw audio chunks over a WebSocket for immediate processing by AI models. This is essential for building real-time transcription services or sentiment analysis engines.

To learn more advanced features that you can use with AI or ML transcription, see Voice AI/ML transcription.


After following this guide, you can send and receive real-time audio messages between Twilio and your WebSocket server. To verify this, log the connected and start messages when the Stream begins, and confirm that media messages containing base64-encoded audio payloads arrive.
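One way to automate that check is to capture the raw text frames your server receives and validate them offline. The hypothetical `verify_stream_log` function in this Python sketch confirms that the first two messages are connected and start, checks that every media payload is valid base64, and returns the number of media messages it found.

```python
import base64
import binascii
import json
from typing import Iterable


def verify_stream_log(frames: Iterable[str]) -> int:
    """Validate captured Twilio frames and return how many media messages they contain."""
    events = [json.loads(frame) for frame in frames]
    if not events or events[0].get("event") != "connected":
        raise ValueError("the first message should be connected")
    if len(events) < 2 or events[1].get("event") != "start":
        raise ValueError("the second message should be start")
    media = [e for e in events if e.get("event") == "media"]
    for message in media:
        try:
            # validate=True rejects characters outside the base64 alphabet.
            base64.b64decode(message["media"]["payload"], validate=True)
        except binascii.Error as exc:
            raise ValueError(f"media payload is not valid base64: {exc}") from exc
    return len(media)
```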


Explore the following guides to build on what you've learned in this guide: