Consume a real-time Media Stream using WebSockets, Python, and Flask


Learn to access and process real-time voice audio from a live Twilio call. You'll establish a WebSocket connection to a Flask server using the <Stream> TwiML verb. You can use this guide to create inbound contact centers and AI/ML transcription.

See Related reference documentation to learn more about the <Stream> and <Start> elements used in this guide.


Meet Media Streams

With Twilio's Media Streams, you can access real-time voice data from a Twilio call. Media Streams will stream the audio from the call for its entire duration to a location of your choice.

In this tutorial, you will learn how to stream audio from a live phone call using Twilio, Python, and Flask. You might want to stream audio to provide real-time sentiment analysis for all calls happening within a call center. While we will dial a specific number in this tutorial, you can imagine this number being populated dynamically from the call center software.

Info

Want to see the Flask portion of this project in its entirety? Head over to the GitHub repository, where you can clone the project and run it locally.


Twilio Media Streams uses WebSockets to deliver your audio.

A WebSocket connection starts as an HTTP request that is then upgraded to the WebSocket protocol. WebSockets are intended for long-running connections and are ideal for real-time applications. A handshake is made, a connection is created, and, unlike HTTP, multiple messages are expected to be sent over the socket until it is closed. This removes the need for long polling.
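To make the handshake a little more concrete, here is a minimal sketch of how a server derives the Sec-WebSocket-Accept response header from the client's Sec-WebSocket-Key, as described in RFC 6455. In this tutorial the WebSocket handler does this for you, so the function name `compute_accept_key` is only an illustration and not part of Flask-Sockets.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 defines for computing the accept token.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept_key(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    combined = (sec_websocket_key + WEBSOCKET_GUID).encode("ascii")
    digest = hashlib.sha1(combined).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key taken from the worked example in RFC 6455.
print(compute_accept_key("dGhlIHNhbXBsZSBub25jZQ=="))
```

Once the server replies with this value, both sides stop speaking HTTP on that connection and exchange WebSocket messages instead.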

The WebSocket interface is included natively in nearly all client-side web browser implementations.

There are numerous WebSocket server implementations available for just about every web framework. We'll use Flask-Sockets in this tutorial.


Set up your Python environment

In this tutorial, we're going to use the web framework Flask and the WebSocket package Flask-Sockets. Create a virtual environment and install flask and flask-sockets in your terminal:

python3 -m venv venv
source ./venv/bin/activate
pip install flask flask-sockets

Now that the package is installed, we can spin up a Flask web server.


Build your WebSocket server

The sockets decorator helps you create a WebSocket route with @sockets.route.

Create a @sockets decorator

This allows you to respond to named WebSocket paths (e.g., /media):

import base64
import json
import logging

from flask import Flask
from flask_sockets import Sockets

app = Flask(__name__)
sockets = Sockets(app)

HTTP_SERVER_PORT = 5000


@sockets.route('/media')
def echo(ws):
    app.logger.info("Connection accepted")
    # A lot of messages will be sent rapidly. We'll stop showing after the first one.
    has_seen_media = False
    message_count = 0
    while not ws.closed:
        message = ws.receive()
        if message is None:
            app.logger.info("No message received...")
            continue

        # Messages are a JSON encoded string
        data = json.loads(message)

        # Using the event type you can determine what type of message you are receiving
        if data['event'] == "connected":
            app.logger.info("Connected Message received: {}".format(message))
        if data['event'] == "start":
            app.logger.info("Start Message received: {}".format(message))
        if data['event'] == "media":
            if not has_seen_media:
                app.logger.info("Media message: {}".format(message))
                payload = data['media']['payload']
                app.logger.info("Payload is: {}".format(payload))
                chunk = base64.b64decode(payload)
                app.logger.info("That's {} bytes".format(len(chunk)))
                app.logger.info("Additional media messages from WebSocket are being suppressed....")
                has_seen_media = True
        # The final message of a stream carries the "stop" event
        if data['event'] == "stop":
            app.logger.info("Stop Message received: {}".format(message))
            break
        message_count += 1

    app.logger.info("Connection closed. Received a total of {} messages".format(message_count))


if __name__ == '__main__':
    app.logger.setLevel(logging.DEBUG)
    from gevent import pywsgi
    from geventwebsocket.handler import WebSocketHandler

    server = pywsgi.WSGIServer(('', HTTP_SERVER_PORT), app, handler_class=WebSocketHandler)
    print("Server listening on: http://localhost:" + str(HTTP_SERVER_PORT))
    server.serve_forever()

Flask-Sockets relies on gevent for concurrency, so this server startup looks a little more detailed than a typical Flask server setup.

Start your server using a WebSocket handler

if __name__ == '__main__':
    app.logger.setLevel(logging.DEBUG)
    from gevent import pywsgi
    from geventwebsocket.handler import WebSocketHandler

    server = pywsgi.WSGIServer(('', HTTP_SERVER_PORT), app, handler_class=WebSocketHandler)
    print("Server listening on: http://localhost:" + str(HTTP_SERVER_PORT))
    server.serve_forever()

A typical pattern in most WebSocket server implementations is to continue reading until the WebSocket connection closes:

Read from the connected WebSocket until it is closed

while not ws.closed:
    message = ws.receive()
    if message is None:
        app.logger.info("No message received...")
        continue

All messages that are passed over Media Streams WebSockets are in JSON format.

Python provides a straightforward way to decode JSON:

# Messages are a JSON encoded string
data = json.loads(message)

There are four different message types that you will encounter:

  • connected
  • start
  • media
  • stop

The Start message will contain important information about the stream, like the type of audio, its name, the originating call and any other custom parameters you might have sent.

This information will likely come in handy for whatever service you plan to use with your real-time audio.
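As a rough illustration, the sketch below pulls a few of those details out of a start message. The field names (`streamSid`, `callSid`, `tracks`, `mediaFormat`, `customParameters`) follow the message shape described in Twilio's Media Streams reference as best we understand it, and every value in the sample message is made up, so compare it against the start messages your own server logs. The helper `summarize_start` is hypothetical.

```python
import json

# Hypothetical start message: all SIDs and the custom parameter are placeholders.
raw_message = json.dumps({
    "event": "start",
    "sequenceNumber": "1",
    "streamSid": "MZ_example",
    "start": {
        "streamSid": "MZ_example",
        "accountSid": "AC_example",
        "callSid": "CA_example",
        "tracks": ["inbound"],
        "mediaFormat": {"encoding": "audio/x-mulaw", "sampleRate": 8000, "channels": 1},
        "customParameters": {"agent": "example"},
    },
})


def summarize_start(message: str) -> dict:
    """Extract the stream details a downstream audio service is likely to need."""
    data = json.loads(message)
    if data.get("event") != "start":
        raise ValueError("not a start message")
    start = data["start"]
    media_format = start.get("mediaFormat", {})
    return {
        "stream_sid": start.get("streamSid"),
        "call_sid": start.get("callSid"),
        "tracks": start.get("tracks", []),
        "encoding": media_format.get("encoding"),
        "sample_rate": media_format.get("sampleRate"),
        "custom_parameters": start.get("customParameters", {}),
    }


print(summarize_start(raw_message))
```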

You can handle each type by looking at the message's event property.

Handle each message type by using the event property

# Inside the while loop, after decoding the message:
if data['event'] == "connected":
    app.logger.info("Connected Message received: {}".format(message))
if data['event'] == "start":
    app.logger.info("Start Message received: {}".format(message))
if data['event'] == "media":
    if not has_seen_media:
        app.logger.info("Media message: {}".format(message))
        has_seen_media = True
if data['event'] == "stop":
    app.logger.info("Stop Message received: {}".format(message))
    break
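One way to try out this event handling without placing a call is to move it into a plain function and feed it hand-written messages. The sketch below is only an illustration: `handle_message` and the `state` dictionary are hypothetical names, the sample messages contain only the fields the function reads, and unlike the server above it counts the stop message too.

```python
import base64
import json


def handle_message(message: str, state: dict) -> bool:
    """Process one raw message; return False once the stream has stopped."""
    data = json.loads(message)
    event = data.get("event")
    state["message_count"] = state.get("message_count", 0) + 1
    if event == "media":
        # Keep a running total of decoded audio bytes.
        chunk = base64.b64decode(data["media"]["payload"])
        state["audio_bytes"] = state.get("audio_bytes", 0) + len(chunk)
    elif event == "stop":
        return False
    return True


# Simulated message sequence with a made-up 160-byte audio payload.
fake_payload = base64.b64encode(b"\xff" * 160).decode("ascii")
messages = [
    json.dumps({"event": "connected"}),
    json.dumps({"event": "start"}),
    json.dumps({"event": "media", "media": {"payload": fake_payload}}),
    json.dumps({"event": "stop"}),
]

state = {}
for raw in messages:
    if not handle_message(raw, state):
        break

print(state)
```

Inside the WebSocket route, the loop body would then reduce to calling this function with each received message and breaking when it returns False.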

The media payload is encoded in base64. Use the b64decode function from Python's standard base64 module to decode it to bytes.

Decode the base64 encoded payload

payload = data['media']['payload']
chunk = base64.b64decode(payload)
app.logger.info("That's {} bytes".format(len(chunk)))
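The decoded bytes are still compressed audio. Assuming the stream uses 8-bit mu-law encoding (check the media format reported in your start message before relying on this), a sketch like the following expands each byte into a 16-bit PCM sample using the standard G.711 mu-law decoding steps. The helper names `mulaw_to_pcm16` and `decode_payload` are illustrative only.

```python
import base64
import struct


def mulaw_to_pcm16(byte_value: int) -> int:
    """Expand one G.711 mu-law byte into a signed 16-bit PCM sample."""
    inverted = ~byte_value & 0xFF          # mu-law bytes are stored bit-inverted
    sign = inverted & 0x80
    exponent = (inverted >> 4) & 0x07
    mantissa = inverted & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def decode_payload(payload: str) -> bytes:
    """Base64-decode a media payload and return little-endian 16-bit PCM bytes."""
    chunk = base64.b64decode(payload)
    samples = [mulaw_to_pcm16(b) for b in chunk]
    return struct.pack("<{}h".format(len(samples)), *samples)


# "/wCA" is the base64 encoding of the three bytes 0xFF, 0x00, 0x80.
pcm = decode_payload("/wCA")
samples = struct.unpack("<3h", pcm)
print(samples)
```

Many audio tools and speech services expect linear PCM, so a conversion step like this is a common first stage before further processing.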

Once your code is all in place, start your Flask server by running this command in your terminal:

python app.py

Now your server should be running on your localhost port 5000. Congratulations! Only one thing left to do here: make sure that Twilio can reach your local web server.

We recommend a tunneling service like ngrok, which supports the wss scheme. Install ngrok if you haven't already.

Since our server is running on port 5000, we'll start a tunnel using:

ngrok http 5000

This will generate a random ngrok subdomain. Copy that URL - you'll need it in the next section.
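ngrok prints an https forwarding URL, while the stream needs a wss URL that points at the /media route. A small sketch of that conversion, assuming the tunnel serves HTTPS on the copied host (the function name `to_stream_url` and the sample host are placeholders):

```python
from urllib.parse import urlsplit, urlunsplit


def to_stream_url(public_url: str, path: str = "/media") -> str:
    """Turn a public https URL into the wss URL for the WebSocket route."""
    parts = urlsplit(public_url)
    if parts.scheme not in ("https", "wss"):
        raise ValueError("expected a secure URL (https or wss)")
    # Keep the host, swap the scheme, and point at the WebSocket path.
    return urlunsplit(("wss", parts.netloc, path, "", ""))


print(to_stream_url("https://yourdomain.ngrok.io"))
```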


To begin streaming your call's audio with Twilio, you can use the <Stream> TwiML verb.

Create a new TwiML Bin in Twilio Console or the legacy Console with the following TwiML:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Stream url="wss://yourdomain.ngrok.io/media" />
  </Start>
  <Dial>+15550123456</Dial>
</Response>

You'll need to update the above sample in two key ways:

  1. Replace the phone number nested in the <Dial> tag with your personal phone number, or the number of a friend or family member who can help you see this in action.
  2. Replace the Stream url with your new ngrok subdomain - you can find this in the terminal if ngrok is running. The url attribute must use the wss scheme (WebSocket Secure), which ngrok supports.

The <Start> tag will asynchronously fork your media and immediately continue onto the next TwiML statement. Streaming will continue for the entire duration of the call unless <Stop><Stream> is encountered.
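If you would rather generate this TwiML from code than paste it into a TwiML Bin, the same document can be assembled with Python's standard library. This is a minimal sketch and not Twilio's helper library; `build_stream_twiml` is a hypothetical function, and the URL and phone number are the placeholders from the sample above.

```python
import xml.etree.ElementTree as ET


def build_stream_twiml(stream_url: str, dial_number: str) -> str:
    """Build a <Response> that forks call audio to stream_url, then dials a number."""
    if not stream_url.startswith("wss://"):
        raise ValueError("stream_url must use the wss scheme")
    response = ET.Element("Response")
    start = ET.SubElement(response, "Start")
    ET.SubElement(start, "Stream", {"url": stream_url})
    dial = ET.SubElement(response, "Dial")
    dial.text = dial_number
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


twiml = build_stream_twiml("wss://yourdomain.ngrok.io/media", "+15550123456")
print(twiml)
```

Because <Start> comes before <Dial>, the stream is forked first and the call then continues to the dialed number.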

Save your new TwiML Bin, then wire it up to one of your incoming phone numbers (Twilio Console or the legacy Console) by selecting TwiML Bin in the A Call Comes In section and then selecting your bin from the list. When a call comes into that number, Twilio streams the real-time data straight to your web server.

Info

By default, Twilio will stream the incoming track - in our case, the incoming phone call. You can always change this by using the track attribute.


Find a friend or family member willing to help you test your streaming web server (or use a second phone that is different from the one you listed in your TwiML Bin).

One of you should call your Twilio phone number, which will then connect the call to the number you specified in your TwiML Bin. Keep an eye on your console output and start talking - you should see the connected, start, and first media messages logged as the audio streams in!


Integrate with third-party speech providers

You can pipe your real-time audio data into external providers, such as Google Cloud Speech-to-Text, Amazon Transcribe, or IBM Watson Speech to Text. All of these providers also offer language translation services.
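A common way to hand audio to a streaming client is to put decoded chunks on a queue and expose them as a generator. The sketch below shows only that bridging pattern and calls no provider API; `audio_chunks` and the sentinel are hypothetical names. In the gevent-based server from this guide you would likely want a gevent-aware queue instead of the standard one, which is an assumption worth checking for your setup.

```python
import queue

STREAM_END = None  # sentinel placed on the queue when the call audio ends


def audio_chunks(audio_queue):
    """Yield audio chunks from the queue until the end-of-stream sentinel arrives."""
    while True:
        chunk = audio_queue.get()
        if chunk is STREAM_END:
            return
        yield chunk


# Simulate the WebSocket handler enqueuing two decoded media chunks.
audio_queue = queue.Queue()
for chunk in (b"\xff" * 160, b"\x7f" * 160):
    audio_queue.put(chunk)
audio_queue.put(STREAM_END)

total_bytes = sum(len(chunk) for chunk in audio_chunks(audio_queue))
print(total_bytes)
```

The WebSocket handler would put each decoded chunk on the queue and add the sentinel when it receives the stop message, while the provider-facing code consumes the generator.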


Use cases for Media Streams with Twilio Programmable Voice

This guide teaches the basics required for the following use cases:

Create an inbound contact center with Twilio Programmable Voice

You can use this guide to access live audio from incoming calls to your contact center. This allows you to perform real-time tasks like sentiment analysis or providing live agent assistance based on the conversation.

To learn more advanced features that you can use with inbound contact centers, see Voice inbound contact center.

Create transcriptions for AI or ML with Twilio Programmable Voice

You can use this guide to stream raw audio data directly to your machine learning models or transcription services. By using WebSockets, you can achieve low-latency, real-time speech-to-text and automated analysis during a live call.

To learn more advanced features that you can use with AI or ML transcription, see Voice AI/ML transcription.


After following this guide, you can successfully stream real-time audio from a Twilio phone call to a local Python server using WebSockets. You have learned how to handle WebSocket events such as "connected", "start", and "media", and how to decode base64 audio payloads into raw bytes for further processing.

