This page is for reference only. We are no longer onboarding new customers to Programmable Video. Existing customers can continue to use the product until December 5, 2024.
We recommend migrating your application to the API provided by our preferred video partner, Zoom. We've prepared this migration guide to assist you in minimizing any service disruption.
Twilio Video is a programmable real-time communications platform that allows you to add video chat functionality to your web, iOS, and Android applications. The platform provides APIs, SDKs, and helper tools to capture, distribute, record, and render high-quality audio and video in your applications.
- What is Twilio Video
- Demo application
- How to build a Twilio Video application
- Resources for getting started
- Resources for building and scaling
- Diagnostic and troubleshooting tools
- Networking considerations
- Integrate with other Twilio services
Twilio Video is a real-time communications platform built on top of WebRTC. Using Twilio’s REST APIs and client-side SDKs, you can build video chat functionality into your application. Twilio Video also provides global STUN/TURN relays, media services for large-scale group conferences and recording, and signaling infrastructure so that you can build scalable applications.
The programmable aspect of Twilio Video allows you to have full control over how video appears in your application. You are not constrained to any particular formats and can calibrate performance based on your use case. Twilio Video offers a wide range of tools to customize, troubleshoot, and optimize your video applications.
Twilio Video provides signaling, user access management, media processing, and media delivery to enable real-time communications. Signaling, the process of discovery and negotiation that sets up, controls, and ends an RTC session, is managed in Twilio’s global infrastructure. Media exchange (the sharing of audio, video, and other data among Participants) happens either directly peer-to-peer or through Twilio’s servers, depending on the type of Video Room you choose to use.
Rooms are the core building block of a Twilio Video experience. Participants join a Room and can then exchange audio, video, and other data in real time with one another.
There are two different ways media is exchanged in a video Room, depending on the type of Room chosen:
- Peer-to-Peer: Participants in Peer-to-Peer Rooms exchange media directly. Twilio infrastructure acts as the signaling server and helps each Participant create a direct connection to every other Participant in the Room for sending audio and video data.
  - Twilio offers a free version of Peer-to-Peer Rooms, WebRTC Go, which can have up to two Participants and is a great option when you’re starting to build with Twilio Video.
  - For small group peer-to-peer video chats, you can select the Peer-to-Peer (P2P) Room type.
- Group: In Group Rooms, a Participant exchanges media with the Twilio Cloud, which acts as a Selective Forwarding Unit (SFU). Group Rooms can have up to 50 concurrent Participants and allow additional functionality such as recordings, Twilio Voice Participants, dominant speaker detection, and more.
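To make the difference between the two Room types concrete, the sketch below counts media connections for a given number of Participants. It is a simplified model that assumes one connection per pair of peers in a Peer-to-Peer Room and one connection per Participant to the SFU in a Group Room. The function name `media_connection_counts` is hypothetical and not part of any Twilio SDK.

```python
def media_connection_counts(participants: int) -> dict:
    """Count media connections in a simplified model of each Room type."""
    if participants < 1:
        raise ValueError("a Room needs at least one Participant")
    others = participants - 1
    return {
        # Peer-to-Peer: each Participant connects directly to every other one.
        "p2p_per_participant": others,
        "p2p_total": participants * others // 2,
        # Group: each Participant connects only to the SFU (assumed: one connection).
        "group_per_participant": 1,
        "group_total": participants,
    }


if __name__ == "__main__":
    for n in (2, 4, 10):
        print(n, media_connection_counts(n))
```

In this model the per-Participant load in a Peer-to-Peer Room grows with every Participant who joins, while in a Group Room it stays constant, which is one reason larger calls tend to use Group Rooms.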
See Understanding Video Rooms to compare the different Room types and their capabilities.
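A Twilio Video application requires both a frontend and a backend component:
- The client application (frontend) uses one of Twilio’s client-side SDKs to connect to Rooms and to capture, publish, and render Participants’ audio and video.
- The application server (backend) is required to generate Access Tokens for Participants. You can also use an application server to interact with Twilio’s APIs to create and manage Room settings or Recordings.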
You can learn more in the Basic Concepts video documentation.
If you'd like to start exploring Twilio Video with a pre-built video conferencing application, you can deploy Twilio’s open source demo application built with ReactJS to get started in just a few minutes.
The steps below outline the general flow you’ll follow when creating a multi-participant video application with Twilio Video. Jump down to Resources for Getting Started if you want to see specific resources that you can use for building your first video application.
The three main steps are:
- Create an Access Token server: this is the backend component that generates Access Tokens so that end users can join a Room.
- Create a Video Room: create the Room where Participants will share their audio and video. You can create this on the server side using Twilio’s REST API, or on the client side.
- Create the client-side application: use a frontend SDK to fetch an Access Token, connect to the Room, and display Participants’ audio and video.
Participants will need an Access Token to connect to a Room. Access Tokens are JSON Web Tokens (JWT). They are short-lived credentials that are signed with your Twilio credentials. They contain grants (in the case of video applications, a video grant) that govern the actions the client holding the token is permitted to perform. This ensures that your application has full control of who is authorized to join the Room.
Access Token generation happens on the server side of your application. You can use Twilio’s helper libraries to generate an Access Token.
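To illustrate what "a signed JWT containing a video grant" means, here is a minimal sketch that assembles such a token with only the Python standard library. The header fields, claim names, and the use of an API key secret with HS256 are assumptions based on the commonly documented token layout, not details stated on this page. In a real application, Twilio's helper libraries are the appropriate way to generate tokens. The function name `build_video_access_token` and all credential values are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(raw: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def build_video_access_token(account_sid, api_key_sid, api_key_secret,
                             identity, room, ttl_seconds=3600, now=None):
    """Build an HS256-signed JWT carrying a video grant (assumed claim layout)."""
    issued_at = int(time.time()) if now is None else int(now)
    header = {"typ": "JWT", "alg": "HS256", "cty": "twilio-fpa;v=1"}  # assumed header
    payload = {
        "jti": f"{api_key_sid}-{issued_at}",
        "iss": api_key_sid,
        "sub": account_sid,
        "exp": issued_at + ttl_seconds,  # short-lived credential
        # The grant limits what the token holder may do: join this Room only.
        "grants": {"identity": identity, "video": {"room": room}},
    }
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(api_key_secret.encode("utf-8"),
                         signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


if __name__ == "__main__":
    # Placeholder credentials for illustration only.
    print(build_video_access_token("ACxxxx", "SKxxxx", "secret", "alice", "DailyStandup"))
```

Because the token is signed with a secret that only your server holds, a client cannot alter the identity or the Room named in the grant without invalidating the signature.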
If you do not want to host your own server to create Access Tokens, you can use a serverless Twilio Function to create an Access Token server hosted in Twilio’s cloud. See an example of how to generate Access Tokens without a server or an example of creating a serverless video application in the Twilio Blog.
You can create a Video Room via the REST API in your backend server, or you can create and join rooms on the client side.
With client-side Room creation, you do not create Rooms before Participants join them. The first time a Participant tries to connect to a Room using an Access Token, Twilio will check to see if a Room with the specific name exists in your account. If it does not, Twilio will create the Room, following the default Room settings you have configured in the Default Room Settings section of the Twilio Console. If it does exist, Twilio will add the Participant to the existing Room.
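The behavior described above is a "get or create" lookup keyed by Room name. The toy model below sketches that logic in memory. It is not Twilio's implementation, and the `Room` and `RoomDirectory` classes are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Room:
    name: str
    settings: dict
    participants: list = field(default_factory=list)


class RoomDirectory:
    """Toy stand-in for the Rooms in one account."""

    def __init__(self, default_settings: dict):
        # Plays the role of the Default Room Settings configured in the Console.
        self.default_settings = dict(default_settings)
        self.rooms = {}

    def connect(self, room_name: str, identity: str) -> Room:
        room = self.rooms.get(room_name)
        if room is None:
            # No Room with this name yet: create one with the default settings.
            room = Room(name=room_name, settings=dict(self.default_settings))
            self.rooms[room_name] = room
        # Whether new or existing, the Participant is added to the Room.
        room.participants.append(identity)
        return room


if __name__ == "__main__":
    directory = RoomDirectory({"type": "group"})
    directory.connect("DailyStandup", "alice")
    print(directory.connect("DailyStandup", "bob"))
```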
You can also create Rooms before Participants join them via the Twilio Video REST API.
Using this method, you can specify settings for the Room when you create it with a POST request. For example, you can specify the room type, maximum number of Participants, maximum duration, etc. If you do not explicitly set these values when creating the Room, the Room settings will default to the settings you configured in the Twilio Console.
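As a rough illustration of such a POST request, the sketch below builds (but does not send) a Room creation request using only the Python standard library. The endpoint URL, the form field names `UniqueName`, `Type`, and `MaxParticipants`, and the use of HTTP Basic authentication with an API key are assumptions drawn from Twilio's REST API conventions rather than from this page, so check them against the Rooms API reference. The function name `build_create_room_request` is hypothetical.

```python
import base64
import urllib.parse
import urllib.request

ROOMS_URL = "https://video.twilio.com/v1/Rooms"  # assumed endpoint


def build_create_room_request(api_key_sid, api_key_secret, unique_name,
                              room_type="group", max_participants=None):
    """Build a POST request that would create a Room with explicit settings."""
    form = {"UniqueName": unique_name, "Type": room_type}
    if max_participants is not None:
        # Settings left out fall back to the defaults configured in the Console.
        form["MaxParticipants"] = str(max_participants)
    credentials = f"{api_key_sid}:{api_key_secret}".encode("utf-8")
    return urllib.request.Request(
        ROOMS_URL,
        data=urllib.parse.urlencode(form).encode("ascii"),
        headers={
            "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_create_room_request("SKxxxx", "secret", "DailyStandup",
                                        max_participants=10)
    print(request.get_method(), request.full_url, request.data.decode("ascii"))
```

Passing the returned object to `urllib.request.urlopen` would send the request. In practice, Twilio's helper libraries wrap this call for you.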
Check out Understanding Video Rooms for more information about the difference between creating Rooms via the REST API versus the client-side.
Once you are able to generate Access Tokens and have chosen how you’ll create Video Rooms, you’ll use a frontend SDK to create the client-side interface for the application. Note that each SDK’s Getting Started Guide has code samples for how to perform the following steps for that specific SDK.
First, your application should fetch an Access Token for the end user from your Access Token server. Then, the frontend application will use that Access Token to connect to a Room. Once a user joins the Room with an Access Token, they become a Participant in the Room.
All Participants have tracks, which are streams of data generated by a microphone, camera, or other source. There are three types of Participant tracks:
- Video: data from video sources such as cameras or screens
- Audio: data from audio inputs such as microphones
- Data: other data generated by a participant within the application. This can be used for features like building a whiteboarding application, in-application chat, and more.
Video Room tracks follow a publish/subscribe pattern. A Participant publishes their video, audio, and/or data tracks, and all other Participants can subscribe to those published tracks. Your application receives the data from all the tracks you have subscribed to, and you can choose how to display or play that data on the page.
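The publish/subscribe pattern can be sketched with a small in-memory model. This is a toy illustration of the idea, not the SDK's object model. The `ToyRoom` class and its method names are hypothetical.

```python
from collections import defaultdict

TRACK_KINDS = ("audio", "video", "data")


class ToyRoom:
    """Minimal model of published tracks and who subscribes to them."""

    def __init__(self):
        self.published = defaultdict(dict)     # identity -> {track_name: kind}
        self.subscriptions = defaultdict(set)  # subscriber -> {(publisher, track_name)}

    def publish(self, identity, track_name, kind):
        if kind not in TRACK_KINDS:
            raise ValueError(f"unknown track kind: {kind}")
        self.published[identity][track_name] = kind

    def unpublish(self, identity, track_name):
        self.published[identity].pop(track_name, None)
        # Subscribers stop receiving a track once it is unpublished.
        for subscribed in self.subscriptions.values():
            subscribed.discard((identity, track_name))

    def subscribe(self, subscriber, publisher, track_name):
        if track_name not in self.published.get(publisher, {}):
            raise KeyError(f"{publisher} has not published {track_name}")
        self.subscriptions[subscriber].add((publisher, track_name))

    def tracks_to_render(self, subscriber, kinds=("audio", "video")):
        """Return the subscribed tracks the application chooses to display or play."""
        return sorted(
            (publisher, name)
            for publisher, name in self.subscriptions[subscriber]
            if self.published[publisher][name] in kinds
        )


if __name__ == "__main__":
    room = ToyRoom()
    room.publish("alice", "camera", "video")
    room.publish("alice", "mic", "audio")
    room.subscribe("bob", "alice", "camera")
    room.subscribe("bob", "alice", "mic")
    print(room.tracks_to_render("bob"))
```

The `kinds` filter stands in for the application's choice of what to show. Narrowing it to `("audio",)` would, for example, model an audio-only view.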
Through the Participant track model, you can have fine-grained control over which tracks you display in your application. This can allow you to implement functionality such as muting/unmuting, presentation mode, paginating Participants’ videos, hiding Participants from others, and more.
Note that in Peer-to-Peer Rooms, all data from Participants’ tracks is sent directly to and from peers. In Group Rooms, all data goes from a Participant to the Twilio SFU, which then forwards that data to other Participants.
Twilio acts as the signaling server for both Peer-to-Peer and Group Rooms. Twilio will send your frontend application notifications about events such as Participants connecting or disconnecting from a Room, or Participants publishing/unpublishing tracks and subscribing/unsubscribing from tracks.
Your application should listen for these signaling events so it can handle them appropriately. For example, your application should listen for the “participantDisconnected” event so it can stop displaying a disconnected Participant’s inactive data stream. You can learn more about the types of events that Twilio will signal in the documentation for the client-side SDK you are using.
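The listener pattern described above can be sketched with a tiny event emitter. The `ToyEmitter` class and the `wire_participant_tiles` function are hypothetical stand-ins for an SDK's Room object and your UI code. The event name "participantConnected" is assumed as the counterpart to "participantDisconnected", so confirm the exact names in the documentation for your SDK.

```python
class ToyEmitter:
    """Stand-in for a Room object that emits signaling events."""

    def __init__(self):
        self._handlers = {}

    def on(self, event, handler):
        self._handlers.setdefault(event, []).append(handler)

    def emit(self, event, *args):
        for handler in self._handlers.get(event, []):
            handler(*args)


def wire_participant_tiles(room, tiles):
    """Keep the set of displayed Participant tiles in sync with Room events."""
    room.on("participantConnected", lambda identity: tiles.add(identity))
    # Remove the tile so the UI stops showing a disconnected Participant's stream.
    room.on("participantDisconnected", lambda identity: tiles.discard(identity))


if __name__ == "__main__":
    room, tiles = ToyEmitter(), set()
    wire_participant_tiles(room, tiles)
    room.emit("participantConnected", "alice")
    room.emit("participantConnected", "bob")
    room.emit("participantDisconnected", "alice")
    print(tiles)
```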
You have a fully functioning multi-party video application once you have:
- An Access Token Server
- A frontend application that:
  - Retrieves Access Tokens and connects to a Video Room
  - Displays and plays Participant tracks (audio and video, as needed)
  - Listens for signaling events such as Participants connecting and disconnecting from the Room, or publishing and unpublishing tracks
Once you have a working video application that performs these actions, there is much more you can add on to it in terms of functionality and additional tooling. Read on for more details about what Twilio Video offers.
There are many resources you can explore when starting to build your first Twilio Video applications, depending on how you like to learn. You can follow a tutorial, read documentation for SDKs and APIs, or deploy a pre-built sample video application.
Twilio’s Blog has many posts about building applications with Twilio Video. You can explore many different Twilio features and see examples using a variety of languages and frameworks. To find all Video blog posts, filter posts for the “Video” tag. You can also find translated blog posts on the Twilio Blog.
Learn more about building a video application with each client-side SDK with Getting Started guides.
Twilio’s CodeExchange is a repository of code samples for common Twilio use cases.
- List of all CodeExchange Video applications
Quickstart applications are minimal Twilio Video applications that demonstrate the basics of working with Twilio Video. Use these to get started with a small demo application that you can then deconstruct or extend to understand core Twilio Video components.
Quick Deploy applications are more full-featured than the Quickstart applications above. They demonstrate a wide variety of Twilio Video functionality and can be used to quickly get started with a robust set of Video tools. They are open-source and you can use or alter them in any way to fit your video conferencing use case.
Once you have started your video application, there is a lot of functionality and tooling you can add on top of it to enhance, customize, and optimize the app.
Twilio Group Rooms allow you to record Room content. Because all media passes through Twilio’s SFU when you use Group Rooms, Twilio can save that media for you to retrieve after a Room is completed.
Each Participant track is recorded and stored as a separate file. You can choose to record all tracks in a Room, or specify exactly which Participants and which tracks you want to capture. After you have recorded a Room, you can customize the layout of the final recorded video using Compositions. Twilio’s Composition service takes individual track recordings, formats them visually according to your specifications, and creates an output file in mp4 or webm format.
You can choose to store Recordings and Compositions in Twilio’s Cloud, or set up external AWS S3 storage.
Learn more in the Understanding Recordings and Compositions Guide.
There are many factors that influence the quality of a video call. Some of those factors are related to an end-user’s network and device setup, and Twilio has tools to provide end-user feedback about their connectivity before they join a call.
There are also tools and guides you can use to improve the video call experience for all Participants based on your call use case.
Twilio has detailed recommendations and best practices for video calls. Check out the Developing High Quality Video Applications guide for in-depth suggestions about how to enhance call quality in Peer-to-Peer and Group Rooms, depending on the video use case.
The following tools are referenced in the Developing High Quality Video Applications guide and can be used to enhance quality in Group Rooms:
- Network Quality API: Monitors Participants’ networks and provides quality metrics, allowing you to display quality scores in your UI and understand when a user’s network quality changes.
- Network Bandwidth Profile API: Specify how the downlink bandwidth of a Group Room Participant should be distributed among its subscribed tracks. This API allows developers to assign higher bandwidth to higher priority tracks, protect audio quality, and keep network and battery resources under control. This is often used in conjunction with the Track Priority API and Simulcast.
- Track Priority API: Set the priority of tracks so that the Network Bandwidth Profile API can allocate available bandwidth to the most important tracks first (for example, a presenter’s screen share and video).
- Dominant Speaker Detection: In a multi-party video application, the dominant speaker is the Participant sharing the loudest audio track in the Room. The Dominant Speaker Detection API sends events to your application every time the dominant speaker changes. You can use the Dominant Speaker API to dynamically set track priorities based on who is actively speaking.
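To show how track priorities and a bandwidth budget can interact, the sketch below splits a subscriber's downlink among tracks, serving audio first and then video in priority order. This is a deliberately simple greedy scheme for illustration and is not Twilio's actual allocation algorithm. The function `allocate_downlink`, the bitrate figures, and the priority labels used here are assumptions.

```python
PRIORITY_ORDER = {"high": 0, "standard": 1, "low": 2}  # assumed priority labels


def allocate_downlink(tracks, downlink_kbps):
    """Greedily grant bandwidth: audio first, then by descending priority.

    Each track is a dict with keys: name, kind, priority, wanted_kbps.
    """
    ordered = sorted(
        tracks,
        key=lambda t: (t["kind"] != "audio", PRIORITY_ORDER[t["priority"]]),
    )
    remaining = downlink_kbps
    allocation = {}
    for track in ordered:
        granted = min(track["wanted_kbps"], remaining)
        allocation[track["name"]] = granted  # 0 means the track is switched off
        remaining -= granted
    return allocation


if __name__ == "__main__":
    tracks = [
        {"name": "bob-camera", "kind": "video", "priority": "low", "wanted_kbps": 800},
        {"name": "alice-screen", "kind": "video", "priority": "high", "wanted_kbps": 1500},
        {"name": "alice-mic", "kind": "audio", "priority": "standard", "wanted_kbps": 50},
    ]
    print(allocate_downlink(tracks, downlink_kbps=2000))
```

With a 2000 kbps budget, the audio track and the high-priority screen share receive what they ask for, and the low-priority camera gets whatever is left.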
Simulcast is a technique available for Group Rooms in which a publisher encodes the same video track at several quality levels at once. You can use simulcast to provide the right quality of video to each Participant based on their available bandwidth. With simulcast, Twilio’s Selective Forwarding Unit (SFU) forwards higher quality videos to higher bandwidth subscribers and lower quality videos to lower bandwidth ones. You can specify which tracks are the highest priority to make sure bandwidth is allocated appropriately and automatically switch off tracks if a Participant’s network is too congested.
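The forwarding decision can be pictured as choosing, for each subscriber, the best quality layer that fits their bandwidth. The sketch below is a hedged illustration of that idea only. The function `pick_layer`, the layer names, and the bitrates are hypothetical, and a real SFU weighs more signals than a single bandwidth number.

```python
def pick_layer(layers_kbps, subscriber_kbps):
    """Return the highest-bitrate layer that fits, or None to switch the track off."""
    fitting = {name: rate for name, rate in layers_kbps.items()
               if rate <= subscriber_kbps}
    if not fitting:
        return None  # network too congested for even the lowest layer
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    layers = {"low": 150, "medium": 500, "high": 1500}  # illustrative bitrates
    for bandwidth in (2000, 600, 100):
        print(bandwidth, pick_layer(layers, bandwidth))
```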
Twilio also offers adaptive simulcast, which enables and disables simulcast layers dynamically to improve bandwidth and CPU usage. This helps save device resources in cases such as presentation and grid modes, when the application does not need a Participant's highest resolution video. Adaptive simulcast ensures that publishers are only encoding the spatial layers needed at a given moment.
Learn more in Working with VP8 Adaptive Simulcast.
By default in Video Rooms, Participants share their audio and video tracks. You can additionally create data tracks to share other data among Participants. You can use the DataTrack API to develop features like in-application chat or drawing. Twilio has a demo app, Draw with Twilio, that demonstrates using data tracks to create a virtual whiteboard. A live demo of the app is also available.
Learn more in the DataTrack API tutorial.
Learn how to capture a Participant’s screen to share in a Room as a video track.
Twilio has several tools you can use to gain insight into your video applications and provide feedback to end users about their setup and connectivity. You can use these tools for debugging applications and providing feedback to users about their input devices and bandwidth before they join a video call.
- Video Insights: Provides analytics and aggregations in the Twilio Console for observing your application, discovering trends, and troubleshooting Rooms and Participants.
- Preflight API: Provides functions for testing connectivity to the Twilio Cloud. The API can identify signaling and media connectivity issues and provides a report at the end of the test.
- RTC diagnostics SDK: Provides functions to test a Participant’s input and output devices, including microphones, speakers, and cameras, and to confirm that a Participant’s network meets the bandwidth requirements for a voice or video call.
Twilio Video uses WebRTC to provide real-time video and audio communication in Rooms. Review the list of ports and protocols that Twilio uses during video calls so that you can help end-users connect appropriately to your application.
Additionally, you can learn more about locations of Twilio servers and global low latency. Connecting to Twilio infrastructure that is closer to your end-users will help reduce round-trip-time and latency on video calls.
There are many ways you can integrate other Twilio services into your Video application. Below are several services you might consider adding:
- Voice: Add PSTN and SIP callers as Participants in Group Rooms
- Conversations: Add text-based chat into your video application
- Sync: Synchronize state in real time between browsers, mobile devices, and Twilio’s cloud