Simulcast is a standardized technique used to retain media quality when some subscribers have limited bandwidth. It is a mechanism for providing scalability to non-scalable video codecs such as VP8.
With simulcast, a client sends multiple versions of the same video simultaneously. Each version is encoded independently at a different resolution and frame rate; this way, a subscriber with limited bandwidth can receive a lower quality version of the media, but subscribers with more bandwidth can receive a higher quality video and their media quality is not degraded.
Twilio Video provides the option to use VP8 simulcast via Twilio's Selective Forwarding Unit (SFU).
SFUs only forward media and can neither transcode nor modify the video. When sending media with unicast (only sending one version of the video track), publishers need to reduce quality to adapt to the worst of their subscribers' bandwidths so that no subscriber is congested.
With VP8 simulcast, the SFU can forward higher quality video to higher bandwidth subscribers and lower quality video to lower bandwidth ones. The track publisher sends several qualities of the same track and the SFU selects the best quality for each subscriber. The subscriber then receives a single VP8 encoded video that is best suited to their network conditions.
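To make the contrast between unicast and simulcast concrete, the sketch below models both strategies with plain numbers. It is not Twilio's SFU logic: the layer bitrates, the `pickLayer` and `unicastBitrate` helpers, and the subscriber bandwidth figures are all hypothetical values chosen for illustration.

```javascript
// Hypothetical simulcast layers, ordered from lowest to highest quality.
// The bitrates are illustrative assumptions, not values from Twilio.
const layers = [
  { name: 'low', kbps: 150 },
  { name: 'medium', kbps: 500 },
  { name: 'high', kbps: 1500 },
];

// Simulcast: for each subscriber, pick the best layer that fits the
// subscriber's available bandwidth (falling back to the lowest layer).
function pickLayer(availableKbps) {
  const fitting = layers.filter((layer) => layer.kbps <= availableKbps);
  return fitting.length > 0 ? fitting[fitting.length - 1] : layers[0];
}

// Unicast: a single stream has to fit the most constrained subscriber.
function unicastBitrate(subscriberKbps) {
  return Math.min(...subscriberKbps);
}

const subscribers = [200, 800, 3000];
console.log(unicastBitrate(subscribers)); // 200
console.log(subscribers.map((kbps) => pickLayer(kbps).name)); // [ 'low', 'medium', 'high' ]
```

In this toy model, unicast limits every subscriber to what the 200 kbps subscriber can handle, while simulcast lets each subscriber receive a different layer.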
The following illustration shows the difference between unicast and simulcast.
Simulcast offers the following benefits to Participants:
On the other hand, simulcast also has some drawbacks:
Twilio's standard VP8 simulcast sends up to three layers of video at different resolutions. The approximate resolutions of each layer are listed in the table below. In some video conferencing contexts, the higher resolution layers, which consume the most resources to encode, may not be needed.
Twilio Video offers adaptive simulcast, which enables and disables simulcast layers dynamically to improve bandwidth and CPU usage. This helps save device resources in cases such as presentation and grid modes, when the application does not need a Participant's highest resolution video. Adaptive simulcast ensures that publishers are only encoding the spatial layers needed at a given moment.
For example, when someone is presenting in a video conference, you will frequently only display the presenter's video in a large format, and will display only thumbnails of other participants' video. The participants who are not presenting do not need to encode and send higher resolution video layers because their video is not highlighted. The same might also be true in a video conference in grid mode, where each participant's video is the same size and no one's video needs to be the highest quality.
In these situations, adaptive simulcast can detect which layers are being used by subscribers and automatically turn off encoding on the publisher's side for higher spatial layers that are not being used. As speakers change, adaptive simulcast will dynamically turn on or turn off the appropriate spatial layers, based on what subscribers in the room are using.
Note that adaptive simulcast will not disable any video layers when the Room is being recorded, to help produce high quality recordings.
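The decision described above can be sketched as a small pure function. This is only a model of the behavior as the text describes it, not the SDK's or the SFU's implementation: the `layersToEncode` name, the layer indexing, and the choice to keep the lowest layer when nobody is subscribed are assumptions made for illustration.

```javascript
// Layer indices: 0 = lowest spatial layer, 2 = highest.
// `requestedLayers` holds the layer index each subscriber currently consumes.
function layersToEncode(requestedLayers, { layerCount = 3, recording = false } = {}) {
  // While the Room is being recorded, no layer is disabled.
  if (recording) {
    return Array.from({ length: layerCount }, () => true);
  }
  // Otherwise keep every layer up to the highest one any subscriber uses.
  // Assumption: with no subscribers, only the lowest layer stays enabled.
  const highestInUse = requestedLayers.length > 0 ? Math.max(...requestedLayers) : 0;
  return Array.from({ length: layerCount }, (_, index) => index <= highestInUse);
}

// Two thumbnail viewers and one medium-size viewer: the top layer is turned off.
console.log(layersToEncode([0, 0, 1])); // [ true, true, false ]
// Same subscribers, but the Room is being recorded: all layers stay on.
console.log(layersToEncode([0, 0, 1], { recording: true })); // [ true, true, true ]
```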
Adaptive simulcast is currently available in the Twilio Video JavaScript SDK, Android SDK, and iOS SDK. Please review the documentation for each SDK for more information. If your application is currently using VP8 simulcast, we recommend that you switch to adaptive simulcast.
Twilio SDKs encode up to three spatial layers when simulcast is enabled. The following table shows which layers are typically generated for a given capture resolution. Note that this is only an approximation and the actual behavior may differ slightly. In the table, disabled means that the layer is not sent under those conditions: video of the specified resolution is not generated by the publisher and is not available at the SFU to be forwarded to subscribers.
Capture resolution | Layer 1 | Layer 2 | Layer 3 |
---|---|---|---|
352x288 | 352x288 | disabled | disabled |
480x360 | 240x180 | 480x360 | disabled |
640x480 | 320x240 | 640x480 | disabled |
640x480 (with crop) | 240x240 | 480x480 | disabled |
960x540 | 240x135 | 480x270 | 960x540 |
1024x768 | 256x192 | 512x384 | 1024x768 |
1024x768 (with crop) | 240x192 | 480x384 | 960x768 |
1280x720 | 320x180 | 640x360 | 1280x720 |
1280x720 (with crop) | 225x180 | 450x360 | 900x720 |
1920x1080 | 480x270 | 960x540 | 1920x1080 |
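The rows above follow a regular pattern: each lower layer halves the width and height of the one above it, and smaller captures produce fewer layers. The sketch below reproduces the table under that reading. The `approximateLayers` function and both pixel thresholds are inferred from the table rows and are assumptions, not constants taken from the SDKs. For the "with crop" rows, pass the dimensions after cropping (for example 900x720).

```javascript
// Thresholds inferred from the table rows (assumptions, not SDK constants).
const TWO_LAYER_MIN_PIXELS = 480 * 360;
const THREE_LAYER_MIN_PIXELS = 960 * 540;

// Returns the approximate simulcast layers, lowest resolution first.
function approximateLayers(width, height) {
  const pixels = width * height;
  let divisors;
  if (pixels >= THREE_LAYER_MIN_PIXELS) {
    divisors = [4, 2, 1];
  } else if (pixels >= TWO_LAYER_MIN_PIXELS) {
    divisors = [2, 1];
  } else {
    divisors = [1];
  }
  return divisors.map((d) => `${Math.round(width / d)}x${Math.round(height / d)}`);
}

console.log(approximateLayers(1280, 720)); // [ '320x180', '640x360', '1280x720' ]
console.log(approximateLayers(640, 480)); // [ '320x240', '640x480' ]
console.log(approximateLayers(352, 288)); // [ '352x288' ]
```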
The following table illustrates Twilio's current support for simulcast:
Twilio Video SDK | Browser (or N/A) | VP8 Simulcast Support |
---|---|---|
JavaScript | Chrome | Yes (SDK v1.7.0+) |
JavaScript | Firefox | No |
JavaScript | Safari | Yes (Safari 12.1+ with SDK 1.17.0+) |
Android | N/A | Yes (SDK v2.1.0+) |
iOS | N/A | Yes (SDK v2.1.0+) |
Simulcast is disabled by default. You can enable simulcast on a per-Participant basis when connecting to a Room.
To enable adaptive simulcast, set preferredVideoCodecs="auto" in ConnectOptions when connecting to a video Room. The SDK will use VP8 simulcast and will enable and disable simulcast layers dynamically, thus improving bandwidth and CPU usage.
Adaptive simulcast works best when used along with Client Track Switch Off Control and Video Content Preferences. These two flags allow the SFU to determine which simulcast layers are needed, which in turn allows it to disable the layers that are not needed on the publisher side.
```javascript
const { connect } = require('twilio-video');

const room = await connect(token, {
  preferredVideoCodecs: 'auto',
  bandwidthProfile: {
    video: {
      contentPreferencesMode: 'auto',
      clientTrackSwitchOffControl: 'auto'
    }
  }
});
```
Please note the following limitations with adaptive simulcast in the JavaScript SDK:
preferredVideoCodecs="auto" will revert to unicast in the following cases:
You can enable standard simulcast by setting simulcast: true in ConnectOptions when connecting to a video Room.
```javascript
// Web JavaScript
// Remember that simulcast only needs to be enabled in media publishers
// See the compatibility table above for supported browsers and required SDK versions

const { connect } = require('twilio-video');

const room = await connect(token, {
  preferredVideoCodecs: [
    { codec: 'VP8', simulcast: true }
  ]
});
```
Any Participant with VP8 simulcast enabled publishes all their video tracks using VP8 simulcast. Once this is done, Twilio's video infrastructure leverages simulcast tracks to provide the best possible quality to any subscriber without requiring any additional action from you.
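Because the compatibility table above lists Firefox as unsupported, an application may want to decide per browser whether to request simulcast. The following is a minimal sketch of that idea. The `buildPreferredVideoCodecs` helper and the `simulcastSupport` map are hypothetical names, the map only mirrors the JavaScript rows of the table, and the helper ignores the SDK and Safari version requirements listed there.

```javascript
// Support matrix copied from the JavaScript rows of the compatibility table.
const simulcastSupport = { chrome: true, firefox: false, safari: true };

// Hypothetical helper: builds a value for the preferredVideoCodecs option.
// Unknown browsers are treated as not supporting simulcast.
function buildPreferredVideoCodecs(browser) {
  const simulcast = simulcastSupport[browser.toLowerCase()] === true;
  return [{ codec: 'VP8', simulcast }];
}

console.log(buildPreferredVideoCodecs('Chrome')); // [ { codec: 'VP8', simulcast: true } ]
console.log(buildPreferredVideoCodecs('Firefox')); // [ { codec: 'VP8', simulcast: false } ]
```

The returned array could then be passed as preferredVideoCodecs in the options given to connect. How the browser name is detected is left out of the sketch on purpose.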
By default, simulcast is disabled. You can enable simulcast on a per-Participant basis when connecting to a Room. This is done using the ConnectOptions as shown in the following code snippet:
```swift
// Swift code
// Remember that simulcast only needs to be enabled in media publishers
// See the compatibility table above for required SDK versions

let connectOptions = ConnectOptions(token: accessToken) { (builder) in
    builder.preferredVideoCodecs = [Vp8Codec(simulcast: true)]
}
```
Any Participant with VP8 simulcast enabled publishes all its video tracks using VP8 simulcast. Once this is done, Twilio's video infrastructure leverages simulcast tracks to provide the best possible quality to any subscriber without requiring any additional action from you.
By default, simulcast is disabled. You can enable simulcast on a per-Participant basis when connecting to a Room. This is done using the ConnectOptions as shown in the following code snippet:
```java
// Java code
// Remember that simulcast only needs to be enabled in media publishers
// See the compatibility table above for required SDK versions

ConnectOptions connectOptions = new ConnectOptions.Builder(accessToken)
        .preferVideoCodecs(Collections.singletonList(new Vp8Codec(true)))
        .build();
```
Any Participant with VP8 simulcast enabled publishes all its video tracks using VP8 simulcast. Once this is done, Twilio's video infrastructure leverages simulcast tracks to provide the best possible quality to any subscriber without requiring any additional action from you.
To optimize video quality while minimizing CPU usage and bandwidth, it is recommended to use VP8 simulcast with the capture settings suggested below on each mobile platform.
Capture at 24 FPS. When simulcasting, this will result in 3 temporal layers of 24 FPS, 12 FPS, and 6 FPS. Selecting 24 frames per second instead of the default of 30 reduces the CPU load on the VP8 software encoder.
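The temporal layer rates follow a halving pattern: the full capture rate, half of it, and a quarter of it. As a quick arithmetic check, the sketch below computes those rates. The `temporalLayerRates` function is a hypothetical helper written for this illustration and is not part of any Twilio SDK.

```javascript
// Each temporal layer runs at half the frame rate of the layer above it.
function temporalLayerRates(captureFps, layerCount = 3) {
  return Array.from({ length: layerCount }, (_, index) => captureFps / 2 ** index);
}

console.log(temporalLayerRates(24)); // [ 24, 12, 6 ]
console.log(temporalLayerRates(30)); // [ 30, 15, 7.5 ]
```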
iOS devices support high resolution capture formats with ratios of 1.33:1 and 1.77:1. When simulcasting, it is often desirable to produce a squarish ratio (1.25:1) that can be viewed by subscribers in landscape or portrait, and as smaller thumbnails. Cropping is performed at the source by using a format request. Besides changing the ratio of the captured video, cropping also reduces the number of pixels that need to be processed by the software encoder. Using 1280x720 or 1024x768 for video capture will result in 3-layer simulcast with the layer structure as shown in the table above. Using 640x480 is recommended on older iPhones and will result in 2-layer simulcast.
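The cropped sizes used in the layer table can be derived from the 1.25:1 target ratio. The sketch below shows the arithmetic for the two capture formats mentioned above, along with the share of pixels the crop removes. `cropToRatio` and `pixelSavings` are hypothetical helpers for this illustration; the actual cropping is performed by the SDK through a format request.

```javascript
// Crop a capture size to the target width:height ratio, keeping it centered
// conceptually; only the resulting dimensions are computed here.
function cropToRatio(width, height, targetRatio) {
  if (width / height > targetRatio) {
    // Source is wider than the target: trim the width.
    return { width: Math.round(height * targetRatio), height };
  }
  // Source is taller than (or equal to) the target: trim the height.
  return { width, height: Math.round(width / targetRatio) };
}

// Fraction of pixels the encoder no longer has to process after cropping.
function pixelSavings(original, cropped) {
  return 1 - (cropped.width * cropped.height) / (original.width * original.height);
}

const hd = { width: 1280, height: 720 };
const hdCropped = cropToRatio(hd.width, hd.height, 1.25);
console.log(hdCropped); // { width: 900, height: 720 }
console.log(Math.round(pixelSavings(hd, hdCropped) * 100)); // 30

console.log(cropToRatio(1024, 768, 1.25)); // { width: 960, height: 768 }
```

Cropping 1280x720 to 900x720 removes roughly 30% of the pixels, while 1024x768 is already close to the target ratio and loses only about 6%.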
It is recommended to remove the rotation tags using hardware acceleration by setting rotationTags to .remove in CameraSourceOptions. It is also recommended to reduce the audio bitrate to a value tuned for speech content.
The above recommendations are implemented in this code snippet:
```swift
struct CaptureDeviceUtils {

    // Produce 3 spatial layers ~ {960x768, 480x384, 240x192}. 1024x768 is captured on most phones
    // Produce 3 spatial layers ~ {900x720, 450x360, 225x180}. 1280x720 is captured on iPhone X
    static let kSimulcastVideoDimensions = CMVideoDimensions(width: 900, height: 720)
    static let kSimulcastVideoFrameRate = UInt(24)
    static let kSimulcastVideoBitrate = UInt(1800)

    /*
     * @brief Finds the smallest format that is at least as large as the size requested.
     *
     * @param device The AVCaptureDevice to query.
     * @param targetSize The minimum dimensions that are preferred.
     *
     * @return A format that satisfies the request.
     */
    static func selectFormatBySize(device: AVCaptureDevice,
                                   targetSize: CMVideoDimensions) -> VideoFormat {
        // Arranged from smallest to largest.
        let formats = CameraSource.supportedFormats(captureDevice: device)
        var selectedFormat = formats.firstObject as? VideoFormat
        for format in formats {
            guard let videoFormat = format as? VideoFormat else {
                continue
            }
            if videoFormat.pixelFormat != PixelFormat.formatYUV420BiPlanarFullRange {
                continue
            }
            let dimensions = videoFormat.dimensions
            // Cropping might be used if there is not an exact match.
            if (dimensions.width >= targetSize.width && dimensions.height >= targetSize.height) {
                selectedFormat = videoFormat
                break
            }
        }
        return selectedFormat!
    }
}

// The code below assumes `camera` and `localVideoTrack` are properties declared elsewhere.
let options = CameraSourceOptions { (builder) in
    // Stripping rotation tags using hardware acceleration
    builder.rotationTags = .remove
}
camera = CameraSource(options: options, delegate: self)

// Assume front camera is available
let frontCamera = CameraSource.captureDevice(position: .front)
if let camera = camera {
    localVideoTrack = LocalVideoTrack(source: camera, enabled: true, name: "Camera")

    // Discover a simulcast format for the front camera
    let format = CaptureDeviceUtils.selectFormatBySize(device: frontCamera!,
                                                       targetSize: CaptureDeviceUtils.kSimulcastVideoDimensions)

    // Lower the frame rate to reduce CPU load, but still produce 3 temporal layers (f, f/2, f/4)
    format.frameRate = CaptureDeviceUtils.kSimulcastVideoFrameRate

    // Apply slight cropping to reduce CPU load, and provide square-ish video
    let croppedFormat = VideoFormat()
    croppedFormat.dimensions = CaptureDeviceUtils.kSimulcastVideoDimensions
    camera.requestOutputFormat(croppedFormat)

    // Start capturing from the front camera selected above
    camera.startCapture(device: frontCamera!, format: format) { (captureDevice, videoFormat, error) in
        if let error = error {
            self.logMessage(messageText: "Capture failed with error.\ncode = \((error as NSError).code) error = \(error.localizedDescription)")
        }
    }
}

let connectOptions = ConnectOptions(token: accessToken) { (builder) in
    if let localVideoTrack = localVideoTrack {
        builder.videoTracks = [localVideoTrack]
    }
    builder.isNetworkQualityEnabled = true
    builder.networkQualityConfiguration =
        NetworkQualityConfiguration(localVerbosity: .minimal, remoteVerbosity: .minimal)
    // Enable Vp8 simulcast, and cap the bitrate at 1.8 Mbps to reduce strain on the sender. Reduce audio bitrate for speech content.
    builder.encodingParameters = EncodingParameters(audioBitrate: 16, videoBitrate: 1800)
    builder.preferredVideoCodecs = [Vp8Codec(simulcast: true)]
}
```
Capture at 24 FPS. When simulcasting, this will result in 3 temporal layers of 24 FPS, 12 FPS, and 6 FPS. Selecting 24 frames per second instead of the default of 30 reduces the CPU load on the VP8 encoder.
Using 1280x720 or 1024x768 for video capture will result in 3-layer simulcast with the layer structure as shown in the table above. Using 640x480 for video capture will result in a 2-layer simulcast.
It is recommended to reduce the audio bitrate to a value tuned for speech content.
The above settings are specified as part of the Video Format API as shown in the code snippet below:
```java
import java.util.Collections;
import tvi.webrtc.MediaCodecVideoEncoder;

// Capture at 720p only when a hardware VP8 encoder is available; otherwise use VGA
VideoDimensions videoDimensions = VideoDimensions.VGA_VIDEO_DIMENSIONS;
if (MediaCodecVideoEncoder.isVp8HwSupported()) {
    videoDimensions = VideoDimensions.HD_720P_VIDEO_DIMENSIONS;
}
VideoFormat videoFormat = new VideoFormat(videoDimensions, 24);

LocalVideoTrack localVideoTrack = LocalVideoTrack.create(context, true, videoCapturer, videoFormat);

// Enable network quality information for local and remote participants
NetworkQualityConfiguration configuration =
        new NetworkQualityConfiguration(
                NetworkQualityVerbosity.NETWORK_QUALITY_VERBOSITY_MINIMAL,
                NetworkQualityVerbosity.NETWORK_QUALITY_VERBOSITY_MINIMAL);

ConnectOptions connectOptions = new ConnectOptions.Builder(accessToken)
        .enableNetworkQuality(true)
        .networkQualityConfiguration(configuration)
        .videoTracks(Collections.singletonList(localVideoTrack))
        // Cap the bitrate at 1.8 Mbps to reduce strain on the sender. Reduce audio bitrate for speech content.
        .encodingParameters(new EncodingParameters(16, 1800))
        // Enable Vp8 simulcast
        .preferVideoCodecs(Collections.singletonList(new Vp8Codec(true)))
        .build();
```