Voice Intelligence - Best Practices


Twilio Voice Intelligence transcribes your call recordings and generates data insights from your conversations. This document covers best practices for preparing recordings for transcription, assigning Participants to transcripts, and using webhooks.


Use dual-channel recordings

Voice Intelligence automatically transcribes and analyzes dual-channel media files of Recordings when available. For a dual-channel Recording, Twilio stores the audio from the Recording on two different tracks in the same media file. Using dual-channel recordings with Voice Intelligence not only provides higher accuracy, but also lets you map and override participants with additional metadata for search and business reporting.

Twilio two-party Call Recordings are dual-channel by default.

For Conference Recordings, you need to enable Dual-channel Conference Recordings on the Voice Settings page in the Console.

Audio contained on each channel

  • For two-party calls, one channel contains the audio from one call leg, and the other channel contains the audio from the other call leg.
  • For Conferences, the audio from the first Participant that joined the Conference is on one channel, and the rest of the audio from the Conference is mixed together and is contained on the second channel.

Warning

By default, Voice Intelligence treats channel 1 of a dual-channel recording as the "Agent" audio and channel 2 as the "Customer" audio. This is true for two-party calls and Conferences.

If this doesn't match your application's implementation of Recordings, you can do one of the following:

  • Update your application's logic so that the "Agent" is always on the first channel. For two-party calls, that is the first call leg. For Conferences, that is the first Participant that joined the recorded Conference.
  • If your application logic has all of your "Agents" on channel 2 and all of your "Customers" on channel 1, reach out to Twilio Support to invert the Agent/Customer Voice Intelligence labeling at the Account level. This affects all Recordings within that Account.
  • Specify Participant information in the request to create a Transcript. This should be used only if the first two options are not possible with your application.
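
Before choosing one of these options, it can help to listen to each channel separately and confirm which speaker is on it. The following is a minimal sketch, assuming you have downloaded a dual-channel recording locally and that ffmpeg is installed. The function name and the output file names are hypothetical, and the sketch assumes that channel 1 corresponds to the first (left) channel of the stereo file.

```shell
# Sketch: split a stereo recording into one mono file per channel so that
# each side of the conversation can be checked on its own.
split_channels() {
  input="$1"   # local dual-channel recording, e.g. recording.wav (hypothetical)
  ffmpeg -i "${input}" \
    -filter_complex "channelsplit=channel_layout=stereo[ch1][ch2]" \
    -map "[ch1]" channel1.wav \
    -map "[ch2]" channel2.wav
}

# Usage:
#   split_channels recording.wav
```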

Make third-party media recordings public

Voice Intelligence supports third-party media recordings. If your call recordings aren't stored in Twilio and you want to use them with Voice Intelligence, the recordings need to be publicly accessible for the duration of transcription. You can host the recordings publicly or, preferably, expose them through time-limited pre-signed URLs. For example, to share a recording stored in an existing AWS S3 bucket, follow the AWS guide on sharing objects with pre-signed URLs. Then provide the recording's URL as the media_url when creating a Transcript.
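
As an illustration of the pre-signed URL approach, the sketch below wraps the AWS CLI's `aws s3 presign` command. The function name, bucket, object key, and one-hour default lifetime are hypothetical placeholders. Choose an expiry that comfortably outlasts the transcription.

```shell
# Sketch: generate a time-limited pre-signed URL for a recording in S3.
# Requires the AWS CLI with credentials that can read the object.
presign_recording() {
  bucket="$1"            # hypothetical bucket name
  key="$2"               # hypothetical object key of the recording
  expires="${3:-3600}"   # URL lifetime in seconds (defaults to one hour)
  aws s3 presign "s3://${bucket}/${key}" --expires-in "${expires}"
}

# Usage:
#   MEDIA_URL="$(presign_recording my-recordings-bucket calls/call-123.flac 3600)"
```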


Create an audio recording from Twilio Video

If you use Twilio Video and want to transcribe the audio of a Twilio Video recording, the recording needs additional processing to create an audio file that can be submitted for transcription.

To create a dual-channel audio recording, first transcode a separate audio-only composition for each participant in the Video Room.

Create a dual-channel audio recording

curl -X POST "https://video.twilio.com/v1/Compositions" \
  --data-urlencode "AudioSources=PAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" \
  --data-urlencode "StatusCallback=https://www.example.com/callbacks" \
  --data-urlencode "Format=mp4" \
  --data-urlencode "RoomSid=RMXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" \
  -u "$TWILIO_ACCOUNT_SID:$TWILIO_AUTH_TOKEN"

Next, download the Media from these compositions and merge them into a single stereo audio file.
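
The download step could look like the following sketch. It assumes the composition has finished processing and that the composition's Media subresource redirects to the media file, which curl follows with `-L`. The function name and output file names are hypothetical.

```shell
# Sketch: download the media of a completed composition to a local file.
download_composition_media() {
  composition_sid="$1"   # CJ... SID returned when the composition was created
  output_file="$2"       # e.g. speaker1.mp4
  curl -sS -L -o "${output_file}" \
    -u "${TWILIO_ACCOUNT_SID}:${TWILIO_AUTH_TOKEN}" \
    "https://video.twilio.com/v1/Compositions/${composition_sid}/Media"
}

# Usage, once per participant composition:
#   download_composition_media CJXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX speaker1.mp4
```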

Download the Video Room Media

ffmpeg -i speaker1.mp4 -i speaker2.mp4 -filter_complex "[0:a][1:a]amerge=inputs=2[a]" -map "[a]" -f flac -bits_per_raw_sample 16 -ar 44100 output.flac

If the participants' recordings differ in duration, merging them directly misaligns the audio tracks. To avoid this, use ffmpeg to create a single stereo audio track, adding silence to cover the difference in track length. For example, if one audio track lasts 63 seconds and the other lasts 67 seconds, delay the shorter track by four seconds so that it matches the length of the longer one.

Create a single stereo audio track

ffmpeg -i speaker1.wav -i speaker2.wav -filter_complex "aevalsrc=0:d=${second_to_delay}[s1];[s1][1:a]concat=n=2:v=0:a=1[ac2];[0:a]apad[ac1];[ac1][ac2]amerge=2[a]" -map "[a]" -f flac -bits_per_raw_sample 16 -ar 44100 output.flac
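
The `second_to_delay` value in the command above can be derived from the two track durations. The sketch below reads each duration with ffprobe and computes the absolute difference. Both function names are hypothetical, and the sketch assumes that the difference in length equals the time by which the later participant joined.

```shell
# Duration of a media file in seconds, as reported by ffprobe.
media_duration() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Absolute difference between two durations, in seconds.
delay_seconds() {
  awk -v a="$1" -v b="$2" \
    'BEGIN { d = a - b; if (d < 0) d = -d; printf "%.3f\n", d }'
}

# Usage:
#   second_to_delay="$(delay_seconds "$(media_duration speaker1.wav)" \
#                                    "$(media_duration speaker2.wav)")"
```

For the 63-second and 67-second tracks in the example, `delay_seconds 63 67` yields `4.000`.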

Finally, send a CreateTranscript request to Voice Intelligence by providing a publicly accessible URL for this audio file as media_url in MediaSource.
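
A request of this kind might look like the sketch below. It assumes the Transcripts endpoint at `intelligence.twilio.com/v2` and a JSON `Channel` parameter that carries `media_url` under `media_properties`. Confirm both against the Transcript API reference. The function name is hypothetical.

```shell
# Sketch: create a Transcript from a publicly accessible media URL.
create_transcript_from_url() {
  service_sid="$1"   # GA... SID of your Voice Intelligence Service
  media_url="$2"     # public or pre-signed URL of the merged audio file
  curl -sS -X POST "https://intelligence.twilio.com/v2/Transcripts" \
    --data-urlencode "ServiceSid=${service_sid}" \
    --data-urlencode "Channel={\"media_properties\":{\"media_url\":\"${media_url}\"}}" \
    -u "${TWILIO_ACCOUNT_SID}:${TWILIO_AUTH_TOKEN}"
}

# Usage:
#   create_transcript_from_url GAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX "https://example.com/output.flac"
```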


Include metadata for Call participants

By default, Voice Intelligence assumes that Participant One is on channel one and Participant Two is on channel two, and associates a phone number from the recording with each Participant. Because recordings can be created in different ways, this assumption may not hold for all use cases.

In such cases, or when you need to attach additional metadata to call participants, use the Voice Intelligence APIs to create a Transcript, providing the optional Participant metadata and mapping each participant to the correct audio channel.
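
The sketch below shows one way such a request could be shaped for a recording where the customer is on channel 1 and the agent on channel 2. The participant field names (`channel_participant`, `role`, `full_name`, `user_id`) are assumed here and should be checked against the Transcript API reference. The function name and all example values are hypothetical.

```shell
# Sketch: create a Transcript with explicit participant-to-channel mapping.
create_transcript_with_participants() {
  service_sid="$1"     # GA... SID of your Voice Intelligence Service
  recording_sid="$2"   # RE... SID of the dual-channel Recording
  # Field names below are assumptions; the values are illustrative only.
  channel_json='{
    "media_properties": {"source_sid": "'"${recording_sid}"'"},
    "participants": [
      {"channel_participant": 1, "role": "Customer", "full_name": "Jane Customer", "user_id": "cust-001"},
      {"channel_participant": 2, "role": "Agent", "full_name": "Alex Agent", "user_id": "agent-042"}
    ]
  }'
  curl -sS -X POST "https://intelligence.twilio.com/v2/Transcripts" \
    --data-urlencode "ServiceSid=${service_sid}" \
    --data-urlencode "Channel=${channel_json}" \
    -u "${TWILIO_ACCOUNT_SID}:${TWILIO_AUTH_TOKEN}"
}

# Usage:
#   create_transcript_with_participants GAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX REXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
```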


Provide a CustomerKey with the CreateTranscript API

Providing a CustomerKey with the CreateTranscript API lets you map a Transcript to an internal identifier of your own, such as a unique identifier your system uses to track transcripts. The CustomerKey is also included in the webhook callback when the results for the Transcript and Operators are available. This field is optional and cannot be used in place of the Transcript SID in API requests.
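
As a minimal sketch, a CustomerKey could be passed as an extra form parameter on the create request. The parameter name `CustomerKey` follows the text above. The endpoint and `Channel` shape are assumed, and the function name and key value are hypothetical.

```shell
# Sketch: create a Transcript tagged with your own internal identifier.
create_transcript_with_customer_key() {
  service_sid="$1"     # GA... SID of your Voice Intelligence Service
  recording_sid="$2"   # RE... SID of the Recording to transcribe
  customer_key="$3"    # your internal identifier, e.g. a ticket or call ID
  curl -sS -X POST "https://intelligence.twilio.com/v2/Transcripts" \
    --data-urlencode "ServiceSid=${service_sid}" \
    --data-urlencode "Channel={\"media_properties\":{\"source_sid\":\"${recording_sid}\"}}" \
    --data-urlencode "CustomerKey=${customer_key}" \
    -u "${TWILIO_ACCOUNT_SID}:${TWILIO_AUTH_TOKEN}"
}

# Usage:
#   create_transcript_with_customer_key GAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX REXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX "ticket-8841"
```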


Check the status of the transcript with the webhook callback

Use the webhook callback to learn when a create Transcript request has completed and the results are available. This is preferable to polling the GET /Transcript endpoint. You can configure the webhook callback URL in the Voice Intelligence Service settings.
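
Once the callback arrives, your handler can fetch the results for that Transcript. The sketch below assumes the Transcript's sentences are available from a `Sentences` subresource and that the webhook body is JSON containing a `transcript_sid` field. Both are assumptions to verify against the payload you actually receive. The function name is hypothetical.

```shell
# Sketch: fetch the transcribed sentences for a completed Transcript.
fetch_transcript_sentences() {
  transcript_sid="$1"   # GT... SID taken from the webhook payload
  curl -sS -X GET \
    "https://intelligence.twilio.com/v2/Transcripts/${transcript_sid}/Sentences" \
    -u "${TWILIO_ACCOUNT_SID}:${TWILIO_AUTH_TOKEN}"
}

# Usage inside a webhook handler, assuming the JSON body is in $payload
# and jq is installed (the field name is an assumption):
#   sid="$(printf '%s' "$payload" | jq -r '.transcript_sid')"
#   fetch_transcript_sentences "$sid"
```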


Copyright © 2024 Twilio Inc.