Call transcription is the conversion of the audio track of a voice or video call into written words, stored as plain text in conversational language. Call transcription can either be live, produced as a call or event happens, or based on the recording of a past conversation.
Call transcription is an important and powerful tool for business, training, medical, and legal purposes. Because text is far easier to search and analyze than audio, a text-based history of conversations is necessary, or at least preferable, for many use cases. Additionally, real-time speech-to-text transcription services (such as closed captioning) are used to increase accessibility, improving understanding for people who are hard-of-hearing or new to a language.
When it comes to voice calls, call transcription is often used in a business context, for example, to improve training and feedback for call center employees. Logging the context and words spoken in a call can help you identify business problems algorithmically, making it easier to deploy resources in an evidence-based manner. Additionally, call transcriptions and recordings are valuable for legal purposes, where contemporaneous transcriptions, recordings, and notes are superior to other types of records.
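As a rough illustration of why text is easier to analyze than audio, the sketch below counts how many stored transcripts mention each of a few phrases. The function name, the sample transcripts, and the phrase list are all hypothetical and exist only for this example; a production system would likely use more robust text analysis than plain substring matching.

```python
from collections import Counter


def flag_transcripts(transcripts, phrases):
    """Count how many transcripts mention each phrase (case-insensitive)."""
    counts = Counter()
    for text in transcripts.values():
        lowered = text.lower()
        for phrase in phrases:
            # Plain substring match; each transcript counts once per phrase.
            if phrase.lower() in lowered:
                counts[phrase] += 1
    return counts


# Hypothetical sample data, keyed by an invented call identifier.
SAMPLE_TRANSCRIPTS = {
    "call-001": "I was charged twice and I want a refund.",
    "call-002": "Thanks, the refund arrived yesterday.",
    "call-003": "I would like to cancel my subscription.",
}
PHRASES = ["refund", "cancel", "charged twice"]

if __name__ == "__main__":
    for phrase, n in flag_transcripts(SAMPLE_TRANSCRIPTS, PHRASES).most_common():
        print(f"{phrase}: {n} call(s)")
```

The same question asked of raw audio would require listening to every call, which is why a text history makes this kind of evidence gathering practical.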
Twilio allows you to add call transcriptions to our Programmable Voice product. For recorded calls, you can use our REST API to transcribe recordings to text. Twilio additionally has a real-time transcription service that supports multiple languages, contextual analysis, and Natural Language Processing. Talk to Sales about your call transcription requirements for information on that product.
Note that the legality of call transcription differs by locality. In some localities, recording calls, transcribing recorded calls, or even transcribing real-time speech over a call or video is banned or requires informed consent from some or all parties in a conversation. Twilio cannot comment on the specifics of your local laws; you'll have to read the relevant laws or consult with your legal representation for your unique situation.
Because of differences in volume, accents, timing, and connection quality, the final mixed track of a voice or video call can often be unintelligible even for professional human transcribers. So-called Single-Channel Recordings only store the one final mixed track pre-transcription, which can vastly increase the eventual number of transcription errors - especially if participants are speaking at the same time.
With the highest-accuracy call transcription solutions, both (or all) sides of the call are recorded separately. Because each party has an individual recording, a Dual-Channel Recording solution (or Multi-Channel Recording solution) is better at eliminating the cross-talk and cancellation noise that would otherwise interfere with the final mix. It also prevents most (or all) misattribution errors, since each track belongs to a single speaker.
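To make the idea of separate channels concrete, here is a minimal sketch that splits interleaved stereo audio into two mono tracks so that each party could be transcribed on its own. It assumes 16-bit little-endian PCM with one party per channel; the function name is hypothetical, and which channel carries which party is an assumption you would need to confirm for your own recordings.

```python
import struct


def split_stereo_pcm16(frames: bytes) -> tuple:
    """Split interleaved 16-bit little-endian stereo PCM into two mono tracks."""
    if len(frames) % 4 != 0:
        raise ValueError("expected whole stereo frames of 4 bytes each")
    # Each sample is a signed 16-bit integer; samples alternate between channels.
    samples = struct.unpack(f"<{len(frames) // 2}h", frames)
    first = samples[0::2]   # channel 1 (assumed here to be one party)
    second = samples[1::2]  # channel 2 (assumed here to be the other party)
    return (
        struct.pack(f"<{len(first)}h", *first),
        struct.pack(f"<{len(second)}h", *second),
    )
```

Each returned track can then be handed to a transcription step independently, so overlapping speech on one channel does not obscure the other.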
See more about our dual-channel call transcription options here.
The Gather and Record TwiML Voice verbs both support transcription, while our Phone Call Speech Transcription product can help you with your real-time requirements. You can also talk to Sales about Natural Language Processing and determining caller intent or sentiment in real time.
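As a sketch of how the Record verb might be configured for transcription, the snippet below builds a small TwiML document using only Python's standard library. The `transcribe` and `transcribeCallback` attributes are taken from the Record verb; the function name, the prompt text, the callback URL, and the `maxLength` value are illustrative assumptions, and you should check the current TwiML reference before relying on any of them.

```python
import xml.etree.ElementTree as ET


def build_record_twiml(callback_url: str,
                       prompt: str = "Please leave a message after the beep.") -> str:
    """Return TwiML that plays a prompt, records the caller, and requests a transcription."""
    response = ET.Element("Response")
    say = ET.SubElement(response, "Say")
    say.text = prompt
    ET.SubElement(response, "Record", {
        "transcribe": "true",                # ask for the recording to be transcribed
        "transcribeCallback": callback_url,  # hypothetical URL notified when the text is ready
        "maxLength": "120",                  # illustrative cap on recording length, in seconds
    })
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    print(build_record_twiml("https://example.com/transcription-done"))
```

Your web application would return this XML in response to the incoming call webhook, and the transcription text would arrive later at the callback URL rather than during the call.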