Speech-to-text (ASR)

Converting spoken audio into written text, also called automatic speech recognition.

Speech-to-text, also called automatic speech recognition or ASR, is the technology that converts spoken audio into written text. It is the foundation of call transcription: before a call can be searched, scored, or analyzed, the words have to be turned into text.

Transcription accuracy varies with audio conditions, speaker accents, and the language being spoken. In call intelligence, speech-to-text is the input step; the value comes from what is done with the transcript afterward, such as scoring, categorization, and extraction.
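One common way to put a number on transcription quality is word error rate (WER): the count of word substitutions, deletions, and insertions needed to turn the ASR output into a human-made reference transcript, divided by the number of words in the reference. The sketch below is a minimal illustration, not a production scorer. The function name `word_error_rate` and the sample sentences are made up for this example, and it assumes simple lowercasing and whitespace splitting, where real evaluations usually also normalize punctuation and numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # matching i reference words against nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in the output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one substitution ("my" -> "the") and one deletion ("today").
    print(word_error_rate("please cancel my order today", "please cancel the order"))
```

Lower is better: 0.0 means the output matches the reference word for word, and values above 1.0 are possible when the output contains many extra words.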

Related terms: AI transcription, Speaker diarization
