
Speaker diarization

Splitting a recording into who spoke when.

Speaker diarization partitions an audio recording into segments by speaker, answering the question of who spoke when. For example, it splits a two-person call into distinct turns so that the transcript reads as a conversation rather than a single block of text.

Diarization is different from speaker identification: it groups speech by speaker without necessarily knowing the speakers’ names. Labeling those groups as "Speaker 1" and "Speaker 2" is diarization; matching them to a known person is identification.
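
As a rough illustration of the distinction, the sketch below takes diarization-style output (timestamped segments carrying anonymous labels) and merges consecutive segments from the same speaker into turns. A separate, optional step then maps labels to names, which stands in for identification. The Segment class, the merge_turns and identify helpers, and the sample data are all hypothetical; a real system would derive the segments and labels from the audio itself.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str   # anonymous label such as "Speaker 1" (diarization output)
    start: float   # start time in seconds
    end: float     # end time in seconds
    text: str      # transcribed words for this span


def merge_turns(segments):
    """Merge consecutive segments from the same speaker into single turns."""
    turns = []
    for seg in sorted(segments, key=lambda s: s.start):
        if turns and turns[-1].speaker == seg.speaker:
            last = turns[-1]
            turns[-1] = Segment(
                last.speaker,
                last.start,
                max(last.end, seg.end),
                f"{last.text} {seg.text}",
            )
        else:
            turns.append(seg)
    return turns


def identify(turns, names):
    """Identification stand-in: relabel turns whose speaker has a known name."""
    return [
        Segment(names.get(t.speaker, t.speaker), t.start, t.end, t.text)
        for t in turns
    ]


# Illustrative data only: four segments from a two-person call.
segments = [
    Segment("Speaker 1", 0.0, 2.1, "Hi, thanks for calling."),
    Segment("Speaker 1", 2.1, 4.0, "How can I help?"),
    Segment("Speaker 2", 4.2, 6.5, "I have a question about my bill."),
    Segment("Speaker 1", 6.7, 8.0, "Sure, go ahead."),
]

turns = merge_turns(segments)
for t in turns:
    print(f"[{t.start:.1f}-{t.end:.1f}] {t.speaker}: {t.text}")
```

Calling identify(turns, {"Speaker 1": "Agent"}) would relabel only the turns whose speaker is known and leave "Speaker 2" anonymous, which mirrors the point that diarization alone does not require knowing who the speakers are.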
