PGVector for Voice: High-Dimensional Embeddings Deep Dive

By IdentityCall AI Team | Engineering | 10 min read

From Audio to Math

How do you search for a "voice" in a database? You can't query WHERE voice = 'John'.
You need Vector Embeddings.

This article explains how IdentityCall uses PostgreSQL with the pgvector extension to run low-latency biometric identification across millions of voice prints.

The Vectorization Pipeline

Spectrogram Generation: The raw audio (WAV) is converted into a mel spectrogram, a time-frequency image of the signal on the mel scale.
Encoder Network (ResNet/interaction): A deep neural network processes the image-like spectrogram.
Embedding Extraction: The network's penultimate layer outputs a fixed-size array of numbers (float32).
- Dimension: 256 for Voice ID (compact, identity-focused).
- Dimension: 768 for Semantic Search (richer, meaning-focused).

Example Embedding: [-0.024, 0.551, -0.112, ...]
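As a rough sizing sketch: a float32 value takes 4 bytes, so the raw payload of an embedding is 4 bytes times its dimension. The helper names below are illustrative assumptions, and the figures ignore any Postgres row or index overhead.

```python
from __future__ import annotations

import struct

FLOAT32_BYTES = struct.calcsize("<f")  # standard float32 size: 4 bytes


def embedding_payload_bytes(dimension: int) -> int:
    """Raw size of one float32 embedding, excluding database overhead."""
    return dimension * FLOAT32_BYTES


def pack_embedding(values: list[float]) -> bytes:
    """Serialize floats as little-endian float32 values."""
    return struct.pack(f"<{len(values)}f", *values)


voice_id_bytes = embedding_payload_bytes(256)   # Voice ID embedding
semantic_bytes = embedding_payload_bytes(768)   # Semantic Search embedding
```

At 256 dimensions each voice print carries 1,024 bytes of raw vector data, so a million of them come to roughly 1 GB before overhead.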

Why PGVector?

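Traditionally, embeddings lived in a specialized vector database such as Pinecone or Milvus, separate from the application's relational data.
pgvector is a PostgreSQL extension that adds a vector column type and similarity operators, so we can store embeddings alongside relational data (User ID, Call Logs) in our primary Postgres database.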

The Cosine Distance Metric

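To find a match, we can't test for equality, because two recordings of the same speaker never produce identical embeddings. Instead we look for the "nearest neighbor" in the 256-dimensional space.
We measure closeness with Cosine Distance, defined as 1 minus the cosine of the angle between two vectors. It depends only on the direction of the vectors, not on their magnitude.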

Distance 0.0: Identical.
Distance < 0.2: High Confidence Match (Same Speaker).
Distance > 0.4: Different Speaker.
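The metric and the thresholds above can be sketched in a few lines of plain Python. This is an illustration, not IdentityCall's implementation: the function names are hypothetical, and the "uncertain" label for distances between 0.2 and 0.4 is an assumption, since the article does not say how that band is handled.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cos(angle between a and b)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def classify(distance):
    """Map a cosine distance onto the bands quoted in the article."""
    if distance < 0.2:
        return "same_speaker"
    if distance > 0.4:
        return "different_speaker"
    # Assumption: the article leaves the 0.2 to 0.4 band undefined.
    return "uncertain"
```

Because only the angle matters, scaling a vector does not change the result: `cosine_distance([1, 0], [2, 0])` is 0, while orthogonal vectors such as `[1, 0]` and `[0, 1]` are at distance 1.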

Indexing for Speed (IVFFlat vs HNSW)

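Without an index, every query is a sequential scan that compares the input against each stored vector, so latency grows linearly with the size of the table. At 1 million vectors that is too slow for low-latency identification, so we use an approximate nearest-neighbor index.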
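To make the cost concrete, here is a minimal sketch of the exact search an index is meant to avoid. The profile names and 3-dimensional vectors are toy data invented for illustration. The loop touches every stored vector, so the work per query grows with the number of profiles times the dimension.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cos(angle between a and b); assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_by_linear_scan(query, profiles):
    """Exact nearest neighbor: compare the query with every stored vector.

    `profiles` maps a speaker name to an embedding.
    Returns (name, distance), or (None, inf) when there are no profiles.
    """
    best_name, best_distance = None, math.inf
    for name, embedding in profiles.items():
        distance = cosine_distance(query, embedding)
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name, best_distance


# Hypothetical toy data; real voice prints would have 256 dimensions.
toy_profiles = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
match_name, match_distance = nearest_by_linear_scan([0.9, 0.1, 0.0], toy_profiles)
```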

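HNSW (Hierarchical Navigable Small World): Generally the stronger option for query speed and recall. It builds a multi-layer graph that lets the search "zoom in" on the relevant neighborhood of vectors in milliseconds. The search is approximate, trading a small amount of recall for speed.

IVFFlat: pgvector's other index type. It partitions vectors into lists and searches only the lists closest to the query. It builds faster and uses less memory than HNSW, but has a worse speed-recall tradeoff at query time.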
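A minimal sketch of creating such an index from Python follows. It assumes a `voice_profiles` table with an `embedding` vector column, a pgvector version with HNSW support (0.5.0 or later), and a DB-API cursor such as one from psycopg. The index name and the function are hypothetical. The values `m = 16`, `ef_construction = 64`, and `ef_search = 40` are pgvector's documented defaults, not tuned settings.

```python
# Assumption: voice_profiles(embedding vector(256)) already exists.
# vector_cosine_ops makes the index usable for the <=> (cosine distance) operator.
CREATE_HNSW_INDEX = """
CREATE INDEX IF NOT EXISTS voice_profiles_embedding_hnsw
ON voice_profiles
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64)
"""


def apply_hnsw_index(cursor, ef_search=40):
    """Create the HNSW index and set the per-session search breadth.

    `cursor` is any DB-API cursor connected to Postgres (hypothetical here).
    A higher ef_search improves recall at the cost of slower queries.
    """
    cursor.execute(CREATE_HNSW_INDEX)
    # SET does not accept bound parameters, so coerce to int before formatting.
    cursor.execute(f"SET hnsw.ef_search = {int(ef_search)}")
```

The operator class has to match the distance operator used in the query. An index built with `vector_cosine_ops` serves `<=>` but not the Euclidean operator `<->`.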

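-- Example pgvector query. <=> is the cosine distance operator,
-- so 1 - distance is the cosine similarity, reported here as confidence.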
SELECT id, speaker_name, 1 - (embedding <=> '[...]'::vector) AS confidence
FROM voice_profiles
ORDER BY embedding <=> '[...]'::vector
LIMIT 1;
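To run the query above from application code, the embedding has to be passed in pgvector's text form, a bracketed, comma-separated list. The sketch below is one hedged way to do that. The helper names are hypothetical, and the `%s` placeholders assume a DB-API driver such as psycopg.

```python
import math


def to_vector_literal(embedding):
    """Format floats in pgvector's text form, e.g. '[0.1,0.2,0.3]'."""
    values = [float(x) for x in embedding]
    if not all(math.isfinite(v) for v in values):
        raise ValueError("embedding must contain only finite numbers")
    return "[" + ",".join(repr(v) for v in values) + "]"


# Same query as in the article, with the vector supplied as a bound parameter.
IDENTIFY_SQL = """
SELECT id, speaker_name, 1 - (embedding <=> %s::vector) AS confidence
FROM voice_profiles
ORDER BY embedding <=> %s::vector
LIMIT 1
"""


def identify(cursor, embedding):
    """Return the closest profile row (id, speaker_name, confidence), or None."""
    literal = to_vector_literal(embedding)
    cursor.execute(IDENTIFY_SQL, (literal, literal))
    return cursor.fetchone()
```

Binding the vector as a parameter, instead of pasting it into the SQL string, keeps the statement text constant and avoids quoting mistakes.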

The Architecture Advantage

By keeping vectors in Postgres, we keep ACID transactions and referential integrity. With a foreign key declared ON DELETE CASCADE, deleting a User removes their voice vectors in the same transaction, which is a big win for GDPR (Right to be Forgotten).
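The cascade behavior can be demonstrated without a Postgres server. The sketch below uses Python's built-in SQLite as a stand-in, with a BLOB column in place of pgvector's vector type. The table and column names are hypothetical. The `ON DELETE CASCADE` clause is written the same way in Postgres, where foreign keys are enforced without any PRAGMA.

```python
import sqlite3


def demo_cascade_delete():
    """Delete a user and return how many of their voice profiles remain."""
    conn = sqlite3.connect(":memory:")
    # SQLite-specific: foreign key enforcement is off unless enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        """
        CREATE TABLE voice_profiles (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
            embedding BLOB NOT NULL  -- stand-in for a pgvector vector(256) column
        )
        """
    )
    with conn:  # commits on success
        conn.execute("INSERT INTO users (id, name) VALUES (1, 'John')")
        conn.execute(
            "INSERT INTO voice_profiles (user_id, embedding) VALUES (1, ?)",
            (b"\x00" * 1024,),  # placeholder bytes, not a real voice print
        )
    with conn:
        # One statement: the dependent voice_profiles rows go with the user.
        conn.execute("DELETE FROM users WHERE id = 1")
    remaining = conn.execute("SELECT COUNT(*) FROM voice_profiles").fetchone()[0]
    conn.close()
    return remaining
```

Because the delete and its cascade happen inside one transaction, a reader never sees a state where the user is gone but the voice vectors remain.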

Store the voice. Query the math. Build with IdentityCall API.