Vector databases store and search mathematical representations of meaning. That’s it. The hype around them is enormous, but the core concept is straightforward.

What Are Embeddings?

Embeddings convert text (or images, or audio) into arrays of numbers — typically 384 to 3072 dimensions. Similar meanings produce similar vectors: “The cat sat on the mat” and “A feline rested on the rug” will have very similar embeddings despite sharing almost no words.

import math

from openai import OpenAI

client = OpenAI()

def create_embedding(text: str) -> list[float]:
    """Embed a single string with OpenAI's text-embedding-3-small model."""
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of a and b divided by the product of their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def similarity_search(
    query: str,
    index: list[dict],
    top_k: int = 5
) -> list[dict]:
    """Brute-force scan: score every document against the query, return the top_k."""
    query_vec = create_embedding(query)
    scored = []
    for doc in index:
        score = cosine_similarity(query_vec, doc["embedding"])
        scored.append({**doc, "score": score})
    return sorted(scored, key=lambda x: x["score"], reverse=True)[:top_k]
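You can sanity-check the ranking logic without an API key. The sketch below uses hand-made 3-dimensional unit vectors standing in for real embeddings (the documents, vectors, and query values are all illustrative, not output from any model); since the vectors are unit length, a plain dot product equals cosine similarity:

```python
# Toy index in the same shape as above: dicts with "text" and "embedding".
# The vectors are made up for illustration and are (approximately) unit length.
index = [
    {"text": "cats and mats", "embedding": [1.0, 0.0, 0.0]},
    {"text": "dogs and logs", "embedding": [0.0, 1.0, 0.0]},
    {"text": "felines and rugs", "embedding": [0.8, 0.6, 0.0]},
]

query_vec = [0.9, 0.436, 0.0]  # pretend embedding of a cat-related query

def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# For unit vectors, dot product == cosine similarity.
scored = sorted(index, key=lambda d: dot(query_vec, d["embedding"]), reverse=True)
print([d["text"] for d in scored[:2]])
```

The “felines and rugs” document wins even though it shares no tokens with a cat query — exactly the behavior the brute-force scan above delivers at scale.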

Do You Actually Need One?

| Feature | Traditional DB | Vector DB |
| --- | --- | --- |
| Exact match | Excellent | Poor |
| Semantic search | Impossible | Excellent |
| Filtering + search | Great | Improving |
| Cost per query | Low | Higher |
| Maturity | Decades | Years |

When to Use What

If you’re building semantic search, recommendation systems, or RAG pipelines, you need vector search. But consider starting with pgvector in PostgreSQL before reaching for a dedicated vector database. The operational overhead of a new database is significant, and PostgreSQL handles most workloads fine up to millions of vectors.
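To make the pgvector path concrete, here is a minimal sketch of the schema and query, assuming the pgvector extension is available on your PostgreSQL instance (table and column names are illustrative; 1536 is the dimension of text-embedding-3-small):

```sql
-- One-time setup: enable the pgvector extension.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id        bigserial PRIMARY KEY,
    text      text,
    embedding vector(1536)  -- dimension of text-embedding-3-small
);

-- Nearest neighbors by cosine distance (pgvector's <=> operator).
-- The literal would be a full 1536-value embedding in practice.
SELECT text
FROM documents
ORDER BY embedding <=> '[...]'::vector
LIMIT 5;
```

This keeps vectors next to your relational data, so filtering plus semantic search is a single SQL query rather than a cross-system join.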

The right question isn’t “which vector database?” — it’s “do I need vector search at all?”