Vector databases store and search mathematical representations of meaning. That’s it. The hype around them is enormous, but the core concept is straightforward.
## What Are Embeddings?
Embeddings convert text (or images, audio) into arrays of numbers — typically 384 to 3072 dimensions. Similar meanings produce similar vectors. “The cat sat on the mat” and “A feline rested on the rug” will have nearly identical embeddings despite sharing no words.
```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def create_embedding(text: str) -> list[float]:
    """Embed a string with OpenAI's text-embedding-3-small model."""
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return response.data[0].embedding

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    a_arr, b_arr = np.asarray(a), np.asarray(b)
    return float(a_arr @ b_arr / (np.linalg.norm(a_arr) * np.linalg.norm(b_arr)))

def similarity_search(
    query: str,
    index: list[dict],
    top_k: int = 5,
) -> list[dict]:
    """Brute-force nearest-neighbour search over an in-memory list of docs."""
    query_vec = create_embedding(query)
    scored = []
    for doc in index:
        score = cosine_similarity(query_vec, doc["embedding"])
        scored.append({**doc, "score": score})
    return sorted(scored, key=lambda x: x["score"], reverse=True)[:top_k]
```
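Because the search function above needs a live API key, here is the same ranking logic on hand-made three-dimensional vectors. Everything in this sketch is invented for illustration; real embeddings are learned and have hundreds to thousands of dimensions:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Tiny made-up "embeddings": the first axis loosely encodes animals,
# the third finance. Real vectors are learned, not hand-written.
index = [
    {"text": "cat on a mat",        "embedding": [0.90, 0.10, 0.00]},
    {"text": "feline on a rug",     "embedding": [0.85, 0.15, 0.05]},
    {"text": "stock market report", "embedding": [0.00, 0.20, 0.95]},
]
query_vec = [0.90, 0.12, 0.02]  # pretend embedding of "a cat sitting down"

ranked = sorted(
    ({**d, "score": cosine_similarity(query_vec, d["embedding"])} for d in index),
    key=lambda d: d["score"],
    reverse=True,
)
print([d["text"] for d in ranked])
# the two cat sentences outrank the finance one despite sharing no words
```

The ranking depends only on vector geometry, which is why the same code works unchanged whether the vectors come from a toy example or an embedding model.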
## Do You Actually Need One?
| Feature | Traditional DB | Vector DB |
|---|---|---|
| Exact match | Excellent | Poor |
| Semantic search | Impossible | Excellent |
| Filtering + search | Great | Improving |
| Cost per query | Low | Higher |
| Maturity | Decades | Years |
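The "Impossible" row is easy to make concrete with the article's own paraphrase pair: once stopwords are removed, the two sentences share zero words, so no exact-match or keyword index can relate them. A minimal sketch (the stopword list is illustrative):

```python
STOPWORDS = {"the", "a", "an", "on", "in", "of"}

def keyword_overlap(a: str, b: str) -> float:
    """Jaccard overlap of content words — a stand-in for keyword search."""
    words_a = set(a.lower().split()) - STOPWORDS
    words_b = set(b.lower().split()) - STOPWORDS
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0

score = keyword_overlap("The cat sat on the mat", "A feline rested on the rug")
print(score)  # 0.0 — keyword matching sees no relationship at all
```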
## When to Use What
If you’re building semantic search, recommendation systems, or RAG pipelines, you need vector search. But consider starting with the pgvector extension in PostgreSQL before reaching for a dedicated vector database. The operational overhead of adopting a new database is significant, and pgvector handles most workloads fine up to millions of vectors.
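As a sketch of what that starting point looks like with pgvector — table and column names are illustrative, and `$1` stands for a query embedding bound at run time:

```sql
-- Enable pgvector and store embeddings next to ordinary relational columns.
-- 1536 dimensions matches OpenAI's text-embedding-3-small.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id        bigserial PRIMARY KEY,
    content   text NOT NULL,
    embedding vector(1536)
);

-- Optional approximate index; sequential scans are fine for small tables.
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- Top-5 nearest neighbours by cosine distance (pgvector's <=> operator).
SELECT id, content, embedding <=> $1 AS distance
FROM documents
ORDER BY embedding <=> $1
LIMIT 5;
```

Because this lives inside PostgreSQL, the usual WHERE clauses, joins, and transactions compose with the vector search — the "filtering + search" row in the table above comes for free.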
The right question isn’t “which vector database?” — it’s “do I need vector search at all?”