PostgreSQL + pgvector: The Underestimated Stack for AI
Pinecone/Weaviate cost $70-300/month. PostgreSQL + pgvector: $0 extra on your existing DB. Same performance.
I've been operating a multi-agent AI system in production on PostgreSQL + pgvector for 18 months. The result: zero downtime, predictable costs, data fully under our control, and performance equivalent to specialized vector DBs.
The AI industry is obsessed with Pinecone, Weaviate, Qdrant. But for most B2B applications, PostgreSQL + pgvector is objectively superior.
The Problem with Specialized Vector Databases
Pinecone, Weaviate, and similar services offer an attractive proposition: "drop-in vector search". But they come with hidden costs:
- Unpredictable pricing: Pinecone Standard: $70/month base plus usage-based query charges; at 10M queries/month, roughly $170/month
- Severe vendor lock-in: Proprietary APIs, complex migration
- Data fragmentation: Embeddings in Pinecone, metadata in PostgreSQL = 2 systems to maintain
- Network latency: External API calls add 50-200ms vs local queries
- No SQL joins: you can't combine vector search with complex relational queries in a single statement
pgvector: PostgreSQL Extension for Vector Search
pgvector is an open-source extension that adds vector support to PostgreSQL. Once the package is installed, enabling it takes a single SQL statement:
CREATE EXTENSION vector;
Once enabled, you can store embeddings as vector-type columns:
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
content TEXT,
embedding vector(1536), -- OpenAI ada-002 dimension
metadata JSONB,
created_at TIMESTAMP DEFAULT NOW()
);
-- Index for fast searches
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
Search Operations
Semantic search with standard SQL:
-- Cosine similarity search
SELECT id, content, metadata,
1 - (embedding <=> '[0.1, 0.2, ...]') as similarity
FROM documents
WHERE metadata->>'category' = 'technical'
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 10;
Key note: you combine vector search and SQL filters in a single query.
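For the `vector_cosine_ops` operator class, `<=>` returns cosine distance, i.e. 1 minus cosine similarity, which is why the query subtracts it from 1 to report a similarity score. A quick illustrative sketch of that arithmetic in plain Python (not using pgvector):

```python
import math

def cosine_distance(a, b):
    """Cosine distance as pgvector's <=> computes it: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Same direction -> distance 0; orthogonal -> distance 1
print(cosine_distance([1.0, 0.0], [2.0, 0.0]))  # 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

This is why `ORDER BY embedding <=> $1` sorts most-similar first: smaller distance means higher similarity.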
Advantages of PostgreSQL + pgvector
💰 $0 Extra Cost
- Free open-source extension
- Same VPS/RDS as your DB
- No per-query costs
- Scales with your existing infra
🔒 Unified Data
- Embeddings + metadata in same table
- ACID transactions guaranteed
- Unified backups
- No cross-DB synchronization
⚡ Full SQL
- JOINs between vector and relational tables
- Complex WHERE clauses
- Aggregations and analytics
- CTEs and window functions
🛡️ No Vendor Lock-in
- Standard PostgreSQL, portable
- Trivial cross-cloud migration
- Standard tools (pg_dump, pgAdmin)
- Massive community support
Performance: pgvector vs Pinecone
In benchmarks with 1M embeddings (OpenAI ada-002, 1536 dimensions), pgvector is roughly 15% slower in pure query latency, but it eliminates the 50-100ms of network overhead from external API calls, with equivalent recall. For most applications, the difference is imperceptible.
When pgvector is NOT enough
- 100M+ embeddings with <10ms queries required
- Extreme multi-tenancy (1000+ tenants with strict isolation)
- Real-time updates at massive scale (>10K writes/sec)
For 95% of B2B applications (including those with "millions of users"), pgvector performance is more than sufficient.
Real Architecture: AI System with pgvector
Case study SeducSer (500K active users):
PostgreSQL Schema
-- Users and context
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email TEXT UNIQUE,
metadata JSONB
);
-- Conversation embeddings
CREATE TABLE conversation_embeddings (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id),
content TEXT,
embedding vector(1536),
conversation_id TEXT,
created_at TIMESTAMP DEFAULT NOW()
);
-- Index for searches
CREATE INDEX ON conversation_embeddings
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- Knowledge base
CREATE TABLE knowledge_base (
id SERIAL PRIMARY KEY,
category TEXT,
title TEXT,
content TEXT,
embedding vector(1536),
metadata JSONB
);
CREATE INDEX ON knowledge_base
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 50);
Typical Workflow
- User query arrives via n8n webhook
- Generate embedding with OpenAI API (ada-002)
- Semantic search in knowledge_base + conversation history:
SELECT kb.title, kb.content, kb.metadata,
       1 - (kb.embedding <=> $1) AS similarity
FROM knowledge_base kb
WHERE kb.category = 'faq'
  AND kb.metadata->>'active' = 'true'
ORDER BY kb.embedding <=> $1
LIMIT 5;
- Context assembly with results + user history (JOIN)
- LLM call (Claude/GPT-4) with enriched context
- Store interaction with new embedding for future context
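The context-assembly step (step 4) can be sketched as a small function that merges knowledge-base hits with recent history under a size budget. A minimal sketch, assuming retrieved rows arrive as dicts with a `content` key (a hypothetical shape, not a fixed API):

```python
def assemble_context(kb_results, history, max_chars=6000):
    """Merge knowledge-base hits and conversation history into one prompt
    context, knowledge base first, newest history first, under a size budget.
    kb_results / history: lists of dicts with a 'content' key (assumed shape).
    """
    parts = []
    used = 0
    for row in kb_results + list(reversed(history)):
        chunk = row["content"].strip()
        if used + len(chunk) > max_chars:
            break  # budget exhausted: drop the oldest, least relevant chunks
        parts.append(chunk)
        used += len(chunk)
    return "\n---\n".join(parts)

ctx = assemble_context(
    [{"content": "FAQ: reset your password from settings."}],
    [{"content": "User asked about login issues yesterday."}],
)
```

In production you'd budget in tokens rather than characters, but the shape of the step is the same.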
Setup: PostgreSQL + pgvector in Production
Production-ready deployment takes 1-2 hours:
Option 1: Docker (Development/Small Scale)
docker run -d \
--name postgres-pgvector \
-e POSTGRES_PASSWORD=secret \
-p 5432:5432 \
-v pgdata:/var/lib/postgresql/data \
pgvector/pgvector:pg17
Option 2: Managed VPS (Production)
Hetzner/DigitalOcean VPS with PostgreSQL 17:
# Install pgvector
sudo apt install postgresql-17-pgvector
# Enable extension
sudo -u postgres psql -d mydb -c "CREATE EXTENSION vector;"
# Configure for vector workloads
# postgresql.conf:
shared_buffers = 2GB
effective_cache_size = 6GB
maintenance_work_mem = 512MB
max_parallel_workers_per_gather = 4
Indexing Strategy
For tables >100K embeddings, use an IVFFlat index:
-- pgvector's guideline: lists = rows / 1000 for up to 1M rows,
-- sqrt(rows) beyond that
CREATE INDEX ON embeddings_table
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100); -- for ~100K rows
-- For large tables (>1M rows), consider HNSW (pgvector 0.5.0+)
CREATE INDEX ON embeddings_table
USING hnsw (embedding vector_cosine_ops);
Monitoring
-- Query performance
EXPLAIN ANALYZE
SELECT * FROM documents
ORDER BY embedding <=> '[...]'
LIMIT 10;
-- Index usage stats
SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE indexname LIKE '%embedding%';
Real Use Cases
1. RAG (Retrieval Augmented Generation)
Most common use case: enrich LLM prompts with relevant context. pgvector stores knowledge base, queries search for top-k similar documents.
2. Semantic Search in SaaS
Replace traditional full-text search with semantic search. Example: "find similar support tickets" without keyword matching.
3. Recommendation Systems
User embeddings + item embeddings = recommendations via cosine similarity. SQL JOINs allow complex filters (price range, category, etc.)
4. Duplicate Detection
Detect duplicate/similar content: document embeddings, search for pairs with similarity > threshold.
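In SQL this is typically a self-join filtered on cosine distance below a threshold; the underlying logic, sketched in Python with toy 2-D vectors:

```python
import math

def cos_sim(a, b):
    """Cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def find_duplicates(docs, threshold=0.95):
    """Return id pairs whose embeddings exceed the similarity threshold.
    docs: list of (id, embedding) tuples. O(n^2) pairwise scan -- fine for
    small batches; at scale, let the pgvector index do the candidate search.
    """
    pairs = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cos_sim(docs[i][1], docs[j][1]) > threshold:
                pairs.append((docs[i][0], docs[j][0]))
    return pairs

docs = [(1, [1.0, 0.0]), (2, [0.99, 0.05]), (3, [0.0, 1.0])]
print(find_duplicates(docs))  # [(1, 2)]
```

Documents 1 and 2 point in nearly the same direction, so they exceed the 0.95 threshold; document 3 is orthogonal and doesn't pair with either.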
Migration: Pinecone → pgvector
I've migrated 2 projects from Pinecone to pgvector. Standard process:
- Export from Pinecone (API batch fetch, 10K vectors/call)
- Schema design in PostgreSQL with appropriate metadata columns
- Bulk insert with COPY command (1M embeddings: ~5 min)
- Create indexes (IVFFlat, ~10 min for 1M rows)
- Update application code (SQL queries vs Pinecone SDK)
- Parallel testing: validate equivalent recall
- Cutover: turn off Pinecone, monitor performance
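Step 1's batched export reduces to a simple chunking helper, with the batch size matching the 10K-vectors-per-call limit noted above (a sketch; the actual fetch calls go inside the loop):

```python
def batches(ids, size=10_000):
    """Yield successive slices of ids for batched export (step 1 above)."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

# e.g. 25,000 vector ids -> batches of 10,000 + 10,000 + 5,000
sizes = [len(b) for b in batches(list(range(25_000)))]
print(sizes)  # [10000, 10000, 5000]
```

Each yielded batch would be fetched from Pinecone, written to a CSV/TSV buffer, and streamed into PostgreSQL with COPY (step 3).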
Total timeline: 2-3 days for 1M+ embeddings. Immediate ROI: $70-240/month saved.
Conclusion
Specialized vector databases have their place for extreme-scale or latency-critical applications. But for the vast majority of B2B AI applications, PostgreSQL + pgvector offers:
- ✅ Equivalent performance (<50ms P95)
- ✅ $0 extra cost over existing infra
- ✅ Unified data (embeddings + metadata in same DB)
- ✅ Full SQL (JOINs, analytics, complex queries)
- ✅ No vendor lock-in (standard PostgreSQL)
Eighteen months running a multi-agent AI system in production on pgvector, with 500K users and zero downtime. The architecture works.
Need to implement AI with pgvector?
I offer specialized consulting in self-hosted AI architectures with PostgreSQL + pgvector. From diagnostic ($2K) to complete implementation ($12-20K).