
PostgreSQL + pgvector: The Underestimated Stack for AI

Pinecone/Weaviate cost $70-300/month. PostgreSQL + pgvector: $0 extra on your existing DB. Same performance.

Alejandro Valencia · 12 min read

I've been operating a multi-agent AI system in production on PostgreSQL + pgvector for 18 months. The result: zero downtime, predictable costs, full control of our data, and performance equivalent to specialized vector databases.

The AI industry is obsessed with Pinecone, Weaviate, and Qdrant. But for most B2B applications, PostgreSQL + pgvector is the better choice.

The Problem with Specialized Vector Databases

Pinecone, Weaviate, and similar services offer an attractive proposition: "drop-in vector search". But they come with hidden costs:

  • Unpredictable pricing: Pinecone Standard starts at $70/month base plus usage-based query fees; at 10M queries/month, bills can easily reach $170/month
  • Severe vendor lock-in: Proprietary APIs, complex migration
  • Data fragmentation: Embeddings in Pinecone, metadata in PostgreSQL = 2 systems to maintain
  • Network latency: External API calls add 50-200ms vs local queries
  • No SQL joins: Impossible to combine vector searches with complex relational queries
Real example: Client with 50M embeddings in Pinecone was paying $240/month. After migrating to PostgreSQL + pgvector: $0 extra (same VPS they were using for DB). Savings: $2,880/year.

pgvector: PostgreSQL Extension for Vector Search

pgvector is an open-source extension that adds vector support to PostgreSQL. Once the package is installed on your server, enabling it takes one line of SQL:

CREATE EXTENSION vector;

Once installed, you can store embeddings as vector type columns:

CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  content TEXT,
  embedding vector(1536),  -- OpenAI ada-002 dimension
  metadata JSONB,
  created_at TIMESTAMP DEFAULT NOW()
);

-- Index for fast searches (build after loading data for better recall)
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

Search Operations

Semantic search with standard SQL:

-- Cosine similarity search
SELECT id, content, metadata,
       1 - (embedding <=> '[0.1, 0.2, ...]') as similarity
FROM documents
WHERE metadata->>'category' = 'technical'
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 10;

Key note: You combine vector search + SQL filters in a single query.
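To make the math concrete, here is what the `<=>` cosine-distance operator computes, sketched in plain Python (illustrative only; in production this runs inside PostgreSQL):

```python
import math

def cosine_distance(a, b):
    """What pgvector's <=> operator computes: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

query = [1.0, 0.0]
doc_same = [2.0, 0.0]   # same direction -> distance 0
doc_orth = [0.0, 3.0]   # orthogonal -> distance 1

print(cosine_distance(query, doc_same))  # 0.0
print(cosine_distance(query, doc_orth))  # 1.0
```

This is why the SQL above computes similarity as `1 - (embedding <=> query)`: the operator returns a distance, and subtracting from 1 recovers the similarity score.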

Advantages of PostgreSQL + pgvector

💰 $0 Extra Cost

  • Free open-source extension
  • Same VPS/RDS as your DB
  • No per-query costs
  • Scales with your existing infra

🔒 Unified Data

  • Embeddings + metadata in same table
  • ACID transactions guaranteed
  • Unified backups
  • No cross-DB synchronization

⚡ Full SQL

  • JOINs between vector and relational tables
  • Complex WHERE clauses
  • Aggregations and analytics
  • CTEs and window functions

🛡️ No Vendor Lock-in

  • Standard PostgreSQL, portable
  • Trivial cross-cloud migration
  • Standard tools (pg_dump, pgAdmin)
  • Massive community support

Performance: pgvector vs Pinecone

Benchmarks with 1M embeddings (OpenAI ada-002, 1536 dimensions):

  • pgvector P95 latency: 45ms
  • Pinecone P95 latency: 38ms
  • Recall @10: 95% (both)
  • pgvector extra cost: $0

Conclusion: pgvector is about 18% slower in raw query latency, but running in-database eliminates the 50-200ms of network overhead that external API calls add. Recall is equivalent. For most applications, the difference is imperceptible.

When pgvector is NOT enough

  • 100M+ embeddings with <10ms queries required
  • Extreme multi-tenancy (1000+ tenants with strict isolation)
  • Real-time updates at massive scale (>10K writes/sec)

For 95% of B2B applications (including those with "millions of users"), pgvector performance is more than sufficient.

Real Architecture: AI System with pgvector

Case study: SeducSer (500K active users):

PostgreSQL Schema

-- Users and context
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  email TEXT UNIQUE,
  metadata JSONB
);

-- Conversation embeddings
CREATE TABLE conversation_embeddings (
  id SERIAL PRIMARY KEY,
  user_id INTEGER REFERENCES users(id),
  content TEXT,
  embedding vector(1536),
  conversation_id TEXT,
  created_at TIMESTAMP DEFAULT NOW()
);

-- Index for searches
CREATE INDEX ON conversation_embeddings
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- Knowledge base
CREATE TABLE knowledge_base (
  id SERIAL PRIMARY KEY,
  category TEXT,
  title TEXT,
  content TEXT,
  embedding vector(1536),
  metadata JSONB
);

CREATE INDEX ON knowledge_base
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 50);

Typical Workflow

  1. User query arrives via n8n webhook
  2. Generate embedding with OpenAI API (ada-002)
  3. Semantic search in knowledge_base + conversation history:
    SELECT kb.title, kb.content, kb.metadata,
           1 - (kb.embedding <=> $1) as similarity
    FROM knowledge_base kb
    WHERE kb.category = 'faq'
      AND kb.metadata->>'active' = 'true'
    ORDER BY kb.embedding <=> $1
    LIMIT 5;
  4. Context assembly with results + user history (JOIN)
  5. LLM call (Claude/GPT-4) with enriched context
  6. Store interaction with new embedding for future context
Production performance: P95 retrieval latency 180ms (embedding generation: 80ms, semantic search: 45ms); the LLM call adds ~1.2s on top. DB costs: $80/month VPS (PostgreSQL 17 + pgvector).
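Step 4 of the workflow, context assembly, can be sketched as a pure function. The `Doc` type and the prompt layout below are illustrative assumptions, not the production code:

```python
from dataclasses import dataclass

@dataclass
class Doc:
    title: str
    content: str
    similarity: float

def assemble_context(docs, history, max_docs=5):
    """Build an LLM context block from retrieved docs plus recent history.

    Keeps only the top `max_docs` results, ordered by similarity,
    mirroring the ORDER BY ... LIMIT in the SQL above.
    """
    ranked = sorted(docs, key=lambda d: d.similarity, reverse=True)[:max_docs]
    kb_block = "\n\n".join(f"## {d.title}\n{d.content}" for d in ranked)
    history_block = "\n".join(f"- {turn}" for turn in history)
    return f"Relevant knowledge:\n{kb_block}\n\nRecent conversation:\n{history_block}"

docs = [Doc("Refunds", "Refunds take 5 days.", 0.91),
        Doc("Shipping", "We ship worldwide.", 0.74)]
print(assemble_context(docs, ["User asked about refunds"]))
```

The returned string becomes the system context for the LLM call in step 5; the only logic that matters is that ranking and truncation happen in one place, so the SQL `LIMIT` and the prompt budget stay consistent.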

Setup: PostgreSQL + pgvector in Production

Production-ready deployment takes 1-2 hours:

Option 1: Docker (Development/Small Scale)

docker run -d \
  --name postgres-pgvector \
  -e POSTGRES_PASSWORD=secret \
  -p 5432:5432 \
  -v pgdata:/var/lib/postgresql/data \
  pgvector/pgvector:pg17

Option 2: Managed VPS (Production)

Hetzner/DigitalOcean VPS with PostgreSQL 17:

# Install pgvector
sudo apt install postgresql-17-pgvector

# Enable extension
sudo -u postgres psql -d mydb -c "CREATE EXTENSION vector;"

# Configure for vector workloads
# postgresql.conf:
shared_buffers = 2GB
effective_cache_size = 6GB
maintenance_work_mem = 512MB
max_parallel_workers_per_gather = 4

Indexing Strategy

For tables with >100K embeddings, use an IVFFlat index:

-- Lists = sqrt(total_rows) is a good starting point
CREATE INDEX ON embeddings_table
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 316);  -- for ~100K rows

-- For large tables (>1M rows), consider HNSW (pgvector 0.5.0+)
CREATE INDEX ON embeddings_table
USING hnsw (embedding vector_cosine_ops);
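The sqrt heuristic above is easy to script when provisioning indexes. The `probes` rule of thumb below is a common community guideline, not something from this article; it is set at query time with `SET ivfflat.probes`:

```python
import math

def ivfflat_lists(row_count: int) -> int:
    """lists ~ sqrt(row_count), the starting point used above."""
    return max(1, round(math.sqrt(row_count)))

def ivfflat_probes(lists: int) -> int:
    """Rule of thumb: probes ~ sqrt(lists); higher = better recall, slower."""
    return max(1, round(math.sqrt(lists)))

lists = ivfflat_lists(100_000)
print(lists)                  # 316, matching the SQL example above
print(ivfflat_probes(lists))  # 18
```

At query time you would then run `SET ivfflat.probes = 18;` before the similarity search to trade a little latency for recall.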

Monitoring

-- Query performance
EXPLAIN ANALYZE
SELECT * FROM documents
ORDER BY embedding <=> '[...]'
LIMIT 10;

-- Index usage stats
SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE indexname LIKE '%embedding%';

Real Use Cases

1. RAG (Retrieval Augmented Generation)

Most common use case: enrich LLM prompts with relevant context. pgvector stores knowledge base, queries search for top-k similar documents.

2. Semantic Search in SaaS

Replace traditional full-text search with semantic search. Example: "find similar support tickets" without keyword matching.

3. Recommendation Systems

User embeddings + item embeddings = recommendations via cosine similarity. SQL JOINs allow complex filters (price range, category, etc.)
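The filter-then-rank pattern can be sketched in a few lines of Python; the item fields (`price`, `embedding`) are illustrative, and in production the filter would be a SQL WHERE clause next to the `<=>` ORDER BY:

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def recommend(user_vec, items, max_price, top_k=3):
    """Filter by a business rule (price), then rank by embedding similarity."""
    eligible = [it for it in items if it["price"] <= max_price]
    return sorted(eligible,
                  key=lambda it: cosine_sim(user_vec, it["embedding"]),
                  reverse=True)[:top_k]

items = [
    {"id": "cheap_match", "price": 10, "embedding": [0.9, 0.1]},
    {"id": "pricey_match", "price": 200, "embedding": [1.0, 0.0]},
    {"id": "cheap_off", "price": 5, "embedding": [0.0, 1.0]},
]
recs = recommend([1.0, 0.0], items, max_price=50, top_k=2)
print([it["id"] for it in recs])  # ['cheap_match', 'cheap_off']
```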

4. Duplicate Detection

Detect duplicate/similar content: document embeddings, search for pairs with similarity > threshold.
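The threshold logic can be made explicit with a toy pairwise sketch. In production this would be a pgvector self-join over the embeddings table; the all-pairs Python version below is only to show what "similarity > threshold" means:

```python
import math
from itertools import combinations

def cosine_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def find_duplicates(embeddings, threshold=0.95):
    """Return id pairs whose embeddings exceed the similarity threshold."""
    return [(i, j)
            for (i, va), (j, vb) in combinations(embeddings.items(), 2)
            if cosine_sim(va, vb) > threshold]

embs = {"a": [1.0, 0.0], "b": [0.99, 0.01], "c": [0.0, 1.0]}
print(find_duplicates(embs))  # [('a', 'b')]
```

The all-pairs loop is O(n²) and only viable for toy data; the index-backed SQL version is what makes this workable at scale.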

Common pattern: combine semantic search (pgvector) with business-logic filters (SQL WHERE). In specialized vector DBs this is awkward at best.

Migration: Pinecone → pgvector

I've migrated 2 projects from Pinecone to pgvector. Standard process:

  1. Export from Pinecone (API batch fetch, 10K vectors/call)
  2. Schema design in PostgreSQL with appropriate metadata columns
  3. Bulk insert with COPY command (1M embeddings: ~5 min)
  4. Create indexes (IVFFlat, ~10 min for 1M rows)
  5. Update application code (SQL queries vs Pinecone SDK)
  6. Parallel testing: validate equivalent recall
  7. Cutover: turn off Pinecone, monitor performance

Total timeline: 2-3 days for 1M+ embeddings. Immediate ROI: $70-240/month saved.
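Step 1 of the migration reduces to batching vector IDs for the export calls. The batch size below follows the 10K-vectors-per-call figure above; the exact API limits and the downstream COPY step are assumptions for illustration:

```python
def batched(ids, batch_size=10_000):
    """Split vector IDs into fetch batches for the Pinecone export step.

    Each batch becomes one fetch call; fetched vectors are then written
    to a CSV/binary file for PostgreSQL's COPY bulk load.
    """
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]

ids = [f"vec-{i}" for i in range(25_000)]
sizes = [len(b) for b in batched(ids)]
print(sizes)  # [10000, 10000, 5000]
```

Keeping the export resumable per batch matters more than speed: a 50M-vector export will hit rate limits, and you want to restart from the last completed batch, not from zero.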

Conclusion

Specialized vector databases have their place for extreme-scale or latency-critical applications. But for the vast majority of B2B AI applications, PostgreSQL + pgvector offers:

  • ✅ Equivalent performance (<50ms P95)
  • ✅ $0 extra cost over existing infra
  • ✅ Unified data (embeddings + metadata in same DB)
  • ✅ Full SQL (JOINs, analytics, complex queries)
  • ✅ No vendor lock-in (standard PostgreSQL)

Eighteen months running a multi-agent AI system in production on pgvector, 500K users, zero downtime. The architecture works.

Need to implement AI with pgvector?

I offer specialized consulting in self-hosted AI architectures with PostgreSQL + pgvector. From diagnostic ($2K) to complete implementation ($12-20K).