
Why Hybrid Architecture > Cloud-Only

Private VPS + Cloud APIs: the combination that cuts AI costs by 70-80% while maintaining compliance and performance

Alejandro Valencia · 8 min read

For 18 months I've operated a multi-agent AI system in production for SeducSer (500K+ active users). The most valuable lesson: hybrid architecture (private VPS + Cloud APIs) beats cloud-only in costs, compliance, and control.

This goes against the current "everything to the cloud" mantra, but the numbers don't lie.

The Problem with Cloud-Only

Most AI consultants use cloud APIs exclusively (OpenAI, Anthropic, etc.). This works for prototypes, but in production you face:

  • Linearly scaling costs: $0.03 per 1K tokens adds up fast with 500K users
  • Vendor lock-in: Switching providers requires rewriting prompts and logic
  • Compliance blockers: Sensitive data leaves your infrastructure
  • Variable latency: You depend on external APIs for critical operations

Real example: A client was processing 2M queries/month using GPT-4 alone. Monthly cost: $6,000. After moving to a hybrid approach: $1,200/month (-80%).
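The arithmetic behind that drop is simple enough to sketch. The function below is a back-of-the-envelope model; the $0.03/1K-token price and 100 tokens/query are illustrative assumptions, not current list prices:

```python
def monthly_cost(queries: int, tokens_per_query: int,
                 cloud_price_per_1k: float,
                 local_share: float = 0.0,
                 vps_fixed: float = 0.0) -> float:
    """One month of traffic: a fixed VPS fee plus cloud-token costs
    for the share of queries the VPS does NOT handle."""
    cloud_queries = queries * (1 - local_share)
    token_cost = cloud_queries * tokens_per_query / 1000 * cloud_price_per_1k
    return vps_fixed + token_cost

# Cloud-only: every query pays the per-token price.
cloud_only = monthly_cost(2_000_000, 100, 0.03)                      # $6,000
# Hybrid: 80% handled locally for a fixed $80/month VPS fee.
hybrid = monthly_cost(2_000_000, 100, 0.03, local_share=0.8, vps_fixed=80)
```

Because the VPS fee is fixed, the savings widen as volume grows: the cloud-only curve is linear in queries, while the hybrid curve is linear only in the (much smaller) escalated share.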

The Solution: Hybrid Architecture

The concept is simple but powerful: Private VPS handles routine queries, Cloud APIs process complex cases.

Key Components

🖥️ Private VPS

  • Ollama with open-source models (Llama 3, Mistral)
  • PostgreSQL + pgvector for context
  • n8n for orchestration
  • Cost: ~$40-80/month fixed

☁️ Cloud APIs

  • Claude Sonnet 4 for complex reasoning
  • GPT-4 as fallback
  • Only when VPS can't resolve
  • Cost: Variable, ~20-30% of total volume

🔀 Intelligent Router

  • Analyzes query complexity
  • Routing based on model confidence
  • Automatic fallback if VPS fails
  • Implemented in n8n
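As a rough illustration of the pre-routing step, a complexity heuristic like the one the n8n flow applies could look like this in Python. The word-count cutoff and keyword markers are hypothetical stand-ins, not the production rules:

```python
ROUTE_VPS = "vps"
ROUTE_CLOUD = "cloud"

# Words that usually signal multi-step reasoning (illustrative list).
COMPLEX_MARKERS = ("why", "compare", "plan", "recommend", "explain")

def route(query: str, max_local_words: int = 60) -> str:
    """Heuristic pre-router: short, routine queries stay on the VPS;
    long or reasoning-heavy ones go straight to the cloud API."""
    words = query.lower().split()
    if len(words) > max_local_words:
        return ROUTE_CLOUD          # long prompts -> complex generation
    if any(marker in words for marker in COMPLEX_MARKERS):
        return ROUTE_CLOUD          # reasoning keywords -> escalate
    return ROUTE_VPS                # default: cheap local model
```

In production this sits in front of a second, confidence-based check: even a query routed to the VPS can still be escalated if the local model's answer scores below threshold.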

When to Use Each Layer

Private VPS (70-80% of queries)

  • Frequent and repetitive queries
  • Sensitive data (names, emails, financial)
  • Critical low latency (<500ms)
  • Cases where historical context is key

Cloud APIs (20-30% of queries)

  • Complex multi-step reasoning
  • Long content generation
  • Edge/unusual cases
  • When VPS confidence score < threshold

Real use case (SeducSer): 70% of user queries (FAQs, order status, tracking) are resolved by local Llama 3. The remaining 30% (personalized advice, complex cases) go to Claude. Savings: 75% on AI costs.

100% Remote Deployment

Key advantage: No physical access to infrastructure needed. Everything is managed remotely via SSH/APIs:

  • VPS provisioning: DigitalOcean/Hetzner API
  • Automated setup: Ansible playbooks
  • Deployment: Docker + CapRover
  • Monitoring: Prometheus + Grafana cloud

This allows scaling operations without on-site visits, critical for independent consultants.

Compliance and Security

Sensitive data never leaves the VPS. This is crucial for:

  • GDPR (EU): Data on controlled servers
  • HIPAA (US Healthcare): PHI on private infrastructure
  • PCI-DSS (Payments): Local financial data

Cloud APIs only receive anonymized queries without PII (Personally Identifiable Information).
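A minimal sketch of that scrubbing step, assuming regex-based redaction is sufficient for your data. Real pipelines also need to catch names, addresses, and locale-specific formats, typically with an NER model on the VPS:

```python
import re

# Illustrative patterns only: emails and phone-like digit runs.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\+?\d[\d\s-]{7,}\d")

def scrub(text: str) -> str:
    """Replace obvious PII with placeholders before the query
    leaves the VPS for a cloud API."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text
```

The placeholders keep the query useful to the cloud model ("the user gave [EMAIL]") while guaranteeing the raw identifier never crosses the infrastructure boundary.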

Real Numbers: SeducSer

  • 500K+ active users
  • 18 months in production
  • 75% AI cost reduction
  • <500ms P95 latency (VPS)

Tech stack: Ollama (Llama 3 8B) on $80/month VPS + Claude API ~$300/month = $380/month total. Cloud-only would have cost $1,800/month.

Honest Trade-offs

Hybrid architecture isn't magic. It has costs:

  • Operational complexity: You maintain 2 systems (VPS + Cloud)
  • Longer initial setup: 2-3 days vs 30 min cloud-only
  • Requires DevOps expertise: Docker, nginx, monitoring
  • Local models less capable: Llama 3 < GPT-4 in reasoning

When NOT to use it: quick prototypes, volumes under 100K queries/month, or when you lack the technical expertise in-house.

Practical Implementation

If you want to replicate this approach, the minimum stack is:

  1. VPS: 8GB RAM, 4 vCPUs (~$40/month on Hetzner)
  2. Ollama: Docker container with Llama 3 8B
  3. PostgreSQL + pgvector: For context and embeddings
  4. n8n: Orchestration and routing logic
  5. Cloud API: Claude or GPT-4 for fallback
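Gluing the stack together, the VPS-first call pattern can be sketched as below. `call_local` and `call_cloud` are hypothetical stand-ins for real HTTP clients (e.g. a POST to Ollama's `/api/generate` endpoint and a Claude API client); injecting them keeps the escalation logic testable without a live VPS:

```python
def answer(query: str, call_local, call_cloud,
           min_confidence: float = 0.7) -> dict:
    """Try the local model first; escalate to the cloud API if the VPS
    fails entirely or reports low confidence in its answer."""
    try:
        local = call_local(query)   # e.g. POST http://vps:11434/api/generate
        if local.get("confidence", 0.0) >= min_confidence:
            return {"source": "vps", "text": local["text"]}
    except Exception:
        pass  # VPS down or timed out: fall through to the cloud
    cloud = call_cloud(query)       # e.g. Claude API call
    return {"source": "cloud", "text": cloud["text"]}
```

Note that the cloud path doubles as the availability fallback: the same branch handles "local model unsure" and "local model unreachable", which is what makes the hybrid setup safe to run unattended.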

Setup time: 2-3 days first time, then replicable in hours.

Conclusion

Hybrid architecture isn't for everyone. But if you operate AI systems in production with significant volume, need real compliance, or want control over your costs, it's the only sustainable strategy.

The industry is obsessed with "serverless" and "cloud-first". But for applications with predictable traffic and compliance requirements, VPS + Cloud is objectively superior.

18 months in production with 500K users prove it.

Want to implement hybrid architecture in your system?

I offer technical consulting to design and implement hybrid AI systems. From diagnostic workshops ($2K) to complete implementation ($12K-20K).

View Services