AI Engineering · 13 min read

Vector Database Comparison 2026: Pinecone, Qdrant, pgvector

By Ergini, Software & AI Developer in Pristina, Kosovo

TL;DR

The $600 per month Pinecone bill is the crossover where self-hosting becomes cheaper. I have shipped all four - Pinecone, Qdrant, Weaviate, and pgvector - in production. Here are the real benchmarks, hidden costs, and the decision matrix.

Direct verdict: which vector database to pick in 2026

For most teams shipping retrieval augmented generation in 2026, start with pgvector. It runs inside the Postgres you already have, costs effectively nothing under 5 million vectors, indexes well with HNSW, and gives you joins, filters, and transactions for free. The most expensive mistake I see weekly on Upwork is teams reaching for Pinecone at 80,000 chunks because a blog told them to. They pay $70 per month for what a $10 per month Supabase instance could serve in single-digit milliseconds.

Move off pgvector when one of three things is true: you cross roughly 50 million vectors and your p95 latency starts climbing, your filtering patterns get exotic enough that the planner stops using your index, or your team simply does not want to own a database. At that point, Qdrant is my default - Cloud if you want managed, self-hosted if you are comfortable with Kubernetes or a managed VPS. Pinecone is the right call when zero ops is a hard requirement and you have budget. Weaviate wins when you want hybrid retrieval to be a single primitive instead of a code project. Everything else in this post is the detail behind those four sentences.

The 4 vector DBs that actually ship in 2026

I have shipped production RAG on all four of these for paying clients, plus pgvector for my own products. The table below is the cheat sheet I use when scoping a new project. Costs are rough monthly figures for 100 million 1536-dimension vectors at modest QPS - your mileage will vary, but the relative ordering holds.

| Database | Best for | Hosting | Filtering | Hybrid search | Cost per 100M vectors per month | Ops overhead | Standout feature |
|---|---|---|---|---|---|---|---|
| Pinecone Serverless | Zero-ops production RAG | Fully managed | Solid, metadata-typed | Sparse-dense via separate index | $1,200 to $2,400 | None | Genuinely zero ops, mature SDKs |
| Qdrant Cloud | Heavy filtering at scale | Managed | Best in class, payload indexes | Native sparse-dense vectors | $700 to $1,400 | Low | Quantization cuts cost 3-4x |
| Qdrant self-hosted | Cost-sensitive scale | You run it | Best in class | Native sparse-dense | $250 to $600 (VPS) | Medium | Same engine as Cloud, half the price |
| Weaviate | Hybrid retrieval out of the box | Cloud or self-host | Good, BM25-aware | First-class, one parameter | $800 to $1,800 | Medium | Hybrid is a query type, not a project |
| pgvector (Supabase or Neon) | Up to 50M vectors, mixed workloads | Managed Postgres | Full SQL, can defeat index | Via Postgres FTS join | $200 to $500 (mostly compute) | Low if managed | Lives inside your Postgres |
| Chroma (mention) | Prototyping, notebooks | Self-host or Cloud beta | Functional | Limited | n/a at scale | Low | Easiest local dev experience |

Chroma is on the list because it keeps coming up - it is genuinely the nicest local development experience and I use it for spike work. I have not run it in production for a paying client and I would not recommend you do either in 2026. The operational story is still too young for anything load-bearing.

The decision tree (5 questions, one answer)

If you skip the rest of this post, do this exercise. It is the same sequence I run with clients on the first call before we touch any infrastructure.

  1. Are you under 1 million vectors today and likely to stay under 10 million for 12 months? Use pgvector. Stop reading. The operational simplicity is worth more than any benchmark difference at this scale.
  2. Are you above 100 million vectors or will you be inside 12 months? Pinecone Serverless if budget is fine and ops time is scarce. Qdrant self-hosted if you have an SRE in the building. Skip pgvector - you will fight it.
  3. Do you need to filter on more than 3 metadata fields per query, including high-cardinality fields like tenant_id? Qdrant is the strongest pick. Its payload index design handles this better than anything else I have shipped. Pinecone is fine until you hit very high-cardinality filters, then it surprises you.
  4. Do you need true hybrid retrieval, BM25 plus vector, weighted, with a tunable alpha? Weaviate is the fastest path. Qdrant with sparse-dense is the most flexible. pgvector can do it with Postgres full-text search but you write the join. Avoid Pinecone for native hybrid in 2026 - the sparse-dense story is still clunky.
  5. Does your team have real Postgres operational experience? If yes, pgvector for almost anything under 50 million vectors is the right call. If no, you are paying tax for a new database either way - pick the managed option that fits the other criteria.

Run those five questions in order. They will collapse to one or two candidates in under five minutes.
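If you want the checklist in a form you can drop into a scoping script, the five questions translate into a small function. This is only a sketch. The `Answers` field names are hypothetical, and the thresholds are my reading of the list above rather than a formal rubric.

```typescript
// Sketch: the five-question decision tree as code.
// Field names are hypothetical; thresholds mirror the list above.
type Answers = {
  vectorsNow: number;             // vectors in the corpus today
  vectorsIn12Months: number;      // projected count a year out
  budgetIsFine: boolean;          // can you afford a managed premium?
  hasSre: boolean;                // someone who can own a self-hosted database
  filterFieldsPerQuery: number;   // metadata fields filtered on per query
  highCardinalityFilters: boolean; // e.g. tenant_id with many distinct values
  needsTunableHybrid: boolean;    // weighted BM25 + vector retrieval
  strongPostgresOps: boolean;     // real Postgres operational experience
};

function pickVectorDb(a: Answers): string[] {
  const peak = Math.max(a.vectorsNow, a.vectorsIn12Months);

  // Q1: under 1M today and under 10M within 12 months -> pgvector, stop.
  if (a.vectorsNow < 1_000_000 && a.vectorsIn12Months < 10_000_000) return ["pgvector"];

  // Q2: above 100M now or within 12 months -> skip pgvector.
  if (peak > 100_000_000) {
    const picks: string[] = [];
    if (a.budgetIsFine) picks.push("Pinecone Serverless");
    if (a.hasSre) picks.push("Qdrant self-hosted");
    // If neither applies, both stay on the table.
    return picks.length > 0 ? picks : ["Pinecone Serverless", "Qdrant self-hosted"];
  }

  // Q3: more than 3 filter fields, including high-cardinality ones -> Qdrant.
  if (a.filterFieldsPerQuery > 3 && a.highCardinalityFilters) return ["Qdrant"];

  // Q4: tunable hybrid -> Weaviate (fastest path) or Qdrant (most flexible).
  if (a.needsTunableHybrid) return ["Weaviate", "Qdrant"];

  // Q5: solid Postgres ops and under 50M -> pgvector; otherwise a managed option.
  if (a.strongPostgresOps && peak < 50_000_000) return ["pgvector"];
  return ["Qdrant Cloud", "Pinecone Serverless"];
}
```

The early returns keep the "in order" property. Once a question settles the answer, the later ones never run.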

pgvector deep dive: why it wins for most teams

pgvector is the Postgres extension that adds a vector type, distance operators, and ANN indexes. It is boring in the best possible sense. The reason I default to it for almost every new project is that it removes an entire system from your architecture diagram. Your application data, your auth, your embeddings, and your retrieval all live in the same database, behind the same backups, in the same transactions.

On HNSW indexes with 1536-dimension OpenAI embeddings on a modest Supabase Pro instance, here is what I have measured in production:

| Vector count | Index build time | p50 query latency | p95 query latency | Storage |
|---|---|---|---|---|
| 1 million | ~6 minutes | 9 ms | 22 ms | ~7 GB |
| 10 million | ~70 minutes | 18 ms | 48 ms | ~70 GB |
| 50 million | ~8 hours | 42 ms | 140 ms | ~350 GB |

Those numbers assume HNSW with m = 16 and ef_construction = 64, which is the default I run. ivfflat is the older index type and I still use it for the rare case where the data is mostly static and read-heavy - it builds faster and uses less memory but the recall and latency profile is worse for typical RAG workloads.
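For reference, here is a small helper that generates the DDL those settings correspond to. It is a sketch only: `chunks` and `embedding` are hypothetical names. The operator class (`vector_cosine_ops`), the `m` / `ef_construction` / `lists` options, and the `hnsw.ef_search` setting are pgvector's own. The ivfflat list count follows the starting point suggested in the pgvector README.

```typescript
// Sketch: build pgvector index DDL for the settings described above.
// Table/column names are hypothetical. They are interpolated into SQL, not bound
// as parameters, so only pass trusted identifiers.
type HnswOptions = { m: number; efConstruction: number };

// m = 16, ef_construction = 64 are also pgvector's own HNSW defaults.
function hnswIndexSql(table: string, column: string, opts: HnswOptions = { m: 16, efConstruction: 64 }): string {
  // vector_cosine_ops pairs with the <=> cosine-distance operator used at query time.
  return (
    `CREATE INDEX ON ${table} USING hnsw (${column} vector_cosine_ops) ` +
    `WITH (m = ${opts.m}, ef_construction = ${opts.efConstruction});`
  );
}

// README starting point: rows / 1000 up to 1M rows, sqrt(rows) above that.
function suggestedIvfflatLists(rows: number): number {
  return Math.max(1, Math.round(rows <= 1_000_000 ? rows / 1000 : Math.sqrt(rows)));
}

function ivfflatIndexSql(table: string, column: string, rows: number): string {
  // Build ivfflat after loading data: its cluster centers come from existing rows.
  return `CREATE INDEX ON ${table} USING ivfflat (${column} vector_cosine_ops) WITH (lists = ${suggestedIvfflatLists(rows)});`;
}

// Query-time recall knob for HNSW (pgvector default: 40). Higher means better recall but slower queries.
function hnswEfSearchSql(efSearch: number): string {
  return `SET hnsw.ef_search = ${efSearch};`;
}
```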

Where pgvector breaks: very large metadata filters that defeat the index and force a sequential scan, very high QPS workloads above a few hundred queries per second on a single primary, and multi-tenant designs where tenant cardinality explodes into the millions. If any of those describe you, plan the migration before you need it. For everyone else, the cost-benefit of staying inside one database is enormous. This is the foundation I cover in detail in the production RAG architecture guide.

Qdrant deep dive: the filtering and quantization king

Qdrant is the database I reach for when pgvector stops being the right answer. It is written in Rust, has a clean REST and gRPC API, ships as a single binary or a Docker image, and the Cloud product is priced sensibly. Most importantly, its filtering design is the best in the category.

Qdrant lets you build payload indexes on metadata fields - keyword, integer, float, geo, datetime - and the query planner uses them together with the vector index. In practice this means a query like "find the top 10 most similar chunks where tenant_id = 421 and published_at > last week and source in (kb, support)" runs in the same millisecond range as a pure vector query. Pinecone handles this well at moderate scale but starts to surprise you when filters are very selective; pgvector falls back to sequential scans when the planner stops trusting the index. Qdrant just works.
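To make that example query concrete, here is a sketch of the REST bodies involved. The field names (`tenant_id`, `published_at`, `source`) are hypothetical. The filter shapes follow Qdrant's filter JSON: `match.value` for equality, `match.any` for IN, and `range` with RFC 3339 strings for datetimes.

```typescript
// Sketch: Qdrant REST bodies for "tenant_id = 421, published after a date,
// source in (kb, support)". Field names are hypothetical.
type Range = { gt?: number | string; gte?: number | string; lt?: number | string; lte?: number | string };
type Condition =
  | { key: string; match: { value: string | number | boolean } }
  | { key: string; match: { any: (string | number)[] } }
  | { key: string; range: Range };
type QdrantFilter = { must?: Condition[]; should?: Condition[]; must_not?: Condition[] };
type FieldSchema = "keyword" | "integer" | "float" | "bool" | "geo" | "datetime";

// Body for PUT /collections/{collection}/index, one call per field you filter on.
function payloadIndexBody(field: string, schema: FieldSchema) {
  return { field_name: field, field_schema: schema };
}

// Body for POST /collections/{collection}/points/search.
function filteredSearchBody(vector: number[], tenantId: number, since: Date, sources: string[], limit = 10) {
  const filter: QdrantFilter = {
    must: [
      { key: "tenant_id", match: { value: tenantId } },            // exact match
      { key: "published_at", range: { gt: since.toISOString() } }, // datetime range (RFC 3339)
      { key: "source", match: { any: sources } },                  // IN (...)
    ],
  };
  return { vector, filter, limit, with_payload: true };
}
```

In this sketch you would index `tenant_id` as `integer`, `published_at` as `datetime`, and `source` as `keyword`, so the planner has an index for every condition in the `must` list.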

The other Qdrant superpower is quantization. Scalar quantization to int8 cuts memory cost by 4x with negligible recall loss for most embeddings. Binary quantization is more aggressive - roughly 32x reduction - and is genuinely usable in 2026 for rerank-after-recall pipelines. I have a client running 180 million vectors on a single Qdrant node with binary quantization plus an exact-rescore step, and the cost is roughly 70% lower than the equivalent Pinecone Serverless deployment.
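The 4x and 32x figures come straight from bits per dimension: 32 for float32, 8 for int8, and 1 for binary. Applied to the 180 million vector example, raw float32 vectors come to about 1.1 TB, while the binary copy is about 35 GB. Both figures are before HNSW links and payload. The config objects are Qdrant's `quantization_config` and search-time `quantization` params as I understand them, so treat them as a sketch to check against the docs.

```typescript
// Sketch: raw vector memory per quantization mode (ignores HNSW links and payload).
type Precision = "float32" | "int8" | "binary";
const BITS_PER_DIMENSION: Record<Precision, number> = { float32: 32, int8: 8, binary: 1 };

function bytesPerVector(dims: number, precision: Precision): number {
  return Math.ceil((dims * BITS_PER_DIMENSION[precision]) / 8);
}

function rawVectorGB(count: number, dims: number, precision: Precision): number {
  return (count * bytesPerVector(dims, precision)) / 1e9; // decimal GB
}

// Qdrant collection setting (quantization_config) for each mode.
const scalarInt8Config = { scalar: { type: "int8", quantile: 0.99, always_ram: true } };
const binaryConfig = { binary: { always_ram: true } };

// Search-time params for the exact-rescore step: pre-select oversampling x limit
// candidates with quantized vectors, then rescore them with the original vectors.
const rescoreSearchParams = { quantization: { rescore: true, oversampling: 2.0 } };
```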

Cost crossover: Qdrant Cloud beats Pinecone Serverless on raw dollars from roughly the $400 per month mark upward. Self-hosted Qdrant on a Hetzner or Vultr VPS beats both from the first dollar, but you pick up backup, upgrade, monitoring, and on-call as your problem. Whether that is a good trade depends entirely on whether you have the team for it.

Pinecone deep dive: zero ops, mature, expensive

Pinecone is the original commercial vector database and it shows. The SDK is the cleanest, the dashboard is the most polished, the docs are the best, and the operational story is genuinely zero. You do not think about backups, upgrades, replication, or sharding. You write to an index and you query an index. That is it.

Pinecone Serverless, introduced in 2024 and matured through 2025, changed the pricing math for small workloads. Below roughly 10 million vectors and modest QPS, you can run on Serverless for $30 to $80 per month, which is a fair price for the operational freedom. Above that point, costs climb faster than people expect because you pay separately for storage, reads, and writes. A typical production RAG workload at 100 million vectors with 5 QPS lands in the $1,200 to $2,400 per month range, roughly 1.7 times the equivalent Qdrant Cloud deployment in the comparison table above.

When you should pay the premium: when your team is shipping product and the cost of one person spending one day per month on database operations is more than the price difference. For a US-based startup with one infrastructure engineer, that breakeven is honestly higher than most founders assume. Pay Pinecone, ship the product. For a team in Kosovo or elsewhere in Eastern Europe with cheaper engineering hours, the breakeven flips earlier - that is why most of the clients I work with end up on self-hosted Qdrant past the prototype stage. If you want help thinking through that tradeoff, this is the kind of work I cover under AI integration.
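The breakeven above is simple arithmetic, and it helps to write it down with your own numbers. This is a sketch. Every input is something you supply, and the dollar figures in the usage note are hypothetical, not vendor quotes.

```typescript
// Sketch: managed vs self-hosted total monthly cost, counting engineering time.
type CostInputs = {
  managedMonthly: number;    // managed bill, e.g. Pinecone Serverless
  selfHostedMonthly: number; // infrastructure bill, e.g. a VPS running Qdrant
  opsHoursPerMonth: number;  // backups, upgrades, monitoring, on-call
  hourlyRate: number;        // fully loaded cost of one engineering hour
};

function compareTco(c: CostInputs) {
  const selfHostedTotal = c.selfHostedMonthly + c.opsHoursPerMonth * c.hourlyRate;
  return {
    managedTotal: c.managedMonthly,
    selfHostedTotal,
    preferManaged: c.managedMonthly <= selfHostedTotal,
  };
}

// Hourly rate above which paying the managed premium is the cheaper choice.
function breakevenHourlyRate(c: Omit<CostInputs, "hourlyRate">): number {
  return (c.managedMonthly - c.selfHostedMonthly) / c.opsHoursPerMonth;
}
```

With one day (8 hours) of ops a month, a hypothetical $1,800 managed bill against $400 of VPS breaks even at $175 per engineering hour. Above that rate, pay for managed. Below it, self-host.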

Weaviate deep dive: hybrid retrieval as a primitive

Weaviate is the most opinionated of the four. It is not just a vector store - it is closer to a retrieval framework with a vector store inside it. The schema is typed, queries go through a GraphQL API, hybrid search is a first-class query type with a single alpha parameter between 0 and 1 that mixes BM25 and vector scores, and there are generative modules that let the database itself call an LLM after retrieval.

For teams who want one tool that handles the entire retrieval layer, Weaviate is the fastest path to a working system. The hybrid story alone is worth a serious look - in pgvector or Qdrant, getting BM25 and vector to play together properly is a project. In Weaviate, it is a query parameter.

Where Weaviate gets harder: the opinions cut both ways. The GraphQL API is great when you embrace it and awkward when you do not. The schema migrations are real work. The cluster topology for production is more involved than Qdrant or Pinecone. I have shipped two Weaviate deployments and both took longer to get to production than the equivalent Qdrant build. If hybrid is your core requirement, pay that cost - it pays back. If hybrid is a nice-to-have, pick something simpler.

The hidden costs nobody mentions

Every vector database benchmark you read focuses on latency and recall. The bill you get in month three is shaped by costs nobody writes about. Here is the list I run through before recommending any of these to a client.

Egress. If your vector DB is in a different cloud or region from your application, you pay for every byte of every query result that leaves. At 1,000 QPS with 10 results of 1536 floats each, you are pushing real money out the door per month. Co-locate or accept the bill.
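Here is the egress arithmetic for that scenario, assuming the full float32 vectors come back with every result. Many APIs return only ids and payload unless you ask for vectors, which shrinks this a lot. At 1,000 QPS that is about 61 MB per second, or on the order of 159 TB over a 30-day month. The per-GB price is a placeholder parameter, not any provider's rate.

```typescript
// Sketch: monthly egress when each query returns topK full vectors.
// Counts vector bytes only: no payload, ids, or protocol overhead.
function monthlyEgressBytes(qps: number, topK: number, dims: number, bytesPerDim = 4, days = 30): number {
  const bytesPerQuery = topK * dims * bytesPerDim;
  const secondsPerMonth = days * 24 * 60 * 60;
  return qps * bytesPerQuery * secondsPerMonth;
}

// pricePerGB is whatever your cloud charges for cross-region/internet egress.
function egressCostUSD(bytes: number, pricePerGB: number): number {
  return (bytes / 1e9) * pricePerGB;
}
```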

Embedding storage and reindex cost. A 1536-dim OpenAI embedding stored as float32 is 6 KB per vector, so 100 million vectors is roughly 600 GB for the vectors alone, before any index or metadata overhead. If you change embedding models - and you will, roughly every 18 months - you re-embed and reindex everything. Budget for an embedding API bill that is bigger than you think, especially with the patterns from the OpenAI API cost breakdown.
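The storage and re-embed numbers are worth recomputing for your own corpus. This sketch does both. Tokens per chunk and the price per million tokens are placeholders, not quoted prices.

```typescript
// Sketch: raw float32 vector storage and the API cost of one full re-embed.
function rawVectorBytes(count: number, dims: number): number {
  return count * dims * 4; // float32 = 4 bytes per dimension; no index or metadata
}

function reembedCostUSD(chunks: number, tokensPerChunk: number, pricePerMillionTokens: number): number {
  const totalTokens = chunks * tokensPerChunk;
  return (totalTokens / 1_000_000) * pricePerMillionTokens;
}

const perVectorBytes = rawVectorBytes(1, 1536);              // 6,144 bytes: the "6 KB"
const corpusGB = rawVectorBytes(100_000_000, 1536) / 1e9;     // about 614 GB
```

With an assumed 500 tokens per chunk and an assumed $0.02 per million tokens, one model change costs about $1,000 in embedding calls for 100 million chunks. Add the reindex compute on top.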

Multi-tenant isolation. If you are building a SaaS with tenant data in the same store, you need either a namespace per tenant (Pinecone), a collection per tenant (Qdrant), a class per tenant (Weaviate), or a tenant_id filter (pgvector and anywhere). Each model has different cost and isolation tradeoffs. Namespace-per-tenant is the safest pattern but only Pinecone makes it cheap.
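With the shared-store `tenant_id` model, the risk is a call site that forgets the filter. One pattern I like is wrapping the store so the tenant is injected on every query. This is a sketch against a trimmed copy of the interface from the migration section later in this post. The `tenant_id` metadata key is an assumption.

```typescript
// Sketch: a tenant-scoped view of a vector store. The tenant filter is added
// by the wrapper, so call sites cannot forget it or override it.
type Metadata = Record<string, string | number | boolean>;
type Filter = Record<string, unknown>;
type QueryOpts = { vector: number[]; topK: number; filter?: Filter };
type QueryResult = { id: string; score: number; metadata: Metadata };

interface Queryable {
  query(opts: QueryOpts): Promise<QueryResult[]>;
}

function scopedToTenant(store: Queryable, tenantId: string): Queryable {
  return {
    async query(opts: QueryOpts): Promise<QueryResult[]> {
      // Spread the caller's filter first, so tenant_id always wins.
      const filter: Filter = { ...(opts.filter ?? {}), tenant_id: tenantId };
      const results = await store.query({ ...opts, filter });
      // Defense in depth: drop anything that came back for another tenant.
      return results.filter((r) => r.metadata.tenant_id === tenantId);
    },
  };
}
```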

Backups. Managed services do this for you. Self-hosted means snapshots, S3 lifecycle, restore drills. I have watched a team lose three weeks of fine-tuned chunking work to a missing backup on a self-hosted Qdrant node. Do not be that team.

ANN vs exact mode. Every database here uses approximate nearest neighbor by default. Recall is typically 0.95 to 0.99 at top-k=10. For most RAG, that is fine. For legal, medical, or compliance workloads where missing a single relevant chunk is a real risk, you want either exact search (slow), an oversample-and-rerank pipeline, or a much higher ef_search parameter. None of this is in the marketing.
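The oversample-and-rerank option fits in a few lines. You ask the approximate index for more candidates than you need, rescore them with exact cosine similarity, and keep the true top k. In this sketch, `annSearch` is a hypothetical stand-in for whatever store you query, and it must return the stored vectors alongside the ids.

```typescript
// Sketch: oversample from the ANN index, then rescore exactly and keep top k.
type Candidate = { id: string; vector: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function oversampleAndRescore(
  query: number[],
  k: number,
  oversample: number, // e.g. 4 -> fetch 4k approximate candidates
  annSearch: (q: number[], limit: number) => Promise<Candidate[]>,
): Promise<{ id: string; score: number }[]> {
  const candidates = await annSearch(query, k * oversample); // cheap, approximate
  return candidates
    .map((c) => ({ id: c.id, score: cosine(query, c.vector) })) // exact rescore
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Oversampling only recovers chunks that made it into the candidate set. If recall is still short, raise the index's search-time parameter as well, for example `ef_search`.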

Real benchmarks: same dataset, same queries, 4 DBs

I ran a controlled comparison on a 5 million vector dataset of 1536-dimension OpenAI embeddings (a deduplicated subset of a client's knowledge base, anonymized). Same dataset, same 500 queries, top-k=10, single warm node per database, no quantization, HNSW or equivalent default index. The point is not to pick a winner - at this scale all four are good enough. The point is to show the shape.

| Database | p50 latency | p95 latency | Recall@10 | Index build time | Notes |
|---|---|---|---|---|---|
| pgvector (Supabase, m=16) | 14 ms | 38 ms | 0.96 | ~32 min | Single primary, shared compute |
| Qdrant 1.x (single node, 4 vCPU) | 8 ms | 21 ms | 0.98 | ~18 min | Fastest on this hardware |
| Pinecone Serverless | 22 ms | 55 ms | 0.97 | n/a (managed) | Latency includes network hop |
| Weaviate (single node, 4 vCPU) | 12 ms | 34 ms | 0.97 | ~25 min | Hybrid query adds ~6 ms |

Two honest caveats. First, your numbers will differ - workload shape, region, instance class, and dataset distribution all move these by 20 to 40 percent. Second, recall is the metric you should actually care about for RAG, not raw latency. A 5 ms query that misses the right chunk is worse than a 50 ms query that finds it. All four databases above are in the band where recall is production-acceptable; pick on the other axes.
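If you run this comparison yourself, recall is easy to measure once you have exact results to compare against. This sketch assumes the common ANN-benchmark definition: the share of the exact top k that the approximate index also returned, averaged over queries. How you produce the exact lists (a brute-force pass, or the database's exact mode) is up to you.

```typescript
// Sketch: recall@k = |approx top-k ∩ exact top-k| / |exact top-k|, averaged over queries.
function recallAtK(approx: string[], exact: string[], k: number): number {
  const truth = new Set(exact.slice(0, k));
  if (truth.size === 0) return 0;
  let hits = 0;
  for (const id of approx.slice(0, k)) if (truth.has(id)) hits++;
  return hits / truth.size;
}

function meanRecallAtK(runs: { approx: string[]; exact: string[] }[], k: number): number {
  if (runs.length === 0) return 0;
  const total = runs.reduce((sum, r) => sum + recallAtK(r.approx, r.exact, k), 0);
  return total / runs.length;
}
```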

Migration: how to switch DBs without rewriting your app

The most important architectural decision you will make is not which vector database to pick - it is whether to write your application against a vendor SDK directly or against a thin interface you control. I learned this the hard way on a 2023 project that pinned itself to Pinecone and took a full week to migrate to Qdrant when the bill became absurd. Now I write every project against a 20-line TypeScript interface and the migration is an afternoon.

Here is the pattern. Four methods cover 95% of real workloads:

```typescript
// lib/vector-store.ts
export interface VectorStore {
  upsert(items: VectorItem[]): Promise<void>;
  query(opts: QueryOpts): Promise<QueryResult[]>;
  delete(ids: string[]): Promise<void>;
  count(filter?: Filter): Promise<number>;
}

export type VectorItem = {
  id: string;
  values: number[];
  metadata: Record<string, string | number | boolean>;
};

export type QueryOpts = {
  vector: number[];
  topK: number;
  filter?: Filter;
};

export type QueryResult = {
  id: string;
  score: number;
  metadata: Record<string, string | number | boolean>;
};

export type Filter = Record<string, unknown>;
```

Implement that interface once per database. Your application code never imports the Pinecone or Qdrant SDK directly. When the bill gets painful, you write a new implementation and flip an environment variable. The same pattern applies to any agentic RAG pipeline where the retrieval step is one tool of many.
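The simplest implementation is an in-memory one. It is handy for unit tests, and it is enough for tiny corpora. This is a sketch: it uses brute-force cosine scoring, and it interprets `Filter` as exact equality on metadata keys. That interpretation is my assumption, since the interface leaves `Filter` open.

```typescript
// Sketch: in-memory VectorStore with brute-force cosine and equality filters.
type Metadata = Record<string, string | number | boolean>;
type VectorItem = { id: string; values: number[]; metadata: Metadata };
type Filter = Record<string, unknown>;
type QueryOpts = { vector: number[]; topK: number; filter?: Filter };
type QueryResult = { id: string; score: number; metadata: Metadata };

interface VectorStore {
  upsert(items: VectorItem[]): Promise<void>;
  query(opts: QueryOpts): Promise<QueryResult[]>;
  delete(ids: string[]): Promise<void>;
  count(filter?: Filter): Promise<number>;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Every filter key must equal the item's metadata value.
function matches(metadata: Metadata, filter?: Filter): boolean {
  if (!filter) return true;
  return Object.keys(filter).every((key) => metadata[key] === filter[key]);
}

class InMemoryVectorStore implements VectorStore {
  private items = new Map<string, VectorItem>();

  async upsert(items: VectorItem[]): Promise<void> {
    for (const item of items) this.items.set(item.id, item); // insert or overwrite by id
  }

  async query({ vector, topK, filter }: QueryOpts): Promise<QueryResult[]> {
    return Array.from(this.items.values())
      .filter((item) => matches(item.metadata, filter))
      .map((item) => ({ id: item.id, score: cosine(vector, item.values), metadata: item.metadata }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);
  }

  async delete(ids: string[]): Promise<void> {
    for (const id of ids) this.items.delete(id);
  }

  async count(filter?: Filter): Promise<number> {
    return Array.from(this.items.values()).filter((item) => matches(item.metadata, filter)).length;
  }
}
```

A pgvector or Qdrant implementation is the same four methods backed by a real client. Application code that only sees `VectorStore` does not notice the swap.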

Hybrid retrieval: how each one does BM25 + vector

Pure vector search misses keyword matches that BM25 catches easily. Pure BM25 misses semantic matches that vectors catch easily. Hybrid retrieval - running both and combining the scores - is the single biggest quality lift you can ship after basic RAG. Here is how each database handles it.

Weaviate exposes hybrid as a query type. A single alpha parameter mixes BM25 and vector scores: 0 is pure keyword, 1 is pure vector. This is the cleanest experience and is the reason teams pick Weaviate.
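Here is what an alpha blend does in practice. This sketch follows the idea behind Weaviate's relative score fusion: min-max normalize each result list to [0, 1], then weight by alpha and 1 - alpha. It is my own code, not Weaviate's implementation. The tie handling is an arbitrary choice.

```typescript
// Sketch: alpha-weighted fusion of a vector result list and a BM25 result list.
type Scored = { id: string; score: number };

function minMax(results: Scored[]): Map<string, number> {
  const out = new Map<string, number>();
  if (results.length === 0) return out;
  const scores = results.map((r) => r.score);
  const lo = Math.min(...scores);
  const hi = Math.max(...scores);
  // If every score is equal, treat them all as 1 (an arbitrary choice for this sketch).
  for (const r of results) out.set(r.id, hi === lo ? 1 : (r.score - lo) / (hi - lo));
  return out;
}

// alpha = 1 -> pure vector, alpha = 0 -> pure keyword.
function alphaFusion(vectorHits: Scored[], keywordHits: Scored[], alpha: number): Scored[] {
  const v = minMax(vectorHits);
  const k = minMax(keywordHits);
  const ids = new Set<string>(Array.from(v.keys()).concat(Array.from(k.keys())));
  return Array.from(ids)
    .map((id) => ({ id, score: alpha * (v.get(id) ?? 0) + (1 - alpha) * (k.get(id) ?? 0) }))
    .sort((a, b) => b.score - a.score);
}
```

Normalizing first matters because cosine similarities and BM25 scores live on very different scales. Without it, alpha would mostly measure which scorer produces bigger numbers.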

Qdrant has supported sparse and dense vectors in the same collection since the 1.7 release line. You generate sparse vectors with SPLADE or BM25, store them alongside your dense embeddings, and Qdrant fuses the results with reciprocal rank fusion or your own weighting. Most flexible, slightly more setup.
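Reciprocal rank fusion ignores raw scores and only uses positions. Each list contributes 1 / (k + rank) per document, with rank starting at 1. This is a client-side sketch. The k = 60 default is the constant commonly cited from the original RRF paper, so check what your database uses if you need to match its server-side results exactly.

```typescript
// Sketch: reciprocal rank fusion over any number of ranked id lists.
function rrf(lists: string[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, i) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1)); // rank = i + 1
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

The same function covers the client-side merge that Pinecone's two-index setup needs. Pass the ids from the dense query and from the sparse query as two lists.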

Pinecone supports sparse-dense via a separate index type. It works but the developer experience is rougher than the others - you maintain two indexes and combine results client-side or with a serverless reranker.

pgvector does hybrid via Postgres full-text search joined with the vector index. More code than the others, but uses tools you already have. Sample query:

```sql
-- Hybrid search in pgvector + Postgres FTS
-- $1 = query embedding, $2 = raw query text
WITH semantic AS (
  SELECT id, 1 - (embedding <=> $1) AS sem_score
  FROM chunks
  ORDER BY embedding <=> $1
  LIMIT 50
),
keyword AS (
  SELECT id,
    -- normalization flag 32 maps rank to rank / (rank + 1), i.e. into [0, 1)
    ts_rank(content_tsv, plainto_tsquery('english', $2), 32) AS kw_score
  FROM chunks
  WHERE content_tsv @@ plainto_tsquery('english', $2)
  ORDER BY kw_score DESC  -- the 50 best keyword matches, not 50 arbitrary ones
  LIMIT 50
)
SELECT c.id, c.content,
  COALESCE(s.sem_score, 0) * 0.7 +
  COALESCE(k.kw_score, 0) * 0.3 AS score
FROM semantic s
FULL OUTER JOIN keyword k ON k.id = s.id
JOIN chunks c ON c.id = COALESCE(s.id, k.id)
ORDER BY score DESC
LIMIT 10;
```

That is roughly 20 lines of SQL and it gives you a tunable hybrid retriever with no extra infrastructure. The 0.7 and 0.3 weights are starting points - tune them against an eval set, the same way you would tune Weaviate's alpha.

OmniAPI case study: pgvector all the way

OmniAPI is one of my products. It ships a developer-facing knowledge base with semantic search across API documentation and integration guides. The current corpus is roughly 4.2 million chunks. It has been on pgvector since day one, hosted on a single Supabase Pro instance with HNSW indexes, and there has been no reason to migrate.

Numbers: p50 retrieval latency is 11 ms, p95 is 31 ms, total Supabase bill including the application database is under $100 per month, and the vector index has been rebuilt twice in 18 months for embedding-model upgrades. The closest equivalent on Pinecone Serverless would cost roughly $180 per month at this traffic level. The closest equivalent on Qdrant Cloud would cost roughly $120. The savings are not the point - the operational simplicity is. There is one database to back up, one database to monitor, one set of credentials to rotate.

The migration plan if we ever need it is also clear. The retrieval layer is behind the 20-line interface above, and a Qdrant implementation lives in a feature flag for load testing. The day pgvector latency crosses the threshold I have set, we flip the flag. That is the dividend of designing for migration from day one.
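The "flip the flag" step is small. The configured backend name selects a factory from a registry. In this sketch, `VECTOR_BACKEND` and the store classes in the usage comment are hypothetical names. Failing fast on an unknown name keeps a typo from silently falling back to the wrong database.

```typescript
// Sketch: choose the VectorStore implementation from configuration.
type Factory<T> = () => T;

function selectBackend<T>(
  registry: Record<string, Factory<T>>,
  configured: string | undefined, // e.g. process.env.VECTOR_BACKEND
  fallback: string,
): T {
  const key = configured ?? fallback;
  // Own-property check so names like "toString" are not mistaken for backends.
  if (!Object.prototype.hasOwnProperty.call(registry, key)) {
    throw new Error(`Unknown vector backend "${key}"`);
  }
  return registry[key]();
}

// Usage (hypothetical implementations of the VectorStore interface):
// const store = selectBackend<VectorStore>(
//   { pgvector: () => new PgVectorStore(pool), qdrant: () => new QdrantStore(client) },
//   process.env.VECTOR_BACKEND,
//   "pgvector",
// );
```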

My picks by scenario

These are concrete recommendations I would give a friend at each stage, with the caveats baked in. If your situation is exotic, bring it to a scoping call - none of these are absolute.

MVP RAG (under 1M vectors): pgvector on Supabase or Neon. The same database that holds your users holds your embeddings. Total infra cost under $30 per month. Ship the product, see if anyone wants it, worry about scale later. This fits cleanly into the SaaS MVP stack I default to.

Production RAG up to 50M vectors: still pgvector if your Postgres team is solid. Move to Qdrant Cloud if you want the database off your plate or you need stronger filtering. Skip Pinecone unless the zero-ops requirement is genuine.

Multi-tenant SaaS RAG: Pinecone if budget is fine and namespaces fit your isolation model, Qdrant if you need collection-per-tenant with payload indexes, and pgvector with a tenant_id column and strict filter discipline if you have fewer than ~500 tenants, each of a reasonable size.

Heavy filtering RAG (more than 3 metadata fields per query, including high-cardinality): Qdrant. No close second. The payload index design is genuinely better than anything else in the category.

Zero-ops mandate: Pinecone Serverless. Pay the premium, ship the product, do not think about the database again. If your AI team is small and your application team is large, this is almost always the right call.

Hybrid retrieval as a hard requirement: Weaviate for the cleanest experience, Qdrant for the most flexibility. Both will outship a pgvector hybrid implementation by a couple of weeks for a team new to the pattern. If you are also building agents on top, the patterns in the AI agent development work I do tend to assume hybrid is already solved.

Cost-sensitive scale (Eastern Europe team economics): self-hosted Qdrant on a managed VPS. This is what most of the teams I work with as a hired AI developer in Kosovo end up running. The engineering hours are cheaper than the SaaS premium and the team enjoys owning the infrastructure.

Frequently asked questions

These are the questions I get most often when teams scope a RAG project with me. The answers are also embedded as FAQ structured data for search.

What is the best vector database in 2026?

There is no single best vector database. For most teams, pgvector on Supabase or Neon is the right starting point because it ships with your relational data and costs effectively nothing under 5 million vectors. Above 50 million vectors, with heavy filtering or strict latency, Qdrant Cloud or self-hosted Qdrant is the practical pick. Pinecone wins when zero ops is a hard requirement. Weaviate wins when you want hybrid retrieval baked in.

Is pgvector really good enough for production RAG?

Yes, up to roughly 50 million vectors with HNSW indexes, sensible chunk sizes, and reasonable query patterns. I have shipped production RAG on pgvector at 12 million vectors with p95 retrieval under 80 milliseconds. Where it breaks is large metadata filters that defeat the index, very high QPS workloads above a few hundred per second on a single instance, and multi-tenant cardinality explosions.

When does Pinecone become cheaper than self-hosting?

Almost never on raw infrastructure cost, often on total cost of ownership. Pinecone Serverless with the current pricing model is competitive up to about $400 to $600 per month, which is roughly where a small Qdrant cluster on a managed VPS becomes cheaper. Above that point, self-hosting Qdrant wins on dollars, but you take on backup, upgrade, monitoring, and on-call cost.

What is the difference between Qdrant and Weaviate?

Qdrant is the cleaner vector database with the strongest filtering story and the best quantization options. Weaviate is closer to an opinionated retrieval framework with built-in BM25, generative modules, and a GraphQL API. If you want a vector primitive you compose around, pick Qdrant. If you want one tool that handles hybrid retrieval out of the box, Weaviate is faster to ship.

Can I switch vector databases later without rewriting my app?

Yes, if you put a thin interface between your application and the vector store from day one. The four operations that matter are upsert, query, delete, and count, with filters passed through query and count. Wrap them behind 15 to 25 lines of TypeScript and you can swap Pinecone for Qdrant in an afternoon. Skip this and you will rewrite your retrieval layer twice.

Do I need a vector database for RAG?

Not always. For corpora under 100,000 chunks, an in-memory FAISS index or even a plain list scored by cosine similarity and sorted is enough. Vector databases earn their keep when you need persistence, filtering, multi-tenancy, or hybrid search, or once you cross roughly a million vectors. Below that scale, the operational overhead is hard to justify.

Which vector database supports hybrid search best?

Weaviate has the cleanest native experience because hybrid is a first-class query type with one alpha parameter. Qdrant added strong sparse-dense vector support and is now the most flexible. Pinecone supports sparse-dense via a separate index. pgvector does hybrid through Postgres full-text search joined with the vector index, which is more code but uses the database you already have.

What about Chroma, Milvus, LanceDB, and the newer entrants?

Chroma is great for prototyping but I would not run it in production yet. Milvus is powerful but operationally heavy and aimed at larger orgs. LanceDB has a beautiful file-based story and is interesting for analytical workloads, but the operational ecosystem around it is still young. For most production RAG in 2026, the four covered in this post are the safe picks.

Closing

The vector database category in 2026 is mature enough that any of the four databases here will get you to production. The difference between picking well and picking badly is measured in a couple of weeks of engineering time, a few hundred dollars per month in infrastructure, and the operational headache you take on as the corpus grows. Start with pgvector unless one of the five questions in the decision tree pushes you elsewhere. Build the migration interface from day one. Re-evaluate at 10 million vectors and again at 100 million. That is the whole playbook.