
pgvector vs Pinecone: When to Switch (and When Not To)

By Ergini, Software & AI Developer in Pristina, Kosovo

TL;DR

Under 50M vectors, pgvector wins on cost, ops, and filtering. Past 100M, Pinecone wins on hands-off scale. This is the decision tree, the cost crossover, and the migration runbook in both directions.

TL;DR verdict: which one, in one paragraph

Under 50 million vectors with metadata filtering and a team that knows Postgres, pgvector wins. It lives inside the database you already pay for, gives you joins and transactions for free, and the cost stays trivial up to roughly 50 million vectors. Above 100 million vectors with no appetite for database operations, Pinecone Serverless wins because the operational story is genuinely zero - you write to an index and you query an index. Between 50 and 100 million vectors, both are fine and the deciding factor is whether your team prefers paying dollars or paying engineering hours. The rest of this post is the cost math, the benchmarks, the migration scripts, and the pattern decisions behind that one paragraph. If you want the wider field - Qdrant, Weaviate, Chroma - that is in the vector database comparison guide.
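The verdict reduces to a three-band rule, sketched below in TypeScript. This is a minimal illustration, not a library API: `recommendStore`, `TeamPrefers`, and `StoreChoice` are hypothetical names, the thresholds are the ones quoted in this post, and the sketch deliberately ignores the qualifiers (metadata filtering, Postgres experience) that the rest of the post covers.

```typescript
// Illustrative only: all names are hypothetical; thresholds come from the verdict above.
type TeamPrefers = "spend-dollars" | "spend-engineering-hours";
type StoreChoice = "pgvector" | "pinecone-serverless";

const PGVECTOR_CEILING = 50_000_000; // under this, pgvector wins
const PINECONE_FLOOR = 100_000_000; // above this, Pinecone Serverless wins

function recommendStore(vectorCount: number, prefers: TeamPrefers): StoreChoice {
  if (vectorCount < PGVECTOR_CEILING) return "pgvector";
  if (vectorCount > PINECONE_FLOOR) return "pinecone-serverless";
  // Middle band: both are fine, so team shape decides.
  return prefers === "spend-dollars" ? "pinecone-serverless" : "pgvector";
}
```

For example, `recommendStore(75_000_000, "spend-engineering-hours")` lands in the middle band and returns `"pgvector"`.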

What actually changed in 2026

Most pgvector versus Pinecone posts still on the first page of Google were written in 2023 against Pinecone's per-pod pricing and pgvector's ivfflat-only era. Both products have moved a long way since, and the answer has changed with them. Three shifts matter for the comparison.

Pinecone killed per-pod pricing. Pinecone Serverless, introduced in 2024 and matured through 2025 and into 2026, replaced the old fixed-pod model with usage-based pricing for storage, reads, and writes. The floor is much lower - a tiny RAG index might be $5 per month instead of $70 - but the ramp past 10 million vectors is steep, and the bill is harder to predict because it scales with query traffic. The marketing now says "pay only for what you use," and that is true; the unstated consequence is that a viral week of traffic shows up on next month's invoice.

pgvector got HNSW and serious hybrid. pgvector 0.5 shipped HNSW indexes in late 2023 and the implementation has matured into 2026 to the point where it is the default for any new project. Latency at 10 million vectors dropped from ivfflat's 150 ms p95 range into the 40 to 60 ms range, and recall improved with it. Hybrid search via Postgres full-text search joined with the vector index is now a clean 15-line query instead of a research project. The planner behavior under selective filters is also much better - pgvector now happily combines a btree index on tenant_id with the HNSW vector index instead of falling back to a sequential scan.

Both are more competitive in the middle. In 2023 the choice was usually obvious: prototype on pgvector, scale on Pinecone. In 2026 the band where both are reasonable answers is wide, roughly 5 to 100 million vectors, and the right call depends on team shape more than benchmarks. That is the question this post is actually answering.

The comparison table

These are the columns I look at when scoping a project. Costs are monthly estimates for 1536-dimension OpenAI embeddings at modest query traffic. Your numbers will move 20 to 40 percent depending on region, read-write ratio, and how aggressively you tune.

| Dimension | pgvector (Supabase or Neon) | Pinecone Serverless |
| --- | --- | --- |
| Cost at 1M vectors | $25 to $40 (shared with app DB) | $10 to $30 |
| Cost at 10M vectors | $40 to $90 | $120 to $260 |
| Cost at 100M vectors | $350 to $700 (compute-bound) | $1,200 to $2,400 |
| p95 latency (10M vectors) | 22 to 48 ms | 35 to 70 ms (includes network) |
| Ops overhead | Low if managed, real if self-host | Zero |
| Filtering | Full SQL, can defeat index at extremes | Metadata-typed, fine to moderate scale |
| Hybrid search | Postgres FTS join, native | Separate sparse-dense index |
| Free tier | Yes, via Supabase or Neon free plans | Yes, generous Serverless free tier |
| Backups | Inherits Postgres backups | Managed snapshots |
| SQL joins to app data | Yes, same database | No, separate system |

The two rows that decide most projects are cost at 10 million vectors and SQL joins. Pinecone is two to three times more expensive in that band, and the loss of a single-database story is a real architectural cost that does not show up on the invoice.

Cost math at three scales

Numbers in the table above are summaries. Here is the actual line-item math at three reference points, with 2026 list prices from both vendors' current pricing pages. I keep these spreadsheets and update them after every client engagement.

1 million vectors, 1 QPS average. On pgvector inside a Supabase Pro plan ($25 per month base), the vectors add roughly 7 GB of storage and negligible CPU for an HNSW index of this size. Total marginal cost: $25 to $40 per month, shared with the application database. On Pinecone Serverless at the same scale: about $7 in storage, $3 in writes, $5 in reads - call it $15 per month. At 1 million vectors Pinecone is cheaper if you do not already have a Postgres bill, and a wash if you do.

10 million vectors, 5 QPS average. On pgvector, you need to upsize the Supabase or Neon compute to hold the HNSW index in memory - call it $40 to $90 per month including the app database. p95 latency lands around 35 to 50 ms. On Pinecone Serverless: roughly $60 in storage, $40 in writes (assuming moderate churn), and $80 to $160 in reads depending on query rate. Realistic range $120 to $260 per month. At this scale pgvector is roughly a third to half the cost and lives in the same database.

100 million vectors, 20 QPS average. On pgvector, this is the edge of the band. You need a serious compute size with 64 GB or more of memory to keep the HNSW index warm, and you are looking at $350 to $700 per month depending on provider. Latency p95 climbs into the 80 to 150 ms range. On Pinecone Serverless: storage alone is around $600, writes add $150 to $300, and reads at 20 QPS add $400 to $800. Realistic total $1,200 to $2,400 per month. At 100 million vectors Pinecone is roughly three times the cost, but it is also the point where pgvector starts to feel like it is fighting you and the ops cost becomes real.
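To see the crossover in one place, the sketch below takes the monthly ranges from the comparison table and compares their midpoints. Using midpoints is a simplifying assumption, and the type and function names are illustrative; swap in your own quotes before drawing conclusions.

```typescript
// Monthly cost ranges in USD, copied from the comparison table in this post.
interface CostRange { low: number; high: number }
interface ScalePoint { vectors: string; pgvector: CostRange; pinecone: CostRange }

const scalePoints: ScalePoint[] = [
  { vectors: "1M", pgvector: { low: 25, high: 40 }, pinecone: { low: 10, high: 30 } },
  { vectors: "10M", pgvector: { low: 40, high: 90 }, pinecone: { low: 120, high: 260 } },
  { vectors: "100M", pgvector: { low: 350, high: 700 }, pinecone: { low: 1200, high: 2400 } },
];

// Midpoint of a range; an assumption made to get a single number per cell.
const midpoint = (r: CostRange): number => (r.low + r.high) / 2;

// Ratio of Pinecone's midpoint cost to pgvector's midpoint cost at one scale.
function pineconeToPgvectorRatio(p: ScalePoint): number {
  return midpoint(p.pinecone) / midpoint(p.pgvector);
}

for (const p of scalePoints) {
  console.log(`${p.vectors}: Pinecone is ${pineconeToPgvectorRatio(p).toFixed(1)}x pgvector`);
}
// 1M: Pinecone is 0.6x pgvector
// 10M: Pinecone is 2.9x pgvector
// 100M: Pinecone is 3.4x pgvector
```

The ratio flips from below 1 to roughly 3 between 1 million and 10 million vectors, which is why the 10 million row carries so much weight in the decision.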

If you want to model the full pipeline including embeddings and generation, the math chains directly into the breakdown in the OpenAI API cost guide.

Performance benchmarks: same dataset, same queries

I ran a controlled test on a 5 million vector dataset of deduplicated 1536-dimension OpenAI embeddings - an anonymized slice of a client's knowledge base. Same 500 queries, top-k of 10, single warm node per database, HNSW default index, no quantization. The point is shape, not winners.

| Database | p50 latency | p95 latency | Recall@10 | Notes |
| --- | --- | --- | --- | --- |
| pgvector on Supabase Pro | 14 ms | 38 ms | 0.96 | Same primary serving app queries |
| pgvector on Neon (autoscale) | 16 ms | 42 ms | 0.96 | Cold-start spikes excluded |
| Pinecone Serverless | 22 ms | 55 ms | 0.97 | Latency includes network hop |

Two honest caveats. First, pgvector's lower numbers here come partly from co-location - the test client and Postgres instance were in the same region, while Pinecone added a small but real network leg. Second, recall is the metric you should actually care about for retrieval augmented generation. A 10 ms query that misses the right chunk is worse than a 40 ms query that finds it. Both products land in the recall band where the rest of your pipeline matters more than the vector store. For context on how the retrieval step fits into the broader system, the production RAG architecture guide walks the whole flow.
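For readers who want to measure recall on their own data, here is a generic sketch of how a Recall@k figure is usually computed. It is not the harness behind the table above; the function names are illustrative, and it assumes the exact neighbours come from a brute-force scan and that IDs within each list are unique.

```typescript
// Recall@k: the share of the true top-k neighbours that the index actually returned.
// `exactIds` would come from a brute-force scan, `approxIds` from the HNSW index.
function recallAtK(exactIds: string[], approxIds: string[], k: number): number {
  const truth = new Set(exactIds.slice(0, k));
  if (truth.size === 0) return 0;
  const hits = approxIds.slice(0, k).filter((id) => truth.has(id)).length;
  return hits / truth.size;
}

// Mean recall over a query set, one { exact, approx } pair per query.
function meanRecallAtK(
  pairs: Array<{ exact: string[]; approx: string[] }>,
  k: number,
): number {
  if (pairs.length === 0) return 0;
  const total = pairs.reduce((sum, p) => sum + recallAtK(p.exact, p.approx, k), 0);
  return total / pairs.length;
}
```

Running this over a few hundred representative queries before and after any index parameter change is a cheap way to catch recall regressions that latency dashboards will not show.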

Filtering: where pgvector quietly wins

Pinecone's metadata filtering is good. You attach a typed metadata object to each vector and query with operators like $eq, $in, and $gt. It handles moderate-cardinality filters cleanly and the SDK is pleasant. Where it starts to surprise you is high-cardinality filters like tenant_id, or combinations of three or four metadata fields, where latency creeps up faster than the marketing suggests.

pgvector inherits SQL. The same query that combines a vector search with a tenant filter, a date range, and a status check is a single statement:

SELECT id, content, 1 - (embedding <=> $1) AS score
FROM chunks
WHERE tenant_id = $2
  AND published_at > now() - interval '30 days'
  AND status = 'published'
  AND source = ANY($3)
ORDER BY embedding <=> $1
LIMIT 10;

With a btree index on tenant_id and a partial index for published status, the Postgres planner combines them with the HNSW vector index efficiently. The pattern that breaks pgvector is extremely selective filters - when your WHERE clause matches less than 0.1% of rows, the planner sometimes abandons the vector index and goes sequential, which gets slow. The fix is usually a smarter index choice or an explicit SET LOCAL enable_seqscan = off in the transaction.
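A minimal sketch of the SET LOCAL approach, assuming a client that can run statements on a single connection. `RunSql` is a hypothetical stand-in for whatever your Postgres client exposes; it is not a real library type, and the statements must all go through the same connection rather than a pool.

```typescript
// Hypothetical executor type: any function that runs one statement on ONE connection.
type RunSql = (text: string, params?: unknown[]) => Promise<unknown>;

// Runs a filtered vector query with sequential scans disabled for this transaction only.
// SET LOCAL reverts automatically at COMMIT or ROLLBACK, so other queries are unaffected.
async function queryWithoutSeqScan(
  run: RunSql,
  text: string,
  params: unknown[],
): Promise<unknown> {
  await run("BEGIN");
  try {
    await run("SET LOCAL enable_seqscan = off");
    const result = await run(text, params);
    await run("COMMIT");
    return result;
  } catch (err) {
    await run("ROLLBACK");
    throw err;
  }
}
```

Treat this as a targeted override for the queries that misbehave, and check the plan with EXPLAIN ANALYZE first; a better index is usually the more durable fix.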

For complex filtering at moderate scale, pgvector wins because SQL wins. For simple filtering at extreme scale, Pinecone wins because it is just easier.

Hybrid search: BM25 plus vector, both ways

Pure vector search misses keyword matches. Pure BM25 misses semantic matches. Hybrid retrieval - running both and combining scores - is the single biggest quality lift you can ship after basic RAG. Here is how each product handles it.

Pinecone ships hybrid via a separate sparse-dense index. You generate sparse vectors with SPLADE or BM25, store them alongside your dense embeddings, and query the sparse-dense index with both. It works, but you are now maintaining two indexes and the developer experience is rougher than the dense-only flow. Pattern, simplified:

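// Pinecone hybrid query (simplified). The query call takes no alpha field,
// so weight the dense and sparse vectors client-side before querying.
const alpha = 0.7; // 1 = pure dense, 0 = pure sparse
const result = await index.query({
  vector: denseEmbedding.map((v) => v * alpha),
  sparseVector: {
    indices: sparseIds,
    values: sparseValues.map((v) => v * (1 - alpha)),
  },
  topK: 10,
  includeMetadata: true,
});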

pgvector does hybrid via Postgres full-text search joined with the vector query. It is more code than the Pinecone version, but it uses the database you already have, with no extra moving parts:

-- Hybrid search in pgvector + Postgres FTS
WITH semantic AS (
  SELECT id, 1 - (embedding <=> $1) AS sem_score
  FROM chunks
  ORDER BY embedding <=> $1
  LIMIT 50
),
keyword AS (
  SELECT id,
    ts_rank(content_tsv, plainto_tsquery('english', $2)) AS kw_score
  FROM chunks
  WHERE content_tsv @@ plainto_tsquery('english', $2)
  ORDER BY kw_score DESC  -- without this, LIMIT keeps an arbitrary 50 matches
  LIMIT 50
)
SELECT c.id, c.content,
  COALESCE(s.sem_score, 0) * 0.7 +
  COALESCE(k.kw_score, 0) * 0.3 AS score
FROM chunks c
LEFT JOIN semantic s ON s.id = c.id
LEFT JOIN keyword  k ON k.id = c.id
WHERE s.id IS NOT NULL OR k.id IS NOT NULL
ORDER BY score DESC
LIMIT 10;

Tune the 0.7 and 0.3 weights against an eval set, the same way you would tune Pinecone's alpha. The pgvector hybrid path is more lines of code; the Pinecone hybrid path is more infrastructure. Pick the cost you would rather pay.
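When tuning weights against an eval set, it can be convenient to fetch the two candidate lists once and re-fuse them in application code instead of re-running SQL for every weight pair. The sketch below mirrors the weighted sum in the query above; `fuseScores` and `ScoredId` are illustrative names, and it assumes both lists carry scores on comparable scales, which may require normalizing the keyword rank first.

```typescript
interface ScoredId { id: string; score: number }

// Weighted sum of semantic and keyword scores. An id missing from one list
// contributes 0 from that side, like COALESCE(..., 0) in the SQL version.
function fuseScores(
  semantic: ScoredId[],
  keyword: ScoredId[],
  semWeight = 0.7,
  kwWeight = 0.3,
  limit = 10,
): ScoredId[] {
  const combined = new Map<string, number>();
  for (const s of semantic) {
    combined.set(s.id, (combined.get(s.id) ?? 0) + s.score * semWeight);
  }
  for (const k of keyword) {
    combined.set(k.id, (combined.get(k.id) ?? 0) + k.score * kwWeight);
  }
  const fused: ScoredId[] = [];
  combined.forEach((score, id) => fused.push({ id, score }));
  // Highest combined score first, truncated to the requested size.
  return fused.sort((a, b) => b.score - a.score).slice(0, limit);
}
```

A chunk that appears in both lists can outrank one that scores higher on a single side, which is the behaviour the eval set should confirm is actually helping.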

Ops overhead: the real story

This section should carry more weight than it usually does. Pinecone's zero-ops claim is genuinely true - you do not back it up, you do not upgrade it, you do not page on it, you do not size it. pgvector inherits whatever ops story your Postgres has. If you are on Supabase or Neon, that is also close to zero. If you are running your own Postgres, it is real.

Quantified, for a production RAG workload at 10 to 50 million vectors:

  • Index tuning. pgvector HNSW parameters m and ef_construction need to be set once, and ef_search tuned per workload. Expect 4 to 8 hours of work and another 2 hours per major dataset change. Pinecone: zero.
  • Vacuum and bloat. Heavy upserts create dead tuples and index bloat in Postgres. Autovacuum handles most of it, but you will hit a case where you need to REINDEX CONCURRENTLY after a large rewrite. Budget 1 to 2 hours per quarter. Pinecone: zero.
  • Monitoring. Vector query latency, recall degradation, and memory pressure are all things you watch on pgvector. Set up dashboards once, plus periodic check-ins. Budget half a day of setup and 30 minutes per week of observation. Pinecone: zero.
  • Backups and restore drills. Inherited from your Postgres provider on pgvector. Do one drill per quarter even if managed. Pinecone: zero, with the caveat that point-in-time recovery is less granular than Postgres PITR.

Call it 2 to 4 engineer-days per quarter for pgvector operations at this scale, 0 for Pinecone. If your fully loaded engineer cost is $1,000 per day, that is $2,000 to $4,000 per quarter, or $700 to $1,300 per month. Add it to the infrastructure cost when comparing.
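The conversion from engineer-days to a monthly figure is simple enough to keep next to the infrastructure numbers. A small sketch, with illustrative function names and the day rate as an input you should replace with your own:

```typescript
// Converts quarterly engineering time into a monthly dollar figure.
function opsCostPerMonth(engineerDaysPerQuarter: number, dayRateUsd: number): number {
  const monthsPerQuarter = 3;
  return (engineerDaysPerQuarter * dayRateUsd) / monthsPerQuarter;
}

// Infrastructure plus operations, both in USD per month.
function totalMonthlyCost(
  infraUsd: number,
  engineerDaysPerQuarter: number,
  dayRateUsd: number,
): number {
  return infraUsd + opsCostPerMonth(engineerDaysPerQuarter, dayRateUsd);
}

console.log(Math.round(opsCostPerMonth(2, 1000))); // 667
console.log(Math.round(opsCostPerMonth(4, 1000))); // 1333
```

Those two values are the unrounded versions of the $700 to $1,300 range above. On a managed provider the engineer-days input is usually much smaller, so run the numbers with your own estimate rather than the self-hosted one.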

Multi-tenant isolation

For a SaaS app where different customers store their own embeddings, the isolation pattern matters as much as the performance. Each product has a different default.

Pinecone ships namespaces - a logical partition inside an index, free, with cleanly isolated query and write paths. You query within a namespace and you never see another tenant's vectors. This is the cleanest multi-tenant story in the category and is the single best reason to pick Pinecone for a SaaS RAG product with more than a few hundred tenants.

pgvector does it through row-level security or a tenant_id column with strict filter discipline. RLS is the safer pattern because it enforces isolation at the database layer regardless of what the application code does:

-- Enable RLS on the chunks table
ALTER TABLE chunks ENABLE ROW LEVEL SECURITY;

-- Policy: a tenant can only see its own rows
CREATE POLICY tenant_isolation ON chunks
  USING (tenant_id = current_setting('app.tenant_id')::uuid);

-- At the start of each transaction, before any query
-- (SET LOCAL has no effect outside a transaction block):
-- SET LOCAL app.tenant_id = '<tenant-uuid>';

This works well up to roughly 500 tenants with mixed sizes. Past that, or if you have a few very large tenants alongside many small ones, the per-tenant cardinality starts to hurt the HNSW index quality and Pinecone's namespace model becomes materially better. The sweet spot for pgvector multi-tenancy is small-to-mid B2B SaaS with predictable tenant cardinality.
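On the application side, the tenant setting has to be applied inside the same transaction as the queries it protects. The sketch below assumes a hypothetical `RunStatement` function bound to a single connection; it uses Postgres's `set_config(name, value, true)`, which is the transaction-local equivalent of SET LOCAL and, unlike SET, accepts a bind parameter.

```typescript
// Hypothetical executor type: runs one statement on ONE connection (not a pool).
type RunStatement = (text: string, params?: unknown[]) => Promise<unknown>;

// Wraps `work` in a transaction scoped to a single tenant.
// The third argument to set_config (true) makes the setting transaction-local,
// so it cannot leak to the next request that reuses the connection.
async function withTenant<T>(
  run: RunStatement,
  tenantId: string,
  work: () => Promise<T>,
): Promise<T> {
  await run("BEGIN");
  try {
    await run("SELECT set_config('app.tenant_id', $1, true)", [tenantId]);
    const result = await work();
    await run("COMMIT");
    return result;
  } catch (err) {
    await run("ROLLBACK");
    throw err;
  }
}
```

One thing to check when adopting this: the application should connect as a role that does not own the table, because table owners bypass policies unless FORCE ROW LEVEL SECURITY is enabled.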

Migration: Pinecone to pgvector

This is the more common direction in 2026 because most teams who started on Pinecone in 2022 to 2024 are now staring at bills that no longer make sense for their scale. The script below is the one I use, condensed. It assumes you have already provisioned the target Postgres with the vector extension installed.

// scripts/migrate-pinecone-to-pgvector.ts
import { Pinecone } from '@pinecone-database/pinecone';
import postgres from 'postgres';

const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pc.index('production-rag');
const sql = postgres(process.env.POSTGRES_URL!);

const BATCH = 100;
let cursor: string | undefined;
let migrated = 0;

while (true) {
  const list = await index.listPaginated({ limit: BATCH, paginationToken: cursor });
  const ids = list.vectors?.map((v) => v.id) ?? [];
  if (ids.length === 0) break;

  const fetched = await index.fetch(ids);
  const rows = Object.values(fetched.records).map((r) => ({
    id: r.id,
    embedding: `[${r.values.join(',')}]`,
    metadata: r.metadata ?? {},
  }));

  // postgres.js expands sql(rows, ...columns) into a multi-row VALUES list.
  await sql`
    INSERT INTO chunks ${sql(rows, 'id', 'embedding', 'metadata')}
    ON CONFLICT (id) DO UPDATE
      SET embedding = EXCLUDED.embedding,
          metadata = EXCLUDED.metadata
  `;

  migrated += rows.length;
  cursor = list.pagination?.next;
  console.log(`Migrated ${migrated}`);
  if (!cursor) break;
}

// Close the connection pool so the script exits instead of idling.
await sql.end();

Three caveats from doing this in anger. First, build the HNSW index after the migration completes - building it incrementally as you insert is 5 to 10 times slower. Second, run with maintenance_work_mem bumped to 1 GB or more before the index build, otherwise the build will be memory-starved and crawl. Third, keep Pinecone in shadow-read mode for a week after the cutover - query both, log disagreements, and only decommission Pinecone after the disagreement rate is what you expected from recall math.

Migration: pgvector to Pinecone

Less common, but it happens when scale forces it. Usually the trigger is a corpus that crossed 100 million vectors and the Postgres team does not want to run a dedicated multi-node deployment. The migration script is structurally similar in the other direction:

// scripts/migrate-pgvector-to-pinecone.ts
import { Pinecone } from '@pinecone-database/pinecone';
import postgres from 'postgres';

const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pc.index('production-rag');
const sql = postgres(process.env.POSTGRES_URL!);

const BATCH = 100;
let last = '00000000-0000-0000-0000-000000000000';

while (true) {
  const rows = await sql`
    SELECT id, embedding::text AS embedding, metadata
    FROM chunks
    WHERE id > ${last}
    ORDER BY id
    LIMIT ${BATCH}
  `;
  if (rows.length === 0) break;

  await index.upsert(
    rows.map((r) => ({
      id: r.id,
      values: JSON.parse(r.embedding),
      metadata: r.metadata,
    }))
  );

  last = rows[rows.length - 1].id;
  console.log(`Upserted through ${last}`);
}

// Close the connection pool so the script exits instead of idling.
await sql.end();

The pattern that makes this easy is having written your application against a thin vector-store interface from day one, the same pattern covered in the vector database comparison. If your app calls vectorStore.query() instead of importing the Pinecone or pgvector SDK directly, the migration is a script plus an environment variable flip. Skip the abstraction and you rewrite your retrieval layer twice in two years.
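A minimal sketch of what such an interface might look like. Everything here is illustrative: `VectorStore`, `InMemoryVectorStore`, and `createVectorStore` are hypothetical names, and the in-memory implementation exists only so the example runs without a database. Real pgvector and Pinecone implementations would sit behind the same two methods.

```typescript
interface VectorRecord { id: string; values: number[]; metadata: Record<string, unknown> }
interface VectorMatch { id: string; score: number; metadata: Record<string, unknown> }

// The only surface the application is allowed to depend on.
interface VectorStore {
  upsert(records: VectorRecord[]): Promise<void>;
  query(vector: number[], topK: number): Promise<VectorMatch[]>;
}

// Cosine similarity; returns 0 when either vector has zero length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force stand-in, useful for tests and local development only.
class InMemoryVectorStore implements VectorStore {
  private records = new Map<string, VectorRecord>();

  async upsert(records: VectorRecord[]): Promise<void> {
    for (const r of records) this.records.set(r.id, r);
  }

  async query(vector: number[], topK: number): Promise<VectorMatch[]> {
    const matches: VectorMatch[] = [];
    this.records.forEach((r) => {
      matches.push({ id: r.id, score: cosine(vector, r.values), metadata: r.metadata });
    });
    return matches.sort((a, b) => b.score - a.score).slice(0, topK);
  }
}

// Factory keyed by a backend name, e.g. read from an environment variable.
// Only "memory" is wired up in this sketch; other backends would be added here.
function createVectorStore(backend: string | undefined): VectorStore {
  switch (backend) {
    case undefined:
    case "memory":
      return new InMemoryVectorStore();
    default:
      throw new Error(`No VectorStore registered for backend "${backend}"`);
  }
}
```

With this in place, application code calls `createVectorStore(...)` once at startup and never imports a vendor SDK directly, so a migration touches the factory and one new class rather than every retrieval call site.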

My picks by scenario

Concrete recommendations I would give a friend at each stage, with the caveats baked in. None of these are absolute - bring an exotic case to a scoping call.

MVP RAG (under 1M vectors): pgvector on Supabase or Neon. The same database that holds your users holds your embeddings. Total infra cost under $30 per month. Ship the product, see if anyone wants it, worry about scale later.

Multi-tenant SaaS: Pinecone if you expect more than 500 tenants or any single tenant might hit tens of millions of vectors - the namespace model is genuinely better and the cost is bearable in that band. pgvector with row-level security if your tenant count and per-tenant size are predictable and modest. The pattern fits cleanly into the agentic RAG designs I use for SaaS clients.

Heavy filtering (3 or more metadata fields per query): pgvector. SQL is the right tool for complex WHERE clauses, the planner handles it, and you avoid the Pinecone metadata-cardinality surprise.

200M+ vectors: Pinecone, unless you have an SRE in the building and a real appetite for owning a large-vector deployment. At this scale pgvector starts to require multi-node tricks and the operational cost stops being worth the savings.

Zero-ops mandate: Pinecone Serverless. Pay the premium, ship the product, do not think about the database again. If your AI team is small and your application team is large, this is almost always the right call. It is the same logic behind picking managed everything when you hire an AI developer in Kosovo or anywhere - engineering time is your scarce resource.

Embeddings-model uncertainty: pgvector. You will change embedding models at least once every 18 months - Voyage to OpenAI to something newer - and reindexing a Postgres table is straightforward while migrating a Pinecone index is a full re-upsert with downtime considerations. If you are still evaluating models, the embedding models comparison is a useful starting point.

OmniAPI case study: kept on pgvector

OmniAPI is one of my products. It runs a developer-facing knowledge base with semantic search across API documentation and integration guides. Current corpus is roughly 4.2 million chunks. It has been on pgvector since day one, on a single Supabase Pro instance with HNSW indexes, and there has been no reason to migrate.

Numbers from the last 30 days of production: p50 retrieval latency 11 ms, p95 31 ms, total Supabase bill including the application database $87 per month. The closest equivalent on Pinecone Serverless at this query volume would be roughly $180 per month, just for the vector store, plus a separate Postgres bill for the application database. The savings are not the point - the operational simplicity is. There is one database to back up, one to monitor, one set of credentials to rotate.

The migration plan, if we ever need it, is also ready. The retrieval layer is behind a 20-line TypeScript interface, and a Pinecone implementation lives in a feature flag for load testing. The day pgvector latency crosses the threshold I have set - p95 over 100 ms sustained - we flip the flag. That is the dividend of designing for migration from day one. If you want to think through that boundary for your own stack, this is the kind of work I cover under AI integration.
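The threshold check itself is small. The sketch below is an assumption about how one might encode it, not the monitoring code OmniAPI runs: it uses a nearest-rank p95 per time window and treats "sustained" as a configurable number of consecutive windows over the limit, with three as an arbitrary default.

```typescript
// Nearest-rank percentile: the smallest sample with at least fraction p of the data at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("percentile needs at least one sample");
  const sorted = samples.slice().sort((a, b) => a - b);
  // Small epsilon guards against floating-point noise in p * n.
  const rank = Math.max(1, Math.ceil(p * sorted.length - 1e-9));
  return sorted[rank - 1];
}

// Returns true once `consecutive` windows in a row have a p95 above the threshold.
// Empty windows are skipped: they neither extend nor reset the streak.
function shouldFlipToFallback(
  windowsMs: number[][],
  thresholdMs = 100,
  consecutive = 3,
): boolean {
  let streak = 0;
  for (const w of windowsMs) {
    if (w.length === 0) continue;
    streak = percentile(w, 0.95) > thresholdMs ? streak + 1 : 0;
    if (streak >= consecutive) return true;
  }
  return false;
}
```

Requiring consecutive windows keeps a single slow deploy or a cold cache from triggering a migration decision.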

Frequently asked questions

These are the questions I get most when teams scope a vector store choice with me. The answers are also embedded as FAQ structured data for search.

Is pgvector or Pinecone better in 2026?

Under 50 million vectors with metadata filtering, pgvector wins on cost, ops simplicity, and joinable SQL data. Above 100 million vectors with no ops appetite, Pinecone Serverless wins because it scales without you thinking about it. Between those points either one is fine and the deciding factor is whether your team prefers paying dollars or paying engineering hours.

What changed for pgvector and Pinecone in 2026?

Pinecone Serverless replaced the old per-pod pricing model and made small deployments far cheaper, while still ramping up steeply past 10 million vectors. pgvector shipped HNSW indexes, better hybrid search via Postgres full-text search, and improved planner behavior under selective filters. Both are now more competitive than the 2023 versions most blog posts still benchmark.

When should I migrate from pgvector to Pinecone?

Three signals: your p95 retrieval latency climbs past 150 ms even with HNSW tuning, your Postgres compute bill grows faster than your application bill, or your team is spending more than two engineer-days a month on vector ops. If any one of those is true and you are above 50 million vectors, migration starts paying back inside a quarter.

When should I migrate from Pinecone to pgvector?

When your Pinecone Serverless bill crosses roughly $400 per month and your dataset is still under 50 million vectors, or when you find yourself fighting metadata filtering limits that SQL would handle in one line. Most teams that migrate report saving 60 to 80 percent on infrastructure and gaining a far simpler stack.

Does pgvector support hybrid search?

Yes, through Postgres full-text search joined with the vector index. It is 10 to 20 lines of SQL with a tunable score weight, and it uses the database you already have. Pinecone supports hybrid through a separate sparse-dense index that you maintain alongside your main one. pgvector is more code; Pinecone is more infrastructure.

Can pgvector handle multi-tenant SaaS workloads?

Yes, with row-level security or a tenant_id column and a strict filter discipline. It works well up to roughly 500 tenants with mixed sizes. Past that, or if you have a few very large tenants and many small ones, Pinecone namespaces give you cleaner isolation. The sweet spot for pgvector multi-tenancy is small-to-mid SaaS with predictable tenant cardinality. For broader patterns see the Supabase pgvector notes.

What does Pinecone actually cost at 10 million vectors?

On Pinecone Serverless in 2026, 10 million 1536-dimension vectors with around 5 queries per second lands in the $120 to $260 per month range, depending on read and write volume. The equivalent on pgvector inside a Supabase Pro plan is roughly $40 to $90 per month including the application database, because the vectors share compute with everything else.

Is pgvector fast enough for real-time RAG?

Yes for most workloads. On HNSW with 1536-dim embeddings and a modest Supabase or Neon instance, p50 retrieval lands at 9 to 18 ms and p95 at 22 to 48 ms up to 10 million vectors. Where it slows down is large selective filters that defeat the index and force a sequential scan, or query rates above a few hundred per second on a single primary.

Closing

pgvector versus Pinecone in 2026 is not the close call most blog posts make it out to be. Under 50 million vectors, pgvector is the right answer for almost everyone with a Postgres team - cheaper, simpler, and good enough on latency. Above 100 million vectors or with a hard zero-ops mandate, Pinecone is the right answer because the operational story stops being a side project. The middle band is where team shape decides: prefer dollars over engineering hours and you pick Pinecone, prefer engineering hours over dollars and you pick pgvector. Build the migration interface from day one regardless of which you choose. Reassess at 10 million vectors and again at 100 million. That is the whole playbook.