The Hidden Indexing Trap: LangChain-Postgres, pgvector, and Custom Vector Stores
Pgvector is a PostgreSQL extension that adds vector similarity search functionality. It lets you store embedding vectors — high-dimensional numerical representations produced by embedding models — alongside ordinary relational data, and query them by semantic proximity rather than exact match. This makes it a natural fit for retrieval-augmented generation (RAG) systems, semantic search, and any pipeline that needs to find the most relevant examples or documents given a text input.
In production environments, langchain-postgres is the standard library for building on top of PostgreSQL with the pgvector extension enabled. Its PGVectorStore provisions its own dedicated embeddings table via init_vectorstore_table, separate from the application's ORM models. It handles embedding insertion and collection management, and exposes retrieval methods including hybrid search and maximum marginal relevance search for diverse result sets. In practice the application ends up with two sets of tables: those managed by the ORM holding canonical relational data, and the embeddings table managed by PGVectorStore acting as the similarity search layer. Configuring these two components together correctly — particularly ensuring that vector columns are properly typed and that indexes match the distance metric used at query time — is where most production issues originate.
Background: Indexes, Column Types, and Distance Metrics
A vector index is a data structure built alongside the embeddings table that allows the database to find the K nearest vectors to a query (nearest-neighbour search) without scanning every row. The extension provides two index types:

- IVFFlat (Inverted File, flat storage) partitions the vector space into clusters (Voronoi cells) at build time and, at query time, searches only the nearest clusters rather than all rows. Vectors within each cluster are stored and compared without compression (“flat”). It builds quickly but needs periodic rebuilding as data changes.
- HNSW (Hierarchical Navigable Small World) builds a multi-layer graph of proximity relationships and traverses it greedily at query time. It is slower to build than IVFFlat but does not need reindexing as data mutates and generally achieves better recall.
Both index types require the embedding column to have an explicit declared dimension, for example vector(1024). A bare vector column with no dimension annotation cannot be indexed — pgvector needs to know the fixed size to build the index structure.
Both index types also require an operator class: a setting that declares which distance metric the index is built for. That choice must align with how you query. Embedding models usually output unit-normalised vectors, where direction carries meaning and magnitude is fixed at 1. For those, cosine distance is the right metric — semantically similar texts have vectors pointing in similar directions, so cosine distance is small. By contrast, Euclidean distance (L2) measures straight-line distance; on unit-normalised vectors it ranks results identically to cosine, but on unnormalised data it mixes direction and magnitude in a way that rarely matches the intent of semantic search.
The pgvector extension defaults to the L2 operator class (vector_l2_ops). If the index is built with that default but queries use cosine distance (<=>), PostgreSQL will not use the index — it will fall back to a full table scan. To avoid that, the operator class matching the query metric (vector_cosine_ops for cosine, vector_l2_ops for Euclidean) must be specified when the index is created in the initial table migrations.
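For example, an index whose operator class matches the query metric might be declared as follows. This is a sketch: the table, column, and index names mirror the migration example later in this article, and real builds should use CONCURRENTLY as discussed there.

```sql
-- Index built for cosine distance: vector_cosine_ops matches <=> queries
CREATE INDEX ix_documents_embedding_hnsw
ON documents
USING hnsw (embedding vector_cosine_ops);

-- This query can use the index: same metric, bare <=> in ORDER BY
SELECT id FROM documents ORDER BY embedding <=> $1 LIMIT 10;

-- This one cannot: the index was built for cosine, the query asks for L2
SELECT id FROM documents ORDER BY embedding <-> $1 LIMIT 10;
```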
Causes of Silent Full-Table Scans
Three specific configuration errors produce silent full-table scans.
The first is an operator class mismatch. An ORM model that declares an IVFFlat or HNSW index in __table_args__ without specifying an operator class gets vector_l2_ops by default (Euclidean distance). If the application queries with cosine distance (<=>), PostgreSQL creates the index, lists it in \d, and silently ignores it at query time because the operator class does not match — no warning is ever raised.
The second arises from omitting vector_size when calling init_vectorstore_table. When that parameter is missing, the embedding column is created as a bare vector with no declared dimension. As described above, pgvector requires a typed column — vector(N) — to create any index. The bare column accepts vectors of any dimension, computes distances correctly, and scans every row to do it.
The third is a query expression that PostgreSQL cannot match to an index. Computing (1 - (embedding <=> query_vec)) to convert cosine distance into a similarity score in the [0, 1] range wraps the distance operator in arithmetic. PostgreSQL's query optimizer selects an index based on the exact structure of the query; when the distance operator is buried inside an expression, the optimizer cannot match it to the index and falls back to a full table scan. A correctly configured HNSW index is still bypassed when the distance is not the bare ORDER BY term.
At 20K rows a sequential scan takes 40–80 ms. In a typical RAG pipeline where multiple similarity queries run concurrently per request and the downstream LLM call takes 1–3 seconds, that overhead is easily absorbed and missed. Running EXPLAIN ANALYZE on the similarity query shows the execution plan and real execution statistics. This is the fastest way to confirm whether an index is being used.
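A minimal diagnostic, assuming the documents table and index names used later in this article; when running interactively, substitute a concrete vector literal for the placeholder:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, embedding <=> $1 AS distance
FROM documents
ORDER BY distance
LIMIT 10;

-- Healthy plan:  Index Scan using ix_documents_embedding_hnsw on documents
-- Broken plan:   Seq Scan on documents  (index present but unused)
```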
LangChain-Postgres: Provisioning the Vector Table
When the langchain-postgres package is used, the PGVectorStore class requires the embeddings table to be provisioned explicitly before use via init_vectorstore_table. This call declares the embedding column as vector(N) with an explicit dimension. Passing the wrong vector_size creates a column typed for the wrong dimension, so inserts from the real model fail; provisioning the table without a declared dimension leaves the column as bare vector, and no HNSW or IVFFlat index can be built on it.
```python
from langchain_postgres import PGEngine, PGVectorStore

engine = PGEngine.from_connection_string(url=connection_string)

# Wrong — vector_size must match the embedding model output.
# The column becomes vector(768); 1024-dim embeddings fail at insert.
engine.init_vectorstore_table(
    table_name="documents",
    vector_size=768,
)

# Correct — vector_size matches the model; an HNSW index is possible
engine.init_vectorstore_table(
    table_name="documents",
    vector_size=1024,
)

store = PGVectorStore.create_sync(
    engine=engine,
    table_name="documents",
    embedding_service=my_embeddings,
)
```

The table name is user-defined and passed consistently to both init_vectorstore_table and PGVectorStore. A typed column rejects dimension mismatches at the database level — a GPU OOM that returns a truncated vector will fail at insert rather than silently corrupt similarity scores.
Implementing Custom Embeddings
LangChain's Embeddings interface is provider-agnostic. Implementing it for a self-hosted async inference service requires bridging the async client to the synchronous PGVectorStore API; the implementation below runs each synchronous call on a dedicated event loop, avoiding event-loop nesting errors.
```python
import asyncio
from typing import List

from langchain_core.embeddings import Embeddings


class SelfHostedEmbeddings(Embeddings):
    def __init__(self, client, embedding_dim: int = 1024, batch_size: int = 32):
        self._client = client
        self._dim = embedding_dim
        self._batch_size = batch_size

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return self._run(self._client.embed_batch(texts, self._batch_size))

    def embed_query(self, text: str) -> List[float]:
        return self._run(self._client.embed_single(text))

    async def aembed_documents(self, texts: List[str]) -> List[List[float]]:
        return await self._client.embed_batch(texts, self._batch_size)

    async def aembed_query(self, text: str) -> List[float]:
        return await self._client.embed_single(text)

    def _run(self, coro):
        # Run the async client on a dedicated event loop for sync callers.
        loop = asyncio.new_event_loop()
        try:
            result = loop.run_until_complete(coro)
        finally:
            loop.close()
        if not result:
            raise ValueError("Inference service returned no embeddings. Check for GPU OOM.")
        self._validate(result if isinstance(result[0], list) else [result])
        return result

    def _validate(self, embeddings: List[List[float]]) -> None:
        for i, emb in enumerate(embeddings):
            if len(emb) != self._dim:
                raise ValueError(
                    f"Embedding at index {i} has {len(emb)} dims, expected {self._dim}. "
                    "Check for GPU OOM or inference timeout."
                )
```

GPU OOM does not raise a Python exception — it returns empty or truncated vectors. With a bare vector column those propagate silently into the database and produce garbage similarity scores. With vector(1024) the database rejects them at insert time. Both the application-level dimension check and the typed column are required.
Query Patterns That Bypass Indexes
A common retrieval pattern in RAG and few-shot pipelines is: embed a query string, find the K most similar documents in the vector store, optionally filter by metadata (document type, source, date), and pass the results as context to a language model. The similarity search step is on the hot path — it runs once or several times per user request — so its latency directly affects end-to-end response time.
Any arithmetic wrapper around the distance operator forces a sequential scan on that hot path:
```sql
-- Forces a sequential scan even with a correct HNSW index
SELECT id, (1 - (embedding <=> $1)) AS similarity
FROM documents
ORDER BY similarity DESC
LIMIT 10;

-- Uses the index
SELECT id, (embedding <=> $1) AS distance
FROM documents
ORDER BY distance ASC
LIMIT 10;
```

PostgreSQL with pgvector enabled can only use an HNSW or IVFFlat index when the query is structured as ORDER BY embedding <=> $1. Any transformation of the distance value must happen in an outer query or in application code. The correct pattern for threshold filtering is a subquery: let the index do the distance ordering in the inner query, apply the threshold in the outer one.
```sql
SELECT id, content, 1.0 - distance AS similarity
FROM (
    SELECT id, content,
           (embedding <=> CAST($1 AS vector)) AS distance
    FROM documents
    WHERE metadata->>'category' = $2
    ORDER BY distance ASC
    LIMIT $3  -- fetch_k: larger than final k to account for filtering
) candidates
WHERE distance < $4  -- distance threshold
LIMIT $5;
```

In addition, parameters such as hnsw.ef_search, ivfflat.probes, and hnsw.iterative_scan control how the index is used at query time (recall vs speed), and whether filtered vector searches can return enough results. One caveat: setting them at session level is unsafe when using a connection pool, because the connection is reused by other requests after the query finishes, so the next request (possibly a different endpoint or a non-vector query) inherits those settings. That can cause hard-to-reproduce bugs or unintended full table scans.
Use SET LOCAL instead so the values apply only to the current transaction and are discarded at COMMIT. Then each similarity query gets the parameters it needs without leaking into other work.
```sql
BEGIN;
SET LOCAL hnsw.ef_search = 100;
SET LOCAL hnsw.iterative_scan = 'relaxed_order';
-- run the similarity query here
COMMIT;
```

PostgreSQL's query optimizer is cost-based: it estimates the work required for each possible plan (index scan vs sequential scan) and chooses the one with the lowest estimated cost. The cost of an HNSW index scan grows with ef_search, because that parameter controls how many candidate nodes are explored during graph traversal — higher values mean more distance computations and more index pages touched. Once that estimated cost exceeds the cost of reading the entire table and computing the distance for every row, the planner switches to a sequential scan. In practice the crossover depends on table size, vector dimension, and index parameters; for example, at ef_search ≈ 400 on a 25K-row table the planner may already prefer a full table scan over HNSW traversal [1]. So raising ef_search to improve recall can backfire: the index is abandoned and every query pays for a sequential scan. This is why the practical ef_search range is 40–200.
Migrations: CREATE INDEX CONCURRENTLY in Practice
PostgreSQL does not allow CREATE INDEX CONCURRENTLY inside a transaction block; that rule applies regardless of how you run migrations.
For example, Alembic wraps each migration in a transaction by default, so op.execute("CREATE INDEX CONCURRENTLY ...") inside a standard migration raises an error. The solution is autocommit_block(), which steps outside the transaction for the enclosed block:
```python
from alembic import op


def upgrade() -> None:
    # ALTER COLUMN requires a transaction — do not use autocommit_block here
    op.execute("ALTER TABLE documents ALTER COLUMN embedding TYPE vector(1024)")

    # CONCURRENTLY cannot run inside a transaction
    with op.get_context().autocommit_block():
        op.execute("""
            CREATE INDEX CONCURRENTLY IF NOT EXISTS ix_documents_embedding_hnsw
            ON documents
            USING hnsw (embedding vector_cosine_ops)
            WITH (m = 16, ef_construction = 64)
        """)


def downgrade() -> None:
    with op.get_context().autocommit_block():
        op.execute(
            "DROP INDEX CONCURRENTLY IF EXISTS ix_documents_embedding_hnsw"
        )
```

The operator class is explicit — vector_cosine_ops — because omitting it defaults to vector_l2_ops, which produces a dead index for cosine-distance queries. The legacy workaround of manually issuing COMMIT before and BEGIN after the index statement leaves partial state on failure; autocommit_block() is the correct approach.
Never declare vector indexes in ORM __table_args__. ORM-level index declarations cannot use CONCURRENTLY, cannot run inside autocommit_block(), and do not validate operator class correctness at migration time.
As a rule of thumb, declare the column in the model and manage the index in Alembic.
Index Types, Operator Classes, and When to Use Each
pgvector is the core PostgreSQL extension: it provides the vector type and the HNSW and IVFFlat index types. A second extension, Timescale's pgvectorscale, builds on pgvector and adds a DiskANN-based index (with Statistical Binary Quantization) for larger or more dimension-heavy workloads [1]:
| Index | Provider | Max dims (vector) | Build (25K, 3072-dim) | Index size | Iterative scan |
|---|---|---|---|---|---|
| HNSW | pgvector | 2,000 | 29 s | 193 MB | Yes |
| IVFFlat | pgvector | 2,000 | 5 s | 193 MB | Yes |
| DiskANN | pgvectorscale | 16,000 | 49 s | 21 MB | No (0.9.0) |
HNSW is the production default for most workloads. IVFFlat is appropriate for small, mostly-static datasets where the faster build time matters more than avoiding periodic reindexing. DiskANN uses Statistical Binary Quantization to achieve a 9× smaller index — the right choice when the HNSW index would exceed shared_buffers or when embeddings exceed 2,000 dimensions.
The operator class on the index must match the distance metric used in queries. The correct choice depends on whether the embedding model normalises its output vectors:
| Operator class | SQL operator | Use when |
|---|---|---|
| vector_cosine_ops | <=> | Normalised embeddings — most text embedding models |
| vector_l2_ops | <-> | Unnormalised vectors where magnitude carries meaning |
| vector_ip_ops | <#> | Unit-normalised vectors (inner product is equivalent to cosine but avoids the division) |
| halfvec_cosine_ops | <=> on halfvec | Embeddings above 1,024 dims, to stay within the 8 KB page size limit |
HNSW at Scale: Recall Degradation
HNSW recall degrades as the dataset grows beyond what the graph parameters were tuned for [2]. A deployment that achieves 99% recall at 10K vectors may drop to 85% at 10M vectors with identical m, ef_construction, and ef_search. Three mechanisms drive this.
Graph fragmentation occurs because HNSW connects each node to its nearest neighbours at insert time. Early nodes end up connected to vectors that are no longer their actual nearest neighbours as the dataset fills in around them. The hubness problem arises in high-dimensional space: a small number of vectors become universal hubs that appear as nearest neighbours to a disproportionately large fraction of others, concentrating graph edges and creating search bottlenecks. Greedy search failure compounds both: the termination condition triggers at a local minimum increasingly far from the global nearest neighbour.
In practice this produces high similarity scores for semantically dissimilar queries, with long-tail queries failing first. At 20K vectors recall is still high with standard parameters; degradation accelerates as the corpus approaches 100K+. Increase m (16 → 32) and ef_construction (64 → 128) before recall drops. A periodic Recall@k benchmark comparing HNSW results against brute-force exact search on a random sample is the most reliable detection method.
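The benchmark's scoring step is plain set arithmetic. A sketch, where ann_ids come from the indexed query and exact_ids from a brute-force run of the same query (for example with SET enable_indexscan = off); the function names are illustrative:

```python
from typing import List, Sequence


def recall_at_k(ann_ids: Sequence[int], exact_ids: Sequence[int], k: int) -> float:
    """Fraction of the true top-k (brute-force) hits recovered by the ANN index."""
    exact_top = set(exact_ids[:k])
    if not exact_top:
        return 1.0
    return len(set(ann_ids[:k]) & exact_top) / len(exact_top)


def mean_recall(
    ann_runs: List[Sequence[int]], exact_runs: List[Sequence[int]], k: int
) -> float:
    """Average Recall@k over a benchmark sample of queries."""
    scores = [recall_at_k(a, e, k) for a, e in zip(ann_runs, exact_runs)]
    return sum(scores) / len(scores)
```

Running this nightly on a random sample of 500–1,000 stored queries turns recall drift into a single monitorable number.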
Recent work explores alternatives to the standard hierarchy. AQR-HNSW applies density-aware quantization to reduce hubness while maintaining competitive build times [3]. "Down with the Hierarchy" demonstrates on 13 large-scale datasets that a flat navigable small world graph achieves identical recall to HNSW for vectors above 32 dimensions, with 38% lower peak memory — challenging whether the hierarchical layers provide any real benefit for production embedding dimensions [4].
Iterative Scans for Filtered Queries
When a vector search is combined with a highly selective WHERE filter, the HNSW index may return K candidates of which none satisfy the filter, producing an empty result set even though matching vectors exist in the table. This is the standard failure mode for retrieval pipelines that combine semantic search with metadata filtering.
pgvector 0.8+ introduces iterative index scans. Instead of fetching a fixed candidate set and then applying the filter — which can produce zero results — the search expands iteratively until K results satisfying the filter are found or the scan budget is exhausted. Enable it per-transaction with SET LOCAL hnsw.iterative_scan = 'relaxed_order'. The relaxed_order mode is faster and sufficient for most retrieval use cases; strict_order preserves exact distance rank at higher cost.
The budget is controlled by hnsw.max_scan_tuples (default 20,000). A scan that exhausts the budget returns however many results were found — not an error. For low-selectivity filters this approaches sequential scan cost. The trade-off is worth it for most filter cardinalities, but tune the budget against the latency SLO for the most selective paths. Filters applied in a subquery cannot benefit from iterative scans — the filter must be a direct WHERE predicate on the indexed table, not a condition in a nested query.
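Adjusting the budget follows the same SET LOCAL pattern shown earlier; 40,000 here is an illustrative value, double the default:

```sql
BEGIN;
SET LOCAL hnsw.iterative_scan = 'relaxed_order';
SET LOCAL hnsw.max_scan_tuples = 40000;  -- scan budget for the filtered search
-- filtered similarity query here (filter as a direct WHERE predicate)
COMMIT;
```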
Quantization and the 8 KB Page Budget
PostgreSQL stores index entries on 8 KB pages. The maximum vector that fits on a single page determines the dimension ceiling:
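The ceiling follows from rough arithmetic. This sketch ignores tuple-header details and assumes 4 bytes per float32 dimension (vector) or 2 bytes per float16 dimension (halfvec), plus a small per-value header:

```python
PAGE_SIZE = 8192     # PostgreSQL page size in bytes
VALUE_HEADER = 8     # rough per-vector overhead: varlena header + dimension count


def max_dims(bytes_per_dim: int, budget: int = PAGE_SIZE) -> int:
    """Largest vector dimension that fits a single page, ignoring tuple headers."""
    return (budget - VALUE_HEADER) // bytes_per_dim


print(max_dims(4))  # vector (float32): 2046, near pgvector's 2,000-dim index cap
print(max_dims(2))  # halfvec (float16): 4092
```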
For 1024-dim embeddings vector(1024) is fine. For 3072-dim embeddings (OpenAI text-embedding-3-large) use halfvec or an expression index on a halfvec cast to stay within the limit and halve storage. At 1M rows the difference between vector(3072) and halfvec(3072) is 6 GB of table storage, plus the HNSW index on halfvec comes in around 7.7 GB versus a DiskANN index at roughly 840 MB — a meaningful gap once the index approaches the size of shared_buffers.
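The expression-index variant follows the cast pattern from the pgvector documentation; the index name here is illustrative, and the query must repeat the same cast expression for the planner to match it to the index:

```sql
-- Index the halfvec cast: vector(3072) itself exceeds the 2,000-dim index limit
CREATE INDEX ix_documents_embedding_halfvec
ON documents
USING hnsw ((embedding::halfvec(3072)) halfvec_cosine_ops);

-- Query with the identical cast expression so the index is used
SELECT id FROM documents
ORDER BY embedding::halfvec(3072) <=> $1::halfvec(3072)
LIMIT 10;
```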
When the HNSW index no longer fits in shared_buffers, every query triggers disk reads and QPS drops sharply. Benchmarks show 2,250 QPS at a 4 GB index falling to 12.9 QPS at 38 GB as the index exceeds available cache [5]. DiskANN's SBQ compression keeps the index in memory at scales where HNSW would not fit. The cost is that iterative scans are not supported in pgvectorscale 0.9.0 and the extension must be installed separately.
HNSW Parameter Tuning
Three parameters govern HNSW behaviour. m (default 16) sets the number of connections per node. A denser graph improves recall but doubling m can quadruple build time. ef_construction (default 64) controls how thoroughly the graph is built; values of 64–128 cover most workloads. ef_search (default 40) is the recall-latency dial at query time — the practical range is 40–200 before PostgreSQL determines that a full table scan is cheaper than continuing the graph traversal [1].
The tuning workflow: establish a recall baseline by comparing HNSW results to brute-force exact search (SET enable_indexscan = off) on a sample of 500–1,000 queries. Fix ef_search at 100, then tune m and ef_construction until HNSW recall is within 2% of brute-force. Reduce ef_search until the latency SLO is met. When PostgreSQL switches to a sequential scan at higher ef_search values, the right response is to rebuild the index with higher m and ef_construction, not to keep raising ef_search.
pgvector 0.8 introduced parallel index builds with up to 7.4× speedup using multiple workers. Set max_parallel_maintenance_workers and increase maintenance_work_mem to 1–2 GB before a large build. Note that pgvector 0.6.0–0.8.1 contain a buffer overflow vulnerability (CVE-2026-3172, CVSS 8.1) in parallel HNSW builds that can leak data or crash the server — update to 0.8.2+ before enabling parallel builds in production [6].
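A build-time setup sketch using the values above; the worker count depends on available cores, and session-level SET is acceptable here because index builds run on a dedicated migration connection, not a pooled one:

```sql
SET maintenance_work_mem = '2GB';
SET max_parallel_maintenance_workers = 7;  -- plus the leader process

CREATE INDEX CONCURRENTLY ix_documents_embedding_hnsw
ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
```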
Operational Pitfalls
Updating non-vector columns on a table with an HNSW index is far slower than expected — observed overhead is 58 ms rising to 6 s for a 10K-row update [7]. The update triggers HNSW graph maintenance even though the vector column is unchanged. The mitigation is to keep the HNSW-indexed table append-heavy and separate frequently-updated metadata into a side table joined by primary key.
Insert performance degrades when the HNSW index exceeds maintenance_work_mem. Initial batches of 10K rows take minutes; later batches take tens of minutes as the graph spills to disk [8]. Insert all data before creating the index — pgvector's documentation recommends this, and it produces better graph quality than incremental inserts into an existing index.
QPS degrades sharply when the index grows beyond shared_buffers. At that point every query requires disk reads to traverse the graph, and throughput approaches single-digit QPS even on fast storage [5]. Monitor the ratio of idx_blks_hit to idx_blks_hit + idx_blks_read on the HNSW index; a sustained cache-hit rate below 90% under steady-state load is the leading indicator.
IVFFlat centroids are computed once at build time. As the dataset mutates, recall degrades. Rebuild the index when more than 30% of rows have changed since the last build, or when recall drops without a configuration change. REINDEX CONCURRENTLY rebuilds without a full table lock on PostgreSQL 15+.
Design Principles for pgvector in Production
Always declare vector(N) with an explicit dimension — in every ORM model, in every init_vectorstore_table call, in every Alembic migration. A bare vector column is unindexable. Pass vector_size to init_vectorstore_table and verify the column type with \d <your_table> after deployment.
Always match the operator class to the distance metric at index creation time. vector_cosine_ops for <=>, vector_l2_ops for <->, vector_ip_ops for <#>. Validate with a catalog query against pg_opclass and pg_am after running migrations.
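One way to run that validation, sketched for single-column vector indexes so that indclass[0] is the relevant operator class entry:

```sql
SELECT
    t.relname   AS table_name,
    i.relname   AS index_name,
    am.amname   AS index_type,      -- hnsw / ivfflat
    opc.opcname AS operator_class   -- must match the query operator
FROM pg_index idx
JOIN pg_class i     ON i.oid = idx.indexrelid
JOIN pg_class t     ON t.oid = idx.indrelid
JOIN pg_am am       ON am.oid = i.relam
JOIN pg_opclass opc ON opc.oid = idx.indclass[0]
WHERE am.amname IN ('hnsw', 'ivfflat');
```

A row showing vector_l2_ops on an index that serves <=> queries is a dead index.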
Never create vector indexes inside ORM __table_args__ or inline in init_vectorstore_table. Use Alembic migrations with raw SQL and autocommit_block() for CONCURRENTLY builds.
Use SET LOCAL inside transactions for all vector GUC parameters (hnsw.ef_search, hnsw.iterative_scan, ivfflat.probes). Session-level settings persist across pool checkouts.
Enable hnsw.iterative_scan = 'relaxed_order' whenever vector search is combined with WHERE filters. Without it, highly selective filters silently return zero results.
Insert data before creating the HNSW index. Build time is faster and graph quality is better than inserting into an existing index.
Observability
Silent full-table scans on vector columns are invisible to standard application monitoring. Running EXPLAIN (ANALYZE, BUFFERS) on the similarity query is the fastest diagnostic. Adding an automated CI check that runs EXPLAIN on representative vector queries and fails on Seq Scan catches regressions before deployment.
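Such a check can be sketched as a pure function over EXPLAIN (FORMAT JSON) output; the helper names are illustrative, and fetching the plan text from the database through whatever driver the test suite already uses is left to the surrounding CI harness:

```python
import json
from typing import Any, Dict


def plan_has_seq_scan(node: Dict[str, Any]) -> bool:
    """Recursively search an EXPLAIN (FORMAT JSON) plan tree for a Seq Scan node."""
    if node.get("Node Type") == "Seq Scan":
        return True
    return any(plan_has_seq_scan(child) for child in node.get("Plans", []))


def assert_uses_index(explain_json: str) -> None:
    """CI guard: fail when a similarity query is planned as a sequential scan.

    explain_json is the raw output of
        EXPLAIN (FORMAT JSON) SELECT ... ORDER BY embedding <=> $1 LIMIT 10;
    """
    plan = json.loads(explain_json)[0]["Plan"]
    if plan_has_seq_scan(plan):
        raise AssertionError("vector query fell back to Seq Scan; index not used")
```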
For production monitoring, the index inventory and usage rate are available directly from the PostgreSQL catalog:
```sql
SELECT
    s.relname AS table_name,
    s.indexrelname AS index_name,
    s.idx_scan AS total_scans,
    pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size,
    ROUND(
        100.0 * io.idx_blks_hit / NULLIF(io.idx_blks_hit + io.idx_blks_read, 0),
        1
    ) AS cache_hit_pct
FROM pg_stat_user_indexes s
JOIN pg_statio_user_indexes io USING (indexrelid)
WHERE s.indexrelname ILIKE '%embedding%'
ORDER BY s.idx_scan ASC;
```

Note that idx_blks_hit and idx_blks_read live in pg_statio_user_indexes, not pg_stat_user_indexes, hence the join. idx_scan = 0 after 24 hours in production indicates a dead index. cache_hit_pct below 90% under steady load indicates the HNSW index has grown beyond shared_buffers. Build progress is visible in pg_stat_progress_create_index during index creation — DiskANN is the exception and does not report to this view.
Standard PostgreSQL dashboards do not surface vector-specific metrics: recall degradation, ef_search misconfiguration, and operator class mismatches are all invisible to generic monitoring. Explicit alerts are needed for zero-scan indexes after deployments, for recall drift detected by a nightly brute-force comparison, and for latency SLO violations on the similarity search endpoint.
Footnotes
1. Boutaga, B. (2026). "pgvector, a guide for DBA — Part 2: Indexes (update March 2026)." dbi-services Blog. https://www.dbi-services.com/blog/pgvector-a-guide-for-dba-part-2-indexes-update-march-2026/
2. Sarkar, P. (2026). "HNSW at Scale: Why Your RAG System Gets Worse as the Vector Database Grows." Towards Data Science, Jan 7, 2026. https://towardsdatascience.com/hnsw-at-scale-why-your-rag-system-gets-worse-as-the-vector-database-grows/
3. Li, H. et al. (2026). "AQR-HNSW: Accelerating Approximate Nearest Neighbor Search via Density-aware Quantization and Multi-stage Re-ranking." arXiv:2602.21600 (submitted to DAC 2026). https://arxiv.org/abs/2602.21600
4. Munyampirwa, B., Lakshman, V., & Coleman, B. (2025). "Down with the Hierarchy: The 'H' in HNSW Stands for Hubs." arXiv:2412.01940. Poster, 1st Workshop on Vector Databases at ICML 2025. https://arxiv.org/abs/2412.01940
5. pgvector contributors. (2025). "HNSW QPS Degradation as Index Size Grows Beyond Memory." pgvector Issue #700. https://github.com/pgvector/pgvector/issues/700
6. pgvector contributors. (2026). "pgvector 0.8.2 released." PostgreSQL News, Feb 26, 2026. Fixes CVE-2026-3172 buffer overflow in parallel HNSW index builds. https://www.postgresql.org/about/news/pgvector-082-released-3245/
7. pgvector contributors. (2024). "UPDATE performance degradation with HNSW index on non-vector columns." pgvector Issue #875. https://github.com/pgvector/pgvector/issues/875
8. pgvector contributors. (2024). "HNSW Indexing Times Issue." pgvector Issue #430. https://github.com/pgvector/pgvector/issues/430