The Hidden Indexing Trap: LangChain-Postgres, pgvector, and Custom Vector Stores

Martin Penchev

Pgvector is a PostgreSQL extension that adds vector similarity search functionality. It lets you store embedding vectors — high-dimensional numerical representations produced by embedding models — alongside ordinary relational data, and query them by semantic proximity rather than exact match. This makes it a natural fit for retrieval-augmented generation (RAG) systems, semantic search, and any pipeline that needs to find the most relevant examples or documents given a text input.

In production environments, langchain-postgres is the standard library for building on top of PostgreSQL with the pgvector extension enabled. Its PGVectorStore provisions its own dedicated embeddings table via init_vectorstore_table, separate from the application's ORM models. It handles embedding insertion and collection management, and exposes retrieval methods including hybrid search and maximum marginal relevance search for diverse result sets. In practice the application ends up with two sets of tables — those managed by the ORM holding canonical relational data, and the embeddings table managed by PGVectorStore acting as the similarity search layer. Configuring these two components together correctly — particularly ensuring that vector columns are properly typed and that indexes match the distance metric used at query time — is where most production issues originate.

Background: Indexes, Column Types, and Distance Metrics

A vector index is a data structure built alongside the embeddings table that allows the database to find the K nearest vectors (nearest neighbours search) to a query without scanning every row. The extension provides two index types:

  1. IVFFlat (Inverted File, flat storage) partitions the vector space into clusters (Voronoi cells) at build time and, at query time, searches only the nearest clusters rather than all rows. Vectors within each cluster are stored and compared without compression (“flat”). It builds quickly but needs periodic rebuilding as data changes.

  2. HNSW (Hierarchical Navigable Small World) builds a multi-layer graph of proximity relationships and traverses it greedily at query time. It is slower to build than IVFFlat but does not need reindexing as data mutates and generally achieves better recall.

Both index types require the embedding column to have an explicit declared dimension, for example vector(1024). pgvector cannot index a bare vector column with no dimension annotation — it needs to know the fixed size to build the index structure.

Both index types also require an operator class — a setting that declares which distance metric the index targets. That choice must align with how you query. Embedding models usually output unit-normalised vectors, where direction carries meaning and magnitude is fixed at 1. For those, cosine distance is the right metric — semantically similar texts have vectors pointing in similar directions, so cosine distance is small.

Cosine distance is defined as one minus the cosine of the angle between two vectors. The formula divides the dot product by the product of both magnitudes, which means it normalises internally and returns an angle-based metric regardless of whether the input vectors are unit-length.

Using cosine distance on unnormalised vectors does not produce undefined results — the arithmetic is valid. The concern is whether the metric captures what you want. If an embedding model encodes semantic strength or confidence in the magnitude of its output — a larger-magnitude vector representing a stronger, more prototypical embedding — cosine distance silently discards that information by normalising it away. L2 distance preserves both the direction and the scale of the difference and may better reflect what such a model was trained to represent. For most current text and code embedding models, which produce unit-normalised or near-unit-normalised output, cosine and L2 are approximately equivalent and the choice mainly determines index compatibility rather than result quality. When embeddings are explicitly unnormalised, applying cosine distance retrieves results ranked by angle alone — semantically plausible rankings that ignore half of the geometric information in the embedding.
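The near-equivalence for unit-normalised vectors can be checked numerically. For unit vectors the squared L2 distance is exactly twice the cosine distance (‖a − b‖² = 2(1 − cos θ)), so both metrics induce identical rankings. A minimal sketch in plain Python:

```python
import math
import random

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalise(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

random.seed(0)
a = normalise([random.gauss(0, 1) for _ in range(8)])
b = normalise([random.gauss(0, 1) for _ in range(8)])

# For unit vectors: ||a - b||^2 == 2 * (1 - cos(a, b))
assert abs(l2_distance(a, b) ** 2 - 2 * cosine_distance(a, b)) < 1e-9
```

The identity only holds for unit vectors — for unnormalised output the two metrics genuinely diverge, which is the scenario the paragraph above describes.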

The pgvector extension defaults to the L2 operator class (vector_l2_ops). If the default index exists but the application runs cosine-distance queries (<=>), PostgreSQL will not use the index — it falls back to a full table scan. To avoid that, specify the operator class matching the semantic search metric (vector_cosine_ops for cosine, vector_l2_ops for euclidean) when creating the index in the initial table migrations.

Causes of Silent Full-Table Scans

Three specific configuration errors produce silent full-table scans.

The first is an operator class mismatch. An ORM model that declares an IVFFlat or HNSW index in __table_args__ without specifying an operator class gets vector_l2_ops by default (Euclidean distance). If the application queries with cosine distance (<=>), PostgreSQL creates the index, lists it in \d, and silently ignores it at query time because the operator class does not match — PostgreSQL raises no warning.

The second arises from omitting vector_size when calling init_vectorstore_table. Without that parameter the provisioning call creates the embedding column as a bare vector with no declared dimension. As described above, pgvector requires a typed column — vector(N) — to build any index. The bare column accepts vectors of any dimension, computes distances correctly, and scans every row to do it.

The third is a query expression that PostgreSQL cannot match to an index. Computing (1 - (embedding <=> query_vec)) to convert cosine distance into a similarity score in the [0, 1] range wraps the distance operator in arithmetic. PostgreSQL's query optimizer selects an index based on the exact structure of the query; when arithmetic buries the distance operator inside an expression, the optimizer cannot match it to the index and falls back to a full table scan. A correctly configured HNSW index is still bypassed when the distance is not the bare ORDER BY term.

At 20K rows a sequential scan takes 40–80 ms. In a typical RAG pipeline where multiple similarity queries run concurrently per request and the downstream LLM call takes 1–3 seconds, that overhead is easily absorbed and missed. In such cases, running EXPLAIN ANALYZE on the similarity query directly shows the execution plan and real execution statistics — the fastest way to confirm whether the planner chose an index.
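That EXPLAIN check is easy to automate. The helper below is a hypothetical sketch — it only needs to distinguish an index scan node from a sequential scan in PostgreSQL's text-format EXPLAIN output, where index scans appear as "Index Scan using <index name>":

```python
def plan_uses_vector_index(plan_lines: list[str]) -> bool:
    """True if the EXPLAIN plan contains an index scan node (HNSW/IVFFlat included)."""
    return any("Index Scan using" in line for line in plan_lines)

# Abbreviated plan fragments in the shape EXPLAIN (text format) produces:
indexed_plan = [
    "Limit  (cost=... rows=10)",
    "  ->  Index Scan using ix_documents_embedding_hnsw on documents",
]
seq_plan = [
    "Limit  (cost=... rows=10)",
    "  ->  Sort",
    "        ->  Seq Scan on documents",
]

assert plan_uses_vector_index(indexed_plan)
assert not plan_uses_vector_index(seq_plan)
```

Wired into a CI job that runs EXPLAIN on representative similarity queries, this turns the silent full-table scan into a failing build.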

In-depth overview

Getting pgvector to use an index requires three layers to agree — the table schema must declare the correct column type, the migration must create an index with the right operator class, and the query must present the distance expression in the exact form the planner expects. The sections below cover each layer in turn.

LangChain-Postgres: Provisioning the Vector Table

When using the langchain-postgres package, PGVectorStore requires the embeddings table to be provisioned explicitly before use via init_vectorstore_table. This call declares the embedding column as vector(N) with an explicit dimension. Passing the wrong vector_size declares a dimension that rejects the model's actual embeddings at insert; provisioning without a declared dimension leaves the column as bare vector, on which no HNSW or IVFFlat index can be built.

from langchain_postgres import PGEngine, PGVectorStore
 
engine = PGEngine.from_connection_string(url=connection_string)
 
# Wrong — vector_size must match the embedding model output
engine.init_vectorstore_table(
    table_name="documents",
    vector_size=768,  # mismatched dimension; 1024-dim embeddings are rejected at insert
)
 
# Correct — vector_size matches the model; HNSW index is possible
engine.init_vectorstore_table(
    table_name="documents",
    vector_size=1024,
)
 
store = PGVectorStore.create_sync(
    engine=engine,
    table_name="documents",
    embedding_service=my_embeddings,
)

The table name is user-defined and passed consistently to both init_vectorstore_table and PGVectorStore. A typed column rejects dimension mismatches at the database level — a GPU OOM that returns a truncated vector will fail at insert rather than silently corrupt similarity scores.

Implementing Custom Embeddings

LangChain's Embeddings interface is provider-agnostic. Implementing it for a self-hosted async inference service means bridging async client calls to the synchronous PGVectorStore API — here each synchronous method spins up a short-lived event loop of its own, avoiding the error of nesting a loop inside one that is already running.

from langchain_core.embeddings import Embeddings
import asyncio
from typing import List
 
class SelfHostedEmbeddings(Embeddings):
    def __init__(self, client, embedding_dim: int = 1024, batch_size: int = 32):
        self._client = client
        self._dim = embedding_dim
        self._batch_size = batch_size
 
    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return self._run(self._client.embed_batch(texts, self._batch_size))
 
    def embed_query(self, text: str) -> List[float]:
        return self._run(self._client.embed_single(text))
 
    async def aembed_documents(self, texts: List[str]) -> List[List[float]]:
        return await self._client.embed_batch(texts, self._batch_size)
 
    async def aembed_query(self, text: str) -> List[float]:
        return await self._client.embed_single(text)
 
    def _run(self, coro):
        loop = asyncio.new_event_loop()
        try:
            result = loop.run_until_complete(coro)
        finally:
            loop.close()
        if not result:
            raise ValueError("Inference service returned no embeddings")
        self._validate(result if isinstance(result[0], list) else [result])
        return result
 
    def _validate(self, embeddings: List[List[float]]) -> None:
        for i, emb in enumerate(embeddings):
            if len(emb) != self._dim:
                raise ValueError(
                    f"Embedding at index {i} has {len(emb)} dims, expected {self._dim}. "
                    "Check for GPU OOM or inference timeout."
                )

GPU out-of-memory is more than a theoretical edge case in production embedding pipelines. Embedding models run on GPU for throughput — inference servers process documents in batches to saturate available VRAM. When a batch exceeds VRAM capacity, the CUDA runtime raises an OOM condition inside the inference framework. The caller rarely sees that condition as a Python exception — inference frameworks treat OOM as a recoverable resource constraint and handle it internally, returning a partial batch (embeddings for the first N items only), falling back to zero-filled vectors of the expected dimension, or returning an empty list. From the Python side, the call appears to succeed — the returned value has the right structure. No exception propagates.

With a bare vector column, those defective vectors pass through into the database without any validation. A zero-filled vector stored alongside a real document will be retrieved and ranked in every subsequent similarity query — its cosine distance from any query vector is undefined (division by zero in the normalisation), and pgvector's handling of zero-magnitude vectors during similarity ranking is unspecified, making its presence in results unpredictable. Partial vectors of the wrong dimension would normally trigger a dimension mismatch error during distance computation, but only at query time and only for the specific pairs being compared — not at insert. With vector(1024) the database validates the dimension at insert time, converting a silent data corruption into a detectable failure the moment the defective embedding arrives. The application-level _validate check in the example above catches the zero-magnitude case the typed column cannot — a vector of the correct length but containing only zeros passes dimension validation but signals a failed inference call.
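The zero-magnitude check can also live as a standalone ingest guard. The function below is a hypothetical extension of the _validate method above — it rejects both of the OOM failure shapes a typed column cannot fully catch:

```python
from typing import List

def validate_embeddings(embeddings: List[List[float]], expected_dim: int) -> None:
    """Reject defective embeddings before they reach the database."""
    for i, emb in enumerate(embeddings):
        if len(emb) != expected_dim:
            # vector(N) would also catch this, but failing here gives a clearer error.
            raise ValueError(f"Embedding {i}: {len(emb)} dims, expected {expected_dim}")
        if all(x == 0.0 for x in emb):
            # A vector(N) column accepts this row; cosine distance against it
            # divides by zero at query time. Fail at ingest instead.
            raise ValueError(f"Embedding {i} is zero-filled — likely a failed inference call")

validate_embeddings([[0.5, 0.1, -0.2]], expected_dim=3)  # passes silently
```

Running this before every insert converts both partial batches and zero-fill fallbacks into immediate, attributable failures.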

Query Patterns That Bypass Indexes

A common retrieval pattern in RAG and few-shot pipelines embeds a query string, finds the K most similar documents in the vector store, optionally filters by metadata (document type, source, date), and passes the results as context to a language model. The similarity search step is on the hot path — it runs once or several times per user request — so its latency directly affects end-to-end response time.

Any arithmetic wrapper around the distance operator forces a sequential scan on that hot path:

-- Forces sequential scan even with a correct HNSW index
SELECT id, (1 - (embedding <=> $1)) AS similarity
FROM documents
ORDER BY similarity DESC
LIMIT 10;
 
-- Uses the index
SELECT id, (embedding <=> $1) AS distance
FROM documents
ORDER BY distance ASC
LIMIT 10;

PostgreSQL with pgvector enabled can only use an HNSW or IVFFlat index when the query is structured as ORDER BY embedding <=> $1. Any transformation of the distance value must happen in an outer query or in application code. The correct pattern for threshold filtering is a subquery — let the index do the distance ordering in the inner query, apply the threshold in the outer one.

SELECT id, content, 1.0 - distance AS similarity
FROM (
    SELECT id, content,
           (embedding <=> CAST($1 AS vector)) AS distance
    FROM documents
    WHERE metadata->>'category' = $2
    ORDER BY distance ASC
    LIMIT $3   -- fetch_k: larger than final k to account for filtering
) candidates
WHERE distance < $4  -- distance threshold
LIMIT $5;

In addition, parameters such as hnsw.ef_search, ivfflat.probes, and hnsw.iterative_scan control how the index is used at query time (recall vs speed), and whether filtered vector searches can return enough results. One caveat is that setting them at session level is unsafe when using a connection pool — the connection is reused by other requests after the query finishes, so the next request (possibly a different endpoint or a non-vector query) inherits those settings. That can cause hard-to-reproduce bugs or unintended full table scans.

Use SET LOCAL instead so the values apply only to the current transaction and are discarded at COMMIT. Then each similarity query gets the parameters it needs without leaking into other work.

BEGIN;
SET LOCAL hnsw.ef_search = 100;
SET LOCAL hnsw.iterative_scan = 'relaxed_order';
-- run the similarity query here
COMMIT;

PostgreSQL's query optimizer is cost-based — it estimates the work required for each possible plan (index scan vs sequential scan) and chooses the one with the lowest estimated cost. The cost of an HNSW index scan grows with ef_search, because that parameter controls how many candidate nodes are explored during graph traversal — higher values mean more distance computations and more index pages touched. Once that estimated cost exceeds the cost of reading the entire table and computing the distance for every row, the planner switches to a sequential scan. In practice the crossover depends on table size, vector dimension, and index parameters; for example, at ef_search ≈ 400 on a 25K-row table the planner may already prefer a full table scan over HNSW traversal 1. So raising ef_search to improve recall can backfire — the planner abandons the index and every query pays for a sequential scan. That is why the 40–200 range is the practical ceiling for ef_search.
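The crossover can be illustrated with a deliberately simplified cost model. The constants below are invented for illustration and bear no relation to PostgreSQL's actual cost units — the point is only the shape of the two curves and that they intersect:

```python
def index_scan_cost(ef_search: int, cost_per_candidate: float = 10.0) -> float:
    # Illustrative: candidate exploration grows roughly linearly with ef_search.
    return ef_search * cost_per_candidate

def seq_scan_cost(rows: int, cost_per_row: float = 0.2) -> float:
    # Illustrative: a sequential scan touches every row exactly once.
    return rows * cost_per_row

rows = 25_000
# Find the ef_search value at which the planner would flip to a sequential scan.
crossover = next(
    ef for ef in range(10, 10_000, 10)
    if index_scan_cost(ef) > seq_scan_cost(rows)
)
# With these toy constants the flip happens at ef_search = 510.
```

The real crossover depends on actual cost estimates, table size, and vector dimension — the model only shows why raising ef_search without bound must eventually lose to a sequential scan.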

Migrations: CREATE INDEX CONCURRENTLY in Practice

PostgreSQL does not allow CREATE INDEX CONCURRENTLY inside a transaction block. Alembic wraps each migration in a transaction by default, so both constraints apply at once. The direct consequence is that ALTER COLUMN and CREATE INDEX CONCURRENTLY must live in separate Alembic revisions — the column type change is transactional DDL, the index build is not. Placing both in one upgrade() — with autocommit_block() only around the index — leaves Alembic's transaction bookkeeping in an inconsistent state on partial failure and makes the downgrade path harder to reason about.

Migration 1 — type the column (transactional DDL):

# migrations/versions/001_type_embedding_column.py
from alembic import op
 
def upgrade() -> None:
    op.execute(
        "ALTER TABLE documents ALTER COLUMN embedding TYPE vector(1024)"
    )
 
def downgrade() -> None:
    op.execute(
        "ALTER TABLE documents ALTER COLUMN embedding TYPE vector"
    )

Migration 2 — build the HNSW index (non-transactional, depends on migration 1):

# migrations/versions/002_create_embedding_hnsw_index.py
from alembic import op
 
revision = "002"
down_revision = "001"
 
def upgrade() -> None:
    with op.get_context().autocommit_block():
        op.execute("""
            CREATE INDEX CONCURRENTLY IF NOT EXISTS ix_documents_embedding_hnsw
            ON documents
            USING hnsw (embedding vector_cosine_ops)
            WITH (m = 16, ef_construction = 64)
        """)
 
def downgrade() -> None:
    with op.get_context().autocommit_block():
        op.execute(
            "DROP INDEX CONCURRENTLY IF EXISTS ix_documents_embedding_hnsw"
        )

The operator class is explicit — vector_cosine_ops — because omitting it defaults to vector_l2_ops, which produces a dead index for cosine-distance queries. The legacy workaround of manually issuing COMMIT before and BEGIN after the index statement leaves partial state on failure; autocommit_block() is the correct approach.

The autocommit_block driver compatibility trap

autocommit_block() works reliably with psycopg2 but has known failures with newer drivers. With psycopg3 (the psycopg package), entering the block raises:

AssertionError
  alembic/runtime/migration.py, line 330
  assert self._transaction is not None

The assertion fires because Alembic's internal transaction state tracking does not match what psycopg3's connection model reports at the point the block tries to commit and re-enter autocommit mode. The autocommit_block() + asyncpg combination has a separate failure mode — the migration hangs indefinitely rather than raising an error.

In both cases the fix is to fall back to raw psycopg2 for the migration run, or to use transaction_per_migration = True in env.py so each migration file starts with a clean transaction state. Within any driver, keep the block contents minimal — only op.execute() calls. Any work that requires inspecting the database (reading a config value, checking whether the index already exists) should happen before entering the block, where the connection is in a predictable transactional state.
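The transaction_per_migration setting is a standard Alembic option passed to context.configure in env.py. The sketch below shows where it goes, with the surrounding online-mode boilerplate assumed rather than prescribed:

```python
# env.py (online mode) — sketch; connection setup follows the usual Alembic template.
from alembic import context
from sqlalchemy import engine_from_config, pool

config = context.config
target_metadata = None  # or your application's MetaData

def run_migrations_online() -> None:
    connectable = engine_from_config(
        config.get_section(config.config_ini_section),
        prefix="sqlalchemy.",
        poolclass=pool.NullPool,
    )
    with connectable.connect() as connection:
        context.configure(
            connection=connection,
            target_metadata=target_metadata,
            transaction_per_migration=True,  # clean transaction state per revision
        )
        with context.begin_transaction():
            context.run_migrations()
```

Each migration file then commits independently, so a failure inside an autocommit_block leaves earlier revisions cleanly applied.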

Never declare vector indexes in ORM __table_args__. ORM-level index declarations cannot use CONCURRENTLY, cannot run inside autocommit_block(), and do not validate operator class correctness at migration time.

As a rule of thumb, declare the column in the model and manage the index in Alembic.

Index Types, Operator Classes, and When to Use Each

pgvector is the core PostgreSQL extension — it provides the vector type and the HNSW and IVFFlat index types. Timescale also ships a companion extension, pgvectorscale, that builds on pgvector and adds a DiskANN-based index (and Statistical Binary Quantization) for larger or more dimension-heavy workloads 1:

Index     Provider        Max dims (vector)   Build (25K, 3072-dim)   Index size   Iterative scan
HNSW      pgvector        2,000               29 s                    193 MB       Yes
IVFFlat   pgvector        2,000               5 s                     193 MB       Yes
DiskANN   pgvectorscale   16,000              49 s                    21 MB        No

HNSW is the production default for most workloads. IVFFlat is appropriate for small, mostly-static datasets where the faster build time matters more than avoiding periodic reindexing. DiskANN uses Statistical Binary Quantization to achieve a 9× smaller index — the right choice when the HNSW index would exceed shared_buffers or when embeddings exceed 2,000 dimensions.

The operator class on the index must match the distance metric used in queries. The correct choice depends on whether the embedding model normalises its output vectors:

Operator class       SQL operator     Use when
vector_cosine_ops    <=>              Normalised embeddings — most text embedding models
vector_l2_ops        <->              Unnormalised vectors where magnitude carries meaning
vector_ip_ops        <#>              Unit-normalised vectors (inner product is equivalent to cosine but avoids the division)
halfvec_cosine_ops   <=> on halfvec   Embeddings above 1,024 dims, to stay within the 8 KB page size limit
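The equivalence behind the vector_ip_ops row is checkable directly: for unit vectors, cosine distance is 1 − dot, so ordering by negative inner product (which is what pgvector's <#> operator returns, so that ascending ORDER BY works) produces the identical ranking. A small check in plain Python:

```python
import math
import random

def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

random.seed(1)
query = unit([random.gauss(0, 1) for _ in range(16)])
docs = [unit([random.gauss(0, 1) for _ in range(16)]) for _ in range(50)]

by_cosine = sorted(range(50), key=lambda i: 1.0 - dot(query, docs[i]))
by_neg_ip = sorted(range(50), key=lambda i: -dot(query, docs[i]))  # what <#> orders by

assert by_cosine == by_neg_ip  # identical ranking on unit vectors
```

On unnormalised vectors the two orderings diverge, which is why the table conditions vector_ip_ops on unit-normalised output.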

In Production

Getting the configuration right is the first half. The second is keeping the system correct as the corpus grows, the index ages, and the team changes. This section covers the HNSW trade-offs that emerge at scale, the operational pitfalls that are easy to miss, and the design and monitoring decisions that hold the whole thing together.

HNSW at Scale: Recall Degradation

HNSW recall degrades as the dataset grows beyond what the graph parameters were tuned for 2. A deployment that achieves 99% recall at 10K vectors may drop to 85% at 10M vectors with identical m, ef_construction, and ef_search. Three mechanisms drive this.

Graph fragmentation occurs because HNSW connects each node to its nearest neighbours at insert time. Early nodes end up connected to vectors that are no longer their actual nearest neighbours as the dataset fills in around them. The hubness problem arises in high-dimensional space — a small number of vectors become universal hubs that appear as nearest neighbours to a disproportionately large fraction of others, concentrating graph edges and creating search bottlenecks. Greedy search failure compounds both — the termination condition triggers at a local minimum increasingly far from the global nearest neighbour.

In practice this produces high similarity scores for semantically dissimilar queries, with long-tail queries failing first. At 20K vectors recall is still high with standard parameters; degradation accelerates as the corpus approaches 100K+. Increase m (16 → 32) and ef_construction (64 → 128) before recall drops. A periodic Recall@k benchmark comparing HNSW results against brute-force exact search on a random sample is the most reliable detection method.
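The Recall@k benchmark itself is a few lines: run the same query once through the HNSW index and once brute-force (exact), then compare the returned ID sets. A minimal sketch, where approx_ids and exact_ids are assumed to come from those two query paths:

```python
def recall_at_k(approx_ids: list, exact_ids: list) -> float:
    """Fraction of the true k nearest neighbours the index actually returned."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

# Example: the index returned 9 of the true 10 nearest neighbours.
assert recall_at_k(
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 11],
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
) == 0.9
```

Averaged over a random sample of a few hundred stored queries, a drop in this number is the earliest concrete signal that m and ef_construction need raising.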

Recent work explores alternatives to the standard hierarchy. AQR-HNSW applies density-aware quantization to reduce hubness while maintaining competitive build times 3. "Down with the Hierarchy" demonstrates on 13 large-scale datasets that a flat navigable small world graph achieves identical recall to HNSW for vectors above 32 dimensions, with 38% lower peak memory — challenging whether the hierarchical layers provide any real benefit for production embedding dimensions 4.

Iterative Scans for Filtered Queries

Understanding why filtered vector search fails requires knowing how HNSW and SQL WHERE clauses interact in the query execution engine. HNSW is an index over the vector column only — it has no knowledge of any other column. When PostgreSQL executes a similarity query with a WHERE filter, the planner produces a plan with two separate nodes — an HNSW index scan node that returns row IDs ordered by approximate distance, and a filter node sitting above it that evaluates the WHERE predicate against each candidate row. The filter node processes rows as they stream out of the index scan and discards those that do not satisfy the predicate. When the query has LIMIT K, the engine stops as soon as K rows pass the filter — or stops at the index scan's candidate set boundary if K passing rows were not found.

This two-phase structure means the HNSW index scan fetches its candidates without knowing which ones will pass the filter. It finds the approximate K nearest vectors in the graph, delivers them upstream, and is done. If all K candidates fail the WHERE predicate — because the filter is highly selective and the nearest vectors happen not to satisfy it — the result is empty, even though vectors satisfying both constraints exist elsewhere in the table. No error is raised. The application receives zero results, which is indistinguishable from a legitimate "no match" response. This is the standard failure mode for retrieval pipelines that combine semantic search with metadata filtering.

The pgvector extension addresses this with iterative index scans. Instead of fetching a fixed candidate set and then applying the filter — which can produce zero results — the HNSW traversal continues exploring graph neighbours, checking each candidate against the predicate until it finds K results satisfying the filter or exhausts the scan budget. Enable it per-transaction with SET LOCAL hnsw.iterative_scan = 'relaxed_order'. The relaxed_order mode is faster and sufficient for most retrieval use cases; strict_order preserves exact distance rank at higher cost.

hnsw.max_scan_tuples (default 20,000) controls the budget. A scan that exhausts the budget returns however many results were found — not an error. For low-selectivity filters this approaches sequential scan cost. The trade-off is worth it for most filter cardinalities, but tune the budget against the latency SLO for the most selective paths.

Iterative scans require the filter to be a direct WHERE predicate on the indexed table. A filter applied in an outer query wrapping the vector search cannot benefit from this mechanism — the outer filter sees only the rows already returned by the inner index scan, and the inner scan has no visibility into what the outer query wants to keep. The predicate must be pushed down to the index scan node, which requires it to appear as a direct condition on the same table — not inside a subquery or a lateral join.

Quantization and the 8 KB Page Budget

PostgreSQL stores index entries on 8 KB pages. The maximum vector that fits on a single page determines the dimension ceiling:

float32:           8,192 bytes ÷ 4 bytes/dim ≈ 2,000 dims effective limit
float16 (halfvec): 8,192 bytes ÷ 2 bytes/dim ≈ 4,000 dims effective limit

For 1024-dim embeddings vector(1024) is fine. For 3072-dim embeddings (OpenAI text-embedding-3-large) use halfvec or an expression index on a halfvec cast to stay within the limit and halve storage. At 1M rows the difference between vector(3072) and halfvec(3072) is 6 GB of table storage, plus the HNSW index on halfvec comes in around 7.7 GB versus a DiskANN index at roughly 840 MB — a meaningful gap once the index approaches the size of shared_buffers.
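The arithmetic behind those figures, spelled out:

```python
PAGE_BYTES = 8192  # PostgreSQL page size

def max_dims(bytes_per_dim: int) -> int:
    """How many dimensions of a given element width fit on one page."""
    return PAGE_BYTES // bytes_per_dim

assert max_dims(4) == 2048   # float32: ~2,000 dims effective limit
assert max_dims(2) == 4096   # halfvec: ~4,000 dims effective limit

# Vector-data storage difference at 1M rows for 3072-dim embeddings
# (raw element bytes only; row headers and TOAST overhead excluded):
rows, dims = 1_000_000, 3072
saving_gb = rows * dims * (4 - 2) / 1e9
assert round(saving_gb, 1) == 6.1   # ≈ 6 GB saved by switching to halfvec
```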

When the HNSW index no longer fits in shared_buffers, every query triggers disk reads and QPS drops sharply. Benchmarks show 2,250 QPS at a 4 GB index falling to 12.9 QPS at 38 GB as the index exceeds available cache 5. DiskANN's SBQ compression keeps the index in memory at scales where HNSW would not fit. The cost is that DiskANN does not currently support iterative scans, and pgvectorscale must be installed as a separate extension.

HNSW Parameter Tuning

Three parameters govern HNSW behaviour. m (default 16) sets the number of connections per node. A denser graph improves recall but doubling m can quadruple build time. ef_construction (default 64) controls how thoroughly the builder constructs the graph; values of 64–128 cover most workloads. ef_search (default 40) is the recall-latency dial at query time — the practical range is 40–200 before PostgreSQL determines that a full table scan is cheaper than continuing the graph traversal 1.

A reasonable starting point is to fix ef_search at 100, then tune m and ef_construction against a recall baseline — comparing HNSW results against brute-force exact search (SET enable_indexscan = off) on a sample of 500–1,000 representative queries until HNSW recall is within 2% of brute-force. From there, reduce ef_search until the latency target is met. When PostgreSQL switches to a sequential scan at higher ef_search values, the right response is to rebuild the index with higher m and ef_construction rather than keep raising ef_search.

Both m and ef_construction directly affect build time, which grows substantially at higher values. pgvector supports parallel index builds using multiple maintenance workers — set max_parallel_maintenance_workers and increase maintenance_work_mem to 1–2 GB before a large build. Parallel HNSW builds have been a source of correctness issues in some releases; consult the pgvector release notes for the version in use before enabling them in production 6.

Operational Pitfalls

Updating non-vector columns on a table with an HNSW index is far slower than expected — observed overhead is 58 ms rising to 6 s for a 10K-row update 7. The update triggers HNSW graph maintenance even though the vector column is unchanged. The mitigation is to keep the HNSW-indexed table append-heavy and separate frequently-updated metadata into a side table joined by primary key.

Insert performance degrades when the HNSW index exceeds maintenance_work_mem. Initial batches of 10K rows take minutes; later batches take tens of minutes as the graph spills to disk 8. Insert all data before creating the index — pgvector's documentation recommends this, and it produces better graph quality than incremental inserts into an existing index.

QPS degrades sharply when the index grows beyond shared_buffers. At that point every query requires disk reads to traverse the graph, and throughput approaches single-digit QPS even on fast storage 5. Monitor the ratio of idx_blks_hit to idx_blks_hit + idx_blks_read on the HNSW index; a sustained cache-hit rate below 90% under steady-state load is the leading indicator.
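That ratio maps onto a hypothetical alerting helper — the None branch mirrors the divide-by-zero guard needed when the index has seen no traffic at all:

```python
from typing import Optional

def cache_hit_pct(idx_blks_hit: int, idx_blks_read: int) -> Optional[float]:
    """Cache-hit rate for an index, or None when it has served no blocks yet."""
    total = idx_blks_hit + idx_blks_read
    if total == 0:
        return None  # no traffic — distinguish from a genuinely cold cache
    return 100.0 * idx_blks_hit / total

assert cache_hit_pct(0, 0) is None
assert cache_hit_pct(95, 5) == 95.0
# Alert threshold from the text: sustained < 90% suggests the index
# has outgrown shared_buffers.
assert cache_hit_pct(80, 20) < 90.0
```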

IVFFlat computes centroids once at build time. As the dataset mutates, recall degrades. Rebuild the index when more than 30% of rows have changed since the last build, or when recall drops without a configuration change. REINDEX CONCURRENTLY rebuilds without a full table lock on PostgreSQL 15+.

Design Principles for pgvector in Production

Always declare vector(N) with an explicit dimension — in every ORM model, in every init_vectorstore_table call, in every Alembic migration. A bare vector column is unindexable. Pass vector_size to init_vectorstore_table and verify the column type with \d <your_table> after deployment.

Always match the operator class to the distance metric at index creation time. vector_cosine_ops for <=>, vector_l2_ops for <->, vector_ip_ops for <#>. Validate with a catalog query against pg_opclass and pg_am after running migrations.

Never create vector indexes inside ORM __table_args__ or inline in init_vectorstore_table. Use Alembic migrations with raw SQL and autocommit_block() for CONCURRENTLY builds.

Use SET LOCAL inside transactions for all vector GUC parameters (hnsw.ef_search, hnsw.iterative_scan, ivfflat.probes). Session-level settings persist across pool checkouts.

Enable hnsw.iterative_scan = 'relaxed_order' whenever vector search is combined with WHERE filters. Without it, highly selective filters silently return zero results.

Insert data before creating the HNSW index. Build time is faster and graph quality is better than inserting into an existing index.

Observability

Silent full-table scans on vector columns are invisible to standard application monitoring. Running EXPLAIN (ANALYZE, BUFFERS) on the similarity query is the fastest diagnostic. Adding an automated CI check that runs EXPLAIN on representative vector queries and fails on Seq Scan catches regressions before deployment.

For production monitoring, the index inventory and usage rate are available directly from the PostgreSQL catalog:

SELECT
    s.relname AS table_name,
    s.indexrelname AS index_name,
    s.idx_scan AS total_scans,
    pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size,
    ROUND(
        100.0 * s.idx_blks_hit / NULLIF(s.idx_blks_hit + s.idx_blks_read, 0),
        1
    ) AS cache_hit_pct
FROM pg_stat_user_indexes s
WHERE s.indexrelname ILIKE '%embedding%'
ORDER BY s.idx_scan ASC;

idx_scan = 0 after 24 hours in production indicates a dead index. cache_hit_pct below 90% under steady load indicates the HNSW index has grown beyond shared_buffers. PostgreSQL reports build progress in pg_stat_progress_create_index during index creation — DiskANN is the exception and does not report to this view.

A common variant of the inventory query casts the index OID directly to a name:

-- Unreliable outside the default search_path
SELECT indexrelid::regclass::text AS index_name
FROM pg_index i
JOIN pg_class c ON c.oid = i.indrelid
WHERE c.relname = 'documents';

The ::regclass::text cast formats the name relative to the current search_path: an index whose schema is on the path comes out unqualified, while the same index comes out schema-qualified when the path is restricted (for example, set to '' as a hardening measure). Tooling that string-matches index names therefore behaves differently depending on session configuration, and if the index was dropped concurrently the cast raises ERROR: relation with OID ... does not exist. The robust query joins pg_class and pg_namespace explicitly and reads c.relname directly from the catalog, a value unaffected by search_path:

SELECT
    c.relname AS index_name,
    am.amname AS index_type,
    pg_get_indexdef(i.indexrelid) AS index_definition
FROM pg_index i
JOIN pg_class c  ON c.oid = i.indexrelid
JOIN pg_class t  ON t.oid = i.indrelid
JOIN pg_am am    ON am.oid = c.relam
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE t.relname = 'documents'
  AND n.nspname = current_schema();

Pair this with the pg_stat_user_indexes query above for a complete picture of index existence and usage.

Making the zero-scan check executable in tests

idx_scan is the most actionable of the observability signals because a test suite can assert it against the migration target database — not just monitor it in production after the fact. Apply migrations in a test database, run one representative similarity query to give the planner an opportunity to use the index, then assert the scan counter is non-zero.

def test_hnsw_index_is_used(db_conn, sample_query_vector):
    # Run one similarity query so the planner has a chance to use the index.
    db_conn.execute(
        "SELECT id FROM documents ORDER BY embedding <=> %s LIMIT 5",
        (sample_query_vector,),
    )
    # Scan counters are flushed to the statistics views at transaction end,
    # so commit before reading pg_stat_user_indexes.
    db_conn.commit()
    row = db_conn.execute(
        """SELECT idx_scan
           FROM pg_stat_user_indexes
           WHERE indexrelname = 'ix_documents_embedding_hnsw'"""
    ).fetchone()
    assert row is not None, "Index ix_documents_embedding_hnsw not found in catalog"
    assert row[0] > 0, (
        "HNSW index has zero scans after a similarity query — "
        "check operator class, column dimension, and query structure"
    )

pg_stat_user_indexes counts scans since the last server start or statistics reset. In a fresh test database, any non-zero count after a single query confirms the planner chose the index. A zero count means the query fell back to a sequential scan somewhere in the three-layer configuration — wrong operator class, bare vector column, or a non-sargable query expression.

For stricter validation, run EXPLAIN (FORMAT JSON) on the same similarity query and assert that the plan tree contains an Index Scan node and no Seq Scan. The root node of an ORDER BY ... LIMIT query is a Limit node, so the check has to walk the tree rather than inspect only the root. This distinguishes genuine index use from a coincidental non-zero scan count.

import json


def plan_node_types(node):
    # Yield the Node Type of this plan node and, recursively, of its children.
    yield node["Node Type"]
    for child in node.get("Plans", []):
        yield from plan_node_types(child)


def test_hnsw_index_plan(db_conn, sample_query_vector):
    result = db_conn.execute(
        "EXPLAIN (FORMAT JSON) SELECT id FROM documents ORDER BY embedding <=> %s LIMIT 5",
        (sample_query_vector,),
    ).fetchone()
    # psycopg parses json columns into Python objects; fall back to
    # json.loads if the driver hands back a string.
    raw = result[0]
    nodes = list(plan_node_types((raw if isinstance(raw, list) else json.loads(raw))[0]["Plan"]))
    assert "Seq Scan" not in nodes and "Index Scan" in nodes, (
        f"Index is being bypassed — plan nodes: {nodes}"
    )

Both tests belong in the migration test suite, run after every schema migration, not only in production post-deployment checks. Catching a dead index in CI — before the change reaches production — turns the idx_scan signal from a reactive alert into a preventive gate.

Standard PostgreSQL dashboards do not surface vector-specific metrics — recall degradation, ef_search misconfiguration, and operator class mismatches are all invisible to generic monitoring. Explicit alerts are needed for zero-scan indexes after deployments, for recall drift detected by a nightly brute-force comparison, and for latency SLO violations on the similarity search endpoint.
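The nightly brute-force comparison reduces to a small overlap computation. The sketch below assumes the exact top-k is obtained by re-running the similarity query with SET LOCAL enable_indexscan = off inside its own transaction, and the approximate top-k through the index; the function and variable names are illustrative:

```python
def recall_at_k(exact_ids, approx_ids, k):
    """Fraction of the exact (brute-force) top-k that the indexed search recovers.

    exact_ids:  ids from the query run with SET LOCAL enable_indexscan = off
    approx_ids: ids from the same query run through the HNSW/IVFFlat index
    """
    exact = set(exact_ids[:k])
    if not exact:
        return 1.0  # nothing to recover; treat as perfect recall
    return len(exact & set(approx_ids[:k])) / len(exact)


# Example: the index recovers 9 of the 10 true nearest neighbours.
exact = list(range(10))
approx = [0, 1, 2, 3, 4, 5, 6, 7, 8, 99]
print(recall_at_k(exact, approx, k=10))  # 0.9
```

Alerting on a drop in this ratio over time catches the recall drift described above long before users notice degraded retrieval quality.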

Conclusion

The three failures that started this investigation — a missing vector dimension, an operator class mismatch, and a non-sargable distance expression — share one frustrating property. Nothing breaks. PostgreSQL creates the index without complaint. The query runs and returns results. The application gets similarity scores. Everything looks fine until you run EXPLAIN ANALYZE and see that the index has never been touched.

That is what makes these problems easy to miss. The fix is not complicated, but it requires keeping three things consistent at the same time — the column type, the index operator class, and how the query presents the distance expression to the planner. Change one without checking the others and you trade one silent scan for another.

The same thing plays out as the data grows. Recall quietly drops. The index quietly gets abandoned when ef_search makes it more expensive than a table scan. Filtered queries quietly return nothing when the candidate set is too small. None of it surfaces as an error. In a pgvector deployment, things can be silently wrong for a long time — which is why the monitoring needs to be deliberate and specific, not just inherited from a generic PostgreSQL dashboard.

Footnotes

  1. Boutaga, B. (2026). "pgvector, a guide for DBA — Part 2: Indexes (update March 2026)." dbi-services Blog. https://www.dbi-services.com/blog/pgvector-a-guide-for-dba-part-2-indexes-update-march-2026/

  2. Sarkar, P. (2026). "HNSW at Scale: Why Your RAG System Gets Worse as the Vector Database Grows." Towards Data Science, Jan 7, 2026. https://towardsdatascience.com/hnsw-at-scale-why-your-rag-system-gets-worse-as-the-vector-database-grows/

  3. Li, H. et al. (2026). "AQR-HNSW: Accelerating Approximate Nearest Neighbor Search via Density-aware Quantization and Multi-stage Re-ranking." arXiv:2602.21600 (submitted to DAC 2026). https://arxiv.org/abs/2602.21600

  4. Munyampirwa, B., Lakshman, V., & Coleman, B. (2025). "Down with the Hierarchy: The 'H' in HNSW Stands for Hubs." arXiv:2412.01940. Poster, 1st Workshop on Vector Databases at ICML 2025. https://arxiv.org/abs/2412.01940

  5. pgvector contributors. (2025). "HNSW QPS Degradation as Index Size Grows Beyond Memory." pgvector Issue #700. https://github.com/pgvector/pgvector/issues/700

  6. pgvector contributors. (2026). "pgvector 0.8.2 released." PostgreSQL News, Feb 26, 2026. Fixes CVE-2026-3172 buffer overflow in parallel HNSW index builds. https://www.postgresql.org/about/news/pgvector-082-released-3245/

  7. pgvector contributors. (2024). "UPDATE performance degradation with HNSW index on non-vector columns." pgvector Issue #875. https://github.com/pgvector/pgvector/issues/875

  8. pgvector contributors. (2024). "HNSW Indexing Times Issue." pgvector Issue #430. https://github.com/pgvector/pgvector/issues/430