Elasticsearch vs PostgreSQL Full-Text Search: Which One to Use? ================================================================
Full-text search is one of the most common and complex features in modern applications. When designing search functionality you must weigh trade-offs among relevance, features, scalability, operational complexity, cost, and consistency. Two frequently compared options are:
- Elasticsearch (distributed search engine built on Lucene)
- PostgreSQL Full-Text Search (FTS) — built-in capabilities in a general-purpose RDBMS
This article is a comprehensive, practical deep dive that covers history, underlying theory, architecture, features, practical examples, tuning, scaling, integration patterns, operational considerations, and decision guidance for choosing between Elasticsearch and PostgreSQL FTS — including modern concerns like embeddings and hybrid search.
Table of contents
- Quick summary / TL;DR
- Historical context
- Core concepts and theoretical foundations
- Inverted index and tokenization
- Stemming, stopwords, synonyms
- Ranking (BM25, tf-idf, ts_rank)
- Fuzzy matching, n-grams, trigram similarity
- Vector search and hybrid retrieval
- Elasticsearch: overview, architecture, features
- Sample index mapping and queries
- Relevance, scoring, and tuning
- Advanced features: aggregations, suggesters, percolator, highlighting
- Scaling, HA, ops considerations
- PostgreSQL Full-Text Search: overview, features
- tsvector/tsquery, dictionaries, stemming
- Index types (GIN/GiST) and trigram (pg_trgm)
- Sample schemas and queries
- Relevance tuning and limitations
- Scaling, HA, ops considerations
- Feature-by-feature comparison
- Typical use cases and recommendations
- Integration patterns and sync strategies (Postgres ↔ Elasticsearch)
- Embeddings, vector search, and hybrid approaches
- Benchmarks and performance considerations (caveats)
- Security, licensing, and ecosystem
- Decision checklist
- Example migration / starting templates
- Conclusion
Quick summary / TL;DR
- Use PostgreSQL FTS if:
- Search needs are moderate (site search, internal tools).
- You want transactional consistency with your DB.
- You prefer to minimize operational overhead (no separate search cluster).
- Your dataset and query volumes are not extremely large or highly concurrent.
- You need ACID guarantees and simpler operational model.
- Use Elasticsearch if:
- You need rich search features (complex relevance tuning, multi-field analyzers, advanced aggregations, custom scoring).
- You must support large scale, high throughput, and multi-GB/TB corpora.
- You need distributed search and high read concurrency.
- You want advanced features like suggesters, per-field analyzers, and mature aggregation dashboards.
- Consider a hybrid approach when:
- You want a single source of truth in Postgres but need advanced search features and scale; replicate data to Elasticsearch with near-real-time sync (Debezium / logical decoding / dual-write / periodic batch).
Historical context
- Postgres FTS: built-in support evolved through the 2000s and 2010s. PostgreSQL includes the tsvector/tsquery types, dictionaries (stemming, stop words), and GIN/GiST indexes. It is excellent for many internal or moderate search needs and supports extensions like pg_trgm (trigram) for fuzzy matching and pgvector for embeddings.
- Elasticsearch: launched in 2010, built on Apache Lucene. Designed from the ground up as a distributed search and analytics engine. Evolved to provide a complete toolset for large-scale search, analytics, and log data (ELK stack). Since 2021 elastic.co changed the license away from Apache 2; OpenSearch (AWS fork) continued under Apache 2.0.
Core concepts and theoretical foundations
Inverted index and tokenization
- Inverted index: maps terms → posting lists (documents and positional info). Both Postgres FTS and ES build inverted indices.
- Tokenization: breaking input text into tokens (terms). Tokenization options strongly affect recall/precision and are configurable in both systems.
Stemming, stopwords, synonyms
- Stemming reduces words to a base form (e.g., "running" → "run"). Postgres uses Snowball stemmers via text search dictionaries. Elasticsearch provides analyzers with tokenizers, char filters, and token filters (including stemmers and synonym filters).
- Stopwords remove extremely common words. Synonym filters provide mapping of terms for recall enhancement.
Ranking: BM25, tf-idf, ts_rank
- Elasticsearch (Lucene) uses BM25 by default; customizable with functionscore, scriptscore, rescoring.
- PostgreSQL provides tsrank / tsrank_cd for ranking; the algorithm is flexible but less feature-rich than BM25 out-of-the-box. Extensions / custom weighting are possible.
Fuzzy matching, n-grams, trigram similarity
- Fuzzy search (edit distance) is supported by ES via fuzzy queries (Levenshtein). ES also supports n-gram analyzers for prefix/middle matching.
- PostgreSQL offers pg_trgm for trigram-based similarity and indexes using GIN/GiST. It supports similarity operators and ILIKE but higher-latency fuzzy search on large datasets.
Vector search and hybrid retrieval
- Newer requirement: semantic search with embeddings. Both ecosystems have solutions:
- Elasticsearch: dense_vector + kNN (HNSW) and hybrid rank combining BM25 + vector similarity.
- Postgres: pgvector extension with approximate NN via ivfflat or other indexes; integration with Postgres for hybrid queries is possible but often less mature performance-wise.
Elasticsearch: overview, architecture, features
Overview
- Distributed search engine built on Lucene.
- Index → shard (primary + replicas) → segment architecture.
- Near real-time (NRT) indexing with refresh intervals (default 1s).
- Powerful query DSL, rich aggregations, suggesters, analyzers.
Core features
- Highly configurable analyzers (tokenizers + filters).
- Default BM25 scoring; support for custom scoring.
- Aggregations (facets / analytics), highlighting, term vectors, percolator (match queries against stored queries), search templates.
- Ingest pipelines and processors for preprocessing documents.
- Multi-field mappings (indexed differently for search, sorting, aggregation).
- Built-in support for geo, nested objects, parent-child relations, and now vector types.
Sample index creation and queries (Elasticsearch)
- Create index with custom analyzer:
`` PUT /products { "settings": { "analysis": { "filter": { "englishstop": { "type": "stop", "stopwords": "english" }, "englishstemmer": { "type": "stemmer", "language": "english" } }, "analyzer": { "myenglish": { "tokenizer": "standard", "filter": ["lowercase", "englishstop", "englishstemmer"] } } } }, "mappings": { "properties": { "title": { "type": "text", "analyzer": "myenglish" }, "description": { "type": "text", "analyzer": "my_english" }, "price": { "type": "double" }, "tags": { "type": "keyword" } } } } ``
- Search with multi_match and boosting:
`` GET /products/search { "query": { "bool": { "must": { "multimatch": { "query": "wireless headphones", "fields": ["title^3", "description"] } }, "filter": { "term": { "tags": "electronics" } } } }, "highlight": { "fields": { "description": {} } } } ``
Relevance and tuning
- Use analyzers per field, boost fields, field norms, function_score for business metrics, custom scripts, or rescoring for precision on top-K.
- Relevance tuning features are richer and more flexible than Postgres FTS.
Advanced features
- Aggregations: perform metrics, buckets, histograms efficiently (used in analytics and faceted navigation).
- Suggesters (term, phrase, completion) for type-ahead and misspells.
- Percolator: store queries and match incoming documents (useful for alerting).
- Highlighting: contextual snippets.
Scaling and ops
- Horizontal scaling via shards and replication.
- Needs cluster management (master/data/ingest nodes), memory tuning (heap), JVM GC tuning, monitoring.
- Managed offerings: Elastic Cloud, Amazon OpenSearch Service.
- Operational complexity is higher (overhead, upgrades, cluster rebalancing).
PostgreSQL Full-Text Search: overview, features
Overview
- Postgres includes powerful FTS primitives: tsvector (indexed document), tsquery (query), dictionaries (stemming), and GIN/GiST indexes.
- Extensions: pg_trgm for trigram matching, pgvector for vectors, fuzzystrmatch for metaphone/damerau-levenshtein.
- Use when you already have Postgres and need integrated search without external systems.
Core FTS types and operations
- tsvector: normalized searchable text with lexemes and positions.
- tsquery: search query language (phrasal and boolean operators).
- totsvector, totsquery, plainto_tsquery: utilities for parsing and normalization.
- Ranking: tsrank, tsrank_cd allow weighting per-document or per-field.
Sample schema, index, and query (Postgres)
``` -- table CREATE TABLE articles ( id serial PRIMARY KEY, title text, body text, created_at timestamptz DEFAULT now() );
-- add a tsvector column for fast search ALTER TABLE articles ADD COLUMN document_tsv tsvector;
-- populate via trigger CREATE FUNCTION articlestsvtrigger() RETURNS trigger AS $$ begin new.documenttsv := setweight(totsvector('english', coalesce(new.title,'')), 'A') || setweight(to_tsvector('english', coalesce(new.body,'')), 'B'); return new; end $$ LANGUAGE plpgsql;
CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE ON articles FOR EACH ROW EXECUTE FUNCTION articlestsvtrigger();
-- GIN index CREATE INDEX idxarticlesdocumenttsv ON articles USING GIN (documenttsv);
-- Query SELECT id, tsrank(documenttsv, plaintotsquery('english', 'wireless headphones')) AS rank FROM articles WHERE documenttsv @@ plainto_tsquery('english', 'wireless headphones') ORDER BY rank DESC LIMIT 20; ```
Fuzzy and similarity
- pg_trgm provides % operator and similarity() for fuzzy matches, supports GIN/GiST indexes with fast lookups.
- ILIKE and regex are less performant without indexes.
Relevance tuning and limitations
- Field weighting via setweight and ts_rank parameters.
- Less flexible analyzer pipeline vs ES (no custom token filters pipeline, though you can configure text search dictionaries).
- Highlighting is available via contrib functions or lexeme-based extraction but not as feature-rich as ES highlighting.
Scaling and ops
- Postgres is transactional, ACID, and simpler operationally if it's already used.
- Scaling reads via replicas; write scaling requires sharding or partitioning (Citus, pg_shard).
- High concurrency large-scale search can push Postgres beyond its sweet spot; backups and maintenance are simpler than ES in many setups.
Feature-by-feature comparison
- Core relevance:
- ES: BM25 default, flexible scoring, scripting.
- PG: ts_rank-based scoring, configurable weights.
- Analyzers and tokenization:
- ES: highly configurable analyzers per field (char filters, tokenizers, token filters).
- PG: built-in text search configurations and dictionaries (language-based). Less granular.
- Fuzzy search:
- ES: fuzzy queries + n-grams.
- PG: pg_trgm similarity; requires extensions.
- Highlighting:
- ES: strong built-in support.
- PG: requires custom logic or third-party functions; less polished.
- Aggregations / analytics:
- ES: excellent (fast bucketing, histograms).
- PG: possible via SQL but not optimized for faceted analytics at search scale.
- Nested objects and complex documents:
- ES: native nested type, ...