When Not to Use DB Indexes

Avoid Indexing DB Columns with Highly Repeated Values Indexing is a powerful tool in databases that can dramatically improve query performance when used appropriately. However, not all indices are created equal regarding impact and efficiency. A common mistake developers make is adding indexes to columns that contain highly repeated (low-cardinality) values. In most cases, this wastes resources and can degrade performance. Here's why. Understanding Indexes in PostgreSQL An index in PostgreSQL is essentially a data structure (commonly a B-tree) that allows the database engine to quickly locate rows matching a condition. Indexes shine when they help narrow down the number of rows PostgreSQL needs to scan, especially in large tables. The Problem with Low-Cardinality Columns Low-cardinality columns are those where the number of distinct values is small compared to the total number of rows. For example, a status column with values like active, inactive, or archived, in a table of millions of rows, is low in cardinality. Why indexing them is usually a bad idea: Low Selectivity Indexes are effective when they help filter out large portions of the data. If 90% of your table has status = 'active', an index won't help much — PostgreSQL will likely fall back to a sequential scan because it's cheaper than jumping back and forth between the index and table rows (known as "random I/O"). Extra Overhead on Writes Every time you INSERT, UPDATE, or DELETE a row, PostgreSQL must update any indexes that involve that row. If your low-cardinality column has an index, you're paying the overhead for maintaining it — with little or no performance gain in reads. Bloated Indexes Indexes on repetitive values often result in large, inefficient index structures that waste disk space and can lead to slower maintenance operations like VACUUM or ANALYZE. Query Planner Might Ignore It PostgreSQL is smart. It analyzes data distribution and will avoid using indexes if they don't offer performance benefits. So even if you create an index on a column like is_active, PostgreSQL might not use it in queries unless the distribution changes significantly. When It Might Be Okay There are exceptions. Indexing a low-cardinality column can make sense if: The table is relatively small, and you're optimizing for a very frequent query The column is combined with other, more selective columns in a composite index You're using partial indexes — e.g., CREATE INDEX ON users (id) WHERE status = 'archived' You're optimizing for index-only scans and the column is in a covering index Better Alternatives Instead of blindly indexing low-cardinality columns, consider: Partial indexes: Useful when queries always filter on specific values Multicolumn indexes: More selective when used with additional filtering criteria Materialized views: For precomputed subsets of data Denormalization or restructuring: Sometimes, changing schema design helps more than indexing Conclusion Indexes are a powerful but limited resource. Indexing a column with highly repeated values in PostgreSQL often results in little to no performance gain and unnecessary overhead. Before creating an index, always consider the cardinality of the column and analyze your query patterns. Use tools like EXPLAIN ANALYZE to measure actual query performance and validate whether an index is truly beneficial.

May 1, 2025 - 08:07

Avoid Indexing DB Columns with Highly Repeated Values

Indexing is a powerful tool in databases that can dramatically improve query performance when used appropriately. However, not all indices are created equal regarding impact and efficiency. A common mistake developers make is adding indexes to columns that contain highly repeated (low-cardinality) values. In most cases, this wastes resources and can degrade performance. Here's why.

Understanding Indexes in PostgreSQL

An index in PostgreSQL is essentially a data structure (commonly a B-tree) that allows the database engine to quickly locate rows matching a condition. Indexes shine when they help narrow down the number of rows PostgreSQL needs to scan, especially in large tables.

The Problem with Low-Cardinality Columns

Low-cardinality columns are those where the number of distinct values is small compared to the total number of rows. For example, a status column with values like active, inactive, or archived, in a table of millions of rows, is low in cardinality.

Why indexing them is usually a bad idea:

Low Selectivity

Indexes are effective when they help filter out large portions of the data. If 90% of your table has status = 'active', an index won't help much — PostgreSQL will likely fall back to a sequential scan because it's cheaper than jumping back and forth between the index and table rows (known as "random I/O").

Extra Overhead on Writes

Every time you INSERT, UPDATE, or DELETE a row, PostgreSQL must update any indexes that involve that row. If your low-cardinality column has an index, you're paying the overhead for maintaining it — with little or no performance gain in reads.

Bloated Indexes

Indexes on repetitive values often result in large, inefficient index structures that waste disk space and can lead to slower maintenance operations like VACUUM or ANALYZE.

Query Planner Might Ignore It

PostgreSQL is smart. It analyzes data distribution and will avoid using indexes if they don't offer performance benefits. So even if you create an index on a column like is_active, PostgreSQL might not use it in queries unless the distribution changes significantly.

When It Might Be Okay

There are exceptions. Indexing a low-cardinality column can make sense if:

The table is relatively small, and you're optimizing for a very frequent query
The column is combined with other, more selective columns in a composite index
You're using partial indexes — e.g., CREATE INDEX ON users (id) WHERE status = 'archived'
You're optimizing for index-only scans and the column is in a covering index

Better Alternatives

Instead of blindly indexing low-cardinality columns, consider:

Partial indexes: Useful when queries always filter on specific values
Multicolumn indexes: More selective when used with additional filtering criteria
Materialized views: For precomputed subsets of data
Denormalization or restructuring: Sometimes, changing schema design helps more than indexing

Conclusion

Indexes are a powerful but limited resource. Indexing a column with highly repeated values in PostgreSQL often results in little to no performance gain and unnecessary overhead. Before creating an index, always consider the cardinality of the column and analyze your query patterns. Use tools like EXPLAIN ANALYZE to measure actual query performance and validate whether an index is truly beneficial.