Solving the OLTP-OLAP Divide: PostgreSQL B-tree and Hash Indexes for Columnar Data

What if you could get 1,185x faster lookups without sacrificing your columnar storage benefits? TimescaleDB's hypercore engine accelerates real-time analytics through its hybrid storage system: a rowstore for rapid transactional data ingestion and a columnstore for fast analytics on compressed storage. Until now, developers faced a trade-off when data moved to the columnstore, PostgreSQL's powerful indexing capabilities were lost, slowing down critical operations like looking up specific financial transactions or updating historical IoT sensor data. TimescaleDB 2.18 solves this problem with Early Access support for PostgreSQL's B-tree and hash indexes in the columnstore, delivering 1,185x faster record retrievals and 224x faster inserts. Most columnar databases simply can't do this. This guide digs into the implementation behind these gains. We'll explore when B-tree indexes make sense for your range operations and constraint enforcement, and when hash indexes deliver better performance for exact-match lookups. What Are B-tree and Hash Indexes, and Why Are They Useful? Indexes are a fundamental part of database performance optimization. PostgreSQL offers multiple index types, For example, the default B-tree, hash, GIN, and BRIN. All implemented as Index Access Methods (IAMs). B-tree and hash indexes are among PostgreSQL’s most widely used index types, each designed for different queries. These specialized indexes power critical operations in high-performance environments like financial analytics, where institutions must process billions of market data points, trades, and transactions in real time. While columnstores excel at storing and analyzing historical data, such as years of price movements, they often struggle with certain query patterns. This is where rowstore indexes—particularly B-tree and Hash indexes—become invaluable. B-tree Indexes Explained B-tree Indexes are the default and most versatile type of index in PostgreSQL. B-tree indexes keep data sorted in a hierarchical structure, making them ideal for queries that involve range lookups (>, 52.5 GROUP BY device_id; device_id | count -----------+------- 68 | 1 258 | 1 192 | 1 276 | 1 114 | 1 227 | 1 153 | 1 210 | 1 266 | 1 165 | 1 296 | 1 144 | 1 93 | 1 285 | 1 221 | 1 167 | 1 14 | 1 123 | 1 152 | 1 206 | 1 230 | 1 136 | 1 256 | 2 1 | 1 (24 rows) We can create a partial B-tree index to make this query faster. A partial B-tree index is a B-tree index that only includes rows satisfying a specific WHERE condition, making it smaller and more efficient for queries that match that condition. Since, in this case, we only care about temperature values above 52.5, we would need to index those values: CREATE INDEX ON readings (temperature) where temperature > 52.5; With the existing columnstore sparse min/max index, the query takes 63.3 ms to execute and only 14 ms with the B-tree index above, so 4.5x faster. How We Implemented B-tree and Hash Indexes in Hypercore’s Columnstore To support indexes on hypercore’s columnstore data, we leveraged PostgreSQL's pluggable architecture for both index access methods (IAMs) and table access methods (TAMs). PostgreSQL's IAM system has long supported multiple index types (B-tree, hash, GIN, BRIN), but the TAM interface—introduced in PostgreSQL 12 for table storage—was the key technology enabling us to better integrate our columnar engine with PostgreSQL, support indexes on columnstore data, and deliver numerous other enhancements. PostgreSQL architecture for table and index access methods (TAM and IAM, respectively). The beauty of this design is its simplicity: TAM hides storage implementation details IAM uses the TAM API to access table data When scanning, the index locates keys then retrieves records using TAM Previously, TimescaleDB used PostgreSQL's heap TAM (optimized for row storage) and TOAST mechanism to store columnar data as compressed arrays. However this created a problem: indexing this compressed data would index only the opaque compressed values, not the actual content. Enter TimescaleDB 2.18. It changes the game by introducing PostgreSQL indexes support in columnstore with the help of a new table access method: hypercore TAM. The new table access method handles decompression automatically and enables PostgreSQL to use its standard interfaces for: Indexing Statistics collection Constraint enforcement Tuple locking This also allows PostgreSQL’s built-in scan nodes, such as sequential and index scans, to operate in the columnstore. Custom scan nodes continue to be used for analytical query performance optimizations, including vectorized filtering and aggregation. TimescaleDB 2.18 Early Acc

Apr 7, 2025 - 20:21
 0
Solving the OLTP-OLAP Divide: PostgreSQL B-tree and Hash Indexes for Columnar Data

What if you could get 1,185x faster lookups without sacrificing your columnar storage benefits?

TimescaleDB's hypercore engine accelerates real-time analytics through its hybrid storage system: a rowstore for rapid transactional data ingestion and a columnstore for fast analytics on compressed storage.

Until now, developers faced a trade-off when data moved to the columnstore, PostgreSQL's powerful indexing capabilities were lost, slowing down critical operations like looking up specific financial transactions or updating historical IoT sensor data.

TimescaleDB 2.18 solves this problem with Early Access support for PostgreSQL's B-tree and hash indexes in the columnstore, delivering 1,185x faster record retrievals and 224x faster inserts. Most columnar databases simply can't do this.

This guide digs into the implementation behind these gains. We'll explore when B-tree indexes make sense for your range operations and constraint enforcement, and when hash indexes deliver better performance for exact-match lookups.

What Are B-tree and Hash Indexes, and Why Are They Useful?

Indexes are a fundamental part of database performance
optimization
. PostgreSQL offers multiple index types, For example, the default B-tree, hash, GIN, and BRIN. All implemented as Index Access Methods (IAMs). B-tree and hash indexes are among PostgreSQL’s most widely used index types, each designed for different queries.

These specialized indexes power critical operations in high-performance environments like financial analytics, where institutions must process billions of market data points, trades, and transactions in real time. While columnstores excel at storing and analyzing historical data, such as years of price movements, they often struggle with certain query patterns. This is where rowstore indexes—particularly B-tree and Hash indexes—become invaluable.

B-tree Indexes Explained

B-tree Indexes are the default and most versatile type of index in PostgreSQL. B-tree indexes keep data sorted in a hierarchical structure, making them ideal for queries that involve range lookups (>, <, BETWEEN) and equality lookups (=). When a query searches for a specific value or a range of values in an indexed column. The B-tree structure enables the database to quickly traverse the tree and find the relevant records in logarithmic time (O(log n)), significantly improving performance compared to a full table scan.

Here’s how to create a B-tree index:

-- This creates a B-tree index on the product_id column of the products table
-- It will speed up queries that filter or join on product_id
CREATE INDEX index_product_id ON products (product_id);

Key characteristics of B-tree indexes:

  • Default index type in PostgreSQL

  • Supports <, <=, =, >=, >, BETWEEN, IN, IS NULL, IS NOT NULL

  • Organizes entries in ascending order

Hash Indexes Explained

Hash indexes are designed for exact-match lookups (=) and use a hashing function to map values to unique disk locations. When searching for a single value—such as looking up a transaction by its transaction ID—hash indexes can be even faster than B-tree indexes as they don’t require tree traversal and have an amortized constant O(1) lookup. Their limitation is that they don’t support range queries, making them specifically suited for cases with frequently queried, unique keys.

To implement a hash index, use the following:

-- This creates a hash index on the product_id column
-- It optimizes for exact-match lookups of product_id values
CREATE INDEX index_product_id ON products USING HASH (product_id);

Hash indexes have these distinctive properties:

  • Ideal for equality checks, especially for integers

  • Doesn’t support range queries or sorting

The performance advantage of Hash indexes comes from their optimized data structure designed for efficient key searching. This results in fewer disk pages needing to be read, which reduces I/O spikes when locating specific data points or enforcing uniqueness. In some cases, decompression can be avoided altogether.

When to Use B-tree and Hash Indexes

These indexes are helpful, for example, in financial analytics, where firms continuously process vast amounts of market data, trades, and transactions in real time. Using B-tree and hash indexes gives better performance with the same indexes as rowstore, while columnstores efficiently store and analyze years of historical price movements and trading patterns.

B-tree Indexes: You use a B-tree index to check unique constraints or for range-based filtering and shine in scenarios requiring more complex operations. For example, when analysts need to isolate all transactions over $10,000 to assess market liquidity or flag unusual activity, B-tree indexes navigate directly to the relevant data without the performance penalty of full-table scans.

Hash Indexes: When pinpointing a specific trade by its unique ID, hash indexes deliver unmatched performance. By computing a mathematical fingerprint of each value, they enable near-instantaneous retrieval with minimal computational overhead—essential for time-sensitive operations requiring precise lookups.

B-tree and hash indexes are particularly helpful when:

  • You need fast lookups on non-SEGMENTBY keys. For example, querying specific records by UUID

  • Query latency on compressed data is a bottleneck for your application

  • You perform frequent updates to historical data and need efficient uniqueness enforcement

Performance Gains: Hypercore TAM With PostgreSQL Indexes in Action

To demonstrate the benefits, we have generated a 100 million-row dataset to represent an IoT application that tracks room temperature and humidity across multiple buildings. This is a relatively small data set but enough to demonstrate the benefits of using B-tree and hash indexes.

We stored the data in a readings hypertable and enabled the columnstore with and without using the new hypercore table access method to run our benchmarks.

Our benchmarks show substantial performance improvements by adding PostgreSQL’s indexing capabilities to columnstore in TimescaleDB.

  • Point lookups: Retrieving a single record by UUID using a hash index delivered 1,185x faster execution (10.9ms vs. 12,915ms without indexing)

  • Insert operations: When enforcing uniqueness constraints through B-tree indexes, inserts executed 224.3x faster (1,289ms vs. 289,139ms)

  • Upsert operations: Updating existing records using ON CONFLICT clauses performed 2.6x faster (9,520ms vs. 24,805ms)

  • Range queries: Filtering for anomalous readings above threshold values ran 4.5x faster (14ms vs. 63.3ms)

RTABench is a new benchmark we have developed to evaluate databases using query patterns that mirror real-world application workloads—something missing from existing benchmarks.

Point lookups

Indexes are particularly useful for highly selective queries, such as retrieving a unique event by its identifier. For example:

SELECT 
    created_at,
    device_id,
    temperature
FROM readings 
WHERE metric_uuid = 'dd19f5d3-d04b-4afc-a78f-9b231fb29e52';

Without an index, the query requires scanning and filtering the entire dataset, leading to slow execution and high I/O usage. Timescale introduced min/max sparse indexes in version 2.15. They would help when using incremental numeric IDs but would not work well for random numeric values or IDs like UUIDs.

Instead, a hash index on metric_uuid allows for a direct lookup, significantly improving performance by decompressing only the relevant data segment.

CREATE INDEX readings_metric_uuid_hash_idx ON readings USING hash (metric_uuid);

The SELECT query above performed 1,185x faster when using the hypercore TAM with a hash index, coming in at 10.9 ms vs. 12,915 ms.

Image description

_By adding PostgreSQL’s indexing capabilities to our columnstore, point lookup queries are now 1,185 times faster in TimescaleDB.
_

Backfill and updates to historical data

A very common use case in real-time applications is backfilling or updating old data.

If a sensor fails during a batch upload or gets temporarily disconnected, it may resend the data later. If you don’t check if the data already exists in the database before storing it, you could end up with duplicated records.

The backend database can enforce uniqueness using a unique index or primary key, such as , to prevent duplicate entries. In this case, you need to include the hypertable partition key as well, since a UNIQUE constraint on a hypertable must include it. Below, we add a primary key to the table, which automatically creates a B-tree index.

ALTER TABLE readings ADD PRIMARY KEY (device_id, created_at);

An insert statement ensuring no duplicates might look like this:

INSERT INTO readings VALUES (...) ON CONFLICT (device_id, created_at) DO NOTHING;

Primary constraints are enforced through unique indexes, making conflict checks fast. Without an index, verifying uniqueness would require scanning and decompressing potentially large amounts of data, significantly slowing inserts and consuming excessive IOPS.

Our benchmarks showed 224.3x faster inserts, reducing the execution time from 289,139 ms to 1,289 ms.

Image description

TimescaleDB 2.18 now delivers 224.3x faster inserts by adding a B-tree index on compressed data, according to our benchmarks.

Historical data updates typically use upserts in ingestion pipelines—inserting new rows or updating existing ones when conflicts occur. This approach allows efficient re-ingestion of new data versions without separate update statements, a common practice among Timescale Cloud customers.

The query below attempts to insert a new record. If a record for the same metric_uuid and uploaded_at values already exists, it updates the temperature with the corresponding value from the new record.

INSERT INTO readings VALUES (...) ON CONFLICT (device_id, created_at) DO UPDATE SET temperature = EXCLUDED.temperature;

With a primary key index, conflicting rows are directly located within the compressed data segment in the columnstore. Without an index, the system would need to scan and decompress data, considerably slowing ingestion speed.

Our benchmarks showed 2.6x faster upserts, reducing the execution time from 24,805 ms to 9,520 ms.

Image description

Fast anomaly detection

Imagine you need to report at regular intervals how many times each device has exceeded a critical temperature. You can easily do this by counting the times the temperature exceeded and grouping by device ID.

SELECT
    device_id,
    COUNT(temperature)
FROM readings
WHERE temperature > 52.5 
GROUP BY device_id;

device_id | count  
-----------+-------
       68 |     1
      258 |     1
      192 |     1
      276 |     1
      114 |     1
      227 |     1
      153 |     1
      210 |     1
      266 |     1
      165 |     1
      296 |     1
      144 |     1
       93 |     1
      285 |     1
      221 |     1
      167 |     1
       14 |     1
      123 |     1
      152 |     1
      206 |     1
      230 |     1
      136 |     1
      256 |     2
        1 |     1
(24 rows)

We can create a partial B-tree index to make this query faster. A partial B-tree index is a B-tree index that only includes rows satisfying a specific WHERE condition, making it smaller and more efficient for queries that match that condition. Since, in this case, we only care about temperature values above 52.5, we would need to index those values:

CREATE INDEX ON readings (temperature) where temperature > 52.5;

With the existing columnstore sparse min/max index, the query takes 63.3 ms to execute and only 14 ms with the B-tree index above, so 4.5x faster.

Image description

How We Implemented B-tree and Hash Indexes in Hypercore’s Columnstore

To support indexes on hypercore’s columnstore data, we leveraged PostgreSQL's pluggable architecture for both index access methods (IAMs) and table access methods (TAMs).

PostgreSQL's IAM system has long supported multiple index types (B-tree, hash, GIN, BRIN), but the TAM interface—introduced in PostgreSQL 12 for table storage—was the key technology enabling us to better integrate our columnar engine with PostgreSQL, support indexes on columnstore data, and deliver numerous other enhancements.

Image description

PostgreSQL architecture for table and index access methods (TAM and IAM, respectively).

The beauty of this design is its simplicity:

  • TAM hides storage implementation details

  • IAM uses the TAM API to access table data

  • When scanning, the index locates keys then retrieves records using TAM

Previously, TimescaleDB used PostgreSQL's heap TAM (optimized for row storage) and TOAST mechanism to store columnar data as compressed arrays. However this created a problem: indexing this compressed data would index only the opaque compressed values, not the actual content.

Enter TimescaleDB 2.18. It changes the game by introducing PostgreSQL indexes support in columnstore with the help of a new table access method: hypercore TAM. The new table access method handles decompression automatically and enables PostgreSQL to use its standard interfaces for:

  • Indexing
  • Statistics collection
  • Constraint enforcement
  • Tuple locking

This also allows PostgreSQL’s built-in scan nodes, such as sequential and index scans, to operate in the columnstore. Custom scan nodes continue to be used for analytical query performance optimizations, including vectorized filtering and aggregation.

TimescaleDB 2.18 Early Access release supports B-tree and hash indexes, making point lookups, upserts, and unique constraint enforcement more efficient on the columnstore. Read the documentation for more details on how these PostgreSQL indexes work and their recommended use cases.

Conclusion

Real-time analytics applications require more than fast inserts and analytical queries. They also need high performance when retrieving individual records, enforcing constraints, or performing upserts, something that OLAP/columnar databases lack.

TimescaleDB supports and accelerates real-time analytics using hypercore without missing out on important PostgreSQL features, including support for standard PostgreSQL indexes. We call hypercore a “hybrid” storage engine precisely because it supports deep analytics while staying true to PostgreSQL.By adding full support for B-tree and hash indexes on columnstore data, developers can now perform point lookups 1,185x faster, enforce unique constraints with a performance boost of 224.3x faster inserts, and execute upserts 2.6x faster—all while maintaining columnstore compression and analytics performance.

PostgreSQL indexes in hypercore TAM are available in Early Access—check out the documentation to start using it today.

Sign up for a free Timescale Cloud account or self-install TimescaleDB on your machine.