PostgreSQL JSONB Indexing Limitations with B-Tree and GIN
In a previous post, I highlighted a significant limitation of relational databases: a One-to-Many relationship requires two tables, yet no index can optimize access with selective predicates on both of them, resulting in unnecessary rows that must be joined before elimination. A comment from @dogacel suggested continuing with PostgreSQL's JSON approach, so here it is. SQL databases have evolved into general-purpose or converged databases capable of storing documents, like JSONB in PostgreSQL. However, they face challenges with indexing compared to document databases like MongoDB, which offer multi-key indexes for optimizing equality, sort, and range filtering. Let's explore JSONB documents and their indexing options. Instead of an Order Details table, I embed the details into the Orders table: create table orders( primary key(id) , id bigserial , country_id int , created_at timestamptz default clock_timestamp() , details jsonb ); I insert one million orders similar to the previous post: insert into orders (country_id, details) select (10 * random())::int as country_id, -- generate a random country_id ( select jsonb_agg( jsonb_build_object( 'line', gs, 'product_id', ((log(2, (1 + o.id * random())::numeric)::int)), 'quantity', (100 * random())::int ) ) from generate_series(1, 10) as gs -- create 10 lines per order ) as details from generate_series(1, 1000000) as o(id); -- generate 1,000,000 orders Here is an example of Order with its details: postgres=# select * from orders limit 1; -[ RECORD 1 ]-------------------------------------------------------- id | 1 country_id | 8 created_at | 2025-04-15 19:12:27.525731+00 details | [ {"line": 1, "quantity": 6, "product_id": 0}, {"line": 2, "quantity": 91, "product_id": 1}, {"line": 3, "quantity": 66, "product_id": 1}, {"line": 4, "quantity": 100, "product_id": 1}, {"line": 5, "quantity": 15, "product_id": 0}, {"line": 6, "quantity": 80, "product_id": 0}, {"line": 7, "quantity": 96, "product_id": 1}, {"line": 8, "quantity": 9, "product_id": 0}, {"line": 9, "quantity": 89, "product_id": 0}, {"line": 10, "quantity": 14, "product_id": 0} ] I want to run a simple query to retrieve the last ten orders from a specific country that contain a specific product: SELECT country_id, details FROM orders WHERE country_id = 1 AND details @> '[{"product_id": 15}]' ORDER BY created_at DESC LIMIT 10 ; Be aware of filters at various levels: on the order for country and date, and on the details for the product. With a normalized model and two tables, it was impossible to have one index covering all those filters. However, using MongoDB made it straightforward to create an index on "country_id", "order_details.product_id", and "created_at", resulting in only ten documents being read. Expression index on scalar values The normalized columns in the row can easily be indexed with a B-Tree: CREATE INDEX orders1 ON orders ("country_id", "created_at" desc) ; A similar B-Tree index can be created on the JSONB attributes as long as they are top-level, or sub-documents without repeating groups (without JSON array). However, it is impossible to create such index when the JSON path has an array because in PostgreSQL a B-Tree index cannot have multiple index key for the same row. Of course, it is possible to create any index, as there's no schema declared for the JSONB content: CREATE INDEX orders0 ON orders ( (("details"->>'product_id')::int) ) ; However, such index will never find a document where "product_id" is an array as it indexes a scalar expression. A query on the content of the array cannot use the index: postgres=# set enable_seqscan=off; SET postgres=# EXPLAIN (ANALYZE, COSTS OFF, BUFFERS, SUMMARY OFF) SELECT * from orders WHERE details @> '[{"product_id": 15}]' ; QUERY PLAN --------------------------------------------------------------------- Seq Scan on orders (actual time=38.631..1293.388 rows=446355 loops=1) Filter: (details @> '[{"product_id": 15}]'::jsonb) Rows Removed by Filter: 553645 Buffers: shared hit=16128 read=94984 Generalized Inverted Index (GIN) for arrays In order to index though an array, we need an inverted index and PostgreSQL has GIN: CREATE INDEX orders2 ON orders using GIN ( details ) ; This index is used for the predicate on the array: postgres=# set enable_seqscan=off; SET postgres=# EXPLAIN (ANALYZE, COSTS OFF, BUFFERS, SUMMARY OFF) SELECT * from orders WHERE details @> '[{"product_id": 15}]' ; QUERY PLAN ---------------------------------------------------

In a previous post, I highlighted a significant limitation of relational databases: a One-to-Many relationship requires two tables, yet no index can optimize access with selective predicates on both of them, resulting in unnecessary rows that must be joined before elimination. A comment from @dogacel suggested continuing with PostgreSQL's JSON approach, so here it is.
SQL databases have evolved into general-purpose or converged databases capable of storing documents, like JSONB in PostgreSQL. However, they face challenges with indexing compared to document databases like MongoDB, which offer multi-key indexes for optimizing equality, sort, and range filtering. Let's explore JSONB documents and their indexing options.
Instead of an Order Details table, I embed the details into the Orders table:
create table orders(
primary key(id)
, id bigserial
, country_id int
, created_at timestamptz default clock_timestamp()
, details jsonb
);
I insert one million orders similar to the previous post:
insert into orders (country_id, details)
select
(10 * random())::int as country_id, -- generate a random country_id
(
select jsonb_agg(
jsonb_build_object(
'line', gs,
'product_id', ((log(2, (1 + o.id * random())::numeric)::int)),
'quantity', (100 * random())::int
)
)
from generate_series(1, 10) as gs -- create 10 lines per order
) as details
from generate_series(1, 1000000) as o(id); -- generate 1,000,000 orders
Here is an example of Order with its details:
postgres=# select * from orders limit 1;
-[ RECORD 1 ]--------------------------------------------------------
id | 1
country_id | 8
created_at | 2025-04-15 19:12:27.525731+00
details | [
{"line": 1, "quantity": 6, "product_id": 0},
{"line": 2, "quantity": 91, "product_id": 1},
{"line": 3, "quantity": 66, "product_id": 1},
{"line": 4, "quantity": 100, "product_id": 1},
{"line": 5, "quantity": 15, "product_id": 0},
{"line": 6, "quantity": 80, "product_id": 0},
{"line": 7, "quantity": 96, "product_id": 1},
{"line": 8, "quantity": 9, "product_id": 0},
{"line": 9, "quantity": 89, "product_id": 0},
{"line": 10, "quantity": 14, "product_id": 0}
]
I want to run a simple query to retrieve the last ten orders from a specific country that contain a specific product:
SELECT country_id, details
FROM orders
WHERE country_id = 1
AND details @> '[{"product_id": 15}]'
ORDER BY created_at DESC
LIMIT 10
;
Be aware of filters at various levels: on the order for country and date, and on the details for the product. With a normalized model and two tables, it was impossible to have one index covering all those filters. However, using MongoDB made it straightforward to create an index on "country_id", "order_details.product_id", and "created_at", resulting in only ten documents being read.
Expression index on scalar values
The normalized columns in the row can easily be indexed with a B-Tree:
CREATE INDEX orders1 ON orders ("country_id", "created_at" desc)
;
A similar B-Tree index can be created on the JSONB attributes as long as they are top-level, or sub-documents without repeating groups (without JSON array). However, it is impossible to create such index when the JSON path has an array because in PostgreSQL a B-Tree index cannot have multiple index key for the same row.
Of course, it is possible to create any index, as there's no schema declared for the JSONB content:
CREATE INDEX orders0 ON orders ( (("details"->>'product_id')::int) )
;
However, such index will never find a document where "product_id" is an array as it indexes a scalar expression.
A query on the content of the array cannot use the index:
postgres=# set enable_seqscan=off;
SET
postgres=# EXPLAIN (ANALYZE, COSTS OFF, BUFFERS, SUMMARY OFF)
SELECT * from orders WHERE details @> '[{"product_id": 15}]'
;
QUERY PLAN
---------------------------------------------------------------------
Seq Scan on orders (actual time=38.631..1293.388 rows=446355 loops=1)
Filter: (details @> '[{"product_id": 15}]'::jsonb)
Rows Removed by Filter: 553645
Buffers: shared hit=16128 read=94984
Generalized Inverted Index (GIN) for arrays
In order to index though an array, we need an inverted index and PostgreSQL has GIN:
CREATE INDEX orders2 ON orders using GIN ( details )
;
This index is used for the predicate on the array:
postgres=# set enable_seqscan=off;
SET
postgres=# EXPLAIN (ANALYZE, COSTS OFF, BUFFERS, SUMMARY OFF)
SELECT * from orders WHERE details @> '[{"product_id": 15}]'
;
QUERY PLAN
-------------------------------------------------------------------------------------
Bitmap Heap Scan on orders (actual time=95.272..1196.876 rows=446355 loops=1)
Recheck Cond: (details @> '[{"product_id": 15}]'::jsonb)
Rows Removed by Index Recheck: 303886
Heap Blocks: exact=42718 lossy=65930
Buffers: shared hit=371 read=108648 written=3415
-> Bitmap Index Scan on orders2 (actual time=63.730..63.730 rows=499460 loops=1)
Index Cond: (details @> '[{"product_id": 15}]'::jsonb)
Buffers: shared hit=371
I disabled Seq Scan because this predicate alone is not selective enough, returning half of the rows, but it proves that the GIN index can be used.
However, when I run the full query with additional WHERE and ORDER BY LIMIT filters, which are covered by the B-Tree index, the GIN index is no longer used:
postgres=# EXPLAIN (ANALYZE, COSTS OFF, BUFFERS, SUMMARY OFF)
SELECT country_id, details
FROM orders
WHERE country_id = 1
AND details @> '[{"product_id": 15}]'
ORDER BY created_at DESC
LIMIT 10
;
QUERY PLAN
-------------------------------------------------------------------------------------
Limit (actual time=0.036..0.117 rows=10 loops=1)
Buffers: shared hit=32
-> Index Scan using orders1 on orders (actual time=0.035..0.115 rows=10 loops=1)
Index Cond: (country_id = 1)
Filter: (details @> '[{"product_id": 15}]'::jsonb)
Rows Removed by Filter: 38
Buffers: shared hit=32
To understand the reason, I remove the LIMIT clause:
postgres=# EXPLAIN (ANALYZE, COSTS OFF, BUFFERS, SUMMARY OFF)
SELECT country_id, details
FROM orders
WHERE country_id = 1
AND details @> '[{"product_id": 15}]'
ORDER BY created_at DESC
-- LIMIT 10
;
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Gather Merge (actual time=237.895..257.523 rows=44864 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=158 read=61738, temp read=4584 written=4592
-> Sort (actual time=222.683..226.142 rows=14955 loops=3)
Sort Key: created_at DESC
Sort Method: external merge Disk: 12688kB
Buffers: shared hit=158 read=61738, temp read=4584 written=4592
Worker 0: Sort Method: external merge Disk: 11616kB
Worker 1: Sort Method: external merge Disk: 12368kB
-> Parallel Bitmap Heap Scan on orders (actual time=74.822..211.433 rows=14955 loops=3)
Recheck Cond: ((country_id = 1) AND (details @> '[{"product_id": 15}]'::jsonb))
Rows Removed by Index Recheck: 96641
Heap Blocks: exact=9701 lossy=11292
Buffers: shared hit=145 read=61733
-> BitmapAnd (actual time=78.189..78.190 rows=0 loops=1)
Buffers: shared hit=145 read=613
-> Bitmap Index Scan on orders1 (actual time=11.496..11.497 rows=100362 loops=1)
Index Cond: (country_id = 1)
Buffers: shared read=387
-> Bitmap Index Scan on orders2 (actual time=64.445..64.445 rows=499460 loops=1)
Index Cond: (details @> '[{"product_id": 15}]'::jsonb)
Buffers: shared hit=145 read=226
PostgreSQL can utilize both indexes, but it requires a Bitmap Scan to combine them, which does not preserve the B-Tree index order. Consequently, it must perform a Sort operation that reads all rows before returning the first one, making it unsuitable for the LIMIT clause due to excessive row reading.
Composite B-Tree + GIN with extension
With the btree_gin
extension available in contrib, I can create a single index that is supposed to combine the advantages of B-Tree and GIN indexes:
postgres=# CREATE EXTENSION BTREE_GIN;
CREATE EXTENSION
postgres=# CREATE INDEX orders3 ON orders
using GIN (country_id , details, created_at);
CREATE INDEX
Running it without LIMIT shows that it doesn't solve the problem:
postgres=# EXPLAIN (ANALYZE, COSTS OFF, BUFFERS, SUMMARY OFF)
SELECT country_id, details
FROM orders
WHERE country_id = 1
AND details @> '[{"product_id": 15}]'
ORDER BY created_at DESC
-- LIMIT 10
;
QUERY PLAN
---------------------------------------------------------------------------------------------------
Gather Merge (actual time=123.675..143.726 rows=44864 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=237 read=40117, temp read=4585 written=4594
-> Sort (actual time=109.979..113.574 rows=14955 loops=3)
Sort Key: created_at DESC
Sort Method: external merge Disk: 12456kB
Buffers: shared hit=237 read=40117, temp read=4585 written=4594
Worker 0: Sort Method: external merge Disk: 11720kB
Worker 1: Sort Method: external merge Disk: 12504kB
-> Parallel Bitmap Heap Scan on orders (actual time=18.096..98.799 rows=14955 loops=3)
Recheck Cond: ((country_id = 1) AND (details @> '[{"product_id": 15}]'::jsonb))
Rows Removed by Index Recheck: 1760
Heap Blocks: exact=13486
Buffers: shared hit=226 read=40110
-> Bitmap Index Scan on orders3 (actual time=25.197..25.197 rows=50144 loops=1)
Index Cond: ((country_id = 1) AND (details @> '[{"product_id": 15}]'::jsonb))
Buffers: shared hit=226 read=251
Only one index was scanned, with all conditions in its Index Cond, but it is still a Bitmap Scan which doesn't preserve the index key order, because that's how PostgreSQL inverted index works. A GIN index, even with the BTREE_GIN extension, is not an equivalent to MongoDB multi-key indexes which have their index entries ordered.
Even if my new index has all columns, it is not used by the pagination query:
postgres=# EXPLAIN (ANALYZE, COSTS OFF, BUFFERS, SUMMARY OFF)
SELECT country_id, details
FROM orders
WHERE country_id = 1
AND details @> '[{"product_id": 15}]'
ORDER BY created_at DESC
LIMIT 10
;
QUERY PLAN
-------------------------------------------------------------------------------------
Limit (actual time=0.034..0.117 rows=10 loops=1)
Buffers: shared hit=32
-> Index Scan using orders1 on orders (actual time=0.034..0.115 rows=10 loops=1)
Index Cond: (country_id = 1)
Filter: (details @> '[{"product_id": 15}]'::jsonb)
Rows Removed by Filter: 38
Buffers: shared hit=32
If I drop the B-Tree index, the B-Tree GIN is used, but not efficient as it has to read and sort all rows before returning the ten row for the result:
postgres=# drop index orders1;
DROP INDEX
postgres=# EXPLAIN (ANALYZE, COSTS OFF, BUFFERS, SUMMARY OFF)
SELECT country_id, details
FROM orders
WHERE country_id = 1
AND details @> '[{"product_id": 15}]'
ORDER BY created_at DESC
LIMIT 10
;
QUERY PLAN
---------------------------------------------------------------------------------------------------
Limit (actual time=215.701..215.704 rows=10 loops=1)
Buffers: shared hit=229 read=40107 written=5
-> Sort (actual time=215.700..215.701 rows=10 loops=1)
Sort Key: created_at DESC
Sort Method: top-N heapsort Memory: 46kB
Buffers: shared hit=229 read=40107 written=5
-> Bitmap Heap Scan on orders (actual time=31.466..204.149 rows=44864 loops=1)
Recheck Cond: ((country_id = 1) AND (details @> '[{"product_id": 15}]'::jsonb))
Rows Removed by Index Recheck: 5280
Heap Blocks: exact=39859
Buffers: shared hit=229 read=40107 written=5
-> Bitmap Index Scan on orders3 (actual time=24.544..24.544 rows=50144 loops=1)
Index Cond: ((country_id = 1) AND (details @> '[{"product_id": 15}]'::jsonb))
Buffers: shared hit=229 read=248
Seeing all conditions in the Index Cond is not sufficient, especially with GIN indexes because they return false positives that must be rechecked later. Even if you don't have an ORDER BY LIMIT, the scan itself reads more rows than necessary, as reported by Rows Removed by Index Recheck.
Conclusion
PostgreSQL's JSONB with GIN indexing allows for schema-on-read capabilities within a schema-on-write relational model. While it enables document storage, it does not convert PostgreSQL into a document database.
Although GIN indexes efficiently handle equality predicates on array items, they are less effective for range or sorting operations compared to MongoDB's multi-key indexes, which support equality, sorting, and range queries transparently. You may use JSON datatypes to provide more flexibility to the relational model, but SQL databases should be used as relational databases because their engine is optimized for it. Always check the execution plan to avoid surprises.