Postgres: Index Scans

Using an index to help improve query performance is common practice. An index is a structure organized in such a way to help make it cheaper to access data records stored on disk. One example of such a structure is the b+tree. In Postgres case, the data stored on disk is referred to as the heap. Having an index on a table does not guarantee that it will be used to retrieve data for a query. It is important to first test and measure the effect of adding an index in order to find out if it helps improve performance for a target query. In the worst case, blindly adding an index to every query performance problem encountered may further degrade performance since indexes do not come cheap as well. There can be cases where the planner might choose a different, non-index, plan to execute a query even though an index exists that matches the query filter and/or the order-by clause. The best tool to use for this is the EXPLAIN command. To demonstrate a trivial scenario: First we create a table and an index postgres=# CREATE TABLE foo(id1 INT, id2 INT, id3 INT, id4 INT, descr TEXT); -- new table postgres=# INSERT INTO foo SELECT i, i3, i+i, i2, 'hello' || i FROM generate_series(1, 10000000) i; -- 10M records postgres=# CREATE INDEX idx_id1_id2_id3 ON foo(id1, id2, id3); Then run EXPLAIN on a query postgres=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM foo WHERE id1 > 1000; The path selected by the planner is shown below QUERY PLAN ------------------------------------------------------------------------------------------------------------------ Seq Scan on foo (cost=0.00..198530.00 rows=9999027 width=28) (actual time=0.283..2260.973 rows=9999000 loops=1) Filter: (id1 > 1000) Rows Removed by Filter: 1000 Buffers: shared hit=16274 read=57256 Planning: Buffers: shared read=4 Planning Time: 0.386 ms Execution Time: 2906.885 ms (8 rows) The query is looking for all the records where id1 field value is greater than 1000. Let's dig into the details of the plan: the planner picks a sequential scan as the cheapest path to executing the query. The sequential scan plan has zero startup cost and total cost of executing the query is estimated to be 198530.00. Number of rows to be fetched is estimated at 9999027 while the real number of rows retrieved is 9999000(very close). A curious mind might wonder why an index scan is ignored when there is an index on id1 field. When using an index, the database needs to read the index structure, then if the data to be returned is not in the index, read the relevant data pages from disk. This means, the database would potentially have to perform 2 disk IOs for each index entry, one in the index file then another in the heap to get the data. This can be really expensive when there are many rows to be returned. It adds on to the cost of an index scan. For a sequential scan it only needs a single disk read if the page is not in the buffer. Often times the index might be entirely in memory if it fits, hence speeding up index lookups. The EXPLAIN command can only show the cheapest plan that was chosen by the planner, out of all the plans that were considered. Meaning if the user needs to investigate why a particular plan was not used, they will be limited in what they can get from the EXPLAIN command since the planner throws away other plans it considered inferior hence such plans are never available during debugging. I made an extension PG_ALL_PLANS to show all plans considered by the planner. Using the extension with our query above postgres=# SELECT * FROM show_all_plans('EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM foo WHERE id1 > 1000'); query_plans ----------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------Plan 1------------------------------- Seq Scan on foo (cost=0.00..198530.00 rows=9999027 width=28) (actual time=0.250..2603.390 rows=9999000 loops=1) Filter: (id1 > 1000) Rows Removed by Filter: 1000 Buffers: shared hit=16182 read=57348 Planning: Buffers: shared hit=47 read=4 dirtied=3 Planning Time: 0.939 ms Execution Time: 3263.912 ms -------------------------------Plan 2------------------------------- Index Scan using idx_id1_id2_id3 on foo (cost=0.43..498815.28 rows=9999027 width=28) (actual time=0.444..3309.395 rows=9999000 loops=1) Index Cond: (id1 > 1000) Buffers: shared hit=2 read=111835 written=8120 Planning: Buffers: shared hit=16229 read=57352 dirtied=3 Planning Time: 0.947 ms Execution Time: 3981.410 ms -------------------------------Plan 3------------------------------- Bitmap Heap Scan on foo (cost=231288.89..429813.47 rows=9999027 width=28) (actual time=1319.351..3678.529 rows=9999000 loops=1) Recheck Cond: (id1 > 1000) Rows Removed by Index Recheck: 35

Apr 6, 2025 - 19:14

Using an index to help improve query performance is common practice. An index is a structure organized in such a way to help make it cheaper to access data records stored on disk. One example of such a structure is the b+tree. In Postgres case, the data stored on disk is referred to as the heap.

Having an index on a table does not guarantee that it will be used to retrieve data for a query. It is important to first test and measure the effect of adding an index in order to find out if it helps improve performance for a target query. In the worst case, blindly adding an index to every query performance problem encountered may further degrade performance since indexes do not come cheap as well. There can be cases where the planner might choose a different, non-index, plan to execute a query even though an index exists that matches the query filter and/or the order-by clause. The best tool to use for this is the EXPLAIN command. To demonstrate a trivial scenario:

First we create a table and an index

postgres=# CREATE TABLE foo(id1 INT, id2 INT, id3 INT, id4 INT, descr TEXT);  -- new table
postgres=# INSERT INTO foo SELECT i, i*3, i+i, i*2, 'hello' || i FROM generate_series(1, 10000000) i; -- 10M records
postgres=# CREATE INDEX idx_id1_id2_id3 ON foo(id1, id2, id3);

Then run EXPLAIN on a query

postgres=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM foo WHERE id1 > 1000;

The path selected by the planner is shown below

                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Seq Scan on foo  (cost=0.00..198530.00 rows=9999027 width=28) (actual time=0.283..2260.973 rows=9999000 loops=1)
   Filter: (id1 > 1000)
   Rows Removed by Filter: 1000
   Buffers: shared hit=16274 read=57256
 Planning:
   Buffers: shared read=4
 Planning Time: 0.386 ms
 Execution Time: 2906.885 ms
(8 rows)

The query is looking for all the records where id1 field value is greater than 1000.

Let's dig into the details of the plan: the planner picks a sequential scan as the cheapest path to executing the query. The sequential scan plan has zero startup cost and total cost of executing the query is estimated to be 198530.00. Number of rows to be fetched is estimated at 9999027 while the real number of rows retrieved is 9999000(very close). A curious mind might wonder why an index scan is ignored when there is an index on id1 field. When using an index, the database needs to read the index structure, then if the data to be returned is not in the index, read the relevant data pages from disk. This means, the database would potentially have to perform 2 disk IOs for each index entry, one in the index file then another in the heap to get the data. This can be really expensive when there are many rows to be returned. It adds on to the cost of an index scan. For a sequential scan it only needs a single disk read if the page is not in the buffer. Often times the index might be entirely in memory if it fits, hence speeding up index lookups.

The EXPLAIN command can only show the cheapest plan that was chosen by the planner, out of all the plans that were considered. Meaning if the user needs to investigate why a particular plan was not used, they will be limited in what they can get from the EXPLAIN command since the planner throws away other plans it considered inferior hence such plans are never available during debugging. I made an extension PG_ALL_PLANS to show all plans considered by the planner. Using the extension with our query above

postgres=# SELECT * FROM show_all_plans('EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM foo WHERE id1 > 1000');
                                                                  query_plans
-----------------------------------------------------------------------------------------------------------------------------------------------
 -------------------------------Plan 1-------------------------------
 Seq Scan on foo  (cost=0.00..198530.00 rows=9999027 width=28) (actual time=0.250..2603.390 rows=9999000 loops=1)
   Filter: (id1 > 1000)
   Rows Removed by Filter: 1000
   Buffers: shared hit=16182 read=57348
 Planning:
   Buffers: shared hit=47 read=4 dirtied=3
 Planning Time: 0.939 ms
 Execution Time: 3263.912 ms

 -------------------------------Plan 2-------------------------------
 Index Scan using idx_id1_id2_id3 on foo  (cost=0.43..498815.28 rows=9999027 width=28) (actual time=0.444..3309.395 rows=9999000 loops=1)
   Index Cond: (id1 > 1000)
   Buffers: shared hit=2 read=111835 written=8120
 Planning:
   Buffers: shared hit=16229 read=57352 dirtied=3
 Planning Time: 0.947 ms
 Execution Time: 3981.410 ms

 -------------------------------Plan 3-------------------------------
 Bitmap Heap Scan on foo  (cost=231288.89..429813.47 rows=9999027 width=28) (actual time=1319.351..3678.529 rows=9999000 loops=1)
   Recheck Cond: (id1 > 1000)
   Rows Removed by Index Recheck: 35
   Heap Blocks: exact=40497 lossy=33026
   Buffers: shared read=111837
   ->  Bitmap Index Scan on idx_id1_id2_id3  (cost=0.00..228789.14 rows=9999027 width=0) (actual time=1301.783..1301.783 rows=9999000 loops=1)
         Index Cond: (id1 > 1000)
         Buffers: shared read=38314
 Planning:
   Buffers: shared hit=16231 read=169190 dirtied=3 written=8120
 Planning Time: 0.951 ms
 Execution Time: 4338.259 ms

 -------------------------------Plan 4-------------------------------
 Gather  (cost=1000.00..1126516.03 rows=9999027 width=28) (actual time=0.524..1658.062 rows=9999000 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=16228 read=57302
   ->  Parallel Seq Scan on foo  (cost=0.00..125613.33 rows=4166261 width=28) (actual time=0.118..917.219 rows=3333000 loops=3)
         Filter: (id1 > 1000)
         Rows Removed by Filter: 333
         Buffers: shared hit=16228 read=57302
 Planning:
   Buffers: shared hit=16231 read=281027 dirtied=3 written=8120
 Planning Time: 0.952 ms
 Execution Time: 2434.354 ms

(47 rows)

There are 4 plans displayed now in addition to the plain sequential scan plan. The additional plans are thrown away by the planner. From this, we can see that index scan is more expensive, has a higher startup cost(0.43) and total cost(498815.28) for retrieving all the tuples, than the sequential scan. One interesting part is that the Bitmap Heap Scan has a lower total cost(429813.470 than the index scan but is considered inferior due to its higher startup cost(231288.89). The extension can be a great tool to dig deeper into the choices made by the planner when executing a query.

We can attempt to persuade the planner to pick the index scan plan by limiting the number of rows returned between 1000 to 1.5M. This helps reduce the selectivity of the plan thus only a small number of rows are returned.

postgres=# SELECT * FROM show_all_plans('EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM foo WHERE id1 > 1000 AND id1 < 1500000');
                                                                query_plans
--------------------------------------------------------------------------------------------------------------------------------------------
 -------------------------------Plan 1-------------------------------
 Index Scan using idx_id1_id2_id3 on foo  (cost=0.43..187892.55 rows=1498365 width=28) (actual time=0.023..548.997 rows=1498999 loops=1)
   Index Cond: ((id1 > 1000) AND (id1 < 1500000))
   Buffers: shared hit=2 read=16768
 Planning:
   Buffers: shared read=4
 Planning Time: 0.170 ms
 Execution Time: 651.542 ms

 -------------------------------Plan 2-------------------------------
 Seq Scan on foo  (cost=0.00..223530.00 rows=1498365 width=28) (actual time=0.204..2333.970 rows=1498999 loops=1)
   Filter: ((id1 > 1000) AND (id1 < 1500000))
   Rows Removed by Filter: 8501001
   Buffers: shared hit=10696 read=62834
 Planning:
   Buffers: shared hit=2 read=16772
 Planning Time: 0.174 ms
 Execution Time: 2432.406 ms

 -------------------------------Plan 3-------------------------------
 Bitmap Heap Scan on foo  (cost=38406.68..205106.49 rows=1498365 width=28) (actual time=236.667..539.696 rows=1498999 loops=1)
   Recheck Cond: ((id1 > 1000) AND (id1 < 1500000))
   Heap Blocks: exact=11023
   Buffers: shared hit=171 read=16599
   ->  Bitmap Index Scan on idx_id1_id2_id3  (cost=0.00..38032.08 rows=1498365 width=0) (actual time=233.444..233.445 rows=1498999 loops=1)
         Index Cond: ((id1 > 1000) AND (id1 < 1500000))
         Buffers: shared hit=171 read=5576
 Planning:
   Buffers: shared hit=10698 read=79606
 Planning Time: 0.182 ms
 Execution Time: 638.010 ms

 -------------------------------Plan 4-------------------------------
 Gather  (cost=1000.00..286866.50 rows=1498365 width=28) (actual time=0.518..2226.432 rows=1498999 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=11023 read=62507
   ->  Parallel Seq Scan on foo  (cost=0.00..136030.00 rows=624319 width=28) (actual time=0.093..805.573 rows=499666 loops=3)
         Filter: ((id1 > 1000) AND (id1 < 1500000))
         Rows Removed by Filter: 2833667
         Buffers: shared hit=11023 read=62507
 Planning:
   Buffers: shared hit=10869 read=96205
 Planning Time: 0.182 ms
 Execution Time: 2349.584 ms

(46 rows)

Now the planner chooses the index scan as it has a lower total cost(187892.55) than the sequential plan(223530.00). The index scan potentially doing 2 IOs per entry still manages to be cheaper than the sequential scan. It does this majorly due to the reduced number of rows it has to scan(1498365 compared to 9999027 rows) hence reduced number of page reads. Also, the sequential scan reads a large number of rows which it end ups throwing away when filtering the results(8501001 rows).

It is good practice to log and analyze slow queries to figure out the performance bottlenecks before adding an index or any other techniques. Always check your indexes if they are being used at all. Indexes occupy disk space and can degrade performance especially for write intensive workloads, so it would make more sense to only add an index to a relation if it is used in the execution of the relevant query.