A better way to search arrays in Postgres
When using arrays in Postgres often time you would have a need to filter based on an element in the array. This can be done using a simple gin index on the column. Example Assume resources Expense, Tag, and Org. 5M Expenses 10 Orgs 2K Tags create table expenses( id serial primary key, org_id integer not null, tag_ids integer[] not null ); insert into expenses select id, 1+id%9 org_id, gi.user_ids from generate_series(1, 5000000) id join lateral ( select i%5000000 pid, array_agg(round(random() * 2000)) user_ids from generate_series(1, 5.5*5000000) i group by i%5000000 ) gi on pid=id ; Roughly every expense can have 5 tags. We need to be able to list all expenses in an org with a certain tag. The org_id filter in most cases comes from access layers where a user with a token can access the data of one org only for which the session is active. select * from expenses ex where ex.org_id=1 and ex.tag_ids @> '{7}' ; Note: @> is 'contains' operator Postgresql: functions-array No Indexes: Execution Time: 1682.684 ms Index on org_id: create index on expenses(org_id); Execution Time: 228.352 ms Improvement of 7.3x Can we do better? Enter Gin Index on tag_ids: create index on expenses(org_id); create index on expenses using gin (tag_ids); Execution Time: 43.473 ms Improvement of another 5.2x Can we do even better? Enter Partial Index: create index on expenses using gin (org_id, tag_ids); ERROR: data type integer has no default operator class for access method "gin" Enter btree_gin extension: create extension btree_gin; create index on expenses using gin (org_id, tag_ids); Execution Time: 7.293 ms Improvement of another 5.9x Summary: Time: 1682.68 -> 228.35 -> 43.47 -> 7.29ms Improvements: total: 230x from just org_id index: 31x Postgres provides you with an extension out of the box called 'btree_gin'. btree_gin provides sample GIN operator classes that implement B-tree equivalent behavior for simple non-composite data types. References: Postgresql: btree-gin Postgresql: gin-intro

When using arrays in Postgres often time you would have a need to filter based on an element in the array. This can be done using a simple gin index on the column.
Example
Assume resources Expense, Tag, and Org.
- 5M Expenses
- 10 Orgs
- 2K Tags
create table expenses(
id serial primary key,
org_id integer not null,
tag_ids integer[] not null
);
insert into expenses
select
id,
1+id%9 org_id,
gi.user_ids
from
generate_series(1, 5000000) id
join lateral (
select
i%5000000 pid,
array_agg(round(random() * 2000)) user_ids
from generate_series(1, 5.5*5000000) i
group by i%5000000
) gi on pid=id
;
Roughly every expense can have 5 tags. We need to be able to list all expenses in an org with a certain tag.
The org_id filter in most cases comes from access layers where a user with a token can access the data of one org only for which the session is active.
select
*
from
expenses ex
where
ex.org_id=1
and ex.tag_ids @> '{7}'
;
Note: @> is 'contains' operator
Postgresql: functions-array
No Indexes:
Execution Time: 1682.684 ms
Index on org_id:
create index on expenses(org_id);
Execution Time: 228.352 ms
Improvement of 7.3x
Can we do better? Enter Gin Index on tag_ids:
create index on expenses(org_id);
create index on expenses using gin (tag_ids);
Execution Time: 43.473 ms
Improvement of another 5.2x
Can we do even better? Enter Partial Index:
create index on expenses using gin (org_id, tag_ids);
ERROR: data type integer has no default operator class for access method "gin"
Enter btree_gin extension:
create extension btree_gin;
create index on expenses using gin (org_id, tag_ids);
Execution Time: 7.293 ms
Improvement of another 5.9x
Summary:
Time: 1682.68 -> 228.35 -> 43.47 -> 7.29ms
Improvements:
- total: 230x
- from just org_id index: 31x
Postgres provides you with an extension out of the box called 'btree_gin'.
btree_gin provides sample GIN operator classes that implement B-tree equivalent behavior for simple non-composite data types.
References: