TIL efficient subqueries with EXISTS
Today I was debugging performance issues in one of our reports, and I could not make heads nor tails of it for a long time. I was pretty sure the problem was with one of the more complex scopes where UNION was being used, but having spent a couple hours trying several approaches, I ended up pasting the scope in GPT and asking for tips. This is what I stared out with: scope :special_records, ->(project_id:) do project_config_entries = CompanyProjects::TeklaReports::TotalsConfigEntry.where(project_id: project_id) specials_from_report_file = unscoped.where(special: true).select(:id) specials_by_exact_config_match = unscoped.where(uid: project_config_entries.lookup_exact.select(:lookup_term)).select(:id) specials_by_prefix_config_match = unscoped.joins( "INNER JOIN (#{project_config_entries.lookup_prefix.to_sql}) AS specials_config_entries ON #{table_name}.uid LIKE specials_config_entries.lookup_term || '%'" ).select(:id) union_sql = [ specials_from_report_file.to_sql, specials_by_exact_config_match.to_sql, specials_by_prefix_config_match.to_sql ].join("\nUNION\n") where(id: from("(#{union_sql}) AS singles_from_several_sources").select(:id)) end The tricky part here is that I need to UNION several datasources that could identify a record as "special", and I need (or at least I thought I did) to JOIN using LIKE to get the prefix-based match lookup to work. Turns out an EXISTS pattern can be used both for the JOIN and for the ultimate WHERE id IN, leading to several orders of magnitude improvement - in my test data I went from 30s report generation query time to under 1s! And this is what I ended up with: scope :specials, ->(project_id:) do project_config_entries = CompanyProjects::TeklaReports::TotalsConfigEntry.where(project_id: project_id) specials_from_report_file = unscoped.where(special: true).select(:id) specials_by_exact_config_match = unscoped.where( "EXISTS ( SELECT 1 FROM (#{project_config_entries.lookup_exact.to_sql}) AS specials_config_entries WHERE specials_config_entries.lookup_term = #{table_name}.uid )" ).select(:id) specials_by_prefix_config_match = unscoped.where( "EXISTS ( SELECT 1 FROM (#{project_config_entries.lookup_prefix.to_sql}) AS specials_config_entries WHERE #{table_name}.uid LIKE specials_config_entries.lookup_term || '%' )" ).select(:id) union_sql = [ specials_from_report_file.to_sql, specials_by_exact_config_match.to_sql, specials_by_prefix_config_match.to_sql ].join("\nUNION\n") where( "EXISTS ( SELECT 1 FROM (#{union_sql}) AS specials_from_several_sources WHERE specials_from_several_sources.id = #{table_name}.id )" ) end Notice that replacing WHERE field IN with WHERE EXISTS (SELECT 1 .. is a general pattern and can yield good results wherever the subquery has a lot of records.

Today I was debugging performance issues in one of our reports, and I could not make heads nor tails of it for a long time.
I was pretty sure the problem was with one of the more complex scopes where UNION was being used, but having spent a couple hours trying several approaches, I ended up pasting the scope in GPT and asking for tips.
This is what I stared out with:
scope :special_records, ->(project_id:) do
project_config_entries = CompanyProjects::TeklaReports::TotalsConfigEntry.where(project_id: project_id)
specials_from_report_file = unscoped.where(special: true).select(:id)
specials_by_exact_config_match = unscoped.where(uid: project_config_entries.lookup_exact.select(:lookup_term)).select(:id)
specials_by_prefix_config_match = unscoped.joins(
"INNER JOIN (#{project_config_entries.lookup_prefix.to_sql}) AS specials_config_entries ON #{table_name}.uid LIKE specials_config_entries.lookup_term || '%'"
).select(:id)
union_sql = [
specials_from_report_file.to_sql,
specials_by_exact_config_match.to_sql,
specials_by_prefix_config_match.to_sql
].join("\nUNION\n")
where(id: from("(#{union_sql}) AS singles_from_several_sources").select(:id))
end
The tricky part here is that I need to UNION several datasources that could identify a record as "special", and I need (or at least I thought I did) to JOIN using LIKE to get the prefix-based match lookup to work.
Turns out an EXISTS pattern can be used both for the JOIN and for the ultimate WHERE id IN
, leading to several orders of magnitude improvement - in my test data I went from 30s report generation query time to under 1s!
And this is what I ended up with:
scope :specials, ->(project_id:) do
project_config_entries = CompanyProjects::TeklaReports::TotalsConfigEntry.where(project_id: project_id)
specials_from_report_file = unscoped.where(special: true).select(:id)
specials_by_exact_config_match = unscoped.where(
"EXISTS (
SELECT 1 FROM (#{project_config_entries.lookup_exact.to_sql}) AS specials_config_entries
WHERE specials_config_entries.lookup_term = #{table_name}.uid
)"
).select(:id)
specials_by_prefix_config_match = unscoped.where(
"EXISTS (
SELECT 1 FROM (#{project_config_entries.lookup_prefix.to_sql}) AS specials_config_entries
WHERE #{table_name}.uid LIKE specials_config_entries.lookup_term || '%'
)"
).select(:id)
union_sql = [
specials_from_report_file.to_sql,
specials_by_exact_config_match.to_sql,
specials_by_prefix_config_match.to_sql
].join("\nUNION\n")
where(
"EXISTS (
SELECT 1 FROM (#{union_sql}) AS specials_from_several_sources
WHERE specials_from_several_sources.id = #{table_name}.id
)"
)
end
Notice that replacing WHERE field IN
with WHERE EXISTS (SELECT 1 ..
is a general pattern and can yield good results wherever the subquery has a lot of records.