How to Resolve Django DISTINCT ON Order By Error

When working with the Django ORM, you may need to retrieve distinct records, for example one row per value of a column such as sku. You can run into a specific error if you also want to sort by another column, such as id. This article explains why the error arises and how to resolve it.
Understanding the Problem
The error django.db.utils.ProgrammingError: SELECT DISTINCT ON expressions must match initial ORDER BY expressions comes from a PostgreSQL rule about combining DISTINCT ON with an ORDER BY clause. PostgreSQL requires the DISTINCT ON expressions to match the leftmost ORDER BY expressions. A query that uses DISTINCT ON (sku) but orders by id first breaks this rule, so PostgreSQL raises the error.
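To make the rule concrete, here is a minimal sketch in plain Python that models the check for simple column names. The function name distinct_on_is_valid is hypothetical, and this is a simplified model rather than PostgreSQL's actual implementation, which compares full expressions.

```python
def distinct_on_is_valid(distinct_on, order_by):
    """Simplified model of PostgreSQL's DISTINCT ON / ORDER BY rule.

    Both arguments are lists of plain column names (hypothetical inputs).
    """
    distinct = set(distinct_on)
    matched = set()
    skipped = False  # True once ORDER BY reaches a column outside DISTINCT ON
    for column in order_by:
        if column in distinct:
            if skipped:
                # A DISTINCT ON column appears after some other column.
                return False
            matched.add(column)
        else:
            skipped = True
    # If ORDER BY names other columns, every DISTINCT ON column must precede them.
    return not skipped or matched == distinct


print(distinct_on_is_valid(["sku"], ["id"]))         # False: the failing query
print(distinct_on_is_valid(["sku"], ["sku", "id"]))  # True: the corrected ordering
```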
Example Data Structure
Let's assume you have a Django model that looks like this:
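from django.db import models
class Product(models.Model):
    sku = models.CharField(max_length=255)
    # Explicit primary key; Django would add an equivalent id field automatically.
    id = models.AutoField(primary_key=True)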
Your goal is to retrieve one row per distinct sku, together with its id, and to sort the result by id. The following sections show how to restructure the query to do that.
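As a sketch of the intended result, the following plain Python works on a small made-up list of (id, sku) rows; the data and the helper name first_row_per_sku are hypothetical. It keeps the lowest id for each sku, which is the row that DISTINCT ON (sku) with ORDER BY sku, id selects, and then sorts the surviving rows by id.

```python
def first_row_per_sku(rows):
    """Keep the lowest-id row for each sku, then sort the result by id."""
    best = {}
    for row_id, sku in sorted(rows, key=lambda r: (r[1], r[0])):
        # Rows arrive ordered by (sku, id), so the first row seen per sku wins.
        best.setdefault(sku, (row_id, sku))
    return sorted(best.values())


rows = [(1, "B-200"), (2, "A-100"), (3, "B-200"), (4, "A-100"), (5, "C-300")]
print(first_row_per_sku(rows))
# [(1, 'B-200'), (2, 'A-100'), (5, 'C-300')]
```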
Step-by-Step Solution
Step 1: Distinct Query with Correct Ordering
The key to resolving this issue is to make the ORDER BY clause begin with the same column as the DISTINCT ON clause. Start by retrieving one row per distinct sku, ordered by sku first, and then use a subquery if you need to sort the result by id.
Applying a Subquery
Here’s how to do it:
# Product is the model defined above.
# Inner query: one row per sku. The ordering starts with 'sku', so it matches
# DISTINCT ON (sku); the trailing 'id' makes the lowest id win within each sku.
first_per_sku = Product.objects.order_by('sku', 'id').distinct('sku')
# Outer query: re-select those rows by primary key and sort them by id.
queryset = Product.objects.filter(pk__in=first_per_sku.values('pk')).order_by('id')
Explanation of the Query
- Inner Ordering: order_by('sku', 'id') places sku first, so the ORDER BY clause begins with the DISTINCT ON column, as PostgreSQL requires. The trailing id decides which row is kept for each sku (here, the one with the lowest id).
- Distinct Declaration: .distinct('sku') keeps one row per unique sku value.
- Subquery Filter: filter(pk__in=first_per_sku.values('pk')) uses the distinct query as a subquery, so the outer queryset contains exactly the rows it selected.
- Final Ordering: .order_by('id') on the outer queryset sorts the result by id. Because this ordering sits outside the DISTINCT ON query, the matching rule does not apply to it.
Final SQL Query
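Ordering by sku first and then by id makes the DISTINCT ON part of the query translate into SQL roughly like the following. The table name defapp_log is only illustrative; Django derives the real name from your app label and model name.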
SELECT DISTINCT ON ("defapp_log"."sku") "defapp_log".*
FROM "defapp_log"
ORDER BY "defapp_log"."sku", "defapp_log"."id";
Conclusion
By making the ORDER BY clause begin with the DISTINCT ON column, you satisfy PostgreSQL's rule and avoid the ProgrammingError. Sorting the distinct records by id then happens in a separate outer query.
Frequently Asked Questions
What databases support DISTINCT ON syntax?
DISTINCT ON is a PostgreSQL extension, and Django supports distinct() with field names only on PostgreSQL. MySQL and SQLite do not support it, so check your database backend before relying on this feature.
Can I achieve the same result without using DISTINCT ON?
Yes. You can use grouping instead (GROUP BY with an aggregate such as the minimum id), but it may not be as efficient as DISTINCT ON, depending on your data layout.
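A minimal sketch of the grouping alternative, using Python's built-in sqlite3 module so that it runs without PostgreSQL. The table name product and the sample rows are hypothetical. In the Django ORM, a comparable query could be built with values('sku') and a Min('id') annotation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, sku TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO product (id, sku) VALUES (?, ?)",
    [(1, "B-200"), (2, "A-100"), (3, "B-200"), (4, "A-100"), (5, "C-300")],
)

# One row per sku with its lowest id, sorted by that id.
grouped = conn.execute(
    "SELECT sku, MIN(id) AS min_id FROM product GROUP BY sku ORDER BY min_id"
).fetchall()
print(grouped)
# [('B-200', 1), ('A-100', 2), ('C-300', 5)]
```

Unlike DISTINCT ON, this returns only the grouped column and the aggregate, not complete rows.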
Will this work for large data sets?
Combining a subquery with DISTINCT ON can be slow on large tables, so measure query performance as your data grows.