Foreign Keys: A must in SQL, but not in a Document Database?
In relational SQL databases, foreign keys are essential for guaranteeing referential integrity. There are two main reasons to declare them, particularly regarding their cascading behavior: Enforcing Strong Relationships: SQL databases require entities to be stored in separate tables, even when they share the same lifecycle. This separation is mandated by normalization rules, particularly the first normal form for One-To-Many relationships. Foreign keys establish a dependency between these tables and ensure that changes in one table (such as deletions) are either reflected in the child table (using "ON DELETE CASCADE") or prohibited (with "ON DELETE RESTRICT"). For instance, in a blog application, you want to avoid orphaned comments that are not linked to a blog post. If the blog post is deleted, all associated comments should also be deleted, or the deletion should be declined when comments exist. In addition to consistency, the declaration of the foreign key is necessary for the query planner to be aware of the strong relationship between the two tables, so that it can estimate the result of the join more accurately. Consistency in Join Operations: SQL tables are often joined using inner joins, the default, where unmatched rows are discarded. This can lead to inconsistencies in query results, depending on whether there is a join or not. Foreign keys help mitigate these inconsistencies by ensuring that a valid foreign key reference always matches a corresponding row. For example, an author may be removed due to the GDPR "Right to Erasure" but not if it could make some blog posts invisible. The application must do it differently, assigning an unknown author entry before the deletion. Another possibility would be to always use an outer join in queries, but this limits the possible join optimizations. There are other reasons to enforce foreign keys, but they also involve some application logic and cannot be a simple database declaration, except if you accept business logic in triggers. For example, if you rename a blog category, do you want all blog posts to reflect the new name? Should the old name remain for existing posts, and the new category only apply to new posts? You might want to prevent the removal of a category that is still in use or allow it while removing the category from all posts. If it was the main category of a post, you may need to designate another category as the main one, but not pick a random one. These decisions depend on business reasons for renaming: was it a simple typo, or is the category replaced or being split into two? Such complex scenarios cannot be simply managed by the database through generic ON DELETE or ON UPDATE commands. The business transaction that removes a category would first analyze the consequences and make the necessary changes. Typically, the category is logically deleted, meaning it can no longer be used. However, joins will still show results with the old name until the posts in that category are reassigned. The category can later be physically deleted, when there are no orphaned records. What About MongoDB? The primary difference between a relational data model and a document data model is that the former is designed for a central database without knowing the future use cases, while the latter is tailored to specific domains with known application access patterns. A document model is application-centric, meaning the application itself implements a logic that maintains the integrity, including validating lists of values or existing references before insertion, and safe preparation before deletion or update. In the two cases mentioned above, when referential integrity does not involve additional application logic but is part of the data model, MongoDB does not require foreign keys for enforcement: Enforcing Strong Relationships is inherent in the document model, as entities sharing the same lifecycle are typically embedded within a single document. Since MongoDB can store and index sub-documents without restrictions, the separation of a One-to-Many relationship into different documents is unnecessary. Sub-documents cannot become orphaned because they are owned by their parent document. Consistency in Join Operations is guaranteed through the $lookup operation, which functions similarly to a left outer join. This ensures that records from the driving collection are not removed even if their lookup references do not match. To illustrate other cases, consider a Many-to-One relationship, such as a product catalog in an order-entry system. A key advantage of normalization is that changing a product name requires only a single row update in the reference table, reflecting across all queries that join the product table. However, this can lead to inconsistencies outside the database. For example, a customer may receive an email at order completion or a printed bill displaying the old product name, which could mislead them if a different product n

In relational SQL databases, foreign keys are essential for guaranteeing referential integrity. There are two main reasons to declare them, particularly regarding their cascading behavior:
Enforcing Strong Relationships: SQL databases require entities to be stored in separate tables, even when they share the same lifecycle. This separation is mandated by normalization rules, particularly the first normal form for One-To-Many relationships. Foreign keys establish a dependency between these tables and ensure that changes in one table (such as deletions) are either reflected in the child table (using "ON DELETE CASCADE") or prohibited (with "ON DELETE RESTRICT"). For instance, in a blog application, you want to avoid orphaned comments that are not linked to a blog post. If the blog post is deleted, all associated comments should also be deleted, or the deletion should be declined when comments exist.
In addition to consistency, the declaration of the foreign key is necessary for the query planner to be aware of the strong relationship between the two tables, so that it can estimate the result of the join more accurately.Consistency in Join Operations: SQL tables are often joined using inner joins, the default, where unmatched rows are discarded. This can lead to inconsistencies in query results, depending on whether there is a join or not. Foreign keys help mitigate these inconsistencies by ensuring that a valid foreign key reference always matches a corresponding row. For example, an author may be removed due to the GDPR "Right to Erasure" but not if it could make some blog posts invisible. The application must do it differently, assigning an unknown author entry before the deletion. Another possibility would be to always use an outer join in queries, but this limits the possible join optimizations.
There are other reasons to enforce foreign keys, but they also involve some application logic and cannot be a simple database declaration, except if you accept business logic in triggers. For example, if you rename a blog category, do you want all blog posts to reflect the new name? Should the old name remain for existing posts, and the new category only apply to new posts? You might want to prevent the removal of a category that is still in use or allow it while removing the category from all posts. If it was the main category of a post, you may need to designate another category as the main one, but not pick a random one. These decisions depend on business reasons for renaming: was it a simple typo, or is the category replaced or being split into two?
Such complex scenarios cannot be simply managed by the database through generic ON DELETE or ON UPDATE commands. The business transaction that removes a category would first analyze the consequences and make the necessary changes. Typically, the category is logically deleted, meaning it can no longer be used. However, joins will still show results with the old name until the posts in that category are reassigned. The category can later be physically deleted, when there are no orphaned records.
What About MongoDB?
The primary difference between a relational data model and a document data model is that the former is designed for a central database without knowing the future use cases, while the latter is tailored to specific domains with known application access patterns.
A document model is application-centric, meaning the application itself implements a logic that maintains the integrity, including validating lists of values or existing references before insertion, and safe preparation before deletion or update.
In the two cases mentioned above, when referential integrity does not involve additional application logic but is part of the data model, MongoDB does not require foreign keys for enforcement:
Enforcing Strong Relationships is inherent in the document model, as entities sharing the same lifecycle are typically embedded within a single document. Since MongoDB can store and index sub-documents without restrictions, the separation of a One-to-Many relationship into different documents is unnecessary. Sub-documents cannot become orphaned because they are owned by their parent document.
Consistency in Join Operations is guaranteed through the $lookup operation, which functions similarly to a left outer join. This ensures that records from the driving collection are not removed even if their lookup references do not match.
To illustrate other cases, consider a Many-to-One relationship, such as a product catalog in an order-entry system. A key advantage of normalization is that changing a product name requires only a single row update in the reference table, reflecting across all queries that join the product table. However, this can lead to inconsistencies outside the database. For example, a customer may receive an email at order completion or a printed bill displaying the old product name, which could mislead them if a different product name is displayed on the website.
Instead of deleting or renaming a product, a new version is created in a relational database, while previous orders reference the earlier version. In a document model, product information associated with the order, as seen in confirmation emails or on bills, is stored directly in the document. While this may appear as denormalization, the duplication is driven by business logic, not changing what was sent or printed, rather than solely for performance.
Typically, additional fields are included as an extended reference, rather than embedding the entire product document. A separate collection of products still exists, allowing updates without cascading effects, since necessary details are copied to the order document itself. Conceptually, this resembles having two entities for products, because of the temporality: the current catalog for new orders and a snapshot of the catalog at the time of order, embedded within the order.
I did not cover Many-to-Many relationships, as they are transformed into two One-to-Many relationships in SQL with an additional association table. With document data modeling, this functions similarly as a reference or extended reference, as MongoDB accommodates an array of references. For example, a new blog post with multiple authors requires complex migration in SQL but fits seamlessly into the same model with the same indexes in MongoDB. Another example can be found in The World Is Too Messy for SQL to Work.
Comparison
In conclusion, MongoDB does not lack foreign keys. It simply manages relationships differently than relational SQL databases. While SQL relies on foreign keys for referential integrity, MongoDB employs an application-centric approach through data embedding that is not limited by the normal forms, and update logic to cascade the updates, though an aggregation pipeline. This strategy effectively maintains document relationships, ensuring consistency without the need for foreign key constraints.
Although it may require more code, avoiding unnecessary foreign keys facilitates sharding and horizontal scaling with minimal performance impact. Ultimately, with MongoDB, maintaining database integrity during reference table updates is a shared responsibility between the database and the application, contrasting SQL's independent in-database declarations that are use-case agnostic.
Migration
When migrating from PostgreSQL to MongoDB, provided that the relational data modeling was done correctly, consider the following:
- ON DELETE CASCADE indicates a shared lifecycle and should be transformed into an embedded sub-document.
- ON DELETE SET NULL/DEFAULT represents an independent lifecycle. Use an extended reference to copy relevant fields to the child at insert time instead of using joins during queries.
- ON DELETE RESTRICT/NO ACTION requires application logic before deletion. In MongoDB, you can manage consistency and performance with logical deletions prior to physical ones, and utilize an aggregation pipeline to execute them.
- ON UPDATE actions indicate a natural key was used, and any key exposed to the user might be updated, but it's preferable to utilize an immutable _id for references.
- Primary Keys that include the Foreign Keys suggest a Many-to-Many association. Convert this into an array of references on the side where the size is predictable. This approach eliminates the need for an additional array on the opposite side because the multi-key index on the array serves for the navigation from the other side.