Postgresql soft-delete strategies: balancing data retention

Usually, when we work in applications where the capacity is needed to restore deleted records at certain point in time, such as messaging systems where the user may have deleted messages/attachments but may be retained for moderation by admins that can be restored at a later point, or in a content management service where files are deleted by user and can be restored from trash. Traditional soft-delete We simply add a flag is_deleted or is_active or as a timestamp of deletion deleted_at and call it a day. And all the queries will have this flag in the where clause to differentiate deleted and active service data. Most popular ORMs provide such provision for soft-deletion(as paranoid tables). Here is an example of soft-deletion in action in sequelize ORM class Post extends Model {} Post.init( { /* attributes here */ }, { sequelize, paranoid: true, // If you want to give a custom name to the deletedAt column deletedAt: 'destroyTime', }, ); Soft-deletion and restoration happens as await Post.destroy({ where: { id: 1, }, }); // UPDATE "posts" SET "deletedAt"=[timestamp] WHERE "deletedAt" IS NULL AND "id" = 1 // Restoring every soft-deleted post with more than 100 likes await Post.restore({ where: { likes: { [Op.gt]: 100, }, }, }); Row Level Security(RLS) in postgres There may be inconsistent data if the developer forgets to add the is_active flag to the query, typically when we move specific high frequency APIs to different language or ecosystem. In such scenarios, we can let the database take of soft-deletion so it leaves less room for error for the developer. Using Row-Level Security (RLS) with soft deletes in PostgreSQL is a powerful way to ensure deleted data is automatically hidden from regular queries — without relying on every query writer to remember WHERE is_deleted = false Let’s say you have a table like this: CREATE TABLE tasks ( id SERIAL PRIMARY KEY, title TEXT, is_deleted BOOLEAN DEFAULT FALSE, owner_id INT ); We can enforce the database to ignore deleted rows for select queries. -- Enable RLS ALTER TABLE tasks ENABLE ROW LEVEL SECURITY; -- Create a RLS policy CREATE POLICY only_active_tasks ON tasks FOR SELECT USING ( is_deleted = FALSE ); -- Enforce the policy ALTER TABLE tasks FORCE ROW LEVEL SECURITY; This can be extended for update and delete operations as well. Also ensure db admins have provision to bypass RLS policy for data restoration. All these are the default go to solution when it comes to soft-delete functionality for simple applications or internal tools. It is going to be a different game when it comes to enterprise (or) B2B (or) fintech applications where multiple policies, compliances must be followed. Eg., there may audti compliance to retain deleted data for 90 days; provision for the user to delete user-related data(including historic data) completely from the platform. Biggest concern when managing data is complex would be data leakage into backups, that is when data is to be deleted but may be retained in backups taken awhile ago. When large teams work on same table/database having is_delete will always have serious regressions. Archive tables One easier way is to maintain a separate archive table that holds the deleted records with supporting metadata like table,deleted_by, deleted_at. This way the active service data will look clean and devoid of any paranoid clauses. No additional attention is needed on data leakage into backups. This table can be ignored from the periodic service data backup and the archive table can have a specific backup of its own. A simple trigger can be written which catches the delete queries and moves them to the archive table. CREATE OR REPLACE FUNCTION archive_deleted_rows() RETURNS TRIGGER AS $$ BEGIN INSERT INTO archive_table ( table_name, deleted_data, deleted_at, deleted_by ) VALUES ( TG_TABLE_NAME, to_jsonb(OLD), now(), current_setting('app.current_user', true) --user from session ); RETURN OLD; END; $$ LANGUAGE plpgsql; Way forward Soft-delete is one of such features which may seem harmless at face value, but need a thorough thought out plan at scale. In the end, everything boils down to the nature of the application, compliances involved, where adopting the right techniques will be helpful in the long run.

Apr 23, 2025 - 10:04
 0
Postgresql soft-delete strategies: balancing data retention

Usually, when we work in applications where the capacity is needed to restore deleted records at certain point in time, such as messaging systems where the user may have deleted messages/attachments but may be retained for moderation by admins that can be restored at a later point, or in a content management service where files are deleted by user and can be restored from trash.

Traditional soft-delete

We simply add a flag is_deleted or is_active or as a timestamp of deletion deleted_at and call it a day. And all the queries will have this flag in the where clause to differentiate deleted and active service data.

Most popular ORMs provide such provision for soft-deletion(as paranoid tables). Here is an example of soft-deletion in action in sequelize ORM

class Post extends Model {}
Post.init(
  {
    /* attributes here */
  },
  {
    sequelize,
    paranoid: true,

    // If you want to give a custom name to the deletedAt column
    deletedAt: 'destroyTime',
  },
);

Soft-deletion and restoration happens as

await Post.destroy({
  where: {
    id: 1,
  },
});
// UPDATE "posts" SET "deletedAt"=[timestamp] WHERE "deletedAt" IS NULL AND "id" = 1


// Restoring every soft-deleted post with more than 100 likes
await Post.restore({
  where: {
    likes: {
      [Op.gt]: 100,
    },
  },
});

Row Level Security(RLS) in postgres

There may be inconsistent data if the developer forgets to add the is_active flag to the query, typically when we move specific high frequency APIs to different language or ecosystem. In such scenarios, we can let the database take of soft-deletion so it leaves less room for error for the developer.

Using Row-Level Security (RLS) with soft deletes in PostgreSQL is a powerful way to ensure deleted data is automatically hidden from regular queries — without relying on every query writer to remember WHERE is_deleted = false

Let’s say you have a table like this:

CREATE TABLE tasks (
    id SERIAL PRIMARY KEY,
    title TEXT,
    is_deleted BOOLEAN DEFAULT FALSE,
    owner_id INT
);

We can enforce the database to ignore deleted rows for select queries.

-- Enable RLS
ALTER TABLE tasks ENABLE ROW LEVEL SECURITY;

-- Create a RLS policy
CREATE POLICY only_active_tasks
ON tasks
FOR SELECT
USING (
    is_deleted = FALSE
);

-- Enforce the policy
ALTER TABLE tasks FORCE ROW LEVEL SECURITY;

This can be extended for update and delete operations as well. Also ensure db admins have provision to bypass RLS policy for data restoration.

All these are the default go to solution when it comes to soft-delete functionality for simple applications or internal tools. It is going to be a different game when it comes to enterprise (or) B2B (or) fintech applications where multiple policies, compliances must be followed. Eg., there may audti compliance to retain deleted data for 90 days; provision for the user to delete user-related data(including historic data) completely from the platform.

Biggest concern when managing data is complex would be data leakage into backups, that is when data is to be deleted but may be retained in backups taken awhile ago. When large teams work on same table/database having is_delete will always have serious regressions.

Archive tables

One easier way is to maintain a separate archive table that holds the deleted records with supporting metadata like table,deleted_by, deleted_at. This way the active service data will look clean and devoid of any paranoid clauses. No additional attention is needed on data leakage into backups. This table can be ignored from the periodic service data backup and the archive table can have a specific backup of its own.

A simple trigger can be written which catches the delete queries and moves them to the archive table.

CREATE OR REPLACE FUNCTION archive_deleted_rows()
RETURNS TRIGGER AS $$
BEGIN
    INSERT INTO archive_table (
        table_name,
        deleted_data,
        deleted_at,
        deleted_by
    )
    VALUES (
        TG_TABLE_NAME,
        to_jsonb(OLD),
        now(),
        current_setting('app.current_user', true)  --user from session
    );
    RETURN OLD;
END;
$$ LANGUAGE plpgsql;

Way forward

Soft-delete is one of such features which may seem harmless at face value, but need a thorough thought out plan at scale. In the end, everything boils down to the nature of the application, compliances involved, where adopting the right techniques will be helpful in the long run.