Fetch Values from Previous Non-Null Value Rows — From SQL to SPL #24

Problem description & analysis: The database table organisation_user_link stores the current state of the account, where dossier_created is the account creation time. The database table organisation_user_link_status_history stores the change history of account status. Task: Now, based on the specified query date, list the account status for each day within the period from today (March 14, 2024) to the query date (March 1, 2024). Requirement: Reasonably fill in the status of blank dates, for example, from today to the latest change date, it should be filled in as today’s, and from the latest change date to the next nearest change date, it should be filled in as the latest’s; Supplement the account creation date; Finally, sort by account and date in reverse order. Code comparisons: SQL: WITH RECURSIVE dates ( date ) AS ( SELECT DATE('2024-03-01') UNION ALL SELECT DATE(date) + INTERVAL 1 DAY FROM dates WHERE DATE(DATE) < (NOW() - INTERVAL 1 DAY) ), current_history_data_query AS ( SELECT current_history_data.* FROM ( SELECT DATE(timestamp) AS date, user_id, organisation_id, status_id, stopped_reason_id, dossier_created, 'history-data' AS src FROM ( SELECT oulsh.user_id, oulsh.organisation_id, oulsh.timestamp, oulsh.status_id, oulsh.stopped_reason_id, oul.dossier_created, ROW_NUMBER() OVER (PARTITION BY oulsh.user_id, oulsh.organisation_id, DATE(oulsh.timestamp) ORDER BY oulsh.timestamp DESC) AS row_num FROM organisation_user_link_status_history AS oulsh INNER JOIN organisation_user_link AS oul ON oulsh.user_id = oul.user_id AND oulsh.organisation_id = oul.organisation_id ) AS numbered_rows WHERE row_num = 1 AND DATE(timestamp) != DATE(NOW()) UNION ALL SELECT CURRENT_DATE AS date, oul.user_id, oul.organisation_id, oul.status_id, oul.stopped_reason_id, oul.dossier_created, 'current-data' AS src FROM organisation_user_link AS oul ) AS current_history_data ORDER BY DATE DESC ) SELECT d.date, u.user_id, u.organisation_id, ( SELECT status_id FROM current_history_data_query WHERE user_id = u.user_id AND organisation_id = u.organisation_id AND date >= d.date ORDER BY date ASC LIMIT 1 ) AS status_id, ( SELECT stopped_reason_id FROM current_history_data_query WHERE user_id = u.user_id AND organisation_id = u.organisation_id AND date >= d.date ORDER BY date ASC LIMIT 1 ) AS stopped_reason_id, ( SELECT dossier_created FROM current_history_data_query WHERE user_id = u.user_id AND organisation_id = u.organisation_id AND date >= d.date ORDER BY date ASC LIMIT 1 ) AS dossier_created FROM dates d JOIN (SELECT DISTINCT user_id, organisation_id FROM organisation_user_link) u ORDER BY d.date DESC, u.user_id; SQL uses recursive subqueries to create date sequences, which have complex structures. It uses multi-layer nested queries and window functions to mark state changes, and then fills in blank date data with join statements, making the code cumbersome. SPL: SPL does not aggregate after grouping by accounts, but continues to calculate subsets of the groups. SPL provides functions for generating date sequences and records based on date sequences.

Apr 29, 2025 - 04:11
 0
Fetch Values from Previous Non-Null Value Rows — From SQL to SPL #24

Problem description & analysis:

The database table organisation_user_link stores the current state of the account, where dossier_created is the account creation time.

source table 1

The database table organisation_user_link_status_history stores the change history of account status.

source table 2

Task: Now, based on the specified query date, list the account status for each day within the period from today (March 14, 2024) to the query date (March 1, 2024). Requirement: Reasonably fill in the status of blank dates, for example, from today to the latest change date, it should be filled in as today’s, and from the latest change date to the next nearest change date, it should be filled in as the latest’s; Supplement the account creation date; Finally, sort by account and date in reverse order.

expected results

Code comparisons:

SQL

WITH RECURSIVE dates ( date ) AS (
    SELECT DATE('2024-03-01')
    UNION ALL
    SELECT DATE(date) + INTERVAL 1 DAY
    FROM dates
    WHERE DATE(DATE) < (NOW() - INTERVAL 1 DAY)
),
current_history_data_query AS (
    SELECT 
        current_history_data.*
    FROM (
        SELECT
           DATE(timestamp) AS date,
           user_id,
           organisation_id,
           status_id,
           stopped_reason_id,
           dossier_created,
           'history-data' AS src
         FROM (
           SELECT
               oulsh.user_id,
               oulsh.organisation_id,
               oulsh.timestamp,
               oulsh.status_id,
               oulsh.stopped_reason_id,
               oul.dossier_created,
               ROW_NUMBER() OVER (PARTITION BY oulsh.user_id, oulsh.organisation_id, DATE(oulsh.timestamp) ORDER BY oulsh.timestamp DESC) AS row_num
           FROM organisation_user_link_status_history AS oulsh
           INNER JOIN organisation_user_link AS oul ON oulsh.user_id = oul.user_id AND oulsh.organisation_id = oul.organisation_id
         ) AS numbered_rows
         WHERE row_num = 1 AND DATE(timestamp) != DATE(NOW())

         UNION ALL

         SELECT CURRENT_DATE AS date, oul.user_id, oul.organisation_id, oul.status_id, oul.stopped_reason_id, oul.dossier_created, 'current-data' AS src
         FROM organisation_user_link AS oul
    ) AS current_history_data
    ORDER BY DATE DESC
)
SELECT d.date, u.user_id, u.organisation_id,
  (
    SELECT status_id
    FROM current_history_data_query
    WHERE user_id = u.user_id
    AND organisation_id = u.organisation_id
    AND date >= d.date
    ORDER BY date ASC
    LIMIT 1
  ) AS status_id,
  (
    SELECT stopped_reason_id
    FROM current_history_data_query
    WHERE user_id = u.user_id
    AND organisation_id = u.organisation_id
    AND date >= d.date
    ORDER BY date ASC
    LIMIT 1
  ) AS stopped_reason_id,
  (
    SELECT dossier_created
    FROM current_history_data_query
    WHERE user_id = u.user_id
    AND organisation_id = u.organisation_id
    AND date >= d.date
    ORDER BY date ASC
    LIMIT 1
  ) AS dossier_created
FROM dates d
JOIN (SELECT DISTINCT user_id, organisation_id FROM organisation_user_link) u
ORDER BY d.date DESC, u.user_id;

SQL uses recursive subqueries to create date sequences, which have complex structures. It uses multi-layer nested queries and window functions to mark state changes, and then fills in blank date data with join statements, making the code cumbersome.

SPL: SPL does not aggregate after grouping by accounts, but continues to calculate subsets of the groups. SPL provides functions for generating date sequences and records based on date sequences.