What is a good approach to sync remote API data to a database?

Problem statement We have a remote API that can be queried to fetch some arbitrary objects (we're expecting around 500 thousand records). Our application needs fast access to this data (faster than the API can provide) so we're using a relational database as a read-only cache. The question is how best to keep the DB in sync with the remote API such that the data in the DB is as fresh as reasonably possible. Possible approaches Write a program that periodically fetches data from the API, compares it to the data in the DB, and does some clever diffing to update what needs updating. I'm worried about the complexity of implementing and testing this. It seems inefficient to compare every single record on both sides. Write a program that fetches data from the API and simply replaces everything in the DB without doing a diff. To avoid discrepancies this replace will have to be an atomic transaction, but I worry about how long the transaction will take (and block queries to the DB). Especially if we have to run it frequently. Use some already-built API-to-DB syncing tool. Do you know of any?

Apr 7, 2025 - 13:40
 0
What is a good approach to sync remote API data to a database?

Problem statement

We have a remote API that can be queried to fetch some arbitrary objects (we're expecting around 500 thousand records). Our application needs fast access to this data (faster than the API can provide) so we're using a relational database as a read-only cache.

The question is how best to keep the DB in sync with the remote API such that the data in the DB is as fresh as reasonably possible.

Possible approaches

  1. Write a program that periodically fetches data from the API, compares it to the data in the DB, and does some clever diffing to update what needs updating.
    1. I'm worried about the complexity of implementing and testing this.
    2. It seems inefficient to compare every single record on both sides.
  2. Write a program that fetches data from the API and simply replaces everything in the DB without doing a diff.
    1. To avoid discrepancies this replace will have to be an atomic transaction, but I worry about how long the transaction will take (and block queries to the DB). Especially if we have to run it frequently.
  3. Use some already-built API-to-DB syncing tool. Do you know of any?