DIY Database Backup - quick and dirty backup using rsync and s3


Apr 16, 2025 - 11:38

MongoDB Application Layer backup example

Let's say you have a local database (MongoDB / PostgreSQL) for a local project; it's not production yet, so there's no need for RDS or the like.
However, you would still want to back up this database. How?

Setup

  • MongoDB 7.0 in a Docker container exposed on port 27090
  • PostgreSQL with pgvector in a custom Docker container exposed on port 54032

Data directories mounted from the host at:
  • /root/data/mongodb for MongoDB
  • /root/data/postgresql for PostgreSQL
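
For reference, the actual docker run invocation isn't shown here; a minimal sketch of how the MongoDB container might be started, assuming the stock mongo image and its default /data/db data path, looks like this:

# Assumed sketch, not the actual command from this setup:
# host port 27090 -> container's default 27017, data dir mounted from the host
docker run -d --name mongodb \
  -p 27090:27017 \
  -v /root/data/mongodb:/data/db \
  mongo:7.0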

Storage Layer vs. Application Layer

Storage Layer Backups: Direct copying of database files
Application Layer Backups: Using database-specific tools like mongodump and pg_dump

1st try - tar the data files:

export filepath=/root/data/backups/mongo_db_backup_day.tar.gz

tar -czf "$filepath" /root/data/mongodb

Problem: what if files are changing during the backup?

2nd try - copy before tarring:

cp -a /root/data/mongodb /root/data/backups/tmp/mongo_db
export filepath=/root/data/backups/mongo_db_backup_day.tar.gz

tar -czf "$filepath" /root/data/backups/tmp/mongo_db

Problem: Copying is faster than archiving, but still - files could change during the copy.

3rd try - 2-pass rsync

rsync -av /root/data/mongodb/ /root/data/backups/tmp/mongo_db/
rsync -av /root/data/mongodb/ /root/data/backups/tmp/mongo_db/
export filepath=/root/data/backups/mongo_db_backup_day.tar.gz

tar -czf "$filepath" /root/data/backups/tmp/mongo_db

This looks way better!
We do a 2-pass rsync copy: the first pass copies all files, and the second pass copies only the files that changed during the first. rsync only transfers files that have changed.
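
If you want to see how much the second pass actually had to re-copy, rsync can report it; this is just an inspection sketch, not part of the backup itself:

# Same second pass, but itemize changed files and print transfer statistics
rsync -av --itemize-changes --stats /root/data/mongodb/ /root/data/backups/tmp/mongo_db/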

Feature Request - Saving Latest X Backups

We would like to keep the latest X backups, e.g. the latest 3; how shall we do this?
One way is to do aws s3 ls and delete old backups.
However, we want a quick and dirty solution, so we take the current day of the year modulo 3. This way we rotate the backup "shard" every day, and each of the 3 filenames gets overwritten every 3 days.

# DAY_MOD is the day of the year modulo 3
# (10# forces base-10, since %j is zero-padded and would otherwise be read as octal)
export DAY_MOD=$(( 10#$(date +%j) % 3 ))
export filepath=/root/data/backups/mongo_db_backup_day${DAY_MOD}.tar.gz

tar -czf "$filepath" /root/data/backups/tmp/mongo_db
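
For comparison, the aws s3 ls route mentioned above could look roughly like this sketch (the bucket name and prefix are placeholders, not from the actual setup):

# List backups, sort by date, and delete everything except the newest 3
# (assumes GNU head and keys without spaces)
aws s3 ls s3://my-backup-bucket/mongo/ \
  | sort \
  | head -n -3 \
  | awk '{print $4}' \
  | while read -r key; do
      aws s3 rm "s3://my-backup-bucket/mongo/$key"
    done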

Feature Request - Speed of Backup?

For this, we can parallelize the compression. We could use the parallel command, but pigz seems like a better fit.
Let's also limit the number of CPUs to 10 (we could just as well choose nproc/2 or nproc-1).

export filepath=/root/data/backups/mongo_db_backup_day${DAY_MOD}.tar.gz

tar -cf - /root/data/backups/tmp/mongo_db | pigz -p 10 > "$filepath"
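
The upload to S3 from the title is then just a copy of the finished archive; the bucket name here is a placeholder:

# Upload the compressed archive; thanks to the modulo-3 filename, the same
# S3 key is overwritten every 3 days, keeping roughly the latest 3 backups.
aws s3 cp "$filepath" s3://my-backup-bucket/mongo/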

Feature Request - Application Layer Backup

This could be done using pg_dump, or using mongodump.

It really takes 2 minutes of talking to Claude to get a Docker command like this:

docker run --rm --network=host \
  -v $(dirname $BACKUP_PATH):/backup \
  mongo:7.0.15-jammy \
  mongodump --host=localhost --port=27090 \
  --username="$MONGO_USERNAME" \
  --password="$MONGO_PASSWORD" \
  --authenticationDatabase=admin \
  --archive=/backup/$(basename $BACKUP_PATH) --gzip
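
A matching sketch for PostgreSQL with pg_dump might look like the following; the image tag, database name, and credential variables are assumptions (ideally the client image matches the server version), while the port matches the setup above:

docker run --rm --network=host \
  -v "$(dirname "$BACKUP_PATH")":/backup \
  -e PGPASSWORD="$POSTGRES_PASSWORD" \
  postgres:16 \
  pg_dump --host=localhost --port=54032 \
    --username="$POSTGRES_USERNAME" \
    --dbname="$POSTGRES_DB" \
    --format=custom \
    --file="/backup/$(basename "$BACKUP_PATH")"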

Q: Why double-rsync?
A: The first rsync copies most files. During this time, some files might change. The second rsync then efficiently copies only the files that changed during the first pass, resulting in a more consistent snapshot.

Q: Storage layer backup? Isn't this a problem?
A: Yes, it is; restoring it requires the exact same database version, e.g. the same Docker tag.

Q: What about differential backup?
A: For larger systems, it makes a lot of sense to integrate CDC and do faster, incremental backups. However, for larger systems we might already be using managed solutions.