DIY Database Backup - quick and dirty backup using rsync and s3

Let's say you have a local database (MongoDB / PostgreSQL) for a local project; it's not production yet, so there's no need for RDS or the like.
However, you would still want to back up this database. How?
Setup
- MongoDB 7.0 in a Docker container exposed on port 27090
- PostgreSQL with pgvector in a custom Docker container exposed on port 54032
Data directories mounted from the host at:
- /root/data/mongodb for MongoDB
- /root/data/postgresql for PostgreSQL
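For reference, such a container can be started with something along these lines (a minimal sketch; the image tag, container name, and credential variables are assumptions, not taken from the actual setup):
# Illustrative MongoDB container matching the setup above
# (image tag, container name and credentials are assumed, not from the original setup)
docker run -d --name mongodb \
  -p 27090:27017 \
  -v /root/data/mongodb:/data/db \
  -e MONGO_INITDB_ROOT_USERNAME="$MONGO_USERNAME" \
  -e MONGO_INITDB_ROOT_PASSWORD="$MONGO_PASSWORD" \
  mongo:7.0
The part that matters for the rest of this post is the -v mount: the database files live on the host under /root/data/mongodb, so we can back them up directly from the host.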
Storage Layer vs. Application Layer
- Storage Layer Backups: direct copying of the database files
- Application Layer Backups: using database-specific tools like mongodump and pg_dump
1st try - tar the data files:
export filepath=/root/data/backups/mongo_db_backup_day.tar.gz
tar -czf $filepath /root/data/mongodb
Problem: what if files are changing during the backup?
2nd try - copy before tarring:
cp -r /root/data/mongodb /root/data/backups/tmp/mongo_db
export filepath=/root/data/backups/mongo_db_backup_day.tar.gz
tar -czf $filepath /root/data/backups/tmp/mongo_db
Problem: Copying is faster than archiving, but still - files could change during the copy.
3rd try - 2-pass rsync
rsync -av /root/data/mongodb/ /root/data/backups/tmp/mongo_db/
rsync -av /root/data/mongodb/ /root/data/backups/tmp/mongo_db/
export filepath=/root/data/backups/mongo_db_backup_day.tar.gz
tar -czf $filepath /root/data/backups/tmp/mongo_db
This looks way better!
We do 2-pass rsync copying.
We run rsync twice: the first pass copies all files, and the second pass copies only the files that changed during the first pass, since rsync syncs (copies) only changed files.
Feature Request - Saving Latest X Backups
We would like to keep the latest X backups, e.g. the latest 3; how shall we do this?
One way is to run aws s3 ls and delete old backups.
However, we want a quick and dirty solution, so we take the current day of the year modulo 3. This way the backup "shard" rotates every day, and each shard filename gets overwritten every 3 days.
# DAY_MOD is the day of the year modulo 3 (10# forces base 10, since %j is zero-padded)
export DAY_MOD=$(( 10#$(date +%j) % 3 ))
export filepath=/root/data/backups/mongo_db_backup_day${DAY_MOD}.tar.gz
tar -czf $filepath /root/data/backups/tmp/mongo_db
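The rotated archive can then be pushed to S3, overwriting whatever sat in that shard 3 days ago (a minimal sketch; the bucket and prefix are hypothetical placeholders):
# Upload the shard to S3; bucket/prefix below are placeholders, not from the original setup
aws s3 cp $filepath s3://my-backup-bucket/mongo_db/
With 3 shards and a daily upload, the bucket always holds roughly the last 3 days of backups without any explicit cleanup.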
Feature Request - Speed of Backup?
For this, we can parallelize the compression. We could use the parallel command, but pigz seems like a better fit here.
Let's also limit the number of CPUs to 10 (we could just as well choose nproc/2, or nproc-1 for that matter).
export filepath=/root/data/backups/mongo_db_backup_day${DAY_MOD}.tar.gz
tar -cf - /root/data/backups/tmp/mongo_db | pigz -p 10 > $filepath
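Restoring is the reverse pipeline (a minimal sketch; the target directory is an assumption):
# Decompress in parallel and unpack into a scratch directory (path is illustrative)
mkdir -p /root/data/restore/mongo_db
pigz -dc $filepath | tar -x -C /root/data/restore/mongo_db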
Feature Request - Application Layer Backup
This could be done using pg_dump, or using mongodump.
It's really 2 minutes of talking to Claude to get a Docker command like this:
docker run --rm --network=host \
  -v $(dirname $BACKUP_PATH):/backup \
  mongo:7.0.15-jammy \
  mongodump --host=localhost --port=27090 \
    --username="$MONGO_USERNAME" \
    --password="$MONGO_PASSWORD" \
    --authenticationDatabase=admin \
    --archive=/backup/$(basename $BACKUP_PATH) --gzip
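The PostgreSQL equivalent looks roughly like this (a sketch under assumptions: the postgres image tag, the PG_* variables, and the custom dump format are mine, not from the original setup):
# Application-layer PostgreSQL backup via pg_dump in a container
# (image tag, PG_* variables and dump format are assumptions)
docker run --rm --network=host \
  -e PGPASSWORD="$PG_PASSWORD" \
  -v $(dirname $BACKUP_PATH):/backup \
  postgres:16 \
  pg_dump --host=localhost --port=54032 \
    --username="$PG_USERNAME" --dbname="$PG_DATABASE" \
    --format=custom --file=/backup/$(basename $BACKUP_PATH)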
Q: Why double-rsync?
A: The first rsync copies most files. During this time, some files might change. The second rsync then efficiently copies only the files that changed during the first pass, resulting in a more consistent snapshot.
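One optional refinement (my addition, not part of the original recipe): pass --delete on the second rsync so files removed from the live data directory during the first pass also disappear from the copy.
# Second pass with --delete keeps the staging copy from accumulating stale files
rsync -av --delete /root/data/mongodb/ /root/data/backups/tmp/mongo_db/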
Q: Storage layer backup? Isn't this a problem?
A: Yes, it is; restoring will require running the exact same database version, e.g. the same Docker tag.
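One cheap mitigation (my suggestion; it assumes the container is named mongodb): record the exact image tag next to each archive, so you know which version to restore with.
# Save the image (with tag) used by the running container (container name is assumed)
docker inspect --format '{{.Config.Image}}' mongodb > ${filepath}.image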
Q: What about differential backup?
A: For larger systems it makes a lot of sense to integrate CDC and do faster, incremental backups. However, for larger systems we would probably be using managed solutions already.