Angelos Orfanakos

Painless PostgreSQL backups to Amazon S3

I recently wanted to have backups of a PostgreSQL database running on a server that backs a Rails app.

Requirements & solutions

Stored off-site

Otherwise it wouldn’t really be a backup.

For this, I use Amazon S3.

Low transfer and storage cost

Again, Amazon S3 transfer and storage costs are reasonable in this context.

Inaccessible from the server

So that even if someone broke into the server, they wouldn’t be able to access the off-site location and mess with the backups.

For this, I created a write-only Amazon S3 bucket (a full walkthrough is beyond the scope of this article, but a rough sketch follows).
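One way, sketched here with made-up bucket and IAM user names and assuming the AWS CLI is configured with credentials that can manage IAM, is to give a dedicated user nothing but s3:PutObject permission on the bucket:

cat > write-only-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-backup-bucket/*"
    }
  ]
}
EOF
aws iam put-user-policy \
  --user-name backup-writer \
  --policy-name write-only-backups \
  --policy-document file://write-only-policy.json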

Encrypted on the server

This prevents Amazon, or anyone who might break into the bucket, from reading the data.

For this, I use GnuPG with symmetric encryption (i.e. a passphrase for both encryption and decryption, instead of a public-private keypair).

Simple, with as few dependencies as possible

I wrote a custom shell script that uses s4cmd to handle the upload.

Secure

I hardcoded all credentials in the script file (readable only by my user), used environment variables as little as possible and avoided passing passwords as command-line arguments.

Automatic

I run the script daily with cron.

Script

Important security points:

  • Before writing your credentials to this file, make sure it is not readable by anyone else by setting its permissions: chmod go-rwx database-backup.sh
  • Make sure the editor you use does not keep temporary files of things you type (e.g. for undo history).
  • Either write this file directly on the server, or write it on your workstation and transfer it securely with SSH (see the example right after this list).
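If you write it on your workstation first, the copy could look something like this (the server address and destination path are placeholders):

scp database-backup.sh deploy@your-server:/usr/local/bin/database-backup.sh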
#!/usr/bin/env bash

# Stop at the first error so a failed dump is never encrypted or uploaded
set -euo pipefail

# ---- BEGIN configuration ----
database_host=localhost
database_port=5432
database_username=app
database_password=
database_name=app_production
database_backup_passphrase=
database_backup_s3_access_key=
database_backup_s3_secret_key=
database_backup_s3_bucket_name=
# ---- END configuration -----

dump_file="${database_name}-$(date --utc +%FT%TZ).dump"
dump_file_gpg=$dump_file.gpg

trap "{ rm -f $dump_file; rm -f $dump_file_gpg; }" EXIT

PGPASSWORD=$database_password \
  pg_dump \
  -Fc --no-acl \
  -h $database_host \
  -p $database_port \
  -U $database_username \
  -f $dump_file $database_name

echo $database_backup_passphrase | \
  gpg -c --cipher-algo AES256 --no-symkey-cache --batch --passphrase-fd 0 \
  -o $dump_file_gpg $dump_file

S3_ACCESS_KEY=$database_backup_s3_access_key S3_SECRET_KEY=$database_backup_s3_secret_key \
  s4cmd \
  --sync-check --retry 3 --retry-delay 120 \
  put $dump_file_gpg s3://$database_backup_s3_bucket_name/

Crontab

Issue crontab -e and write:

0 3 * * * bash /usr/local/bin/database-backup.sh

This will run the backup once per day, at 3 AM server time.
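If you also want a record of each run, one option (the log path is just an example, and the cron user must be able to write to it) is to append the script's output to a log file:

0 3 * * * bash /usr/local/bin/database-backup.sh >>/var/log/database-backup.log 2>&1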

Does it work?

You should regularly test that any backup can actually be restored.

First, make sure backup files appear in your S3 bucket shortly after cron runs.
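Because the server's credentials are write-only, do this check from the AWS console or from your workstation, with credentials that are allowed to list the bucket, for example with s4cmd (the bucket name is a placeholder):

s4cmd ls s3://my-backup-bucket/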

Also, try decrypting a backup file:

gpg -d app_production-2020-04-20T20:31:34Z.dump.gpg >app_production-2020-04-20T20:31:34Z.dump

And restoring the resulting dump file to a local, empty development database (the host, database name, and user here are those of the local database, not the production server):

pg_restore -h $database_host -d $database_name -U $database_username app_production-2020-04-20T20:31:34Z.dump

Run your app and make sure the data looks correct.
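Alternatively, a quick count against a table you know well is a reasonable sanity check (users is just an example table name; the connection values are those of your local database):

psql -h $database_host -d $database_name -U $database_username -c 'SELECT count(*) FROM users;'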

Next steps

I feel like the only thing missing is some kind of notification if something goes wrong and backups fail. AWS has a notification service but I haven’t got around to making it work yet (ironically, it’s called “Simple Notification Service”).