Angelos Orfanakos

Painless PostgreSQL backups to Amazon S3

I recently wanted to have backups of a PostgreSQL database backing a Rails app.

Requirements

  1. Stored off-site (otherwise it wouldn’t really be a backup)
  2. Low transfer and storage cost
  3. Inaccessible from the server, so that even if someone broke into the server, they couldn’t reach the off-site location and tamper with the backups
  4. Encrypted on the server before being stored off-site, so that neither Amazon nor anyone who breaks into the bucket can read the data
  5. Simple, with as few dependencies as possible
  6. Secure
  7. Automatic

Solution

For 1 and 2, I chose Amazon S3.

For 3, I created a write-only S3 bucket (a full walkthrough is out of the scope of this article, but there is a rough policy sketch at the end of this section).

For 4, I used GnuPG with symmetric encryption (i.e. a passphrase for both encryption and decryption, instead of a public-private keypair).

For 5, I wrote a custom shell script that uses s4cmd to handle the upload.

For 6, I hardcoded all credentials in the script file, used environment variables as little as possible and avoided passing passwords as command-line arguments.

For 7, I run the script daily with cron.
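
Regarding the write-only bucket in point 3, the general idea is an IAM user (or bucket policy) that allows only s3:PutObject on the backup bucket, so the keys kept on the server can add new objects but not list, read or delete anything. Here is a rough sketch of one way to do it with the aws CLI from an admin workstation; the user name, policy name and bucket name are illustrative placeholders:

# Sketch only: give a dedicated IAM user write-only access to the backup bucket
aws iam put-user-policy \
  --user-name database-backup \
  --policy-name write-only-backups \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::your-backup-bucket/*"
    }]
  }'

Depending on the upload tool and its options (s4cmd’s --sync-check, for instance, compares against what is already in the bucket), you may need to allow slightly more than PutObject.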

Script

Important security points:

  • Before writing your credentials to this file, make sure it is not readable by anyone else by setting its permissions: chmod go-rwx database-backup.sh
  • Make sure the editor you use does not leave behind temporary files containing what you type (e.g. swap or undo-history files)
  • Either write this file directly on the server, or write it on your workstation and transfer it securely over SSH (for example with scp, as shown below).
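
If you go the workstation route, the copy could look something like this (the deploy user, host and destination path are placeholders for your own):

scp database-backup.sh deploy@example.com:/usr/local/bin/database-backup.sh

Here is the script itself:
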
#!/usr/bin/env bash

# Exit on the first failing command, on use of an undefined variable,
# and on failures anywhere in a pipeline
set -euo pipefail

# ---- BEGIN configuration ----
database_host=localhost
database_port=5432
database_username=app
database_password=
database_name=app_production
database_backup_passphrase=
database_backup_s3_access_key=
database_backup_s3_secret_key=
database_backup_s3_bucket_name=
# ---- END configuration ----

dump_file="${database_name}-$(date --utc +%FT%TZ).dump"
dump_file_gpg="$dump_file.gpg"

# Remove the local dump files when the script exits, whether it succeeds or fails
trap 'rm -f "$dump_file" "$dump_file_gpg"' EXIT

# Dump the database in PostgreSQL's custom format
PGPASSWORD=$database_password \
  pg_dump \
  -Fc --no-acl \
  -h "$database_host" \
  -p "$database_port" \
  -U "$database_username" \
  -f "$dump_file" "$database_name"

# Encrypt the dump symmetrically, feeding the passphrase through stdin
# (file descriptor 0) so it never appears as a command-line argument
echo "$database_backup_passphrase" | \
  gpg -c --cipher-algo AES256 --no-symkey-cache --batch --passphrase-fd 0 \
  -o "$dump_file_gpg" "$dump_file"

# Upload the encrypted dump to the S3 bucket
S3_ACCESS_KEY=$database_backup_s3_access_key \
  S3_SECRET_KEY=$database_backup_s3_secret_key \
  s4cmd \
  --sync-check --retry 3 --retry-delay 120 \
  put "$dump_file_gpg" "s3://$database_backup_s3_bucket_name/"
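
The script assumes pg_dump, gpg and s4cmd are all available on the server. The first two usually are; s4cmd is a Python tool, and one way to install it (adjust to your environment and package manager) is with pip:

pip install s4cmd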

Crontab

Issue crontab -e and write:

0 3 * * * bash /usr/local/bin/database-backup.sh

This will run the backup once per day at 3 AM.
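
By default, cron either mails the script’s output to the local user or quietly discards it. If you would rather have a file to inspect, one option (the log path here is just an example and must be writable by the crontab’s user) is to redirect the output:

0 3 * * * bash /usr/local/bin/database-backup.sh >> /var/log/database-backup.log 2>&1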

Does it work?

You should always test that you can actually restore from your backups when you first set the system up.

Ensure backup files appear in your S3 bucket shortly after cron runs.
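
The S3 console is the easiest place to check. You can also list the bucket with s4cmd from a machine whose keys are allowed to read it; the server’s write-only keys will not do, and the bucket name below is a placeholder:

S3_ACCESS_KEY=... S3_SECRET_KEY=... s4cmd ls s3://your-backup-bucket/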

Also, try decrypting a backup file (GnuPG will prompt you for the passphrase):

gpg -o app_production-2020-04-20T20:31:34Z.dump -d app_production-2020-04-20T20:31:34Z.dump.gpg

And restoring the resulting database dump file to a local, empty development database (substitute your local connection details for the variables):

pg_restore -h $database_host -d $database_name -U $database_username app_production-2020-04-20T20:31:34Z.dump

Run your app and make sure the data looks correct.
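
A quick sanity check from psql also works; the users table here is just a stand-in for any table whose contents you can eyeball:

psql -d $database_name -c 'SELECT COUNT(*) FROM users;'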

Next steps

I feel like the only thing missing is some kind of notification if something goes wrong and backups fail. AWS has a notification service but I haven’t got around to making it work yet (ironically, it’s called “Simple Notification Service”).
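
One possible shape, sketched here only as an idea: a failure hook in the script that publishes to an SNS topic. This assumes the aws CLI is installed and configured on the server and that a topic already exists; the ARN is a placeholder:

# Idea sketch: notify on failure via SNS. The topic ARN is a placeholder,
# and this assumes the aws CLI is installed and configured on the server.
notify_failure() {
  aws sns publish \
    --topic-arn "arn:aws:sns:eu-west-1:123456789012:database-backups" \
    --subject "Database backup failed" \
    --message "database-backup.sh failed on $(hostname) at $(date --utc +%FT%TZ)"
}
trap notify_failure ERR

Dropped near the top of database-backup.sh, this would fire whenever a command fails, just before the script exits.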