Mario Finelli Blog
5 April 2020
It's been a long time since I managed a PostgreSQL cluster; these days I prefer to use a managed service, which almost every cloud provider offers. One of the benefits of a managed database service is that backups are automated and the responsibility of the service provider. However, these backups are usually not exposed to you, and you can't take them with you to another provider, which can lead to vendor lock-in and leaves you exposed should the provider itself have problems. To solve this I also run my own automated backups, on a less frequent schedule than the vendor's. Should there be a need to restore from a backup within the same cloud I can use the normal backups with minimal data loss. Should there be a prolonged outage, or some other reason that would necessitate switching providers without being able to access the original, running database, I can use my own logical backups to spin up a new database on my own or with a different provider.
You could also send the logical backups to storage in the same cloud provider, but for me that defeats the purpose. I send my backups to a bucket on Google Cloud Storage. I use the standard pg_dump tool to make logical backups, so I maintain a Docker container that installs Google's gsutil program into the official Postgres image. You can find the source on GitHub and the image on Docker Hub.
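Since daily dumps accumulate quickly, it's also worth putting a lifecycle policy on the bucket so that old objects are deleted automatically. This is a minimal sketch, not part of my setup: the 90-day retention is just an example, so pick whatever matches your own recovery needs, and apply it with gsutil lifecycle set lifecycle.json gs://YOUR-PG-BACKUP-BUCKET:

```json
{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {"age": 90}
    }
  ]
}
```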
I also like to encrypt the backups client-side using symmetric encryption (i.e., a passphrase) with gnupg. You'll want to load the passphrase into your cluster's secrets ahead of time. For Google backups (if you're not running on GKE) you'll also need to create a service account and load its configuration JSON file into the secrets as well. Finally, the way I have my application set up it consumes the Postgres connection information as a URI, so you'll need that in the secret store too. I run the backup cronjob in the same namespace as each application whose database should be backed up, but this is not strictly required.
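If you don't already have a passphrase, one way to generate a strong random one is with openssl (just one option; any good source of randomness works, and the filename should match whatever you pass to kubectl's --from-file below):

```shell
# Generate a 32-byte random passphrase, base64-encoded, into backup.key.
# printf %s avoids a trailing newline ending up inside the passphrase.
printf '%s' "$(openssl rand -base64 32)" > backup.key

# Restrict permissions: this file is the key to every backup.
chmod 600 backup.key
```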
Given the service account JSON in a file backups.json and the passphrase in a file backup.key (the names need to match the secret references in the cronjob manifest below), you can run kubectl -n app create secret generic backups --from-file=backups.json and kubectl -n app create secret generic backup --from-file=backup.key
Here's the cronjob configuration:
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: pgbackup
  namespace: app
spec:
  schedule: "0 6 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: pgbackup
              image: mfinelli/pgbackup
              command: [/bin/sh]
              args:
                - -c
                - >-
                  gcloud auth activate-service-account
                  --key-file $GOOGLE_APPLICATION_CREDENTIALS &&
                  pg_dump $DATABASE_URL --clean --create --format=plain
                  --column-inserts --if-exists --quote-all-identifiers |
                  gzip --best |
                  gpg --armor --cipher-algo AES256 --symmetric --yes --batch
                  --passphrase="$BACKUP_PASSPHRASE" |
                  gsutil cp -a project-private -
                  gs://YOUR-PG-BACKUP-BUCKET/$(date +%Y%m%d%H%M%S).sql.gz.asc
              env:
                - name: GOOGLE_APPLICATION_CREDENTIALS
                  value: /root/backups.json
                - name: DATABASE_URL
                  valueFrom:
                    secretKeyRef:
                      name: postgres
                      key: postgres
                - name: BACKUP_PASSPHRASE
                  valueFrom:
                    secretKeyRef:
                      name: backup
                      key: backup.key
              volumeMounts:
                - mountPath: /root/backups.json
                  subPath: backups.json
                  name: backups
          restartPolicy: Never
          volumes:
            - name: backups
              secret:
                secretName: backups
The YAML syntax makes it a little hard to see exactly what command we're running; here it is as if we were going to run it from the command line:
gcloud auth activate-service-account \
--key-file $GOOGLE_APPLICATION_CREDENTIALS && \
pg_dump $DATABASE_URL --clean --create --format=plain --column-inserts \
--if-exists --quote-all-identifiers | \
gzip --best | \
gpg --armor --cipher-algo AES256 --symmetric --yes --batch \
--passphrase="$BACKUP_PASSPHRASE" | \
gsutil cp -a project-private - \
gs://YOUR-PG-BACKUP-BUCKET/$(date +%Y%m%d%H%M%S).sql.gz.asc
The first thing we do is authorize the gsutil program with our service account credentials. Since the container is spun up fresh on each invocation there is no shared state, so this needs to be done every time. Then we run a standard pg_dump with all of our desired options, pipe it to gzip to compress it, pipe that to gpg to encrypt it as mentioned before, and finally pipe it to gsutil to stream it directly to the storage bucket from stdin.
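If you want to sanity-check the compress/encrypt stages without a live database, you can substitute a small fixture for pg_dump's output and round-trip it locally. A sketch, assuming gnupg and gzip are installed (the passphrase here is a throwaway, not one of the secrets above):

```shell
# Stand-in for pg_dump output: a trivial SQL statement.
printf 'SELECT 1;\n' |
  gzip --best |
  gpg --armor --cipher-algo AES256 --symmetric --yes --batch \
      --passphrase="throwaway" > sample.sql.gz.asc

# Decrypt and decompress to confirm the pipeline round-trips cleanly;
# this should print the original SQL back out.
gpg -d --batch --passphrase="throwaway" sample.sql.gz.asc | gunzip
```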
And that's it! Now you'll get nightly logical dumps of your database. Should it become necessary, you can download one of the backups and replay it into a new database:
gpg -d --batch --passphrase="$(cat backup.key)" ./path/to/file.sql.gz.asc | \
  gunzip | psql ...