Speed up your backups by compressing and posting in parallel

Its pretty typical to have data stores that are several hundreds of GB’s in size and need to be posted offsite.

At weheartit our database is ~1/2 TB uncompressed and the old method of compressing and posting to S3 took 9 hours and rarely completed.

I was able to speed up this process and now it completes in < 1 hour.
53 minutes in fact.

For compression I used pbzip2 instead of gzip.

This is how I am using it along with percona’s xtrabackup.

innobackupex --user root --password $PASSWORD --slave-info --safe-slave-backup --stream=tar ./ | pbzip2 -f -p40 -c  > $BACKUPDIR/$FILENAME

The backup and compression only takes 32 minutes and compresses it from 432GB to 180GB

Next comes speeding up the transfer to S3.

In November of 2010 amazon added this feature to S3 but for some reason this functionality hasn’t been added to s3cmd.
Instead I am using s3 multipart upload
Thanks David Arther!

This is how I am using it.

/usr/local/bin/s3-mp-upload.py --num-processes 40 -s 250 $BACKUPDIR/$FILENAME $BUCKET/$(hostname)/$TIME/$FILENAME

It only takes 20 minutes to copy 180GB over the internet!
That is crazy fast.
In both cases you can play around with the number of threads for both pbzip2 and s3 multi part upload, the threads I use work for me but that depends on the size of your system.

One thought on “Speed up your backups by compressing and posting in parallel

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s