It's pretty typical to have data stores that are several hundred GB in size and need to be shipped offsite.
At weheartit our database is ~1/2 TB uncompressed, and the old method of compressing it and posting it to S3 took 9 hours and rarely completed.
I was able to speed up this process, and now it completes in under an hour.
53 minutes, in fact.
For compression I used pbzip2 instead of gzip.
This is how I am using it along with Percona's xtrabackup:
innobackupex --user root --password $PASSWORD --slave-info --safe-slave-backup --stream=tar ./ | pbzip2 -f -p40 -c > $BACKUPDIR/$FILENAME
The backup and compression take only 32 minutes and shrink the database from 432GB to 180GB.
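For completeness, here is a rough sketch of the restore path for a stream like this: decompress with pbzip2 and unpack with tar, then apply the logs. The destination directory is just an example, and note that xtrabackup's tar streams need tar's -i flag to extract cleanly.

# Example restore (paths are placeholders, not our real layout).
mkdir -p /var/lib/mysql-restore
# -i is required because innobackupex tar streams contain zeroed blocks.
pbzip2 -dc -p40 $BACKUPDIR/$FILENAME | tar -xivf - -C /var/lib/mysql-restore
# Prepare the backup before pointing MySQL at it.
innobackupex --apply-log /var/lib/mysql-restore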
Next comes speeding up the transfer to S3.
In November of 2010 Amazon added multipart upload support to S3, but for some reason this functionality hasn't made it into s3cmd.
Instead I am using S3 multipart upload via the s3-mp-upload.py script.
Thanks, David Arthur!
This is how I am using it:
/usr/local/bin/s3-mp-upload.py --num-processes 40 -s 250 $BACKUPDIR/$FILENAME $BUCKET/$(hostname)/$TIME/$FILENAME
It only takes 20 minutes to copy 180GB over the internet!
That is crazy fast.
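To show how the two steps fit together, here is a sketch of the kind of wrapper script you could run from cron. The two commands are exactly the ones above; the variable values are placeholders, not the ones we actually use.

#!/bin/bash
set -e
set -o pipefail

# Placeholder settings -- adjust for your environment.
PASSWORD="secret"
BACKUPDIR="/data/backups"
BUCKET="s3://my-backup-bucket"
TIME=$(date +%Y%m%d%H%M)
FILENAME="mysql-backup-$TIME.tar.bz2"

# 1. Stream the backup straight through pbzip2 to disk.
innobackupex --user root --password $PASSWORD --slave-info --safe-slave-backup \
  --stream=tar ./ | pbzip2 -f -p40 -c > $BACKUPDIR/$FILENAME

# 2. Push the compressed file to S3 with parallel multipart uploads.
/usr/local/bin/s3-mp-upload.py --num-processes 40 -s 250 \
  $BACKUPDIR/$FILENAME $BUCKET/$(hostname)/$TIME/$FILENAME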
You can play around with the number of threads for both pbzip2 and the S3 multipart upload. The counts I use work for me, but the right values depend on the size of your system.
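If you want a starting point instead of a hard-coded 40, one heuristic (just a sketch, not the tuning we did) is to derive the counts from the machine's core count:

# Scale thread counts to the hardware instead of hard-coding them.
CORES=$(nproc)
PBZIP2_THREADS=$CORES            # compression is CPU-bound
UPLOAD_PROCESSES=$((CORES * 2))  # uploads are network-bound, so oversubscribing can help

Then pass $PBZIP2_THREADS to pbzip2's -p flag and $UPLOAD_PROCESSES to --num-processes, and adjust from there.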