Automated MySQL query reports

Back in the day I had automated query reports for MySQL using a Perl library.
This worked okay, but it only reported on the slow queries, and I had to install a bunch of icky Perl stuff.

Percona’s pt-query-digest is a much better tool, and when you combine it with tcpdump you get an analysis of all your queries, not just the slow ones.

When writing this script I had to solve two problems.

1. How to run tcpdump for a specific amount of time.

I was prepared to write a loop with a sleep statement and then figure out how to kill tcpdump, but I didn’t need to.
Instead I just used timeout, which was already installed on Ubuntu.
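
For example, something like this (the interface and output file are just placeholders) exits on its own after three minutes:

# let timeout kill tcpdump after 180 seconds
timeout 180 tcpdump -s0 -A -i eth0 dst port 3306 > /tmp/capture.log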

2. How to email the resulting report as an attachment.

When sending emails I usually just use mail, but I couldn’t figure out how to send an attachment.
Instead I found mutt.
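
A minimal example, with a placeholder address and file name; the empty file on stdin keeps mutt from prompting for a message body:

touch /tmp/blank
mutt -s "query report" -a /tmp/report.txt -- someone@example.com < /tmp/blank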

By the way, for an extra challenge I decided to write this in bash; for the record, loops in bash are really ugly.

The script:

#!/bin/bash
hn=$(hostname)
# mutt won't send the mail from the command line without prompting you
# for the body's content. To work around that, I am using an empty file as the body.
touch /tmp/blank
queries=( SELECT INSERT UPDATE )
for i in "${queries[@]}"
do
        # To clean up the tcpdump output you have to include the pipe to sed.
        # The sed pattern needs double quotes so that $i actually expands.
        /usr/bin/timeout 180 /usr/sbin/tcpdump -s0 -A -i bond0 dst port 3306 | /usr/bin/strings | /bin/grep "$i" | /bin/sed "s/^.*$i/$i/" > /tmp/$i.log
        /usr/bin/pt-query-digest --type rawlog /tmp/$i.log > /tmp/$i.txt
        /usr/bin/mutt -s "$i report from $hn" -a /tmp/$i.txt -- xxxx@weheartit.com < /tmp/blank
done
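
To make the reports actually automatic, the script can be run from cron; the schedule and path here are just an example:

# crontab entry: run the report every morning at 6am
0 6 * * * /usr/local/bin/mysql-query-report.sh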

Speed up your backups by compressing and posting in parallel

It’s pretty typical to have data stores that are several hundred GB in size and need to be posted offsite.

At weheartit our database is ~1/2 TB uncompressed, and the old method of compressing and posting to S3 took 9 hours and rarely completed.

I was able to speed up this process, and now it completes in under an hour: 53 minutes, in fact.

For compression I used pbzip2 instead of gzip.

This is how I am using it along with Percona’s XtraBackup.

innobackupex --user root --password $PASSWORD --slave-info --safe-slave-backup --stream=tar ./ | pbzip2 -f -p40 -c  > $BACKUPDIR/$FILENAME

The backup and compression take only 32 minutes and shrink the data from 432 GB to 180 GB.
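
For reference, the variables in that command (and in the upload command below) get set along these lines; the exact values here are made up:

PASSWORD='secret'               # MySQL root password
BACKUPDIR=/backups              # local staging directory for the compressed backup
TIME=$(date +%Y-%m-%d-%H%M)     # timestamp used in the S3 key
FILENAME=backup-$TIME.tar.bz2   # the compressed tar stream lands here
BUCKET=s3://my-backup-bucket    # destination bucket for the upload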

Next comes speeding up the transfer to S3.

In November of 2010 Amazon added multipart upload to S3, but for some reason this functionality hasn’t been added to s3cmd.
Instead I am using the s3-mp-upload script.
Thanks, David Arthur!

This is how I am using it.

/usr/local/bin/s3-mp-upload.py --num-processes 40 -s 250 $BACKUPDIR/$FILENAME $BUCKET/$(hostname)/$TIME/$FILENAME

It only takes 20 minutes to copy 180GB over the internet!
That is crazy fast.
In both cases you can play around with the number of threads for pbzip2 and the multipart upload; the counts I use work for me, but the right values depend on the size of your system.
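
If you'd rather not hard-code the worker counts, a simple sketch is to derive them from the core count and tune from there; whether one worker per core suits your disks and network is something you still have to measure.

# one worker per CPU core as a starting point
THREADS=$(nproc)
innobackupex --user root --password $PASSWORD --slave-info --safe-slave-backup --stream=tar ./ | pbzip2 -f -p$THREADS -c > $BACKUPDIR/$FILENAME
/usr/local/bin/s3-mp-upload.py --num-processes $THREADS -s 250 $BACKUPDIR/$FILENAME $BUCKET/$(hostname)/$TIME/$FILENAME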