It’s pretty typical to have data stores that are several hundred GB in size and need to be shipped offsite.
At weheartit our database is ~1/2 TB uncompressed, and the old method of compressing and posting it to S3 took 9 hours and rarely completed.
I was able to speed this process up, and now it completes in under an hour: 53 minutes, in fact.
For compression I used pbzip2 instead of gzip.
This is how I am using it along with Percona’s xtrabackup:
innobackupex --user root --password $PASSWORD --slave-info --safe-slave-backup --stream=tar ./ | pbzip2 -f -p40 -c > $BACKUPDIR/$FILENAME
The backup and compression take only 32 minutes and shrink the database from 432GB to 180GB.
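One thing worth guarding against in a pipeline like this: by default the shell reports only the last command’s exit status, so a failed innobackupex can still look like a successful backup. A minimal bash sketch of the fix, with `false | cat` standing in for the real innobackupex-to-pbzip2 pipeline:

```shell
#!/bin/bash
# With pipefail, the pipeline's exit status reflects ANY failed stage --
# so a broken innobackupex can't silently produce a truncated archive.
set -o pipefail
if false | cat > /dev/null; then   # stand-in for: innobackupex ... | pbzip2 ...
  rc=0
else
  rc=$?
fi
echo "pipeline exit: $rc"          # 1 with pipefail; would be 0 without it
```

Without `set -o pipefail` the same pipeline exits 0 because cat (like pbzip2) succeeds even when its producer dies.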
Next comes speeding up the transfer to S3.
In November of 2010 Amazon added multipart upload to S3, but for some reason this functionality hasn’t been added to s3cmd.
Instead I am using s3 multipart upload (thanks, David Arthur!). This is how I am using it:
/usr/local/bin/s3-mp-upload.py --num-processes 40 -s 250 $BACKUPDIR/$FILENAME $BUCKET/$(hostname)/$TIME/$FILENAME
It only takes 20 minutes to copy 180GB over the internet!
That is crazy fast.
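For a sanity check on that claim: 180GB in 20 minutes works out to about 150 MB/s, or roughly 1.2 Gbit/s sustained, using the figures from the run above:

```shell
#!/bin/bash
# 180GB copied in 20 minutes, in decimal units
size_mb=$((180 * 1000))        # 180GB expressed in MB
secs=$((20 * 60))              # 20 minutes in seconds
mb_per_s=$((size_mb / secs))
echo "${mb_per_s} MB/s"        # 150 MB/s
gbit=$(awk -v m="$mb_per_s" 'BEGIN { printf "%.1f", m * 8 / 1000 }')
echo "${gbit} Gbit/s"          # 1.2 Gbit/s
```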
In both cases you can play around with the number of threads for pbzip2 and the s3 multipart upload. The thread counts above work for me, but the right values depend on the size of your system.
I work @ weheartit.com where we rely on MySQL.
I’ve seen very little published about mts and nothing from outside a lab so I decided to test it out.
The results weren’t good.
Our main database group has 4 active schemas, is running 5.6.12, and when a slave gets out of sync it takes a while to catch back up to the master.
One of the most interesting features in MySQL 5.6 is multi-threaded slaves (MTS).
Without this feature the sync speed is limited to a single thread running on a single core.
Before I start, let me clear up one point about mts: this feature only helps if you are running more than one schema per host, as each worker thread can only process one schema at a time.
That being said, I upgraded one of my slaves to 5.6.12 and restored an xtrabackup to it.
Then I added the following lines to the my.cnf and ran START SLAVE:
slave_parallel_workers = 4
master_info_repository = TABLE
relay_log_info_repository = TABLE
Now I can just run show slave status\G and watch it catch up.
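To watch the catch-up without scanning the full output, the one field that matters can be pulled out with awk. A sketch, with a canned SHOW SLAVE STATUS\G excerpt (the 427 is a made-up lag value) standing in for a live `mysql -e` call:

```shell
#!/bin/bash
# In practice: mysql -e 'show slave status\G' | awk ...
# A heredoc stands in for the live output here.
lag=$(awk -F': ' '/Seconds_Behind_Master/ { print $2 }' <<'EOF'
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 427
EOF
)
echo "replication lag: ${lag}s"
```

Wrap that in `watch` or a loop and you get a one-line lag readout while the slave churns through its relay logs.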
Once it was caught up, I stopped replication on both the mts host and a single-threaded slave for 20 minutes.
Then I started both slaves, and it turns out the single-threaded slave caught up faster.
To eliminate disk and RAID differences (they were the same, as far as I could tell), the next time I only stopped the sql_thread for 20 minutes.
Same results: the slave running mts is actually slower.
Looks like there’s a reason the only posts you find on this topic are from people using it in a lab: although it appears to function, it doesn’t deliver what we ultimately need, which is faster replication syncs.
I’ll keep watching the MySQL releases and hope this gets fixed soon.
Rackspace offers 2 types of block storage:
Standard (SATA) @ $0.15/GB and High-Performance (SSD) @ $0.70/GB
Seeing that the SSD storage is roughly 4x the cost of SATA, I decided to see if the performance is also 4x.
The test setup: an 8GB (RAM) system running Ubuntu 12.04 with 2 100GB volumes, using the xfs file system mounted with the following options:
/dev/xvdb /fast xfs noatime,nodiratime,allocsize=512m 0 0
/dev/xvdd /slow xfs noatime,nodiratime,allocsize=512m 0 0
Basic test using dd:
I’ve benchmarked lots of storage systems in the past and I always like to start out with dd. I do this because it doesn’t take any time to set up and should give you some idea of how the storage performs.
In this test I create a 20GB file on each mounted filesystem using the following command:
dd if=/dev/zero of=20GB.file bs=1M count=20k
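A quick sanity check on the command: bs=1M with count=20k means 20480 one-MiB blocks, i.e. a 20GiB file:

```shell
#!/bin/bash
# Total bytes written by dd with bs=1M count=20k
bs=$((1024 * 1024))        # 1M block size in bytes
count=$((20 * 1024))       # 20k blocks
total_gib=$((bs * count / 1024 / 1024 / 1024))
echo "${total_gib} GiB"    # 20 GiB
```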
The results are a little surprising:
Volume write performance:
standard: 105 MB/s
high-performance: 103 MB/s
the host’s own volume: 135 MB/s
Wow, not what we were hoping for. I ran this test several times and the “high-performance” storage was always the slowest. To quote Homer Simpson: “D’oh!”
I ran bonnie, specifying double the amount of RAM as the test size so the file cache couldn’t hold the whole working set.
For sequential reads and writes they were about the same, this is expected as dd already showed this:
Volume             sequential reads   sequential writes
standard           95981/sec          16564/sec
high-performance   95927/sec          15633/sec
local VM           108463/sec         1208/sec
The random-seek results, though, show where the high-performance storage excels:
Volume             random seeks
standard           473.4/sec
high-performance   1969/sec
local VM           578.6/sec
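Putting numbers on that: the high-performance volume delivers roughly 4.2x the random seeks of standard, for roughly 4.7x the price per GB ($0.70 vs $0.15). A quick check:

```shell
#!/bin/bash
# Ratio of high-performance to standard random seeks (bonnie numbers above)
ratio=$(awk 'BEGIN { printf "%.1f", 1969 / 473.4 }')
echo "seek speedup: ${ratio}x"    # 4.2x
# Price ratio per GB: high-performance vs standard
price=$(awk 'BEGIN { printf "%.1f", 0.70 / 0.15 }')
echo "price ratio: ${price}x"     # 4.7x
```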
The question was: does the 4x cost of high-performance storage deliver 4x the performance?
The answer is yes.
Nice job rackspace.
However, as the sequential numbers above show, it doesn’t always outperform standard or local disk. So before you decide on the more expensive option, benchmark your application on it.
We are in the midst of upgrading from php 5.3.10 to 5.4 and couldn’t find a debian package for it. The current stable version is 5.4.10, but this changes often and I wanted to automate the compiling and packaging process. First, thanks to Jordan Sissel, who is a bad ass sysadmin/developer and who wrote fpm, which I’m using here to create the debian package.
The end result is a script that will install the prerequisite packages, then download, compile, and package whichever php version you specify.
The basic process is:
1 install the prerequisite packages needed to compile
2 download the php source
3 extract the source
4 configure
5 make
6 make install but do this while changing its destination directory
7 create the package
Step 6 is where php got a little tricky. The fpm wiki page that describes how to package something that uses make install ( https://github.com/jordansissel/fpm/wiki/PackageMakeInstall ) has you changing the destination directory in the make install process by specifying:
make install DESTDIR=/tmp/installdir
However this didn’t work with php; instead I had to set INSTALL_ROOT:
INSTALL_ROOT=/tmp/installdir make install
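Pulling the steps together, here’s a dry-run sketch of the whole flow. The `run` function echoes each command instead of executing it, and the package names, download URL, and configure flags are illustrative stand-ins, not the ones from my actual script:

```shell
#!/bin/bash
# Dry-run sketch: `run` echoes each command instead of executing it.
VERSION=5.4.10
INSTALLDIR=/tmp/installdir
run() { echo "+ $*"; }
run apt-get install -y build-essential libxml2-dev    # prerequisites
run wget "http://php.net/get/php-${VERSION}.tar.gz"   # download source
run tar xzf "php-${VERSION}.tar.gz"                   # extract
run ./configure --prefix=/usr                         # configure
run make                                              # compile
run INSTALL_ROOT="${INSTALLDIR}" make install         # redirected install
run fpm -s dir -t deb -n php -v "${VERSION}" -C "${INSTALLDIR}" etc usr
```

Drop the `run` wrapper and fill in your real configure flags to turn this into the actual build script.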
FPM is really simple to use, and because it’s a ruby gem it’s easy to install. To create the package I’m using the following command:
fpm -s dir -t deb -n php -v 5.4.10 -d "libmcrypt-dev" -d "libt1-dev" --deb-user root -p php-VERSION_ARCH.deb --description 'php 5.4.10 compiled for apache2' -C /tmp/installdir etc usr
It’s pretty self-explanatory, but a few things I’ll point out: the “-d” flags declare package dependencies, “-p” names the output package file, and “etc” & “usr” are the subdirectories of /tmp/installdir that you want packaged up.
Script to download, compile and package php
php 5.4.10 debian package for ubuntu 12.04
I take ownership of a company’s infrastructure.
To manage it I write software.
My language of choice is ruby and the framework is chef.
Along with these skills I also bring an expertise in many technologies.
MySQL, Apache, Linux, MongoDB, data centers, cloud providers, etc.
I also pair with engineering to streamline processes such as deployments, metrics and performance, scaling, security and continuous integration and testing.
I started using rackspace managed hosting back in 2005. This is before the “cloud” so they were all physical hosts.
Since then I’ve used them for a mix of cloud and physical for several different companies.
I have also built data centers from the ground up and managed a lot of services on AWS (Amazon Web Services).
After being a customer of rackspace for the last seven years, I have formed the following opinions.
1. Their service blows
If you think the cloud is “scary” and MySQL gives you the willies, then their support might seem like wizards, but in reality they are not.
I found out from an inside source that many of their support team are high school graduates who are run through a boot camp.
I have had countless dealings with support where I found zero value in the exchange.
This isn’t true of their whole team, of course; at some point you might get the contact information of someone who can actually help you.
When you do, hold on to it.
A great example of this is their “managed mysql”: what it amounts to is mysql installed on a supposedly faster file system and backed up.
Of course it’s “tuned”, which means innodb_buffer_pool_size is changed to match the RAM size.
If you really have a problem with MySQL no way are the bootcamp kids going to be of much use to you.
2. They have high uptime
In both cases of physical and cloud rackspace has very good uptime.
Especially in comparison to EC2, and I’m not just talking about the big EC2 outages; I’m talking about the day-to-day.
It’s pretty common in EC2 to have a server freeze up and need to be rebooted, or have it reboot on its own.
Of course when you decide to move to the cloud this is something you need to plan for but in the case of rackspace I can only think of a few times when one of my cloud instances ever went offline unexpectedly.
Every place I’ve ever worked had cronjobs running all over the place. Some are simple tasks like clearing out a temp directory. Others end up being a critical piece of the infrastructure that a developer wrote without telling anyone about. I like to call this type of scheduled job the glue, as it’s usually holding your company together.
True story I once found a cronjob running on a cluster of 200 servers named brett.sh that restarted an app every 30 seconds!!
In most cases nobody knows where the “glue” cronjob runs, how often, and most importantly when it fails. There are a few tools out there that will put all of your scheduled jobs in one spot and take action on failure. Some of those include opswise (http://www.opswise.com/), which I’ve used in the past with a lot of success, and Amazon’s Simple Workflow Service (http://aws.amazon.com/swf/), which I haven’t used yet.
There is also an open-source project sponsored by Yelp called tron, which does most of this already except for notifying when a job fails. BTW, there is already a feature request for that ( https://github.com/Yelp/Tron/issues/25 ).
Anyway, as a quick workaround I just add a check for the exit code in my crontab, which will alert me if the job doesn’t exit zero:
1 0 * * * touch /home/dodell/foobar || { mail -s 'touch_file failed' firstname.lastname@example.org < /etc/hostname; exit 1; }
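A slightly more reusable version of the same idea: a hypothetical wrapper function (cronwrap is my name for it, not a standard tool) that runs any command and reports a non-zero exit along with the exit code. Swap the echo for mail(1) in a real crontab:

```shell
#!/bin/bash
# Hypothetical wrapper: run the command, report failure with its exit code.
# In a real crontab the echo would be:
#   mail -s 'cron job failed' firstname.lastname@example.org
cronwrap() {
  "$@"
  local status=$?
  if [ "$status" -ne 0 ]; then
    echo "cron job failed (exit $status): $*"
  fi
  return $status
}
cronwrap true && echo "ok"   # success path: no report
cronwrap false || true       # failure path: prints the report
```

Because the wrapper preserves the original exit status, you can still chain it with `||` or use it inside other scripts.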