Using chef to automate internal hostnames in a VPC


Typically cloud servers have both an internal and external ip address.

But with VPC’s(virtual private clouds) many companies are deciding to have a few hosts proxy ports 22, 80 & 443 and the other hosts have internal ip’s only.

In both scenarios having an DNS for internal hostname’s is important but in the case of VPC its critical.

One was of doing this is to simply create a DNS entry for each interface.


 But it doesn’t make sense having an name/ip of an internal server available on a public DNS server.

 What you can do instead is have chef set the hostnames, populate those names/ips in the /etc/hosts files and convince your OS to use that file when performing lookups.

 Setting the hostname.

I usually go with purpose.abreviated_company_name


 This is done in chef with this a recipe like this:

execute "set_hostname" do
 command "/bin/hostname -F /etc/hostname"
 user "root"
 action :nothing

template /etc/hostname do
 source "hostname.erb"
 notifies :run, execute[set_hostname]

And a template like this:

<%= node['hostname']%>.gyk


Ok now we have chef setting the correct hostname but hold on, after a while or a reboot (on EC2 at least) the name goes away.

This is because of this obscure setting in /etc/cloud/cloud.cfg”

# This will cause the set+update hostname module to not operate (if true)
preserve_hostname: false
If this is set to true then it will not wipe out the hostname you set.

I fixed this with my old friend sed.

execute "preserve_hostname" do
  command "sed -i  's/preserve_hostname\:\ false/preserve_hostname\:\ true/' /etc/cloud/cloud.cfg"
  not_if "grep 'preserve_hostname: true' /etc/cloud/cloud.cfg"
  user "root"

Okay so now we have a hostname that sticks, next step…

Populating /etc/hosts will all internal hosts.

 To do this I used the hostsfile cookbook (

 Basically I search chef for all hosts in my environment and write the hostname and ip to the file.

addresses = search(:node, "chef_environment:#{node.chef_environment}")
addresses.each do |dd|
  if dd.has_key?("ipaddress")
    hostsfile_entry dd.ipaddress do
      hostname  dd.hostname + “.gyk"
      action    :create
      unique    true


Get the OS to use the /etc/hosts file for lookups.


Back in the day all I have to do to get /etc/hosts to work was set this in the nsswitch.conf

hosts:          files dns

 At this point without the following step pings and telnet requests to the hosts will work but loopups(nsloopup, host, dig etc.,..) won’t

Yeah super confusing, I won’t go into the science of it but the basic reason is the commands use different libraries when performing lookups.

 However after reading and changing a lot of files with out any success I fixed the problem by installing the dnsmasq package.

Not sure what else this package does but I know it allows your /etc/hosts file to actually be useful.

Now you have a fully automated population of internal hostnames…no DNS entries needed!

centralized custom logging with rsyslog

There are a lot more robust centralized logging solutions out there but during a recent hack day I had about an hour to get logs from 24 servers to a centralized one for processing.

First you will need to get centralized logging set up, I won’t go into that here as that step is well documented but a quick view is to:

1. Configure the server to listen on udp.

2. add this line to the bottom of the rsyslog.conf file

*.* @$IP_ADDRESS:514

Okay so your servers are sending its system logs to a centralized host, now the fun part.

First you will need to create a config file for the log and place it in the /etc/rsyslog.d directory.

In my case I am going to ship a log named event.log

$InputFileName /srv/whi/shared/log/event.log
$InputFileTag eventlog
$InputFileStateFile eventlog
$InputFilePersistStateInterval 10
*.* @$IP__ADDRESS:514

Now add a config to the server to write the log to its own file.

$template ProxiesTemplate,"%msg%\n"
if $programname == 'eventlog' and $msg contains 'viewed.entr'  then /var/log/eventlog.log;ProxiesTemplate

In the first line above I am stripping the log or the timestamp and the hostname, I am only interested in the body .
The second line I am matching the program name and a particular string in the body of the message and writing them to a specific log file.

The last thing you need to do it put in a an exception to the current messages and syslog file otherwise these custom logs will also end up there.

In ubuntu I had to edit the file /etc/rsyslog.d/50-default.conf

I basically had to add this string “event.none” to the lines for syslog and messages

*.*;auth,authpriv.none,event.none    -/var/log/syslog

*.=info;*.=notice;*.=warn;auth,authpriv.none;cron,daemon.none;mail,news.none,event.none    -/var/log/messages

Thats basically it, enjoy.

if you attempt to run sudo pkill you might go insane

Lately sidekiq has been leaving a lot of processes around in the stopping state.
My coworker asked me if there as a command to kill them all.
Of course with linux there are a lot of tools to perform this.
In this case however we had to kill the process based on the long listing of it.
ie instead of killall ruby we had to match an argument to the process

sidekiq 2.17.3 whi [0 of 6 busy] stopping

The key was matching “stopping”, which would leave the other sidekiq processes running.

I used:

pkill -f stopping

and this worked perfect the first time.
I go to another box and it won’t work.
The command sudo -f stopping  does nothing, no error but the processes don’t die.
I upgrade the package, read the man page, search the internet.
Still nothing.
Am I going insane? Did I forget everything I know about linux?

Then I become root and run the command…….it works.
The difference was sudo vs. being root.
Of course when you sudo the command there is no warning and the man page doesn’t contain the word sudo.

So word to the wise, when in doubt become root!


custom ohai plugins for bonded interfaces

We use bonded interfaces on our production hardware.
But only on production hardware, staging and dev just use the ethX interfaces.
So we needed a way for chef to identify the public & private interfaces regardless of whether they are bonded or not.

To start with I pulled down the ohai cookbook and added a few scripts to the plugins directory.
Thats all there is.
The two plugins below identify the public and private interfaces as either being eth0 || bond0 & eth1 || bond1


 provides "private_interface"
cmd = '/sbin/ifconfig bond0'
if $? == 0
  private_interface "bond0"
  private_interface "eth0"


provides "public_interface"
cmd = '/sbin/ifconfig bond1'
if $? == 0
  public_interface "bond1"
  public_interface "eth1"

From the chef ui you can see that public_interface and private_interface are now listed on the top level for a node.

This allows me to specify in a template/recipe to use the public or private interface, ohai automatically discovers what the interface actually is.
Example from a recipe for ufw:

firewall_rule "http-internal" do
        port 8098
        action :allow
        interface node['private_interface']
        notifies :enable, "firewall[ufw]"

automated MySQL query reports

Back in the day I had automated query reports for MySQL using a perl library.
This worked okay but it only reported on the slow queries and also I would have to install a bunch of icky perl stuff.

Percona’s pt-query-digest is a much better tool and when you combine it with tcpdump you get an analysis of all your queries not just the slow ones.

When writing this script I had to solve two problems.

1. run tcpdump for a specific amount of time

I was prepared to write a loop with a sleep statement and then figure out how to kill tcpdump but I didn’t need to.
Instead I just used timeout which was already installed on ubuntu.

2. How to email the resulting report as an attachment.

When sending emails I usually just use mail but I couldn’t figure out how send an attachment.
Instead I found mutt.

BTW for an extra challenge I decided to write this in bash, loops in bash are really ugly for the record.

The script:

# mutt won't send the mail from the command line without prompting you 
# for the bodies content. To work around I am using a empty file as the body.
touch /tmp/blank
for i in "${queries[@]}"
# To clean up the tcpdump you have to include the pipe to sed
        /usr/bin/timeout 180 /usr/sbin/tcpdump -s0 -A -i bond0 dst port 3306 | /usr/bin/strings | /bin/grep $i | /bin/sed 's/^.*$i/$i/' &gt; /tmp/$i.log
        /usr/bin/pt-query-digest --type rawlog /tmp/$i.log &gt; /tmp/$i.txt
        /usr/bin/mutt -s "$i report from $hn"  -a /tmp/$i.txt &lt;/tmp/blank

Speed up your backups by compressing and posting in parallel

Its pretty typical to have data stores that are several hundreds of GB’s in size and need to be posted offsite.

At weheartit our database is ~1/2 TB uncompressed and the old method of compressing and posting to S3 took 9 hours and rarely completed.

I was able to speed up this process and now it completes in < 1 hour.
53 minutes in fact.

For compression I used pbzip2 instead of gzip.

This is how I am using it along with percona’s xtrabackup.

innobackupex --user root --password $PASSWORD --slave-info --safe-slave-backup --stream=tar ./ | pbzip2 -f -p40 -c  > $BACKUPDIR/$FILENAME

The backup and compression only takes 32 minutes and compresses it from 432GB to 180GB

Next comes speeding up the transfer to S3.

In November of 2010 amazon added this feature to S3 but for some reason this functionality hasn’t been added to s3cmd.
Instead I am using s3 multipart upload
Thanks David Arther!

This is how I am using it.

/usr/local/bin/ --num-processes 40 -s 250 $BACKUPDIR/$FILENAME $BUCKET/$(hostname)/$TIME/$FILENAME

It only takes 20 minutes to copy 180GB over the internet!
That is crazy fast.
In both cases you can play around with the number of threads for both pbzip2 and s3 multi part upload, the threads I use work for me but that depends on the size of your system.

mysql multi threaded slaves (mts) slower than single threaded

I work @ where we rely on MySQL.
I’ve seen very little published about mts and nothing from outside a lab so I decided to test it out.
The results weren’t good.

Our main database group has 4 active schema, is running 5.6.12 and when a slave gets our of sync its takes a while to catch back up to the master.

One of the most interesting features for MySQL 5.6 is multi threaded slaves.

Without this feature the sync speed is limited to a single thread running on a single core.

Before I start let me clear up this point about mts which is that this feature will only help if you are running more than 1 schema per host as each thread can only process one schema at a time.

That being said I went and upgraded one of my slaves to 5.6.12, restored an xtrabackup to it.

Then I added the following lines to the my.cnf and ran start slave.

 slave_parallel_workers = 4
 master_info_repository = TABLE
 relay_log_info_repository = TABLE

Now I can just run show slave status\G and watch it catching up.

However once it was caught up I stopped replication on the mts host and single thread slave for 20 minutes.
Then I started the slaves and it turns out that the single threaded slave caught up faster.
to eliminate disk and RAID configs( they were the same as I could tell) this next time I only stopped the sql_thread for 20 minutes.
Same results, the slave running mts is actually slower.

Looks like there is reason when you search for this topic the only posts are from people using it in a lab is because although it appears to function it doesn’t delivery what ultimately need to which is faster replication syncs.

I’ll keep watching the mysql releases and hope this gets fixed soon.

Benchmark: Rackspace’s block storage SATA vs. SSD vs. VM disk

Rackspace offers 2 types of block storage:

Standard (SATA) @ $015/GBandHigh-Performance (SSD) @ $0.70/GB

Seeing that the SSD storage is 4x the cost of SATA I decided to see if the performance is also 4x.

Lets see.

The setup:

An 8GB(RAM) system running ubuntu 12.04&2 100GB volumes with the xfs file system mounted with the following options:

/dev/xvdb   /fast     xfs noatime,nodiratime,allocsize=512m   0   0  /dev/xvdd   /slow     xfs noatime,nodiratime,allocsize=512m   0   0

Basic test using dd:

I’ve benchmarked lots of storage systems in the past and I always like to start out with dd.I do this because it doesn’t take anytime to set up and should give you some idea of how it performs.

In this test I create a 20GB file on each mounted filesystem using the following command:

dd if=/dev/zero of=10GB.file bs=1M count=20k

The results are a little surprising:

Volume write performance:

standard            105 MB/shigh-performance        103 MB/sthe hosts's own volume      135 MB/s

Wow, not what are were hoping for.I ran this test several times and the “high-performance” storage was always the slowest.To quote Homer Simpson “Doh!!”


I ran bonnie with the following args, basically I specified double the amount of RAM for the test.

bonnie++ -s 16g

For sequential reads and writes they were about the same, this is expected as dd already showed this:

Volume                sequential reads            sequential writes  standard              95981/sec                   16564/sec  high-performance      95927/sec                   15633/sec  localVM               108463/sec                  1208/sec

The results now show where the high-performance excels which is random seeks.

Volume                random seeks  standard              473.4/sec  high-performance      1969/sec  localVM               578.6/sec


The question was:Does the 4x cost of high-performance storage perform 4x?

The answer is yes.

Nice job rackspace.

However, as with the sequential numbers from above it doesn’t always out perform standard or local disk. So before you decide to use the more expensive option benchmark your application on it.

Compiling and packaging php 5.4.10 for ubuntu 12.04

We are in the midst of upgrading from 5.3.10 to 5.4 and couldn’t find a debian package for it.The current stable version is 5.4.10 but this changes often and I wanted to automate the compiling and packaging process.First thanks to Jordan Sissel who is a bad ass sysadmin/developer and who wrote fpm which I’m using here to create the debian package.

The end result is a script that will install prerequisite pacakges, download, compile and package which ever php version you specify.

The basic process is:

1 install the prerequisite packages needed to compile

2 download the php source

3 uncompress

4 configure

5 make

6 make install but do this while changing its destination directory

7 create the package

Step 6. is where php got a little tricky.In the fpm wiki page which describes how to package something that uses make install ( )It has you changing the destination directory in the make install process by specifying:

make install DESTDIR=/tmp/installdir

However this didn’t work with php, instead I had to specify the install_dir:

INSTALL_ROOT=/tmp/installdir make install

FPM is really simply to use, also because its a ruby gem is easy to install.To create the package I’m using the following command:

fpm -s dir -t deb -n php -v 5.4.10 -p "libmcrypt-dev" -p "libt1-dev" --deb-user root -p php-VERSION_ARCH.deb --description 'php 5.4.10 compiled for apache2' -C /tmp/installdir etc usr

Its pretty self explaintory but a few things I’ll point out are “-p” which are packages that are dependancies, and “etc” & “usr” are the sub directories to /tmp/installdir which you want packaged up.

Script to download, compile and package php

php 5.4.10 debian package for ubuntu 12.04

What I do in DevOps

I take ownership of a companies infrastructure.
To manage it I write software.
My language of choice is ruby and the framework is chef.
Along with these skills I also bring an expertise in many technologies.
MySQL, apache, Linux, mongodb, data centers, cloud providers, etc……..
I also pair with engineering to streamline processes such as deployments, metrics and performance, scaling, security and continuous integration and testing.