Using chef to automate internal hostnames in a VPC


Cloud servers typically have both an internal and an external IP address.

But with VPCs (virtual private clouds), many companies are choosing to have a few hosts proxy ports 22, 80 and 443 while the other hosts have internal IPs only.

In both scenarios, having DNS for internal hostnames is important, but in a VPC it’s critical.

One way of doing this is to simply create a DNS entry for each interface.


But it doesn’t make sense to have the name/IP of an internal server available on a public DNS server.

What you can do instead is have chef set the hostnames, populate those names/IPs in the /etc/hosts file on every host, and convince your OS to use that file when performing lookups.

Setting the hostname.

I usually go with purpose.abbreviated_company_name.


This is done in chef with a recipe like this:

execute "set_hostname" do
  command "/bin/hostname -F /etc/hostname"
  user "root"
  action :nothing
end

template "/etc/hostname" do
  source "hostname.erb"
  notifies :run, "execute[set_hostname]"
end

And a template like this:

<%= node['hostname']%>.gyk
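To see what that template produces, here is a tiny Ruby sketch that renders it outside chef with the standard ERB library. The node hash is a stand-in for illustration (an assumption); in a real chef run, node['hostname'] is the short hostname ohai detected, so /etc/hostname ends up holding that name with .gyk appended.

```ruby
require "erb"

# The hostname.erb template from the recipe above.
HOSTNAME_TEMPLATE = "<%= node['hostname'] %>.gyk"

# Render it outside chef. `node` is stubbed as a plain hash here;
# in chef it is the real node object populated by ohai.
def render_hostname(node)
  ERB.new(HOSTNAME_TEMPLATE).result(binding)
end

puts render_hostname("hostname" => "web1") # prints web1.gyk
```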


OK, now we have chef setting the correct hostname. But hold on: after a while, or after a reboot (on EC2 at least), the name goes away.

This is because of this obscure setting in /etc/cloud/cloud.cfg:

# This will cause the set+update hostname module to not operate (if true)
preserve_hostname: false

If this is set to true, cloud-init will not overwrite the hostname you set.

I fixed this with my old friend sed.

execute "preserve_hostname" do
  command "sed -i 's/preserve_hostname: false/preserve_hostname: true/' /etc/cloud/cloud.cfg"
  not_if "grep 'preserve_hostname: true' /etc/cloud/cloud.cfg"
  user "root"
end
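The sed + not_if pair is an idempotent edit: it changes the file once and is a no-op on every later chef run. Here is the same logic as a plain-Ruby sketch (preserve_hostname is a hypothetical helper, not part of chef) that makes the guard explicit.

```ruby
# Flip "preserve_hostname: false" to true, leaving everything else alone.
# Returns the text unchanged if the setting is already true, which is
# exactly what the not_if guard in the recipe achieves.
def preserve_hostname(config_text)
  return config_text if config_text.include?("preserve_hostname: true")

  # [ \t]* rather than \s* so a trailing newline is never swallowed.
  config_text.sub(/^preserve_hostname:[ \t]*false[ \t]*$/, "preserve_hostname: true")
end
```

On a real box you would apply it with something like File.write(path, preserve_hostname(File.read(path))).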

Okay, so now we have a hostname that sticks. Next step…

Populating /etc/hosts with all internal hosts.

To do this I used the hostsfile cookbook.

Basically, I search chef for all nodes in my environment and write each hostname and IP to the file.

addresses = search(:node, "chef_environment:#{node.chef_environment}")
addresses.each do |dd|
  if dd.has_key?("ipaddress")
    hostsfile_entry dd.ipaddress do
      hostname  dd.hostname + ".gyk"
      action    :create
      unique    true
    end
  end
end
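To make the loop concrete, here is a plain-Ruby sketch of the entries it ends up managing, with the chef search results stubbed as hashes (an assumption for illustration). hosts_lines is a hypothetical helper, and the hostsfile cookbook's exact on-disk format may differ slightly.

```ruby
# One "ip<TAB>hostname.gyk" line per node; nodes that have no
# ipaddress attribute are skipped, just like the has_key? check above.
def hosts_lines(nodes, suffix: ".gyk")
  nodes
    .select { |n| n.key?("ipaddress") }
    .map { |n| "#{n['ipaddress']}\t#{n['hostname']}#{suffix}" }
end
```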


Get the OS to use the /etc/hosts file for lookups.


Back in the day, all I had to do to get /etc/hosts to work was set this in /etc/nsswitch.conf:

hosts:          files dns

At this point, without the following step, ping and telnet to the hosts will work but lookups (nslookup, host, dig, etc.) won’t.

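Yeah, super confusing. The basic reason is that these commands use different libraries when performing lookups: ping and telnet go through the system resolver (glibc, which follows nsswitch.conf and therefore reads /etc/hosts), while nslookup, host and dig query a DNS server directly and never look at /etc/hosts.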
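To see the two paths side by side, here is a rough illustration using Ruby’s standard resolv library rather than glibc: Resolv::Hosts reads a hosts-format file (like the "files" source), while Resolv::DNS only asks a nameserver (like dig). files_lookup and dns_lookup are hypothetical helper names.

```ruby
require "resolv"

# The "files" path: read a hosts-format file directly, the way the
# "files" entry in nsswitch.conf does. No nameserver is involved.
def files_lookup(name, hosts_path = "/etc/hosts")
  Resolv::Hosts.new(hosts_path).getaddress(name)
rescue Resolv::ResolvError
  nil # not in the file; the system resolver would fall through to "dns"
end

# The "dns" path: ask a nameserver directly, the way dig/host/nslookup do.
# /etc/hosts is never consulted here.
def dns_lookup(name, nameserver)
  Resolv::DNS.open(nameserver: [nameserver]) do |dns|
    dns.timeouts = 2
    dns.getaddress(name).to_s
  end
rescue Resolv::ResolvError, Resolv::ResolvTimeout
  nil
end
```

With only /etc/hosts populated, files_lookup("db1.gyk") finds the entry while dns_lookup("db1.gyk", "8.8.8.8") comes back nil, which is the ping-works-but-dig-doesn’t behavior described above.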

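However, after reading about and changing a lot of files without any success, I fixed the problem by installing the dnsmasq package.

dnsmasq is a lightweight DNS forwarder that, by default, answers queries from the local /etc/hosts file and forwards everything else upstream. Once it is acting as the local nameserver, even dig and host can see your /etc/hosts entries, which is what finally makes the file useful.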

Now you have fully automated population of internal hostnames… no DNS entries needed!

centralized custom logging with rsyslog

There are much more robust centralized logging solutions out there, but during a recent hack day I had about an hour to get logs from 24 servers to a central one for processing.

First you need centralized logging set up. I won’t go into that here, as that step is well documented, but in short:

1. Configure the server to listen on UDP.

2. On each client, add this line to the bottom of the rsyslog.conf file:

*.* @$IP_ADDRESS:514
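If you want to check the listener before touching any clients, here is a minimal Ruby sketch that sends one syslog message over UDP, which is what the single @ in the forwarding rule means (@@ would be TCP). send_syslog is a hypothetical helper; the message uses the classic BSD syslog layout "<PRI>timestamp hostname tag: message", where PRI 14 is facility user (1) times 8 plus severity info (6).

```ruby
require "socket"

# Send one BSD-style syslog message over UDP and return the packet sent.
def send_syslog(host, port, tag, message)
  timestamp = Time.now.strftime("%b %e %H:%M:%S") # e.g. "Sep  6 14:39:59"
  packet = "<14>#{timestamp} #{Socket.gethostname} #{tag}: #{message}"
  UDPSocket.open do |sock|
    sock.send(packet, 0, host, port)
  end
  packet
end
```

For example, send_syslog("10.0.0.5", 514, "eventlog", "hello") (the address is a placeholder) should show up in the server’s logs if UDP reception is working.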

Okay, so your servers are sending their system logs to a central host. Now the fun part.

First you will need to create a config file for the log and place it in the /etc/rsyslog.d directory.

In my case I am going to ship a log named event.log:

$ModLoad imfile
$InputFileName /srv/whi/shared/log/event.log
$InputFileTag eventlog
$InputFileStateFile eventlog
$InputFilePersistStateInterval 10
$InputRunFileMonitor
*.* @$IP_ADDRESS:514

Now add a config on the central server to write the log to its own file:

$template ProxiesTemplate,"%msg%\n"
if $programname == 'eventlog' and $msg contains 'viewed.entr'  then /var/log/eventlog.log;ProxiesTemplate

The first line defines a template that strips the timestamp and hostname from each message; I am only interested in the body.
The second line matches the program name and a particular string in the message body, and writes those messages to a specific log file.
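Here is the same filter-and-template logic as a plain-Ruby sketch, which can be handy for checking what a given line will produce. It assumes classic BSD syslog lines ("Mon dd hh:mm:ss host tag: body"); filter_event is a hypothetical helper and rsyslog’s own parser handles many more formats.

```ruby
# Classic BSD syslog line: timestamp, host, tag (optional [pid]), then the body.
SYSLOG_LINE = /\A\w{3} [ \d]\d \d{2}:\d{2}:\d{2} (?<host>\S+) (?<tag>[^:\[\s]+)(?:\[\d+\])?:(?<msg>.*)\z/

# Keep only messages whose program name is "eventlog" and whose body
# contains "viewed.entr"; return just the body (the %msg% template),
# dropping the timestamp and hostname. Returns nil for non-matches.
def filter_event(line)
  m = SYSLOG_LINE.match(line.chomp) or return nil
  return nil unless m[:tag] == "eventlog" && m[:msg].include?("viewed.entr")

  m[:msg].lstrip # leading space trimmed for readability
end
```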

The last thing you need to do is add an exception to the existing messages and syslog rules; otherwise these custom logs will also end up in those files.

On Ubuntu, I had to edit /etc/rsyslog.d/50-default.conf.

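I basically had to add an exclusion for these messages to the syslog and messages lines. One catch: “event” is not a valid syslog facility. imfile assigns messages read from a file the local0 facility by default, so the selector to add is local0.none (this assumes nothing else on the box logs to local0):

*.*;auth,authpriv.none;local0.none    -/var/log/syslog

*.=info;*.=notice;*.=warn;auth,authpriv.none;cron,daemon.none;mail,news.none;local0.none    -/var/log/messages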

That’s basically it. Enjoy.

if you attempt to run sudo pkill you might go insane

Lately sidekiq has been leaving a lot of processes around in the stopping state.
My coworker asked me if there was a command to kill them all.
Of course, with Linux there are a lot of tools for this.
In this case, however, we had to kill processes based on their full command line,
i.e. instead of killall ruby we had to match an argument to the process:

sidekiq 2.17.3 whi [0 of 6 busy] stopping

The key was matching “stopping”, which would leave the other sidekiq processes running.

I used:

pkill -f stopping

and this worked perfectly the first time.
I go to another box and it won’t work.
The command sudo pkill -f stopping does nothing: no error, but the processes don’t die.
I upgrade the package, read the man page, search the internet.
Still nothing.
Am I going insane? Did I forget everything I know about linux?

Then I become root and run the command… it works.
The difference was sudo vs. being root.
Of course, when you sudo the command there is no warning, and the man page doesn’t mention sudo at all.

So word to the wise, when in doubt become root!
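For anyone curious what pkill -f is doing under the hood, here is a rough Ruby sketch of the same idea on Linux: scan /proc/*/cmdline, match the pattern against each full command line, and signal the matches. It is an illustration, not a replacement for pkill; pkill_full and its dry_run default are hypothetical. One design choice worth noting: it skips its own PID and its parent’s, because when launched through sudo, the parent sudo process’s command line can contain the pattern too.

```ruby
# Match PATTERN against every process's full command line (Linux /proc)
# and send SIGNAL to the matches. Returns [[pid, cmdline], ...].
# dry_run defaults to true so nothing is killed unless you ask for it.
def pkill_full(pattern, signal: "TERM", dry_run: true)
  regex = Regexp.new(pattern)
  skip = [Process.pid, Process.ppid]
  matched = []

  Dir.glob("/proc/[0-9]*/cmdline").each do |path|
    pid = path.split("/")[2].to_i
    next if skip.include?(pid)

    begin
      cmdline = File.read(path).tr("\0", " ").strip
    rescue SystemCallError
      next # process exited or is unreadable
    end
    next if cmdline.empty? # kernel threads and zombies have no command line
    next unless regex.match?(cmdline)

    matched << [pid, cmdline]
    begin
      Process.kill(signal, pid) unless dry_run
    rescue Errno::ESRCH
      # already gone
    end
  end

  matched
end
```

Running pkill_full("stopping") first shows what would be hit; pkill_full("stopping", dry_run: false) actually sends SIGTERM.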


custom ohai plugins for bonded interfaces

We use bonded interfaces on our production hardware.
But only on production hardware; staging and dev just use the ethX interfaces.
So we needed a way for chef to identify the public & private interfaces regardless of whether they are bonded or not.

To start with I pulled down the ohai cookbook and added a few scripts to the plugins directory.
That’s all there is to it.
The two plugins below identify the private interface as either bond0 or eth0, and the public interface as either bond1 or eth1.


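provides "private_interface"

# bond0 only exists on bonded (production) hosts; ifconfig exits non-zero otherwise
`/sbin/ifconfig bond0 > /dev/null 2>&1`
if $?.success?
  private_interface "bond0"
else
  private_interface "eth0"
end


provides "public_interface"

`/sbin/ifconfig bond1 > /dev/null 2>&1`
if $?.success?
  public_interface "bond1"
else
  public_interface "eth1"
end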
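The decision both plugins make is a simple fallback: prefer the bonded interface if it exists, otherwise use the plain ethX one. Here it is as a plain-Ruby sketch that can be tested without ohai. pick_interface is a hypothetical helper, and it checks /sys/class/net instead of running ifconfig (an assumption: Linux, where every interface appears there).

```ruby
# Return the bonded interface name if it exists on this host,
# otherwise the plain interface name.
def pick_interface(bonded, plain, sys_net: "/sys/class/net")
  File.exist?(File.join(sys_net, bonded)) ? bonded : plain
end
```

Inside a plugin you could then write something like private_interface pick_interface("bond0", "eth0").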

From the chef UI you can see that public_interface and private_interface are now listed at the top level of each node’s attributes.

This lets me specify the public or private interface in a template or recipe, while ohai discovers what the interface actually is.
Example from a recipe for ufw:

firewall_rule "http-internal" do
  port 8098
  action :allow
  interface node['private_interface']
  notifies :enable, "firewall[ufw]"
end

Benchmark: Rackspace’s block storage SATA vs. SSD vs. VM disk

Rackspace offers 2 types of block storage:

Standard (SATA) @ $0.15/GB
High-Performance (SSD) @ $0.70/GB

Seeing that SSD storage costs roughly 4x as much as SATA, I decided to see whether the performance is also 4x.

Let’s see.

The setup:

An 8GB (RAM) system running Ubuntu 12.04 and two 100GB volumes formatted with XFS, mounted with the following options:

/dev/xvdb   /fast     xfs noatime,nodiratime,allocsize=512m   0   0
/dev/xvdd   /slow     xfs noatime,nodiratime,allocsize=512m   0   0

Basic test using dd:

I’ve benchmarked lots of storage systems in the past, and I always like to start out with dd. It takes no time to set up and should give you some idea of how the storage performs.

In this test I create a 20GB file on each mounted filesystem using the following command:

dd if=/dev/zero of=20GB.file bs=1M count=20k

The results are a little surprising:

Volume write performance:

Volume                  write speed
standard                105 MB/s
high-performance        103 MB/s
the host’s own volume   135 MB/s

Wow, not what we were hoping for. I ran this test several times and the “high-performance” storage was always the slowest. To quote Homer Simpson: “Doh!!”


Next I ran bonnie++ with the following arguments. Basically, I specified a test size of double the amount of RAM:

bonnie++ -s 16g

Sequential reads and writes were about the same for both block storage types, which is expected since dd already showed this:

Volume              sequential reads    sequential writes
standard            95981/sec           16564/sec
high-performance    95927/sec           15633/sec
localVM             108463/sec          1208/sec

The results now show where high-performance storage excels: random seeks.

Volume              random seeks
standard            473.4/sec
high-performance    1969/sec
localVM             578.6/sec
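For a quick sanity check on the 4x question, here is the arithmetic on the prices and bonnie++ numbers quoted above as a small Ruby snippet (the ratios method is just for illustration). At these prices the SSD tier costs about 4.7x as much per GB and delivers about 4.2x the random seeks of standard storage, while sequential throughput is essentially flat.

```ruby
# High-performance (SSD) relative to standard (SATA), from the numbers above.
def ratios
  {
    cost:       0.70 / 0.15,      # price per GB
    seeks:      1969 / 473.4,     # random seeks/sec
    seq_reads:  95927.0 / 95981,  # sequential reads
    seq_writes: 15633.0 / 16564   # sequential writes
  }
end

ratios.each { |name, value| printf("%-11s %.2fx\n", name, value) }
```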


The question was: does high-performance storage, at 4x the cost, perform 4x better?

For random seeks, the answer is yes.

Nice job, Rackspace.

However, as the sequential numbers above show, it doesn’t always outperform standard or local disk. So before you decide on the more expensive option, benchmark your application on it.

Compiling and packaging php 5.4.10 for ubuntu 12.04

We are in the midst of upgrading from PHP 5.3.10 to 5.4 and couldn’t find a Debian package for it. The current stable version is 5.4.10, but this changes often and I wanted to automate the compiling and packaging process. First, thanks to Jordan Sissel, a badass sysadmin/developer who wrote fpm, which I’m using here to create the Debian package.

The end result is a script that will install the prerequisite packages, then download, compile and package whichever PHP version you specify.

The basic process is:

1. Install the prerequisite packages needed to compile

2. Download the PHP source

3. Uncompress it

4. Configure

5. Make

6. Make install, while changing its destination directory

7. Create the package

Step 6 is where PHP got a little tricky. The fpm wiki page that describes how to package something that uses make install has you change the destination directory in the make install step by specifying:

make install DESTDIR=/tmp/installdir

However, this didn’t work with PHP; instead I had to set INSTALL_ROOT:

INSTALL_ROOT=/tmp/installdir make install

fpm is really simple to use, and because it’s a Ruby gem it’s easy to install. To create the package I’m using the following command:

fpm -s dir -t deb -n php -v 5.4.10 -d "libmcrypt-dev" -d "libt1-dev" --deb-user root -p php-VERSION_ARCH.deb --description 'php 5.4.10 compiled for apache2' -C /tmp/installdir etc usr

It’s pretty self-explanatory, but a few things I’ll point out: “-d” lists the packages this package depends on, “-p” sets the output file name, and “etc” and “usr” are the subdirectories of /tmp/installdir (the directory -C changes into) that you want packaged up.
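As a rough stand-in for such a script, here is a minimal Ruby sketch that generates and runs the steps above for whichever version you pass in. php_build_commands and run_all are hypothetical names, and the download URL, the prerequisite package list and the configure flags are placeholders to adapt; in particular, the fpm step packages etc and usr, so your configure flags need to install files under both.

```ruby
# Build the shell commands for steps 1-7. URL, packages and configure
# flags are placeholders (assumptions), not the original script's values.
def php_build_commands(version, destdir: "/tmp/installdir")
  src = "php-#{version}"
  [
    "apt-get install -y build-essential libmcrypt-dev libt1-dev",
    "wget https://www.php.net/distributions/#{src}.tar.gz",
    "tar xzf #{src}.tar.gz",
    "cd #{src} && ./configure --prefix=/usr",
    "cd #{src} && make",
    # PHP's Makefile honors INSTALL_ROOT rather than DESTDIR.
    "cd #{src} && INSTALL_ROOT=#{destdir} make install",
    "fpm -s dir -t deb -n php -v #{version} -d libmcrypt-dev -d libt1-dev " \
      "-p php-VERSION_ARCH.deb -C #{destdir} etc usr"
  ]
end

# Run each command in order, stopping at the first failure.
def run_all(commands)
  commands.each do |cmd|
    puts "+ #{cmd}"
    system(cmd) or abort("failed: #{cmd}")
  end
end
```

Usage would be run_all(php_build_commands("5.4.10")), run as root since the first step installs packages.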

Script to download, compile and package php

php 5.4.10 debian package for ubuntu 12.04

What I do in DevOps

I take ownership of a company’s infrastructure.
To manage it I write software.
My language of choice is ruby and the framework is chef.
Along with these skills, I also bring expertise in many technologies:
MySQL, Apache, Linux, MongoDB, data centers, cloud providers, etc.
I also pair with engineering to streamline processes such as deployments, metrics and performance, scaling, security and continuous integration and testing.

rackspace: their service blows but they have high uptime

I started using Rackspace managed hosting back in 2005. This was before the “cloud”, so they were all physical hosts.
Since then I’ve used them for a mix of cloud and physical for several different companies.
I have also built data centers from the ground up and managed a lot of services on AWS (Amazon Web Services).
After seven years as a Rackspace customer, I have formed the following opinions.

1. Their service blows
If you think the cloud is “scary” and MySQL gives you the willies, their support staff might seem like wizards, but in reality they are not.
I found out from an inside source that many of their support team are high school graduates who are run through a boot camp.

Instant wizards.

I have had countless dealings with support where I found zero value in the exchange.
This isn’t true of their whole team, of course; at some point you might get the contact information of someone who can actually help you.
When you do, hold on to it.
A great example of this is their “managed MySQL”: what it amounts to is MySQL installed on a supposedly faster file system, plus backups.
Of course it’s “tuned”, which means innodb_buffer_pool_size is set according to the RAM size.
That’s it.
If you really have a problem with MySQL, there is no way the bootcamp kids are going to be of much use to you.

2. They have high uptime
For both physical and cloud hosting, Rackspace has very good uptime.
This is especially true in comparison to EC2, and I’m not just talking about the big EC2 outages; I’m talking about the day-to-day.
It’s pretty common in EC2 to have a server freeze up and need a reboot, or reboot on its own.
Of course, when you move to the cloud this is something you need to plan for, but with Rackspace I can only think of a few times when one of my cloud instances went offline unexpectedly.

simple way to get notified when a cronjob fails

Every place I’ve ever worked had cronjobs running all over the place. Some are simple tasks like clearing out a temp directory. Others end up being a critical piece of the infrastructure that a developer wrote without telling anyone. I like to call this type of scheduled job the glue, as it’s usually holding your company together.

True story: I once found a cronjob running on a cluster of 200 servers named that restarted an app every 30 seconds!!

In most cases nobody knows where the “glue” cronjob runs, how often it runs and, most importantly, when it fails. There are a few tools out there that put all of your scheduled jobs in one spot and take action on failure. Some of those include opswise, which I’ve used in the past and had a lot of success with, and Amazon’s Simple Workflow Service, which I haven’t used yet.

There is also an open source project sponsored by Yelp called Tron, which does most of this already except for notifying you when a job fails. BTW, there is already a feature request for this.

Anyway, as a quick workaround I add a check for the exit code in my crontab, which alerts me if the job doesn’t exit with zero.


# you@example.com is a placeholder: replace it with your own address
1 0 * * * touch /home/dodell/foobar || { mail -s 'touch_file failed' you@example.com < /etc/hostname; exit 1; }
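The same exit-code check can be wrapped up once and reused for any job. Here is a minimal Ruby sketch (run_or_alert and its notifier: argument are hypothetical names) that runs a command and, if it exits non-zero, hands the command and exit code to whatever notifier you pass in, for example one that sends mail.

```ruby
# Run a command; on failure, call notifier with the command string and
# its exit code. Returns 0 on success and 1 on failure, suitable for `exit`.
def run_or_alert(*cmd, notifier:)
  ok = system(*cmd)
  unless ok
    # system returns nil if the command could not be run at all;
    # exitstatus is nil if the process was killed by a signal.
    code = $? ? $?.exitstatus : nil
    notifier.call(cmd.join(" "), code)
  end
  ok ? 0 : 1
end
```

A cron entry could then call a small Ruby script that does exit run_or_alert("touch", "/home/dodell/foobar", notifier: mailer), where mailer is whatever alerting you prefer.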

add timestamps to your standard out and standard error

A lot of the time, when executing a cronjob or a long-running command, I capture standard out and standard error to a log file. This works okay, but without timestamps it isn’t really useful, especially for a job that runs many times a day, since it’s difficult to tell which lines in the log belong to which run. What I do now is copy a script to all my systems (using chef, of course) that annotates any output I pipe to it. A command line example:

dodell@spork/etc$ cat resolv.conf | /usr/local/bin/
Thu Sep  6 14:39:59 PDT 2012: # Automatically generated, do not edit
Thu Sep  6 14:39:59 PDT 2012: nameserver
Thu Sep  6 14:39:59 PDT 2012: nameserver

Okay, not a super useful example, but you get my point. This is even more useful when added to a cronjob:

1 0 * * * /usr/local/bin/ backup 2>&1| /usr/local/bin/  >> /var/log/mysql/xtrabackup.log

and the output:

Thu Sep  6 00:01:02 PDT 2012:
Thu Sep  6 00:01:02 PDT 2012: InnoDB Backup Utility v1.5.1-xtrabackup; Copyright 2003, 2009 Innobase Oy
Thu Sep  6 00:01:02 PDT 2012: and Percona Inc 2009-2012.  All Rights Reserved.
Thu Sep  6 00:01:02 PDT 2012:
Thu Sep  6 00:01:02 PDT 2012: This software is published under
Thu Sep  6 00:01:02 PDT 2012: the GNU GENERAL PUBLIC LICENSE Version 2, June 1991.
Thu Sep  6 00:01:02 PDT 2012:
Thu Sep  6 00:01:02 PDT 2012: 120906 00:01:02  innobackupex: Starting mysql with options:  --password=xxxxxxxx --user='debian-sys-maint' --unbuffered --
Thu Sep  6 00:01:02 PDT 2012: 120906 00:01:02  innobackupex: Connected to database with mysql child process (pid=19867)
Thu Sep  6 00:01:08 PDT 2012: 120906 00:01:08  innobackupex: Connection to database server closed
Thu Sep  6 00:01:08 PDT 2012: IMPORTANT: Please check that the backup run completes successfully.
Thu Sep  6 00:01:08 PDT 2012: At the end of a successful backup run innobackupex
Thu Sep  6 00:01:08 PDT 2012: prints "completed OK!".

Ah, how beautiful: standard out and error with timestamps… magic.

The code:

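#!/bin/bash
# Prefix every line read from stdin with the current date and time.
# IFS= and -r keep leading whitespace and backslashes intact; the -n test
# also handles a final line that has no trailing newline.
while IFS= read -r line || [ -n "$line" ]
do
  echo "$(date): ${line}"
done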
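If you’d rather keep the filter in Ruby, here is a rough equivalent of the bash script. annotate and its clock: parameter are hypothetical names; the injectable clock only exists to make it testable. It flushes after every line so the log stays current while a long job is still running.

```ruby
# Prefix each input line with a timestamp formatted like `date`'s
# default output, e.g. "Thu Sep  6 14:39:59 PDT 2012".
def annotate(input, output, clock: -> { Time.now })
  input.each_line do |line|
    stamp = clock.call.strftime("%a %b %e %H:%M:%S %Z %Y")
    output.puts "#{stamp}: #{line.chomp}"
    output.flush # keep the log current even if the pipe stays open
  end
end
```

A script whose body is annotate($stdin, $stdout) can then sit at the end of a pipe exactly like the bash version.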