access nested node attributes in a chef recipe

If you use Chef, bookmark this page, as you will need to access nested keys at some point. Chef uses Ohai to build a hash of each host chef-client is installed on. In a recipe this is stored in a hash named node. If you want to access the value for a key, it’s simple.

ip = node['ipaddress']

Also, if you want to determine whether a key exists before you try to access it, you can use attribute?()


BTW, in most cases you will want to make sure the key exists, because if it doesn’t, chef-client will throw an error.

For nested keys it’s a bit more difficult. First of all, you should always make sure the keys exist. This involves using the has_key? method in nested if statements. Then you can just pull the value from the keys. Below is one way to do it in a recipe. In this example I am making sure the keys filesystem > /dev/sda6 > mount exist in the node hash. Once I’m sure the keys exist, I pull out the value.

if node.has_key? "filesystem"
  if node["filesystem"].has_key? "/dev/sda6"
    if node["filesystem"]["/dev/sda6"].has_key? "mount"
      if node['filesystem']['/dev/sda6']['mount'] == '/srv'
        execute "foo" do
          command "touch /tmp/nested_keys_exist!"
          action :run
        end
      end
    end
  end
end
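A shorter way to express the same check in plain Ruby is Hash#dig, which returns nil as soon as any key along the path is missing. The sketch below uses an ordinary Hash standing in for the node object, so it runs outside Chef. Whether node itself responds to dig depends on your Chef version, so treat that part as an assumption and test it before relying on it in a recipe.

```ruby
# A plain Hash standing in for the Chef node object (assumption for illustration).
node = {
  "filesystem" => {
    "/dev/sda6" => { "mount" => "/srv" }
  }
}

# Hash#dig walks the keys in order and returns nil at the first missing one,
# so no nested has_key? checks are needed.
mount = node.dig("filesystem", "/dev/sda6", "mount")
missing = node.dig("filesystem", "/dev/sdb1", "mount")

puts "sda6 is mounted on #{mount}" if mount == "/srv"
puts "sdb1 lookup returned #{missing.inspect}"
```

The trade-off is that dig cannot tell a missing key from a key whose value is nil, which is usually fine for this kind of check.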

riak cluster backup script with compression #riak

This script will create a compressed backup of a Riak cluster and keep the previous day’s copy. I still have to add restoring as an option.

#!/usr/bin/env ruby

# Assumption: the original expression here was lost; Time.now fits the usage below.
t = Time.now
f = t - 86400  # 24 hours earlier
today = t.strftime("%Y-%m-%d")
yesterday = f.strftime("%Y-%m-%d")

# Remove the oldest rotated backup, if there is one.
def delete_old()
  l = Dir.glob('/net/fs11/srv/posterous/nfs/riak/*-old')
  unless l.empty?
    puts "deleting oldest backup"
    File.delete(l[0])
  end
end

# Rename yesterday's compressed backup so it is kept as the "-old" copy.
def rotate_last(yesterday)
  f = "/net/fs11/srv/posterous/nfs/riak/riak_backup-" + yesterday.chomp + ".bz2"
  t = f + "-old"
  if File.exist?(f)
    puts "rotating old backup to old"
    File.rename(f, t)
  end
end

# Dump the whole cluster to a file named after today's date.
def run_backup(today)
  puts "creating backup file"
  dump = "/usr/sbin/riak-admin backup riak@ riak /net/fs11/srv/posterous/nfs/riak/riak_backup-" + today.chomp + " all"
  `#{dump}`
end

# Compress today's dump in place with pbzip2.
def compress(today)
  puts "compressing backup"
  compress = "/usr/bin/pbzip2 /net/fs11/srv/posterous/nfs/riak/riak_backup-" + today.chomp
  `#{compress}`
end

delete_old()
rotate_last(yesterday)
run_backup(today)
compress(today)

How to create custom graphs with Munin

Munin lacks many features that Cacti has, but one thing it’s really good at is creating custom graphs. Basically all you need is a script, written in any language, that prints the values when run and prints the config for the graph when given the config argument. In the example below I am graphing the number of unicorn processes running on a box and the number that are busy.

The values:

./unicorn_inuse
cap.value 21
inuse.value 11

You can see above I am getting 2 values to graph, cap.value is the total number of unicorn processes running and inuse.value is the number that are busy.

The config:

./unicorn_inuse config
graph_title Total Unicorns in use
inuse.type GAUGE
inuse.label Unicorns in use
inuse.draw LINE1
graph_category Unicorn
graph_args --base 1000 -l 0
graph_scale no
cap.label Total Unicorns
cap.draw LINE2
cap.type GAUGE

Not too many details in the config, but graph_category is how you put graphs in a specific bucket in the Munin UI.

The graph:

[graph image]

The code:

#!/usr/bin/env ruby

# Count every unicorn process owned by capuser.
def get_total()
  cmd = 'ps aux| grep capuser | grep unicorn | wc -l'
  output = `#{cmd}`
  num = output.match(/\d+/)
  return num
end

# Count the idle ("chillin") unicorn processes.
def get_chillin()
  cmd = "ps aux| grep capuser | grep unicorn | grep 'chillin'| wc -l"
  output = `#{cmd}`
  num = output.match(/\d+/)
  return num
end

# Print the graph definition munin asks for with the config argument.
def config()
  puts 'graph_title Total Unicorns in use'
  puts 'inuse.type GAUGE'
  puts 'inuse.label Unicorns in use'
  puts 'inuse.draw LINE1'
  puts 'graph_category Unicorn'
  puts 'graph_args --base 1000 -l 0'
  puts 'graph_scale no'
  puts 'cap.label Total Unicorns'
  puts 'cap.draw LINE2'
  puts 'cap.type GAUGE'
end

argu = ARGV[0]
if argu == 'config'
  config()
else
  total = get_total()
  chillin = get_chillin()
  inuse = total[0].to_i - chillin[0].to_i
  puts "cap.value " + total[0].to_s
  puts "inuse.value " + inuse.to_s
end
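If you end up writing several of these plugins, the two output modes can be generated from a small table of fields. The following is only a sketch of that idea: FIELDS, config_lines and value_lines are names I made up for illustration, and the numbers are the sample values from the output shown earlier rather than live measurements.

```ruby
# Hypothetical field table: field name => label shown in the graph legend.
FIELDS = { "cap" => "Total Unicorns", "inuse" => "Unicorns in use" }

# Lines to print when munin calls the plugin with the "config" argument.
def config_lines(title, category, fields)
  lines = ["graph_title #{title}", "graph_category #{category}"]
  fields.each do |name, label|
    lines << "#{name}.label #{label}"
    lines << "#{name}.type GAUGE"
  end
  lines
end

# Lines to print when the plugin is called with no argument.
def value_lines(values)
  values.map { |name, value| "#{name}.value #{value}" }
end

if ARGV[0] == "config"
  puts config_lines("Total Unicorns in use", "Unicorn", FIELDS)
else
  # Sample numbers only; a real plugin would measure these.
  puts value_lines({ "cap" => 21, "inuse" => 11 })
end
```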

3 ways to push data to graylog2

If you are a sysadmin or developer and you haven’t heard of graylog2, then you’re missing out. Graylog2 takes log data (or whatever you want to throw at it), stores it for you and allows you to search it. It does this by using MongoDB as its backend and providing a web interface written in Rails to categorize and search it. In my case it’s very useful. I manage servers in 4 physical locations: Slicehost, Rackspace, Rackspace Cloud and EC2. I needed a way to keep all of the system logs in one place without having to work too hard at it. Graylog2 was my solution.

So far I use 3 different methods to write data to graylog2.

  1. rsyslog over UDP
  2. piping data over netcat
  3. Using the GELF gem which is specific to graylog2

(1) rsyslog over UDP

This is the easiest one by far, and I use it to write system log data. On Ubuntu all I had to do was disable syslog, enable rsyslog and add this one line to /etc/rsyslog.conf


That’s all I had to do. BTW, if you want to send the same data over TCP, do the following instead.


(2) piping data over netcat

This one is also easy to use: just pipe data to netcat, prefixed with a syslog priority and hostname. In the example below I am piping a log file with priority <7> (debug).

#!/bin/sh
tail -F -q /var/log/nginx/accesslog |
while read -r line ; do
  echo "<7> $line" | nc -w 1 -u 514
done

That’s it. Once in graylog2 you can sort/search by hostname, logging level or regex on the data itself.

(3) Using the GELF gem which is specific to graylog2

This method provides the most flexibility in that you are allowed to create custom fields. In the example below I am parsing the access_log before I submit to graylog2 using the GELF gem. This results in custom fields which can be used to categorize and sort, such as method (GET, PUT, etc.), uri, size, referrer, etc.

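#!/usr/bin/ruby
require 'rubygems'
require 'gelf'

# Send one parsed access log entry to graylog2 with custom fields.
def send_gelf(ip,method,uri,code,size,referral)
  line = ip + " " + method + " " + uri + " " + code + " " + size + " " + referral
  # Assumption: the constructor call was garbled in the original. GELF::Notifier.new
  # is the gem's constructor; the graylog2 host is left blank as in the source.
  n = GELF::Notifier.new("", 12201)
  n.notify!(:host => "prod-nginx", :level => 1, :short_message => line, :_ip => ip, :_method => method, :_uri => uri, :_code => code, :_size => size, :_referral => referral)
end

# Split each log line on whitespace and pick out the fields by position.
ARGF.each do |line|
  x = line.split(/\s+/)
  send_gelf(x[0],x[7],x[8],x[10],x[11],x[12])
end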
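Splitting on whitespace ties the script to exact field positions, which breaks as soon as the log format changes. A regular expression with named captures is one alternative. The sketch below assumes nginx’s stock "combined" log format, which is not the format the script above parses (its field positions suggest a custom one), so the pattern would need adjusting. LOG_PATTERN, parse_access_line and the sample line are all made up for illustration.

```ruby
# Assumed input: nginx "combined" log format. Adjust for a custom log_format.
LOG_PATTERN = /\A(?<ip>\S+) \S+ \S+ \[(?<time>[^\]]+)\] "(?<method>\S+) (?<uri>\S+) [^"]*" (?<code>\d{3}) (?<size>\d+|-) "(?<referrer>[^"]*)"/

# Returns a hash of GELF-style custom fields, or nil if the line does not match.
def parse_access_line(line)
  m = LOG_PATTERN.match(line)
  return nil unless m
  {
    :_ip => m[:ip], :_method => m[:method], :_uri => m[:uri],
    :_code => m[:code], :_size => m[:size], :_referral => m[:referrer]
  }
end

# Made-up sample line, not taken from a real log.
sample = '203.0.113.7 - - [02/Aug/2011:03:30:00 -0700] "GET /posts/1 HTTP/1.1" 200 512 "http://example.com/" "curl/7.21"'
p parse_access_line(sample)
```

The returned hash could be merged into the options passed to notify!, and lines that return nil can be skipped or logged instead of raising on a nil field.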

review: NFL Sunday ticket on PS3

I’m a huge Patriots fan but I live in California.

I have Comcast in my home for cable and internet. I hear a lot of complaints about Comcast but I’ve never had a poor experience and didn’t want to switch to DirecTV just because of the Patriots.

Luckily, with the Sunday Ticket being available on PS3 (a few days before the season began), I didn’t have to switch.

The good:

  • Installation and setup were really easy: I just had to download the app, which was really small, and add the money to my PlayStation wallet.
  • Every game is available except for local games, and there are also inline stats while the game is on.
  • Includes NFL RedZone, which will show all touchdowns live for all games.
  • Includes the ability to pause, rewind and fast-forward the game, almost like a DVR. I say almost because you can’t record the game.
  • Image quality is pretty good: digital quality at the minimum, which would throttle up to HD after a few minutes.
  • I get to watch every Patriots game!!

The bad:

  • Cost. Being a father of 2 kids with a mortgage, it was a really tough decision to spend $340 to watch what amounts to 16 games. I decided not to get it until my wife reminded me that as a father I do very little for myself and to get it. I did.
  • Lack of pregame and postgame coverage. I didn’t think I would miss this, but the broadcast only runs from kickoff to the end of the game; other than that it’s not available. It would be nice to at least get 15 minutes of pre/post game coverage.
  • Can’t record games. If I’m paying for the content it shouldn’t matter when I watch it, so it would be nice to have the ability to record.
  • PlayStation Network. Given all the security breaches to the network, I felt really uncomfortable putting in my credit card data.


The biggest problem is the cost; however, there is no way around it, since even if you have DirecTV the cost is the same. This is just a convenient workaround for not having to switch your provider. If you really don’t want to miss games and can afford it, this is a pretty good solution which isn’t perfect but is less expensive than going to a local sports bar at 10am on a Sunday morning.



How to scale varnish horizontally with haproxy

We currently rely on varnish to serve up our posts and other pages which are largely static. In fact, 40% of requests to our site never hit our web servers as they are served out of varnish’s cache first. For redundancy and also to scale varnish we run two instances (soon to be 3). We initially used varnish’s hashing algorithm based on uri; this worked fine and specific pages were only stored on one varnish. The problem we ran into was that when we had to purge a page we ended up sending the command to both varnishes. This causes several problems. One is that it simply isn’t scalable: imagine sending the same purge command to 10+ instances. Another problem was that the size of the purge.list was twice what it should be. If you manage varnish you know that when the purge list gets too big varnish stops working.

What we decided to do was to direct requests based on the first character of the host name. This works for us because each user has their own subdomain name. Now not only do all pages for a single user exist on one varnish instance, but we can accurately direct the purge request too. One small note: if you are going to do this based on uri instead of hostname, you will need to edit the regular expression to use the second character, as the first will always be a forward slash. In our config below you can see that all hostnames starting with 0-9 & a-j live on web11 and everything else lives on web12. In the case of requests which are not cacheable, we round-robin between the two webs. The haproxy config for this is below:

backend posterous_http_web11
    mode http
    server web11 web11:80 check

backend posterous_http_web12
    mode http
    server web12 web12:80 check

backend posterous_http_all
    mode http
    server web11 web11:80 check
    server web12 web12:80 check

frontend posterous_http
    acl posterous_com_root hdr_beg(host) -i posterous.com
    acl a-q_hostnames hdr_beg(host) -i 0 1 2 3 4 5 6 7 8 9 a b c d e f g h i j
    use_backend posterous_http_all if posterous_com_root
    use_backend posterous_http_web11 if a-q_hostnames
    default_backend posterous_http_web12
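Whatever sends the purge has to apply the same rule as haproxy, otherwise the purge lands on the wrong varnish. Here is a rough Ruby sketch of that rule. The backend names mirror the config above, but varnish_backend_for and WEB11_PREFIXES are hypothetical helpers and the hostnames are invented.

```ruby
# First characters routed to web11, mirroring the a-q_hostnames acl (0-9 and a-j).
WEB11_PREFIXES = ("0".."9").to_a + ("a".."j").to_a

# Hypothetical helper: returns the haproxy backend a hostname would be sent to.
def varnish_backend_for(hostname)
  host = hostname.downcase  # the acls use -i, so matching is case-insensitive
  return "posterous_http_all" if host.start_with?("posterous.com")
  WEB11_PREFIXES.include?(host[0]) ? "posterous_http_web11" : "posterous_http_web12"
end

puts varnish_backend_for("alice.posterous.com")
puts varnish_backend_for("zed.posterous.com")
puts varnish_backend_for("posterous.com")
```

Keeping the prefix list in one place matters when a third instance is added, since the split has to change in haproxy and in the purge code at the same time.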

How to capture all queries on a very busy MySQL server without adding further strain.

We recently had capacity problems where too large a percentage of read queries were going to our master MySQL server instead of the read-only slaves. What I needed to do was capture the queries on the very busy server without putting more load on its disks. I started at first using tcpdump to capture the inbound queries.

sudo tcpdump -s0 -A dst port 3306 and src host app11| strings | grep SELECT| sed 's/^.*SELECT/SELECT/'

This worked really well, but I needed to run it for an hour or so to get a decent sample size and couldn’t use the local disks because they were already at capacity. What I ended up doing was piping the output from tcpdump through ssh.

ssh $TO_HOST cat -  ">" $OUT_FILE

The whole process looks like this.

sudo tcpdump -s0 -A dst port 3306 and src host app11| strings | grep SELECT| sed 's/^.*SELECT/SELECT/' | ssh log11 cat -  ">" db11m_query_log.02AUG2011_3:30-4:00

This allowed me to capture a ~100MB file without consuming more IO resources on the local disk.

create a simple graph with MySQL and munin

Munin lacks many features that other graphing suites have, but when it comes to creating custom graphs it excels. To create a custom graph all you need is a script, written in any language, that outputs the value you are trying to graph and returns its configuration when you supply the config argument. In this example I’ll create a graph from a MySQL query. The query returns the total number of delayed jobs.

Run the script with no argument to get the value:

$ ./delayed_jobs_total
totaljobs.value 228964

Run the script with the config argument:

$ ./delayed_jobs_total config
graph_title Total Delayed Jobs
totaljobs.type GAUGE
totaljobs.label TotalJobs
graph_category delayed_jobs
graph_args --base 1000 -l 0
graph_scale no

The code:

#!/usr/local/bin/ruby
require 'rubygems'
require 'mysql'

hostname = ''
username = 'REMOVED'
password = 'REMOVED'
databasename = 'delayed_job'

# Assumption: the constructor name was garbled in the original; Mysql.new is
# the mysql gem's connect call.
my = Mysql.new(hostname, username, password, databasename)

# Return the single row holding the delayed_jobs count.
def total_count(my)
  rs = my.query('select count(*) from delayed_jobs')
  row = rs.fetch_row
  return row
end

# Print the graph definition munin asks for with the config argument.
def config()
  puts 'graph_title Total Delayed Jobs'
  puts 'totaljobs.type GAUGE'
  puts 'totaljobs.label TotalJobs'
  puts 'graph_category delayed_jobs'
  puts 'graph_args --base 1000 -l 0'
  puts 'graph_scale no'
end

argu = ARGV[0]
if argu == 'config'
  config()
else
  total = total_count(my)
  puts "totaljobs.value " + total[0].to_s
end

my.close

Managing different environments with chef attributes and a case statement.

Like everyone else using Chef, I manage several environments such as dev, staging, testing & production. Many times the only difference is a single config file (for me that is most daemons such as mongo, mysql, redis, varnish, etc.). There are several ways of handling this. One way is to use templates (which I find to be confusing and more than what I need). Another way is to create a different recipe for each environment. This seems like too much work. What I found to be the simplest and cleanest way is to use the case statement with attributes. Don’t forget the creators of Chef (Opscode) allow you to exit the DSL and just write Ruby, so take advantage of it.
Let’s say you have a recipe to install and configure mongo. The entire recipe is the same except for the config file. This can be easily managed using attributes and a case statement. First create an empty role for each of the different environments (dev, prod, etc.), then assign them the same attribute, in this case app_environment, with a different value for each environment. In JSON this looks like this:


Then in the recipe use the case statement to determine which config file the node gets:

cookbook_file "/etc/mysql/my.cnf" do
  source_file = case node[:app_environment]
                when "dev" then ""
                when "prod" then ""
                end
  source source_file
  notifies :restart, "service[mysql]", :immediately
  mode "0644"
  owner "root"
end
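One thing to watch with a case statement like this: if a node carries an app_environment value that has no matching when branch, the case returns nil and source ends up empty. A small plain-Ruby sketch of the same selection with an else branch that fails loudly is below. The method name config_source_for and the file names are placeholders I picked for illustration, not the ones from my cookbook.

```ruby
# Hypothetical helper: map an environment name to a cookbook file name.
# The file names are placeholders for illustration only.
def config_source_for(app_environment)
  case app_environment
  when "dev"  then "my.cnf.dev"
  when "prod" then "my.cnf.prod"
  else
    # Fail the run instead of silently passing nil to the resource.
    raise ArgumentError, "unknown app_environment: #{app_environment.inspect}"
  end
end

puts config_source_for("dev")
puts config_source_for("prod")
```

Inside a recipe the same idea would be an else branch in the case above, so a typo in a role shows up as a failed chef-client run rather than a missing config file.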

mac tip, bash alias to lock your screen

I spend most of my time in the terminal, and when I walk away from my laptop I like to lock the screen. Just add this line to the .bash_profile in your home directory.

alias lock='/System/Library/CoreServices/"Menu Extras"/'

Also make sure you source the file before you run the lock command. Then, to lock the screen, just run the lock command.