Using chef to automate internal hostnames in a VPC

 

Typically cloud servers have both an internal and an external IP address.

But with VPCs (virtual private clouds), many companies are deciding to have a few hosts proxy ports 22, 80 & 443 while the other hosts have internal IPs only.

In both scenarios having DNS for internal hostnames is important, but in the case of a VPC it's critical.

One way of doing this is to simply create a DNS entry for each interface.

ie.


hostname.foo.com

hostname.internal.foo.com

But it doesn't make sense to have the name/IP of an internal server available on a public DNS server.

What you can do instead is have Chef set the hostnames, populate those names/IPs in the /etc/hosts file, and convince your OS to use that file when performing lookups.

 Setting the hostname.

I usually go with purpose.abbreviated_company_name

rails01.gyk

This is done in Chef with a recipe like this:

execute "set_hostname" do
 command "/bin/hostname -F /etc/hostname"
 user "root"
 action :nothing
end

template /etc/hostname do
 source "hostname.erb"
 notifies :run, execute[set_hostname]
end

And a template like this:

<%= node['hostname']%>.gyk

 

OK, now we have Chef setting the correct hostname, but hold on: after a while, or after a reboot (on EC2 at least), the name goes away.

This is because of this obscure setting in /etc/cloud/cloud.cfg:

# This will cause the set+update hostname module to not operate (if true)
preserve_hostname: false
If this is set to true then it will not wipe out the hostname you set.

I fixed this with my old friend sed.

execute "preserve_hostname" do
  command "sed -i  's/preserve_hostname\:\ false/preserve_hostname\:\ true/' /etc/cloud/cloud.cfg"
  not_if "grep 'preserve_hostname: true' /etc/cloud/cloud.cfg"
  user "root"
end

Okay so now we have a hostname that sticks, next step…

Populating /etc/hosts with all internal hosts.

To do this I used the hostsfile cookbook (https://github.com/customink-webops/hostsfile).

Basically I search Chef for all hosts in my environment and write each hostname and IP to the file.

# find every node registered in the same environment and add a hosts entry for it
addresses = search(:node, "chef_environment:#{node.chef_environment}")
addresses.each do |dd|
  if dd.has_key?("ipaddress")
    hostsfile_entry dd.ipaddress do
      hostname  dd.hostname + ".gyk"
      action    :create
      unique    true
    end
  end
end

 

Get the OS to use the /etc/hosts file for lookups.

 

Back in the day, all I had to do to get /etc/hosts to work was set this in nsswitch.conf:

hosts:          files dns

At this point, without the following step, pings and telnet requests to the hosts will work but lookups (nslookup, host, dig, etc.) won't.

Yeah, super confusing. I won't go into the science of it, but the basic reason is that those commands use different libraries when performing lookups.

However, after reading and changing a lot of files without any success, I fixed the problem by installing the dnsmasq package.

Not sure what else this package does, but the part that matters here is that dnsmasq answers DNS queries from the contents of /etc/hosts, so even the tools that bypass nsswitch can resolve those names.
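If you want Chef to handle that step too, a minimal sketch (just the stock package/service resources; your cookbook may need more) would be:

package "dnsmasq" do
  action :install
end

# make sure the resolver daemon is running and comes back after a reboot
service "dnsmasq" do
  action [:enable, :start]
end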

Now you have a fully automated population of internal hostnames…no DNS entries needed!

centralized custom logging with rsyslog

There are a lot more robust centralized logging solutions out there but during a recent hack day I had about an hour to get logs from 24 servers to a centralized one for processing.

First you will need to get centralized logging set up. I won't go into that here as that step is well documented, but the quick version is:

1. Configure the server to listen on UDP (see the listener sketch just after this list).

2. Add this line to the bottom of each client's rsyslog.conf file:

*.* @$IP_ADDRESS:514
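For reference, the server side of step 1 is typically just two legacy directives in the server's rsyslog.conf (a sketch; adjust for your rsyslog version):

# load the UDP input module and listen on port 514
$ModLoad imudp
$UDPServerRun 514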

Okay, so your servers are sending their system logs to a centralized host; now the fun part.

First you will need to create a config file for the log and place it in the /etc/rsyslog.d directory.

In my case I am going to ship a log named event.log:

# load the file-input module, then tail the application log and forward it
$ModLoad imfile
$InputFileName /srv/whi/shared/log/event.log
$InputFileTag eventlog
$InputFileStateFile eventlog
$InputRunFileMonitor
$InputFilePersistStateInterval 10
*.* @$IP_ADDRESS:514

Now add a config on the server to write this log to its own file:

$template ProxiesTemplate,"%msg%\n"
if $programname == 'eventlog' and $msg contains 'viewed.entr'  then /var/log/eventlog.log;ProxiesTemplate

In the first line above I am stripping the timestamp and the hostname from the log; I am only interested in the body.
In the second line I am matching the program name and a particular string in the body of the message and writing those messages to a specific log file.

The last thing you need to do is put an exception into the existing messages and syslog rules, otherwise these custom logs will also end up there.

On Ubuntu I had to edit the file /etc/rsyslog.d/50-default.conf.

I basically had to add the string "event.none" to the lines for syslog and messages:

*.*;auth,authpriv.none,event.none    -/var/log/syslog

*.=info;*.=notice;*.=warn;auth,authpriv.none;cron,daemon.none;mail,news.none,event.none    -/var/log/messages

That's basically it, enjoy.

if you attempt to run sudo pkill you might go insane

<rant>
Lately sidekiq has been leaving a lot of processes around in the stopping state.
My coworker asked me if there was a command to kill them all.
Of course with Linux there are a lot of tools for this.
In this case, however, we had to kill the processes based on their full command line.
i.e. instead of killall ruby we had to match an argument to the process.
example:

sidekiq 2.17.3 whi [0 of 6 busy] stopping

The key was matching “stopping”, which would leave the other sidekiq processes running.

I used:

pkill -f stopping

and this worked perfectly the first time.
I go to another box and it won’t work.
The command sudo pkill -f stopping does nothing: no error, but the processes don't die.
I upgrade the package, read the man page, search the internet.
Still nothing.
Am I going insane? Did I forget everything I know about linux?

Then I become root and run the command…….it works.
The difference was sudo vs. being root.
Of course when you sudo the command there is no warning and the man page doesn’t contain the word sudo.

So word to the wise, when in doubt become root!
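In other words, the sequence that actually worked looks like this (using sudo -i to get a proper root shell; sudo su - works just as well):

sudo -i              # become root
pkill -f stopping    # now the matching processes actually die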

</rant>

custom ohai plugins for bonded interfaces

We use bonded interfaces on our production hardware.
But only on production hardware, staging and dev just use the ethX interfaces.
So we needed a way for chef to identify the public & private interfaces regardless of whether they are bonded or not.

To start with I pulled down the ohai cookbook and added a few scripts to the plugins directory.
That's all there is to it.
The two plugins below identify the private and public interfaces as either eth0 || bond0 and eth1 || bond1 respectively.

private_interface.rb

 provides "private_interface"
cmd = '/sbin/ifconfig bond0'
system(cmd)
if $? == 0
  private_interface "bond0"
else
  private_interface "eth0"
end

public_interface.rb

provides "public_interface"
cmd = '/sbin/ifconfig bond1'
system(cmd)
if $? == 0
  public_interface "bond1"
else
  public_interface "eth1"
end

From the Chef UI you can see that public_interface and private_interface are now listed at the top level for a node.
[screenshot]

This allows me to refer to the public or private interface in a template or recipe, and Ohai automatically discovers what the interface actually is.
Example from a recipe for ufw:

firewall_rule "http-internal" do
        port 8098
        action :allow
        interface node['private_interface']
        notifies :enable, "firewall[ufw]"
end
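You can lean on the same attribute inside a template as well; here is a hypothetical ERB snippet (the bind directive and variable names are just illustrations) that pulls the IPv4 address Ohai recorded for whichever interface the plugin picked:

<%# look up the interface chosen by the ohai plugin %>
<% iface = node['network']['interfaces'][node['private_interface']] %>
<%# Ohai keys the addresses hash by address; grab the first IPv4 entry %>
<% ipv4 = iface['addresses'].find { |_addr, info| info['family'] == 'inet' }.first %>
bind <%= ipv4 %>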

automated MySQL query reports

Back in the day I had automated query reports for MySQL using a Perl library.
This worked okay, but it only reported on the slow queries and I had to install a bunch of icky Perl stuff.

Percona’s pt-query-digest is a much better tool and when you combine it with tcpdump you get an analysis of all your queries not just the slow ones.

When writing this script I had to solve two problems.

1. Run tcpdump for a specific amount of time.

I was prepared to write a loop with a sleep statement and then figure out how to kill tcpdump but I didn’t need to.
Instead I just used timeout, which was already installed on Ubuntu.

2. How to email the resulting report as an attachment.

When sending emails I usually just use mail, but I couldn't figure out how to send an attachment.
Instead I found mutt.

BTW, for an extra challenge I decided to write this in bash; for the record, loops in bash are really ugly.

The script:

#!/bin/bash
hn=`hostname`
# mutt won't send mail from the command line without prompting you
# for the body's content. To work around this I use an empty file as the body.
touch /tmp/blank
queries=( SELECT INSERT UPDATE )
for i in "${queries[@]}"
do
# To clean up the tcpdump output you have to include the pipe to sed
        /usr/bin/timeout 180 /usr/sbin/tcpdump -s0 -A -i bond0 dst port 3306 | /usr/bin/strings | /bin/grep "$i" | /bin/sed "s/^.*$i/$i/" > /tmp/$i.log
        /usr/bin/pt-query-digest --type rawlog /tmp/$i.log > /tmp/$i.txt
        /usr/bin/mutt -s "$i report from $hn" -a /tmp/$i.txt -- xxxx@weheartit.com < /tmp/blank
done
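To make the reports truly automatic you can schedule the script with cron; a hypothetical entry (the path and schedule are placeholders):

# run the query report script every morning at 06:00
0 6 * * * /usr/local/bin/mysql_query_report.sh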

Speed up your backups by compressing and posting in parallel

It's pretty typical to have data stores that are several hundred GBs in size and need to be posted offsite.

At weheartit our database is ~1/2 TB uncompressed and the old method of compressing and posting to S3 took 9 hours and rarely completed.

I was able to speed up this process and now it completes in < 1 hour.
53 minutes in fact.

For compression I used pbzip2 instead of gzip.

This is how I am using it along with percona’s xtrabackup.

innobackupex --user root --password $PASSWORD --slave-info --safe-slave-backup --stream=tar ./ | pbzip2 -f -p40 -c  > $BACKUPDIR/$FILENAME

The backup and compression take only 32 minutes and shrink the data from 432GB to 180GB.

Next comes speeding up the transfer to S3.

In November of 2010 Amazon added multipart upload to S3, but for some reason this functionality hasn't been added to s3cmd.
Instead I am using the s3-mp-upload.py multipart upload script.
Thanks David Arther!

This is how I am using it.

/usr/local/bin/s3-mp-upload.py --num-processes 40 -s 250 $BACKUPDIR/$FILENAME $BUCKET/$(hostname)/$TIME/$FILENAME

It only takes 20 minutes to copy 180GB over the internet!
That is crazy fast.
In both cases you can play around with the number of threads for both pbzip2 and s3 multipart upload; the thread counts above work for me, but the right values depend on the size of your system.
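For context, here is a minimal sketch of how the two steps fit together in one script; the variable values are placeholders, not our production settings:

#!/bin/bash
# placeholder settings -- substitute your own credentials, paths and bucket
PASSWORD="secret"
BACKUPDIR="/srv/backups"
BUCKET="s3://my-backup-bucket"
TIME=$(date +%Y%m%d%H%M)
FILENAME="backup-$TIME.tar.bz2"

# stream the backup straight through pbzip2 (40 compression threads)
innobackupex --user root --password $PASSWORD --slave-info --safe-slave-backup \
  --stream=tar ./ | pbzip2 -f -p40 -c > $BACKUPDIR/$FILENAME

# upload in parallel (40 processes; -s sets the part size)
/usr/local/bin/s3-mp-upload.py --num-processes 40 -s 250 \
  $BACKUPDIR/$FILENAME $BUCKET/$(hostname)/$TIME/$FILENAME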

mysql multi threaded slaves (mts) slower than single threaded

I work @ weheartit.com where we rely on MySQL.
I’ve seen very little published about mts and nothing from outside a lab so I decided to test it out.
The results weren’t good.

Our main database group has 4 active schemas and is running 5.6.12, and when a slave gets out of sync it takes a while to catch back up to the master.

One of the most interesting features for MySQL 5.6 is multi threaded slaves.

Without this feature the sync speed is limited to a single thread running on a single core.

Before I start, let me clear up one point about mts: this feature will only help if you are running more than one schema per host, as each thread can only process one schema at a time.

That being said, I went and upgraded one of my slaves to 5.6.12 and restored an xtrabackup to it.

Then I added the following lines to the my.cnf and ran start slave.


binlog-format = STATEMENT
slave_parallel_workers = 4
master_info_repository = TABLE
relay_log_info_repository = TABLE

Now I can just run show slave status\G and watch it catching up.

However, once it was caught up I stopped replication on both the mts host and a single-threaded slave for 20 minutes.
Then I started the slaves, and it turns out that the single-threaded slave caught up faster.
What?
To eliminate disk and RAID config differences (they were the same as far as I could tell), the next time I only stopped the sql_thread for 20 minutes.
Same result: the slave running mts is actually slower.
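For reference, the second test was just the standard replication commands, run the same way on each slave:

-- pause only the applier; the IO thread keeps pulling binlogs from the master
STOP SLAVE SQL_THREAD;
-- wait 20 minutes, then resume and watch Seconds_Behind_Master fall
START SLAVE SQL_THREAD;
SHOW SLAVE STATUS\G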

It looks like the reason the only posts you find on this topic are from people using it in a lab is that, although it appears to function, it doesn't deliver the thing you ultimately need: faster replication syncs.

I’ll keep watching the mysql releases and hope this gets fixed soon.