19 June 2015


Running Chef roles from Capistrano

Chef databags are used to pass information between sessions of Capistrano and Chef. Databags are generated on the Capistrano side and then used by Chef to configure servers and components.

In my new project, Capiche, I use a tandem of Capistrano (or cap) and Chef-solo (chef) to deploy stacks of applications. A stack is a collection of server configuration, the software packages required to run an application, software configuration, and the application code itself.

Capistrano + Chef

Here is a brief outline of a typical deployment in this scheme:

  • locally capistrano creates configuration to be used in Chef;
  • cap copies Chef-solo and configuration to all remote hosts;
  • cap starts chef-solo on remote hosts;
  • after chef has finished, cap starts the standard application deployment.

Databag messenger

There are significant differences in how cap and chef are executed. The first and main difference is that they are invoked on different servers: cap starts the deployment from the local server, while chef processes (possibly many of them) run on remote servers in parallel.

There is no standard way (such as calling a method and passing arguments) to exchange information between cap and chef, so I use chef databags as the medium between them.

Capistrano needs to be able to generate databags from its own configuration in a format that chef cookbooks understand; it should also be able to read and search chef databags.
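
As a rough illustration of the reading and searching side, the sketch below loads every item of a databag from disk and filters the items by role, using only Ruby's standard library. The helper names load_data_bag and items_with_role are made up for this example and are not part of Capistrano or Chef; the data_bags/<bag>/<item>.json layout is assumed.

```ruby
require 'json'

# Hypothetical helper: load all items of one databag into a hash keyed by item id.
# Assumes the layout <data_bag_path>/<bag>/<item>.json.
def load_data_bag(data_bag_path, bag)
  files = Dir.glob(File.join(data_bag_path, bag.to_s, '*.json')).sort
  files.each_with_object({}) do |file, items|
    item = JSON.parse(File.read(file))
    items[item['id']] = item
  end
end

# Hypothetical helper: select the items whose "role" array contains the given role.
def items_with_role(items, role)
  items.values.select { |item| Array(item['role']).include?(role.to_s) }
end
```

With these in place, something like items_with_role(load_data_bag('data_bags', :node), :monitoring) would return the items of all hosts that carry the monitoring role.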

Hosts (AKA nodes) databag

The first and very important task is to pass the server configuration, including roles, from the local Capistrano process to the remote nodes, where chef can use it. The example below shows how this could be implemented.

Suppose you have hosts configuration like the following:

server '',  :app, :web, :db, :admin, hostname: "mysql01", primary: true
server '',  :app, :web,              hostname: "web01"
server '',  :app, :web,              hostname: "web02"
server '', :logger, :dns, :security, :monitoring, no_release: true, hostname: "master"

From this configuration, a Capistrano recipe creates a databag called :node with one item for each capistrano host. An example of such a recipe:

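task :roles do
    find_servers.each do |server|
        File.open("#{dir}/#{server}.json", "w") do |f|
            # Some lines (the write call, id, ipaddress, the || fallback) are missing in the source.
            f.print({ role:    role_names_for_host(server),
                      fqdn:    server.options[:hostname] || server.host, # fallback is an assumption
                      options: server.options }.to_json)
        end
    end
end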
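
To make the mapping from a host entry to a databag item more concrete, here is a minimal standalone sketch that runs without Capistrano. The function node_item and its arguments are hypothetical, and falling back to the address when no hostname is set is an assumption; the id convention (dots replaced by underscores) and the field names follow the ids and items used elsewhere in this post.

```ruby
require 'json'

# Hypothetical helper: build a databag item for one host.
# address - the host's address as written in the server line
# roles   - role names assigned to the host
# options - the remaining server options (hostname, no_release, ...)
def node_item(address, roles, options = {})
  {
    id:        address.gsub('.', '_'),         # e.g. 10.0.1.11 becomes 10_0_1_11
    role:      roles.map(&:to_s),
    fqdn:      options[:hostname] || address,  # fallback is an assumption
    ipaddress: address,
    options:   options
  }
end

item = node_item('10.0.1.11', [:logger, :dns], hostname: 'master', no_release: true)
puts JSON.pretty_generate(item)
```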

The generated databag item for one of the hosts looks like this:

{
    "id"        : "10_0_1_11",
    "role"      : ["logger", "dns", "security", "monitoring"],
    "fqdn"      : "",
    "ipaddress" : "",
    "options"   : {
        "no_release"    : "true",
        "hostname"      : "master"
    }
}

This databag can be used directly in chef recipes through the data_bag_item or search methods, for example:

servers    = search(:node, '*')
monitoring = search(:node, "roles:*monitoring*")

Note: There is one trick, though. The recipe needs to be written in a way that supports search in Chef-solo.

By default, Chef-solo can use data bags only through the data_bag and data_bag_item operations; to use search with solo you need the chef-solo-search extension.

Note 2: In some cases this is not enough. If a cookbook is designed so that search is explicitly disabled in solo mode, the cookbook itself has to be changed. Below is an example of the modifications I had to make to the Munin cookbook so that it works with both Chef-solo and search.

# Original cookbook recipe
if Chef::Config[:solo]
   sysadmins = data_bag('users').map { |user| data_bag_item('users', user) }

# Change to
cant_search = Chef::Config[:solo] &&
if cant_search
   sysadmins = data_bag('users').map { |user| data_bag_item('users', user) }

Applying Chef roles

There are two sides to actually executing a Chef cookbook via capistrano: the cap side and the chef side. On every server we need to execute a command that applies the configuration specific to that server. We could do this sequentially, by passing the :hosts option to the run method, but that would be really slow.

Instead, I use a bit of Capistrano magic and a small Ruby script on the remote server(s).

Capistrano task

The following task starts the remote script run_roles.rb on all servers in parallel, passing it an argument that contains the server name from the cap configuration. The $CAPISTRANO:HOST$ variable is special Capistrano magic: Capistrano internally replaces it with the name of the host the command runs on. (Note: It is not a shell variable; it is substituted by Capistrano before the shell runs.)

task :roles do
    run %Q{ #{try_sudo} #{chef_solo_remote}/run_roles.rb  $CAPISTRANO:HOST$ }
end
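
The effect of the placeholder can be pictured with a few lines of plain Ruby. This is only a sketch of the idea, not Capistrano's actual code; command_for and the host addresses are made up for the example.

```ruby
# Placeholder that gets replaced with the host name before the command is sent.
HOST_PLACEHOLDER = '$CAPISTRANO:HOST$'

# Hypothetical helper: return the command as one particular host would receive it.
def command_for(command, host)
  command.gsub(HOST_PLACEHOLDER, host)
end

template = 'run_roles.rb $CAPISTRANO:HOST$'
%w[10.0.1.11 10.0.1.12].each do |host|
  puts "#{host}: #{command_for(template, host)}"
end
```

Every host receives the same command text except for its own name, which is what lets a single parallel run call behave differently per server.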

On the remote (i.e. chef) side there is a small script that takes the host name as an argument and applies the roles from the node databag.

What happens here is the following:

  • script reads databag with current server configuration;
  • it reads all role files (role.json) that are listed in the databag;
  • combines them into single run_list; and
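  • runs chef-solo using the combined run list.

require 'chef'
require 'json'
require 'pty'

# current_path is not defined in the original listing; the directory of this script is assumed.
current_path = File.expand_path(File.dirname(__FILE__))

Chef::Config[:solo] = true
Chef::Config[:data_bag_path] = "#{current_path}/data_bags"

exit unless ARGV[0]
current_host = ARGV[0].gsub(/\./, '_')

roles = Chef::DataBagItem.load(:node, current_host)["role"]

# Collect the run lists of all role files listed for this host.
run_lists = []
roles.map { |x| "#{current_path}/#{x}.json" }.each do |role|
  run_lists += JSON.parse(File.read(role))["run_list"]
end

# Write the combined run list as JSON attributes for chef-solo.
File.open("#{current_path}/#{current_host}.json", "w") do |f|
  f.print({ run_list: run_lists.compact.uniq }.to_json)
end

cmd = "cd #{current_path} && chef-solo --config solo.rb --json-attributes #{current_host}.json"
begin
  # Run chef-solo in a pseudo-terminal and echo its output as it arrives.
  PTY.spawn(cmd) do |stdout, stdin, pid|
    begin
      stdout.each { |line| print line }
    rescue Errno::EIO
      # The child closed its end of the pty; all output has been read.
    end
  end
rescue PTY::ChildExited
  puts "The child process exited!"
end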
Next steps

To deploy full stacks of applications, Capistrano needs to share all of its configuration with Chef, in addition to the basic host information. In the next installments I will describe how this can be achieved.


