Saturday, February 11, 2012

Configuration Management with Rails

DevOps is apparently the hot new thing, as it should be. The ideal world where sysadmins can develop (even test-drive) the configuration management of every machine under their responsibility is here, now. Combine that with the power of cloud computing, and the world is moving toward the next age of computing. The true Cloud Age.

I am a Rails guy. It will stay that way for at least a few years to come. I have been deploying different Rails applications on different projects. They were all automated, each with a different mindset, goal, and methodology. Recently I had the chance to take complete control of deploying new code to a new server for a startup company. The experience was priceless, and it gave me a giant leap in understanding why DevOps is such an important step we have to take to achieve continuous delivery.

Let's look at our case study. The requirement is that our application be deployed on an Ubuntu 11.04 64-bit server. As with a few other projects, there are other apps hosted on it as well. There is only one server in this startup company, and we are not about to go spend the little money we have on another box. The server already has Apache2, MySQL, and whatever else a PHP site needs to run. What I need to do is configure it so that Apache2 proxies requests for a subdomain to Rails, with Unicorn as the Rails server.

There are a few tools available. There's Puppet. There's Chef. There's our plain old shell script. My past experience with Puppet is not bad, but could have been better. Remembering a separate DSL feels a little unnecessary when I already know Ruby and Chef uses Ruby as its DSL. So Chef is a reasonable tool.

Learning Chef is a pain. At a glance, the documentation is really detailed about how things are organized and what goes where. Dig a little further, and you'll see that the Opscode people assume a whole lot of what is in their heads is also in yours. A simple question like where and how attributes are supposed to be defined in the node definition JSON file is not easy to answer. An even simpler question, like how do I start, is also quite hard to answer. Especially when you have decided not to pay for a hosted Chef solution.

Enough ranting, back to the point. Since I don't have too large an infrastructure to take care of, and I cannot afford a separate server to run a Chef server on, I chose to deploy the whole thing with chef-solo. I package the whole Chef repository, ship it to the target machine, and run it standalone.
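For what it's worth, here is a minimal sketch of what a chef-solo node definition might look like, generated from plain Ruby. The attribute names, values, and recipe names are all made up for illustration; real cookbooks define their own. The only structural points I am relying on are that the run list lives under the "run_list" key and that the other top-level keys are treated as node attributes.

```ruby
require "json"

# Hypothetical attributes and recipe names, for illustration only.
node = {
  "unicorn"   => { "user" => "deploy", "group" => "deploy" },
  "rails_app" => { "name" => "myapp", "server_name" => "app.example.com" },
  # Entries are processed in the order listed.
  "run_list"  => ["recipe[bundler]", "recipe[unicorn]", "recipe[rails_app]"]
}

# This string is what would be saved as node.json on the target machine.
node_json = JSON.pretty_generate(node)
puts node_json
```

With that saved as node.json, the run on the target machine would be something like `chef-solo -c solo.rb -j node.json`, where solo.rb (also an assumed file name) tells chef-solo where the cookbooks are.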

With Chef, you can abstract a server box as a node. You have a whole lot of recipes to choose from, and you add the ones you want to the node's run list. When run, Chef looks at all the attributes you specify: for example, what user and group Unicorn should run as, what the name of your Rails application is, what the virtual host name of the Rails application is, and so on. Then it reads all the recipes in the Chef repository, defining services and other facilities along the way. Then it looks at the node's run list and figures out which recipes to run. The recipes themselves are the code that installs packages and puts the right files in the right places with the right owner, group, and permissions. They can generate files dynamically from erb templates, too. Upon completion, you have a server with all the packages installed and ready to go. No matter how many times you run the script on any number of different boxes, the end result is the same. People with little experience in configuration management will not believe how hard it is to achieve this idempotence by hand.
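To make the erb and idempotence ideas concrete, here is a toy sketch in plain Ruby. This is not how Chef implements its template resource; it only mimics the behavior described above. The attribute names, the port, and the converge_file helper are my own inventions, and the virtual host body assumes Apache's mod_proxy is enabled.

```ruby
require "erb"
require "tmpdir"

# Hypothetical attributes, standing in for what a node definition supplies.
ATTRS = { server_name: "app.example.com", unicorn_port: 8080 }

# An erb template for an Apache virtual host that proxies to Unicorn.
VHOST_TEMPLATE = <<~TEMPLATE
  <VirtualHost *:80>
    ServerName <%= attrs[:server_name] %>
    ProxyPass / http://127.0.0.1:<%= attrs[:unicorn_port] %>/
    ProxyPassReverse / http://127.0.0.1:<%= attrs[:unicorn_port] %>/
  </VirtualHost>
TEMPLATE

# Render the template; `attrs` is visible to erb through the binding.
def render_vhost(attrs)
  ERB.new(VHOST_TEMPLATE).result(binding)
end

# Write the file only when its content differs from what we want.
# Returns true if the file was changed, false if it was already correct.
def converge_file(path, content)
  return false if File.exist?(path) && File.read(path) == content
  File.write(path, content)
  true
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "rails_app.conf")
  first  = converge_file(path, render_vhost(ATTRS))
  second = converge_file(path, render_vhost(ATTRS))
  puts "first run changed: #{first}, second run changed: #{second}"
end
```

The first run writes the file and the second run finds nothing to do. That "describe the end state, change only what differs" approach is what lets you run the same thing over and over and land in the same place.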

When enough nodes are defined, you will begin to see a pattern. Run list entries cluster into groups, and if you need a certain feature, the whole group of recipes needs to be included. Chef allows you to define such a group of run list entries, along with other attributes, as a role. So what you can do is define roles such as database_master, database_slave, httpd, rails, etc. Any time you want to define a new node, you can pick out the roles you need without having to specify every single recipe all the time. Don't-Repeat-Yourself.
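Here is a toy model of the role idea in plain Ruby, just to show the expansion. It is not Chef's actual implementation, and which recipes belong to which role is my own assumption. The `role[...]` and `recipe[...]` entries follow the run list notation.

```ruby
# Hypothetical role definitions: each role is a named group of run list entries.
ROLES = {
  "httpd"           => ["recipe[apache2]"],
  "database_master" => ["recipe[mysql]"],
  "rails"           => ["recipe[bundler]", "recipe[bluepill]", "recipe[unicorn]"]
}

# Replace every role[...] entry with the entries it stands for, recursively,
# so that roles may include other roles. Duplicates are dropped.
def expand_run_list(run_list, roles = ROLES)
  run_list.flat_map do |item|
    if (m = item.match(/\Arole\[(.+)\]\z/))
      expand_run_list(roles.fetch(m[1]), roles)
    else
      [item]
    end
  end.uniq
end

all_in_one = expand_run_list(["role[httpd]", "role[database_master]", "role[rails]"])
app_only   = expand_run_list(["role[rails]"])

puts all_in_one.inspect
puts app_only.inspect
```

A node that needs the full stack lists three roles, and a node that only serves the Rails app lists one. Neither has to spell out the individual recipes.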

Of course, nobody can achieve everything in one go without mistakes. We need a way to test this whole thing. We don't want to test it on our only server, which has other applications running. VirtualBox saves the day for me. I set up a virtual machine and take a pristine snapshot before I run anything on it. Then I start deploying Chef to it. Things break many times, as I am no expert on configuration management. All I need to do is revert the virtual machine to the pristine state, and I have a fresh fixture to test things again. There is also ChefSpec, which is supposed to let you test your Chef recipes as well. Unfortunately, I cannot learn Chef and anything else at the same time. So it goes on my lab list.

All is fine and dandy. I have two nodes defined. One is the staging VM, which requires everything: Apache2, MySQL, Bundler, Bluepill, Rails, and Unicorn. The second is for the production server, with only Bundler, Rails, Bluepill, and Unicorn. Ok, great. Now I can deploy the code and start having a server to show people. ... Therein lies a problem.

Traditionally, Rails has its own deployment automation tool, Capistrano. It does the second half of what Chef is supposed to do. It puts Rails code in the right place, with the right symlink, etc. It also installs the bundle, runs migrations, and fires up the (monitoring process, which in turn fires up the) server. It also allows you to define other deployment tasks to be done on any number of remote machines involved with this Rails code.

The conflict begins when part of a Chef recipe, such as the monitoring tool configuration or the logging facility, needs Rails to be deployed already, but you cannot deploy Rails before Chef completes. A bad solution is to let Chef finish halfway, and have Rails finish the other half. This works, but I don't like the fact that my configuration is now fragmented across two places. Rails itself should not have been involved in configuration management in the first place anyway.

My plan of action is then to have Chef take care of everything, including deploying Rails itself. I will have a new rails-deployment cookbook that will check out the Rails code at the right git tag into /tmp, then run bundle install and cap deploy from there. Ideally, I will want to have most of the configuration files, such as deploy.rb, database.yml, etc., in erb templates and managed by Chef as well. The end result should allow me to just change the release tag in the rails role, run Chef, and have the new code deployed and ready to serve client requests.
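As a rough sketch of that plan (nothing here is built yet), the steps the cookbook would need to run could be assembled like this. The deployment_commands helper, the repository URL, the tag, and the working directory name are all hypothetical. I also run cap through bundle exec, which is my assumption rather than part of the plan above.

```ruby
require "shellwords"

# Build the shell commands for one deployment, in order.
# Every argument is escaped so an odd tag or path cannot break the command.
def deployment_commands(repo_url:, release_tag:, workdir: "/tmp/rails-deploy")
  dir = workdir.shellescape
  [
    # Check out only the tagged release into the working directory.
    "git clone --branch #{release_tag.shellescape} --depth 1 #{repo_url.shellescape} #{dir}",
    # Install the gems the application and Capistrano need.
    "cd #{dir} && bundle install",
    # Hand the rest over to Capistrano.
    "cd #{dir} && bundle exec cap deploy"
  ]
end

commands = deployment_commands(
  repo_url: "git@example.com:acme/myapp.git", # hypothetical repository
  release_tag: "v1.2.0"                       # hypothetical release tag
)
puts commands
```

The release tag is the only thing that changes from one deployment to the next, which is why it makes sense to keep it as an attribute on the role.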

When that is done, I will come back and tell you how it goes.
