Monday, January 26, 2009

SimpleGate

At yoMedia we frequently have to perform simple tasks on servers that are only reachable by hopping through a number of other servers. To deploy Rails, Merb or Ruby projects, we use Capistrano and it works great. It is easy to configure and you can set it to use any number of gateway servers if that is needed to reach a server.

When deploying ruby gems or executing arbitrary commands, Capistrano does not really work for me. Don't get me wrong, the upload:deploy task is great if you want to send an updated project file to your deployed project. Sending a gem or configuration file to an arbitrary directory on the server is not as easy. It is hard to break out of its project-sized box.

Our strategy so far has been to try and minimize our interaction with servers that were more than one hop away from our laptops. When we did have to restart a daemon or look at log files, we would have to do the SSH hop, hop, hop ritual and do our thing. Then exit, exit, exit until we're back on our laptop's bash shell. Having RSA key logins to a number of servers saves the trouble of having to enter passwords, but the manual SSH-hopping does get tedious after a while.

Being the kind of programmer who rather scripts the tedious things away rather than perfecting his typing speed to speed up repetitive tasks, I figured it must be possible to automate these things. Knowing I'm not the only one with this problem, Google led me to a solution. Net-ssh-gateway (NSG). Thank you, Jamis Buck!

NSG makes it possible to establish an SSH connection through a gateway server to the next gateway server behind it using port forwarding. Repeat this multiple times until you've connected to the final server. The code for this looks like this:



Since this is rather bulky and non-DRY code, let's condense it into something that involves less repeating of code:



Capistrano uses NSG internally for its gateway connection code, but it seems kinda tightly coupled with Capistrano internals. Also, when I looked at it for the first time, I did not really get how it worked or what it did. To gain a better understanding, I decided to extract the relevant code into a script and rework it to be stand-alone.

The great thing about trying to do something with code you don't really understand is that you will understand it once you have dissected it far enough. After that, you can work with it and adapt it to suit your needs.

This resulted in an early version of SimpleGate, my attempt at creating a wrapper library around NSG to make gateway chaining easier. The first version was simply the relevant Capistrano method reworked to work in isolation. The next version improved on the gateway chaining by making at as easy as calling SimpleGate.new.through_to(%w[foo bar baz]) to connect through foo and bar to baz.

SimpleGate also has a ServerDefinition to wrap a simple YAML configuration file that stores the actual server connection information. This is useful for cronjob-activated scripts and other non-interactive code when you have a password authentication server in the chain of gateways. For command line tools using SimpleGate, it saves typing.

Capistrano has a good support for SSH, its configuration files and the various authentication schemes. Passwords and RSA keys are both not a problem. SimpleGate currently only supports passwords, as that is what NSG supports out of the box. RSA key logins are a todo item. For configuration, it does the simplest thing that can possibly work: just store the connection info as a plain YAML file in ~/.servers.yml.

After two minor version bumps, I had something that was good enough to build a script that connected through multiple gateway servers to my target and request its uptime. The next step was to execute arbitrary commands, which was a small modification.

Here I discovered another hard to reproduce feature that Capistrano executes in a really nice way: sudo passwords. For some reason I still have to discover, SimpleGate does not let me enter a password when asked for it on the remote server: it just fails the password check and quits. I guess that is another todo item.

After discovering this I wanted to check up on another server that was hidden behind a number of hops and started to change the hard-coded gateway connection sequence in my test script to connect to the new server.

Woah! Wait! Full stop.

Hard-coded connection info is not good. The server name should be a command line option and the connection sequence should be figured out by the script, not by me. Since I was not interested in manually working out all possible connection sequences, I figured it was time to add a simple path-finding Router class to the project...

After a quick read of relevant sections in Bratko, to refresh my knowledge of the topic, I decided to model the search space as a directed non-cyclical graph and search through it using a simple depth-first recursive search algorithm. Support for cyclical graphs will be another todo.

For every node in the network, all its possible connections are described in a YAML file, that is just a Hash of Arrays with server name strings. A special 'local' node represents my laptop or any arbitrary internet-connected system. The search algorithm comes down to:
  1. If we search from target to target, return that the route is target.
  2. If the current node has outgoing connection possibilities, try them all, keep the shortest and return it with the current node prepended to the returned list of nodes.
  3. If there are no outgoing nodes, return nil.
This results in a relatively naive path finder, but with only 15 nodes in my network, this is not much of a problem. Smarter algorithms can be added once I actually need them.

A version bump later I remembered by original goal: upload a gem, install it and restart the daemon associated with it. Restarting is all done in userland, so that is not a problem. Installation requires sudo, which is still a todo item. Uploading was still open.

NSG can open a normal net/ssh session. net/sftp can use this session to do file transfers. A quick copy-paste-adjust later I had a new executable for copying a single file to a single server, through an arbitrary number of hops.

Right now SimpleGate is at version 0.5.0 and it has the three items noted above as left to do. Then it has its core functionality and should be polished up for its 1.0 version.

The command line tools should get parameter support and a --help interface. Then the config files should both be documented and command-line editable. Once those things are in, multi-server support might be useful. In a Capistrano-like fashion connect to multiple servers (sequentially or in parallel) and execute commands on all of them or upload file(s) to them. The file uploader can get a better interface instead of mimicking the code.