Monday, June 22, 2009

A possible future for package management

My recent post on the Kings of Code side event got too long, so I extracted the following into its own blog post. It is a collection of thoughts on a possible presentation topic.

RubyGems has been around for ages and has made it relatively easy to distribute Ruby code. Not everyone uses it, though. Some prefer to use the Debian package manager, or whatever their OS provides, instead. This is very useful if a gem has external dependencies, but it is not as portable as RubyGems.

RIP was recently released (well, it is only version 0.0.1, but still) as a complement to RubyGems. It does not allow relative version requirements (<, <=, >=, >) for dependencies, only exact version requirements. It borrows the concept of virtual environments from the Python world. A different approach to package management out in the wild means people will gain new insights. What can we learn here? Where does the right balance lie between rigid, version-specific dependencies and open-ended dependencies?
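To make the difference between exact and relative requirements concrete, here is a minimal sketch using the Gem::Requirement and Gem::Version classes that ship with RubyGems. This is RubyGems' own API, not RIP's; the helper name acceptable? and the version numbers are made up for illustration.

```ruby
require "rubygems"

# Hypothetical helper: does this version string satisfy this requirement string?
def acceptable?(requirement, version)
  Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(version))
end

%w[1.2.0 1.2.1 2.0.0].each do |version|
  exact    = acceptable?("= 1.2.0", version)   # the only kind of requirement RIP allows
  relative = acceptable?(">= 1.2.0", version)  # open-ended, RubyGems style
  puts "#{version}: exact=#{exact} relative=#{relative}"
end
```

The exact requirement accepts only 1.2.0, while the relative one also accepts 1.2.1 and 2.0.0, whether or not those later releases are actually compatible.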

Thinking along the dependency management line, why do we require exact versions or put an upper limit on accepted versions? The only reason I can think of is incompatibilities introduced in later versions, but is it right at all to introduce backward incompatibilities in your API? Can't we learn something from functional programming here?

In FP, pure functions don't have side effects. One of the implications is that the data they receive does not get altered. You don't add a new item to an existing array; you return a new array with the new item appended to the existing array. Because of this, there is no problem when you have a multi-threaded program: there is no risk that two threads will try to modify a shared resource at the same time.
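In Ruby the array example could look something like the sketch below. The method name append_pure is made up for illustration; Array#+ returns a new array and leaves its receiver alone, whereas Array#<< would change the array in place.

```ruby
# Pure: builds and returns a new array, the argument is left untouched.
def append_pure(items, item)
  items + [item]
end

original = [1, 2, 3].freeze   # frozen, so an accidental in-place change would raise
extended = append_pure(original, 4)

puts original.inspect
puts extended.inspect
```

Because original never changes, any number of threads can read it at the same time without coordinating with each other.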

This means you don't need mutexes to lock an object to one thread while it manipulates the object. No mutexes means no deadlocks or other headaches associated with threading.

What was I talking about? Ah yes, dependencies and how they relate to functional programming. Explicit version dependencies can be seen as mutexes: only one version is allowed to be used at once. Two versions of a library cannot be loaded at the same time. This is good if the two versions are incompatible. It is bad if the newer version only adds new functionality to the library.

What if you built your library in a way that resembles the pure functions of functional programming? No side effects in this case means there are no nasty surprises when upgrading. If your program works with version 1 of the library, it will work without changes with version 1000. Existing functionality is immutable.

To make this work, new versions should only introduce new behaviour; they cannot change old behaviour. I think making bugfixes would be ok, but performance enhancements are not, as you might introduce negative side effects in some edge cases, thereby breaking someone's app. Then again, maybe fixing bugs will actually break someone's app if they depended on the buggy behaviour. Hmmm...
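A very rough way to check the "only add, never remove" half of this rule is to compare the public methods of two versions of a class. The sketch below does that with made-up classes (GreeterV1, GreeterV2, GreeterBroken) and a made-up helper, removed_methods. It only compares method names, so it says nothing about whether the behaviour behind a name has changed, which is exactly the hard part.

```ruby
# Hypothetical "version 1" of a library class.
class GreeterV1
  def greet(name)
    "Hello, #{name}"
  end
end

# "Version 2" keeps greet as it was and only adds a new method.
class GreeterV2
  def greet(name)
    "Hello, #{name}"
  end

  def greet_all(names)
    names.map { |name| greet(name) }
  end
end

# A release that renames greet, which breaks the rule.
class GreeterBroken
  def hello(name)
    "Hello, #{name}"
  end
end

# Public methods the old version had that the new version no longer has.
def removed_methods(old_version, new_version)
  old_version.public_instance_methods(false) - new_version.public_instance_methods(false)
end

puts removed_methods(GreeterV1, GreeterV2).inspect      # nothing removed
puts removed_methods(GreeterV1, GreeterBroken).inspect  # greet has gone missing
```

An empty result means every old method name is still there; anything else is a release that would need a new name under this model.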

This makes dependency management rather easy. You set a minimum version requirement for the libraries you use and you can just upgrade the libraries to newer versions as they become available. New applications that use new features can co-exist with old applications that use old features from the same library.
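As a sketch of how this plays out, assume two applications that depend on the same library. The helper shared_versions and all version numbers below are invented for illustration; Gem::Requirement and Gem::Version come from RubyGems.

```ruby
require "rubygems"

# Hypothetical helper: which of the available versions satisfy every requirement?
def shared_versions(available, requirements)
  available.select do |version|
    requirements.all? do |requirement|
      Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(version))
    end
  end
end

available = %w[1.0.0 1.5.0 2.0.0]

# An old app and a new app that each pin an exact version: nothing fits both.
puts shared_versions(available, ["= 1.0.0", "= 2.0.0"]).inspect

# The same two apps stating only a minimum: one install can serve both.
puts shared_versions(available, [">= 1.0.0", ">= 1.5.0"]).inspect
```

With exact pins the result is empty, so the two apps cannot share an installation. With minimums only, 1.5.0 and 2.0.0 both qualify, and every later release would too, provided the library really never changes existing behaviour.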

Under this model, if you plan to radically re-architect a project, you could fork it and release it under a new name: Rails v1, Rails2 v1, Rails3 v1. A downside is that forks can have a large shared codebase, but there will no longer be conflicts between versions of one project.

Has anyone ever explored the possibilities of library development along these lines? Did it work or were there problems that I have overlooked? What good features of the 'current' systems would you lose?
