Subversion sucks, get over it

The defacto standard for open source version control systems has been Subversion for the last several years. While CVS is still in use some places, Subversion is miles ahead. While Subversion has served many people well, it has some failings that make it inappropriate for several project classes. The most important of these are open source projects. This post is going to look at why Subversion sucks for open source projects. I'll look at how these arguments also apply to internal business source code management in a future post.

The primary problem with Subversion is the centralized repository. This manifests itself in several ways. Firstly, you must have operations level access to create a new project repository. Secondly, you must have commit access to touch the history of a project. Thirdly, developers are dependent on the project infrastructure to contribute. There are probably more, but today I'll talk about these.

Creating new Subversion project repositories

Creating a new Subversion repository requires access to the svn-admin command on the box running a project's subversion repositories. This means access (possibly indirect) to a shell account. This raises the bar quite high to be able to create new repositories. This might not seem like a big deal. There's even an ugly hack pattern to work around it. Instead of creating new repositories, organizations put everything in the same Subversion repository. An example of this anti-pattern can be seen in the ASF Subversion repository. This is plain bad design. Navigating through these massive repositories is a pain, dealing with commit access becomes a much more vast security issue and the structure of the trunk/tags/branches pattern is broken.

Touching project history

Touching project history might seem like a holy right that should be reserved vetted people, but this is wrong. Users, not project leads, are the final deciders of code value. Political differences in a project should not impact what code is finally distributed. Maintaining patches out of tree violates the fundamental premise of source code management systems; That source code management should be automated, and not done by hand. Source code management systems that encourage out of tree maintainers to abandon source code management are therefore very problematic.

The other assumption is that an official project contributor is always more qualified than a non-contributor has been shown to be false several times. In fact, it's a central premise in the free software movement, the open source community's Right To Fork and the basis of any free market paradigm. Relying on a source code management system that has a centrally controlled access list therefore runs fundamentally counter to ideals that contribute to software quality. This doesn't imply that Subversion leads to worse software, or that it isn't reconcilable to these ideals through clever workarounds, but the dissonance is there and needs to be addressed.

Dependence on infrastructure

The third disadvantage of a central repository is that the lack of local history means one relies on infrastructure availability for source code management. There are primarily two situations where this is important: when the infrastructures fail or when they are unavailable. Infrastructure failure can happen if a server goes down, if a local internet connection fails or a host of other events that affect access to the central repository. Being able to continue to perform source code management under these conditions is important, because infrastructure failure will happen. For open source projects this is important because time is the most valuable asset a developer can contribute.

Other than infrastructure failure, developers are often able to code in places where infrastructure simply isn't available. Internet access is growing more and more ubiquitous, but there are still places to code that don't have access. Whether it's on an airplane, train, in a car or at a cafe without wifi, there are times when project infrastructure simply isn't available and as previously mentioned, time is the most valuable asset of an open source project.

The alternative: Distributed source code management

My distributed source code management system of choice is Git, but that doesn't mean it's right for you. The popular choices these days are Git, Mercurial and Bazaar. There are others, with tradeoffs of their own.

While distributed source code management systems don't solve how to create central project repositories, they make repository creation trivial. This is a big deal. It means that you can start an experimental project with full source code management without polluting the namespace of central repository. Instead of using the stupid One Big Repository anti-pattern, repositories are cheap things that can be created and destroyed on demand. Some work must be done to make central repository hosting easier, which has given rise to services like GitHub, BitBucket (Mercurial) and Launchpad (Bazaar). These are great ways to trivially host open source projects. Since they're offered as free services to open source projects, the need to maintain any repository oriented infrastructure simply melts away.

The way distributed source code management systems deal with commit access is ingenius. Since anyone can create history, but a project lead still owns their repository, the project lead can pick and choose history elements rather than digging through patchesets. Instead of sending a patch over email, someone can maintain a fully revisioned repository and send individual commits. This reduces the load for both contributor and project lead, as well as supporting the old commit access structure.

Distributed version control systems give people the ability to maintain a full project history along with patchsets out of tree as the default mode of operation. The issue of touching history simply goes away.

Since these distributed systems give full repository access locally, the dependence on infrastructure falls away, allowing people to continue to work during infrastructure failures or in areas without access to infrastructure and sync their changes back when they finally become available again.

There are other advantages of these systems over Subversion, but these are the ones related to the core assumption of centrally hosted revision control versus locally hosted revision systems.

The business end of things

So far the assumption has focused on open source projects, but almost all these points apply in some fashion to the business case as well. The cases are more varied and not necessarily as clear, but they are all there. I'll look at these issues in a future post.