Andreas Jacobsen’s Distraction

Another cause of procrastination

Bruce Eckel announces build.py

with 8 comments

Here was me complaining about the lack of good build tools for Python and in a wonderful synchronicity Bruce Eckel’s gone and made a new one. The post is part of a longer series on Python decorators, which is worth reading in its own right.

Aside from the unfortunate name (can we hope it gets changed to something more googlable?), the tool looks interesting. The basic idea is to use decorators to define rules and plain Python functions to fulfill them. It appears to be modeled on the older style of build tools (like Make, Ant and Rake), where one uses recipes to build rather than inferring targets from project structures and compiler configurations (like Maven and Buildr). This isn’t necessarily bad, as it gives more fine-grained control of what happens as a part of a build.

The key feature that build.py brings from Rake is that the build tool allows you to use a fully featured programming language to define how tasks are executed. This is important for allowing users to create tasks that aren’t predicted by plugin makers for the original build tool.
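
To make the decorator idea concrete, here is a minimal sketch of how rules could be registered with a decorator and fulfilled by plain functions. This is not build.py’s actual API; the names rule, build and RULES are made up for illustration, and there is no cycle detection or timestamp checking.

```python
# Hypothetical sketch of decorator-defined build rules; not build.py's real API.
RULES = {}  # target name -> (dependency names, recipe function)


def rule(target, depends=()):
    """Register the decorated function as the recipe for `target`."""
    def register(func):
        RULES[target] = (tuple(depends), func)
        return func
    return register


def build(target, done=None):
    """Build the dependencies of `target` first, then run its recipe once."""
    done = [] if done is None else done
    if target in done:
        return done
    depends, recipe = RULES[target]
    for dep in depends:
        build(dep, done)
    recipe()
    done.append(target)
    return done


@rule("compile")
def compile_sources():
    print("compiling")


@rule("test", depends=["compile"])
def run_tests():
    print("running tests")


@rule("package", depends=["compile", "test"])
def make_package():
    print("packaging")
```

Calling build("package") runs the compile, test and package recipes in that order, each exactly once. Since the recipes are ordinary functions, anything Python can do is available inside a task, which is the point made above about not depending on plugin makers.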

This is still just the first release, and it should be more fleshed out as he finishes the Python book he’s using build.py for. I look forward to seeing future developments and adoption.

(Incidentally, someone mentioned Vellum in the comments on the post. It’s probably worth looking into as well.)

Written by Andreas

October 29, 2008 at 10:45 am

Posted in Python


Scala needs its own ecosystem

with 2 comments

In this post on Graceless Failures, John Kalucki points out that Scala’s RichString and Java’s String are completely incomparable. Neither the == operator nor the equals method returns correct equality when comparing RichInt with Integer either. Kalucki’s conclusion is to avoid RichStrings when possible, and coerce any back to String as soon as possible. I’ve bumped into this sort of thing a bit too often when writing Scala, and find it frustrating. The extensions to basic classes done in Scala work against the powerful typing system in my opinion, and the result is impedance rather than clarity.

At the time of this writing the only reply is from Martin Odersky, stating that this problem is going to be solved in version 2.8.0. I find it interesting that he views this as a language level problem. To a degree, it obviously is. If the language did something that made Java primitive boxing classes and Scala Rich primitives directly comparable, the problem would be solved.

This neglects the correct tool for the job, something that Scala design tends to do because of its focus on language level solutions. To me, the reliance on Java types implies a lack of appropriate Scala libraries and Scala library wrappers. One of the advantages of Scala is living in the Java ecosystem, but since Scala has its own idiomatic styles, we’re generally better off wrapping Java libraries in Scala code so that they appear to the user as Scala styled libraries. This might seem like a waste of time and effort, but the upshot is worth it. The client code becomes cleaner, more Scala-like and easier to read since it doesn’t expend noise on working around Java/Scala integration issues.

However, as Martin Odersky’s comment implies, the Scala research team intends to continue playing whack-a-mole with language design to make Java and Scala completely interoperable (or something). This is natural: solving these problems at a language level is much more interesting than writing libraries and wrapper APIs. But that doesn’t mean it’s more useful. And while the Rich primitives are one example of such a problem, they’re far from the only one. The Scala collections library is horrendously incompatible with the Java collections, and with good reason. I would never want an implicit conversion to change the semantics of my underlying types! RichStrings and Strings may be semantically alike, but a Scala immutable Set and a Java Set are not. (Yes, I could make the Java Set immutable, but I’ve yet to encounter the Java library that returns immutable collections.)

If Scala is to gain more mainstream adoption, development on libraries and frameworks needs to be the next big push, not further language improvements. This is, I guess, the disadvantage of a research institution driving development. We can rely on the EPFL continuing to work on and maintain Scala, but unlike a business value oriented company like Sun, we can’t rely on them taking on tasks without research value. This isn’t a completely bleak situation though. Since this is open source software, the Scala community can take action and make the necessary push. Scala needs its own ecosystem. In the long run we just can’t rely on the Java ecosystem, because there’s too much cognitive dissonance going on.

Written by Andreas

October 27, 2008 at 2:59 pm

Posted in JVM Languages


Subversion sucks, get over it

with 28 comments

The de facto standard for open source version control systems has been Subversion for the last several years. While CVS is still in use in some places, Subversion is miles ahead. Subversion has served many people well, but it has some failings that make it inappropriate for several classes of project. The most important of these is open source projects. This post is going to look at why Subversion sucks for open source projects. I’ll look at how these arguments also apply to internal business source code management in a future post.

The primary problem with Subversion is the centralized repository. This manifests itself in several ways. Firstly, you must have operations level access to create a new project repository. Secondly, you must have commit access to touch the history of a project. Thirdly, developers are dependent on the project infrastructure to contribute. There are probably more, but today I’ll talk about these.

Creating new Subversion project repositories

Creating a new Subversion repository requires access to the svnadmin command on the box running a project’s Subversion repositories. This means access (possibly indirect) to a shell account. This raises the bar quite high to be able to create new repositories. This might not seem like a big deal. There’s even an ugly hack pattern to work around it. Instead of creating new repositories, organizations put everything in the same Subversion repository. An example of this anti-pattern can be seen in the ASF Subversion repository. This is plain bad design. Navigating through these massive repositories is a pain, dealing with commit access becomes a much larger security issue and the structure of the trunk/tags/branches pattern is broken.

Touching project history

Touching project history might seem like a holy right that should be reserved for vetted people, but this is wrong. Users, not project leads, are the final deciders of code value. Political differences in a project should not impact what code is finally distributed. Maintaining patches out of tree violates the fundamental premise of source code management systems: that source code management should be automated, and not done by hand. Source code management systems that encourage out of tree maintainers to abandon source code management are therefore very problematic.

The other assumption, that an official project contributor is always more qualified than a non-contributor, has been shown to be false several times. In fact, rejecting it is a central premise of the free software movement, of the open source community’s Right To Fork and of any free market paradigm. Relying on a source code management system that has a centrally controlled access list therefore runs fundamentally counter to ideals that contribute to software quality. This doesn’t imply that Subversion leads to worse software, or that it isn’t reconcilable to these ideals through clever workarounds, but the dissonance is there and needs to be addressed.

Dependence on infrastructure

The third disadvantage of a central repository is that the lack of local history means one relies on infrastructure availability for source code management. There are primarily two situations where this is important: when the infrastructure fails and when it is simply unavailable. Infrastructure failure can happen if a server goes down, if a local internet connection fails or through a host of other events that affect access to the central repository. Being able to continue to perform source code management under these conditions is important, because infrastructure failure will happen. For open source projects this is important because time is the most valuable asset a developer can contribute.

Other than infrastructure failure, developers are often able to code in places where infrastructure simply isn’t available. Internet access is growing more and more ubiquitous, but there are still places to code that don’t have access. Whether it’s on an airplane, train, in a car or at a cafe without wifi, there are times when project infrastructure simply isn’t available and as previously mentioned, time is the most valuable asset of an open source project.

The alternative: Distributed source code management

My distributed source code management system of choice is Git, but that doesn’t mean it’s right for you. The popular choices these days are Git, Mercurial and Bazaar. There are others, with tradeoffs of their own.

While distributed source code management systems don’t solve how to create central project repositories, they make repository creation trivial. This is a big deal. It means that you can start an experimental project with full source code management without polluting the namespace of a central repository. Instead of using the stupid One Big Repository anti-pattern, repositories are cheap things that can be created and destroyed on demand. Some work must be done to make central repository hosting easier, which has given rise to services like GitHub, BitBucket (Mercurial) and Launchpad (Bazaar). These are great ways to trivially host open source projects. Since they’re offered as free services to open source projects, the need to maintain any repository oriented infrastructure simply melts away.

The way distributed source code management systems deal with commit access is ingenious. Since anyone can create history, but a project lead still owns their repository, the project lead can pick and choose history elements rather than digging through patchsets. Instead of sending a patch over email, someone can maintain a fully revisioned repository and send individual commits. This reduces the load for both contributor and project lead, as well as supporting the old commit access structure.

Distributed version control systems give people the ability to maintain a full project history along with patchsets out of tree as the default mode of operation. The issue of touching history simply goes away.

Since these distributed systems give full repository access locally, the dependence on infrastructure falls away, allowing people to continue to work during infrastructure failures or in areas without access to infrastructure and sync their changes back when they finally become available again.

There are other advantages of these systems over Subversion, but these are the ones related to the core assumption of centrally hosted revision control versus locally hosted revision systems.

The business end of things

So far the discussion has focused on open source projects, but almost all these points apply in some fashion to the business case as well. The cases are more varied and not necessarily as clear, but they are all there. I’ll look at these issues in a future post.

Written by Andreas

October 26, 2008 at 3:21 pm

Static vs dynamic still isn’t about typing


Tony Morris claims that Java and Ruby don’t generalize to static and dynamic typing systems. He’s right, but it doesn’t really matter. What matters is that in the context of the discussion, we’re differentiating between traditional system development languages (C++, Java and C#) and newer system development languages (Python, Ruby). There are other key differences between these languages, but the easiest to identify is that the former are type-checked at parse time and the latter aren’t.

For some reason, when I start talking about Python or Ruby to people who have written a lot of the traditional system development languages, it’s the typing that trips them up. They ask how these languages deal with things like maintainability or IDE support (for refactoring). So people who speak warmly of Python and Ruby didn’t choose to make this about static or dynamic type systems, but that’s the position we find ourselves defending.

Talking about OCaml, ML, Haskell or Scala doesn’t change the fact that we’re presenting alternatives to Java and C#. Yes, it’s possible to have terse parse time type-checking. That’s great! If only Java and C# people who get hung up on type checking knew about these languages, right?

Incidentally, Tony Morris linked to Chris Smith’s excellent article on typing. I’ve pretty much violated all the principles Smith presents, but as long as you substitute ‘static typing’ with ‘whatever Java does’ and ‘dynamic typing’ with ‘whatever Python or Ruby does’ in my writing, it should be pretty clear. Sure, there are alternatives, but they’re not present in the context of this discussion. When they are, I’ll have to revise what I say. Until then, I’ll keep using Java and ‘statically typed language’ interchangeably.

Written by Andreas

October 16, 2008 at 2:02 pm

Posted in Religion


Static vs dynamic isn’t about typing


In this post about maintenance of applications written in dynamic languages Ola Bini stirs up the whole static vs dynamic debate (again). It’s a post well worth reading, especially the comments. One thing Bini points out is that in the context of this discussion, normally, statically typed language is a synonym for Java (and possibly C#), and dynamically typed means Python and Ruby.

This really is a key observation. If you say static typing is too verbose, anyone who’s familiar with OCaml, Haskell or Scala disagrees. But the fact is that our current selection of useful statically typed languages is verbose. C# isn’t quite as bad as Java, but Java is terrible.

The question then becomes whether Ruby or Python are less maintainable than Java. In my experience, the answer is ‘that really depends on your project’. Maintainability isn’t a language feature. No matter what language someone is writing, they can create code that is difficult to maintain. The qualities of maintainable code are different in different styles of language. A coding style that may be easier to deal with down the line with one language can be more difficult in another language.

In my experience tests are far more important than static type checking when doing refactorings. And personally, I find Java to be worse for reading code than Ruby or Python. People easily become thoughtless code generators when dealing with Java. Just keep hitting ctrl+space until it compiles, right? Meanwhile, languages like Ruby and Python make you focus on what the code actually does when it runs. Neither of these statements is an absolute and they’re both limited by my experience, but I have seen both sides of things and my preference is clear.

Written by Andreas

October 15, 2008 at 5:17 pm

Posted in Religion


Using Python setuptools on the mac

with 5 comments

Python’s standard tool for package management is setuptools. The version of setuptools bundled with Mac OS X Leopard is 0.6c7. Unfortunately, setuptools is not self-upgrading, in that it won’t replace the easy_install script in /usr/bin, and there’s no official .dmg/.pkg to upgrade it. This is important because the easy_install script that’s used to install new packages has a hardcoded version of setuptools in it, that it reads from the Python libraries bundled with Leopard.

The hardcoded version string in easy_install became a problem when I tried to install a package that relied on a newer version of setuptools:
$ sudo easy_install -U py
Searching for py
Reading http://pypi.python.org/simple/py/
Reading http://pylib.org
Reading http://codespeak.net/py/0.9.2/download.html
Reading http://codespeak.net/py
Reading http://pypi.python.org/simple/py/XXX
Best match: py 0.9.2
Downloading http://pypi.python.org/packages/source/p/py/py-0.9.2.zip#md5=8447b2ba4c7b4062fcd08aab3377f040
Processing py-0.9.2.zip
Running py-0.9.2/setup.py -q bdist_egg --dist-dir /tmp/easy_install-PWyaOs/py-0.9.2/egg-dist-tmp-qz0KLA
The required version of setuptools (>=0.6c8) is not available, and
can't be installed while this script is running. Please install
a more recent version first, using 'easy_install -U setuptools'.

(Currently using setuptools 0.6c7 (/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python))
error: Setup script exited with 2

Installing a newer version of setuptools didn’t actually help, since easy_install doesn’t get touched by this. There are two (sensible) solutions to this. Either edit /usr/bin/easy_install to reflect the newer version of the setuptools package, or use the easy_install module from python rather than the executable. The latter is preferable since it doesn’t involve manually changing stuff in /usr/bin, which is just plain wrong.

So this is how to correctly install packages that rely on a version of setuptools newer than 0.6c7 on a Mac:
$ sudo python -m easy_install py
Searching for py
Best match: py 0.9.2
Processing py-0.9.2-py2.5-macosx-10.5-i386.egg
Adding py 0.9.2 to easy-install.pth file
Installing py.cleanup script to /usr/local/bin
Installing py.lookup script to /usr/local/bin
Installing py.countloc script to /usr/local/bin
Installing py.rest script to /usr/local/bin
Installing py.test script to /usr/local/bin

Using /Library/Python/2.5/site-packages/py-0.9.2-py2.5-macosx-10.5-i386.egg
Processing dependencies for py
Finished processing dependencies for py

This works because python searches sys.path, and the /Library/Python site packages are placed before the bundled packages.
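
The search-order behavior can be demonstrated without touching a real installation. The sketch below is an illustration only: it creates two temporary directories standing in for the bundled and the upgraded package locations, puts the same (hypothetical) module name shadow_demo in both, and shows that the copy in the earlier sys.path entry is the one that gets imported. It is written for a current Python 3 rather than the Python 2.5 that shipped with Leopard.

```python
import importlib
import os
import sys
import tempfile


def write_module(directory, version):
    """Create a tiny module named shadow_demo that records a version string."""
    with open(os.path.join(directory, "shadow_demo.py"), "w") as handle:
        handle.write("VERSION = %r\n" % version)


bundled = tempfile.mkdtemp()    # stands in for the system-bundled directory
upgraded = tempfile.mkdtemp()   # stands in for the site-packages directory
write_module(bundled, "0.6c7")
write_module(upgraded, "0.6c8")

# The upgraded directory comes first, so its copy shadows the bundled one.
sys.path[:0] = [upgraded, bundled]
importlib.invalidate_caches()
shadow_demo = importlib.import_module("shadow_demo")
print(shadow_demo.VERSION)
```

With the directories in this order the import picks up the 0.6c8 copy. Swapping the two entries in sys.path would pick up the bundled copy instead.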

The state of easy_install isn’t that great. There are basically three alternatives for installing python packages. One is to use the OS package manager, which works on Linux distros like Debian/Ubuntu, where just about everything is ported to a .deb and put in the apt repositories. Unfortunately, macports doesn’t have many python packages. The second is to use easy_install, warts and all. The third is to download source distros and use distutils to install them (using python setup.py install), which has a very nice retro feel to it. Fortunately, help does seem to be on the way.

Written by Andreas

October 10, 2008 at 11:00 pm

Project structure and unit testing with Python


I’ve picked up Python again recently (dangerous, I know), and solved a couple of Euler problems to get back in the feel of things. Being back to Python is a little bit like flying, but I have noticed one problem. There isn’t really a good build and distribution tool for it.

For each language I use to solve Euler problems, I’ve set up a project with some sort of build tool that compiles the non-interpreted languages and runs unit tests that check the outputs for correctness. When I started solving problems in Python I couldn’t really find a good guide to setting up projects, both for filesystem and build tools. I realized that all the Python I’d done previously had either been self-contained scripts or structured in some sort of ad hoc fashion.

This post by Jp Calderone has some good guidelines for filesystem structure. I like that he specifies to make the application usable from the project directory, while still making it installable. The whole setup.py thing is based on distutils, a set of packages for making a python library/application installable. Distutils has its set of problems, but is generally pretty good.

Some further investigation uncovered py.test, part of the py library. It’s appealing for several reasons. First of all, it’s very non-intrusive. Tests live in any module, and are named test_something.py or something_test.py. In each of these source files, any function or method that starts with test_ is run. I put all my tests in a submodule of my source module named test and created a test_ file corresponding to each of them, with a test_ function for each of my euler solutions. I could have put them in a module of their own, to keep the tests entirely separate from the solutions module, but I preferred to keep the test module namespaced.
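
As a rough illustration of that naming convention, a test file for one Euler solution might look like the sketch below. The file name test_euler.py and the function euler_1 are hypothetical, and the solution is defined in the same file only to keep the example self-contained; in the layout described above it would be imported from the solutions module instead.

```python
# test_euler.py -- hypothetical file name following the test_*.py convention.


def euler_1(limit):
    """Sum the multiples of 3 or 5 below `limit` (Project Euler problem 1)."""
    return sum(n for n in range(limit) if n % 3 == 0 or n % 5 == 0)


def test_euler_1_small_case():
    # The problem statement itself gives 23 as the answer for limit 10.
    assert euler_1(10) == 23


def test_euler_1_full_problem():
    assert euler_1(1000) == 233168
```

There is no test class to inherit from and no registration step: the functions whose names start with test_ are collected, and a plain assert is all a check needs.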

The second reason I like py.test is the py.test commandline runner. Executing py.test at the top level of any project (in fact, any directory) will send it recursing through subdirectories looking for candidates for testing. The simplicity (compared to, for example, setting up JUnit testing with Maven2) is very satisfying. And while it doesn’t integrate directly with distutils, being a simple, unparameterized commandline program means I could easily use it in a script for preparing releases. There’s also the buildutils project, which extends distutils with, among other things, py.test integration.

While neither distutils nor py.test presents a 360 degree solution to packaging and testing, they have the inherently pleasant feel of many python tools and libraries in that they make dealing with their target problems very easy. These kinds of tools and libraries are part of the many things that make Python a lot like flying. Being able to write executable pseudo-code is another.

Written by Andreas

October 10, 2008 at 8:38 pm

Preload application classes on rails 2.1


I’ve been playing around with deploying JRuby on Rails onto an application server a bit over the last few weeks, and one of the constant annoyances is that the first request takes a goodly amount of time to complete due to Rails not preloading the application classes. With a couple hundred ActiveRecords that need their metadata read from the DB, this can take a while.

There are a couple of solutions described for preloading applications when deploying on Phusion/Passenger, but there’s nothing really for JRuby and application servers. However, it turns out that this behavior is being introduced as default in 2.2, the next version of Rails. The specific commit that introduces it is 3bd34b6. While it doesn’t apply cleanly to 2.1, it’s pretty easy to introduce. Since we freeze rails into our application, this solution worked quite well.

The upshot is that loading the application takes a little longer, but the first request goes down from taking 30-40 seconds on JBoss on my development box to a much snappier 3-4 seconds. This is particularly a benefit when a restarted server isn’t accessed immediately, but only when something important needs to be done a few hours later.

Written by Andreas

October 10, 2008 at 2:51 pm

Posted in Frameworks


Use videos to deliver bug reports


Giles Bowkett found a bug in GitHub, and documented it with a video. This got me thinking about embedding videos in JIRA (or your favorite issue tracker) on bug reports. I don’t know if there’s a way to do this at the moment, and it’d certainly require a bit of infrastructure to support. But think of the possibilities. Instead of writing out what steps are necessary to recreate a bug, testers can upload a video capture…

Then again, videos still aren’t greppable.

Written by Andreas

October 9, 2008 at 5:05 pm

Posted in Issue tracking


“Library” oriented programming


Apparently Java is a library oriented programming language because code is packaged as a… library. This makes me wonder what distinguishes it from C. I used to think it was this whole classes and objects thing, but I guess I was wrong. Maybe Java should be called Garbage Collection oriented programming.

Written by Andreas

October 6, 2008 at 10:39 pm

Posted in JVM Languages
