Development | magpiebrain

The talk I did at QCon SF 2009 is now available at InfoQ. Only an MP3 is available to download; otherwise you’ll have to stream it from the site. If you just listen to the MP3 you’ll be missing a lot, as the slides are better than hearing me drone on.

March 12, 2010

I’ve been invited to speak on colleague Chris Read’s track at QCon London this March. The track itself is chock full of experienced professionals (including two ex-colleagues) so I fully intend to raise my game accordingly. We’re lucky enough to have Michael T. Nygard speaking too, author of Release It!, perhaps the best book written for software developers in years.

The track – “Dev and Ops – a Single Team” – attempts to address many of the issues software professionals have in getting their software live. It will cover many aspects, both on the hardcore technical and on the softer people side. Hopefully it will provide lots of useful information you can take back to your own teams.

My talk – From Dev To Production – will give an overview of build pipelines, and how they can be used to get the whole delivery team focused on the end objective – shipping quality software as quickly as possible. It draws on some of my recent writing on build patterns, and a wealth of knowledge built up inside ThoughtWorks over the past few years.

My experience of QCon SF last year was excellent – I can thoroughly recommend it to any IT professional involved in shipping software. If you haven’t got your ticket already, go get it now before the prices go up!

February 16, 2010

I’ve recently been working on a Clojure application that I hope to open source soon. It’s been my first experience of using Clojure, and is almost certainly one of the most thought-provoking things I’ve done in a long while. One of the things that is still causing me issues is how to go about TDDing Clojure applications – or rather functional programs in general.

My natural inclination – for many reasons – is to use TDD as my process of choice for developing my code. Beyond its use as a design tool, it gives me a safety net to catch me if I screw something up. It allows me to be a little more brave, and drastically reduces the cycle between changing some code and being happy that it works. I’m used to that safety net – I feel lost without it.

Stuart Halloway said during his Clojure talk at QCon SF that despite being a TDD fan he finds it hard to TDD in a new language, and I get exactly what he means. A big part of it is that you’re getting to grips with the idioms, capabilities, libraries and tools associated with your new language – and a lack of this knowledge is going to impact on your ability to write good tests, let alone worry about implementing them.

Typically, when learning a new language I try and write a small application that has a real world need. BigVisibleWall was my attempt to learn Scala – but it had a real goal. With BigVisibleWall, as with my current Clojure project, I started by implementing the system by just writing the production code. I’m pushing the limits of my knowledge constantly, attempting to understand the size and shape of the solution space that I find myself in with this new tool. Once I got BigVisibleWall working with a small set of features, I broke it down and rewrote it TDD style – at that point, I had enough Scala (and I mean *just* enough) to be able to do this without it feeling like I was wading through treacle.

I consciously decided to follow the same pattern with my Clojure project. Code the main logic, get it running, then break it down and rewrite it piece by piece using TDD. But then I hit a problem – Scala and Java are similar enough languages that my programming style didn’t have to change much from one to the other. Therefore the way I structured the code and thought about TDD didn’t have to shift much. In both cases I was driving the design of an Object Oriented system. With Clojure though it wasn’t just the language which was different – so many of the underlying concepts were different too. Put simply, I really don’t know where to begin.

My first instinct is to start decomposing functions, passing in stubs to the functions under test. But this just feels like I’m trying to shoehorn IOC-type patterns into a functional program. But what am I left with – testing large combinations of functions? That feels wrong too.

So what about you lot out there in blogland? Any other OO types trying to make the switch and encountering the same issues? Or any FP practitioners for whom TDD is second nature? Or does TDD just not fit with FP after all?

February 16, 2010

I’ve been working on a couple of spare time projects, both of which I hope to release more formally in the next few weeks. One of them involves development of a simple web application for deployment on Google App Engine. As part of the development, I had to modify an existing open source Clojure API – my changes are now available for all.

appengine-clj was written by ThoughtWorks colleague John Hume. It provides some Clojure-esque wrappers over Google App Engine’s user authentication and low level datastore API. John outlines his use of the library in a highly useful post on using Compojure & Clojure on the App Engine – it was this post which helped immensely in getting started myself.

There were a couple of minor issues with the latest version of John’s API which stopped me from being able to use it for my latest project – so I created a fork to make the changes I wanted. First, a general issue. I love projects which make checkout and build easy and bullet-proof. For me, that means check in the build tool & all dependencies. I know this is a contentious point – I may well write a post on it later. The other issue is that since the 1.2 version of the SDK some of the APIs have changed a little, so I updated the datastore testing macro accordingly.

My Clojure skills are highly limited, and the modest modifications are probably botched, but nonetheless it seems to work. My fork can be found on GitHub.

February 14, 2010

One of the problems quickly encountered when any new team adopts a Continuous Build is that builds become slow. Enforcing a Build Time Limit can help, but ultimately if all of your Continuous Build runs as one big monolithic block, there are limits to what you can do to decrease build times.

One of the main issues is that you don’t get fast feedback to tell you when there is an issue – by breaking up a monolithic build you can gain fast feedback without reducing code coverage, and often without any complex setup.

In a Chained Continuous Build, multiple build stages are chained together in a flow. The goal is for the earlier stages to provide the fastest feedback possible, so that build breakages can be detected early. For example, a simple flow might first compile the software and run the unit tests, with the next stage running the slower end-to-end tests.

With the chain, a downstream stage only runs if the previous stage passed – so in the above example, the end-to-end stage only runs if the build-and-test stage passes.
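The rule above can be sketched in a few lines of Python. This is only an illustration, not how any particular build server works: the `Stage` class, the `run_chain` function and the stage names are all made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    name: str                 # hypothetical stage name, e.g. "build-and-test"
    run: Callable[[], bool]   # returns True if the stage passed


def run_chain(stages: List[Stage]) -> List[Tuple[str, str]]:
    """Run stages in order; once one fails, skip everything downstream."""
    results = []
    broken = False
    for stage in stages:
        if broken:
            # An upstream stage failed, so this stage never starts.
            results.append((stage.name, "skipped"))
            continue
        passed = stage.run()
        results.append((stage.name, "passed" if passed else "failed"))
        broken = not passed
    return results


if __name__ == "__main__":
    chain = [
        Stage("build-and-test", lambda: True),
        Stage("end-to-end", lambda: False),
    ]
    for name, status in run_chain(chain):
        print(f"{name}: {status}")
```

The fast stage goes first, so a compile error or unit test failure is reported without waiting for the slower end-to-end tests to start.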

Handling Breaks

As with a Continuous Build you need to have a clear escalation process by which the whole team understands what to do in case of a break. Most teams I have worked with tend to stick to the rule of downing tools to fix the build if any part of the Continuous Build is red – which is strongly recommended. It is important that if you do decide to split your continuous build into a chain that you don’t let the team ignore builds that happen further along the chain.

Build Artifacts Once vs Rebuild

It is strongly suggested that you build the required artifacts once, and pass them along the chain. Rebuilding artifacts takes time – and the whole point of a chained build is to improve feedback. Additionally getting into the habit of building an artifact once, and once only, will help when you start considering creating a proper Build Pipeline (see below).
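As a rough sketch of the build-once idea, assuming the artifact can be represented by a simple identifier such as a file path: the first step builds, and every later step receives the same artifact rather than rebuilding it. The names `Artifact` and `run_chain_with_artifact` are invented for this example.

```python
from typing import Callable, List, Optional

Artifact = str  # stands in for a path or identifier of a built artifact (hypothetical)


def run_chain_with_artifact(
    build: Callable[[], Artifact],
    checks: List[Callable[[Artifact], bool]],
) -> Optional[Artifact]:
    """Build once, then hand the same artifact to each downstream check.

    Returns the artifact if every check passed, otherwise None.
    """
    artifact = build()            # the only place the artifact is built
    for check in checks:
        if not check(artifact):   # downstream stages never rebuild
            return None
    return artifact
```

Because each check takes the artifact as a parameter, there is no temptation for a later stage to quietly rebuild from source.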

And Build Pipelines

Note that a chained build is not necessarily the same thing as a Build Pipeline. A Chained Continuous Build simply represents one or more Continuous Builds in series, whereas a Build Pipeline fully models all the stages a software artifact moves through, from development right through to production. One or more Chained Continuous Builds may form part of a Build Pipeline, and a simplistic Build Pipeline might not represent anything other than Chained Continuous Builds, but Build Pipelines will often incorporate activities more varied than compilation or test running.

Fast Feedback vs Fast Total Build Time

One thing to note is that by breaking a big build up into smaller sections to get faster feedback, counterintuitively you may well end up increasing overall build time. Building and passing artifacts from one stage to another adds time, as does dispatching calls to build processes further down the chain. This balance has to be weighed up – consider being conservative in the splits you make, and always keep an eye on the total duration of your build chain.
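To make the trade-off concrete, here is a small worked example in Python. The numbers are invented: a 20 minute monolithic build, split into a 5 minute stage and a 15 minute stage, with an assumed 2 minute overhead for handing over between stages.

```python
from typing import List, Tuple


def chain_timings(stage_minutes: List[float], handoff_minutes: float) -> Tuple[float, float]:
    """Return (time to first feedback, total chain time) in minutes.

    Assumes a fixed handoff cost between each pair of adjacent stages.
    """
    first_feedback = stage_minutes[0]
    total = sum(stage_minutes) + handoff_minutes * (len(stage_minutes) - 1)
    return first_feedback, total


print(chain_timings([20], 2))     # monolithic build: (20, 20)
print(chain_timings([5, 15], 2))  # split in two: (5, 22)
```

First feedback drops from 20 minutes to 5, while the whole chain now takes 22 minutes rather than 20.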

Tool Support

Tooling can be complex. Simple straight-line chains can be built relatively easily using most continuous build systems. For example, a common approach is to have one build check in some artifact which is the trigger point for another Continuous Build to run. Such approaches have the downside that the chain isn’t explicitly modelled, and reporting of the current state of the chain ends up having to be jury rigged, typically through custom dashboards. More complex still is dealing with branching chains.

Continuous Build systems have got more mature of late, with many of them supporting simple Chained Continuous Builds out of the box. TeamCity, Hudson, Cruise and others all have some form of (varying) support. Cruise probably has the best support for running stages in parallel (caveat: Cruise is written by ThoughtWorks, the company I currently work for), and has some of the better support for visualising the chains, but given the way all of these tools are moving expect support in this area to get much better over time.

January 24, 2010

Have you ever watched young children play football (or Soccer for our Atlantic cousins)? During the game, you can be certain of one thing – most of the team on both sides will be doing nothing but chasing the ball. There is no thought about the bigger picture, no tactical decision making (let alone anything as grand as strategy). The only thought on everyone’s minds is that “We need to get the ball”.

This thinking in children is understandable. Less clear can be the basis for this kind of behaviour within an organisation.

Typically, the ‘let’s drop everything and Chase The Ball’ mentality comes from organisations which are primarily reactive, rather than proactive. Companies which exist almost always in crisis mode typically have a reactive attitude – working in teams like this can feel like you’re being buffeted by forces outside of your control, running from one disaster to another, never making progress towards where you need to go.

Small organisations that have got big often never get out of their reactive, firefighting mentality. All that happens is that more people are fighting the same fire.

To be clear I’m not talking about the problem being a single focus. I’m talking about a team or organisation fixating on a single focus which isn’t actually aligned with the longer term objectives. A bunch of kids chasing the ball need to think about winning the game, not getting the ball. Likewise an organisation needs to have a clear line of sight to what its goals are, and always understand how what it is doing now gets it to where it wants to go.

There will always be urgent, short term things that need to be addressed. The important thing is that they are dealt with in proportion, without losing sight of the overall objectives. Ring-fence team members (making sure they aren’t always the same people) to Chase The Ball, and leave the larger organisation to focus on winning the game, or better yet winning the league.

Update: Fixed typo, thanks Ben and Julian!

January 17, 2010

Anyone who has worked in a team which uses a Continuous Build inevitably starts to learn about the cost of a long running build:

More time between checkin and a report of a failure
Higher chance of Continuous Build containing multiple checkins, increasing the chance of an integration break and complicating rollback
Fixing a build break related to a checkin made much earlier is harder, leading to a reduction in productivity

There are other ‘build’ times to be aware of. A long Checkin Gate build leads to an increased chance of someone else checking in before you, increasing the chance of an integration break when you do check in. It also disrupts the developer’s normal flow – they cannot easily work on new code, so effectively have to down tools until the Checkin Gate has finished. You also need to consider the time taken to run a single test – be it a small-scoped unit test, or a larger end to end test.

No matter what the build is, a long build interrupts programmer flow, decreasing focus, and therefore decreasing productivity.

Different Builds, Different Limits

As a team, you should decide on an acceptable Build Time Limit for each ‘build’ – for example individual tests, Checkin Gates, and stages in your continuous build. You may even consider failing these builds if those time limits are exceeded. Setting the Build Time Limit at the right level – and keeping it there – will help keep productivity high.

Different builds get run with different frequencies. The more often a build is run, the faster it needs to be. Experience suggests the following time limits:

Single small-scoped unit test – sub-second
End-to-end test – a few seconds
Checkin Gate – 30 seconds to a couple of minutes at most
Continuous Build – a handful of minutes
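One way of failing a build that exceeds its limit is sketched below in Python. The limits are my own rough translation of the list above into seconds (for example, "a few seconds" becomes 5 and "a handful of minutes" becomes 10 minutes), and the names `BUILD_TIME_LIMITS`, `BuildTimeLimitExceeded` and `run_with_limit` are invented for the example. Note that it only checks the elapsed time once the build has finished; it makes no attempt to interrupt a build that is still running.

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")

# Hypothetical limits in seconds, loosely based on the list above.
BUILD_TIME_LIMITS: Dict[str, float] = {
    "unit-test": 1.0,
    "end-to-end-test": 5.0,
    "checkin-gate": 120.0,
    "continuous-build": 600.0,
}


class BuildTimeLimitExceeded(Exception):
    """Raised when a build finishes but took longer than its agreed limit."""


def run_with_limit(
    kind: str,
    build: Callable[[], T],
    limits: Dict[str, float] = BUILD_TIME_LIMITS,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Run a build and fail it afterwards if it exceeded the limit for its kind."""
    start = clock()
    result = build()
    elapsed = clock() - start
    limit = limits[kind]
    if elapsed > limit:
        raise BuildTimeLimitExceeded(
            f"{kind} took {elapsed:.1f}s, limit is {limit:.1f}s"
        )
    return result
```

Keeping the limits in one place makes any change to them visible, which fits with the idea that changing a limit is a decision for the whole team.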

When your Continuous Build is part of a larger Build Pipeline, you may find it useful to set Build Time Limits for each stage in the pipeline. One might argue that enforcing Build Time Limits for each stage of a Build Pipeline – manual or automated – may be overkill, but having some reporting of when a limit is exceeded will help directly highlight bottlenecks in creating production deployable software.

Team Ownership

Teams must take ownership of ensuring that the Build Time Limit is enforced. Further, they should always look for opportunities to reduce it. Any decision to increase any Build Time Limit should be taken by the whole team – likewise any decrease in a Build Time Limit which decreases test coverage should be agreed with all. Everyone should be empowered, however, to look for quick wins.

Some teams find the need for a Build Tzar/Build Cop role – someone who is in charge of the health of the build. I consider such roles to be short term measures only, and they should certainly be considered an anti-pattern if they exist for any length of time. At the extreme end of this spectrum is the dedicated build team. Empowering the whole team is key.

Making Things Faster

There are a number of ways of making individual tests fast, which will depend both on the nature of the technology being used and the way it is being used. Consider making a Checkin Gate fast using a Movable Checkin Gate. Long Continuous Build times can be mitigated through the use of a Chained Continuous Build, perhaps as part of a larger Build Pipeline.

You may also want to simply remove tests that are slow but provide little coverage. Often, it may even be the case that slow running tests represent a performance issue in the system itself.

Some teams have also shown significant speed improvements by using the right hardware – such as faster CPUs, RAM disks or SSD drives. However, while simply throwing hardware at the problem can help speed a Continuous Build up, it does little to affect the build time on local development machines – a situation where your continuous build is faster than your local development build is the opposite of what you want.

Selection Based On Planned Work

Periodically, you assess the kinds of work coming up. If you are using an iterative development process, you may do this at the beginning of each iteration. Based on the kinds of changes the team will be working on during the next period, select tests which cover these areas of code, removing others which cover functionality unlikely to change. The theory is that you are selecting tests that cover areas of code which are most likely to get broken. The tests should be selected such that they don’t exceed your Build Time Limit.

After each movement of the suite driving the Checkin Gate, you can assess the success by looking at the failure rate of the Continuous Build.

The key is to have a series of well categorized tests – tagging could work well here.
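As a sketch of how tag-based selection might look in Python: each test carries a set of tags and a typical run time, and the Checkin Gate is filled with tests whose tags match the upcoming work until the Build Time Limit is used up. The `TaggedTest` class and `select_for_checkin_gate` function are invented for the example, and picking the fastest matching tests first is just one possible policy, not something the pattern prescribes.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class TaggedTest:
    name: str
    tags: FrozenSet[str]   # areas of the code this test covers
    seconds: float         # typical run time


def select_for_checkin_gate(
    tests: List[TaggedTest],
    planned_areas: Set[str],
    limit_seconds: float,
) -> List[TaggedTest]:
    """Pick tests covering the planned areas without exceeding the time limit."""
    relevant = [t for t in tests if t.tags & planned_areas]
    selected: List[TaggedTest] = []
    total = 0.0
    # Assumption: prefer fast tests so that more of them fit within the limit.
    for test in sorted(relevant, key=lambda t: t.seconds):
        if total + test.seconds <= limit_seconds:
            selected.append(test)
            total += test.seconds
    return selected
```

Tests left out of the Checkin Gate by this selection still run in the Continuous Build, which is where you would watch the failure rate.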

Selection Based On Build Failure

An alternative technique for selecting the makeup of the Checkin Gate can be based on build failures. If tests not in the Checkin Gate start failing in your Continuous Build, put them into the Checkin Gate suite, swapping out other tests to keep you below your Build Time Limit.
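A minimal Python sketch of the swap follows. The pattern doesn't say which tests to swap out, so this example assumes the tests that have been in the Checkin Gate longest are dropped first; the function name and the list-of-names representation are also invented for illustration.

```python
from typing import Dict, List


def promote_failing_tests(
    gate: List[str],
    failing: List[str],
    durations: Dict[str, float],
    limit_seconds: float,
) -> List[str]:
    """Move tests that failed in the Continuous Build into the Checkin Gate.

    `gate` lists test names, oldest entry first. Tests are evicted from the
    front until the new suite fits the limit (an assumed eviction policy).
    If the promoted tests alone exceed the limit, they are all still returned.
    """
    promoted = [t for t in failing if t not in gate]
    keep = list(gate)

    def total(names: List[str]) -> float:
        return sum(durations[n] for n in names)

    while keep and total(keep) + total(promoted) > limit_seconds:
        keep.pop(0)   # drop the test that has been in the gate longest
    return keep + promoted
```

Other eviction policies, such as dropping the slowest tests or those covering code least likely to change, would slot into the same loop.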

Updates

Added link to the new Build Time Limit Pattern.

January 10, 2010


Sam Newman's site, a Consultant at ThoughtWorks
