magpiebrain

Sam Newman's site, a Consultant at ThoughtWorks

Archive for ‘January, 2010’

Pipeline from Flickr user Stuck in Customs

One of the problems quickly encountered when any new team adopts a Continuous Build is that builds become slow. Enforcing a Build Time Limit can help, but ultimately if all of your Continuous Build runs as one big monolithic block, there are limits to what you can do to decrease build times.

One of the main issues is that you don’t get fast feedback to tell you when there is an issue – by breaking up a monolithic build you can gain fast feedback without reducing code coverage, and often without any complex setup.

In a Chained Continuous Build, multiple build stages are chained together in a flow. The goal is for the earlier stages to provide the fastest feedback possible, so that build breakages can be detected early. For example, a simple flow might first compile the software and run the unit tests, with the next stage running the slower end-to-end tests.

With the chain, a downstream stage only runs if the previous stage passed – so in the above example, the end-to-end stage only runs if the build-and-test stage passes.
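The stop-on-first-failure rule can be sketched in a few lines of Java. This is only an illustration: the `BuildChain` class, the stage names and the idea of a stage as a `BooleanSupplier` are assumptions made for the example, not the API of any real build server.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: BuildChain and the BooleanSupplier contract are
// assumptions for illustration, not the API of a real build server.
class BuildChain {
    // LinkedHashMap keeps the stages in the order they were added.
    private final Map<String, BooleanSupplier> stages = new LinkedHashMap<>();

    // Registers a stage; it returns true when the stage passes.
    BuildChain stage(String name, BooleanSupplier run) {
        stages.put(name, run);
        return this;
    }

    // Runs each stage in turn and stops at the first failure, so a
    // downstream stage never runs after a red upstream stage.
    // Returns the names of the stages that were actually executed.
    List<String> run() {
        List<String> executed = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> stage : stages.entrySet()) {
            executed.add(stage.getKey());
            if (!stage.getValue().getAsBoolean()) {
                break; // red stage: skip everything downstream
            }
        }
        return executed;
    }
}
```

With a failing `build-and-test` stage followed by an `end-to-end` stage, `run()` returns only `build-and-test`: the slow stage is never started.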

Handling Breaks

As with a Continuous Build you need to have a clear escalation process by which the whole team understands what to do in case of a break. Most teams I have worked with tend to stick to the rule of downing tools to fix the build if any part of the Continuous Build is red – which is strongly recommended. It is important that if you do decide to split your continuous build into a chain that you don’t let the team ignore builds that happen further along the chain.

Build Artifacts Once vs Rebuild

It is strongly suggested that you build the required artifacts once, and pass them along the chain. Rebuilding artifacts takes time – and the whole point of a chained build is to improve feedback. Additionally getting into the habit of building an artifact once, and once only, will help when you start considering creating a proper Build Pipeline (see below).

And Build Pipelines

Note that a chained build is not necessarily the same thing as a Build Pipeline. A Chained Continuous Build simply represents one or more Continuous Builds in series, whereas a Build Pipeline fully models all the stages a software artifact moves from development right through to production. One or more Chained Continuous Builds may form part of a Build Pipeline, and a simplistic Build Pipeline might not represent anything other than Chained Continuous Builds, but Build Pipelines will often incorporate activities more varied than compilation or test running.

Fast Feedback vs Fast Total Build Time

One thing to note is that by breaking a big build up into smaller sections to improve fast feedback, counterintuitively you may well end up increasing overall build time. The time to build and pass artifacts from one stage to another adds time, as does dispatching calls to build processes further down the chain. This balance has to be considered – consider being conservative in the splits you make, and always keep an eye on the total duration of your build chain.
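To make the trade-off concrete, here is a small sketch with invented numbers. The `ChainTiming` class and the fixed per-hand-off cost are assumptions for illustration only; real hand-off costs vary from stage to stage.

```java
// Hypothetical sketch: the class and the fixed hand-off cost are
// assumptions; any durations passed in are illustrative only.
class ChainTiming {
    // Total time for the chain: every stage plus a fixed hand-off cost
    // (passing artifacts, dispatching the next stage) between stages.
    // Assumes at least one stage.
    static int totalMinutes(int[] stageMinutes, int handOffMinutes) {
        int total = 0;
        for (int minutes : stageMinutes) {
            total += minutes;
        }
        return total + handOffMinutes * (stageMinutes.length - 1);
    }

    // Time until the first stage reports: the earliest possible feedback.
    static int firstFeedbackMinutes(int[] stageMinutes) {
        return stageMinutes[0];
    }
}
```

Take a made-up 20 minute monolithic build split into a 4 minute stage and a 16 minute stage, with a 2 minute hand-off. First feedback drops from 20 minutes to 4, while the total rises from 20 minutes to 22.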

Tool Support

Tooling can be complex. Simple straight-line chains can be built relatively easily using most continuous build systems. For example, a common approach is to have one build check in some artifact which is the trigger point for another Continuous Build to run. Such approaches have the downside that the chain isn’t explicitly modelled, and reporting of the current state of the chain ends up having to be jury-rigged, typically through custom dashboards. More complex still is dealing with branching chains.

Continuous Build systems have got more mature of late, with many of them supporting simple Chained Continuous Builds out of the box. TeamCity, Hudson, Cruise and others all have some form of (varying) support. Cruise probably has the best support for running stages in parallel (caveat: Cruise is written by ThoughtWorks, the company I currently work for), and has some of the better support for visualising the chains, but given the way all of these tools are moving, expect support in this area to get much better over time.

Football - from flickr user mosilager

Have you ever watched young children play football (or Soccer for our Atlantic cousins)? During the game, you can be certain of one thing – most of the team on both sides will be doing nothing but chasing the ball. There is no thought about the bigger picture, no tactical decision making (let alone anything as grand as strategy). The only thought on everyone’s mind is that “We need to get the ball”.

This thinking in children is understandable. Less clear can be the basis for this kind of behaviour within an organisation.

Typically, the ‘let’s drop everything and Chase The Ball’ mentality comes from organisations which are primarily reactive, rather than proactive. Companies which exist almost always in crisis mode typically have a reactive attitude – working in teams like this can feel like you’re being buffeted by forces outside of your control, running from one disaster to another, never making progress towards where you need to go.

Small organisations that have got big often never get out of their reactive, firefighting mentality. All that happens is that more people are fighting the same fire.

To be clear, I’m not saying that having a single focus is the problem. I’m talking about a team or organisation fixating on a single focus which isn’t actually aligned with the longer term objectives. A bunch of kids chasing the ball need to think about winning the game, not getting the ball. Likewise an organisation needs to have a clear line of sight to what its goals are, and always understand how what it is doing now gets it to where it wants to go.

There will always be urgent, short term things that need to be addressed. The important thing is that they are dealt with in proportion, without losing sight of the overall objectives. Ring-fence team members (making sure they aren’t always the same people) to Chase The Ball, and leave the larger organisation to focus on winning the game, or better yet winning the league.

Update: Fixed typo, thanks Ben and Julian!

Clock - from flickr user laffy4k

Anyone who has worked in a team which uses a Continuous Build inevitably starts to learn about the cost of a long running build:

  • More time between checkin and a report of a failure
  • Higher chance of Continuous Build containing multiple checkins, increasing the chance of an integration break and complicating rollback
  • Fixing a build broken by a checkin made much earlier disrupts whatever is being worked on now, leading to a reduction in productivity

There are other ‘build’ times to be aware of. A long Checkin Gate build leads to an increased chance of someone else checking in before you, increasing the chance of an integration break when you do check in. It also disrupts the developer’s normal flow – they cannot easily work on new code, so effectively have to down tools while waiting for the Checkin Gate to finish. You also need to consider the time taken to run a single test – be it a small-scoped unit test, or a larger end-to-end test.

No matter what the build is, a long build interrupts programmer flow, decreasing focus, and therefore decreasing productivity.

Different Builds, Different Limits

As a team, you should decide on an acceptable Build Time Limit for each ‘build’ – for example individual tests, Checkin Gates, and stages in your continuous build. You may even consider failing these builds if those time limits are exceeded. Setting the Build Time Limit at the right level – and keeping it there – will help keep productivity high.

Different builds get run with different frequencies. The more often a build is run, the faster it needs to be. Experience suggests the following time limits:

  • Single small-scoped unit test – sub-second
  • End-to-end test – a few seconds
  • Checkin Gate – 30 seconds to a couple of minutes at most
  • Continuous Build – a handful of minutes
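A limit check of this kind could be sketched as below. The `BuildKind` names and the exact numbers are assumptions picked to fit the rough guidance in the list (for instance, "a few seconds" is taken as 5 seconds); a team would substitute whatever limits it has agreed on.

```java
import java.time.Duration;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch: the BuildKind names and the exact limits are
// assumptions chosen to fit the rough guidance in the list above.
enum BuildKind { UNIT_TEST, END_TO_END_TEST, CHECKIN_GATE, CONTINUOUS_BUILD }

class BuildTimeLimits {
    private final Map<BuildKind, Duration> limits = new EnumMap<>(BuildKind.class);

    BuildTimeLimits() {
        limits.put(BuildKind.UNIT_TEST, Duration.ofSeconds(1));
        limits.put(BuildKind.END_TO_END_TEST, Duration.ofSeconds(5));
        limits.put(BuildKind.CHECKIN_GATE, Duration.ofMinutes(2));
        limits.put(BuildKind.CONTINUOUS_BUILD, Duration.ofMinutes(5));
    }

    // True when the measured duration is over the agreed limit; a build
    // script could use this to fail the build or simply to report it.
    boolean exceeded(BuildKind kind, Duration actual) {
        return actual.compareTo(limits.get(kind)) > 0;
    }
}
```

Whether an exceeded limit fails the build or only raises a warning is a team decision; the check itself is the same either way.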

When your Continuous Build is part of a larger Build Pipeline, you may find it useful to set Build Time Limits for each stage in the pipeline. One might argue that enforcing Build Time Limits for each stage of a Build Pipeline – manual or automated – may be overkill, but having some reporting of when a limit is exceeded will help directly highlight bottlenecks in creating production deployable software.

Team Ownership

Teams must take ownership of ensuring that the Build Time Limit is enforced. Further, they should always look for opportunities to reduce it. Any decision to increase any Build Time Limit should be taken by the whole team – likewise any decrease in a Build Time Limit which decreases test coverage should be agreed by all. Everyone should, however, be empowered to look for quick wins.

Some teams find the need for a Build Tzar/Build Cop role – someone who is in charge of the health of the build. I consider such roles short term measures only, and they should certainly be considered an anti-pattern if they exist for any length of time. At the extreme end of this spectrum is the dedicated build team. Empowering the whole team is key.

Making Things Faster

There are a number of ways of making individual tests fast, which will depend both on the nature of the technology being used and the way it is being used. Consider making a Checkin Gate fast using a Movable Checkin Gate. Long Continuous Build times can be mitigated through the use of a Chained Continuous Build, perhaps as part of a larger Build Pipeline.

You may also want to simply remove tests that are slow but provide little coverage. Often, it may even be the case that slow running tests represent a performance issue in the system itself.

Some teams have also shown significant speed improvements by using the right hardware – such as faster CPUs, RAM disks or SSDs. However, while simply throwing hardware at the problem can help speed a Continuous Build up, it does little to affect the build time on local development machines – a situation where your continuous build is faster than your local development build is the opposite of what you want.

Further Reading

For more concrete evidence on how build times can influence the productivity of teams, Graham Brooks’ paper for Agile 2008, Team Pace – Keeping Build Times Down, details experiences of working with two different teams and the impact of long (and short) build times on the development team. Thanks also go to Graham for reviewing an earlier draft of this article.

The reason so many New Year Resolutions get dropped is that people start doing something out of the ordinary (for them) in order to institute a change, but never make that change a habit. It’s the reason dieting does not work – you shouldn’t go on a diet, you should change your diet. The former implies a one-off activity that will somehow leave you better off – and it may for a short period of time. The latter states that you will change your habits, so that from this point forward you will do something differently.

When people embark on a Great Rewrite, they are undertaking the equivalent of a crash diet. Sure, you lose a few pounds, but when the diet has finished and the old habits come back, all those pounds come back too. Rather than kid yourself that by starting afresh you’ll learn from your old mistakes, start making changes in what you do now. In other words work to change the diet you have, the system you have – that will lead to habit change which will stick. It’ll also force you to deal with the problems you caused in the first place.

Working to change the system you have now has other benefits – other than being more likely to institute a change that sticks. It makes it easier for the whole team to be involved with the change, rather than leaving some people supporting the old system. It allows you to trade-off delivering new features and bug-fixes against architectural changes. But perhaps most importantly it helps create a team which understands that fixing a situation – improving it for the better – is possible and achievable.

It has come to this. After many years of mis-directed mail, I have finally decided to put pen to paper (well, photon to monitor, but you get the idea) and state that I Am Not Sam Newman.

World, hear me now. It is possible – nay likely – that more than one person has the same combination of first and last names as another individual. We know for example that there are at least 54 Dave Gormans in the world. That bloke went to the trouble of creating an entire TV series about the fact that the whole first name/surname thing doesn’t guarantee a unique identifier for human beings. The Chinese, to their credit, worked this out a while ago.

Now I’m trying to be nice about it. I have decided not to publish the emails from people asking me if I want to do documentary voice overs, well-wishers hoping my testicles get better soon (well, I think they meant prostate), or the offers to speak on the corporate circuit about my hilarious non-pc anecdotes about how I once called someone a monkey. Others, in a similar position to me, have very much gone on the offensive in this regard, but I’m not quite as funny as Tony Hawks (the comedian, not the skateboarder).

So, oh blogosphere, hear my cry – I Am Not That Sam Newman – the controversial Australian sports personality. And for the record, despite the fact that I live in the UK, I’m not the other Sam Newman either – the Actor known for his voice over work, appearances in Holby City and the forthcoming lead role of Prince Andrei in War & Peace starring Brenda Blethyn and Malcolm McDowell.

I Am this Sam Newman.

But yes, I am related to Paul Newman. Feel free to forward on any royalty cheques my way.

So all the cool Clojure kids keep wanting me to use Emacs. The problem is that I haven’t used Emacs for the last 10 years – since, in fact, I had to support a C application on about 7 different flavours of UNIX. As you can imagine, I’ve since expunged many of those past memories.

My IDE of choice – ever since I joined ThoughtWorks – has been IntelliJ. Yes, I had to spend my time in the wilderness with Eclipse, long enough that I feel well placed to compare the two and consider IntelliJ superior for the languages I use often. La Clojure now seems to play nicely with IntelliJ’s Community Edition, so I’m giving that a try.

Ultimately, I’m learning a new language, one which often requires my brain to work in a quite different fashion than it is used to. As such, I’m trying to limit the number of new things I have to deal with. If, however, I’m missing out on something by not using Emacs, I may be persuaded to give it a go. So can anyone out there tell me what I’m missing?

The Checkin Gate defines a set of tests which need to pass before a developer checks in. Typically, the tests are a subset of the total test suite – selected to provide a good level of coverage, whilst running in a short space of time.

There is an inherent trade-off with a Checkin Gate though – you may end up having blank spots in your coverage of the gate itself, which can increase the frequency of build breakages in your Continuous Build. By applying a Movable Checkin Gate, you attempt to offset this shortcoming by changing what is in the Checkin Gate suite.

Selection Based On Planned Work

Periodically, you assess the kinds of work coming up. If you are using an iterative development process, you may do this at the beginning of each iteration. Based on the kinds of changes the team will be working on during the next period, select tests which cover these areas of code, removing others which cover functionality unlikely to change. The theory is that you are selecting tests that cover areas of code which are most likely to get broken. The tests should be selected such that they don’t exceed your Build Time Limit.

After each movement of the suite driving the Checkin Gate, you can assess its success by looking at the failure rate of the Continuous Build.

The key is to have a series of well categorized tests – tagging could work well here.
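Tag-based selection might look something like the following sketch. The `TaggedTest` class, the tag names and the timings are all assumptions for illustration; a real suite would read tags and durations from its test framework.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: TaggedTest, its tags and its timings are
// assumptions; a real suite would read them from its test framework.
class TaggedTest {
    final String name;
    final Set<String> tags;
    final long millis; // typical run time of this test

    TaggedTest(String name, Set<String> tags, long millis) {
        this.name = name;
        this.tags = tags;
        this.millis = millis;
    }
}

class MovableCheckinGate {
    // Picks tests tagged with an area the team plans to change, in the
    // order given, skipping any test that would push the suite past
    // the Build Time Limit.
    static List<String> select(List<TaggedTest> all, Set<String> plannedAreas, long limitMillis) {
        List<String> gate = new ArrayList<>();
        long used = 0;
        for (TaggedTest test : all) {
            boolean relevant = !Collections.disjoint(test.tags, plannedAreas);
            if (relevant && used + test.millis <= limitMillis) {
                gate.add(test.name);
                used += test.millis;
            }
        }
        return gate;
    }
}
```

The selection is deliberately greedy and order-dependent. Sorting the candidate tests first (for example, fastest first) would be one way to squeeze more coverage into the same limit.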

Selection Based On Build Failure

An alternative technique for selecting the makeup of the Checkin Gate can be based on build failures. If tests not in the Checkin Gate start failing in your Continuous Build, put them into the Checkin Gate suite, swapping out other tests to keep you below your Build Time Limit.
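A failure-driven gate could be sketched as follows. Which tests to swap out is not specified above, so evicting whichever test has been in the gate longest is purely an assumption of this example, as are the `FailureDrivenGate` class and its method names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: evicting the longest-resident test first is an
// assumption; the text does not say which tests to swap out.
class FailureDrivenGate {
    private final long limitMillis;
    // Insertion-ordered: the first entry has been in the gate longest.
    private final Map<String, Long> gate = new LinkedHashMap<>();

    FailureDrivenGate(long limitMillis) {
        this.limitMillis = limitMillis;
    }

    // Adds a test (for example one that just failed in the Continuous
    // Build), then evicts the oldest tests until the gate is back under
    // the Build Time Limit. The newly added test is never evicted.
    void add(String test, long millis) {
        gate.remove(test); // re-adding moves the test to the newest slot
        gate.put(test, millis);
        Iterator<Map.Entry<String, Long>> oldest = gate.entrySet().iterator();
        while (totalMillis() > limitMillis && gate.size() > 1) {
            oldest.next();
            oldest.remove();
        }
    }

    long totalMillis() {
        long total = 0;
        for (long millis : gate.values()) {
            total += millis;
        }
        return total;
    }

    // Current gate contents, oldest first.
    List<String> tests() {
        return new ArrayList<>(gate.keySet());
    }
}
```

Other eviction rules are just as plausible, such as dropping the tests that have gone longest without failing.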

Updates

Added link to the new Build Time Limit Pattern.

I’m currently working on a personal project by way of learning Clojure – it’s actually a program to match up my itemised phone bill against my list of contacts to help me expense my calls. I find it best to have a real-world problem I need to solve to learn a new programming language. The problem itself is rather dull, but it did give me a chance to consider an issue I’ve hit with many other languages.

One of the core parts of my telephone expense program is the process of normalising phone numbers so I can match them up. What I am trying to do is something along the lines of:

Strip spaces, then add the missing area code, then internationalize it

So in Clojure there are a number of functions I’ve written, each of which takes, and returns, a string (the program is nowhere near finished, so consider this to be virtually pseudo code):

(defn #^String normalize [str]
  (internationalize (add-missing-areacode (strip-spaces str))))

In Java, this would look like:

public String normalize(String str) {
  return internationalize(addMissingAreaCode(stripSpaces(str)));
}

The problem is that I, and most of the western world, read from left to right – with both Java and Clojure I’m having to read from right to left to determine what is being done. One system I use frequently has a construct which matches what I’m after – UNIX:

strip-spaces "44 1230 9183" | add-area-code | internationalize

So what other languages support this kind of construct? I suspect I could coax Scala into doing something like this, and it seems that it is right up Python’s alley (Django’s excellent templating system has filters which do exactly that). But if I want to use Clojure, am I stuck with this inside-out programming model? What other JVM-based languages would help me here – Ioke perhaps? It seems right up AINC’s alley, but that syntax makes me want to cry…

Update 11 Jan 2010: Thanks to Matt for pointing me towards Clojure’s ‘->‘ macro. This looks pretty close to what I’m after. So I *think* I should be able to do something along the lines of:

(-> phone-number strip-spaces add-missing-areacode internationalize)

Which is very cool.
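For comparison, Java versions from 8 onwards (released well after this post) allow a similar left-to-right reading through `Function.andThen`. In the sketch below the three step implementations are stand-ins that assume UK numbers and a default area code of 020; they are not the rules used by the program described above.

```java
import java.util.function.Function;

// Hypothetical sketch: the three steps are stand-ins that assume UK
// numbers and a default area code of 020; they are not the real rules.
class PhoneNumbers {
    static String stripSpaces(String number) {
        return number.replace(" ", "");
    }

    // Assumes a number that does not start with 0 lacks its area code.
    static String addMissingAreaCode(String number) {
        return number.startsWith("0") ? number : "020" + number;
    }

    // Assumes a UK number: swaps the leading 0 for the +44 country code.
    static String internationalize(String number) {
        return number.startsWith("0") ? "+44" + number.substring(1) : number;
    }

    // Reads left to right: strip spaces, then add the area code,
    // then internationalize.
    static final Function<String, String> NORMALIZE =
        ((Function<String, String>) PhoneNumbers::stripSpaces)
            .andThen(PhoneNumbers::addMissingAreaCode)
            .andThen(PhoneNumbers::internationalize);
}
```

Calling `PhoneNumbers.NORMALIZE.apply("020 7946 0018")` applies the steps in the order they are written, giving `+442079460018` under the assumptions above.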

There is a rustle in the post-it notes. The water cooler ripples. USB-powered missile launchers inexplicably fire, whilst nerf guns jam mid-battle. There is the smell of sulfur in the air. The Great Rewrite Approaches.

The signs were there. Grumbling from the developers – sometimes new to the project. “This code is horrible!”, “Completely unfit for purpose!”, “If only we could start again…”.

Delays to new functionality are laid at the door of the code. The one and only solution now on offer is to rewrite the entire codebase – nothing short of this will help. Eventually, managers are won over, and The Great Rewrite begins.

It is an epic undertaking. Some poor fools have to stay behind and look after the existing system, whilst others forge ahead into a brave, new world, leaving the horrid, old, decrepit and so uncool system behind.

Morale soars – the developers have a spring in their step. The business, initially, is confident. “Don’t worry – the new version is right around the corner!” they are told. Meanwhile support for the existing system is suffering – the team maintaining the existing codebase is a fraction of the size it used to be, and most of the senior technical people have to be on the rewrite.

The natives grow restless – the system they use, day in, day out, isn’t moving on. Feature requests seem to disappear into a black hole. “Soon” they are promised. “Soon, all your dreams will come true! Once The New System is launched, what you want is top of the list!”.

Months pass. And still, the rewrite continues. But it is closer now – inching towards readiness. Finally, long overdue, The New System is ready. The users are excited – all the recent troubles are to cease, as The Great Rewrite is over.

And now, the launch day.

There are bugs. Things that used to work, don’t work any more. There are few, if any, new features. The system is new, but doesn’t offer the users anything new – yet they still have to get to grips with The New System. The disgruntled emails start.

“Don’t worry!” says the Project Manager. “Now The Great Rewrite has finished, the new features will arrive any day now!”

And some of them do. Initially, at least, new features are easier than before to create, and ship. But after time, the same problems with the code base emerge. It turns out that having the same group of people building the same old system without changing their approach or ideas doesn’t lead to a different type of system. They never had to deal with the old issues head-on, they just sidestepped them, pressing on into the greenfield.

More time passes. Features take longer to ship, the code is harder to deal with. And once again, talk turns to another Great Rewrite…