magpiebrain

The site of Sam Newman, a consultant at ThoughtWorks

Posts from the ‘Build And Deployment’ category

I’ll be running my new talk “Designing For Rapid Release” at a couple of conferences in the first half of this year. First up is the delightfully named Crash & Burn in Stockholm, on the 2nd of March. Then later in May I’ll be in Poznań, Poland, for GeeCon 2012.

This talk focuses on the kinds of constraints we should consider when evolving the architecture of our systems in order to enable rapid, frequent release. So much of the conversation about Continuous Delivery focuses on the design of build pipelines, or the nuts and bolts of CI and infrastructure automation. But often the biggest constraint on being able to incrementally roll out new features is the design of the system itself. I’ll be pulling together a series of patterns that will help you identify what to look for in your own systems when moving towards Continuous Delivery.


(Image from http://www.flickr.com/photos/bigduke6/258262809/) I’ve been playing around with both Chef and Vagrant recently, getting my head around the current state of the art regarding configuration management. A rather good demo of Chef at the recent DevOpsDays Hamburg by John Willis pushed me towards Chef over Puppet, but I’m still way too early in my experimentation to know if that is the right choice.

I may speak more later about my experiences with Vagrant, but this post primarily concerns Chef, and specifically some thoughts about repeatability.

Repeatability

Most of us, I hope, check our code in. Some of us even have continuous integration, and perhaps even a fully fledged deployment pipeline which creates packages of our code that have been validated as production ready. By checking in our code, we hope to be able to recreate a build of our software as it was at a previous point in time.

Typically, however, deploying these systems requires a bit more than simply running apt-get install or something similar. Machines need to be provisioned and dependencies configured, and this is where Chef and Puppet come in. Both systems allow you to write code that specifies the state you expect your nodes to be in for your systems to work. To my mind, it is therefore important that the version of the configuration code is kept in sync with your application version. Otherwise, when you deploy your software, you may find that the systems are not configured how you would like.
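To make that concrete, here is a minimal sketch of a Chef recipe describing the desired state of a hypothetical web node (the package, file and template names are mine, purely for illustration):

package "apache2"

service "apache2" do
  action [:enable, :start]
end

template "/etc/apache2/conf.d/myapp.conf" do
  source "myapp.conf.erb"
  notifies :restart, "service[apache2]"
end

Run against a node, this converges the machine towards the declared state: the package installed, the service running, and the config file rendered from the template. A Puppet manifest expresses much the same intent in a different syntax.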

Isn’t It All About Checking In?

So, if we rely on checking our application code in to be able to reproduce a build, why not check our configuration code into the same place? On the face of it, this makes sense. The challenge here – at least as I understand the capabilities of Chef – is that much of the power of Chef comes from using Chef Server, which doesn’t play nicely with this model.

Chef Server is a server which tells nodes what they are expected to be. It is the system that gathers information about your configured systems, allowing discovery via mechanisms like Knife, and it is also how you push configuration out to multiple machines. Whilst Chef Server itself is backed by version control, there doesn’t seem to be an obvious way for an application node to say “I need version 123 of the Web Server recipe”. That means that if I want to bring up an old version of a Web node, it could reach out and end up getting a much newer version of a recipe, thereby not correctly recreating the previous state.
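That discovery is the sort of thing I’d be reluctant to give up. As a sketch (the role and attribute here are hypothetical), a Knife query like the following asks Chef Server for the IP address of every node fulfilling a given role:

knife search node "role:webserver" -a ipaddress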

Now, using Chef Solo, I could check out my code and system configuration together as a piece, then load that onto the nodes I want, but I lose a lot by not being able to do discovery using Knife and similar tools, and I lose the tracking and so on.
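For what it’s worth, a minimal Chef Solo setup along those lines might look like this – the paths and run list are purely illustrative, the point being that everything lives in the same repository as the application code:

# solo.rb, checked in alongside the application
cookbook_path File.expand_path("../cookbooks", __FILE__)

# node.json
{ "run_list": [ "recipe[webserver]" ] }

# applied on the node with:
chef-solo -c solo.rb -j node.json

Checking out an old revision then gives you the matching configuration for free – at the cost of everything Chef Server provides.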

Perhaps there is another way…

Chef does have a concept of environments. With an environment, you are able to specify that a node associated with a specific environment should use a specific version of a cookbook, for example:

name "dev"
description "The development environment"
cookbook_versions  "couchdb" => "11.0.0"
attributes "apache2" => { "listen_ports" => [ "80", "443" ] }

The problem here is that I think the concept of being able to access versions of my cookbooks is completely orthogonal to environments. Let’s remember the key goal – I want to be able to reproduce a running system based on a specific version of code, and identify the right version of the configuration (recipes) to apply for that version of the code. Am I missing something?

I’ve been invited to speak on my colleague Chris Read’s track at QCon London this March. The track itself is chock-full of experienced professionals (including two ex-colleagues), so I fully intend to raise my game accordingly. We’re lucky enough to have Michael T. Nygard speaking too, author of Release It! – perhaps the best book written for software developers in years.

The track – “Dev and Ops – a Single Team” – attempts to address many of the issues software professionals have in getting their software live. It will cover many aspects, both the hardcore technical side and the softer people side. Hopefully it will provide lots of useful information you can take back to your own teams.

My talk – From Dev To Production – will give an overview of build pipelines, and how they can be used to get the whole delivery team focused on the end objective: shipping quality software as quickly as possible. It draws on some of my recent writing on build patterns, and a wealth of knowledge built up inside ThoughtWorks over the past few years.

My experience of QCon SF last year was excellent – I can thoroughly recommend it to any IT professional involved in shipping software. If you haven’t got your ticket already, go and get one now before the prices go up!

(Image: pipeline, from Flickr user Stuck in Customs.) One of the problems quickly encountered when any new team adopts a Continuous Build is that builds become slow. Enforcing a Build Time Limit can help, but ultimately if all of your Continuous Build runs as one big monolithic block, there are limits to what you can do to decrease build times.

One of the main issues is that you don’t get fast feedback to tell you when there is an issue – by breaking up a monolithic build you can gain fast feedback without reducing code coverage, and often without any complex setup.

In a Chained Continuous Build, multiple build stages are chained together in a flow. The goal is for the earlier stages to provide the fastest feedback possible, so that build breakages can be detected early. For example, a simple flow might first compile the software and run the unit tests, with the next stage running the slower end-to-end tests.

With the chain, a downstream stage only runs if the previous stage passed – so in the above example, the end-to-end stage only runs if the build-and-test stage passes.
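To sketch the idea (the task, class and command names below are invented for illustration), the stages and their ordering can be thought of as dependent tasks – in a real setup each stage would typically be a separate build in your CI tool, triggered by the success of the one before:

# Rakefile
task :compile do
  sh "mkdir -p build && javac -d build $(find src -name '*.java')"
end

task :unit_tests => :compile do
  sh "java -cp build:lib/junit.jar org.junit.runner.JUnitCore AllUnitTests"
end

# fast feedback first...
task :commit_stage => [:compile, :unit_tests]

# ...slower end-to-end tests only run once the commit stage has passed
task :end_to_end => :commit_stage do
  sh "java -cp build:lib/junit.jar org.junit.runner.JUnitCore AllEndToEndTests"
end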

Handling Breaks

As with a Continuous Build, you need a clear escalation process by which the whole team understands what to do in case of a break. Most teams I have worked with tend to stick to the rule of downing tools to fix the build if any part of the Continuous Build is red – which is strongly recommended. If you do decide to split your continuous build into a chain, it is important that you don’t let the team ignore breakages that happen further along the chain.

Build Artifacts Once vs Rebuild

It is strongly suggested that you build the required artifacts once, and pass them along the chain. Rebuilding artifacts takes time – and the whole point of a chained build is to improve feedback. Additionally, getting into the habit of building an artifact once, and once only, will help when you start considering creating a proper Build Pipeline (see below).
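A sketch of what that can look like, building on the commit_stage task from the earlier sketch (the build number variable and artifact host are hypothetical): the first stage packages the artifact exactly once, stamps it with the CI build number, and publishes it somewhere later stages can fetch it from rather than rebuilding it:

task :package => :commit_stage do
  build_number = ENV.fetch("BUILD_NUMBER", "dev")
  sh "tar czf myapp-#{build_number}.tar.gz build/"
  sh "scp myapp-#{build_number}.tar.gz artifacts.example.com:/srv/artifacts/"
end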

And Build Pipelines

Note that a chained build is not necessarily the same thing as a Build Pipeline. A Chained Continuous Build simply represents one or more Continuous Builds in series, whereas a Build Pipeline fully models all the stages a software artifact moves through, from development right through to production. One or more Chained Continuous Builds may form part of a Build Pipeline, and a simplistic Build Pipeline might consist of nothing more than Chained Continuous Builds, but Build Pipelines will often incorporate activities more varied than compilation or test running.

Fast Feedback vs Fast Total Build Time

One thing to note: by breaking a big build up into smaller sections to get faster feedback, you may, counterintuitively, end up increasing the overall build time. Building and passing artifacts from one stage to another adds time, as does dispatching calls to build processes further down the chain. This balance has to be struck deliberately – be conservative in the splits you make, and always keep an eye on the total duration of your build chain.

Tool Support

Tooling can be complex. Simple straight-line chains can be built relatively easily using most continuous build systems. For example, a common approach is to have one build check in some artifact which acts as the trigger for another Continuous Build to run. Such approaches have the downside that the chain isn’t explicitly modelled, and reporting of the current state of the chain ends up being jury-rigged, typically through custom dashboards. Branching chains are more complex still.

Continuous Build systems have become more mature of late, with many of them supporting simple Chained Continuous Builds out of the box. TeamCity, Hudson, Cruise and others all have some form of (varying) support. Cruise probably has the best support for running stages in parallel (caveat: Cruise is written by ThoughtWorks, the company I currently work for), and some of the better support for visualising the chains, but given the way all of these tools are moving, expect support in this area to get much better over time.

(Image: clock, from Flickr user laffy4k.) Anyone who has worked in a team which uses a Continuous Build inevitably starts to learn about the cost of a long-running build:

  • More time between checkin and a report of a failure
  • Higher chance of a Continuous Build containing multiple checkins, increasing the chance of an integration break and complicating rollback
  • Fixing a break caused by a checkin made much earlier takes more effort, leading to a reduction in productivity

There are other ‘build’ times to be aware of. A long Checkin Gate build increases the chance of someone else checking in before you, which in turn increases the chance of an integration break when you do check in. It also disrupts developers’ normal flow – they cannot easily work on new code, so effectively have to down tools until the Checkin Gate has finished. You also need to consider the time taken to run a single test – be it a small-scoped unit test or a larger end-to-end test.

No matter what the build is, a long build interrupts programmer flow, decreasing focus, and therefore decreasing productivity.

Different Builds, Different Limits

As a team, you should decide on an acceptable Build Time Limit for each ‘build’ – for example individual tests, Checkin Gates, and stages in your continuous build. You may even consider failing these builds when the time limits are exceeded. Setting the Build Time Limit at the right level – and keeping it there – will help keep productivity high.

Different builds get run with different frequencies. The more often a build is run, the faster it needs to be. Experience suggests the following time limits:

  • Single small-scoped unit test – sub-second
  • End-to-end test – a few seconds
  • Checkin Gate – 30 seconds to a couple of minutes at most
  • Continuous Build – a handful of minutes

When your Continuous Build is part of a larger Build Pipeline, you may find it useful to set Build Time Limits for each stage in the pipeline. One might argue that enforcing Build Time Limits for each stage of a Build Pipeline – manual or automated – is overkill, but having some reporting of when a limit is exceeded will directly highlight bottlenecks in creating production-deployable software.
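One simple way to get that reporting (the limit value and task names here are illustrative, and the real build task is assumed to be defined elsewhere) is to wrap a stage in a task that times it and fails when the agreed limit is exceeded:

CONTINUOUS_BUILD_LIMIT = 5 * 60  # seconds, agreed by the team

task :timed_continuous_build do
  started = Time.now
  Rake::Task[:continuous_build].invoke   # the actual build task, assumed to exist
  elapsed = (Time.now - started).round
  raise "Build took #{elapsed}s, over the #{CONTINUOUS_BUILD_LIMIT}s limit" if elapsed > CONTINUOUS_BUILD_LIMIT
end

Whether you fail the stage outright or merely report the breach is a team decision – but either way the bottleneck becomes visible.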

Team Ownership

Teams must take ownership of ensuring that the Build Time Limit is enforced, and should always look for opportunities to reduce the limits further. Any decision to increase a Build Time Limit should be taken by the whole team – likewise, any decrease in a Build Time Limit which reduces test coverage should be agreed by everyone. Everyone should, however, be empowered to look for quick wins.

Some teams find the need for a Build Tzar/Build Cop role – someone who is in charge of the health of the build. I consider such roles to be short-term measures only; they should certainly be considered an anti-pattern if they exist for any length of time. At the extreme end of this spectrum is the dedicated build team. Empowering the whole team is key.

Making Things Faster

There are a number of ways of making individual tests fast, which will depend both on the nature of the technology being used and the way it is being used. Consider making a Checkin Gate fast using a Movable Checkin Gate. Long Continuous Build times can be mitigated through the use of a Chained Continuous Build, perhaps as part of a larger Build Pipeline.

You may also want to simply remove tests that are slow but provide little coverage. Often, it may even be the case that slow running tests represent a performance issue in the system itself.

Some teams have also seen significant speed improvements by using the right hardware – such as faster CPUs, RAM disks or SSDs. Simply throwing hardware at the problem can help speed a Continuous Build up, but it does little to affect the build time on local development machines – and a situation where your continuous build is faster than your local development build is the opposite of what you want.

Further Reading

For more concrete evidence on how build times can influence the productivity of teams, Graham Brook’s paper for Agile 2008, Team Pace – Keeping Build Times Down, details experiences of working with two different teams and the impact of long (and short) build times on the development team. Thanks also go to Graham for reviewing an earlier draft of this article.

The Checkin Gate defines a set of tests which need to pass before a developer checks in. Typically, the tests are a subset of the total test suite – selected to provide a good level of coverage, whilst running in a short space of time.

There is an inherent trade-off with a Checkin Gate though – you may end up with blind spots in the coverage of the gate itself, which can increase the frequency of build breakages in your Continuous Build. By applying a Movable Checkin Gate, you attempt to offset this shortcoming by changing what is in the Checkin Gate suite.

Selection Based On Planned Work

Periodically, you assess the kinds of work coming up. If you are using an iterative development process, you may do this at the beginning of each iteration. Based on the kinds of changes the team will be working on during the next period, select tests which cover these areas of code, removing others which cover functionality unlikely to change. The theory is that you are selecting tests that cover areas of code which are most likely to get broken. The tests should be selected such that they don’t exceed your Build Time Limit.

After each movement of the suite driving the Checkin Gate, you can assess its success by looking at the failure rate of the Continuous Build.

The key is to have a series of well categorized tests – tagging could work well here.
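As a sketch of what tagging might look like (using RSpec tags; the functional area names are made up), each test declares the area it covers, and the gate runs only the tags chosen for the current iteration:

# spec/checkout_spec.rb
describe "checkout", :orders => true do
  it "totals the basket" do
    # ...
  end
end

# the checkin gate for this iteration:
rspec --tag orders --tag payments spec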

Selection Based On Build Failure

An alternative technique for selecting the makeup of the Checkin Gate can be based on build failures. If tests not in the Checkin Gate start failing in your Continuous Build, put them into the Checkin Gate suite, swapping out other tests to keep you below your Build Time Limit.

Updates

Added link to the new Build Time Limit Pattern.

Recently, both Paul Julius and Chris Read pointed out that I was perhaps the first person to document the concept of build pipelines, at least in terms of how it relates to continuous integration and the like. As it turns out, the original posts on the subject are from further back than I remembered.

I plan to pull together my previous posts on the subject and update them a little, but in the meantime thought I’d give a bit of background as to where much of this came from.

A Harsh Introduction

My first exposure to continuous integration was being dropped in at the deep end on my first ThoughtWorks project. The project in question was for an electronic point of sale system, and at its peak had over 50 developers in three countries working on it. During this time I started reading up on the topic, specifically Martin Fowler & Matt Foemmel’s paper (Martin has since created an updated version).

Much of the experience on this first, large project was dominated by long, slow build times, caused in part by an inability to separate out activities being performed by individual teams. A full discussion of the things we learnt from that project can certainly wait for another time, but I came out of that experience liking the concept of Continuous Integration, yet feeling incredibly constrained by the actual implementation.

Monkeying Around

Subsequently, I worked as what we used to call a ‘Build Monkey’ at a London-based ISP. My role (which we would now tend to call an Environment Specialist) was typically to identify the causes of build failure, keep the build running smoothly, and manage deployments to a number of different environments. Throughout this time, discussions around the theory of managing Continuous Builds for larger software teams continued – primarily with colleagues like Julian Simpson, Jack Bolles and others.

The challenge we seemed to face, time and again, was how to balance the various activities associated with getting software from developers’ machines into production, all whilst providing the fastest feedback possible.

Typically, we came at the problem from two different directions – in the first instance from the point of view of how to hammer our tools into supporting the kind of processes involved, but the more important angle was understanding what the pipeline – from developer workstation to production – actually was. This is now best thought of in terms of Continuous Deployment – although that topic is far more nuanced than the often simplistic framing of systems where 50 deployments a day is possible, or even desirable.

The Present Day

Since I wrote my original articles, many other people have done work in this area, to the extent that tools like ThoughtWorks’ own Cruise now build support for pipelining and visualisation directly into the tool.

Update 1: Corrected spelling of Paul’s surname – sorry Paul!

When using a Continuous Integration build, before long you’ll break it. Breaking a build is not a bad thing; however, fixing it is typically the team’s top priority. Beyond the shame of being named as the breaker, you then have the hassle of lots of people informing you that you’ve broken it.

As a way of letting the development team know that:

  • You know the build is broken
  • You are fixing it

A simple broadcast mechanism can be highly useful.

Example

Whilst I have seen high-tech solutions recommended, the most effective example of a Build Fix Flag I’ve come across is simply a giant paper flag. It was about 2 1/2 feet tall, and could clearly be seen above the monitors. When a build failure occurred, a quick glance across the floor would indicate whether someone was working on it.

What was nice was that, before long, the same mechanism was being used for notifications about a number of development environments.

Rules for the build fix flag

When using such a flag, we quickly decided on a set of rules as to how to use it:

  1. If you saw a CI build breakage, you looked for the flag
  2. If someone had the flag, you left them alone
  3. If you couldn’t see the flag, you tried to identify the person who made the last check in
  4. If you couldn’t find a likely culprit, you raised the flag and fixed it yourself

When running a suite of tests – either as part of a Continuous Integration build, or as part of a check-in gate – time is the enemy. You are always trying to find the balance between test coverage and the time to complete the entire suite.

Both a check-in gate and continuous integration build have slightly different time pressures. The optimal duration for both will probably vary from team to team. A fish-eye suite is one way of formulating which tests should be included.

In a fish-eye suite, you focus on those tests which provide the best coverage for those areas of functionality currently being developed, whilst those areas not directly being affected have only minimal coverage. The logic goes that tests are there to pick up bugs, and therefore tests which cover the functionality currently being worked on are the most important.

Regularly changing the focus

In an iterative development process, working out when to change the focus of the suite can be easy. At the beginning of each iteration, the team as a whole decides which tests (or groups of tests) should be run in each suite. During a development process with less segmentation, changing the suite on an ongoing basis may be more sensible.

Reactive test suite

Rather than changing the focus of the suite at the start of each iteration, you can also decide to adapt the contents of the suite in certain situations – for example when the total time for the suite to run exceeds an agreed limit, or the number of bugs reported against a functional area increases.

For example, the team may decide that the check-in gate build should take 30 seconds, and the CI build should take no more than five minutes. The moment they take longer, the team as a whole should redefine what tests should be included. When removing tests, the team should look to remove those tests which cover areas of functionality furthest away from those currently being worked on.

Likewise, if the number of bugs being raised against a certain functional area increases, the development team may decide that increasing the test coverage of that area makes sense. Again, the team may have to remove some tests in order to stay within the optimal build time. In that case, tests associated with areas of functionality with low defect rates make good candidates for removal.

Source-Code Management

Being able to create and manage a fish-eye suite presupposes that you group your tests into functional areas. This can be an issue with typical packaging structures, which are defined in horizontal terms, whereas long-running functional tests should cover a vertical slice of functionality.

Even if you aren’t initially going to use fish-eye suites, grouping your tests into functional (vertical) groupings rather than system (horizontal) groupings makes sense, as it allows you to easily adopt this approach later. It also tends to make more sense to testers, who see things in terms of usable functions rather than horizontal tiers.
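As a sketch of how such a grouping can then be exploited (the directory layout and area names are hypothetical), a focused suite can be built from just the functional areas currently being worked on:

require "rake/testtask"

FOCUS_AREAS = %w[orders payments]   # chosen by the team each iteration

Rake::TestTask.new(:focused_suite) do |t|
  t.libs << "test"
  t.test_files = FileList[*FOCUS_AREAS.map { |area| "test/#{area}/**/*_test.rb" }]
end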

It’s common practice within a team to define a set of tasks which should be run by each developer prior to checking in. The purpose of the Check-in Gate is to attempt to ensure that all code satisfies some basic level of quality.

Like all development standards, a check-in gate helps a team stay on the same page. It eases integration, and can help give confidence as to the quality of the code checked in.

Check-in Gate with Continuous Integration

A check-in gate is typically used prior to checking in to a system which uses Continuous Integration – where the tasks run during the check-in gate are the same tasks used as part of the continuous integration build. The main source of embarrassment associated with breaking a CI build tends to come down to the fact that the shamed individual completely forgot to run their check-in gate build.

The importance of speed

Check-in Gates need to be fast. The longer they take to run, the less inclined developers will be to run them – this results either in less frequent check-ins or in developers not running them at all. Fewer check-ins result in more complex (and more error-prone) integrations. Not running the check-in gate at all can lead to a breakdown in code quality, and can be a slippery slope to the gate being abandoned altogether.

Examples

The simplest example of a check-in gate is probably ensuring that the code compiles prior to check-in. More often, the team will decide to run some or all of a test suite. Again, the constraining factor on what you run as part of a check-in gate is typically time – what to run, and how, should always be decided by the team.
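As a final sketch (the task names are again only illustrative), the gate itself can be as simple as a single task that developers run before committing, composed of whatever the team has agreed on:

task :checkin_gate => [:compile, :fast_unit_tests]

Anything more elaborate – a handful of key end-to-end tests, some static analysis – simply becomes another dependency of that task, provided the whole thing stays within the agreed Build Time Limit.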