Recycling your continuous integration builds

October 2, 2018

Background

Having a stable integration branch and using a branch per feature is one way to run a project; we use it to ensure we do code review on every change, and (almost) never have to roll back bad merges (though there are other approaches). If you are doing branch per feature you can test extensively and get branches to the point where you know they build and essentially work before their changes go into your integration branch. Developers can merge the integration branch regularly into their code without wasting time fixing up problems, and you can stabilise your integration branch into a release or service deployment quickly and confidently. If you are using git (and these days that seems to be just about everyone) you probably call your integration branch master, and for brevity I’ll use master to refer to the integration branch below.

Jenkins supports triggering a complex set of builds and tests from a bunch of git branches and pull requests, using the Pipeline Multibranch Plugin. As of Jenkins 2, pull requests can be automatically merged with each new head of their target branch (typically master) and then tested. This means you can thoroughly test every change you make to your project before accepting it into master. For instance you may run thousands of unit tests, verify your code coverage, build some Kubernetes services, push images to a cloud container registry such as ECR, test your external service interfaces, set up a Postgres database with the latest schema, run a quick integration test and record your test reports, then mark the ECR images for promotion to a production service. The pipeline would look something like this:
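(Here is a minimal sketch of such a Jenkinsfile; the stage names and make targets are placeholders rather than our real pipeline.)

pipeline {
  agent any
  stages {
    stage('Unit tests and coverage') {
      steps { sh 'make test coverage-report' }
    }
    stage('Build images and push to ECR') {
      steps { sh 'make images push-images' }
    }
    stage('Deploy services and database to Kubernetes') {
      steps { sh 'make deploy migrate-database' }
    }
    stage('Integration test') {
      steps { sh 'make integration-test record-reports' }
    }
    stage('Promote images') {
      steps { sh 'make promote-images' }
    }
  }
}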

To do all of this requires an extensive set of third party systems, as well as your code, to work perfectly every time. Your build pipeline may grow to the point where it resembles a hundred dominoes all nicely lined up.

That’s awesome, but you are now very much into Lamport’s definition of a distributed system:

A distributed system is a system that prevents you from doing any work when a computer you have never heard of fails.

After a while, you may find yourself running out of patience waiting for your entire pre-merge sequence to run. It is useful to have test results for the latest head of master, so after a pull request gets merged you may want to run the whole pipeline again against the new head of master. If you want to change something that runs quite late in your pipeline, there’s a lot of work that has to be repeated before your changes are reached and tested. Since it often takes a few iterations to get changes right, that can add up to hours to get a change into master. Even if the changes in your pull request are perfect, you may find that some third party service has an error; then all you can do is run the whole pipeline again, and waste more time. Integration can become a huge part of your development effort, and often does.

This article describes various ways we mitigated this overhead on a project that was suffering from this kind of problem. We’re now producing our master builds in two minutes, down from 20+ minutes before we discovered these techniques. Our pull request builds only build what’s actually been changed in that pull request.

Recycling stashes

(easy change, small reward)

First, let’s talk about an easy approach which can help. If the steps of your build use Jenkins stashes to pass files along, you can simply use the pipeline preserveStashes() option. Then, when an intermediate step fails, you can use the Jenkins UI to restart the pipeline from that stage. You’ll need to know what your whole pipeline does and be paying attention, so this might not help all your developers. The pipeline looks the same as before, with some new start points that let you jump to a specific stage.
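A minimal sketch of what this looks like (the stage names, make targets and stash contents are illustrative):

pipeline {
  agent any
  options {
    // Keep stashes from recent completed builds so a restarted stage can pick them up
    preserveStashes(buildCount: 5)
  }
  stages {
    stage('Build frontend') {
      steps {
        sh 'make -C frontend'
        stash name: 'frontend-build', includes: 'frontend/build/**'
      }
    }
    stage('Test frontend') {
      steps {
        unstash 'frontend-build'
        sh 'make -C frontend test'
      }
    }
  }
}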


You can’t recycle a stash between, for instance, different PR builds that happen to build exactly the same component, so this only saves you from re-running the early steps of a single build. Overall this is sometimes worthwhile, but it is far from the best you can do if you can live with a bit more sophistication.

Recycling coverage from master

(fairly easy change, moderate reward)

If you know which files in your repositories contribute to each component of your build then, when you are building pull requests, you can detect which sections of your build have changed relative to master and only run the sections whose input files have changed. For each component, your build can run:

git diff --name-only origin/master

and look for that component's files in the output.

All our mobile app source files are in a directory called frontend, so we can determine if the frontend component has changed when building a pull request by checking the exit code of: git diff --name-only origin/master | grep ^frontend

Then, only build the frontend in a pull request build if that command returns zero (grep exits with zero status when it finds matching changes). You now have a new risk: your component build may depend on files you don’t check with git diff. If you miss a dependency you’ll find that errors only show up in master, where they are much more expensive to fix. So, we arrange to run on a separate build node so that Jenkins isolates our build tree from other builds that may be running, and we delete all the files we don’t think we need. In my case, I wrote a script which uses a config file per section listing all the files in that section of the build. I invoke that script early in the pipeline; on a pull request branch it produces the list of section names that have changed, otherwise it lists all sections. In essence, the test and the delete operations can each be done in a single line of shell. In your Jenkinsfile you’d have a stage to detect whether the frontend has changed:

stage('Check for frontend changes') {
  steps {
    script {
      env.HAVE_FRONTEND_BUILD = sh script: 'git diff --name-only origin/master | grep ^frontend', returnStatus: true
    }
  }
}

Then, later on, you add a when guard to your frontend build, like this:

stage('Frontend') {
  when {
    expression {
      // grep exits with status 0 when it finds changed frontend files
      return HAVE_FRONTEND_BUILD == "0"
    }
  }
  steps {
    timestamps {
      node("ec2-additional") {
        checkout scm
        script {
          sh 'find . -not -path "*/frontend*" -a -not -path "*/.git/*" -type f -delete'
          sh "make -C frontend"
        }
      }
    }
  }
}

Note that grep returns success, exit code zero, when it finds changes, which is why the when expression tests for "0". The ‘checkout scm’ line tells Jenkins to get a fresh copy of the code, and the find command then deletes the files we don’t need. If you use the Blue Ocean view in Jenkins, the Frontend stage is greyed out when the when expression stops it from running.

A colleague recently pointed me at mbt, which looks promising, though I haven’t figured out how to integrate it with Jenkins yet.

Recycling builds between branches

(more effort, bigger reward)

Neither of the tricks above stops you having to rebuild the same code on different branches. For instance, let’s suppose I make a change to one component of a build on a pull request. I shouldn’t have to build all the other components, since they’ll be the same source as the last build of master. For the component I do change, the pull request’s source branch will be merged with master, built and tested, and then merged into master. If no one else touches that section in the meantime, it will still get built again on master, and you’ll be building exactly the same code again. Wouldn’t it be nice if Jenkins spotted that it has already built that exact code and recycled the build results and artifacts?

So, let’s put aside the techniques from the previous two sections and start again from scratch. We want a way to get a fingerprint of one component of a build. Then you can store the build product and/or test results keyed by that fingerprint, and in the common case where Jenkins has seen that source code before, skip the build: fingerprint the component’s sources, ask the artifact server whether a product with that fingerprint already exists, and only build and upload when it does not.

git ls-tree is basically instant, and a file server check can be very fast as well, so you can get through the common path where everything is already on the server in milliseconds.

You might think to use the commit ID as the key for looking up products on the server. However, if merging the pull request into master creates a new merge commit, there’ll be a new commit ID. And different commits on different branches that happen to have exactly the same content for some sections of your build will still have different commit IDs, so you won’t get much benefit from your cache.

Continuing the example of accelerating our frontend build above, in git you can do this:

$ git ls-tree HEAD -- frontend
040000 tree 81c423b27ae58c9a39ac078394138e16e1c2ba82 frontend

That 81c423b27... is the secure hash of all the files in the frontend directory, including its subdirectories. If you need multiple directories, or you want specific files, add them to the git command line and then pipe the result into a secure hash program. (An alternative is to use git log -1 --format=format:%H frontend, which returns the commit ID of the last change to the frontend. However, that will pick up merge commits, so depending on how you handle merges it is likely that you’ll end up rebuilding after a merge anyway.)
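For example, a backend component whose sources are spread over two directories might be fingerprinted like this (a sketch; the directory names are illustrative, not from our project):

script {
  // Hash the combined ls-tree output of both directories into one content hash
  env.BACKEND_CONTENT_HASH = sh(
    script: "git ls-tree HEAD -- backend shared-libs | shasum | cut -d ' ' -f 1",
    returnStdout: true
  ).trim()
}

The Makefile rules later in this article consume a BACKEND_CONTENT_HASH value computed along these lines.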

Now we need a place to store build artifacts. I don’t know how to share artifacts between builds in Jenkins, so we used Sonatype Nexus, but a simple local file server or cloud storage provider would also suffice. The check step becomes:

stage('Check frontend changes build archive') {
  steps {
    script {
      env.FRONTEND_CONTENT_HASH = sh(
        script: "git ls-tree HEAD -- frontend | shasum | cut -d ' ' -f 1",
        returnStdout: true
      ).trim()
      env.HAVE_FRONTEND_BUILD = sh script: 'curl -o /dev/null --head --fail --user "$NEXUS_USER:$NEXUS_PASSWORD" "$NEXUS_URL/repository/build-products/$(id -un)-frontend-$FRONTEND_CONTENT_HASH"', returnStatus: true
    }
  }
}

And the build step becomes:

stage('Frontend') {
  when {
    expression {
      return HAVE_FRONTEND_BUILD != "0"
    }
  }
  steps {
    timestamps {
      node("ec2-additional") {
        checkout scm
        script {
          sh 'find . -not -path "*/frontend*" -a -not -path "*/.git/*" -type f -delete'
          sh "make -C frontend ci"
          sh "tar cvfz build/frontend.tar.gz build/app.ipk build/tests.tap"
          sh 'curl --fail --user "$NEXUS_USER:$NEXUS_PASSWORD" --upload-file build/frontend.tar.gz "$NEXUS_URL/repository/build-products/$(id -un)-frontend-$FRONTEND_CONTENT_HASH"'
        }
      }
    }
  }
}

(Although I put the script actions in Makefiles, since I don’t like too much detail in the Jenkinsfile.)

Now, you can expect the check step to run in milliseconds; git pre-computes directory hashes as part of what goes into the commit ID, and can retrieve them very quickly since it doesn’t even need to examine the entire frontend directory tree. The curl check, with the --fail and --head options, is also very quick even if the build product is huge. So you can run many of these checks at the start of the build, once Jenkins has checked out the source on the master node.
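For instance, carrying on with the illustrative backend component from earlier, a second probe can sit alongside the frontend one in the same early stage:

script {
  env.HAVE_BACKEND_BUILD = sh script: 'curl -o /dev/null --head --fail --user "$NEXUS_USER:$NEXUS_PASSWORD" "$NEXUS_URL/repository/build-products/$(id -un)-backend-$BACKEND_CONTENT_HASH"', returnStatus: true
}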

Recycling a Kubernetes namespace

(an example of avoiding a step that is fast to probe)

We deploy Kubernetes pods for each PR so we can do integration testing. That requires a build stage to set up a namespace. So, we check whether the namespace already exists like this:

script {
  // code to set up access to Kubernetes environment
  env.HAVE_NAMESPACE = sh script: "kubectl get ns ${env.NAMESPACE} | grep Active", returnStatus: true
}

Then, we guard the kubectl command that creates the namespace with a when { } block much like the one above. In this case we could simply tell Kubernetes to create the namespace and rely on it doing nothing if the namespace already exists, so the real reason for this check is that it is simple and quick to do, and it keeps this stage consistent with the other parts of the build pipeline that only run when needed, in contrast to our previous build pipeline, which ran everything every time.
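The guarded stage might then look something like this (a sketch; the stage name is illustrative and your namespace creation step may differ):

stage('Create namespace') {
  when {
    expression {
      return HAVE_NAMESPACE != "0"   // the kubectl/grep check above found no Active namespace
    }
  }
  steps {
    script {
      // code to set up access to Kubernetes environment
      sh "kubectl create namespace ${env.NAMESPACE}"
    }
  }
}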

Recycling database migrations

(an example of avoiding repeating a step that is hard to probe)

We’d love to avoid setting up a database and its schema where possible, but that’s hard to probe directly. Instead, looking at our pipeline, we know that some pieces only run after the migrations are complete. So we check Kubernetes for a running service that the pipeline deploys right after a successful migration: if we see that service running at the expected version, we know the database was set up successfully. Rather than put too much directly in the Jenkinsfile, the check looks like this:

script {
  // code to set up access to Kubernetes environment
  env.HAVE_MIGRATIONS = sh script: 'make -C deploy/backend check-migrations', returnStatus: true
}

which calls a Makefile rule:

IMAGE_TAG=$(shell id -un)-content-$(BACKEND_CONTENT_HASH)
check-migrations:
  if [ "$$(kubectl get deployments aservicethatusesdatabase --ignore-not-found -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="VERSION")].value}')" = "$(IMAGE_TAG)" ]; then \
    echo "Migrations have been done since aservicethatusesdatabase is up at the right version"; \
  else \
    exit 1; \
  fi
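
The corresponding stage in the Jenkinsfile is then guarded just like the earlier ones (a sketch; the stage name and make target are hypothetical):

stage('Database migrations') {
  when {
    expression {
      return HAVE_MIGRATIONS != "0"   // non-zero: check-migrations found no up-to-date service
    }
  }
  steps {
    script {
      sh 'make -C deploy/backend migrate'   // hypothetical target that runs the schema migrations
    }
  }
}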

Recycling ECR images

Amazon Web Services’ Elastic Container Registry (ECR) is a place to store container images that can be picked up by Docker-based systems such as Kubernetes. My project currently deploys ~15 services, each with its own container image in ECR. It takes a couple of seconds to check ECR for each image, so I chose not to do that for every image at the start of the build. Instead, I upload the inputs for the images (an archive of JAR files in my case) to Nexus, keyed by source code content hash, which avoids around 30 seconds of overhead in each build. Then, when it is time to deploy to Kubernetes and we need images, we download the archive and run a Makefile rule:

IMAGE_TAG=$(shell id -un)-content-$(BACKEND_CONTENT_HASH)

.PHONY: %-image
%-image: buildstuffjava build/libs/%-all.jar
  aws ecr describe-images --region $(AWS_REGION) --repository-name $* --image-ids imageTag=$(IMAGE_TAG) && echo ECR already has this $* image || ( \
    docker build -f $*.Dockerfile -t $(REGISTRY)$* -t $(REGISTRY)$*:$(IMAGE_TAG) . && \
    docker push $(REGISTRY)$*:$(IMAGE_TAG))

This approach does mean you don’t get visibility in Jenkins Blue Ocean of which images get built and pushed to Docker, and which get deployed. We could check the 15 images in parallel using different nodes or Makefile parallelism; what’s holding me back is not wanting to make my Jenkinsfile too huge.

Recycling Kubernetes deployments

Finally, we can use the source code content hash to check whether services are already running. Again, in a Makefile we have:

IMAGE_TAG=$(shell id -un)-content-$(BACKEND_CONTENT_HASH)

.PHONY: %-deploy-if-not-running
%-deploy-if-not-running:
  @if [ "$$(kubectl get deployments $* --ignore-not-found -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="VERSION")].value}')" = "$(IMAGE_TAG)" ]; then \
    echo "$* has been deployed at the right version $(IMAGE_TAG)"; \
  else \
    echo "$* has NOT been deployed at the right version $(IMAGE_TAG) so deploying now"; \
    $(MAKE) $*-deploy; \
  fi

.PHONY: %-deploy
%-deploy: %-image build/secrets.yaml
  helm dep update $*-charts --skip-refresh
  helm upgrade ... $*-charts

There’s then a Makefile rule which triggers recursive makes of that rule for each service in the appropriate order, and then runs an integration test. Finally, we upload the integration test results to Nexus, and we have the Jenkinsfile run that make rule and upload the test report, if the report isn’t already on the server.
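
That last stage follows the same pattern as the frontend archive check earlier (a sketch; the stage name, environment variable, make target and report filename are illustrative):

stage('Integration test') {
  when {
    expression {
      return HAVE_INTEGRATION_REPORT != "0"   // set earlier by a curl --head --fail probe against Nexus
    }
  }
  steps {
    node("ec2-additional") {
      checkout scm
      script {
        sh 'make -C deploy/backend deploy-all-and-test'   // hypothetical rule: deploy each service in order, then run the integration test
        sh 'curl --fail --user "$NEXUS_USER:$NEXUS_PASSWORD" --upload-file build/integration-tests.tap "$NEXUS_URL/repository/build-products/$(id -un)-integration-$BACKEND_CONTENT_HASH"'
      }
    }
  }
}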
