A year with Bitbucket Cloud

January 7, 2019

For the whole of 2018 we used Bitbucket Cloud to host a few git repositories on a project with around 40 developers working full time, creating 20000 commits. The service worked pretty well, though:

  • We did end up spending some time developing extra checks and automation.
  • We have concerns around the lack of service availability guarantees.
  • We found it awkward to keep large files out of the repository.

Our initial preference for Bitbucket Cloud over Github came about because we had already chosen Atlassian’s Confluence and JIRA cloud services, and we hoped to find strong integration between Bitbucket Cloud and those other Atlassian services. We also wanted to avoid dealing with too many vendors, and we’d had positive experiences with Bitbucket Cloud on smaller projects.

Stability

Bitbucket Cloud had good enough uptime for us in 2018. It’s not perfect, but Atlassian are good at communicating their status. We noticed a significant fraction of the outages Atlassian announced, the worst of which affected us for a painful few hours. Realistically, I’d guess the availability was still better than we’d get from a single home-made server of our own, unless we also invested in excellent administration plus redundant power, storage and networking.

Atlassian’s position of not guaranteeing uptime is problematic: while we can keep working locally during downtime, we rely on Bitbucket for continuous integration, which is a vital part of our release process. That exposes us to the danger of being unable to make a fast bug fix to our own service.

Pricing

At the time of writing Bitbucket Cloud Premium pricing is $5/user/month, plus $3/user/month for support and SAML. We needed multi-factor authentication and merge checks so the cheaper tiers were not appropriate.

We also considered the pricing of Github. We’d need the Business Cloud tier at $21/user/month, which also includes an agreement about uptime. Amazon AWS CodeCommit is also coming up on my radar and has attractive pricing.

Performance

The speed of cloning, pushing and pulling commits has always been outstanding; we’ve always been limited in this by our internet connections rather than the service itself.

A minor frustration was that, even at its best, Bitbucket Cloud page loads feel a little slow; sometimes they take a few seconds. My previous experience with a locally hosted Bitbucket Server on fast hardware was much smoother.

The 2GB repository limit and trouble with large files

Bitbucket Cloud has a hard repository size limit of 2GB; once you exceed it you can’t push anything new, so when we hit the limit our project ground to a halt. If your repository size exceeds 1GB you get a warning in the web interface. While 1GB is a lot of source code, it is easy to fill if people check in large binaries without considering the consequences, and there is no warning in the Bitbucket web interface when a pull request introduces large new binary files.

The first thing we noticed on hitting this problem was that git pull ran slowly. By that point the big file was in master, 20+ people had pulled it on to their machines, and we were wondering why the office WiFi was running slowly. So we eliminated the file from master, which required a rewrite of master to remove it from history. However, over the next few days various branches appeared from developer machines that still included the large file in their history, so the repository size and pull times shot up again, and master had to be rewritten half a dozen more times. A common confusion across the team was that the working tree can be small after merging in changes that introduced and then deleted the large files, but the history is still cumbersome.

In the end, I added three new checks to our continuous integration builds.

First, we run:

git ls-tree -l -r HEAD | grep -E '^[0-9]+ \w+ [0-9a-f]+\s+([0-9]{6}[0-9]+)\s+(.*)$'

which lists files larger than about a megabyte (a size field of seven or more digits). We then check the matches against a whitelist, and fail the build if any large files not on the whitelist have been introduced.
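As an illustration, a minimal sketch of that whitelist check might look like the following; the whitelist file name (large-file-whitelist.txt, one allowed path per line) and the plumbing are assumptions rather than our exact script:

  #!/bin/sh
  # Fail the build if any tracked file of ~1MB or more is not whitelisted.
  # large-file-whitelist.txt is a hypothetical file of allowed paths,
  # assumed to exist in the repository root.
  offenders=$(git ls-tree -l -r HEAD \
    | grep -E '^[0-9]+ \w+ [0-9a-f]+\s+[0-9]{6}[0-9]+\s' \
    | cut -f2 \
    | grep -vxFf large-file-whitelist.txt || true)
  if [ -n "$offenders" ]; then
    echo "Large files not on the whitelist:"
    echo "$offenders"
    exit 1
  fi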

Then, we run:

git log --stat | grep -E '^ (.*)\| Bin [0-9]+ -> ([0-9]{6}[0-9]+) bytes$'

which searches for files larger than about a megabyte in history that is not already in master. When people hit this check, they either get help from a friendly git expert to clean up their branch history with a tool such as BFG, or they carefully repeat the work from scratch on top of the latest master.
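To give a flavour of the clean-up, the usual BFG workflow on a throwaway mirror clone looks something like this; the 1M threshold and the repository name are placeholders:

  # Work on a fresh mirror clone so nothing is lost if the rewrite goes wrong.
  git clone --mirror git@bitbucket.org:example/repo.git
  # Strip every blob bigger than 1MB from history (BFG protects HEAD by default).
  java -jar bfg.jar --strip-blobs-bigger-than 1M repo.git
  cd repo.git
  # Expire the old objects and repack so the size actually drops locally.
  git reflog expire --expire=now --all
  git gc --prune=now --aggressive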

Finally, we check that certain commits which we know are large and have already been eliminated are not reintroduced, and fail the build if a bad commit is back in history, i.e. if the following command exits successfully:

git merge-base --is-ancestor $BADCOMMIT HEAD
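Since there can be more than one such commit, the check is naturally a loop over their IDs; a sketch, with placeholder commit IDs:

  #!/bin/sh
  # Fail the build if any known-bad commit has crept back into history.
  BAD_COMMITS="<bad-commit-id-1> <bad-commit-id-2>"
  for c in $BAD_COMMITS; do
    if git merge-base --is-ancestor "$c" HEAD; then
      echo "Banned commit $c found in history"
      exit 1
    fi
  done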

Bitbucket Cloud does not offer a way to force an immediate compaction of the repository. So, when we cleared up large files that had got into history on various branches, we had to wait a few hours before the repository size would drop, which is stressful when you’ve potentially got 40 developers blocked from collaborating due to hitting the hard 2GB repo size limit.

One option that Bitbucket Cloud does support is Git LFS (Large File Storage). While we haven’t tried it, on first glance it still requires you to opt specific filename patterns into LFS, so it wouldn’t help with people accidentally introducing huge files.
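For reference, opting file patterns into LFS is done ahead of time via .gitattributes, along these lines (the patterns are only examples):

  # Enable LFS in the clone and track some binary formats.
  git lfs install
  git lfs track "*.psd"
  git lfs track "*.zip"
  # The tracking rules live in .gitattributes and must be committed.
  git add .gitattributes
  git commit -m "Track large binary formats with Git LFS"

Anything that doesn’t match a tracked pattern, such as an unexpected 800MB file with a different extension, still goes straight into the normal repository.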

No user-defined pre-receive hooks

My first thought after someone accidentally pushed an 800MB file into our git repository was to configure the server to refuse very large pushes. However, after some research I was disappointed to discover that Bitbucket Cloud supports neither commit size checks nor user-supplied pre-receive hooks. This means you cannot enforce standards, such as no large files, on the git server itself, beyond a few built-in checks such as requiring an issue ID in each commit message. As a workaround, we check for large commits and commit signing in the continuous integration builds, and require a passing pull request build before merging to master.

Traditional git pre-receive hooks introduce a way for pushes to fail (or run slowly), and I can understand the Bitbucket Cloud developers not wanting to provide a sandbox for running user-supplied code. Still, now that functions-as-a-service are getting so much attention, an aversion to executing user-supplied code seems a bit old-school.
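A CI-side signing check could be as simple as asking git for the signature status of each new commit; a sketch, assuming the trusted public keys are already imported into the build agent’s GPG keyring:

  #!/bin/sh
  # %G? prints G for a good signature; anything else (no signature, bad or
  # untrusted key) fails the build. origin/master..HEAD covers the new commits.
  bad=$(git log --format='%H %G?' origin/master..HEAD | grep -v ' G$' || true)
  if [ -n "$bad" ]; then
    echo "Commits without a good GPG signature:"
    echo "$bad"
    exit 1
  fi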

It seems that the standard Github service does not support pre-receive hooks either, though you can use pre-receive hooks on Github Enterprise, where you host the entire server yourself. Alternatively, Amazon AWS CodeCommit supports Lambda triggers, though I haven’t tried it, so it isn’t clear to me whether a Lambda can be used to reject incoming commits.

No support for GPG signing

There’s no way to upload public keys to Bitbucket Cloud and have it check that commits are GPG signed with keys you’ve said you trust. This is available if you host Bitbucket Server yourself, and having found the self-hosted feature I was disappointed that my assumption that we’d get all the Bitbucket Server features was unfounded. I also didn’t find a way for Bitbucket Cloud to sign the merge commits it creates itself, though the value of such signatures would be questionable: since the tree resulting from the merge is completely predictable, you can at least repeat the merge locally to verify that you get the same result.
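Repeating such a merge locally and comparing trees might look like this sketch, where MERGE is a placeholder for the merge commit Bitbucket created:

  #!/bin/sh
  MERGE=<merge-commit-id>
  # Re-run the merge from the same two parents on a detached HEAD...
  git checkout --detach "$(git rev-parse "$MERGE^1")"
  git merge --no-ff --no-edit "$(git rev-parse "$MERGE^2")"
  # ...and check that the resulting tree is identical to the one on the server.
  if [ "$(git rev-parse "HEAD^{tree}")" = "$(git rev-parse "$MERGE^{tree}")" ]; then
    echo "Merge tree matches"
  else
    echo "Merge tree differs"
  fi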

Pull request verification

We believe it is important to keep our master branch stable. Therefore, we wanted all changes to master to be reviewed and tested against the latest version of master. So we arranged for Jenkins to build pull requests automatically and post their status back to Bitbucket, which worked well. Bitbucket can also be set to disregard a pull request’s build success and require fresh review if new commits are added to the source branch.

The remaining risk became clear when we root-caused one master failure to two separate commits that did not textually overlap and each worked independently, but failed when combined. Unfortunately we did not find a good way to address that risk within Bitbucket. We ended up implementing a Python program that uses the well designed and documented Bitbucket API to scan for approved pull requests, dives into the Jenkins build log, and merges the change only if the build used the latest source and target versions.
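The relevant parts of the Bitbucket Cloud 2.0 API are straightforward to call; a rough sketch of the two requests involved (the workspace, repository, pull request ID and credentials are placeholders, and the real automation also inspected the Jenkins build log before merging):

  #!/bin/sh
  # List open pull requests, to find approved ones worth considering.
  curl -s -u "$BB_USER:$BB_APP_PASSWORD" \
    "https://api.bitbucket.org/2.0/repositories/$WORKSPACE/$REPO/pullrequests?state=OPEN"
  # Merge a specific pull request once its build is known to be up to date.
  curl -s -u "$BB_USER:$BB_APP_PASSWORD" -X POST \
    "https://api.bitbucket.org/2.0/repositories/$WORKSPACE/$REPO/pullrequests/$PR_ID/merge"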

Even if we didn’t care about the target branch version, Bitbucket Cloud does not support automated pull request merging, so we’d still have to either come back and merge pull requests manually once the builds are done, or automate that outside Bitbucket somehow (e.g. with the Jenkins Git plugin).

Lack of service API keys

Service API keys were experimentally supported and then deprecated. Instead, you get API access using a key associated with an Atlassian account, which associates all API actions with specific users. Ideally I’d like to differentiate between actions done by people and those done by automation, and we didn’t want automation failing when an individual left the project. Because the automation I wrote to merge pull requests used my API key, a number of people were confused into thinking I was manually accepting each pull request. We could potentially create completely new Atlassian accounts for automation, with their own passwords and second-factor authentication. However, we had decided to only give access to accounts with email addresses in certain domains, and it was not easy to get a service email address created in those domains.

Cleaning up old branches

After a year we ended up with hundreds of branches on the server and no idea who created each one. Part of the proliferation is that you need a branch for each pull request, and it’s easy to miss the tick box on a pull request that deletes the branch when the pull request is closed. On my next project I’d look at locking down branch creation and instead having people create branches in per-user forks of the repository, with continuous integration building from the forks; hopefully it would then be much easier for people to delete old branches they know they created. Lots of branch clutter makes it harder to spot important branches and slows down some automation we wrote that scans over branches.
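One small helper for surveying the clutter is listing remote branches oldest-first with their last commit date and author; something along these lines:

  # List remote branches by age with the last commit date and author,
  # which helps work out which branches are abandoned.
  git for-each-ref --sort=committerdate refs/remotes/origin \
    --format='%(committerdate:short) %(authorname) %(refname:short)'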

Would we use Bitbucket Cloud again?

Both Github and Bitbucket Cloud are solid, have the features we need and most of what we want, and perform well enough. Having access to robust source code management servers run by professional organisations at a reasonable price compared to the effort of running your own infrastructure gives me one less worry when setting up a project team.

Early in 2018 we were missing federated authentication support in Bitbucket Cloud, but now that SAML support is available, one of my main concerns with Bitbucket Cloud may have been addressed. The solutions to accidentally adding large files aren’t ideal, but they are easy to implement again.

Assuming a cloud service is appropriate for project source code, the key question becomes whether a project needs the service level agreement on uptime that Github provides, at nearly three times the cost. I’d also consider CodeCommit, given its attractive pricing, authentication integration with AWS IAM, and support for Lambda triggers.

2 Comments

  1. Ethan Murray says:

    I’m curious how the actual uptime of BitBucket Cloud vs. Github compares.

    In general in my experience as an IT administrator, I’ve found a slight negative correlation between contractual SLAs and actual quality of service. I find that companies that live or die by their reputation for quality service and uptime can do a pretty good job at it, while those who sell their services by reassuring nervous CIOs with SLAs are actually able to use contractual fine print and under-delivery to make the SLA penalties minimal, so they can get away with poor service and no meaningful recourse from the SLAs.

  2. Ian Rogers says:

    Dikon,

    your solution to the 800MB file problem is characteristically epic! Given what project that was on I can imagine that week was sooo joyful 🙂 When I first read there was a 2GB hard limit I thought “ouch” – but the alternative of git merely running slowly would have been a drag on everyone but would have never prompted a proper solution I think.
