How software systems learn

March 14, 2018

As part of a recent LShift tech meeting, we watched the first episode of Stewart Brand's series How Buildings Learn, as a way to prompt discussion on what it means for software systems to be 'livable'.

Flexible foundations

Buildings, like software systems, have parts that are more or less changeable. For example, where buildings have relatively fixed structures such as foundations or load-bearing walls, software has storage systems such as databases, or deployment infrastructure such as server farms.

Conversely, there are parts that can (and do) change more frequently, such as furniture arrangements and decorations like wall hangings or seasonal items, just as software has the implementations of its internal modules, its front-end designs, or even just how people make use of the structures presented.

For example, software that tends to form an ecosystem, like Emacs, the Bourne shells or Kubernetes, tends to provide a kind of platform: enough core functionality and extensibility that it is easily customisable to how the user wants to use the system, such as defining keyboard macros or modes in Emacs, or customising shell prompts.

They can also be built on top of, to create things such as the Gnus email client or ad-hoc processing tools (e.g. Doug McIlroy's word count program).
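McIlroy's original was a six-command shell pipeline; as a rough sketch of the same idea, here is an equivalent in Python (the function name and sample text are illustrative):

```python
import re
from collections import Counter

def word_count(text, n):
    """Return the n most frequent words in text, in the spirit of
    McIlroy's classic pipeline (tr | sort | uniq -c | sort -rn | sed Nq)."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words).most_common(n)

# The two most common words in a small sample:
print(word_count("the quick brown fox jumps over the lazy dog the", 2))
# → [('the', 3), ('quick', 1)]
```

The point is less the implementation than the composability: small, general tools built on top of a platform.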

Kubernetes is another good example of a system that can be organised and modified to suit its users' needs. The user specifies how their software should be deployed in terms of declarative resources, such as a deployment of a set of containers, persistent volumes and claims for storage, and services that provide a uniform way for workloads to find each other.

It also provides easy extensibility, so custom resources can manage third-party resources, or operators can provide higher-level abstractions over stateful services.
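As a sketch, a minimal pair of such declarative resources might look like the following (the names, image and ports are illustrative, not from the original post):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app            # illustrative name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: web
          image: example/app:1.0   # illustrative image
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: example-app
spec:
  selector:
    app: example-app
  ports:
    - port: 80
      targetPort: 8080
```

The Deployment declares what should run; the Service gives other workloads a stable name to find it by.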

Compare this to systems that use docker-compose files, built around the container as the primary abstraction: additional functionality like service discovery or storage can feel like a second-class citizen, which can make it more difficult to manage a cluster of stateful services.

Applications as items in place

As systems go, applications typically behave more like furniture than a building, at least from the point of view of someone who mainly deals with infrastructure concerns. For one, rather than being designed and then built in place, they're typically built on a developer's workstation, and then deployed into staging and then production environments.

Given that it's often difficult to replicate a production environment on a workstation (although systems like minikube can help), and that production environments can involve more moving parts and indirection (e.g. packaging an application into a Docker image before it can be spun up by Kubernetes), this can add overhead to the development workflow that prolongs the cycle time and makes development less comfortable.

After all, no matter how useful it is to visit the building site and understand how buildings are assembled, an architect typically doesn't spend their drawing and thinking time on site, as the noise and other activity can be an unwanted distraction. In the same way, furniture designers don't need to spend all of their time in the home where the furniture is to be used.

However, for an application that has a lot of dependencies on its environment (e.g. it uses multiple external services that don't have stubs), attempting to replicate production locally may seem like the only choice. Experiencing the discomfort this brings can help to motivate solutions, by encouraging the decoupling of components, or by removing hard dependencies.
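One common way to remove such a hard dependency is to code against an interface and swap in a stub locally. A minimal sketch, assuming a hypothetical external payment service (none of these names come from the post):

```python
from typing import Protocol

class PaymentGateway(Protocol):
    """Interface for a hypothetical external service."""
    def charge(self, account: str, pence: int) -> bool: ...

class StubGateway:
    """In-memory stand-in, so the application runs without the real service."""
    def __init__(self):
        self.charges = []
    def charge(self, account: str, pence: int) -> bool:
        self.charges.append((account, pence))
        return True

def checkout(gateway: PaymentGateway, account: str, pence: int) -> str:
    # Application logic depends only on the interface, not the real service.
    return "paid" if gateway.charge(account, pence) else "declined"

stub = StubGateway()
print(checkout(stub, "acct-1", 499))  # → paid
```

The production wiring injects the real client; the workstation injects the stub, so the full environment need not be replicated locally.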

Grand, elegant but unusable designs

Architects can be motivated to produce buildings that look glamorous and modern on the outside but don't suit their purpose. One example Brand mentions is the French central library: the windows trapped heat that would adversely affect the books, and the re-work needed to mitigate this cost so much that the library had to save money by buying fewer books.

In the software space, this is reflected by developers building on the latest and greatest technology (e.g. using Hadoop or Spark to process a few hundred megabytes or even gigabytes of data), which often delivers more value to their CVs than it does to the end customer.

There are also more subtle examples, such as using Rails, and in particular ActiveRecord, for applications with a relatively complex domain model. Whilst it's perfectly possible to write high-quality software with these, it's easy to conflate your persistence mechanism with the application logic. Writing tests around that logic then becomes more difficult, as the tests are implicitly coupled to the database, which requires its own setup and teardown, and mechanisms to ensure tests do not interfere with each other.
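One way to avoid that conflation is to keep the domain model free of persistence concerns and hide storage behind a repository. A hedged sketch (in Python rather than Rails, with illustrative names):

```python
class Order:
    """Pure domain object: no knowledge of how it is stored."""
    def __init__(self, lines):
        self.lines = lines  # list of (quantity, unit_price_pence)

    def total_pence(self):
        return sum(qty * price for qty, price in self.lines)

class InMemoryOrderRepo:
    """Test double standing in for a database-backed repository."""
    def __init__(self):
        self._orders = {}
    def save(self, order_id, order):
        self._orders[order_id] = order
    def get(self, order_id):
        return self._orders[order_id]

# Domain logic is testable with no database setup or teardown:
repo = InMemoryOrderRepo()
repo.save(1, Order([(2, 300), (1, 150)]))
print(repo.get(1).total_pence())  # → 750
```

Tests of the pricing logic touch only plain objects; only the repository's real implementation needs a database.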

These kinds of systems can even seem great to work with for the first few days or weeks of use, but their real characteristics often only emerge after they have been in production for some time. If you're used to building monolithic systems that are deployed onto a single machine, then having to account for the transient failures possible in dynamic systems like Kubernetes can come as quite a shock.

For example, provisioning a volume via, say, EBS may transiently fail, and sometimes the only way to fix that seems to be to restart or replace the node. If you haven't chosen a storage mechanism that uses replication to mitigate node failures, this will mean an outage for anything that depends directly or indirectly on that storage.
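Handling such transient failures usually means retrying with backoff rather than failing outright. A minimal sketch of the pattern (the exception type and the flaky operation are illustrative stand-ins, not any real EBS or Kubernetes API):

```python
import time

class TransientError(Exception):
    """Stands in for a transient infrastructure failure."""

def with_retries(op, attempts=5, base_delay=0.01):
    """Call op, retrying with exponential backoff on transient failure;
    re-raise if the final attempt still fails."""
    for attempt in range(attempts):
        try:
            return op()
        except TransientError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

calls = {"n": 0}
def flaky_provision():
    calls["n"] += 1
    if calls["n"] < 3:          # fail twice, then succeed
        raise TransientError("volume not ready")
    return "vol-attached"

print(with_retries(flaky_provision))  # → vol-attached
```

Retries paper over the transient case; only replication protects you from a node that is genuinely gone.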

Ockham's heuristic

What this means is that we should choose the right tool for the job. Whilst that's easy to say, it comes with the vast pitfall that understanding the right tool for the job usually comes from the experience of using the wrong one. Reductively, this comes down to Ockham's razor: a heuristic stating that the model with the fewest assumptions is probably the best one.

In software, assumptions can range from the coupling between components, to the nature of the environment, or even the development tooling. This isn't to say that we should be strictly minimalist, just that we should understand, and even record, the trade-offs we make when deciding how to create systems.

