Building rule-based machine learning systems from scratch

Posted on July 18, 2019

Sometimes, it is obvious that a project needs machine learning, but you can tell that simply pumping the data through all the algorithms in a popular library (and picking the one algorithm that performs least badly) is not the answer. Machine learning libraries cannot cover all algorithms, trade-offs and heuristics specific to arbitrary problem domains. By picking a particular focus and deciding what to include / exclude, such libraries can be fantastically productive for a range of applications—at the price of being less appropriate for things outside of that range.

This year, I have had the opportunity to work on two feasibility-phase projects where variants of rule-based machine learning turned out to be a great fit.

Rule-based machine learning methods have been established for a long time. There is a large body of research describing many different approaches, each offering a different combination of interesting properties. Among these properties are, for example:

  • good interpretability,
  • the ability to incrementally refine rules or sets of rules, and/or
  • the option of expressing rules in a formalism such as Datalog or Horn logic, i.e. going beyond conjunctions of simple conditions based on attribute-value pairs.
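
To make the baseline concrete, here is a minimal Python sketch of the simplest rule form mentioned above: a conjunction of attribute-value conditions that can be refined one condition at a time and printed in readable form. The class names (Condition, Rule) and the example attributes are illustrative assumptions, not taken from any particular library or from the projects described below.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    """A single attribute-value test (hypothetical helper type)."""
    attribute: str
    value: object

    def holds(self, example: dict) -> bool:
        return self.attribute in example and example[self.attribute] == self.value


@dataclass(frozen=True)
class Rule:
    """IF all conditions hold THEN predict label."""
    conditions: tuple
    label: str

    def covers(self, example: dict) -> bool:
        return all(c.holds(example) for c in self.conditions)

    def refine(self, condition: Condition) -> "Rule":
        # Incremental refinement: specialise the rule by adding one more condition.
        return Rule(self.conditions + (condition,), self.label)

    def __str__(self) -> str:
        body = " AND ".join(f"{c.attribute} = {c.value!r}" for c in self.conditions)
        return f"IF {body or 'TRUE'} THEN {self.label}"


general = Rule((Condition("component", "pump"),), "hydraulics")
specific = general.refine(Condition("state", "leaking"))
print(specific)
```

Richer formalisms such as Horn clauses replace the flat conditions with literals that share variables, but the refine-and-test cycle stays the same.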

Yet most libraries don’t go beyond decision trees and random forests. It can be argued that decision trees are not generally easy to interpret, and that random forests, in their quest for improved predictive performance, only exacerbate this. Representation-wise, most libraries are geared towards learning from a single table of data (e.g. Pandas / R ‘data frames’; Weka’s ARFF). You might consider modifying or extending a library to bend it to your will, but is that likely to be better than writing a small, focussed rule learner from scratch?

In a previous life, I built a fairly complex rule-learning system for NLP tasks, so my point of view is that you can absolutely build your own, especially when it is intended to solve just one particular problem.

Two recent projects

The first project revolved around mapping free text (rife with inconsistencies and spelling mistakes) from maintenance logs of heavy machinery to a standardized catalogue of machine part IDs. The particular structure of the catalogue led me to choose an approach known as Ripple Down Rules (RDR), which can be induced automatically from labelled training data. Another interesting point about this project is that word order and multi-word units are highly significant in the free text, and that they could be captured by constructing features dynamically during the RDR-learning process; enumerating all candidates upfront would have resulted in a prohibitively large number of features. A previous prototype by a different team represented text as document vectors, thus losing information about word order, and because it used an SVM learner from a popular library, it had no freedom to construct features on the fly. It took twelve weeks to get from zero knowledge about the project to a convincing system with demonstrable advantages.
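
For readers unfamiliar with Ripple Down Rules, the following Python sketch shows how a single-classification RDR tree is evaluated: each node has a condition, a conclusion, an "except" branch that is followed when the condition holds, and an "else" branch that is followed when it does not. The conclusion of the last node whose condition held wins. This covers evaluation only, not induction, and it is not the project's implementation; the names (RDRNode, has_phrase, classify) and the part IDs are made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RDRNode:
    condition: Callable[[str], bool]
    conclusion: str
    except_branch: Optional["RDRNode"] = None  # followed when condition holds
    else_branch: Optional["RDRNode"] = None    # followed when condition fails


def has_phrase(*words: str) -> Callable[[str], bool]:
    """Condition: the words occur contiguously and in this order."""
    n = len(words)

    def test(text: str) -> bool:
        tokens = text.lower().split()
        return any(tuple(tokens[i:i + n]) == words
                   for i in range(len(tokens) - n + 1))

    return test


def classify(node: Optional[RDRNode], text: str) -> Optional[str]:
    """Return the conclusion of the last node whose condition held."""
    result = None
    while node is not None:
        if node.condition(text):
            result = node.conclusion
            node = node.except_branch
        else:
            node = node.else_branch
    return result


# Hypothetical tree: a default rule, refined by increasingly specific exceptions.
tree = RDRNode(
    condition=lambda text: True,
    conclusion="UNKNOWN",
    except_branch=RDRNode(
        has_phrase("pump"), "PUMP-GENERIC",
        except_branch=RDRNode(has_phrase("oil", "pump"), "OIL-PUMP"),
        else_branch=RDRNode(has_phrase("filter"), "FILTER"),
    ),
)

print(classify(tree, "replaced oil pump seal"))  # word order matters here
print(classify(tree, "pump oil leak"))
```

Because the phrase condition checks contiguous tokens in order, "oil pump" and "pump oil" reach different conclusions, which is exactly the information a bag-of-words representation discards.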

The second project was about comparing the outputs of a legacy system and its independent reimplementation, and clustering the cases where the two systems disagreed, so that the largest classes of errors could be investigated with high priority. The systems deal with highly specialised business logic, and the cases that pass through them are associated with variable numbers of various entities as well as transaction logs comprising 500 events on average. Here, a rule-set learner driven by a coverage loop, with a custom hypothesis language, not only fulfilled the requirements but also provided explanations of the error clusters. It took just under five weeks to build from scratch; there’s a possible short follow-up project for integrating the system into a production environment.
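
The general shape of a coverage loop can be sketched in a few lines of Python. This is a textbook-style sequential covering learner, assumed here for illustration rather than taken from the project: each case is reduced to a set of feature strings, a rule is a set of features that must all be present, and the learner repeatedly grows one rule, then removes the disagreement cases it covers. The feature names and the greedy precision-based scoring are assumptions; the actual hypothesis language was custom and is not described in this post.

```python
def covers(rule: frozenset, features: frozenset) -> bool:
    # A rule covers a case when all of its required features are present.
    return rule <= features


def score(rule, positives, negatives):
    """Rank candidate rules by (precision, number of positives covered)."""
    tp = sum(covers(rule, p) for p in positives)
    fp = sum(covers(rule, n) for n in negatives)
    return (tp / (tp + fp) if tp + fp else 0.0, tp)


def grow_rule(positives, negatives) -> frozenset:
    """Greedily add features until no negative case is covered."""
    rule = frozenset()
    while any(covers(rule, n) for n in negatives):
        # Candidates come only from positives that the current rule still covers.
        candidates = {f for p in positives if covers(rule, p) for f in p} - rule
        if not candidates:
            break  # cannot specialise further; the rule stays impure
        best = max(sorted(candidates),
                   key=lambda f: score(rule | {f}, positives, negatives))
        rule = rule | {best}
    return rule


def learn_rule_set(positives, negatives) -> list:
    """Coverage loop: learn a rule, remove what it covers, repeat."""
    rules, remaining = [], list(positives)
    while remaining:
        rule = grow_rule(remaining, negatives)
        rules.append(rule)
        remaining = [p for p in remaining if not covers(rule, p)]
    return rules


# Hypothetical data: cases where the two systems disagree vs. agree.
disagree = [frozenset({"currency=EUR", "event=refund", "channel=web"}),
            frozenset({"currency=EUR", "event=refund", "channel=phone"}),
            frozenset({"currency=USD", "event=chargeback", "channel=web"})]
agree = [frozenset({"currency=EUR", "event=payment", "channel=web"}),
         frozenset({"currency=USD", "event=payment", "channel=phone"})]

for rule in learn_rule_set(disagree, agree):
    print(sorted(rule))
```

Each learned rule doubles as a description of one cluster of disagreements, and the order in which rules are found roughly follows cluster size, which is what makes this family of learners attractive for prioritising error classes.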

Both projects have in common that:

  • their specific rule-learning algorithms are not included in popular machine learning libraries,
  • there was no obvious, appropriate choice of algorithm from a popular library, and
  • there was freedom to choose the implementation language.

Building your own …

The basic ingredients for rule learning algorithms are fairly straightforward and highly adaptable. On my two projects, developing rule learners from scratch allowed me to exploit application-specific heuristics, enumerate features dynamically, and choose an adequate representation language.
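
One way to picture dynamic feature enumeration is the Python sketch below. Instead of listing every word n-gram in the whole corpus upfront, candidates are generated only from the texts that the rule under construction currently covers, and rare ones are dropped. The function names, the n-gram features and the thresholds are illustrative assumptions, not a description of how either project did it.

```python
from collections import Counter


def ngrams(tokens, max_n):
    """Yield all contiguous word n-grams up to length max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def candidate_features(covered_texts, max_n=3, min_count=2):
    """Candidate n-gram features drawn only from the currently covered texts.

    Each n-gram is counted once per text; n-grams seen in fewer than
    min_count texts are discarded.
    """
    counts = Counter()
    for text in covered_texts:
        counts.update(set(ngrams(text.lower().split(), max_n)))
    return {gram for gram, count in counts.items() if count >= min_count}


covered = ["oil pump leaking", "replaced oil pump", "pump oil low"]
print(sorted(candidate_features(covered)))
```

As a rule becomes more specific, it covers fewer texts, so the candidate set shrinks at every refinement step and never has to be materialised for the full corpus.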

… in Elixir

For both projects, I chose Elixir as the implementation language for the following reasons:

  • many aspects of rule-learning are recursive in nature, so a functional language is a good fit
  • Elixir code is generally very compact
  • the basic Erlang data types (atoms, tuples, lists, maps) let you build interesting structures with little fuss (just imagine how many small classes you might have to write in a Java-like language)
  • Elixir’s REPL (iex) allows for efficient experimentation: you can make a code change and reload your module without losing previously constructed data
  • term_to_binary and binary_to_term are convenient for persisting data structures on disk
  • ETS (Erlang Term Storage) is great for storing data and efficiently accessing it from multiple processes
  • Enum and Stream modules make it easy to create data transformation pipelines
  • Task.async_stream can be used to distribute work over multiple cores
  • should you outgrow the capacity of a single machine, there are official Elixir projects for distributing/scaling transformation pipelines: GenStage, Flow and Broadway

Potential disadvantages of using Elixir:

  • other environments offer more raw computational power
  • lack of a static type system

I like working with Elixir. In my experience it’s an effective language for prototyping as well as putting systems into production. However, other languages can be just as appropriate, depending on your project, experience, preferences and external constraints.

Conclusion: risks and possible fallout

What risks and fallout can you expect when you decide to develop your own rule-learning implementation?

  • First of all, you now have two problems (actual task + custom learner) to solve at the same time. There is no upfront guarantee that your chosen approach will be successful or that your implementation will be adequate—no pressure!
  • Your project sponsor may request that your final approach be re-implemented in a different language, for example so that it can be absorbed into the code-base of the parent project it serves. Pair-programming with a target-language expert can be an effective way to carry out such a transfer.
