Ripple Down Rules (RDR) provide a structure for rule-based classifiers and an incremental construction method.
Ripple Down Rules are organized as trees:
- a case (data to be classified) enters the root node and ripples down a particular path to receive its classification
- each node comprises a list of rules
- a rule has a set of conditions and an associated classification
- if a case satisfies all of a rule’s conditions, the rule matches and the associated classification is assigned to the case
- each rule also has an optional child-node that expresses the known exceptions to the rule
- if there is an exceptions-node, the case is passed on to it
- the rules at a node are tried in the specified list order until a match is found
- if no rule matches, the classification of the parent node is the final classification
- the root node typically comprises a single rule with an empty set of conditions and a default classification (e.g. most prevalent class in the data set)
A hypothetical RDR tree is shown below:
A case with attributes [a1=b, a2=d, a3=h, a4=e] would:
- enter the root (node 1),
- get passed to node 2, where it satisfies the second rule
- get passed to node 5, where it satisfies the third rule
- get passed to node 6, where it does not satisfy any rules
- be assigned class 5 as the third rule in node 5 is the last successful rule application
In the same manner, a case with attributes [a1=a, a2=b, a3=d, a4=e] would be assigned class 5 by node 4. A case with attributes [a1=d, a2=c, a3=b, a4=a] would be assigned class 1 as it only satisfies the conditions of the root node.
The RDR approach is incremental in the sense that an existing RDR tree can be refined in the light of new data. Case [a1=a, a2=b, a3=c, a4=d] is assigned class 3 by the example RDR. If the correct class is different (domain expert confirms it is 7), the RDR tree can be refined (corrected) by adding a new rule to node 4 as shown below.
Refinements (corrections) are limited to specific contexts, so that previously correct model behaviour outside of the context is left intact. The alternative to performing incremental refinements is to build a new model from scratch, which may result in a different structure every time, which in turn can potentially affect previously correct behaviour.
Another advantage of the incremental RDR approach is that it can be used even if there is no pre-existing corpus of labelled training data: Starting with an empty RDR, an expert would repeatedly feed a new case into the RDR to obtain a classification. If the expert disagrees with the RDR’s classification, he or she can provide the true classification and refine the RDR. After a sufficient number of iterations, this results in an improved RDR as well as a corpus of labelled training data.
When a corpus of labelled training data is available, RDRs can be constructed fully automatically using the following recipe (from Algebraic Foundation and Improved Methods of Induction of Ripple Down Rules):
- construct a rule using a standard technique (rule is allowed to make mistakes)
- recursively learn the exceptions (i.e. correct the mistakes)
- repeat with the remaining examples
RDRs have seen a range of applications from chemical pathology to e-mail classification to natural language processing tasks to problems in machine vision. An in-depth discussion of RDR including history, applications and variants is given in: Two decades of Ripple Down Rules research.