Balancing trees

By: on April 30, 2011

Have tree. Will use zipper to mutate. Much rejoicing. Only, when you have a binary search tree, you’d like to have a balanced tree so you get the O(lg n) search rather than a list masquerading as a tree (so all nodes only have left or right children), which would give you O(n) search. What to do? Well, obviously, you need to rebalance your tree.

Chris Okasaki shows how to implement a functional red-black tree, but today I’m going to look at a different beast. In the land of mutable structures Day, Stout and Warren came up with the Day/Stout/Warren algorithm. It’s a constant space, in-place, linear time algorithm. Nice!

DSW’s such an attractive algorithm that I couldn’t resist it. But of course, it works on a mutable tree, which flies in the face of my restriction: I want immutable structures. So I thought of a compromise: copy my immutable tree, mutate it using DSW, and then create a fresh, immutable, tree. If you look at the code you’ll see BinarySearchTree>>#rebalanced and BinarySearchTree>>#rebalancedDSW. Each returns a newly balanced BST that, from the outside, appears to be fully functional. These just wrap the internal ZTree methods, which look like this:

    balance
        "Return a balanced copy of self."
        ^ self asOrderedCollection asBalancedBinaryTree.

    balanceDSW
        "Return a balanced copy of self."
        ^ self asMutableTree balance asImmutableTree.

First, let’s look at how to implement DSW. So in MutableZTree (the type of thing that #asMutableTree returns) we have:

    balance
        "Return myself, only balanced. Use the DSW algorithm."
        | pseudoroot size |
        pseudoroot := MutableZTree new
                right: self;
                yourself.
        size := pseudoroot asVine.
        pseudoroot asTreeOfSize: size.
        ^ pseudoroot right.

    asVine
        "The first half of the DSW algorithm. Turn self into a vine, and return the
        number of nodes in the vine."
        | vineTail remainder tempPtr size |
        vineTail := self.
        remainder := vineTail right.
        size := 0.
        [ remainder isEmpty ] whileFalse:
            [(remainder left isEmpty)
                ifTrue: "move vineTail down one"
                    [vineTail := remainder.
                    remainder := remainder right.
                    size := size + 1.]
                ifFalse: "rotate"
                    [tempPtr := remainder left.
                    remainder left: tempPtr right.
                    tempPtr right: remainder.
                    remainder := tempPtr.
                    vineTail right: tempPtr]].
        ^ size.

    asTreeOfSize: size
        "Convert my vine-self to a balanced tree."
        | sz leafCount internalNodeExp |
        internalNodeExp := ((size + 1) ln / 2 ln) floor.
        sz := size.
        
        leafCount := size + 1 - (2 raisedTo: internalNodeExp).
        self compressTimes: leafCount.
        sz := sz - leafCount.
        [sz > 1] whileTrue:
            [self compressTimes: sz / 2.
            sz := sz / 2].

    compressTimes: count
        | scanner child |
        scanner := self.
        1 to: count do:
            [:i | child := scanner right.
            scanner right: child right.
            scanner := scanner right.
            child right: scanner left.
            scanner left: child].

Not terribly Smalltalk-y. In brief, we flatten the tree in a pointer-swizzling version of an in-order traversal, yielding a “vine” – a tree with only right subtrees. Then we iteratively “compress” the vine to yield a route-balanced tree (i.e., a tree which minimises the maximum depth of its nodes and minimises the average depth of the nodes). All perfectly balanced trees are route-balanced, but the converse is not true. (A tweak to DSW permits the construction of perfectly balanced trees; Stout and Warren don’t see the point in preferring perfectly balanced over route-balanced, but you can still have constant space (additional) space and linear time.

Now of course we’re mutating a copy of the tree, so we’ve changed the overall algorithm to a linear space one. Oh well.

Changing the tree back to an immutable structure’s easy, if tedious: without black magic trickery like changing the class of the objects, we make a whole new copy of the tree:

    asImmutableTree
        "Return an immutable version of self."
        self children isEmpty ifTrue: [^ ZTree value: self value].

        ^ ZTree
            value: self value
            children: (self children collect: #asImmutableTree).

By this stage we’ve turned a constant space algorithm into an O(2n) one.

I briefly considered trying to rewrite DSW using a pair of zippers – one walking the original tree while the other built up the vine. Then I realised that (a) I was crazy, and (b) there was no point: you’re completely destroying the structure of the tree, so the new tree will never share any part of the old tree.

We can also approach the problem from a different angle: the special thing about a BST is that an in-order traversal of the tree yields a sorted enumeration. So we might say “let’s balance the tree by enumerating its contents and Constructing the Tree Right(tm).” Expressed another way, we might say “fold a tree into a list, and unfold the list into a tree.”

So on ZTree we can implement the tree->list transformation:

    asOrderedCollection
        ^ self collect: #yourself

    collect: aBlock
        ^ InOrderTraversal new traverse: self collecting: aBlock.

And, on SequenceableCollection the reverse, list->tree, transformation:

    asBalancedBinaryTree
        ^ self asBalancedBinaryTreeFrom: 1 to: self size.

    asBalancedBinaryTreeFrom: startIdx to: endIdx
        | mid |
        (startIdx > endIdx) ifTrue: [ ^ ZTree empty ].
        (startIdx = endIdx) ifTrue: [ ^ ZTree value: (self at: startIdx) ].
        
        mid := ((startIdx + endIdx) / 2) floor.
        ^ ZTree value: (self at: mid)
            children:
                {self asBalancedBinaryTreeFrom: startIdx to: mid - 1.
                self asBalancedBinaryTreeFrom: mid + 1 to: endIdx.}.

I’ve found it really convenient having the traversals separated out into separate objects. Of course, the traversals are implemented using zippers. Each kind – pre-, post- and in-order traversal – implement a core method, #collectNodes:using:, which walks the tree executing a block on each node. For instance, InOrderTraversal looks like this:

    collectNodes: aBlock using: aZipper
        "A naive depth-first traversal. It fails in the presence of cycles."
        | collected left move right |
        aZipper focus isEmpty
            ifTrue: [ collected := #() asOrderedCollection ]
            ifFalse: [ collected := OrderedCollection with: (aBlock value: aZipper focus) ].
                
        move := aZipper safeDown.
        
        left := (move first = #success)
            ifTrue: [ self collectNodes: aBlock using: move second ]
            ifFalse: [ #() asOrderedCollection ].
        
        right := ((move first = #success) and: [(move := move second safeRight) first = #success])
            ifTrue: [self collectNodes: aBlock using: move second]
            ifFalse: [ #() asOrderedCollection ].

        ^ left copy
            addAll: collected;
            addAll: right;
            yourself.

So what’s the point? We’re making a trade-off (as always). With a self-balancing tree like a red-black tree, we’re paying a small price every time we insert a node into the tree. Using the above approach we pay no cost (beyond walking the tree) for inserts, but every now and then (when we decide), we blow up the whole worldtree and set things to rights. Death by a thousand cuts, or Armageddon, you will always pay the piper.

Share

Leave a Reply

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

*