The algebraic origins of fold and unfold

By: on June 5, 2006

A while back, an email from a collaborator proposed a coalgebraic view
of the grammar for a new process calculus. This prompted me to do
some reading up on algebras versus coalgebras – something I had been
meaning to do for a long while.

Luckily, there is a very good [tutorial by Bart Jacobs and Jan
Rutten][tutorial]. At 38 pages, it’s a bit intimidating but it’s very
accessible for the practicing computer scientist (essentially, exactly
what it says in its introduction). The examples are helpful but it
still deserves several read-throughs to really get the most out of it
— each time I’ve scanned through it, I’ve found something new!

So what’s it about? For a few decades now, computer scientists have
been familiar with tools such as [BNF][bnf] for defining language
grammars, and

1. defining functions over such grammars by cases, and
2. proving their properties by structural induction.

Viewed algebraicly, the BNF grammar induces a functor `T`, for which
there exists an initial algebra `a: T X –> X`. [Initiality][initial]
of `a` gives us the `fold` function, which allows us to define
functions *by cases*, as well as the proof-by-induction proof
principle. These algebraic concepts are, in general, better known
than the coalgebraic ones.

In this article, I’d like to share with you, a concrete illustration
of the first of the two points above, namely `fold`, and its dual
`unfold`.

Let’s turn to a concrete example: lists over `int`. In this case, we
have the grammar

list ::= nil | cons(n, list)

This induces a functor `T` defined by `T X = 1 + int * X`, where `1`
denotes the singleton set, `+` denotes disjoint union and `*` denotes
cartesian product.

An algebra for this `T` (also called a `T`-algebra) is a carrier set
`X` with function `a: 1 + int * X –> X`. For the remainder of this
example, for convenience, we’ll work with the equivalent *pair* of
functions of types `1 –> X` and `int * X –> X`.

The functional programmers amongst you will instantly recognise these
type signatures and think of the algebra `X = list`, the type of
integer lists, with functions `nil: 1 –> X` and `cons: int * X –>
X`. In fact, `list`, `nil` and `cons` make up the initial algebra for
this `T`.

Let’s consider another algebra `int`, `knil`, `kons`:

knil: 1 –> int
kons: int * int –> int

Since `nil`, `cons` is an initial algebra, and `knil`, `kons` is also
an algebra, by initiality, there is a unique function `f` of type
`list –> int` which makes the following diagram commute.

T f
T list = 1 + int * list ————–> 1 + int * int = T int
| |
| |
| nil,cons | knil,kons
| |
| |
V V
list ———————> int
f

This `f` is usually called `fold knil kons`. (This `fold` is actually
`fold-right` rather than `fold-left`.)

Now if it so happens that

knil = 0
kons x y = x + y

then `fold knil kons` is precisely what you’d expect, namely the *list
sum* function of type `list –> int`. Initiality allows us to define
*list sum* by cases, aka structural recursion: we give the nil case
and the cons case.

A coalgebra is a carrier set `X` with a function `c` of type `X –> 1
+ int * X`, i.e. `X –> T X` (note that the `T` has moved to the other
side of the arrow). Function `c` takes an element of `X` and returns
either something in the left of the disjoint union (i.e. the element
in `1`) or something in the right (i.e. a pair in `int * X`). One
intuition is to think of `c` as a destructor — it takes apart
elements from `X`.

The function `c` can equivalently be considered as three functions `p:
X –> bool`, `f: X –> int` and `g: X –> X` related to `c` as
follows: if `p x` is true then `c` maps `x` to the element in the left
of the disjoint union; if `p x` is false then `c` maps `x` to the pair
`(f x, g x)` in the right. We assume without loss of generality that
`p`,`f`,`g` is equivalent to `c`.

The final coalgebra (which is the dual of initial algebra) for our
example is the type of possibly infinite lists, `stream`, and the
triple `isnil?`,`head`,`tail`:

isnil?: stream –> bool
head: stream –> int
tail: stream –> stream

Now consider `nat`, the type of natural numbers, as a coalgebra with
functions `iszero?`, `odd`, `rshift`:

iszero?: nat –> bool
odd: nat –> int
rshift: nat –> nat

Finality of `isnil?`, `head` and `tail` means there is a unique
function `f` of type `nat –> stream` which makes the following
diagram commute.

f
nat ———————> stream
| |
| |
| iszero?,odd,rshift | isnil?,head,tail
| |
| |
V V
T nat = 1 + int * nat ————> 1 + int * stream = T stream
T f

The function denoted by the top horizontal arrow in the commuting
square above is usually called `unfold iszero? odd rshift`.

Now if it so happens that `odd` returns `1` on odd inputs and `0`
otherwise, and `rshift` returns the binary right shift of its input,
then `unfold iszero? odd rshift` is the function which takes a natural
number and returns its binary digits in reverse (but note that `0`
gives the empty list). (Compare this example to Tony’s [base
conversion code][baseconv].)

So wrapping up, we’ve illustrated the origins of `fold` and `unfold`
for integer lists from the algebraic perspective. It is possible to
generalise this example and thus derive `fold` and `unfold` for other
recursive data types such as binary trees.

[tutorial]: http://www.cs.ru.nl/B.Jacobs/PAPERS/JR.pdf “A Tutorial on (Co)Algebras and (Co)Induction”

[bnf]: http://en.wikipedia.org/wiki/Backus-Naur_form “Backus-Naur form”
[initial]: http://en.wikipedia.org/wiki/Initial_object “Initial Object”

[baseconv]: https://www.lshift.net/blog/2006/03/08/base-conversions-in-scheme “Base conversions in scheme”

Share

Leave a Reply

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

*