Fun with Blocks

By: on May 30, 2012

We’ve already seen that Smalltalk has a very lightweight syntax for closures: [:x | x] for the identity function, for instance. We’ve seen them form an essential part of Smalltalk’s structure, allowing all control structures to live in the library rather than being baked into the language. What else might we use them for?

One of the most essential properties of blocks is that of delayed evaluation. Statements wrapped in a block are compiled, but only execute when we ask the block for its value:

    | a b |
    a := 1.
    b := [{a := 2. 1}].
    a. " => 1"
    b value. " => #(2 1)"
    a. " => 2"

OK, so we have these things that contain Smalltalk code. If we have a decompiler and compiler available, maybe we can play around with that code. We need to be a bit careful though: [1. 2. 3] decompile printString = [3] decompile printString, showing that Squeak’s compiler does a bit of cleanup. Those first two statements are clearly useless, and are discarded. I should add an important caveat: not all Smalltalks have an online compiler handy – Gemstone, for instance – so the tricks we discuss herein may not apply to all Smalltalks.

Suppose we have a class whose instance variables will contain expensive calculations that we don’t wish to pay for right now. We also want a clean interface, one that doesn’t make callers force the evaluations explicitly. Thus, we’d like to prep the instance with some thunks, and memoise the results. First, a class definition. (Why, yes, the class category is a clue to a later post.)

Object subclass: #LazyClass
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Parsing-Derivatives'.

LazyClass class >> #new: aSymbol withFields: someSymbols in: aString
    | cls |
    cls := self
        newSubclassNamed: aSymbol
        withInstVars: someSymbols
        in: aString.
    someSymbols do: [:name | cls addAccessor: name].
    ^ cls.

LazyClass class >> newSubclassNamed: aSymbol withInstVars: someSymbols in: aString
    ^ LazyClass
        subclass: aSymbol
        instanceVariableNames: ((someSymbols collect: #asString)
            inject: '' into: [:acc :ea | acc , ' ' , ea])
        classVariableNames: ''
        poolDictionaries: ''
        category: aString.

LazyClass class >> addAccessor: aSymbol
    | src |
    "We really ought not to use a string here, but a proper parse
     tree. It's almost begging for something like a macro, isn't
     it?"
    src := '{1}
    {1} isBlock ifTrue: [{1} := {1} value].
    ^ {1}.'
        format: {aSymbol asString}.

    ^ self
        compile: src
        classified: 'accessing'
        notifying: nil.

That sets the scene. Note that #addAccessor: commits the cardinal sin of using strings to contain structured data. In a proper implementation, we’d work with a parse tree. At any rate, we can see that we have a very simple memoised accessor: the first time someone invokes the accessor, the object runs the block, and returns the value. Subsequent calls just return the value.
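
To make the generated code concrete: for a field named this, the template in #addAccessor: should expand to the method below. This is a by-hand expansion of the format string, not output captured from an image.

    this
        this isBlock ifTrue: [this := this value].
        ^ this.

The same force-once pattern can be tried in a workspace, with a plain temporary standing in for the instance variable. The names slot and count are made up for the illustration.

    | slot count |
    count := 0.
    slot := [count := count + 1. 6 * 7].
    slot isBlock ifTrue: [slot := slot value]. "first access runs the block"
    slot isBlock ifTrue: [slot := slot value]. "second access finds a number, so nothing runs"
    slot. "=> 42"
    count. "=> 1"

One consequence worth keeping in mind: a field whose computed value is itself a block would be forced again on the next access.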

OK, so we have a means of creating new LazyClass subclasses: LazyClass new: #Union withFields: #(this that) in: 'Parsing-Derivatives'. Still, how do we create an instance, and how do we feed it thunks?

Let’s try to do so with thunks. Assume that 1 and 2 are, in fact, rather expensive calculations. We’d like to say something short and pithy: Union new with: [{1. 2.}]. Hm, that looks a bit weird. Well, remember how we saw earlier that Squeak’s compiler will remove junk statements like the first in [1. 2]? Wrapped up in an Array like {1. 2.}, however, every expression survives, so one block can carry multiple values. How could we make this work, though?

LazyClass >> #with: aMagicBlock
    "The magic block contains a statement containing an Array that we
     turn into thunks. Those thunks we then assign, in order, to our
     instance variables."

    | thunks |
    thunks := aMagicBlock decompile statements first elements
        collect: [:stmt | (Compiler evaluate: (BlockNode withJust: stmt)) first].
    self class allInstVarNames with: thunks do: [:iv :thunk |
        self instVarNamed: iv put: thunk]

We take that Array-returning block, decompile it, and rip out its (sole) statement. Then we wrap up each element in a BlockNode – that node in Smalltalk’s AST representing a block/closure, in case you didn’t already guess – and turn those ASTs into normal thunks. Note the magic #allInstVarNames and #instVarNamed:put:. I normally try to avoid these kinds of methods: they’re powerful and they violate encapsulation, but in this case they provide a simple and generic way of doing what we want. And we’re done!
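
For anyone who hasn’t met the two reflective selectors before, here is a small workspace sketch on a Point, chosen only because every image has one. It assumes a Squeak-like image; the printed form of the names (strings or symbols) differs between dialects, so no result is shown for the first expression.

    | p |
    p := 3 @ 4.
    p class allInstVarNames. "the names of Point's instance variables, in order"
    p instVarNamed: 'x' put: 10. "writes the variable directly, bypassing any accessor"
    p x. "=> 10"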

| a cls p |
cls := LazyClass new: #Pair withFields: #(first second) in: 'Pairs'.
a := #unassigned.
p := cls new with: [{a := 1. 2.}].
a. "=> #unassigned"
p first. "=> 1"
a. "=> #unassigned"

Wait, what? #unassigned? What happened there? Well, remember we passed in that thunk [{a := 1. 2.}]? It contained a reference to a. Only, we decompiled the block, mutated it, and recompiled it. The new block now has a reference to something called “a”, but not to the top level variable, because it was compiled afresh rather than closed over our workspace. Ah, well, we’ve discovered a limitation of the technique: side effects like assignment will not, in general, be preserved. (And that’s probably a good thing.)
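
If the side effects do matter, one alternative (not the approach taken in this post) is to skip the decompile step and hand over one real block per field. Each block then keeps its binding to the enclosing scope, at the cost of noisier syntax. A workspace sketch:

    | a thunks |
    a := #unassigned.
    thunks := {[a := 1]. [2]}.
    a. "=> #unassigned"
    thunks first value. "=> 1"
    a. "=> 1"
    thunks last value. "=> 2"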

