Over on The Morning Paper, Adrian has recently covered a number of papers looking at detecting bugs in code by slightly unusual means (i.e. not the usual combination of lots of buggy tests and lots of static checks). So that’s been on my mind lately, at least when it gets a chance in between the collapse of civilisation.
I’m a very big proponent of the actor paradigm. Given the choice, I would write everything using actors. Pretty much regardless of the language I’m using, I try to shoehorn in actors one way or another. Actors have many advantages; possibly the most important is the principle that an actor has its own private state which can’t be seen by other actors, and so there is only ever one writer of that state. You share data by sending messages between actors, but once you have sent a piece of information you can no longer modify it, so it’s not really sharing. So in essence each actor is a little server: it waits for messages to arrive, does some processing of each message in combination with its current state, and then maybe the state gets modified, and maybe some other messages get sent out too. Because memory isn’t shared between actors (at least at the language level; implementations are free to be as crazy/fancy as they want), you see very, very few traditional locks. (Yes, the actual message queue between actors is a shared data structure, but provided your language gives you a thread-safe multiple-producer queue, you can just rely on that, and so it’s something that’s provided to you rather than something you have to write and manage yourself.)
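To make that “little server” shape concrete, here’s a minimal sketch in plain Java, using a BlockingQueue as the mailbox. The CounterActor name and the string messages are invented for illustration, not any particular actor framework:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// A minimal actor sketch: private state, a mailbox, and a receive loop.
class CounterActor {
    // The mailbox is the only shared structure; the queue is thread-safe.
    private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
    // Private state: only this actor's own thread ever reads or writes it.
    private int count = 0;
    private final Thread loop = new Thread(this::run);

    CounterActor() { loop.start(); }

    // Other actors communicate only by sending messages.
    void send(String msg) { mailbox.add(msg); }

    // Stops the loop and returns the final count; joining the loop thread
    // gives a happens-before edge, so reading count here is safe.
    int stopAndGetCount() {
        send("stop");
        try {
            loop.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return count;
    }

    private void run() {
        try {
            while (true) {
                String msg = mailbox.take();    // block awaiting the next message
                if (msg.equals("stop")) return;
                if (msg.equals("inc")) count++; // state modified only here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Note there isn’t a single explicit lock in there: the only synchronisation is the queue itself, which the library provides.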
I was recently writing some Java and wrote myself a deadlock without realising it, which annoyed me because I didn’t see it coming. In actor languages, yes, you can still deadlock, but I find it easier to spot ahead of time that I’m about to block in actor A waiting for a message from actor B when I know that in some scenarios actor B can block waiting for a message from A. That’s probably because I tend to have, say, a dozen distinct types of actor (but perhaps many thousands of instances of each type), and so it’s fairly easy to keep the general message flows in your head.
So what’s the difference with Java? Why did I not see this coming? Well, I’m sure a good chunk of the reason is practice: 10 years ago I was writing a lot of multi-threaded Java and was probably better at it then than I am now. But with the above papers sort-of in mind, I wondered whether there was any other reason, and I think I’ve come to the conclusion that it’s all to do with call-stack depth. In Java (and all languages where you tend to have big frameworks and no actors), your call stacks get very deep. Whilst we like to pretend that we test and mock and keep classes encapsulating state and so forth, the truth is it’s very easy to lose track of what’s calling what, especially when you have callbacks and other similarly opaque control-flow mechanisms going on. It’s often only when you break out the debugger that you realise that the same thread that did X is now further down that call stack doing Y, and that’s causing your deadlock (for example, it’s blocked waiting for some state to change, but that state can only be changed by the return path back up the stack). Trying to work with some of the modern frameworks in Java, I frequently find myself searching through the docs trying to answer questions about which thread is calling what.
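Here’s a sketch of the shape of that trap, with invented names, and a timed wait standing in for what would really be an indefinite block: deep in the stack, a callback waits for a completion signal that only the return path back up the stack can ever deliver.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Self-deadlock hidden by call depth: the completion signal lives on the
// return path above the very frame that is waiting for it.
class HiddenDeadlock {
    private final CountDownLatch done = new CountDownLatch(1);

    // Kicks off some work and signals completion afterwards. Returns
    // whether the callback, deeper in the same stack, ever saw the signal.
    boolean doWork() {
        boolean sawCompletion = onResult(); // framework-style callback, same thread
        done.countDown();                   // completion happens only after return
        return sawCompletion;
    }

    // Deep in the call stack: wait for completion. But completion is
    // signalled by our own caller, after we return, so this can never
    // succeed. A timed wait stands in for the real (indefinite) block.
    private boolean onResult() {
        try {
            return done.await(100, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Written out flat like this it’s obvious; spread across a framework, a few interfaces, and a registered callback, it isn’t.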
Now I’m guessing here: due to the previously mentioned apocalypse, I’ve not had time to gather any stats on this, but my guess is that the stack depth of code written using actors is much shallower, which makes the code easier to reason about. This “making the code easier to reason about” claim is obviously vague, but if you think of the call stack as a path down a tree, with the code making some sort of descent, traversal across, and then ascent of the tree, then the bigger and more complex the tree is, the more likely you are to forget something important about where in the tree you’ve previously been. So why is it different in actor languages? Well, I think it’s because in order to gain performance (and generally to do “good architecture”) you normally have more actors, so you tend to set up something which resembles a pipeline. That in itself encourages you to do less work on each message in each actor: you finish your bit of processing, send off the results to the next stage, and then unwind your stack up to the base actor loop, awaiting the next message. Now, I’m certainly not claiming this is a unique property of actors: you could definitely use this principle all over the place without calling anything an actor.
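A sketch of that pipeline shape, again with invented names: each stage takes a message, does one small step, forwards the result, and unwinds straight back to its own receive loop, so no stage’s stack ever gets deeper than loop-plus-one-step.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.IntUnaryOperator;

// A pipeline of stage actors: each one does a bit of work per message,
// forwards the result, and returns to its loop.
class PipelineSketch {
    static void stage(BlockingQueue<Integer> in, BlockingQueue<Integer> out,
                      IntUnaryOperator step) {
        new Thread(() -> {
            try {
                while (true) {
                    int msg = in.take();                    // await next message
                    if (msg < 0) { out.add(msg); return; }  // forward poison pill, exit
                    out.add(step.applyAsInt(msg));          // do a bit, pass it on
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).start();
    }

    // Convenience for callers: a blocking take that converts interruption
    // into an unchecked failure.
    static int take(BlockingQueue<Integer> q) {
        try {
            return q.take();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Wiring two stages together (say, add one, then double) means a message flows through two shallow stacks rather than one deep one, and each stage’s control flow fits on a single screen.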
Can we measure this? Well, cyclomatic complexity is:
a software metric (measurement), used to indicate the complexity of a program. It is a quantitative measure of the number of linearly independent paths through a program’s source code. It was developed by Thomas J. McCabe, Sr. in 1976. Cyclomatic complexity is computed using the control flow graph of the program: the nodes of the graph correspond to indivisible groups of commands of a program, and a directed edge connects two nodes if the second command might be executed immediately after the first command. Cyclomatic complexity may also be applied to individual functions, modules, methods or classes within a program.
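To make the definition concrete (this is my own toy example, not from the quoted article): for a single function with one connected component, the number works out as the count of decision points plus one.

```java
// Decision points below: the loop condition, the two ifs, and the &&
// short-circuit -- four in total, so the cyclomatic complexity is
// 4 + 1 = 5.
class Complexity {
    static int classify(int[] xs) {
        int score = 0;
        for (int x : xs) {                    // decision 1: loop condition
            if (x > 0) {                      // decision 2
                score += x;
            } else if (x < 0 && x % 2 == 0) { // decisions 3 and 4
                score -= x;
            }
        }
        return score;
    }
}
```

Five independent paths through one small function you can see in full, which matters for where I’m going next.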
so that seems related to what I’m on about, but equally:
However, the correlation between cyclomatic complexity and program size has been demonstrated many times, and since program size is not a controllable feature of commercial software, the usefulness of McCabe’s number has been called to question.
The essence of this observation is that larger programs (more complex programs as defined by McCabe’s metric) tend to have more defects
So maybe it’s more subtle. If a single function is very complex (and has good reason to be) then at least it’s hopefully all on the screen in front of you and you can stare at it and prove properties of it and soak test it to death. But when the complexity is an emergent property of an amalgamation of many modules across many files, it’s harder to see (both in front of your eyes, and mentally) transitively what is calling what, and it’s that invisible complexity that is so hard to cope with and predict. So if there are any analyses that look at whether or not actor programs tend to have shorter call stacks, and any analyses that look at the effect of call stack size (especially calls across files/modules/classes) on defect rates, I’d be very curious to read them.